Training GPTs on cellular automata

Comments •

  • @hjups
@hjups 21 days ago +8

    I think you may have misunderstood part of the experimental setup, or at least implied it. I believe, they trained 256 different models, each on a singular rule set. That's how they generated the plots in Figure 2 with rule complexity on the x-axis. So the model would not be looking at previous states to determine which rule it was seeing, but instead learning a more complex understanding of the training rule as the authors hypothesized. It could be interesting to see if training on multiple rules simultaneously would improve downstream performance, especially across complexity classes.

    • @arjavgarg5801
@arjavgarg5801 19 days ago

      Why don't all of them just share code!

    • @hjups
@hjups 19 days ago +2

@@arjavgarg5801 Sharing the code likely would not have clarified this. I would have written a script to generate a single-rule dataset and then concatenated them manually.
As for sharing code in general: it's a lot of work. Research-grade code is messy and hacky; it needs to be cleaned, documented, and tested on clean builds before release. I recently uploaded a project to GitHub and it took 4 days to clean and document, yet it would still be an embarrassment for many groups / organizations.
Besides, if a paper is written correctly, someone in the field would not require the code to verify its experiments, and releasing code is often an excuse to put less effort into reproducibility (releasing code also increases scrutiny during review and the chance of rejection).
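A minimal sketch of the kind of single-rule dataset generator described above, for an elementary (Wolfram) cellular automaton. The rule number, grid width, and sequence lengths here are illustrative assumptions, not values from the paper:

```python
import numpy as np

def step(state: np.ndarray, rule: int) -> np.ndarray:
    """One update of an elementary cellular automaton.

    Each cell's next value is looked up from the 8-bit rule number
    using (left neighbor, cell, right neighbor), periodic boundary.
    """
    left = np.roll(state, 1)
    right = np.roll(state, -1)
    idx = (left << 2) | (state << 1) | right  # neighborhood encoded as 0..7
    table = (rule >> np.arange(8)) & 1        # rule's 8-entry lookup table
    return table[idx]

def make_dataset(rule: int, n_seqs: int, width: int, steps: int, seed: int = 0):
    """Flattened state trajectories for a single rule, one token sequence per row."""
    rng = np.random.default_rng(seed)
    seqs = []
    for _ in range(n_seqs):
        state = rng.integers(0, 2, size=width)
        traj = [state]
        for _ in range(steps):
            state = step(state, rule)
            traj.append(state)
        seqs.append(np.concatenate(traj))
    return np.stack(seqs)

data = make_dataset(rule=110, n_seqs=4, width=16, steps=8)
print(data.shape)  # (4, 144): 16 cells * (8 + 1) states
```

Datasets generated this way for different rules could then be concatenated manually for a multi-rule experiment.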

    • @arjavgarg5801
@arjavgarg5801 18 days ago

@hjups no need to document the code itself. The paper is the documentation.
And yet here you all are, people in the field, still making mistakes due to the ambiguity of natural language.

    • @hjups
@hjups 17 days ago

@@arjavgarg5801 Perfectionism says otherwise. Publicly releasing code is releasing an artifact, and it can have as much impact on job prospects as a paper. An unfinished paper or undocumented / poorly structured code: both are bad.
      There are other issues to contend with too, such as making sure the licensing is correct (when merging code bases), and making sure components under NDA are disentangled.
      As for mistakes due to ambiguity of natural language, most of the time this is intentional.

  • @mathematicalninja2756
@mathematicalninja2756 20 days ago +6

We have developed a novel strategy to unlock reasoning in LLMs at the speed of greedy decoding and were looking to ground our work in theory; your video just got recommended, and this paper shares a lot of similarities with our work. Thank you for covering it. We are targeting ICML, wish us luck!

  • @johnc4957
@johnc4957 19 days ago +1

    Bro you're going to be a monster human being. Godspeed

  • @M-dv1yj
@M-dv1yj 21 days ago +1

You've got no idea how far this type of work has already gone in other parallel developments. The intelligence of the model matters: Claude vs. GPT vs. Gemini, etc. It's fascinating to see what they do with the rules given and how those rules interact to create new output types and understanding. Cool stuff bro, keep sharing please 🙏🏽

  • @allurbase
@allurbase 9 days ago

    Lol, loved that clarification about Peterson.

  • @mootytootyfrooty
@mootytootyfrooty 20 days ago +1

So is the idea that if you can train a model on a pattern generator capable of producing more complex and varied (but structured, i.e. governed by a discrete ruleset) patterns, then feed it inputs that mimic the training data, it can predict the patterns from those inputs, e.g. a parameterized chess game? That seems like a good canonical example for ML or genetic algorithms.
    It's interesting to look at hierarchical systems in biology from that lens, too, like how our body has essentially fully "learned" itself and how that produces rich sensations on top of it.

    • @duxoakende
@duxoakende 18 days ago

      It could also allow for diversity of ability between models in a genetic fashion. With infinitely many rulesets in infinitely many possible automata systems, only some can reasonably be targeted and trained for, so you can develop a kind of diversity in ability from model to model.

  • @FD286
@FD286 21 days ago +2

Have you read Dr. McGilchrist's book "The Divided Brain"? There's some interesting overlap between that book and Dr. Peterson's book "Maps of Meaning".
The thesis that the two halves of the brain use different types of attention could possibly map to the order / chaos idea.

    • @Tunadorable
@Tunadorable  21 days ago +1

      yeee I read I wanna say 2/3 of that book years ago but don't remember it as well as MoM

  • @hobrin4242
@hobrin4242 19 days ago +1

    now the question is: what is the best general task to pretrain the models on for anything

    • @Tunadorable
@Tunadorable  19 days ago

      text hahaha, well at least if we're talking about already existing data

  • @ThankYouESM
@ThankYouESM 14 days ago

I wonder how it would work for a square image of all the RGB values

  • @tornyu
@tornyu 20 days ago +2

    23:45 curriculum learning

    • @KevinKreger
@KevinKreger 20 days ago

      Not advisable for LLMs though

    • @tornyu
@tornyu 20 days ago

      @@KevinKreger why not? I've actually been meaning to try it

  • @proskub5039
@proskub5039 20 days ago

Reminds me of the C64 demo "A Mind Is Born"

  • @diga4696
@diga4696 21 days ago

    Thanks, great explanation, good insight :)

  • @CustomDabber360
@CustomDabber360 20 days ago +1

Funny thing is, I did the same thing but with GAME OF LIFE by John Conway. A form of CA. Stealing my ideas :( ... (JK!) Love your videos bro. I be eating cereal watching your videos.

  • @Dissimulate
@Dissimulate 19 days ago +1

    Eww, violin charts...

  • @epajarjestys9981
@epajarjestys9981 21 days ago +1

    first

    • @sirynka
@sirynka 21 days ago

      Sure, first on the video that got 100 views in 2h

    • @tautalogical
@tautalogical 21 days ago +1

      impressive

    • @fkxfkx
@fkxfkx 20 days ago

      Now last 🤷‍♂️