Wow, I can't believe it, two of my idols together! Congrats, Tristan! Go Contextual!
Such a great format of video! Very short, and I love it more than 2min papers because of the author's explanation.
Now to the questions: Great insight about words and tokens. Does this mean that we need bigger models that will learn on their own how to transform words into tokens? From my understanding, words are just less computationally convenient tokens. So if we do go to 400B or larger models, following the idea that "size is all you need", maybe it will work. I wonder what the performance of the largest published model would be.
Thanks for the kind words!
Now, to your question: Yes, we already kind of have such models that are tokenizer-free and work directly on characters or byte representations, but this makes the input sequence length grow much bigger. So, models like Mamba or long-context transformers could eventually, if big enough, surpass existing tokenizer-based models. But we have not seen that happen yet at GPT-4 scale. I do not know if they have even tried that path.
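To make the sequence-length point concrete, here is a minimal Python sketch; the roughly-4-characters-per-token ratio is my own rough assumption for English BPE vocabularies, not something from the video:

# Minimal sketch: input length for a byte-level model vs. a subword tokenizer.
# The ~4 characters-per-token ratio is a rough rule of thumb for English BPE
# vocabularies, used here only for illustration.

text = "Tokenizer-free models read every character or byte directly."

byte_len = len(text.encode("utf-8"))      # sequence length a byte-level model sees
approx_token_len = max(1, byte_len // 4)  # assumed subword-token count

print(f"bytes:  {byte_len}")              # 60
print(f"tokens: ~{approx_token_len}")     # ~15, several times shorter

Several-times-longer sequences matter because self-attention cost grows quadratically with sequence length, which is why Mamba-style or long-context architectures come up here.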
I mean, I'd definitely assume that sentences like these don't occur a whole lot in the training data.
I agree. But I am sure that some sentences like this were there. Every LLM today trains on Wikipedia, and they must have seen the entries on Douglas Hofstadter and his books.
@AICoffeeBreak Adding I Am a Strange Loop to my reading list. I agree that it is in Wikipedia, but how do you explain "how many R's in strawberry" if not with a tokenization bug? Doesn't that mean we need a different kind of model (or tokenizer) to solve this?
Agreed.
Interesting, another reason might be that the task is just too far out of distribution?
Training on data exactly like this might make models perform better on these examples. But I doubt these models didn't see any self-referential statements somewhere in their training data.
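To make the strawberry point above concrete, here is a minimal Python sketch; the subword split and token IDs shown are hypothetical, since the actual pieces depend on the tokenizer:

# Minimal sketch of why "how many R's in strawberry" is hard for a token-based LLM.
# The subword split and IDs below are made up for illustration; real BPE
# tokenizers may cut the word differently.

word = "strawberry"

# A character-level view sees individual letters, so counting is trivial.
print(word.count("r"))                    # 3

# A subword tokenizer hands the model opaque token IDs instead of letters.
hypothetical_split = ["str", "awberry"]   # assumed split, for illustration only
hypothetical_ids = [496, 675]             # made-up IDs standing in for vocab indices
print(hypothetical_split, hypothetical_ids)

# The model only ever receives the IDs, so it has to have memorized the
# spelling of each token to count characters correctly.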
Those E and F shape tests don't make sense unless fed as an image.
Well, it would still look different to the model from just feeding the sentences normally. I think most LLM tokenizers even preserve the line breaks.
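A small sketch of that point: even when ASCII-art letter shapes are fed as plain text, the newline characters stay in the input, so the token sequence differs from ordinary prose (the drawing below is just an illustrative example):

# Minimal sketch: an ASCII-art "E" fed as text still contains its line breaks,
# so the tokenizer sees a very different sequence than normal sentences.
ascii_e = (
    "#####\n"
    "#\n"
    "####\n"
    "#\n"
    "#####\n"
)
print(ascii_e)        # renders as a rough letter E
print(repr(ascii_e))  # the \n characters are part of what the model is given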
I understood completely nothing. I was focused on this guy's aspergery passion. How can I work with such cyborgs :D Sorry, I am a hater here, but I started questioning my role on the job market after watching it.
Changed my mind, I like the explanation. Sorry for my ADHD.