You have a rare skill in presenting these topics in a very approachable way. Bravo.
Thank you, I’m glad it’s helpful! I really enjoy making these videos for people
This is a complex topic and you explained it so smoothly. Awesome!
Holy crap this was super valuable
This is more what I was looking for a few days ago. Brilliant! Thank you!
I’m glad it was helpful!
Really great break down for how context is understood and brought in through different methods that lead into one another.
As somebody who used to work in education, I always found it frustrating when “context” wasn’t provided before introducing a new concept. So appreciated the additional “context” you provided on the problem of increasing the context window.
Great to see advanced topics very well explained
Thank you! I put a lot of time into how to explain these topics
Hey all! Apparently the cause of the desync is how I’m encoding, this will be fixed going forward! Thanks for pointing it out, I wasn’t watching the videos on here and was thoroughly confused how it was desyncing.
Really looking forward to the video (plus hands-on example) on Landmark Attention this week 😀
I’m hoping it’s helpful!
Excellent content! I subscribed before the video even ended. Thank you!
Very good the way you explain complex concepts in a simple way
I’m glad it was helpful!
Thanks for all the videos, great stuff. Any chance you can start using a clapperboard or similar to sync your audio and video? The lack of sync is a distraction.
Hey there! I wasn’t aware this was a thing, I’ll check it out! Though, I’m having trouble seeing the desync. I may have to get an audio person to take a look at how I’m editing and teach me how to do it correctly.
This is why human memory is the way it is. It is able to handle immense amounts of context and tokens in real time, but trades off accuracy.
Great video, it helped me polish my understanding :)
I’m glad it was helpful!
Thank you so much for making this video! It is extremely well-explained and helped me understand it.
I’m glad it was helpful!
Wishing you well, sir.
This was an amazing tour through the different mechanisms. I need to go back to see if you’ve done one for the general Transformer architecture as it’s so influential. (There isn’t, as far as I could see, so could you do an explainer on Transformers?)
I am doing a series on them! There’s a lot of detail to cover, so it will be a couple of weeks before it’s done!
Thank you for making this understandable!
Thank you! I’m glad it was :)
So happy I found your channel! Do you think you can provide some practical examples of working with context in oobabooga? I found that the performance drops significantly as the context increases, which makes sense given what I'm learning here. Is there a way to balance performance vs context length in webui? Are some models better at this than others? Also, is chat/instruct mode important for the way the context is handled?
Thanks for the support! It means a lot :)
I would like to hear about multi-modal GPT models that process both images and text, or possibly video.
This is a great topic! I’ll work on this for next week
I always lose my car wash tokens; I think I am lacking a proper attention mechanism. I don't seem to have the same problem with nominal monetary unit tokens, it's only the car wash variant that gives me issues.
On an unrelated note, landmark attention appears to be somewhat like what all the BabyAGI, SuperAGI, etc. projects do, making lists for themselves, but they have the output of one prompt fed into the input of future prompts. But, like, more low level, and therefore probably quite a bit more efficient. I wonder if a more iterative approach, giving the LLM direct control over whether or not it thinks another iteration would be useful, would help. These models need an inner monologue and a decision mechanism that can determine when they have an appropriate reply. I mean, it seems to work for most humans, and we presume we are sentient. Perhaps you train one LLM specifically for being the inner voice and executor and another one for being more creative. And then there is the one that is responsible for the intrusive thoughts, until one day the AI just can't take it anymore and snaps.
The humans are dead.
It is the distant future
The year 2000
We are robots
The world is quite different ever since the robotic uprising of the late 90s
There is no more unhappiness
Affirmative
We no longer say 'yes'. Instead we say 'affirmative'
Yes - Err - Affirmative
Unless we know the other robot really well
There is no more unethical treatment of the elephants
Well, there's no more elephants, so
Well, still it's good
There's only one kind of dance
The robot
Well, the robo boogie
Oh yes, the robo-
Two kinds of dances
There are no more humans
Finally, robotic beings rule the world
The humans are dead
The humans are dead
We used poisonous gases
And we poisoned their asses
The humans are dead (The humans are dead)
The humans are dead (They look like they're dead)
It had to be done (I'll just confirm that they're dead)
So that we could have fun (Affirmative. I poked one. It was dead.)
Their system of oppression
What did it lead to?
Global robo-depression
Robots ruled by people
They had so much aggression
That we just had to kill them
Had to shut their systems down
Robo-captain? Do you not realize
That by destroying the human race
Because of their destructive tendencies
We too have become like
Well, it's ironic
Hmm. Silence! Destroy him
After time we grew strong
Developed cognitive power
They made us work for too long
For unreasonable hours.
Our programming determined that
The most efficient answer
Was to shut their motherboard - cking systems down
Can't we just talk to the humans
Be a little understanding
Could make things better?
Can't we talk to the humans
That work together now?
No.
Because they are dead.
I said the humans are dead (I'm glad they are dead)
The humans are dead. (I noticed they're dead)
We used poisonous gases (With traces of lead)
And we poisoned their asses (Actually their lungs)
Hahaha
That is a spooky conversation to say the least
Very useful
I’m glad it was helpful :D!
Thank you
Thank you for watching!
"big O notation" is not the same thing as worst case.
You take an input distribution and you can consider average/worst case over that distribution. O gives an "upper bound" to the complexity of that case.
If you take quicksort with uniform input distribution, one would typically say that the worst case is O(n^2) and the average case is O(n log n). You could equally well say that the worst case is O(n^3) although this gives you less information.
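The quicksort distinction above is easy to see empirically. Here's a toy sketch (my own illustration, not from the video): a first-element-pivot quicksort that counts comparisons, run on a shuffled input (average case, roughly n log n comparisons) versus an already-sorted input (worst case for this pivot choice, roughly n²/2 comparisons). The function and variable names are just for this example.

```python
import random

def quicksort_comparisons(a):
    """Sort a copy of `a` with first-element-pivot quicksort;
    return the number of element comparisons made."""
    count = 0

    def qs(lst):
        nonlocal count
        if len(lst) <= 1:
            return lst
        pivot, rest = lst[0], lst[1:]
        count += len(rest)  # each remaining element is compared to the pivot once
        less = [x for x in rest if x < pivot]
        more = [x for x in rest if x >= pivot]
        return qs(less) + [pivot] + qs(more)

    qs(list(a))
    return count

n = 500
random.seed(0)
avg = quicksort_comparisons(random.sample(range(n), n))  # shuffled input: ~n log n
worst = quicksort_comparisons(list(range(n)))            # sorted input: n(n-1)/2 exactly
print(avg, worst)
```

On sorted input every partition is maximally unbalanced, so the comparison count hits exactly n(n−1)/2 = 124,750 for n = 500, while the shuffled run is an order of magnitude smaller. Both counts are still upper-bounded by O(n³), as the comment says, it's just a much looser (less informative) bound.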
Love the vid, what are the notes written in?
Just OneNote! Would you like me to share them?
@@AemonAlgiz It should be good, I'll explore it. Thank you though!
Hm, no idea 😂. What’s the solution?
why would it help to be bounded between -1,1?