Why Do LLMs Have Context Limits? How Can We Increase the Context? ALiBi and Landmark Attention!

  • Published Oct 16, 2024

Comments • 43

  • @foxabilo
    @foxabilo 1 year ago +19

    You have a rare skill in presenting these topics in a very approachable way. Bravo.

    • @AemonAlgiz
      @AemonAlgiz  1 year ago +2

      Thank you, I’m glad it’s helpful! I really enjoy making these videos for people

  • @ARSH_DS007
    @ARSH_DS007 11 months ago +1

    This is a complex topic and you nailed it so smoothly. Awesome.

  • @narutocole
    @narutocole 1 year ago +2

    Holy crap this was super valuable

  • @jamescarroll8917
    @jamescarroll8917 1 year ago +2

    This is more like what I was looking for a few days ago. Brilliant! Thank you!

    • @AemonAlgiz
      @AemonAlgiz  1 year ago

      I’m glad it was helpful!

  • @MaJetiGizzle
    @MaJetiGizzle 1 year ago +2

    Really great breakdown of how context is understood and brought in through different methods that lead into one another.
    As somebody who used to work in education, I always found it frustrating when “context” wasn’t provided before introducing a new concept, so I appreciated the additional “context” you provided on the problem of increasing the context window.

  • @OscarFonsecaQ
    @OscarFonsecaQ 1 year ago +4

    Great to see advanced topics very well explained

    • @AemonAlgiz
      @AemonAlgiz  1 year ago

      Thank you! I put a lot of time into figuring out how to explain these topics.

  • @AemonAlgiz
    @AemonAlgiz  1 year ago +4

    Hey all! Apparently the cause of the desync is how I’m encoding; this will be fixed going forward! Thanks for pointing it out. I wasn’t watching the videos on here and was thoroughly confused about how it was desyncing.

  • @johnholdsworth1878
    @johnholdsworth1878 1 year ago +1

    Really looking forward to the video (plus hands-on example) on Landmark Attention this week 😀

    • @AemonAlgiz
      @AemonAlgiz  1 year ago +1

      I’m hoping it’s helpful!

  • @SethBang91
    @SethBang91 1 year ago

    Excellent content! I subscribed before the video even ended. Thank you!

  • @elbobinas
    @elbobinas 1 year ago +2

    The way you explain complex concepts in a simple way is very good.

    • @AemonAlgiz
      @AemonAlgiz  1 year ago

      I’m glad it was helpful!

  • @ringpolitiet
    @ringpolitiet 1 year ago +6

    Thanks for all the videos, great stuff. Any chance you can start using a clapperboard or similar to sync your audio and video? The lack of sync is a distraction.

    • @AemonAlgiz
      @AemonAlgiz  1 year ago +2

      Hey there! I wasn’t aware this was a thing; I’ll check it out! Though, I am having trouble seeing the desync. I may have to get an audio person to take a look at how I’m editing and teach me how to do it correctly.

  • @Fixer_Su3ana
    @Fixer_Su3ana 1 year ago +10

    This is why human memory is the way it is. It is able to handle immense amounts of context and tokens in real time, but trades off accuracy.

  • @FunwithBlender
    @FunwithBlender 1 year ago +1

    Great video, it helped me polish my understanding :)

    • @AemonAlgiz
      @AemonAlgiz  1 year ago

      I’m glad it was helpful!

  • @joech1065
    @joech1065 1 year ago +1

    Thank you so much for making this video! It is extremely well-explained and helped me understand it.

    • @AemonAlgiz
      @AemonAlgiz  1 year ago +1

      I’m glad it was helpful!

  • @user-wp8yx
    @user-wp8yx 1 year ago +1

    Wishing you well, sir.

  • @nickludlam
    @nickludlam 1 year ago +4

    This was an amazing tour through the different mechanisms. I need to go back to see if you’ve done one on the general Transformer architecture, as it’s so influential. (There isn’t one, as far as I could see, so could you do an explainer on Transformers?)

    • @AemonAlgiz
      @AemonAlgiz  1 year ago +6

      I am doing a series on them! There’s a lot of detail to cover, so it will be a couple of weeks before it’s done!

  • @kaymcneely7635
    @kaymcneely7635 1 year ago +2

    Thank you for making this understandable!

    • @AemonAlgiz
      @AemonAlgiz  1 year ago +1

      Thank you! I’m glad it was :)

  • @123owly
    @123owly 1 year ago +2

    So happy I found your channel! Do you think you can provide some practical examples of working with context in oobabooga? I found that the performance drops significantly as the context increases, which makes sense given what I'm learning here. Is there a way to balance performance vs context length in webui? Are some models better at this than others? Also, is chat/instruct mode important for the way the context is handled?

    • @AemonAlgiz
      @AemonAlgiz  1 year ago

      Thanks for the support! It means a lot :)

  • @aamir122a
    @aamir122a 1 year ago +2

    I would like to hear about multi-modal GPT models that process both images and text, or possibly videos.

    • @AemonAlgiz
      @AemonAlgiz  1 year ago +2

      This is a great topic! I’ll work on this for next week

  • @zyxwvutsrqponmlkh
    @zyxwvutsrqponmlkh 1 year ago +1

    I always lose my car wash tokens; I think I am lacking a proper attention mechanism. I don't seem to have the same problem with nominal monetary unit tokens, it's only the car wash variant that gives me issues.
    On an unrelated note, landmark attention appears to be somewhat like what all the BabyAGI, SuperAGI, etc. agents do, making lists for themselves, except they have the output of one prompt fed into the input of future prompts. But, like, more low level, and therefore probably quite a bit more efficient. I wonder if a more iterative approach, giving the LLM direct control over whether or not it thinks another iteration would be useful, would work. These models need an inner monologue and a decision mechanism that can determine when they have an appropriate reply. I mean, it seems to work for most humans, and we presume we are sentient. Perhaps you train one LLM specifically for being the inner voice and executor and another one for being more creative. And then there is the one that is responsible for the intrusive thoughts, until one day the AI just can't take it anymore and snaps.
    The humans are dead.
    It is the distant future
    The year 2000
    We are robots
    The world is quite different ever since the robotic uprising of the late 90s
    There is no more unhappiness
    Affirmative
    We no longer say 'yes'. Instead we say 'affirmative'
    Yes - Err - Affirmative
    Unless we know the other robot really well
    There is no more unethical treatment of the elephants
    Well, there's no more elephants, so
    Well, still it's good
    There's only one kind of dance
    The robot
    Well, the robo boogie
    Oh yes, the robo-
    Two kinds of dances
    There are no more humans
    Finally, robotic beings rule the world
    The humans are dead
    The humans are dead
    We used poisonous gases
    And we poisoned their asses
    The humans are dead (The humans are dead)
    The humans are dead (They look like they're dead)
    It had to be done (I'll just confirm that they're dead)
    So that we could have fun (Affirmative. I poked one. It was dead.)
    Their system of oppression
    What did it lead to?
    Global robo-depression
    Robots ruled by people
    They had so much aggression
    That we just had to kill them
    Had to shut their systems down
    Robo-captain? Do you not realize
    That by destroying the human race
    Because of their destructive tendencies
    We too have become like
    Well, it's ironic
    Hmm. Silence! Destroy him
    After time we grew strong
    Developed cognitive power
    They made us work for too long
    For unreasonable hours.
    Our programming determined that
    The most efficient answer
    Was to shut their motherboard - cking systems down
    Can't we just talk to the humans
    Be a little understanding
    Could make things better?
    Can't we talk to the humans
    That work together now?
    No.
    Because they are dead.
    I said the humans are dead (I'm glad they are dead)
    The humans are dead. (I noticed they're dead)
    We used poisonous gases (With traces of lead)
    And we poisoned their asses (Actually their lungs)

    • @AemonAlgiz
      @AemonAlgiz  1 year ago

      Hahaha
      That is a spooky conversation to say the least

  • @StewardGarcia
    @StewardGarcia 1 year ago +2

    Very useful

    • @AemonAlgiz
      @AemonAlgiz  1 year ago

      I’m glad it was helpful :D!

  • @ideacharlie
    @ideacharlie 1 year ago +2

    Thank you

    • @AemonAlgiz
      @AemonAlgiz  1 year ago

      Thank you for watching!

  • @martinkunev9911
    @martinkunev9911 8 months ago +1

    "big O notation" is not the same thing as worst case.
    You take an input distribution and you can consider average/worst case over that distribution. O gives an "upper bound" to the complexity of that case.
    If you take quicksort with uniform input distribution, one would typically say that the worst case is O(n^2) and the average case is O(n log n). You could equally well say that the worst case is O(n^3) although this gives you less information.
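
    (A brief aside to make the point above concrete, using the standard textbook definition of big O and the usual quicksort bounds; these are general facts, not claims taken from the video.)

        % Definition of big O: an upper bound only
        f(n) = O(g(n)) \iff \exists\, c > 0,\ n_0 \ \text{such that}\ f(n) \le c \cdot g(n) \ \text{for all}\ n \ge n_0

        % Quicksort on n elements with uniformly random input (standard results)
        T_{\mathrm{worst}}(n) = \Theta(n^2)       % hence O(n^2), and also, less usefully, O(n^3)
        T_{\mathrm{avg}}(n)   = \Theta(n \log n)  % hence O(n \log n), and also O(n^2)

    Because big O only asserts an upper bound, it can be attached to the worst case, the average case, or any other quantity; the tight statement is the \Theta form.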

  • @NeotenicApe
    @NeotenicApe 1 year ago +1

    Love the vid! What are the notes written in?

    • @AemonAlgiz
      @AemonAlgiz  1 year ago +1

      Just OneNote! Would you like me to share them?

    • @NeotenicApe
      @NeotenicApe 1 year ago

      @AemonAlgiz It should be good; I’ll explore it. Thank you though!

  • @AtomicPixels
    @AtomicPixels 1 year ago +2

    Hm, no idea 😂. What’s the solution?

  • @amortalbeing
    @amortalbeing 10 months ago

    Why would it help to be bounded between -1 and 1?