For "RNN"-style cache ideas, check RMT (Recurrent Memory Transformer, which I think they mentioned) and especially Block-Recurrent Transformers (gates, cross-attention). RMT is like "nah, too difficult, let's inject tokens into the stream from the end of the previous segment and let the model learn its own cache". Somehow it works.
And nobody has implemented it for LLaMA models yet.
Also check Luna (Luna: Linear Unified Nested Attention), which essentially asks "guys, what if instead of caching the past we use a smaller set of values as a packed representation of the current tokens". They don't say it in the paper, but after BRT and RMT I can't shake off this feeling.
For caches, check Memorizing Transformers (and RETRO).
The cache transformer in the video is closer to RETRO, since inference doesn't change the cache. And AFAIR RETRO just queries a large DB.
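To make the RMT idea above concrete, here is a rough toy sketch in plain Python. It is not the paper's implementation: `toy_model` is a hypothetical stand-in for a causal transformer (it just returns running means), and `run_with_memory`, `segment_len`, and `num_mem` are names made up for illustration. The only point is the data flow: memory slots go in at the start of each segment, and whatever the model writes at the trailing memory positions becomes the memory for the next segment.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def toy_model(inputs: List[Vector]) -> List[Vector]:
    """Hypothetical stand-in for a causal transformer.

    Each output is the mean of all inputs seen so far, so later
    positions depend on earlier ones (including the memory slots).
    """
    outputs: List[Vector] = []
    running = [0.0] * len(inputs[0])
    for count, vec in enumerate(inputs, start=1):
        running = [r + v for r, v in zip(running, vec)]
        outputs.append([r / count for r in running])
    return outputs


def run_with_memory(
    tokens: List[Vector],
    segment_len: int,
    num_mem: int,
    model: Callable[[List[Vector]], List[Vector]] = toy_model,
) -> Tuple[List[Vector], List[Vector]]:
    """Process a long sequence segment by segment, carrying memory slots.

    Assumes num_mem >= 1 and a non-empty token list.
    """
    dim = len(tokens[0])
    memory = [[0.0] * dim for _ in range(num_mem)]  # initial memory: zeros
    all_outputs: List[Vector] = []
    for start in range(0, len(tokens), segment_len):
        segment = tokens[start:start + segment_len]
        # Read slots at the front, write slots at the end of the stream.
        stream = memory + segment + memory
        out = model(stream)
        # Keep only the outputs that correspond to real tokens.
        all_outputs.extend(out[num_mem:num_mem + len(segment)])
        # Outputs at the trailing slots become the next segment's memory.
        memory = out[-num_mem:]
    return all_outputs, memory


tokens = [[1.0], [2.0], [3.0], [4.0]]
outputs, final_memory = run_with_memory(tokens, segment_len=2, num_mem=1)
```

In a real model the memory would be learned embeddings trained with backpropagation across segments; here nothing is learned, so this only shows where the tokens are injected and how the state is handed over.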
I enjoyed your explanation of SDXL. It was actually good. I have one request: can you make a video explaining a virtual try-on paper, covering models that give good accuracy like DiOr or TryOnDiffusion? And if possible, can you walk through the code as well? I have been trying to understand it for the past month but couldn't get one word of it. It is a humble request.