"Out of distribution" is basically a statistical term indicating that the data lies outside the training data, and they want to divide it into two types: extrapolatory data (where the point is not known and is not "in between" known data points) and interpolatory data (where the point is also unseen, but sits between known data points, so the AI could potentially interpolate).
For example, suppose an AI is trained on the colors of the rainbow (ROYGBIV) but yellow is excluded. If it were asked about yellow, it could plausibly interpolate based on what it knows about the neighboring colors, so that would be interpolative. But if it were asked about ultraviolet, that would be extrapolative.
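The rainbow example above can be sketched in a few lines of Python. This is just a toy illustration of the distinction, not anything from the paper: the wavelengths are rough textbook values in nanometers, and the "training set" is ROYGBIV minus yellow, as in the comment.

```python
# Toy illustration of the interpolation / extrapolation split.
# Training data: approximate wavelengths (nm) of rainbow colors,
# deliberately excluding yellow (~580 nm).
TRAIN_NM = {"red": 700, "orange": 620, "green": 530, "blue": 470, "violet": 400}

def query_kind(wavelength_nm, train=TRAIN_NM):
    """Classify a query wavelength relative to the training data:
    inside the range of known points -> interpolation,
    outside it -> extrapolation (both count as out-of-distribution
    if the exact point was never seen)."""
    lo, hi = min(train.values()), max(train.values())
    return "interpolation" if lo <= wavelength_nm <= hi else "extrapolation"

print(query_kind(580))  # yellow sits between orange and green -> interpolation
print(query_kind(350))  # ultraviolet lies beyond violet -> extrapolation
```

In higher dimensions the same idea is usually phrased in terms of the convex hull (or support) of the training distribution rather than a simple min/max range.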
Mixture of Trillion Experts
mixture of gazillion experts
Just want to say thank you for making this particular video. The organization is excellent, and it really helps me to review the papers very quickly.
Please take another look at: Multi-label Learning with Random Circular Vectors. While it may seem uninteresting at first glance, this could have far reaching consequences for tagging or applying metadata to data. (Of course, if you don't use tagging of data at all, then yea, it makes no sense to review it).
2:29 : sounds cool :)
3:49 : oh nice
18:02 : I’m not sure what “boosting” is, but applying the “quantum calculus” (only very indirectly related to QM) to the topic of “how to optimize a loss function which isn’t required to be continuous” sounds kinda cool to me personally. Maybe I’ll take a little look at this one
Edit: oh, 58:06 : I remember the DisCoCat thing. I think string diagrams and tensor networks etc. are cool, but like,
I really don’t think my brain is using quantum computing to understand the meaning of sentences, so the exponentially large spaces that this program describes seem to me rather unlikely to correspond closely to how people use language, and there’s probably something with less uses-exponential-space-y behavior which works?
Maybe I’m misremembering the details and it doesn’t actually strictly require exponential space (when on classical computers.)
58:37 : aaand maybe this paper will make me eat my words about the previous one
@3:00 I was just thinking about this. let me cook and I'll get back to you in a week or two.
Are you going to build a rig implementing all these papers to beat GPT and get the best A.I.?
beat chatGPT? lmao maybe if you give me a billion dollars
I have a question: how do you find these new AI papers? I don't know whether I should go into the research field, so your answer would help me. Thank you :)
I like it when you’re surprised! It adds personality. If you choose to use Python, it might be more natural to simply become an essayist and read from a script while going through the document, something like what Professor Dave does when he debunks creationism and ID.
Having a script put together would be the most productive way to inform us about developments in the field.
Fix the date in the title
The grammar masking paper is not even remotely novel. We've had multiple libraries that do exactly this for over a year now, e.g. Guidance. The authors are very late to the party on that one, unless they're just now writing a paper on work done over a year ago and claiming to be first. It's a super useful technique though, so it was sad to hear you say you weren't interested. The most common application is to simply force the model to respond in some very specific JSON schema, so that you can integrate LMs with traditional software and have them reliably generate parsable text. There's a reason I said LMs instead of LLMs, too: the deterministic nature of the masking means it works even with small models like GPT-2 with no fine-tuning for tool use (although not as well as larger/fine-tuned models, of course).
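The core trick the comment describes can be sketched in a few lines: at each decoding step, zero out the probability of every token the grammar forbids and renormalize before sampling, so invalid text can never be produced. This is a minimal toy version, not the paper's or Guidance's actual implementation; the character-level "grammar" and the random stand-in for model logits are both made up for illustration.

```python
# Minimal sketch of grammar-constrained decoding (a.k.a. grammar masking).
# A hypothetical toy grammar: outputs must look like {"a":1}, {"b":2}, etc.
import math
import random

VOCAB = ['{', '}', '"', 'a', 'b', ':', '1', '2']
PATTERN = ['{', '"', 'ab', '"', ':', '12', '}']  # legal chars at each position

def allowed_tokens(prefix):
    """Return the set of vocabulary tokens the toy grammar permits
    immediately after `prefix` (one character = one token here)."""
    i = len(prefix)
    return set(PATTERN[i]) if i < len(PATTERN) else set()

def constrained_step(logits, prefix):
    """Mask out grammar-violating tokens, renormalize, then sample."""
    legal = allowed_tokens(prefix)
    weights = [math.exp(l) if t in legal else 0.0 for t, l in zip(VOCAB, logits)]
    return random.choices(VOCAB, weights=weights)[0]

def generate(seed=0):
    random.seed(seed)
    out = ''
    while len(out) < len(PATTERN):
        fake_logits = [random.gauss(0, 1) for _ in VOCAB]  # stand-in model
        out += constrained_step(fake_logits, out)
    return out

print(generate())  # always well-formed under the toy grammar, e.g. {"a":1}
```

Note that the masking itself is deterministic given the grammar state, which is why (as the comment says) even a small, unaligned model cannot emit unparsable output; the model only chooses among the tokens the grammar allows.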
“I don’t know what a Hopper GPU is”, dude…
math background not a programming background. hopper is the word i don’t know
Nice
Bro the audio 😢
?
@Tunadorable Maybe a bit crackly.
Audio seems ok on my end. Is there a timestamp you are referring to?