How ChatGPT is Trained

  • Published on 2 Jun 2024
  • This short tutorial explains the training objectives used to develop ChatGPT, the new chatbot language model from OpenAI.
    Timestamps:
    0:00 - Non-intro
    0:24 - Training overview
    1:33 - Generative pretraining (the raw language model)
    4:18 - The alignment problem
    6:26 - Supervised fine-tuning
    7:19 - Limitations of supervision: distributional shift
    8:50 - Reward learning based on preferences
    10:39 - Reinforcement learning from human feedback
    13:02 - Room for improvement
    ChatGPT: openai.com/blog/chatgpt
    Relevant papers for learning more:
    InstructGPT: Ouyang et al., 2022 - arxiv.org/abs/2203.02155
    GPT-3: Brown et al., 2020 - arxiv.org/abs/2005.14165
    PaLM: Chowdhery et al., 2022 - arxiv.org/abs/2204.02311
    Efficient reductions for imitation learning: Ross & Bagnell, 2010 - proceedings.mlr.press/v9/ross...
    Deep reinforcement learning from human preferences: Christiano et al., 2017 - arxiv.org/abs/1706.03741
    Learning to summarize from human feedback: Stiennon et al., 2020 - arxiv.org/abs/2009.01325
    Scaling laws for reward model overoptimization: Gao et al., 2022 - arxiv.org/abs/2210.10760
    Proximal policy optimization algorithms: Schulman et al., 2017 - arxiv.org/abs/1707.06347
    Special thanks to Elmira Amirloo for feedback on this video.
    Links:
    YouTube: / ariseffai
    Twitter: / ari_seff
    Homepage: www.ariseff.com
    If you'd like to help support the channel (completely optional), you can donate a cup of coffee via the following:
    Venmo: venmo.com/ariseff
    PayPal: www.paypal.me/ariseff
  • Science & Technology

Comments • 279

  • @joshelguapo5563 1 year ago +165

    Since ChatGPT blew up it's been tough to find technical content on ChatGPT, so thanks for pulling this up!

    • @lucasjackson7647 1 year ago +1

      Just chatgpt it lol

    • @technophobian2962 1 year ago +2

      One of the reasons for that is OpenAI not being very open.

    • @AkolytosCreations 1 year ago +2

      +1 to this. I have spent hours trying to find technical content like this. Videos either assume you know everything about AI and jump straight into in-depth things (and even these videos are rare), or are so superficial that they don't really say anything. This was that perfect in-between.

  • @Mutual_Information 1 year ago +300

    Very insightful. Following DALL-E, it seems OpenAI was a little bit more protective of their training IP (only a blog on ChatGPT, no paper). You have enough familiarity with the surrounding papers and tech to paint a clear picture of what they're doing. Excellent work and again, very insightful!

    • @ariseffai 1 year ago +16

      Thanks DJ, appreciate the kind words :)

    • @laurenpinschannels 1 year ago +1

      By the way, Mutual Information, I would love to see you make your subscription lists public.

    • @b0nce 1 year ago +1

      For real, DJ, on every ML/DL/Math YT channel I like, I've seen your comment at least once :D

    • @Mutual_Information 1 year ago

      @@laurenpinschannels ha I didn't realize it was private. Switched! Enjoy :)

    • @danielhenderson7050 1 year ago

      Agreed, thank you for sharing

  • @dkarkada 1 year ago +76

    One of those elusive YouTube gems. Wish there was more content out there for the serious nonexpert. Thanks!!

  • @abhishekpatil6071 1 year ago +6

    A really fun video to watch, kudos to you for making such an esoteric topic easy to understand (at least in broad terms) for a layman as well.

  • @kumakumako 1 year ago +9

    Thank you for making the video. Great balance of technical content and accessibility for people (like me) who aren't in the field.

  • @charlesje1966 1 year ago

    Thank you. I've been learning to use ChatGPT to program microcontrollers, and this video clears up a lot of questions and helps explain the common problems I get in the ChatGPT bot's output. I'm finding that it takes a lot of work on the part of the user to establish context, provide training examples, and find the best wording to achieve your goal.

  • @ChocolateMilkCultLeader 1 year ago +5

    One of the only useful videos on ChatGPT on this platform. Great work

  • @mertozlutiras 1 year ago

    You are doing an amazing job explaining the complex concepts in a simple way. Keep up the good work!

  • @BlueBirdgg 1 year ago +5

    Best video I've watched describing ChatGPT! (and I've watched more than 20)
    You have great insights!

  • @user-gg7vb8te9v 1 year ago +2

    Great work, Ari! Thank you very much for crafting the content, it's really easy to digest.

  • @whatamievendoing 1 year ago +3

    Amazing video. Thanks for publishing this. Going to dig through the rest of your videos too

  • @nedyalkokarabadzhakov5405 1 year ago +3

    We need people on YouTube who provide actually useful, easy-to-comprehend knowledge based on their learning experience. Basically, any human who has significant learning experience and knowledge in one or more domains is a human ChatGPT. Thanks for the content.

  • @curumo_curunir 1 year ago +5

    Very simple and effective explanation. Thank you.

  • @minsohee 1 year ago +1

    Thank you so much for your efforts, this video was by far the most helpful for my project!

  • @Etcher 1 year ago

    Excellent video, thank you. Definitely one of the best technical explanations of what is going on under the hood of ChatGPT I have found on YT to date.

    • @TasteTheStory 1 year ago

      On my YouTube channel, I tested how good ChatGPT is at writing movie scripts! I found the results to be interesting.

  • @thegooddoctor6719 1 year ago +1

    Brilliant. One aspect of intelligence is a measure of one's ability to describe a complex topic in simple terms everyone can understand. My friend, you have that ability in spades. Congrats and thank you!

  • @Billionaire-Odyssey several months ago

    Very valuable content explained with clarity. I wonder why your channel still hasn't exploded. You earned a new sub; please continue making videos on such topics.

  • @miguelalba2106 11 months ago +1

    Technical, concrete and easy to follow explanation, good video 🔥

  • @PrafulKava 1 year ago

    Best step-by-step explanation!

  • @Francis-gg4rn 1 year ago +5

    amazing, please make more!

  • @DanielTorres-gd2uf 1 year ago +5

    Hey, just found your channel. Awesome stuff (currently studying for a master's in ML, it's crazy to see topics I've covered in class come up here)!

    • @billvvoods 1 year ago +1

      @Daniel Torres, Congratulations. Just curious, but what was your bachelor's in?

    • @DanielTorres-gd2uf 1 year ago +2

      @@billvvoods Mechanical Engineering!

    • @billvvoods 1 year ago +3

      @@DanielTorres-gd2uf very nice!
      I wish you the best in your studies. I’m now inspired 😉

    • @DanielTorres-gd2uf 1 year ago +1

      @@billvvoods Thanks, you as well! :)

  • @sdsd5450 1 year ago +1

    Thank you so much! It is such a great video even for beginners!

  • @FiEnD749 1 year ago

    Dude, your content is incredible!

  • @albertkwan4261 1 year ago +1

    This is the best explanation of ChatGPT!

  • @VaibhavShewale 1 year ago +1

    Good insight into how it works; learned something new!

  • @lij3900 1 year ago +2

    Hi Ari, really appreciate that you made the video! It is a great learning experience. Do you mind sharing the transcript on your website as well? For tech stuff, people like me learn better by reading than by watching videos. I tried using an extension to get the video script, but it is not 100% accurate, so some tech words are not correct.

  • @bogdanpatedakislitvinov2549 1 year ago

    Very well-made presentation, please make more! Subscribed

  • @jeffhayes8543 1 year ago +1

    Very well presented. Thanks!

  • @Bianchi77 9 months ago

    Cool video shot, well done, thanks for sharing :)

  • @narendiranchembu5893 1 year ago +8

    This is a very nice explanation, thanks! What tools do you use to make your videos?

    • @ariseffai 1 year ago +3

      Thanks! For this one I used a combination of keynote & FCP

  • @pw7225 1 year ago +3

    Dang, this is a GOOD video. So many crap videos have been published on the topic. Hard to find one that has substance. THANK YOU!

  • @dwt6273 1 year ago

    Thank you! Very informative!

  • @arijitdas4504 1 year ago +3

    Absolute gem ❤

  • @alonsamuel7106 1 year ago

    Great explanation and narration! Thanks!

  • @mdzeeshansiddique8185 1 year ago

    In the same boat here: after minutes of going through clickbait, finally a worthy explainer. Thank you.

  • @johnchange5691 1 year ago

    Thank you! Well explained.

  • @dr.mikeybee 1 year ago +12

    This is a coherent nicely structured explanation of ChatGPT's architecture. Thank you for sharing this. BTW, how likely is it that OpenAI will create a new model with primarily supervised learning? I assume they are curating a new training set from both human responses and model-generated responses. It seems to me that a smallish self-supervised transformer model, trained in an autoregressive fashion from a well-curated knowledge base like Wikipedia and the Encyclopedia Britannica, etc., would be a great start for transfer learning from a curated supervised training set. Your video seemed to suggest this possibility. Moreover, it would be very interesting to run this side-by-side with a different architecture based on a vector database and semantic search for knowledge collection, retrieval, and context building. The results of this could be passed through an LLM for human readability and probabilistic generation. This should result in some sort of fuzzy-verified responses.

  • @user-wr4yl7tx3w 1 year ago +8

    Given that the set of scores used to train the reward function is small compared to the universe of potential questions and answers, it's hard to see how such a small training set can possibly be sufficient to train it adequately. Still amazes me.

  • @jadenlorenc2577 1 year ago

    the clearest ai expert on youtube

  • @user-fj9bh7kt7t 1 year ago

    Very good presentation!

  • @MyMmmd 1 year ago +1

    I'd love to know more about those "expert" conversations. Do you need to be an expert in the conversation matter or is it just used to make sure it's good at conversing (rather than getting the facts right)? How many of these expert conversations are useful? Is it a case of diminishing returns beyond a certain point?
    I'm guessing this isn't freely available information but it's fascinating to me.

  • @jaymehta5886 1 year ago +2

    Nice explanation. Thanks

  • @Doggieluv25 9 months ago

    This was so helpful thank you!!

  • @vijayanandpalaniswamy2240 1 year ago

    Excellent insight dude! Awesome work. I need some help with time series algorithms on a dataset with multiple parameters. Can you help?

  • @sethjchandler 2 months ago

    Great job. Going to show this to my class (Large Language Models for Lawyers, University of Houston Law Center)

  • @tejshah7258 1 year ago +4

    Legend has returned - pls make more videos!

  • @panashifzco3311 1 year ago

    Well-explained video. So cool!

  • @user-wr4yl7tx3w 1 year ago +1

    Great content, Thanks

  • @AR-iu7tf 1 year ago +2

    Nicely done. Thanks for creating this video. Few quick questions/clarifications. (1) Given the reward model rates an entire response as opposed to each partially complete sentence as tokens are emitted, isn't the final stage also rating the reward for an entire possible sentence (that is terminated by a stop token?). Or do you believe the output sentence is rated for each token emitted until stop token? (2) Also was the use of the SFT also in the third stage for KL divergence calculation omitted in the figure because it was too much detail? (3) You mention 3000 words max limit. Is this an approximation for the max sequence length of 8k tokenized length (4) Lastly, do we know if the model parameters are 16bit floats or 32 bit? Thanks again for making this informative video.

    • @ariseffai 1 year ago +2

      Thanks!
      1) Yes, the reward model rates an entire completed response. So an "action" here is a full response (sequence of tokens w/ ) emitted by the model.
      2) Are you referring to the plot at 12:06?
      3) The 3K words is an approximation to 4K max tokens, as described here: help.openai.com/en/articles/6787051
      4) Great question! I'm not sure of the precision of the production model. Do post if you find it :)
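
The arithmetic behind point 3 can be sketched as follows. This assumes the commonly quoted rule of thumb of roughly 0.75 English words per token, which is an approximation rather than an exact property of any tokenizer, and it takes "4K" to mean 4096 tokens. The function name is hypothetical.

```python
def approx_words(max_tokens: int, words_per_token: float = 0.75) -> int:
    """Rough number of English words that fit in a token budget.

    words_per_token is an assumed rule of thumb, not a tokenizer guarantee.
    """
    return int(max_tokens * words_per_token)


print(approx_words(4096))  # 3072, i.e. roughly the "3K words" quoted above
```

The actual ratio depends on the language and the text, so the word limit should be read as an estimate.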

    • @AR-iu7tf 1 year ago

      @@ariseffai thank you for your response. Regarding question 2: yes, exactly. From the OpenAI blog picture and the InstructGPT paper, I assumed three models were used in the final RL training: a copy of the SFT model that became the final production model (the RL model) with updated weights, the reward model, and a frozen SFT model for the KL divergence computation that constrains the RL model to generate original sentences that are not too far off from the SFT model. Is that your understanding too? Regarding 4: I certainly will post in case I find it. Thanks again!
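
The three-model setup described in this reply corresponds to the KL-penalized reward in the InstructGPT paper, where the reward-model score for a full response is reduced by a multiple of the log-probability ratio between the RL policy and the frozen SFT model. The sketch below is a minimal illustration of that one formula; the function name and the value of `beta` are assumptions, and the paper's optional pretraining-mix term is left out.

```python
def kl_penalized_reward(rm_score, logp_rl, logp_sft, beta=0.02):
    """Reward for one sampled response.

    rm_score: scalar from the reward model for the complete response.
    logp_rl:  per-token log-probs of the response under the RL policy.
    logp_sft: per-token log-probs of the same tokens under the frozen SFT model.
    beta:     penalty coefficient (illustrative value, not taken from the paper).
    """
    # log(pi_RL(y|x) / pi_SFT(y|x)) for the whole response
    log_ratio = sum(logp_rl) - sum(logp_sft)
    return rm_score - beta * log_ratio
```

When the RL policy assigns the response the same probability as the SFT model, the penalty vanishes and the reward is just the reward-model score; the more the policy drifts toward responses the SFT model finds unlikely, the larger the deduction.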

  • @ConceptsWithCode 1 year ago

    Nicely done. Thanks for creating this video. A few quick questions/clarifications. (1) Given that the reward model rates an entire response, as opposed to each partially complete sentence as tokens are emitted, isn't the final stage also rating the reward for an entire possible sentence (one that is terminated by a stop token)? (2) Was the use of the SFT model in the third stage for the KL divergence calculation omitted from the figure because it seemed like too much detail? (3) You mention the upper limit is 3000 words. Is this an approximation for tokenized words that would give a maximum sequence length of 8k? (4) Lastly, any idea if the parameters of the model are 16-bit or 32-bit floats? Thanks in advance!

  • @nebrasothman1817 1 year ago

    thank you great video, great detailed explanation

    • @ariseffai 1 year ago

      Thanks Nebras!

  • @aethermass 1 year ago

    Great explanation.

  • @SirajRaval 1 year ago +1

    this is so good, subscribed.

    • @yugen3968 1 year ago

      Scammer pos

  • @masoncdyer 1 year ago +1

    Excellent review

  • @muidhasan9498 1 year ago

    please make more videos like this

  • @hoangviet1381 1 year ago

    nice video, thanks !!!

  • @gorgolyt 1 year ago

    Great summary. I didn't follow when you said "we need the model to act during training" as a way of mitigating distributional shift... can you explain some more?

    • @ariseffai 1 year ago

      So basically, if the model takes zero actions during training, this means we'll have a big difference between the deployment distribution of states (when the model selects actions itself) and the training distribution (when the model merely observes the human's actions).
      There are different ways to have the model select actions during training. One is by using a standard reinforcement learning setup, as mentioned in the video. In that case, the policy model is directly rewarded for actions it itself executes. But another possibility comes from "on-policy" imitation learning, such as the DAgger algorithm. We iteratively execute the current policy to gather new training states, but then have an expert provide the correct action labels -- see arxiv.org/abs/1011.0686
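
The on-policy imitation idea in this reply can be sketched with a toy example. This is a simplified illustration in the spirit of DAgger, not the algorithm exactly as published (for instance, it omits mixing the expert's policy into early rollouts). The 1-D environment, the `TARGET` constant, the lookup-table "policy", and all function names are hypothetical.

```python
import random

TARGET = 5  # hypothetical goal position on a number line


def expert_action(state: int) -> int:
    """Hypothetical expert: step toward TARGET (+1 or -1), or stay put (0)."""
    return (TARGET > state) - (TARGET < state)


def rollout(policy: dict, start: int, steps: int, rng: random.Random) -> list:
    """Run the learner's own policy and record the states it visits."""
    state, visited = start, []
    for _ in range(steps):
        visited.append(state)
        # States never seen in training get a random action, so early
        # rollouts wander away from the states an expert would visit.
        action = policy[state] if state in policy else rng.choice([-1, 0, 1])
        state += action
    return visited


def dagger(iterations: int = 10, steps: int = 20, seed: int = 0) -> dict:
    rng = random.Random(seed)
    dataset = {}  # aggregated mapping: state -> expert label
    policy = {}   # a lookup table stands in for a trained network
    for _ in range(iterations):
        # 1) The learner acts, so training states come from its own distribution.
        for state in rollout(policy, start=0, steps=steps, rng=rng):
            # 2) The expert supplies the correct action for each visited state.
            dataset[state] = expert_action(state)
        # 3) "Retrain" on everything collected so far.
        policy = dict(dataset)
    return policy
```

The point of the loop is that labels are collected on states the learner actually reaches, including its mistakes, which is what narrows the gap between the training and deployment state distributions.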

  • @EducatedButton 11 months ago

    Thanks a lot for the explanation. How does it work at inference time to keep a conversation going back and forth? Is the user's current chat session provided to the model as input along with a new user prompt?

    • @ariseffai 10 months ago

      That's right. There's a certain context window of previous text to which the model can attend (on the order of thousands of tokens). This will include both previous user inputs and model responses from the current conversation.
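
One way to picture the context window described here is a function that packs as many recent turns as fit into a fixed token budget. How ChatGPT actually formats and truncates conversations is not public, so everything below is an assumption: the `build_prompt` name, the "Speaker: text" layout, and the whitespace word count standing in for a real tokenizer.

```python
def build_prompt(history, new_user_message, max_tokens=4096):
    """Assemble model input from the most recent turns that fit the budget.

    history: list of (speaker, text) pairs from the current conversation.
    Token cost is approximated by whitespace-separated words.
    """
    turns = history + [("User", new_user_message)]
    kept, used = [], 0
    # Walk backwards so the newest turns are kept and the oldest are dropped.
    for speaker, text in reversed(turns):
        line = f"{speaker}: {text}"
        cost = len(line.split())
        if used + cost > max_tokens:
            break
        kept.append(line)
        used += cost
    return "\n".join(reversed(kept))
```

With a generous budget the whole conversation is included; with a tight one, only the latest turns survive, which is why very long chats can lose track of early messages.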

  • @MrLazini 1 year ago

    Very informative

  • @nchristensen3309 1 year ago

    Are the operations from us as users part of the reward system?

  • @wenderse 1 year ago +1

    May I ask what technology you used to create such nice explanatory videos? Did you use 3b1b's manim engine? thanks.

    • @LSS94 1 year ago

      Very much looks like it!

    • @ariseffai 1 year ago +2

      Not for this one - just keynote and FCP. But I have used manim in a couple other videos :)

  • @mentor1013 1 year ago +1

    Can you please make a video on Midjourney as well?

  • @yugen3968 1 year ago

    Hey, where could I approach you to clear a few things out about this...?

  • @alimansourey2076 1 year ago

    Well done !

  • @karihotakainen5210 1 year ago

    How does the reward model score a single action, when it is trained to choose between two actions? Or does the policy model actually generate k actions that the reward model can then score and then choose a reward knowing which action the policy model saw as the most probable one?
    I'd really appreciate an answer, thanks.

  • @notgabby604 1 year ago

    That's all very high level usage of neural networks. While some people think the basic foundations haven't set yet. Like for example 2 Siding ReLU.

  • @lazycompunder 1 year ago +1

    that was awesome

  • @juanmanuel8464 1 year ago

    Great content!

  • @sjakievankooten 1 year ago

    Love the explanation!! Also thanks for making the video darkmode 😊

    • @sjakievankooten 1 year ago

      @@TasteTheStory good videos mate, but no need to spam it here :)

    • @TasteTheStory 1 year ago

      @@sjakievankooten Not spamming, just trying to connect with people who share the same interest. Thanks for your note.

  • @wladefant 1 year ago +1

    Are you saying that the 3.000 words can not be increased by just for example more ram usage per chat (chatgpt)?

  • @rl6382 8 months ago

    Just wanted to thank you for these videos.

  • @HH-mf8qz 5 months ago

    Very good video.
    Can you maybe make an updated version now that ChatGPT 4 is released and the new Google Gemini is about to come out for mixed-input AIs?

  • @epeeypen 1 year ago

    I used ChatGPT to help me write a love letter and it went really well.

  • @wladefant 1 year ago +1

    13:12 The new bing (sydney) is able to link sources perfectly now

  • @user-mh9up1mw3r 10 months ago

    What is the architecture of the policy model and how large is it? How does it use the pretrained LLM?

  • @sanchi3944 1 year ago +1

    Lmao, this is literally what I asked GPT today since I'm making a chatbot on Rasa. Looks like the algos are pointing me in the right direction for once!

  • @satishkumar-ir9wy 1 year ago

    Hi, can you make a small video to build ChatGPT with NLP based classification Model.

  • @ddystopia8091 1 year ago

    Hello, I want to work in this field. Now I'm a first year student studying informatics, how should I move towards it? Thank you!

  • @roromaniac8 1 year ago +2

    This was a wonderful explanation! Wouldn't it be expensive to have that much human capital evaluating and simulating chatbot responses? It seems especially so when you consider the wide range of domains in which ChatGPT is able to provide reasonably correct responses.

    • @CyberDork34 1 year ago +1

      Yes it is expensive. OpenAI outsources these tasks to countries like Kenya to save on these costs. It's kind of dubiously ethical but yeah

    • @roromaniac8 1 year ago

      @@CyberDork34 do you have a source that I could read about this? I haven’t been able to find something online.

  • @anthonydemattos432 1 year ago

    Is it possible to do most of this process with just the fine-tuning API?

  • @posthocprior 1 year ago

    Thanks so much for posting a clear explanation. After watching this, I feel like I do after I've been explained how a magic trick works: disappointed.

  • @baohq 1 year ago

    What is the platform that OpenAI uses to build ChatGPT? PyTorch, TensorFlow, or something else?

  • @AstroPinion 1 year ago

    Thanks!

    • @ariseffai 1 year ago +1

      Thanks Randall!

  • @siw504 1 year ago +1

    Nice Video

  • @Mike-vj8do 11 months ago

    amazing video Ari. Where is the name from? Israeli?

  • @peccavius 1 year ago

    Thanks for the talk! You mention that the reward model is trained using cross-entropy loss as a binary classifier. I don't think that's accurate since you don't have a ground truth label for, say, response A (since the score is relative to others). The openAI paper just uses the negative log difference in scores between the higher and lower ranked response as the loss.

    • @ariseffai 10 months ago

      You're welcome! That's not quite correct. The classifier is trained to predict which of two responses is ranked higher by the human contractors. Then, the scalar logit output by the trained classifier for an individual response can be used as a reward signal.
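
The two descriptions in this exchange can be reconciled numerically. The pairwise loss from the InstructGPT paper, the negative log-sigmoid of the score difference, equals a binary cross-entropy in which the logit is that score difference and the label says "the human preferred the first response". The sketch below checks this for scalar scores; the function names are illustrative, and the scores would come from a trained reward model in practice.

```python
import math


def pairwise_loss(r_preferred: float, r_rejected: float) -> float:
    """-log(sigmoid(r_preferred - r_rejected)), written as log(1 + exp(-diff))."""
    return math.log1p(math.exp(-(r_preferred - r_rejected)))


def binary_cross_entropy(logit: float, label: int) -> float:
    """Standard BCE for a single example with the given logit and 0/1 label."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


diff = 2.0 - 0.5  # score of preferred response minus score of rejected one
print(abs(pairwise_loss(2.0, 0.5) - binary_cross_entropy(diff, 1)) < 1e-9)  # True
```

So the ground-truth label is the human comparison itself, and each response's scalar score can later be used on its own as a reward signal.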

  • @regCode 11 months ago

    I'm having trouble understanding supervised fine-tuning in this context. What are the labels? What is the task?

  • @juliarose2133 1 year ago

    Anyone know what the equation at 4:08 is, and where I can find more on it?

  • @DrJanpha 11 months ago

    Code as training data is only briefly mentioned?

  • @icyou8496 1 year ago

    Good explanation!!
    I'm just wondering what ChatGPT's threshold is for displaying no results.
    I have observed something like this, for example:
    me: (example of 1 non-professional gamer)
    ChatGPT: I don't have enough data for him
    me: (example of 1 professional gamer in the same game)
    ChatGPT: *explains the professional player*
    me: (asking about the first player)
    ChatGPT: *explains about that non-professional gamer*

  • @serioussrs9349 6 months ago

    Cool bro

  • @_romeopeter 1 year ago +4

    The math and some of the logic went over my head, so I'm going to try to summarise what I think I understood:
    ChatGPT is built on top of its predecessor, InstructGPT, which is mathematically trained on a large data set to spit out answers to the instructions given. However, for ChatGPT, just spitting out answers to instructions isn't enough, so it needs to be retrained with a method called 'Reinforcement Learning', which uses a 'reward model' to rank the next favourable answer.
    Did I get it? If not, then please tell me what I'm missing, but in plain language, because I know the above is flawed.

  • @Black-ww6lj 1 year ago +2

    Plot twist : Content of this video was generated by chatGPT

  • @stephenthumb2912 11 months ago

    It's a bit amazing how the hallucinations begin, so similar to a human caught in a lie or imagining things: the lies built on lies get progressively more absurd, just as it gets more and more difficult and outlandish for a human to make up reasons based on a stack of false premises.

  • @mr.rndmguy 1 year ago +1

    I'm learning about its trained models and its inner functions, just to create a perfect jailbreak. Thanks

  • @shivangitomar5557 10 months ago

    BEST!

  • @BetaTester704 1 year ago

    According to ChatGPT, its memory is limited to only one prior message in the same conversation; beyond that it can't remember anything.

  • @rickylehr9284 1 year ago

    Why does it care about the reward in reward reinforcement?

  • @MainTeknoID 1 year ago

    Cool!

  • @andramalexh 1 year ago

    PPO= operant conditioning?