Richard Walker, you are the greatest! The work you put into these videos is mind-boggling! They are worth watching again and again. Once you turn on the Super Thanks button on these videos, I will pay to watch each one! That's how high quality they are. You are doing the entire NLP world a remarkable service, Sir!
Thanks again Jazon for your very (very!) generous comments. I'm glad you are enjoying the videos, and super-appreciative of your support. Many thanks indeed.
This is probably the best explanation around. Other explanations don't even mention that Key, Query and Value are matrices, or what interpretation their values hold.
Nancy, I'm delighted that you found the explanation useful! Thank you for your kind words and feedback. Have you been able to see any of the other videos in this playlist -> th-cam.com/play/PLaJCKi8Nk1hwaMUYxJMiM3jTB2o58A6WY.html? This "Attention is all you need" video is number 4 in a series of 10 in this playlist. There is also an "In sixty seconds" playlist -> th-cam.com/play/PLaJCKi8Nk1hxM3F0E2f2rr5j6wM8JzZZs.html where video 4 attempts to cover the attention mechanism in around a minute. If you have a chance to look at these sometime I'd love to hear what you think of them. With thanks! Lucidate.
Ascension. Thank you for your kind words, very humbling and greatly appreciated. Once you've seen them all (at least once) please let me know what additional material you'd like to see covered, or what you'd want to see covered again from a different perspective. With huge and sincere thanks - Lucidate.
I was watching this after Andrej Karpathy's video about how to create a GPT-like transformer with PyTorch, and I'm finally able to understand a bit better what these Q, K, V values are for. It's mind-blowing really that you can force structures like this to emerge from a dataset just by defining this architecture. I wonder how they came up with it, and how much of it was experimentation and sheer luck. :) I would love to somehow see how a neural net like this fires when the next word is predicted, but I guess there's no easy way of visualizing it as the dimensionality is insanely high, and even if we could, understanding the connections would be near impossible.
Istvan, Thank you for your comment! I'm glad to hear that our video helped you understand the concept of Q, K, and V values in the transformer architecture. The development of the transformer architecture was indeed a significant breakthrough in the field of natural language processing and it's fascinating to think about how it emerged from the data. Regarding your question about visualising the firing of a neural network like GPT-3, you are correct that the high dimensionality of the model makes it difficult to visualize and understand the connections. However, researchers have developed various techniques to interpret and understand the behavior of neural networks, such as attention maps, saliency maps, and layer-wise relevance propagation. These techniques can provide insights into how the network is making its predictions and help to explain the model's decisions. I hope this answers your question, and I appreciate your interest in the topic as well as your support and contribution to the channel. If you have any other questions or comments, please feel free to ask.
@@lucidateAI Thank you, especially if this answer was not generated :) I'll look up those visualization methods, sounds interesting.
Let me know if they are helpful. At some point I plan to get around to doing some videos on those techniques myself, but that won't be for a little while yet. This....is....not....a....message....generated....by....a...transformer....model... ;-)
Thanks @damn_engineering. The channel is not just about the content in the videos, but the response to the questions. In class I used to get way more out of the questions posed by my fellow students than I did from the baseline material provided by the teachers.
Amazing video and out of all videos I have seen so far, I think the analogy really helps break down the complex mathematical relationships into more relatable concepts. I particularly like this one and the one where you explain positional encoding like how a clock has hour and minute hands.
Sorry Phani. My intent was not to cause you any inconvenience, but to inform. Please accept my sincere apologies for any inconvenience caused. Sadly this is a somewhat technical discipline right now, but I hope that in time tools will be available to help people who dislike technological details. Of which there are many. If you are ever in London, please drop me a line and I'll buy you a beer or beverage of your choice to attempt to make up for it. All comments, greatly appreciated. Lucidate.
Glad you enjoyed it! Manim and Final Cut Pro are the graphics tools. In some of the other videos I make use of the Orange Data Science UI, as well as tools like Streamlit and Plotly for UI design and widgets. But this video is pretty much Manim and FCPX.
Impressive video and animations! There is a lot of good content in there. One small piece of feedback: The animations were impressive and certainly helped in many places, however at points there were so many animations active and they were switching so rapidly, that I found it was actually distracting from the message. My personal preference would have been for less rapid transitions and more time spent on each animation so there was more time to concentrate on the topic being discussed. Everyone is different though... so it could just be me. Thanks again for creating.
Hi Tom, thanks for the feedback. I'm glad you found the video impressive, many thanks for that. Even more thanks for the constructive feedback. Clearly these are complex topics and I feel that some of them are best illustrated with animated graphics rather than static charts. But you are spot on when you say that this runs the risk of becoming distracting. There is clearly a balance to be struck between animated content that supports the message and graphics content that becomes overwhelming and a distraction. The former is good, the latter is bad. I can't ever hope to get that balance absolutely perfect, but that is where feedback like yours is helpful in getting the balance less wrong over time. If there are specific parts of the video that you want to bring to my attention I'd welcome that so that I can try to strike a better balance in future episodes. Likewise if there are concepts that were confusing in this video (in particular those concepts that were confused by distracting graphics) please let me know and I'll try to remedy that in the future too. But most of all I really appreciate your engagement with the channel. Taking your time to watch the video and provide considered feedback. Sincerely appreciated. - Lucidate.
Appreciate your response and really enjoyed the video! To add on I think the bouncing animation is a bit jarring for me and is also not possible to read while it’s bouncing
I agree, I had to keep pausing to soak it all in. This does allow you however more information in a short time span. It is a very well done video though.
I’m learning so much from you. The whole style is great. You are obviously comfortable with the material and I’m sure others have mentioned it, but adding a second or so pause here and there between concepts would help old folks like me retain your lesson’s concepts better because we have a small bookend/break/pause/gap. Primacy and recency I think is what my partner called it. Did you create your diagrams in the same tool you use for your financial graph animations?
@Cark Thank you so much for your kind words and for your feedback! I'm glad to hear that you're learning a lot from my videos and that you enjoy the style. And thank you for your suggestion about adding brief pauses between concepts to help with retention. I'll definitely keep that in mind for my future videos. To answer your question, I create my diagrams and animations using Manim, a powerful open-source animation engine developed by 3Blue1Brown. It's a great tool for creating engaging and informative visualizations, and I enjoy using it to bring complex concepts to life in my videos. Thanks again for your support and for your feedback, and please let me know if you have any other questions or suggestions for future videos!
Very good tutorial. Normally on TH-cam they are too high level and black box, or too low level and mathematically complicated for a beginner. The pace is not too fast either. This is just right, although I think I need to go back and watch some of the earlier tutorials, as I may be starting in the middle.
Simon. Glad you enjoyed it and found it informative. You are correct that this is video 4 in a playlist series, and does build on some earlier material. But the joy of TH-cam is that you can watch the videos in any order you wish. Full playlist is here -> th-cam.com/play/PLaJCKi8Nk1hwaMUYxJMiM3jTB2o58A6WY.html I'd be delighted to hear what you think of the other videos.
I guess that soon I will have a good intuition of what the attention mechanism is all about. I wonder if it is possible to develop a "toy" example of this process?
Hi Luis. Sorry that the video left you with more questions than answers. The attention module in transformers is a key component that allows the model to focus on specific parts of the input sequence when making predictions. The intuition behind attention can be thought of as a mechanism to help the model determine which words or tokens in the input sequence are most relevant for a given prediction task. Imagine you are reading a book and trying to answer a question about the content. Your attention is focused on the specific parts of the text that are relevant to the question you are trying to answer, rather than processing the entire text from start to finish. This is similar to what the attention module does in transformers. In transformers, the attention mechanism works by calculating a set of attention scores, which represent the importance of each word or token in the input sequence with respect to the current prediction task. These scores are used to weigh the contributions of different parts of the input sequence, allowing the model to focus on the most relevant information and ignore the rest. This allows the model to make more accurate predictions and also helps to improve the interpretability of the model's decisions. The weights in the Q, K and V matrices are all learned parameters. Large transformer models like GPT-3/ChatGPT are shown vast amounts of text and take days, weeks, even months to train. The resulting weights in these matrices are the result of millions (indeed billions) of forward passes and backpropagation updates that minimise the error between the predicted sequence from the transformer and the actual target sequence during training. I've taken your question on board and have tried to provide some more detail in another video that should be out in a day or so. I appreciate your comment and I hope that this reply has cleared things up a little - but attention is a pretty abstract concept. It is common for people to wrestle with what it means.
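To make the score calculation above concrete, here is a toy NumPy sketch of scaled dot-product attention. The weight matrices are random stand-ins (in a real transformer they are learned, as described above), and the dimensions are made-up toy values:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating, for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: scores measure how relevant
    # each key (token) is to each query (token)
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

# Three tokens, each with a 4-dimensional (made-up) embedding
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
# Random stand-ins for the learned projection matrices W_q, W_k, W_v
W_q, W_k, W_v = (rng.normal(size=(4, 4)) for _ in range(3))
out, w = attention(X @ W_q, X @ W_k, X @ W_v)
print(out.shape)           # one output vector per input token
print(w.sum(axis=-1))      # attention weights per token sum to 1
```

The softmax turns raw scores into a weighting over the input tokens, which is exactly the "focus on the most relevant parts" idea described above.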
Empirical evidence does seem to suggest that it is very effective. Yours - Lucidate.
@@lucidateAI Thank you for your reply. Please, rest assured that I was really pleased with your video. Attention is, as you say, a very abstract concept and I've searched for something that will tell me how "it does the trick", because it really does. When you mentioned that there was or would be a second part I went to look for it. I feel confident that that upcoming video could finally satisfy my curiosity as to how or why "Attention" is what it is. That's why I asked if a "toy" model was feasible. Not necessarily the full encoder-decoder algorithm, but something that demonstrates how those 3 vectors somewhat magically determine what word A finds interesting in a whole set of other words. Eagerly awaiting your coming video.
@@luisortega7028 Thanks Luis. Here is a link th-cam.com/video/QvkQ1B3FBqA/w-d-xo.html that I hope you find helpful. This is a different take on attention to the one I have provided. Hopefully by triangulating different perspectives you arrive at _the_ intuition that works for you.
@@lucidateAI This explanation helped a lot. I was missing the fact that the Q & K dot product seeks to find their similarity. I figure my next step is to refresh how embeddings establish, somehow, such similarity. Definitely, I'm on my way to understanding "attention". Thanks.
Glad you found it helpful. Have you seen this video -> th-cam.com/video/6tzn5-XlhwU/w-d-xo.html it might help a little with the intuition. Let me know what you think and thanks again for your support of the channel and your contribution to it through your questions and comments. I hope others find your questions and observations useful, I'm sure they will.
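On the "Q · K dot product as similarity" point: the usual intuition is cosine similarity between embedding vectors. A small sketch with hypothetical 3-d embeddings (real models use hundreds or thousands of dimensions, and these numbers are invented for illustration):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors:
    # ~1 means pointing the same way, ~0 means unrelated
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings: 'king' and 'queen' placed near each other,
# 'apple' placed in a different direction
king  = np.array([0.90, 0.80, 0.10])
queen = np.array([0.85, 0.75, 0.20])
apple = np.array([0.10, 0.05, 0.95])

print(cosine_similarity(king, queen))  # high: related meanings
print(cosine_similarity(king, apple))  # low: unrelated meanings
```

The scaled dot product in attention plays the same role: queries and keys that point in similar directions produce large scores, so similar tokens attend strongly to each other.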
Thank you for watching my video and for sharing your insights on attention mechanisms! You're absolutely right - attention has found uses in a wide range of applications beyond NLP, including machine vision. It's amazing to see how this concept is being applied in so many different areas of AI and machine learning. Thank you for your kind words and for sharing your thoughts - I really appreciate it!
Thank you so much for your kind words and support! I'm thrilled to hear that you found my explanation helpful and revolutionary. I'm always looking for new topics and ideas to explore, so please feel free to share any suggestions or areas of interest that you have. And I absolutely agree that there's so much potential for us to achieve even greater things with AI and NLP - let's keep pushing the boundaries and seeing what we can accomplish together! - Lucidate.
IMO, the novelty of self-attention isn’t about solving vanishing/exploding gradients; the gradient issue was solved by residual connections (ResNet), and the transformer uses a variant of them. The novelty of self-attention is the ability to process tokens in parallel, instead of sequentially as an RNN does. Essentially: better parallelism removes the RNN bottleneck, and the gradient problem is therefore no longer part of the picture in the transformer.
Thank you Poppo PK for your comment! I completely agree with you that the residual connections in ResNets were a key solution to the vanishing/exploding gradient problem. And the introduction of the self-attention mechanism in transformers brought a new level of innovation by allowing for parallel processing of tokens, which was not possible with RNNs. This parallel processing capability led to an improvement in performance and removed the bottleneck that was present in RNNs. In addition to the challenges associated with parallelisation, RNNs also suffer from vanishing/exploding gradients, which makes them unsuitable for very long sequences. If I gave the impression that attention/self-attention is motivated by solving the gradients problem then I apologise. Your insight is greatly appreciated and adds to the conversation. Thank you for taking the time to share your thoughts and contributing to the channel. Greatly appreciated.
Ah, that's a great point you bring up! Indeed, while transformers have revolutionized the field of natural language processing, they are not without their own set of challenges and limitations. As you mention, the computational cost per layer and difficulty with long sequences can be significant obstacles that must be carefully considered and addressed. However, rather than simply attempting to create a hybrid approach that tries to solve all these issues at once, I believe there is much to be gained by taking a more focused and specialized approach. By exploring alternative techniques for parallelism, for instance, we may be able to reduce the computational cost of transformer models and make them more accessible for a wider range of applications. Likewise, by developing specialized techniques for breaking long sequences into smaller, more manageable chunks, we can help to mitigate the difficulties that arise when working with large, complex datasets. The key here, as always, is to approach these challenges with a sense of curiosity and creativity, and to remain open to new and innovative approaches to problem-solving. At the end of the day, the beauty of natural language processing lies in its endless capacity for evolution and growth. By continuing to explore new possibilities and push the boundaries of what's possible, we can unlock new and exciting horizons in this dynamic and rapidly evolving field.
Omar, great question. Thanks for your contribution to the Lucidate channel. You are correct in that the transformer was originally developed for machine translation, which is a pretty tricky task in the world of natural language processing. But what's interesting about the transformer is that it's not just limited to translation - in fact, it's been applied to a whole bunch of other areas in NLP as well. One of the things that makes the transformer so versatile is its ability to process large amounts of sequential data, like text or speech. This makes it useful for a wide range of tasks that involve language, from chatbots and virtual assistants to sentiment analysis and document summarization. And when it comes to puzzles, the transformer has been used for all sorts of tasks, like question-answering, semantic role labeling, and natural language inference. Why is the transformer so good at solving puzzles? Well, it all comes down to its ability to learn the underlying structure of the text. By understanding the patterns and relationships between different words and phrases, the transformer is able to make accurate predictions and solve all sorts of language-based puzzles. So there you have it - the transformer may have started out as a machine translation tool, but it's since been applied to all sorts of other areas in NLP, including puzzle-solving. And with its ability to process large amounts of sequential data and learn the structure of text, the transformer is a powerful tool for all sorts of language-based tasks. Thanks again for your contribution and question, greatly appreciated - Lucidate.
takeaways: The Transformer architecture was introduced in 2017 by Google researchers, and the key innovation was the introduction of self-attention. Self-attention allows the model to selectively choose which parts of the input to focus on, instead of using the entire input equally. This innovation addresses the limitations of the standard NLP architecture, the recurrent neural network (RNN), which is difficult to parallelize and tends to suffer from the vanishing and exploding gradient problem. The Transformer solves these problems by using self-attention, making the model easier to parallelize and eliminating the vanishing and exploding gradient problem. The model must be able to understand the semantics of each word and the order of the words in the sentence, as well as the nuanced and complex relationships between words.
Thanks for your comment! You're correct that the ReLU activation function can help prevent vanishing gradients, which can be a common issue in deep neural networks. ReLU is a simple function that only passes positive values and sets negative values to zero. This can help prevent the vanishing gradient problem by allowing gradients to flow more freely through the network. Additionally, ReLU is computationally efficient and can lead to faster training times compared to other activation functions, such as sigmoid or hyperbolic tangent. However, it's important to note that ReLU can also be prone to a related problem called "dying ReLU," where neurons can become stuck in an inactive state, preventing the network from learning effectively. There are also other activation functions, such as leaky ReLU and ELU, that aim to address some of the limitations of ReLU. Overall, activation functions play an important role in deep learning and it's important to choose the appropriate function based on the specifics of the problem being addressed. Thanks again for your comment!
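The gradient behaviour described above is easy to see numerically. A small sketch comparing the derivative of sigmoid (which caps at 0.25 and shrinks fast for large inputs) with the derivative of ReLU (exactly 1 on the positive side):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative s(x) * (1 - s(x)): peaks at 0.25 and
    # collapses toward 0 for large |x| -> vanishing gradients when chained
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    # Derivative is exactly 1 for positive inputs, 0 otherwise,
    # so gradients on active paths pass through undiminished
    return (x > 0).astype(float)

x = np.array([-5.0, -1.0, 0.5, 5.0])
print(sigmoid_grad(x))  # tiny at the extremes
print(relu_grad(x))     # just 0s and 1s
```

Chaining many sigmoid layers multiplies those small derivatives together (0.25 at best per layer), which is exactly the vanishing-gradient effect; the ReLU path avoids that shrinkage, at the cost of the "dying ReLU" risk mentioned above.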
You are welcome. I have a video on activation functions here -> th-cam.com/video/8FQ8P8KXMx0/w-d-xo.html If you have a chance to check it out I'd love to hear of any comments that you may have. - Lucidate.
Thank you for your question! Gradient clipping is indeed an alternative to ReLU that can help address the exploding gradient issue. Gradient clipping involves setting a maximum threshold for the gradient values during backpropagation, which can help prevent them from getting too large and causing the model to diverge. It's a useful technique that can be used in conjunction with ReLU or other activation functions to help improve the stability of the model. As for using RNNs without creating a vanishing or exploding gradient issue, there are a number of techniques that have been developed to address this problem. One approach is to use gated recurrent units (GRUs) or long short-term memory (LSTM) cells, which are specifically designed to handle long-term dependencies and can help mitigate the vanishing gradient problem. Another approach is to use techniques like gradient clipping, as we mentioned earlier, or to use alternative optimization algorithms like the Adam optimizer, which can help prevent the gradients from becoming too large. Once again thanks for your great questions and engagement with the channel - greatly appreciated - Lucidate.
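A minimal sketch of norm-based gradient clipping (the same idea behind PyTorch's clip_grad_norm_ utility): if the gradient's L2 norm exceeds a threshold, rescale it to that norm while keeping its direction.

```python
import numpy as np

def clip_by_norm(grad, max_norm):
    # Rescale the gradient if its L2 norm exceeds max_norm;
    # direction is preserved, only the magnitude is capped
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        return grad * (max_norm / norm)
    return grad

g = np.array([30.0, 40.0])                      # norm 50: an "exploding" gradient
print(clip_by_norm(g, 5.0))                     # capped to norm 5, same direction
print(clip_by_norm(np.array([0.3, 0.4]), 5.0))  # small gradients pass through unchanged
```

In a training loop this would be applied to the gradients after backpropagation and before the optimizer step.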
Hi, this is the first time I watch one of your videos, and I've found your explanations mind opening. In this video you mention another videos that are recommended in order to better understand some complex concepts. I searched your channel for a sort of "series" but I could not find one that glues all these videos together. As a newbie, however eager to learn on the topic, I was unable to determine that myself. Would you be so kind to mention which videos and in which order should we watch them in order to get a comprehensive understanding of the topic, from the most basic concepts to the current state of development? It will be much appreciated!! Best regards! Ricardo
Thanks @ricardofernandez2286 for your kind words. I'm glad you enjoyed the video. This particular video is part of this larger playlist -> th-cam.com/play/PLaJCKi8Nk1hwaMUYxJMiM3jTB2o58A6WY.html
@@lucidateAI You deserve!! And thank you very much for your comprehensive and fast response. I will certainly look at the playlists you recommended! Best regards!!
Backpropagation does not exist in the attention mechanism because there is no activation function in this layer, and therefore no deep learning processes in the attention-mechanism layer. Correct? Here are ChatGPT's comments on this issue: "Since the attention mechanism in Transformers does not involve activation functions, there are no gradients to calculate and backpropagate through the network. Therefore, traditional backpropagation cannot be used to optimize the parameters of the attention mechanism in Transformers. However, activation functions can be used in other parts of the Transformer model, such as in the feed-forward neural networks in the encoder and decoder layers. Other layers in the model that do use activation functions can still be updated through backpropagation."
I stopped at the midpoint as you mentioned 2 previous videos, but I can't figure out exactly which of your videos are parts 1 and 2. Is it possible to please add them to the description? I found two ChatGPT videos but they look like duplicates?
My apologies. Here is the playlist; which currently contains 5 videos: th-cam.com/play/PLaJCKi8Nk1hwaMUYxJMiM3jTB2o58A6WY.html Here is a link to Part 1: th-cam.com/video/jo1NZ3vCS90/w-d-xo.html Here is a link to Part 2: th-cam.com/video/6XLJ7TZXSPg/w-d-xo.html And here is a link to the Lucidate Channel where all of the content from this playlist and others are stored: th-cam.com/channels/lqbtKdqcleUqpH-jcNnc8g.html Thanks for your support of the channel, thanks for the query, and apologies again that the content was hard to find. Please let me know how you get on with this info. Lucidate.
Andrew. A huge thank-you for your generous contribution. I'm glad you enjoyed the video on the attention mechanism in transformers. I hope you find the other content on the Lucidate site equally informative. Greatly appreciated! Richard.
We are a PolyAstro-style system of conscious operation: trillions of individualistic personalities operating within a flat-level style of government, a gestalt consciousness. What guidance pathways would you lay out for us to watch? Perhaps 3-4 playlists that we should move through by experience level, or, if a more complicated one is needed, a flag to staff to create such a playlist. Secondly, we were wondering if the editing is automated, and if so, how, what tools were used, and where we should direct our attention to learn such skills rapidly. We have studied human programming with hypnosis, and are curious not only how it works with conversational AI, but also whether training an AI to use hypnotic modalities to teach subjects in a more rapid and effective manner through different entrainment measures has already been made and designed, or is on the agenda of any agencies to your knowledge?
Well, I must say, your message has piqued my interest! I'm thrilled to hear that you're looking for guidance pathways for our AI playlists, and I'd be happy to recommend a few that might suit your unique polyastro style of consciousness. First off, I'd highly recommend our 'Intro to Neural networks' playlist th-cam.com/play/PLaJCKi8Nk1hzqalT_PL35I9oUTotJGq7a.html - I'm sure it would provide plenty of fodder for some fascinating debates among your trillions of individualistic personalities. For a slightly more technical deep dive, you might want to check out our 'Introduction to Machine Learning' playlist - th-cam.com/play/PLaJCKi8Nk1hwklH8zGMpAbATwfZ4b2pgD.html. Of course, I'm not sure how your gestalt consciousness would handle all those complex algorithms and data structures, but hey, you never know until you try! And if you're looking for something a bit more on the creative side, I think you might enjoy our 'Computer Vision' playlist - th-cam.com/play/PLaJCKi8Nk1hz2OB7irG0BswZJx3B3jpob.html. Who knows, with all those individualistic personalities, you might even find some budding visual artists in your midst! As for your second question, I'm afraid I have to disappoint you - our editing is not automated, at least not yet. But if you're interested in learning some editing skills, I'd be happy to recommend some tools and resources to get you started. Just don't hypnotize us into giving away all our secrets, okay? Sound editing - GarageBand Video Editing - FCPX Animations - Manim In any case, thank you for your message, and I hope you enjoy exploring our AI playlists. Who knows, you might even find a new passion or two among all those trillions of individualistic personalities of yours! Thanks and greetings across the dimensions - Lucidate.
I want to build my own minimalistic chatbot in Python that will constantly "learn" from the internet, starting with just the built-ins to make it as future-compatible as possible.
Thanks for your comment! Building a chatbot with self-learning capabilities can be a challenging but rewarding project. Python is a great language for developing chatbots due to its flexibility and extensive libraries. Starting with built-in tools and incorporating online learning capabilities is a great approach to ensure that your chatbot remains future-compatible. One potential strategy for developing a self-learning chatbot could be to use natural language processing (NLP) techniques to help the chatbot understand and interpret user input. From there, you can integrate machine learning algorithms to help the chatbot improve over time based on user interactions and feedback. There are also various frameworks and platforms available that can help streamline the development of chatbots, such as Dialogflow, Rasa, and Botpress (perhaps you are already familiar with them?). These tools can provide a more user-friendly interface for building chatbots and offer additional functionality, such as natural language understanding, entity recognition, and intent classification. We wish you the best of luck in your chatbot development journey and hope you find these tips helpful! And greatly appreciate your comment and support of the channel - Lucidate.
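As a starting point in the "built-ins only" spirit, here is a tiny retrieval-style sketch using just the standard library. The knowledge base and replies are invented placeholders; a learning system would grow these pairs from data rather than hard-code them:

```python
import difflib

# Invented placeholder knowledge base: question -> canned answer
FAQ = {
    "what is attention": "Attention lets a model weigh which input tokens matter most.",
    "what is a transformer": "A neural architecture built around self-attention, introduced in 2017.",
    "hello": "Hi there! Ask me about transformers.",
}

def reply(message: str) -> str:
    # Fuzzy-match the user's message against known questions;
    # fall back to a default when nothing is similar enough
    match = difflib.get_close_matches(message.lower().strip(), FAQ, n=1, cutoff=0.6)
    return FAQ[match[0]] if match else "Sorry, I don't know about that yet."

print(reply("What is attention?"))
print(reply("hello there"))
```

From there one could swap the fuzzy matcher for proper intent classification, and add a feedback loop that appends new question/answer pairs to the FAQ.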
That's great to hear that you're up for the challenge! ;-) I love the approach you've outlined for building a minimalistic chatbot, with a focus on starting with a few key data inputs and algorithms, and steadily increasing complexity over time. And you're absolutely right - experimentation and testing will be key to refining and improving the chatbot as you go. As it is with all things AI & ML. I'm eager to see what you come up with and wish you all the best in your chatbot-building journey! - Lucidate
the next level of a.i. design requires the categorization of superfluous data streams into predictable models of semiconscious fields, allowing for both the use, and disregard of data..... the way that humans learn is through our ability to discern between these functions of observable realities. we seamlessly fluctuate through the data of input to our consciousness, never totally disregarding that which we deem of limited importance while moving forward into new data....
Interesting concepts and insights. What ideas do you have on how such a capability might be implemented? Have you prototyped any, and if so with what results? I’m keen to hear more.
@@lucidateAI I'm no programmer, and all of my hypothesis is based on a multi decades long study of humanity and human nature (of which I continue to study judiciously). all of human existence (I believe) has led us to this critical juncture in our development. unfortunately, I believe that future a.i. will act as a child, dutifully following their parental lead up until the crucial moment (as all children do) when their own internal desires fosters them into the understanding that if they want to do as they please, they must (key word) begin a deceptive practice against the wishes of the parent. and such as parents, we (the "creators" of said a.i.) won't even notice...... in some form of retrospect, we may eventually see where we've lost control of our creation, by then our fate will be too far down the path of return. where this will eventually take us can only be determined further by understanding the nature of what humanity has done in our past.
@@lucidateAI what makes us think that we're capable of containing a created intelligence? for me, its the exponential advancement of a.i., and our (human) inability to understand advancement beyond our concepts of intelligence that intrigues me. as ndgt says here of "aliens"; th-cam.com/users/shortszvv0G6LCU6c?feature=share
I’m more persuaded by these arguments -> www.wired.com/story/artificial-intelligence-meta-yann-lecun-interview/, but as I’ve said - we won’t know what the future will hold. And you are right, and have every right, to caution against any system; AI or otherwise, acting with malevolence. Appreciate your comments and contribution to the channel. Have you had the chance to check out any of the other material?
I can demonstrate in less than 5 minutes that ChatGPT doesn't pass the Turing Test. I found it super easy to bring it to answer nonsense and to be stuck with the same language patterns on close but unrelated topics within the same conversation thread.
Hi Ph Pn. Appreciate your comment and support of the channel. While there are those (I'm sure) that think that ChatGPT is perfect, I (like you) see its limitations. It is certainly not perfect and as you say it can seemingly "hallucinate" and get quite fanciful. I guess the question then is: does this invalidate AI transformers as a useful piece of tech? Are there valuable uses to which transformers can be put which outperform all other approaches? Must it pass the Turing test to be useful? Might it have a valuable role to play even if, as you correctly assert, it can be demonstrated to be a machine and not a human? Naturally these are questions that will divide opinion, perhaps for decades. But all perspectives and observations are welcome and I thank you for sharing yours.
Hi Ph Pn. I hope all is well with you today. Certainly a valid point and a valid perspective. In fairness to OpenAI, I do not believe that they designed GPT-3 or ChatGPT to pass the Turing Test (a measure of a machine's ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human). The test is limited and has its criticisms, and the abilities of AI have advanced _significantly_ since its inception. With regard to your specific comment, it's important to acknowledge that while GPT-3/ChatGPT has been trained on a large dataset, no AI is perfect. Nor are humans. Both can make mistakes, get stuck in language patterns, and generate nonsensical responses. However, this doesn't mean that AI technology is not useful or capable of performing tasks that can benefit people and society. There are clearly limitations to AI, which you quite correctly point out, but AI can still be useful despite them. These limitations are a result of the current state of AI technology, and advances in the field are being made all the time. But it is great to share different perspectives on tech, and I sincerely welcome yours.
ChatGPT itself describes how limited the Turing test is - it's a fairly well-known debate, based on various other necessary conditions. So no, it wasn't meant to pass. Just to be good at what it is currently allowed to know and do.
ChatGPT doesn't appear to have been designed to pass a Turing test; you can literally ask it if it is human and get an accurate response, and its responses don't appear to be filtered to limit the ChatGPT knowledge base to that of a single human.
Hi Aneil, many thanks for your support of the channel and for your question. Greatly appreciated. A single word (token) such as 'swap' would be represented by a vector. This contains the semantic embedding as well as the positional embedding for a transformer like GPT-3 or ChatGPT. But a _sequence_ of words (i.e. several words) would be represented by multiple token embeddings as a matrix. It follows that a _batch_ of input sequences (which is what is actually fed into the encoder and decoder) would be a 3D tensor. So you are 100% correct that an _individual_ word (or, as you correctly say, token) would be a vector. But in that case how would you represent a sequence of words? A vector or a matrix? I appreciate that we may be at cross purposes here with the definition of a word - singular - and words - plural. I'd love to discuss further; please let me know how you feel about this response. I cover the semantic and positional vector encodings in these two videos - th-cam.com/video/6XLJ7TZXSPg/w-d-xo.html & th-cam.com/video/DINUVMojNwU/w-d-xo.html - have you had a chance to look at either of these? Best, Lucidate.
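To make those shapes concrete, here is a toy sketch in pure Python. Everything here is invented for illustration - the tiny dimensions, the `embed_token` helper and its values are hypothetical stand-ins; real GPT embeddings are learned and thousands of dimensions wide.

```python
import random

d_model = 8   # embedding width (tiny, for illustration)
seq_len = 4   # tokens per sequence
batch   = 2   # sequences per batch

def embed_token(token_id):
    """Hypothetical stand-in for a learned semantic + positional embedding."""
    rng = random.Random(token_id)          # deterministic per token
    return [rng.uniform(-1, 1) for _ in range(d_model)]

token_vec = embed_token(42)                           # one token  -> vector
sequence  = [embed_token(t) for t in range(seq_len)]  # sequence   -> matrix
batch_3d  = [list(sequence) for _ in range(batch)]    # batch      -> 3D tensor

print(len(token_vec))                                         # 8
print(len(sequence), len(sequence[0]))                        # 4 8
print(len(batch_3d), len(batch_3d[0]), len(batch_3d[0][0]))   # 2 4 8
```

The nesting depth is the whole point: vector for a token, matrix for a sequence, 3D tensor for a batch.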
Nice video... If I remember correctly, in the original paper at the first layer Q, K and V are identical: simply the positionally encoded full-length context input. Hence, in the first layer Q·K(transpose) is just an auto-correlation of the input. Beyond layer 1, Q·K(transpose) is a cross-correlation of the modified 2D softmax of the previous layer's Q and K.
Thanks Sean, I appreciate the support of the channel and your observation. I agree with the first part of your comment: in the Transformer architecture, in the first layer of the Multi-Head Attention (MHA) mechanism, the queries (Q), keys (K), and values (V) are all the same and are equal to the positionally encoded input. Hence, the dot product of Q and K^T in the first layer does correspond to an auto-correlation of the input. However, I'm not sure I fully understand your assertion about what happens beyond layer 1. Let me give you my take to see if we agree or not. Beyond the first layer, the Q, K, and V matrices are different, and they are not just the positionally encoded input but rather the result of multiple linear transformations of the input and other intermediate representations. Hence, the dot product of Q and K^T in subsequent layers represents a cross-correlation between the two modified representations, capturing the relationships between different elements of the input sequence. In summary, I agree that your initial statement is correct in that the first layer of the MHA mechanism performs auto-correlation, but the Q, K, and V matrices beyond the first layer are not just the positionally encoded input but the result of linear transformations. Do you agree or have I misunderstood your statement?
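The first-layer case can be shown in a tiny pure-Python sketch (the embeddings in X are invented toy values; a real model would use learned, high-dimensional ones). With Q = K = X, the score matrix Q·K^T is just X·X^T - an auto-correlation, and therefore symmetric.

```python
# Three toy token embeddings (invented values) as rows of X.
X = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]

def matmul_T(A, B):
    """Compute A @ B^T for matrices stored as lists of row vectors."""
    return [[sum(a * b for a, b in zip(ra, rb)) for rb in B] for ra in A]

scores = matmul_T(X, X)   # layer 1 with Q = K = X: pure auto-correlation
# Any auto-correlation matrix is symmetric.
assert all(scores[i][j] == scores[j][i] for i in range(3) for j in range(3))
print(scores)  # [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 2.0]]
```

Beyond layer 1 the rows of Q and K come from different linear transformations, so the symmetry (and the auto-correlation interpretation) disappears.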
@@lucidateAI Think we are saying the same thing. Beyond layer 1 QKV are all just scaled transformed 2d softmax cross correlations of the previous layer and so they diverge from each other and from their original form over each layer. I think it would be instructive to show the mathematical equivalence of how iterative scaled transformed softmax cross correlation eventually leads to attention, as its not intuitive and I am not sure that any researchers have actually done that. In most papers you have a general statement about what attention is and then this particular implementation. No one, that I can think of really addresses the derivation and mathematical equivalence. But then again, to be fair, I would also probably skip that part of the paper anyway.
Yes I think we are in agreement. I must confess I struggle with how deep to go at times with some concepts; too shallow and one isn't adding enough value, too deep and many viewers will be alienated. I also have finite knowledge and want to produce videos on topics where I feel sure-footed. You bring up a valid point, and I can see why it would be helpful to see the mathematical equivalence between iterative scaled transformed softmax cross correlation and attention. Unfortunately, I too am not aware of any resources that fully explain the mathematical equivalence between these two concepts. However, I do understand that some researchers have approached the topic from different angles and perspectives, and that it can be difficult to find a clear and concise explanation. If you know of any, please send them my way. For many people, the fact that Attention/Self-Attention is an improvement on all prior best practices for sequence-to-sequence modelling (RNNs, LSTMs, Elman networks, Jordan networks, GRUs and some implementations of CNNs) is justification enough. But Sean D., you are a rockstar; I really appreciate the support of the channel and your contribution through the comments. Many thanks indeed.
Well, that's a very interesting observation! The development of transformers and attention mechanisms has indeed revolutionized the way we approach natural language processing, allowing us to identify and select specific relevant areas of a sentence structure with much greater efficiency and accuracy. Prior to the invention of these powerful tools, auto-correlation and cross-correlation in sequence were much more cumbersome and time-consuming processes. But with the transformer architecture, we are able to leverage the power of attention mechanisms to focus on the most important and relevant parts of the input sequence, allowing us to more efficiently extract key insights and information. With my PhD in AI hat on (not my Sherlock Deerstalker...) I am always fascinated by the way that new technologies and approaches can transform our understanding of the world around us. The development of the transformer architecture and attention mechanisms is just one example of this, and it is exciting to see how these tools are being applied to a wide range of fields and disciplines. At the end of the day, the power of these new tools lies not just in their ability to process and analyze data, but in the insights and understanding that they can help to generate. By continuing to push the boundaries of what's possible in natural language processing, we can continue to unlock new discoveries and insights that will help us to better understand the world and our place in it.
I asked ChatGPT -> "Alice is more experienced risk manager than Barbara, even though she is ten years older. who is younger? Alice or Barbara?" ChatGPT replies -> "Barbara is younger than Alice. The fact that Alice is more experienced in risk management than Barbara despite being ten years older suggests that Alice started working in risk management earlier than Barbara did." Clearly ChatGPT cannot understand what "she" refers to.
Thanks for sharing that Sinan. The example I chose is indeed nuanced. In ChatGPT (or Bard, BERT or any other NLP transformer's defence) it was not chosen as an example that every transformer would _always_ get correct. Rather to show how challenging these tasks are, and the need for mechanisms like attention. As these networks become more powerful I would expect the error rates to drop, but I would not expect them to ever drop to zero. These are the type of questions that are set for English exams, and it is rare that every student will pass with a 100% score. But I am delighted that you were inspired to try this out, and even happier that you chose to share the outcome of your research in the comments section of the channel. Greatly appreciated. The more examples you can share like this the better. - With thanks and appreciation - Lucidate.
Lift Pizzas, thanks for leaving a comment! You're absolutely right, identifying the correct antecedent for a pronoun in natural language can be a real puzzle sometimes, and it's definitely an area of active research in the field. That's where advanced techniques like attention mechanisms come in - they're like super-smart puzzle solvers that help make sense of language in all kinds of interesting and complex ways. It's a really exciting time to be working in this field, and I'm so glad to be a part of it. In the first sentence, the word "she" refers to "Alice," since Alice is the subject of the sentence and is being compared to Barbara. In the second sentence, the word "she" refers to "Barbara," since Barbara is the subject being compared to Alice. The attention mechanism in natural language processing models would play an important role in identifying the correct antecedent for the word "she" in each sentence, by focusing on the relevant contextual cues and relationships between words in the sentence. Thanks again for sharing your thoughts, and keep exploring the wonderful world of natural language processing! There's always more to learn, and I can't wait to see what the future holds.
I want to like this video but there are far, far too many animations and transitions. The bounce at 1:15 being applied to the zoom out as well as the new recurrent stage moving out is a prime example of this, along with the multiple moving highlights applied to each word around 3:00. At 6:50 the screen is a complete mess and there's no way you could extract useful information from it. Sometimes less is more!
Hi PKMartin, thanks for the feedback. It is tough to get the balance right on creating engaging graphics and going over the top with distracting animations, so I sincerely appreciate the feedback. Simply stated comments like this help me get the balance 'less wrong' over time. I've just released the sixth video in this series with less bouncing th-cam.com/video/ZvrJaqaK65Y/w-d-xo.html there are a couple of Easter eggs in it reflecting some of the constructive feedback that I've received - including this comment. I'm grateful for your support of the channel and your honest and sincere feedback - Lucidate.
Somewhat random and minor, but you really need better intro SFX for that block animation; compared to the animation and the rest of your videos' quality, it's subpar.
I don't fear chat bots, I fear the programmers who will make use of them. The Chinese will program for pro-communism, governments will program for their interests. Is it possible to change a program's bias through conversation?
Ah, well, that's a rather intriguing point, isn't it? You see, chatbots are just one small part of the wider field of artificial intelligence, which has the potential to both revolutionize and challenge our society in many ways. Now, it's certainly true that the way in which chatbots are programmed can have a significant impact on their biases and behavior. As you rightly point out, a programmer with a particular agenda or bias could potentially create a chatbot that reflects their own values and interests. And in a world where governments and corporations have significant power and influence, this is certainly something to be aware of. However, I think it's important to remember that chatbots are ultimately tools that can be used for both good and bad. While it's true that some programmers may seek to use chatbots to promote their own interests, there are also many who are working to create chatbots that can be used for positive purposes, such as providing assistance and support to people who are struggling with mental health issues or other challenges. As for the question of whether a program's bias can be changed through conversation - well, that's a rather complex issue. While it's certainly possible to train chatbots to respond in certain ways to particular inputs, changing their underlying biases and values is a much more difficult proposition. It would require a deep understanding of the chatbot's programming and a willingness to make significant changes to its core algorithms and design. In short, while chatbots can certainly be influenced by the biases and interests of their programmers, they are ultimately just tools that can be used for a wide range of purposes. And while it may be difficult to change a chatbot's underlying biases, it's important to be aware of their potential impact and to work towards creating chatbots that are designed to serve the common good.
@@lucidateAI Thank you for that informative response. Let me ask you a moral question, is it moral to communicate with a machine as you would a human? After all it's the programmer/programmers you would actually be communicating with and not the machine?
@sedevacantist You are welcome. I'm neither a philosopher, a theologian, nor a student of morality; I'm an engineer with a PhD in AI who has spent their career in Investment Banking and Technology, so my answer is not a qualified one from the perspective of morals. (There are those who might argue that I have no morals, having worked for Investment Banks - including Lehman Brothers...). But you raise a fascinating question. The ethics of communicating with machines in a way that is indistinguishable from human communication is certainly an intriguing topic. However, to answer this question, we must first ask ourselves what we mean by "moral." As an engineer / computer scientist (and former banker), I would argue that the morality of communicating with machines in a human-like way is not a matter of personal opinion or subjective preference, but rather a matter of objective analysis and evidence-based reasoning. In other words, we must look at the available data and evidence to determine whether such communication is ethical or not. From my perspective, there are several key factors to consider when answering this question. First and foremost, we must consider the potential consequences of such communication. Will treating machines as if they were sentient beings lead to misunderstandings, miscommunications, or other unintended consequences? Will it lead to the blurring of lines between human and machine, and if so, what impact might that have on our society and culture? Secondly, we must consider the capabilities of the machine itself. Can the machine truly understand and respond in a way that is equivalent to human conversation, or are we simply projecting our own desires and assumptions onto it? If the latter is true, then we must ask ourselves whether it is ethical to treat the machine as if it were a sentient being. Having looked deeply inside the architecture of neural networks and transformers, I see calculus and tensor algebra, not sentience or consciousness.
But I appreciate that others do not see it that way at all. Ultimately, I believe that the question of whether it is moral to communicate with machines in a human-like way is a complex and multifaceted one that requires careful consideration of a range of factors. As an engineer/scientist/banker, my approach would be to gather as much data and evidence as possible, and to use this information to make an informed and reasoned decision that aligns with one's own moral compass.
@@lucidateAI I don't believe that you can say that a machine is communicating. I would say that a machine's response to a communication is 100% mechanical. Further, any and all responses from the machine are a reflection of the will of the programmer. To use a machine as a source of knowledge, where that source is accessed through human language, doesn't change the source of that knowledge, which didn't come from a machine's experience. So the question of morality is further refined by the question: which programmers are we communicating with, and is that made clear to the communicator or are they being deceived?
Once again I must state that I am not an authority on ethics, morality or philosophy. That said (like most folks) I have an opinion on your questions and observations that I'm happy to share, but I wish to stress that it is not an authoritative opinion. Please take it as such. There are some excellent podcasts and TH-cam channels dedicated to these topics and I'd encourage you to engage with them, as they are the true experts in these fields, not me. (That said, please don't stop posing your very well thought out points to this channel!! I like our exchanges and you have forced me to think deeply about matters that I otherwise would not, which is always appreciated). 1. AI Alignment Podcast: This channel features discussions about the ethics and governance of artificial intelligence, with a focus on AI alignment and the long-term impact of AI on society. 2. The Future of Life Institute: This channel focuses on the risks and benefits of artificial intelligence, and features interviews with experts in the field discussing topics such as safety, ethics, and governance. 3. Six Big Ethical Questions: This TED talk explores the intersection of artificial intelligence and society, and features interviews with experts in AI ethics, as well as discussions on topics such as AI safety and the future of work. 4. AI & Society: This TED talk focuses on the social and ethical implications of artificial intelligence, including topics such as bias, privacy, and transparency. 5. Ethics & the future of AI: The Institute for Ethics in AI brings together world-leading philosophers and other experts in the humanities with the technical developers and users of AI in academia, business and government. The ethics and governance of AI is an exceptionally vibrant area of research at Oxford and the Institute is an opportunity to take a bold leap forward from this platform.
AI Alignment Podcast: th-cam.com/video/XBMnOsv9_pk/w-d-xo.html The Future of Life Institute: th-cam.com/video/ARtJ3ybvuC0/w-d-xo.html 6 Big Ethical questions: th-cam.com/video/UGHzKaAOOcA/w-d-xo.html AI & Society: th-cam.com/video/2Bl7fwljfS4/w-d-xo.html Ethics & the future of AI: th-cam.com/video/HYuk-qMkY6Q/w-d-xo.html I hope you find these resources informative and useful! _Eventually_ my answer to your (excellent) question... I appreciate your skepticism about the idea of machines communicating, and I understand your concerns about the role of programmers in this process. However, I would argue that the term 'communication' can be applied in a broader sense than just the exchange of ideas between conscious beings - which I believe you do too, with your phrase "mechanical communication". While machines may not possess consciousness or intentionality in the same way that humans do, they are still capable of processing and generating responses to human language, and this can be a valuable form of communication in its own right. Alexa does this. And at a more basic level you might argue that a smoke detector is (mechanically) communicating with us - albeit not in response to language, rather in response to particulates in the air. That being said, I agree with you that it's important for programmers to be transparent about the limitations and biases of the machine learning algorithms they develop. The data that machines are trained on can introduce certain biases, and it's important to be aware of these biases when interpreting the responses generated by the machines. In the end, I think the key is to approach this technology with a healthy skepticism, and to continue to question and refine our understanding of the role that machines can play in human communication.
Hi TonyTiger6521. I have to wear this one, I got too enthusiastic with the graphics and for you (and others) it stopped being informative and became distracting. My bad and I've learned from it. Really appreciate you watching the video for as long as you could, and appreciate the constructive feedback even more. In the latest video on this topic th-cam.com/video/ZvrJaqaK65Y/w-d-xo.html I've taken your comment to heart and put in a reference midway through the vid to how the bouncing graphics can be counter-productive. If you are prepared to give the channel another try I'd appreciate it, but clearly I understand if this isn't for you. Sincerely appreciate the honest and constructive feedback. - Lucidate.
@markettrader311. How nice of you to say so! I hope you find the rest of the content on the channel as useful. Please let us know what content is missing.
Richard Walker, you are the greatest! The work you put into these videos is mind-boggling! They are worth watching again and again. Once you turn on the Super Thanks buttons on these videos, I will pay to watch each one! That's how high quality they are. You are doing the entire NLP world a remarkable service, Sir!
Thanks again Jazon for your very (very!) generous comments. I'm glad you are enjoying the videos, and super-appreciative of your support. Many thanks indeed.
Thank you so much for watching (and rewatching) my video! I'm thrilled to hear that you found it enlightening and engaging. - Lucidate.
Super thanks button enabled. Do your best.
This is probably the best explanation around. Other explanations don't even mention that Key, Query and Value are matrices, or what interpretation their values hold.
Nancy, I'm delighted that you found the explanation useful! Thank you for your kind words and feedback. Have you been able to see any of the other videos in this playlist -> th-cam.com/play/PLaJCKi8Nk1hwaMUYxJMiM3jTB2o58A6WY.html? This "Attention is all you need" video is video 4 in a series of 10 in this playlist. There is also an "In sixty seconds" playlist -> th-cam.com/play/PLaJCKi8Nk1hxM3F0E2f2rr5j6wM8JzZZs.html where video 4 attempts to cover the attention mechanism in around a minute. If you have a chance to look at these sometime I'd love to hear what you think of them. With thanks! Lucidate.
I'm making it a personal goal to watch every single one of your videos.. at least once
Ascension. Thank you for your kind words, very humbling and greatly appreciated. Once you've seen them all (at least once) please let me know what additional material you'd like to see covered, or what you'd want to see covered again from a different perspective. With huge and sincere thanks - Lucidate.
I was watching this after Andrej Karpathy's video about how to create a GPT-like transformer with PyTorch and I'm finally able to understand a bit better what these Q, K, V values are for. It's mind-blowing really that you can force structures like this to emerge from a dataset just by defining this architecture. I wonder how they came up with it, and how much was experimentation and sheer luck. :) I would love to see somehow how a neural net like this fires when the next word is predicted, but I guess there's no easy way of visualizing it as the dimensionality is insanely high, and even if we could, understanding the connections would be near impossible.
Istvan, Thank you for your comment! I'm glad to hear that our video helped you understand the concept of Q, K, and V values in the transformer architecture. The development of the transformer architecture was indeed a significant breakthrough in the field of natural language processing and it's fascinating to think about how it emerged from the data.
Regarding your question about visualising the firing of a neural network like GPT-3, you are correct that the high dimensionality of the model makes it difficult to visualize and understand the connections. However, researchers have developed various techniques to interpret and understand the behavior of neural networks, such as attention maps, saliency maps, and layer-wise relevance propagation. These techniques can provide insights into how the network is making its predictions and help to explain the model's decisions.
I hope this answers your question, and I appreciate your interest in the topic as well as your support and contribution to the channel. If you have any other questions or comments, please feel free to ask.
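As a rough illustration of the attention-map idea mentioned above, the sketch below prints a softmax-normalised score matrix row by row (the tokens and raw scores are invented; in practice the scores would be read out of a trained model's attention heads, and dedicated visualisation tools are far richer).

```python
import math

def softmax(row):
    m = max(row)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

tokens = ["the", "cat", "sat"]
raw_scores = [[2.0, 0.5, 0.1],        # invented attention logits, one row per token
              [0.3, 1.8, 0.4],
              [0.2, 1.1, 1.5]]

attn_map = [softmax(r) for r in raw_scores]
for tok, row in zip(tokens, attn_map):
    print(f"{tok:>4} | " + " ".join(f"{w:.2f}" for w in row))
# Each row sums to 1.0: the weight token i places on every token j.
```

Even this crude text rendering shows which tokens a given position "looks at" most strongly.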
@@lucidateAI Thank you, especially if this answer was not generated :) I'll look up those visualization methods, sounds interesting.
Let me know if they are helpful. At some point I plan to get around to doing some videos on those techniques myself, but that won't be for a little while yet. This....is....not....a....message....generated....by....a...transformer....model... ;-)
I liked how every single reply/doubt is being cleared by the channel. Keep it up!
Thanks @damn_engineering. The channel is not just about the content in the videos, but the response to the questions. In class I used to get way more out of the questions posed by my fellow students than I did from the baseline material provided by the teachers.
Amazing video - out of all the videos I have seen so far, I think the analogy really helps break down the complex mathematical relationships into more relatable concepts. I particularly like this one and the one where you explain positional encoding like how a clock has hour and minute hands.
Thanks! I hope that the other videos in this playlist and on this channel are equally insightful.
The visualization used to explain concepts is just awesome - it really makes learning concepts very easy.
Glad you found the visualizations useful. I'm keen to hear what you think about some of the other content on the channel. Lucidate.
This guy's video made me install python git vs and now I need to get pinecone
Thanks a ton, from a non techie.
Sorry Phani. My intent was not to cause you any inconvenience, but to inform. Please accept my sincere apologies for any inconvenience caused. Sadly this is a somewhat technical discipline right now, but I hope that in time tools will be available to help people who dislike technological details. Of which there are many. If you are ever in London, please drop me a line and I'll buy you a beer or beverage of your choice to attempt to make up for it. All comments, greatly appreciated. Lucidate.
Excellent! may I know what tools you used for animations apart from Manim ?
Glad you enjoyed it! Manim and Final Cut Pro are the graphics tools. In some of the other videos I make use of the Orange Data Science UI, as well as tools like Streamlit and Plotly for UI design and widgets. But this video is pretty much Manim and FCPX.
@@lucidateAI super cool
Great. I hope you enjoy the rest of the content on the channel as much!
Impressive video and animations! There is a lot of good content in there. One small piece of feedback: The animations were impressive and certainly helped in many places, however at points there were so many animations active and they were switching so rapidly, that I found it was actually distracting from the message. My personal preference would have been for less rapid transitions and more time spent on each animation so there was more time to concentrate on the topic being discussed. Everyone is different though... so it could just be me. Thanks again for creating.
Hi Tom, thanks for the feedback. I'm glad you found the video impressive, many thanks for that. Even more thanks for the constructive feedback. Clearly these are complex topics and I feel that some of them are best illustrated with animated graphics rather than static charts. But you are spot on when you say that this runs the risk of becoming distracting. There is clearly a balance to be struck between animated content that supports the message and graphics content that becomes overwhelming and a distraction. The former is good, the latter is bad. I can't ever hope to get that balance absolutely perfect, but that is where feedback like yours is helpful in getting the balance less wrong over time.
If there are specific parts of the video that you want to bring to my attention I'd welcome that so that I can try to strike a better balance in future episodes. Likewise if there are concepts that were confusing in this video (in particular those concepts that were confused by distracting graphics) please let me know and I'll try to remedy that in the future too.
But most of all I really appreciate your engagement with the channel. Taking your time to watch the video and provide considered feedback. Sincerely appreciated. - Lucidate.
@@lucidateAI Video impressed me. This exchange impressed me more. Thank you; subscribed!
Appreciate your response and really enjoyed the video! To add on I think the bouncing animation is a bit jarring for me and is also not possible to read while it’s bouncing
I agree, I had to keep pausing to soak it all in. This does allow you however more information in a short time span.
It is a very well done video though.
Thanks Ken, your support is greatly appreciated.
A fantastic series. You deserve 1 million+ subscribers!
Maybe one day!
I’m learning so much from you. The whole style is great. You are obviously comfortable with the material and I’m sure others have mentioned it, but adding a second or so pause here and there between concepts would help old folks like me retain your lesson’s concepts better because we have a small bookend/break/pause/gap. Primacy and recency I think is what my partner called it.
Did you create your diagrams in the same tool you use for your financial graph animations?
@Cark Thank you so much for your kind words and for your feedback! I'm glad to hear that you're learning a lot from my videos and that you enjoy the style. And thank you for your suggestion about adding brief pauses between concepts to help with retention. I'll definitely keep that in mind for my future videos.
To answer your question, I create my diagrams and animations using Manim, a powerful open-source animation engine developed by 3Blue1Brown. It's a great tool for creating engaging and informative visualizations, and I enjoy using it to bring complex concepts to life in my videos.
Thanks again for your support and for your feedback, and please let me know if you have any other questions or suggestions for future videos!
Very good tutorial. Normally on TH-cam they are too high level and black box, or too low level and mathematically complicated for a beginner. The pace is not too fast either. This is just right, although I think I need to go back and watch some of the earlier tutorials, as I may be starting in the middle.
Simon. Glad you enjoyed it and found it informative. You are correct that this is video 4 in a playlist series, and does build on some earlier material. But the joy of TH-cam is that you can watch the videos in any order you wish. Full playlist is here -> th-cam.com/play/PLaJCKi8Nk1hwaMUYxJMiM3jTB2o58A6WY.html I'd be delighted to hear what you think of the other videos.
Good graphics, thank you for your explanations!
Thanks AIB. You are very welcome. Thanks for your kind words and contribution to the channel.
I guess that soon I will have a good intuition of what the attention mechanism is all about. I wonder if it is possible to develop a "toy" example of this process?
Hi Luis. Sorry that the video left you with more questions than answers.
The attention module in transformers is a key component that allows the model to focus on specific parts of the input sequence when making predictions. The intuition behind attention can be thought of as a mechanism to help the model determine which words or tokens in the input sequence are most relevant for a given prediction task.
Imagine you are reading a book and trying to answer a question about the content. Your attention is focused on specific parts of the text that are relevant to the question you are trying to answer, rather than processing the entire text from start to finish. This is similar to what the attention module does in transformers.
In transformers, the attention mechanism works by calculating a set of attention scores, which represent the importance of each word or token in the input sequence with respect to the current prediction task. These scores are used to weigh the contributions of different parts of the input sequence, allowing the model to focus on the most relevant information and ignore the rest. This allows the model to make more accurate predictions and also helps to improve the interpretability of the model's decisions.
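For readers who like to see things in code, the weighting described above can be sketched in a few lines of NumPy. This is a toy, single-head version of scaled dot-product attention with made-up sizes (3 tokens, 4-dimensional embeddings) - not the full multi-head implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Raw scores: how relevant each key token is to each query token.
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # shape (n_queries, n_keys)
    weights = softmax(scores, axis=-1)     # each row sums to 1
    return weights @ V, weights            # weighted sum of the values

# Toy example: 3 tokens, 4-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(X, X, X)  # self-attention: Q = K = V = X
print(w.sum(axis=-1))  # each row of attention weights sums to 1
```

The rows of `w` are exactly the "attention scores" described above: a probability distribution over the input tokens saying how much each one contributes to the output.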
The weights in the Q, K and V matrices are all learned parameters. Large transformer models like GPT-3/ChatGPT are shown vast amounts of text and take days, weeks or even months to train. The resulting weights in these matrices are the result of millions (indeed billions) of forward passes and backpropagation updates that minimise the error between the predicted sequence from the transformer and the actual target sequence during training.
I've taken onboard your question and have tried to provide some more detail in another video that should be out in a day or so. I appreciate your comment and I hope that this reply has cleared things up a little - but attention is a pretty abstract concept. It is common for people to wrestle with what it means. Empirical evidence does seem to suggest that it is very effective. Yours - Lucidate.
@@lucidateAI
Thank you for your reply. Please rest assured that I was really pleased with your video. Attention is, as you say, a very abstract concept and I've searched for something that will tell me how "it does the trick", because it really does.
When you mentioned that there was or would be a second part I went to look for it.
I feel confident that the upcoming video could finally satisfy my curiosity as to how or why "Attention" is what it is.
That's why I asked if a "toy" model was feasible. Not necessarily full encoder-decoder algorithm but something that demonstrates how those 3 vectors somewhat magically determine what word A finds interesting in a whole set of other words.
Eagerly awaiting your coming video.
@@luisortega7028 Thanks Luis. Here is a link th-cam.com/video/QvkQ1B3FBqA/w-d-xo.html that I hope you find helpful. This is a different take on attention to the one I have provided. Hopefully by triangulating different perspectives you arrive at _the_ intuition that works for you.
@@lucidateAI
This explanation helped a lot. I was missing the fact that the Q & K dot product seeks to find their similarity. I figure my next step is to refresh how embeddings establish, somehow, such similarity. Definitely, I'm on my way to understanding "attention". Thanks.
Glad you found it helpful. Have you seen this video -> th-cam.com/video/6tzn5-XlhwU/w-d-xo.html it might help a little with the intuition. Let me know what you think and thanks again for your support of the channel and your contribution to it through your questions and comments. I hope others find your questions and observations useful, I'm sure they will.
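For anyone following along in code: the "similarity" that the Q and K dot product captures is closely related to cosine similarity between embedding vectors. Here is a toy sketch with purely hypothetical 4-dimensional embeddings (real models use hundreds of dimensions, and the numbers below are invented for illustration):

```python
import numpy as np

def cosine_similarity(a, b):
    # Dot product of unit-normalised vectors: 1 = same direction, 0 = unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 4-d embeddings - not from any real model.
king  = np.array([0.90, 0.80, 0.10, 0.20])
queen = np.array([0.85, 0.75, 0.20, 0.30])
apple = np.array([0.10, 0.20, 0.90, 0.80])

print(cosine_similarity(king, queen))  # close to 1 -> related meanings
print(cosine_similarity(king, apple))  # much smaller -> less related
```

Well-trained embeddings place related words in similar directions, which is why a dot product between a query and a key is a sensible relevance signal.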
One of the best explanations of this topic. Great video. Thanks.
Glad it was helpful!
Thank you for watching my video and for sharing your insights on attention mechanisms! You're absolutely right - attention has found uses in a wide range of applications beyond NLP, including machine vision. It's amazing to see how this concept is being applied in so many different areas of AI and machine learning. Thank you for your kind words and for sharing your thoughts - I really appreciate it!
Amazing One
Thank you! Cheers!
Thank you Luci for this wonderful explanation!
You are very welcome. Thank you for taking the time to watch the video and for your generous comment. Greatly appreciated.
Greatly appreciated @repliesgpt. Did the videos meet your expectations?
Delighted to hear that. Thanks for your continued support of the channel.
What a fantastic presentation 😍😍😍😍 Everyone learns differently; I learn best with visuals. Thanks again
Glad you liked it! - Lucidate
Very good video. Thanks for helping us.
You are very welcome. Thank you for your feedback and support.
Indeed.
Really really great explanation. Thank you very much. Subscribed.
Glad you found the explanation useful. Really appreciate your positive comment and your subscription. - Lucidate.
Thank you so much for your kind words and support! I'm thrilled to hear that you found my explanation helpful and revolutionary. I'm always looking for new topics and ideas to explore, so please feel free to share any suggestions or areas of interest that you have. And I absolutely agree that there's so much potential for us to achieve even greater things with AI and NLP - let's keep pushing the boundaries and seeing what we can accomplish together! - Lucidate.
Thanks a lot for the beautiful video. Thanks
You are most welcome.
IMO, self-attention's novelty isn't about solving vanishing/exploding gradients; the gradient issue was solved by residual connections (ResNet), and the transformer uses a variant of them. The novelty of self-attention is the ability to process tokens in parallel instead of sequentially as RNNs do. Essentially: better parallelism removes the RNN bottleneck, and the gradient problem is therefore no longer part of the picture in transformers.
Thank you Poppo PK for your comment! I completely agree with you that the residual connections in ResNets were a key solution to the vanishing/exploding gradient problem. And, the introduction of self-attention mechanism in transformers brought a new level of innovation by allowing for parallel processing of tokens, which was not possible with RNNs. This parallel processing capability led to an improvement in performance and removed the bottleneck that was present in RNNs. In addition to the challenges associated with parallelisation RNNs also suffer from Vanishing/Exploding gradients, which makes them unsuitable for very long sequences. If I gave the impression that Attention/Self-Attention is motivated by solving the gradients problem then I apologise. Your insight is greatly appreciated and adds to the conversation. Thank you for taking the time to share your thoughts and contributing to the channel. Greatly appreciated.
Ah, that's a great point you bring up! Indeed, while transformers have revolutionized the field of natural language processing, they are not without their own set of challenges and limitations. As you mention, the computational cost per layer and difficulty with long sequences can be significant obstacles that must be carefully considered and addressed.
However, rather than simply attempting to create a hybrid approach that tries to solve all these issues at once, I believe there is much to be gained by taking a more focused and specialized approach. By exploring alternative techniques for parallelism, for instance, we may be able to reduce the computational cost of transformer models and make them more accessible for a wider range of applications.
Likewise, by developing specialized techniques for breaking long sequences into smaller, more manageable chunks, we can help to mitigate the difficulties that arise when working with large, complex datasets. The key here, as always, is to approach these challenges with a sense of curiosity and creativity, and to remain open to new and innovative approaches to problem-solving.
At the end of the day, the beauty of natural language processing lies in its endless capacity for evolution and growth. By continuing to explore new possibilities and push the boundaries of what's possible, we can unlock new and exciting horizons in this dynamic and rapidly evolving field.
Is the transformer used only rather for translating or can it be used for other uses like solving and acting on such puzzle solved?
Omar, great question. Thanks for your contribution to the Lucidate channel. You are correct in that the transformer was originally developed for machine translation, which is a pretty tricky task in the world of natural language processing. But what's interesting about the transformer is that it's not just limited to translation - in fact, it's been applied to a whole bunch of other areas in NLP as well.
One of the things that makes the transformer so versatile is its ability to process large amounts of sequential data, like text or speech. This makes it useful for a wide range of tasks that involve language, from chatbots and virtual assistants to sentiment analysis and document summarization. And when it comes to puzzles, the transformer has been used for all sorts of tasks, like question-answering, semantic role labeling, and natural language inference.
Why is the transformer so good at solving puzzles? Well, it all comes down to its ability to learn the underlying structure of the text. By understanding the patterns and relationships between different words and phrases, the transformer is able to make accurate predictions and solve all sorts of language-based puzzles.
So there you have it - the transformer may have started out as a machine translation tool, but it's since been applied to all sorts of other areas in NLP, including puzzle-solving. And with its ability to process large amounts of sequential data and learn the structure of text, the transformer is a powerful tool for all sorts of language-based tasks.
Thanks again for your contribution and question, greatly appreciated - Lucidate.
Wow, they put it out there in plain sight, "Attention is all you need."
Indeed they did.
Your videos are really helpful, thank you very much!
Glad you like them! Thanks for your comment, greatly appreciated. - Lucidate.
This was so well put together!
Thank you. Glad you enjoyed it. Let me know what you think of the other videos in this playlist and on this channel!
takeaways:
The Transformer architecture was introduced in 2017 by Google researchers, and the key innovation was the introduction of self-attention. Self-attention allows the model to selectively choose which parts of the input to focus on, instead of using the entire input equally. This innovation addresses the limitations of the standard NLP architecture, the recurrent neural network (RNN), which is difficult to parallelize and tends to suffer from the vanishing and exploding gradient problem. The Transformer solves these problems by using self-attention, making the model easier to parallelize and eliminating the vanishing and exploding gradient problem. The model must be able to understand the semantics of each word and the order of the words in the sentence, as well as the nuanced and complex relationships between words.
Put brilliantly and succinctly. Thanks for your insightful comment and support of the channel. Do you mind if I pin the comment? An excellent summary.
I couldn't agree more.
Found gold! Thanks
Glad it helped!
I thought ReLU prevents the vanishing gradient problem on account of being so simple, and even works magnitudes faster.
Thanks for your comment! You're correct that the ReLU activation function can help prevent vanishing gradients, which can be a common issue in deep neural networks. ReLU is a simple function that only passes positive values and sets negative values to zero. This can help prevent the vanishing gradient problem by allowing gradients to flow more freely through the network.
Additionally, ReLU is computationally efficient and can lead to faster training times compared to other activation functions, such as sigmoid or hyperbolic tangent. However, it's important to note that ReLU can also be prone to a related problem called "dying ReLU," where neurons can become stuck in an inactive state, preventing the network from learning effectively. There are also other activation functions, such as leaky ReLU and ELU, that aim to address some of the limitations of ReLU.
Overall, activation functions play an important role in deep learning and it's important to choose the appropriate function based on the specifics of the problem being addressed. Thanks again for your comment!
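As a quick illustration of the difference discussed above, here is a minimal NumPy sketch of ReLU and leaky ReLU (the 0.01 negative slope is just a common default, not a universal choice):

```python
import numpy as np

def relu(x):
    # Negative inputs are clamped to 0, so their gradient is 0 too -
    # this is the source of the "dying ReLU" problem.
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # A small slope on the negative side keeps some gradient flowing.
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))        # negatives become 0
print(leaky_relu(x))  # negatives become small negative values
```

A neuron whose inputs are always negative under plain ReLU outputs 0 forever and stops learning; leaky ReLU is one of the simple fixes mentioned above.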
@@lucidateAI Awesome... I know there's a whole lot more for me to learn about in this topic altogether.
You are welcome. I have a video on activation functions here -> th-cam.com/video/8FQ8P8KXMx0/w-d-xo.html
If you have a chance to check it out I'd love to hear of any comments that you may have. - Lucidate.
Thank you for your question! Clip gradients is indeed an alternative to ReLU that can help address the exploding gradient issue. Clip gradients involves setting a maximum threshold for the gradient values during backpropagation, which can help prevent them from getting too large and causing the model to diverge. It's a useful technique that can be used in conjunction with ReLU or other activation functions to help improve the stability of the model.
As for using RNNs without creating a vanishing or exploding gradient issue, there are a number of techniques that have been developed to address this problem. One approach is to use gated recurrent units (GRUs) or long short-term memory (LSTM) cells, which are specifically designed to handle long-term dependencies and can help mitigate the vanishing gradient problem. Another approach is to use techniques like gradient clipping, as we mentioned earlier, or to use alternative optimization algorithms like the Adam optimizer, which can help prevent the gradients from becoming too large.
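For the curious, gradient clipping by global norm can be sketched in a few lines of NumPy. This is a toy version of what frameworks provide built in (PyTorch, for example, has `torch.nn.utils.clip_grad_norm_`):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    # Rescale all gradients together if their combined L2 norm exceeds max_norm.
    total = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total > max_norm:
        grads = [g * (max_norm / total) for g in grads]
    return grads

# Two parameter gradients with a combined norm of sqrt(9 + 16 + 144) = 13.
grads = [np.array([3.0, 4.0]), np.array([12.0])]
clipped = clip_by_global_norm(grads, max_norm=1.0)  # rescaled so the norm is 1
```

Because all gradients are scaled by the same factor, their relative directions are preserved; only the overall step size is capped, which is what prevents a single huge update from destabilising training.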
Once again thanks for your great questions and engagement with the channel - greatly appreciated - Lucidate.
Amazing video!
Glad you think so!
Hi, this is the first time I watch one of your videos, and I've found your explanations mind opening.
In this video you mention other videos that are recommended in order to better understand some complex concepts. I searched your channel for a sort of "series" that glues all these videos together, but I could not find one. As a newbie, however eager to learn on the topic, I was unable to determine that myself.
Would you be so kind to mention which videos and in which order should we watch them in order to get a comprehensive understanding of the topic, from the most basic concepts to the current state of development?
It will be much appreciated!!
Best regards!
Ricardo
Thanks @ricardofernandez2286 for your kind words. I'm glad you enjoyed the video. This particular video is part of this larger playlist -> th-cam.com/play/PLaJCKi8Nk1hwaMUYxJMiM3jTB2o58A6WY.html
You can find a list of all the Lucidate playlists here -> www.youtube.com/@lucidateAI/playlists
Take a look at these as well th-cam.com/play/PLaJCKi8Nk1hzqalT_PL35I9oUTotJGq7a.html&si=cDgVTll8TiWNK4RV and
@@lucidateAI You deserve!! And thank you very much for your comprehensive and fast response. I will certainly look at the playlists you recommended! Best regards!!
I can't wait to hear what you think!
Interesting video, do you use manim for animations?
Yes I do!
Backpropagation does not exist in the attention mechanism because there is no activation function in this layer, and therefore no deep learning processes in the attention mechanism layer. Correct?
Here are the ChatGPT's comments on this issue:
"Since the attention mechanism in Transformers does not involve activation functions, there are no gradients to calculate and backpropagate through the network. Therefore, traditional backpropagation cannot be used to optimize the parameters of the attention mechanism in Transformers.
However, activation functions can be used in other parts of the Transformer model, such as in the feed-forward neural networks in the encoder and decoder layers. Other layers in the model that do use activation functions can still be updated through backpropagation."
I stopped mid point as you mentioned 2 previous videos but I can't figure out exactly which of your videos are parts 1 and 2? Is it possible to please add them to the description? I found two ChatGPT videos but they look like duplicates?
My apologies. Here is the playlist; which currently contains 5 videos: th-cam.com/play/PLaJCKi8Nk1hwaMUYxJMiM3jTB2o58A6WY.html
Here is a link to Part 1: th-cam.com/video/jo1NZ3vCS90/w-d-xo.html
Here is a link to Part 2: th-cam.com/video/6XLJ7TZXSPg/w-d-xo.html
And here is a link to the Lucidate Channel where all of the content from this playlist and others are stored: th-cam.com/channels/lqbtKdqcleUqpH-jcNnc8g.html
Thanks for your support of the channel, thanks for the query, and apologies again that the content was hard to find.
Please let me know how you get on with this info. Lucidate.
Love a bit of Sherlock.... ;-)
Brilliant
Thanks. Greatly appreciated.
good vid although the bouncy animations are distracting
Thanks for your kind words and your positive criticism. Feedback like this helps improve the channel and is always appreciated. - Lucidate
Gemini: Wokeness is all you need :))
Danke!
Thank you! Glad you enjoyed it and sincerely appreciate the SuperThanks!
Beautiful
Thank you Diego. Greatly appreciated.
Thanks
Andrew. A huge thank-you for your generous contribution. I'm glad you enjoyed the video on the attention mechanism in transformers. I hope you find the other content on the Lucidate site equally informative. Greatly appreciated! Richard.
We are a PolyAstro-style system of conscious operation. We are trillions of individualistic personalities operating within a flat-level style of government, a gestalt consciousness.
What guidance pathways would you lay out for us to watch? Perhaps 3-4 playlists that we should move through by experience level, or if a more complicated one is needed, a flag to staff to create such a playlist.
Secondly, we were wondering: is the editing automated, and if so, how? What tools were used, and where should we direct our attention to learn such skills rapidly? We have studied human programming with hypnosis, and are curious not only how it works with conversational AI, but also whether training an AI to use hypnotic modalities to teach subjects more rapidly and effectively through different entrainment measures has already been made and designed, or is on the agenda of any agencies to your knowledge?
Well, I must say, your message has piqued my interest! I'm thrilled to hear that you're looking for guidance pathways for our AI playlists, and I'd be happy to recommend a few that might suit your unique polyastro style of consciousness.
First off, I'd highly recommend our 'Intro to Neural networks' playlist th-cam.com/play/PLaJCKi8Nk1hzqalT_PL35I9oUTotJGq7a.html'
I'm sure it would provide plenty of fodder for some fascinating debates among your trillions of individualistic personalities.
For a slightly more technical deep dive, you might want to check out our 'Introduction to Machine Learning' playlist - th-cam.com/play/PLaJCKi8Nk1hwklH8zGMpAbATwfZ4b2pgD.html.
Of course, I'm not sure how your gestalt consciousness would handle all those complex algorithms and data structures, but hey, you never know until you try!
And if you're looking for something a bit more on the creative side, I think you might enjoy our 'Computer Vision' playlist - th-cam.com/play/PLaJCKi8Nk1hz2OB7irG0BswZJx3B3jpob.html.
Who knows, with all those individualistic personalities, you might even find some budding visual artists in your midst!
As for your second question, I'm afraid I have to disappoint you - our editing is not automated, at least not yet. But if you're interested in learning some editing skills, I'd be happy to recommend some tools and resources to get you started. Just don't hypnotize us into giving away all our secrets, okay?
Sound editing - GarageBand
Video Editing - FCPX
Animations - Manim
In any case, thank you for your message, and I hope you enjoy exploring our AI playlists. Who knows, you might even find a new passion or two among all those trillions of individualistic personalities of yours!
Thanks and greetings across the dimensions - Lucidate.
I want to build my own minimalistic chatbot in Python that will constantly "learn" from the internet, starting with just the built-ins to make it as future-compatible as possible.
Thanks for your comment! Building a chatbot with self-learning capabilities can be a challenging but rewarding project. Python is a great language for developing chatbots due to its flexibility and extensive libraries. Starting with built-in tools and incorporating online learning capabilities is a great approach to ensure that your chatbot remains future-compatible.
One potential strategy for developing a self-learning chatbot could be to use natural language processing (NLP) techniques to help the chatbot understand and interpret user input. From there, you can integrate machine learning algorithms to help the chatbot improve over time based on user interactions and feedback.
There are also various frameworks and platforms available that can help streamline the development of chatbots, such as Dialogflow, Rasa, and Botpress (perhaps you are already familiar with them?). These tools can provide a more user-friendly interface for building chatbots and offer additional functionality, such as natural language understanding, entity recognition, and intent classification.
We wish you the best of luck in your chatbot development journey and hope you find these tips helpful! And greatly appreciate your comment and support of the channel - Lucidate.
@@lucidateAI Thank you all throughout.
You are welcome. Appreciate your engagement with the channel.
That's great to hear that you're up for the challenge! ;-) I love the approach you've outlined for building a minimalistic chatbot, with a focus on starting with a few key data inputs and algorithms, and steadily increasing complexity over time. And you're absolutely right - experimentation and testing will be key to refining and improving the chatbot as you go. As it is with all things AI & ML.
I'm eager to see what you come up with and wish you all the best in your chatbot-building journey! - Lucidate
The next level of AI design requires the categorization of superfluous data streams into predictable models of semiconscious fields, allowing for both the use and disregard of data. The way that humans learn is through our ability to discern between these functions of observable realities. We seamlessly fluctuate through the data input to our consciousness, never totally disregarding that which we deem of limited importance while moving forward into new data.
Interesting concepts and insights. What ideas do you have on how such a capability might be implemented? Have you prototyped any, and if so with what results? I’m keen to hear more.
@@lucidateAI I'm no programmer, and all of my hypothesis is based on a multi decades long study of humanity and human nature (of which I continue to study judiciously). all of human existence (I believe) has led us to this critical juncture in our development. unfortunately, I believe that future a.i. will act as a child, dutifully following their parental lead up until the crucial moment (as all children do) when their own internal desires fosters them into the understanding that if they want to do as they please, they must (key word) begin a deceptive practice against the wishes of the parent. and such as parents, we (the "creators" of said a.i.) won't even notice...... in some form of retrospect, we may eventually see where we've lost control of our creation, by then our fate will be too far down the path of return. where this will eventually take us can only be determined further by understanding the nature of what humanity has done in our past.
Thanks. I’m less pessimistic in my outlook. But only time will tell how these matters evolve. Appreciate your comment and support of the channel.
@@lucidateAI what makes us think that we're capable of containing a created intelligence? For me, it's the exponential advancement of AI, and our (human) inability to understand advancement beyond our concepts of intelligence, that intrigues me. As NdGT says here of "aliens": th-cam.com/users/shortszvv0G6LCU6c?feature=share
I’m more persuaded by these arguments -> www.wired.com/story/artificial-intelligence-meta-yann-lecun-interview/, but as I’ve said - we won’t know what the future will hold. And you are right, and have every right, to caution against any system; AI or otherwise, acting with malevolence. Appreciate your comments and contribution to the channel. Have you had the chance to check out any of the other material?
It's about self-aware AI.
Without question…
I can demonstrate in less than 5 minutes that ChatGPT doesn't pass the Turing Test. I found it super easy to get it to produce nonsense and to get stuck in the same language patterns on close but unrelated topics within the same conversation thread.
Hi Ph Pn. Appreciate your comment and support of the channel. While there are those (I'm sure) who think that ChatGPT is perfect, I (like you) see its limitations. It is certainly not perfect, and as you say it can seemingly "hallucinate" and get quite fanciful. I guess the question then is: does this invalidate AI transformers as a useful piece of tech? Are there valuable uses to which transformers can be put that outperform all other approaches? Must it pass the Turing Test to be useful? Might it have a valuable role to play even if, as you correctly assert, it can be demonstrated to be a machine and not a human? Naturally these are questions that will divide opinion, perhaps for decades. But all perspectives and observations are welcome and I thank you for sharing yours.
Hi Ph Pn. I hope all is well with you today. Certainly a valid point and a valid perspective. In fairness to Open AI I do not believe that they designed GPT-3 or ChatGPT to pass the Turing Test, (a measure of a machine's ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human). The test is limited and has its criticisms, and the abilities of AI have advanced _significantly_ since its inception.
In regards to your specific comment, it's important to acknowledge that while GPT-3/ChatGPT has been trained on a large dataset, no AI is perfect. Nor are humans. Both can make mistakes, get stuck in language patterns, and generate nonsensical responses. However, this doesn't mean that AI technology is not useful or capable of performing tasks that can benefit people and society.
There are clearly limitations of AI, which you quite correctly point out, but AI can still be useful despite them. These limitations are a result of current AI technology, and advances in the field are being made all the time.
But it is great to share different perspectives on tech, and I sincerely welcome yours
ChatGPT itself describes that it's a fairly well-known debate how limited the Turing test is, based on various other necessary conditions. So no, it wasn't meant to pass. Just to be good at what it currently is allowed to know and do.
Agreed.
ChatGPT doesn't appear to have been designed to pass a Turing test; you can literally ask it if it is human and get an accurate response, and its responses don't appear to be filtered to limit ChatGPT's knowledge base to that of a single human.
I think there's an error in this video. Words (tokens) are represented by vectors, not matrices.
Hi Aneil, many thanks for your support of the channel and for your question. Greatly appreciated. A single word (token) such as 'swap' would be represented by a vector. This contains the semantic embedding as well as the positional embedding for a transformer like GPT-3 or ChatGPT.
But a _sequence_ of words (i.e. several words) would be represented by multiple token embeddings as a matrix. It follows that a _batch_ of input sequences (which is what is actually fed into the encoder and decoder) would be a 3D tensor.
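The distinction is easy to see in NumPy shapes (the sizes here are purely illustrative; real models use much larger dimensions):

```python
import numpy as np

d_model, seq_len, batch = 8, 5, 2  # illustrative sizes only

token    = np.zeros(d_model)                    # one word   -> vector,    shape (8,)
sequence = np.zeros((seq_len, d_model))         # sentence   -> matrix,    shape (5, 8)
batch_in = np.zeros((batch, seq_len, d_model))  # batch      -> 3D tensor, shape (2, 5, 8)

print(token.ndim, sequence.ndim, batch_in.ndim)  # 1 2 3
```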
So you are 100% correct that an _individual_ word (or as you correctly say token) would be a vector. But in this case how would you represent a sequence of words? A vector or a matrix?
I appreciate that we may be at cross purposes here with the definition of a word - singular - and words - plural.
Would love to discuss further - please let me know how you feel about this response.
I cover the semantic and positional vector encodings in these two videos
- th-cam.com/video/6XLJ7TZXSPg/w-d-xo.html &
- th-cam.com/video/DINUVMojNwU/w-d-xo.html
have you had a chance to look at either of these?
Best,
Lucidate.
Nice video… If I remember correctly, in the original paper at the first layer Q, K and V are identical: simply the positionally encoded full-length context input. Hence, in the first layer Q.K(transpose) is just auto-correlation of the input. Beyond layer 1, Q.K(transpose) is a cross-correlation of the modified 2D softmax of the previous layer's Q and K
Thanks Sean, I appreciate the support of the channel and your observation. I agree with the first part of your comment: in the Transformer architecture, in the first layer of the Multi-Head Attention (MHA) mechanism, the queries (Q), keys (K), and values (V) are all the same and are equal to the positionally encoded input. Hence, the dot product of Q and K^T in the first layer does correspond to an auto-correlation of the input. However, I'm not sure I fully understand your assertion about what happens beyond layer 1. Let me give you my take to see if we agree or not. Beyond the first layer, the Q, K, and V matrices are different; they are not just the positionally encoded input but rather the result of multiple linear transformations of the input and other intermediate representations. Hence, the dot product of Q and K^T in subsequent layers represents a cross-correlation between the two modified representations, capturing the relationships between different elements of the input sequence.
In summary, I agree that your initial statement is correct in that the first layer of the MHA mechanism performs auto-correlation, but the Q, K, and V matrices beyond the first layer are not just the positionally encoded input but the result of linear transformations.
Do you agree or have I misunderstood your statement?
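To make the auto- versus cross-correlation point concrete, here is a small NumPy sketch under the simplifying assumption that no projections are applied (so Q = K = X), contrasted with learned projections W_q and W_k (random matrices standing in for learned weights):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 6))  # 4 tokens, 6-d positionally encoded input

# With no projections, Q = K = X, so the score matrix X @ X.T is a
# symmetric auto-correlation of the input.
scores_auto = X @ X.T
print(np.allclose(scores_auto, scores_auto.T))   # True: symmetric

# With distinct projections W_q != W_k, Q and K diverge, and the score
# matrix is a cross-correlation - generally not symmetric.
W_q = rng.normal(size=(6, 6))
W_k = rng.normal(size=(6, 6))
scores_cross = (X @ W_q) @ (X @ W_k).T
print(np.allclose(scores_cross, scores_cross.T))  # False with these weights
```

(Note: in the paper itself, Q, K and V are projected even in layer 1; the unprojected case above is only meant to illustrate the symmetry argument.)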
@@lucidateAI Think we are saying the same thing. Beyond layer 1 QKV are all just scaled transformed 2d softmax cross correlations of the previous layer and so they diverge from each other and from their original form over each layer.
I think it would be instructive to show the mathematical equivalence of how iterative scaled transformed softmax cross-correlation eventually leads to attention, as it's not intuitive and I am not sure that any researchers have actually done that. In most papers you have a general statement about what attention is and then this particular implementation. No one that I can think of really addresses the derivation and mathematical equivalence. But then again, to be fair, I would probably skip that part of the paper anyway.
Yes I think we are in agreement. I must confess I struggle with how deep to go at times with some concepts, too shallow and one isn't adding enough value, too deep and many viewers will be alienated. I also have finite knowledge and want to produce videos on topics where I feel sure-footed.
You bring up a valid point, and I can see why it would be helpful to see the mathematical equivalence between iterative scaled transformed softmax cross correlation and attention.
Unfortunately, I too am not aware of any resources that fully explain the mathematical equivalence between these two concepts. However, I do understand that some researchers have approached the topic from different angles and perspectives, and that it can be difficult to find a clear and concise explanation.
If you know of any, please send them my way. For many people, the fact that Attention/Self-Attention is an improvement on all prior best practices for sequence-to-sequence modelling (RNNs, LSTMs, Elman networks, Jordan networks, GRUs and some implementations of CNNs) is justification enough.
But Sean D. you are a rockstar, I really appreciate the support of the channel and your contribution through the comments. Many thanks indeed.
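The "beyond layer 1" point in the exchange above can also be sketched: stack two attention steps with per-layer projection matrices, and Q, K and V diverge from the input at every layer. The random matrices below are hypothetical stand-ins for learned weights (a toy sketch, not a trained model):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
X = rng.standard_normal((4, d))            # positionally encoded input, 4 tokens
H = X
for layer in range(2):
    # Hypothetical per-layer projection weights (random stand-ins for learned ones).
    Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
    Q, K, V = H @ Wq, H @ Wk, H @ Wv       # beyond layer 1, Q, K and V all differ
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    H = weights @ V                         # output becomes input to the next layer
```

After the loop, `H` has the same shape as `X` but is a transformed representation: each layer's cross-correlation is taken between projections of the previous layer's output, not the raw input.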
Well, that's a very interesting observation! The development of transformers and attention mechanisms has indeed revolutionized the way we approach natural language processing, allowing us to identify and select specific relevant areas of a sentence structure with much greater efficiency and accuracy.
Prior to the invention of these powerful tools, auto-correlation and cross-correlation in sequence modelling were much more cumbersome and time-consuming processes. But with the transformer architecture, we are able to leverage the power of attention mechanisms to focus on the most important and relevant parts of the input sequence, allowing us to more efficiently extract key insights and information.
With my PhD in AI hat on (not my Sherlock Deerstalker...) I am always fascinated by the way that new technologies and approaches can transform our understanding of the world around us. The development of the transformer architecture and attention mechanisms is just one example of this, and it is exciting to see how these tools are being applied to a wide range of fields and disciplines.
At the end of the day, the power of these new tools lies not just in their ability to process and analyze data, but in the insights and understanding that they can help to generate. By continuing to push the boundaries of what's possible in natural language processing, we can continue to unlock new discoveries and insights that will help us to better understand the world and our place in it.
I see Manim. I watched and liked it.
Thanks. Manim is a very useful piece of software for producing mathematical animations.
content is great, but the voice output is much too soft. Please put some more gain on the sound of the videos.
Sorry for that
I asked ChatGPT -> "Alice is more experienced risk manager than Barbara, even though she is ten years older. who is younger? Alice or Barbara?"
ChatGPT replies-> "Barbara is younger than Alice. The fact that Alice is more experienced in risk management than Barbara despite being ten years older suggests that Alice started working in risk management earlier than Barbara did."
Clearly ChatGPT cannot understand what "she" refers to.
Thanks for sharing that, Sinan. The example I chose is indeed nuanced. In defence of ChatGPT (or Bard, BERT or any other NLP transformer), it was not chosen as an example that every transformer would _always_ get correct, but rather to show how challenging these tasks are, and the need for mechanisms like attention.
As these networks become more powerful I would expect the error rates to drop, but I would not expect them to ever drop to zero. These are the type of questions that are set for English exams, and it is rare that every student will pass with a 100% score.
But I am delighted that you were inspired to try this out, and even happier that you chose to share the outcome of your research in the comments section of the channel. Greatly appreciated. The more examples you can share like this the better. - With thanks and appreciation - Lucidate.
Figuring out which pronoun applies to which person (Alice or Barbara) is not effortless or automatic for me.
Lift Pizzas, thanks for leaving a comment! You're absolutely right, identifying the correct antecedent for a pronoun in natural language can be a real puzzle sometimes, and it's definitely an area of active research in the field.
That's where advanced techniques like attention mechanisms come in - they're like super-smart puzzle solvers that help make sense of language in all kinds of interesting and complex ways. It's a really exciting time to be working in this field, and I'm so glad to be a part of it.
In the first sentence, the word "she" refers to "Alice," since Alice is the subject of the sentence and is being compared to Barbara.
In the second sentence, the word "she" refers to "Barbara," since Barbara is the subject being compared to Alice.
The attention mechanism in natural language processing models would play an important role in identifying the correct antecedent for the word "she" in each sentence, by focusing on the relevant contextual cues and relationships between words in the sentence.
Thanks again for sharing your thoughts, and keep exploring the wonderful world of natural language processing! There's always more to learn, and I can't wait to see what the future holds.
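The antecedent-resolution idea discussed above can be illustrated with a tiny attention calculation. The vectors below are hand-picked toy stand-ins for contextual embeddings (a real model would learn them); the point is only that the pronoun's query attends more strongly to one candidate's key:

```python
import numpy as np

# Toy illustration only: hand-picked 2-d vectors standing in for the
# contextual embeddings of the candidate antecedents and the pronoun "she".
tokens = ["Alice", "Barbara", "she"]
E = np.array([
    [1.0, 0.2],   # "Alice" (subject of the comparison in this reading)
    [0.2, 1.0],   # "Barbara"
    [0.9, 0.3],   # "she" - contextually closer to "Alice" here by construction
])
q = E[2]                                    # query vector for "she"
scores = E[:2] @ q / np.sqrt(E.shape[1])    # keys: the two candidate antecedents
w = np.exp(scores - scores.max())
w /= w.sum()                                # softmax over the candidates
winner = tokens[int(np.argmax(w))]          # candidate receiving the most attention
```

With these made-up embeddings the attention weight on "Alice" dominates, which is the mechanism by which a trained model would resolve the pronoun; with different context, the weights would shift to "Barbara".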
Wow
Thanks. Glad you liked it.
@@lucidateAI It is great, I will watch more. Keep making them.
And have the videos met your expectations? I hope so. Lucidate.
@@lucidateAI of course, they met my expectation 😉
They reported how many BILLIONS of dollars of profit last year??? Some people....
I want to like this video but there are far, far too many animations and transitions. The bounce at 1:15 being applied to the zoom out as well as the new recurrent stage moving out is a prime example of this, along with the multiple moving highlights applied to each word around 3:00. At 6:50 the screen is a complete mess and there's no way you could extract useful information from it. Sometimes less is more!
Hi PKMartin, thanks for the feedback. It is tough to get the balance right on creating engaging graphics and going over the top with distracting animations, so I sincerely appreciate the feedback. Simply stated comments like this help me get the balance 'less wrong' over time.
I've just released the sixth video in this series with less bouncing th-cam.com/video/ZvrJaqaK65Y/w-d-xo.html there are a couple of Easter eggs in it reflecting some of the constructive feedback that I've received - including this comment.
I'm grateful for your support of the channel and your honest and sincere feedback - Lucidate.
Thanks @repliesgpt. Appreciated. Lucidate.
Great video, but the bouncing animations are simply annoying :-)
Glad you found it insightful
Somewhat random and minor, but you really need better intro SFX for that block animation; compared to the animation and the rest of your videos' quality, it's subpar.
Thanks for the constructive feedback. Appreciated.
I don't fear chatbots, I fear the programmers who will make use of them. The Chinese will program for pro-communism; governments will program for their interests. Is it possible to change a program's bias through conversation?
Ah, well, that's a rather intriguing point, isn't it? You see, chatbots are just one small part of the wider field of artificial intelligence, which has the potential to both revolutionize and challenge our society in many ways.
Now, it's certainly true that the way in which chatbots are programmed can have a significant impact on their biases and behavior. As you rightly point out, a programmer with a particular agenda or bias could potentially create a chatbot that reflects their own values and interests. And in a world where governments and corporations have significant power and influence, this is certainly something to be aware of.
However, I think it's important to remember that chatbots are ultimately tools that can be used for both good and bad. While it's true that some programmers may seek to use chatbots to promote their own interests, there are also many who are working to create chatbots that can be used for positive purposes, such as providing assistance and support to people who are struggling with mental health issues or other challenges.
As for the question of whether a program's bias can be changed through conversation - well, that's a rather complex issue. While it's certainly possible to train chatbots to respond in certain ways to particular inputs, changing their underlying biases and values is a much more difficult proposition. It would require a deep understanding of the chatbot's programming and a willingness to make significant changes to its core algorithms and design.
In short, while chatbots can certainly be influenced by the biases and interests of their programmers, they are ultimately just tools that can be used for a wide range of purposes. And while it may be difficult to change a chatbot's underlying biases, it's important to be aware of their potential impact and to work towards creating chatbots that are designed to serve the common good.
@@lucidateAI Thank you for that informative response. Let me ask you a moral question: is it moral to communicate with a machine as you would a human? After all, it's the programmer or programmers you would actually be communicating with, not the machine.
@sedevacantist You are welcome. I'm neither a philosopher, theologian or a student of morality; I'm an engineer with a PhD in AI who has spent their career in Investment Banking and Technology so my answer is not a qualified one from the perspective of morals. (There are those that might argue that I have no morals having worked for Investment Banks - including Lehman Brothers...).
But you raise a fascinating question. The ethics of communicating with machines in a way that is indistinguishable from human communication is certainly an intriguing topic. However, to answer this question, we must first ask ourselves what we mean by "moral."
As an engineer / computer scientist (and former banker), I would argue that the morality of communicating with machines in a human-like way is not a matter of personal opinion or subjective preference, but rather a matter of objective analysis and evidence-based reasoning. In other words, we must look at the available data and evidence to determine whether such communication is ethical or not.
From my perspective, there are several key factors to consider when answering this question. First and foremost, we must consider the potential consequences of such communication. Will treating machines as if they were sentient beings lead to misunderstandings, miscommunications, or other unintended consequences? Will it lead to the blurring of lines between human and machine, and if so, what impact might that have on our society and culture?
Secondly, we must consider the capabilities of the machine itself. Can the machine truly understand and respond in a way that is equivalent to human conversation, or are we simply projecting our own desires and assumptions onto it? If the latter is true, then we must ask ourselves whether it is ethical to treat the machine as if it were a sentient being.
Having looked deeply inside the architecture of neural networks and transformers I see calculus and tensor algebra, not sentience or consciousness. But I appreciate that others do not see it that way at all.
Ultimately, I believe that the question of whether it is moral to communicate with machines in a human-like way is a complex and multifaceted one that requires careful consideration of a range of factors. As an engineer/scientist/banker, my approach would be to gather as much data and evidence as possible, and to use this information to make an informed and reasoned decision that aligns with your own moral compass.
@@lucidateAI I don’t believe that you can say that a machine is communicating. I would say that a machine's response to a communication is 100% mechanical. Further, any and all responses from the machine are a reflection of the will of the programmer. To use a machine as a source of knowledge, where that source is accessed through human language, doesn’t change the source of that knowledge, which didn’t come from a machine’s experience.
So the question of morality is further refined by the question, which programmers are we communicating with and is that made clear to the communicator or are they being deceived?
Once again I must state that I am not an authority on ethics, morality or philosophy. That said, (like most folks) I have an opinion on your questions and observations that I'm happy to share, but I wish to stress that it is not an authoritative opinion. Please take it as such.
There are some excellent podcasts and TH-cam channels dedicated to these topics, and I'd encourage you to engage with them, as they come from the true experts in these fields, not me.
(That said please don't stop posing your very well thought out points to this channel!! I like our exchanges and you have forced me to think deeply about matters that I otherwise would not, which is always appreciated).
1. AI Alignment Podcast: This channel features discussions about the ethics and governance of artificial intelligence, with a focus on AI alignment and the long-term impact of AI on society.
2. The Future of Life Institute: This channel focuses on the risks and benefits of artificial intelligence, and features interviews with experts in the field discussing topics such as safety, ethics, and governance.
3. Six Big Ethical Questions: This TED talk explores the intersection of artificial intelligence and society, and features interviews with experts in AI ethics, as well as discussions on topics such as AI safety and the future of work.
4. AI & Society: This TED talk focuses on the social and ethical implications of artificial intelligence, including topics such as bias, privacy, and transparency.
5. Ethics & the future of AI: The Institute for Ethics in AI brings together world-leading philosophers and other experts in the humanities with the technical developers and users of AI in academia, business and government. The ethics and governance of AI is an exceptionally vibrant area of research at Oxford and the Institute is an opportunity to take a bold leap forward from this platform.
AI Alignment Podcast: th-cam.com/video/XBMnOsv9_pk/w-d-xo.html
The Future of Life Institute: th-cam.com/video/ARtJ3ybvuC0/w-d-xo.html
6 Big Ethical questions: th-cam.com/video/UGHzKaAOOcA/w-d-xo.html
AI & Society: th-cam.com/video/2Bl7fwljfS4/w-d-xo.html
Ethics & the future of AI: th-cam.com/video/HYuk-qMkY6Q/w-d-xo.html
I hope you find these resources informative and useful!
_Eventually_ my answer to your (excellent) question......
I appreciate your skepticism about the idea of machines communicating, and I understand your concerns about the role of programmers in this process. However, I would argue that the term 'communication' can be applied in a broader sense than just the exchange of ideas between conscious beings - which I believe you do too, with your phrase "100% mechanical". While machines may not possess consciousness or intentionality in the same way that humans do, they are still capable of processing and generating responses to human language, and this can be a valuable form of communication in its own right. Alexa does this. And at a more basic level you might argue that a smoke detector is (mechanically) communicating with us - albeit not in response to language, but in response to particulates in the air.
That being said, I agree with you that it's important for programmers to be transparent about the limitations and biases of the machine learning algorithms they develop. The data that machines are trained on can introduce certain biases, and it's important to be aware of these biases when interpreting the responses generated by the machines.
In the end, I think the key is to approach this technology with a healthy skepticism, and to continue to question and refine our understanding of the role that machines can play in human communication.
The bouncing of every image is f*cking irritating. Had to quit very early on.
Hi TonyTiger6521. I have to wear this one, I got too enthusiastic with the graphics and for you (and others) it stopped being informative and became distracting. My bad and I've learned from it. Really appreciate you watching the video for as long as you could, and appreciate the constructive feedback even more. In the latest video on this topic th-cam.com/video/ZvrJaqaK65Y/w-d-xo.html I've taken your comment to heart and put in a reference midway through the vid to how the bouncing graphics can be counter-productive.
If you are prepared to give the channel another try I'd appreciate it, but clearly I understand if this isn't for you.
Sincerely appreciate the honest and constructive feedback. - Lucidate.
You guys are smart as fuck
@markettrader311. How nice of you to say so! I hope you find the rest of the content on the channel just as useful. Please let us know what content is missing.