#notificationsquad where art thou ?
ayyyeeeee
nooo, i'm too late
yooo
we are everywhere and we are nowhere
I freaking love your energy man, it's like you just realized you're conscious and you are determined to figure out how you're able to think.
17:03: the 'r' argument in open is not for "recursive", it's for "read" mode.
I can see that there's a lot of effort put into this video. Siraj explained RNNs in such a simple way. I wish I could like this video a thousand times.
Thank you so much! You cut right through the mystifying tech jargon for me - it's really hard to teach yourself this stuff without YouTubers like you!
input times weight, add a bias, activate
*beat drops*
then backpropagate
My neural net's outperformin' my wet-ware
But if I train long enough I'm goin' to get there
Till then, my RNN is holding the mic... and to the best of its abilities it's spitting probabilities
It's lyrics gettin' tighter while I sleep, and like it's learning, it's rhymes are deep
Deep, and they're only getting deeper
Don't worry 'bout the fan noise cus I'm a deep sleeper : )
Very much appreciate that the full program is coded in the video!
I liked the live coding in this tutorial. One thing you could consider to save time is to have all the comments pre-written and just add the code in between during the tutorial. That way you can refer to your comments to explain the structure of the code you're about to write.
I don't mind if he types while looking at a reference, but pasting a whole function all of a sudden is less helpful, in my opinion.
thanks for the feedback Hammad
This is the best RNN explanation of any video out there
over a year old and this still applies; just shows you've been smart enough to keep up
This is one of your best videos. Please consider completing it with another video using LSTM. Thank you. Also will be very interesting to consider a model with two recurrent hidden layers. Thank you again.
very nice lesson, thanks a lot.. it helped me understand recurrent neural networks for the final project of my computer engineering degree
If I am not mistaken, at the 38:50 mark gradient clipping is used to avoid exploding gradients, not vanishing gradients. To deal with vanishing gradients we can use GRUs or, more commonly, LSTMs, as Siraj mentioned.
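For anyone who wants to see it concretely, here's a minimal sketch of the clipping step (the gradient names and the ±5 range are assumptions based on min-char-rnn-style code, not necessarily the video's exact values):

```python
import numpy as np

# stand-ins for the gradients (dWxh, dWhh, dWhy, ...) computed by backprop
grads = [np.random.randn(100, 80) * 10 for _ in range(3)]

for dparam in grads:
    # bound each gradient elementwise to [-5, 5]: this tames *exploding*
    # gradients, but values already near zero stay near zero, so it does
    # nothing for the *vanishing* gradient problem
    np.clip(dparam, -5, 5, out=dparam)
```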
So, you said you didn't care that much about using DL on financial data. Then you said you were going to talk more about it because WE cared about it. You put US first! You are awesome, dude.
Thanks Siraj. Learnt a lot from this video. Got a new, better way to look at RNNs.
lol, I'm reading Metamorphosis right now, and before you even said you were going to use it, I was thinking about how Franz Kafka would be a good style to mimic with this, and BAM, seconds later that's exactly who you're using
so awesome
It was great, thanks a lot. It comes from your soul and all your cells. I could feel it.
You really are doing amazing work Siraj. Love you.
thanks Kaustubh love u
love u both
i never thought i could be smart and cool before. thanks a lot siraj
Loved the video! Two remarks though. I had to rewatch some parts once you go over the copy-pasted code, as it can get hard to see which part you're talking about, and it gets distracting when you start reading the wrong part. To still speed things up, I'd suggest making the code appear line by line or block by block, like in a presentation, so the focus stays on which part each explanation belongs to.
Secondly, having a prebaked pie ready in the oven to show the end result is always cool to see. We get a glimpse of where it's going at the end of the video, but it would be fun to see it in a more completed state.
Anyway, I really enjoy the way you explain it :D great job!
The way you never code the important parts makes things much harder; there is no step-by-step explanation. There is no difference between reading through that Python notebook and watching your videos. The only use I see for these videos is discovering a new technology, so I can go understand it somewhere else...
Same here... :p
+
this
I was writing a paper on music generation using LSTMs and I found this.
NICE!
perfect
I spent 10 seconds logging into YouTube just to click the like button on this video.
very intuitive explanation, thanks and good job!
Just tried a version of this using a very slightly deeper network and taking the hidden representation out at a lower dimension than the input (in the hope of saving resources). Instead of a softmax output, I'm using a standard one (real-valued numbers). It's a variation of an autoencoder with feedback (the hidden layer is the bottleneck and where the feedback comes from, which is added as a separate partition of the input). I used a sound spectrograph image for training; each 'letter' is a line of the spectrograph... It's lo-fi (due to computation limits), but it generates a line of a spectrograph as output on each pass to build a new 'semi-random' one. The results are quite amusing... very much a 'poor man's' version of WaveNet.
Also yes, your node mechanics mnemonic is catchy.
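For anyone wanting to try something similar, here's a rough numpy sketch of that wiring as I understand the description (the sizes, and treating the hidden state itself as the feedback partition, are my guesses, not the commenter's exact setup):

```python
import numpy as np

n_freq, n_bottleneck = 256, 64   # assumed: 256 frequency bins, 64-dim bottleneck
Wxh = np.random.randn(n_bottleneck, n_freq + n_bottleneck) * 0.01  # input + feedback partition
Why = np.random.randn(n_freq, n_bottleneck) * 0.01                 # real-valued output, no softmax

h = np.zeros((n_bottleneck, 1))
line = np.random.randn(n_freq, 1)   # one 'letter' = one line of the spectrograph
generated = []
for _ in range(100):
    x = np.vstack([line, h])        # feedback enters as a separate input partition
    h = np.tanh(Wxh @ x)            # bottleneck hidden layer (the autoencoder squeeze)
    line = Why @ h                  # linear output: the next spectrograph line
    generated.append(line)
```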
You are awesome. It's not easy to get this topic through.
You are a professor by nature... cool video... awesome... keep going...
This is a very clear explanation... recommended for intermediate-level learners. It really helps a lot.
love the video! easy to follow if you understand basic NN already
i LOVE your tutorials, and the sentence at the beginning of all your videos makes me love you even more, haha: "Hello world, it's Siraj!" you are awesome man
I'd like to know more about you. How did you begin in this career, and how long did it take you to reach this level? I'm curious about your time management, how many hours you read and study, and how you stay motivated all the time. I guess this would be a good video idea!
Thanks a lot Siraj.....it is so helpful....
thank you for the Recurrent Neural Networks video
This thing was interesting as hell not gonna lie
awesome
Great job Siraj!
love the craziness :)
rap with me:
(input * weight + bias ) Activate
Why go to college when you can listen to Siraj?
To be tortured
@@SimonWoodburyForget You do need a college degree to get a job. A company would rather hire a person who isn't skilled but has a college degree than someone who is skilled but doesn't have one; and if you want a reputation comparable to a college's, your only chance is winning a Nobel. tl;dr: go to college, get a degree, and get a job, or else it's going to be way harder to get a job no matter your skill.
@@SimonWoodburyForget First off, I totally agree with you, but the world we live in isn't a perfect utopia. Most (and I mean most, not all) jobs require a degree; people with a degree are favored over ones who actually do the hard work and learn from scratch, even though these self-taught people are often more skilled and more capable. But sadly that isn't the world we live in. Most employers look for a degree. This applies to the majority of jobs outside the IT industry too, but having a degree gives you a higher chance of getting a job. And not all college students are dumb; the unskilled ones are just a minority, and most are skilled and actually very smart.
Thanks Siraj, you help me a lot
Great job Sir!
Please make a video on how to train models in the cloud!
Vibhas Singh he has a video on that
I don't think he has one on how to train in the cloud, more so on how to deploy a machine learning project with a pretrained model. Please correct me if I am wrong.
hmmmmmmmmm good idea
you forgot to add... because my laptop takes longer!
actually I forgot, I do have one: th-cam.com/video/Bgwujw-yom8/w-d-xo.html
Hi Siraj, can you please make a detailed coding video about different gradient descent optimizers?
Like how to code momentum, or Adam, etc. Please!
I want that too
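In the meantime, here's a minimal numpy sketch of the two update rules being asked about (a generic illustration of momentum and Adam, not code from the video):

```python
import numpy as np

def momentum_step(w, dw, v, lr=1e-2, beta=0.9):
    # v is a running velocity: an exponential moving average of past gradients
    v = beta * v - lr * dw
    return w + v, v

def adam_step(w, dw, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * dw            # first moment: mean of recent gradients
    v = b2 * v + (1 - b2) * dw ** 2       # second moment: mean of recent squared gradients
    m_hat = m / (1 - b1 ** t)             # bias correction for the zero initialization
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# usage: keep m, v (and the step counter t) alongside each parameter
w, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
w, m, v = adam_step(w, np.array([0.1, -0.2, 0.3]), m, v, t=1)
```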
Hello Siraj, can you teach about neuroevolution, especially NEAT (NeuroEvolution of Augmenting Topologies)? Looking forward to seeing how you'd implement it in Python :)
coming soon
really nice work sir! you help a lot of people, really intuitive videos with a bit of crazy, thank you so much for the hard work!
Oh yeah, I was thinking about it too for some time. I imagine ES-HyperNEAT + recurrent neural networks with memory cells as the ultimate A.I. tool.
17:00
I think you meant 'read-only'
Great video! Definitely learning as much as I can, because I cannot wait until I get admitted into a Master's program :D
this is amazing, im getting so excited
Woah! The audio is A-GAME!
thanks Jake
Hi Siraj,
Could you please give the reference text or source where you are getting the formulas and the different differentiations? I am getting a different answer for dLi/df_k than your answer (p_k - 1). Also, you only covered the chain rule here, but there are definitely some more advanced rules used (product rule). Also, I am not sure how one would take the derivative of a summation of e^f_j where one of the j = k.
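For reference, the gradient in question follows from the chain rule alone. With the softmax p_k = e^{f_k} / Σ_j e^{f_j} and cross-entropy loss L_i = -log p_y for the correct class y (this looks like the CS231n-style notation the video is using, so take the exact symbols as an assumption):

```latex
L_i = -\log p_y = -f_y + \log \sum_j e^{f_j},
\qquad
\frac{\partial L_i}{\partial f_k}
  = -\mathbb{1}(k=y) + \frac{e^{f_k}}{\sum_j e^{f_j}}
  = p_k - \mathbb{1}(k=y).
```

The derivative of Σ_j e^{f_j} with respect to f_k is just e^{f_k}, since only the j = k term depends on f_k; no product rule is needed, only the chain rule through the log. The answer p_k - 1 applies only when k is the correct class.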
Great video!
In most use cases for RNNs, all training sequences are picked from a population of the same 'type', like "Shakespeare writings".
A step more advanced - What if you have generated a lot of _random_ sequences and calculated a quality value for each sequence (for stuff like game theory or physical modelling). Can that quality value be integrated in the cost function, thus taking into account the value of each sequence? Could you sketch out how to do this or give reference to relevant links? Thanks!
I managed to find some keywords that could guide me in the right direction: "supervised learning" and "Q-learning". One example was this one:
bit.ly/2tvMdZQ
Are there any systems for NN music generation, that you know of, that attempt to create phrases with a consistent feeling throughout the phrase, then switch to a new phrase, then eventually end the song?
If not then I have some work to do.
So right now there is only one hidden layer, which emits a value at t-1 that is used along with the input to generate values at time step t. What happens if there are multiple hidden layers? E.g., if the architecture is as follows:
input ----> h1 ---> h2 ---> output
How would the connections between the hidden layers work in an RNN of this type?
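A minimal numpy sketch of how the wiring usually works in a stacked (deep) RNN: each hidden layer keeps its own recurrent state, and layer 2 receives layer 1's current output plus its own state from the previous step (layer sizes here are arbitrary assumptions):

```python
import numpy as np

vocab, n1, n2 = 80, 100, 100
Wxh1  = np.random.randn(n1, vocab) * 0.01   # input  -> layer 1
Wh1h1 = np.random.randn(n1, n1) * 0.01      # layer 1 recurrence (from t-1)
Wh1h2 = np.random.randn(n2, n1) * 0.01      # layer 1 -> layer 2 (same time step)
Wh2h2 = np.random.randn(n2, n2) * 0.01      # layer 2 recurrence (from t-1)
Wh2y  = np.random.randn(vocab, n2) * 0.01   # layer 2 -> output

h1, h2 = np.zeros((n1, 1)), np.zeros((n2, 1))
x = np.zeros((vocab, 1)); x[3] = 1          # one-hot input at time t
h1 = np.tanh(Wxh1 @ x + Wh1h1 @ h1)
h2 = np.tanh(Wh1h2 @ h1 + Wh2h2 @ h2)
y  = Wh2y @ h2
```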
Kafka, crazy? Isn't that a bit much? Weird, OK, and DEEP maybe ;) Anyway, thanks for all your nice and very instructive videos :)
hmm yeah in my head crazy == genius, but i can see how that could be interpreted as negative. thanks for the feedback!
That's how I understood it. Genius sounds better definitely :) Thanks again for all the work you do
I wonder how well this would perform after some good training, compared with a simple Markov chain algorithm, in terms of generating words that make sense.
Can you please make a video on wind forecasting using hourly data, implemented with recurrent neural networks?
Hi Siraj!
Thanks for the great material. I am wondering, is it possible to use a recurrent neural network to make a classifier? I would like to classify the events of a device based on sensors like accelerometers and other signals.
I guess it should be similar to classifying physical activity like running or walking; however, in my case the events are not periodic. I have everything needed to collect the labeled samples, but any idea how large the dataset should be for the training part? Any idea would be much appreciated.
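Yes, that works: the usual shape is a "many-to-one" setup, where you run the RNN over a whole sensor window and classify from the final hidden state. A rough numpy sketch under assumed sizes (6 sensor channels, 4 event classes, not anything from the video):

```python
import numpy as np

n_sensors, n_hidden, n_events = 6, 50, 4
Wxh = np.random.randn(n_hidden, n_sensors) * 0.01
Whh = np.random.randn(n_hidden, n_hidden) * 0.01
Why = np.random.randn(n_events, n_hidden) * 0.01

window = np.random.randn(200, n_sensors)   # one window of accelerometer readings
h = np.zeros((n_hidden, 1))
for x in window:                           # consume the whole sequence...
    h = np.tanh(Wxh @ x.reshape(-1, 1) + Whh @ h)
scores = Why @ h                           # ...then classify from the final state
probs = np.exp(scores) / np.sum(np.exp(scores))
```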
Why is it that some RNN models I see online show the output from the previous timestep going into the hidden layer, whereas in this video you say the hidden layer from the previous timestep should be added to the hidden layer?
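Both conventions exist. This video uses an Elman-style RNN, where the hidden state is fed back; the diagrams feeding back the previous output are Jordan-style networks. In equations (the symbols are assumed to match the video's notation):

```latex
\text{Elman (this video):} \quad h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h) \\
\text{Jordan:} \quad h_t = \tanh(W_{xh} x_t + W_{yh} y_{t-1} + b_h)
```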
waiting for this. love u man!
Hi, I'm struggling to apply a NARX RNN model for a dissertation project. Could you recommend some useful literature or an R library for doing so? I'd really appreciate it!
funk-yeah! my RNN class is taught by a rap star!
Siraj....U ROCK :)
thanks Soumya :)
This is a great video. Is there a TensorFlow implementation of this application?
The whole part on the loss function is not very clear... can you explain what dhraw is and what all those operations do?
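Assuming the code follows the usual min-char-rnn structure, dhraw is the gradient pushed back through the tanh nonlinearity ("raw" meaning at the pre-activation values). A tiny self-contained illustration:

```python
import numpy as np

hs_t = np.tanh(np.random.randn(100, 1))   # hidden state at time t (the tanh output)
dh   = np.random.randn(100, 1)            # gradient arriving at hs_t from the output
                                          # layer plus the next time step

# d/dx tanh(x) = 1 - tanh(x)^2, so multiply elementwise to get the
# gradient at the pre-activation ("raw") values
dhraw = (1 - hs_t * hs_t) * dh
```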
Hi Raj, great video. I have a question about neural networks: what is the difference between a neural network, a convolutional neural network, and a recurrent neural network?
Siraj, how do you know the shapes required for the weights and biases, etc.?
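The rule of thumb: a matrix mapping a vector of size a to a vector of size b has shape (b, a), and each bias matches the layer it feeds into. Assuming the video's hyperparameters (hidden size 100, roughly 81 distinct characters):

```python
import numpy as np

hidden_size, vocab_size = 100, 81

Wxh = np.random.randn(hidden_size, vocab_size) * 0.01   # input  -> hidden
Whh = np.random.randn(hidden_size, hidden_size) * 0.01  # hidden -> hidden (the recurrence)
Why = np.random.randn(vocab_size, hidden_size) * 0.01   # hidden -> output
bh  = np.zeros((hidden_size, 1))                        # hidden bias
by  = np.zeros((vocab_size, 1))                         # output bias
# the 0.01 factor keeps the initial weights small so the tanh units
# start out in their near-linear regime instead of saturated
```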
Why do we need two different activation functions (sigmoid & tanh) in the input gate of an LSTM? And why do we need tanh in the output gate of an LSTM?
16:46 "one morning Gregor Samsa awoke from uneasy dreams he found himself transformed in his bed into a gigantic insect." You can't say blah blah :)
So for deep networks, which hidden layer do I feed the past hidden state into? All of them? Just one of them?
Can you explain why you have to map the input characters to a dictionary and then to a binary vector? You have, for example, a:55, r:47, c:22, which you map to a binary vector (80x1) -> a = 0, 0, 0 ... 0, 1, 0, 0...
Couldn't you just keep that dictionary of 80 characters and scale the integer representation to a float in 0->1, so that for example a:0.6875, r:0.5875, c:0.275? Then instead of an input vector of (80x1), your input is just one float (1x1) representing a unique character. I know this probably wouldn't work, but I don't understand why. The reason I ask is that I'm trying to port your code to a time-series waveform; I just have input data as floats from 0->1, and I don't know if I need to map each float to a binary vector representing each unique float value in the sequence. That doesn't seem like it would make sense... please help :)
I may have found part of the answer to my question: label encoding vs 1-hot encoding. hackernoon.com/what-is-one-hot-encoding-why-and-when-do-you-have-to-use-it-e3c6186d008f
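The short version: a one-hot vector says "this is character k" without implying any ordering, whereas a scaled float would tell the network that 'a' is numerically close to 'r', a relationship that doesn't actually exist. A minimal sketch (the stand-in text and helper name are mine, not the video's):

```python
import numpy as np

text = "one morning gregor samsa awoke"   # stand-in for the kafka.txt contents
vocab = sorted(set(text))
char_to_ix = {ch: i for i, ch in enumerate(vocab)}

def one_hot(ch):
    v = np.zeros((len(vocab), 1))
    v[char_to_ix[ch]] = 1                 # only this character's slot fires
    return v

# for a genuinely continuous signal like a waveform sample, a single float
# input IS appropriate: there the ordering and distances between values are
# real, not artifacts of the encoding
print(one_hot('g').ravel())
```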
Hey Siraj, could you do a video about the "hashing trick" I keep hearing about? It is said it speeds things up significantly, but I am still a bit confused about it.
Can we use it for finding the labels which we gave during training?
Thanks a lot. What does 'iteration' mean exactly? Does an iteration happen during learning or during generating?
How do you make pictures of neural network?
Please attach the source code as a plain .py. I can't install anything other than Python due to restrictions.
Keep it up, good work!
Can you upload a tutorial about action detection please...
Hey Siraj, I am a huge fan of your videos; they have helped me a lot. Do you know of any material on applying machine learning models to Intrusion Detection Systems (IDS)?
I'm 4 years late to the party, but maybe try a GAN?
Siraj is an example of what you will never find in a school, because he gets to the point, and quickly. Most CS subjects can be learned in weeks. The Nand to Tetris course is a great demonstration of how much time students waste. CS is easy compared to any math major. NNs just use the chain rule of calculus, and PGMs just use the chain rule of probability. Go figure. It's elementary math. SVD is numerically more stable than PCA, but autoencoders just outdate the whole math department. A little number crunching generalizes better than any 17th-century math obsession. However, CS departments are short on graphics and engineering when it comes to numerical methods like FEM. They need to cover way more, much quicker. I still think people should stick to a math degree even if they want to do CS; CS alone is too superficial.
Can we predict the next number of a given integer sequence using an RNN?
Is it possible to vectorize the forward/back propagation for RNNs, like classification ANNs?
What if we are dealing with a language that has no alphabet, such as Mandarin Chinese? How do we implement an RNN in that case?
Each kanji has some meaning associated with it. The total number of kanji is a few thousand. We can use embeddings to represent them.
Can anybody please tell me what xs[t][inputs[t]] means? Whose value are we changing?
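Assuming this is the min-char-rnn-style code from the video: xs[t] is the one-hot input vector for time step t, and inputs[t] is the integer id of that step's character, so the line flips exactly one slot to 1:

```python
import numpy as np

vocab_size = 81
inputs = [5, 12, 0]                     # integer ids of the characters in this chunk

xs = {}
for t in range(len(inputs)):
    xs[t] = np.zeros((vocab_size, 1))   # a blank one-hot vector for step t
    xs[t][inputs[t]] = 1                # turn on the row for this character's id
```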
The final code gave me an error saying "sample is not defined". Please help.
There is an error at 19:19: it should be ix_to_char, not char_to_char.
thanks
Great video. Except Kafka was not weird!
Why do we have three sets of weights (input-to-hidden, hidden-to-hidden, hidden-to-output) instead of just one?
Can you help with ConvolutionLSTM and DeconvolutionLSTM?
siraj do a video on the top 10 laptops suitable for deep learning!
ehfo0777 Use the cloud
"I train my models on the cloud now, 'cause my laptop takes longer!!" :P
good correlation with "top 10 most expensive laptops"
i train my models in the cloud now, cause my laptop takes longer ;)
One more thing: a gaming laptop with an Nvidia graphics card is suitable for this, but the prices for these laptops are high due to people using them for bitcoin mining.
Can I use this to generate recommended URLs?
Thank you for this series! This is awesome! When running the model for 500,000+ iterations on the Kafka text, it doesn't seem to get below a loss of about 40. What would you suggest to optimize this particular model most efficiently?
Greetings from the Netherlands
In NNs you get saturation after a number of epochs, hence too many aren't that good, bro.
why did you use 0.01 as the multiplier for np.random.rand(.....)?
Does Python 3 give errors with this code?
Because I am getting an error.
really love your videos sir!
just a quick question: why are tanh and softmax widely used in RNNs instead of the sigmoid function?
Because the vanishing gradient problem is worse with sigmoid than with tanh. And in an RNN we use the previous hidden layer along with the inputs, so it's more prone to vanishing gradients; that's why we avoid sigmoid.
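You can see the difference numerically: the sigmoid's derivative never exceeds 0.25, while tanh's reaches 1.0, so repeated multiplication through time shrinks gradients much faster with sigmoid. A quick check:

```python
import numpy as np

x = np.linspace(-4, 4, 101)
sig = 1 / (1 + np.exp(-x))

d_sig  = sig * (1 - sig)       # derivative of sigmoid
d_tanh = 1 - np.tanh(x) ** 2   # derivative of tanh

print(d_sig.max())    # 0.25 -> gradients shrink by at least 4x per step
print(d_tanh.max())   # 1.0  -> gradients can pass through undiminished near 0
```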
Any specific reason for choosing character-level generation over word-level?
I guess for chars you get around 100 distinct symbols in the English language, like the 81 output neurons in this case. For words, you'd need at least 3000 to make something with any sense?
Take me in as your apprentice xD
Where do I get this Kafka.txt?
How often do you use MATLAB?
This video is RAD
Hey Siraj, what laptop do you use? Specs?
Is there any use in studying RNNs? I mean, I'm majoring in electrical engineering, but I really love learning all kinds of neural networks. Is that a waste of time?
is learning anything new a waste of time?
ShawarmaLifeLiving nah. I love learning heavy stuff: quantum mechanics, quantum information theory, string theory, nuclear fusion-fission, EM fields, etc. So can this be applied to my field?
Xs[t][inputs[t]], what does this do?
I think there is a mistake in the code: ps[t] = np.exp(ys[t]) / np.sum(np.exp(ys[t])).
The divisor should be a sum over all t's; in this case np.exp(ys[t]) = np.sum(np.exp(ys[t])), giving a probability of 1.
no, the "np.exp(ys[t])" apply exponent for each items in that array. NumPy support that kind of operations. so it also returns an array that each of them is divided by the summation.
Yeah, this neural net is deep...
... its shit gets fitter while I sleep
My computer fan's too loud though...
... guess I'm uppin' this shit to the cloud, yo.