Here are some updates to the code to support TensorFlow version 1.0:
def train_neural_network(x):
prediction = neural_network_model(x)
# OLD VERSION:
#cost = tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits(prediction,y) )
# NEW:
cost = tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y) )
optimizer = tf.train.AdamOptimizer().minimize(cost)
hm_epochs = 10
with tf.Session() as sess:
# OLD:
#sess.run(tf.initialize_all_variables())
# NEW:
sess.run(tf.global_variables_initializer())
Hi, when I did the new sess.run I got the error message "invalid character in identifier"... any suggestions?
@ken m
What does the stack trace say?
I'd suggest posting your question on stackoverflow.com and sharing the question link here.
Ken M, don't copy the code directly from here; typing it out should solve the problem (pasted text can carry stray non-ASCII characters, which is what that error means).
The source website is down (and, from what I saw, not for the first time). If you could upload the files to pythonprogramming.net, that would be great.
It's throwing this error for me:
Epoch 9 completed out of 10 loss: 21351.9792503
Accuracy: 0.9508
*** Error in `python3': free(): invalid next size (normal): 0x0000000002b5ea10 ***
Aborted
I'm not sure where to look for a solution. Does anyone know what's going on here?
If sentdex has errors, it makes me feel less depressed when I get errors.
Thank you for all your hard work. Your tutorials are helping me help the local law enforcement I work for.
Thank you for all you do!
After struggling for days trying to understand how TensorFlow works, I came to your tutorial, and this was by far the best tutorial I've been through. Thanks a lot!
I have been trying to understand deep learning for the last month, and your video saved me in one day. Thanks for making videos like this.
The full code in TensorFlow v2:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# Load data
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# Normalize data
x_train = tf.keras.utils.normalize(x_train, axis=1)
x_test = tf.keras.utils.normalize(x_test, axis=1)
# Build model
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))
model.add(tf.keras.layers.Dense(units=128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(units=128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(units=10, activation=tf.nn.softmax))
# Compile model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Train model
model.fit(x_train, y_train, epochs=3)
# Evaluate model
val_loss, val_acc = model.evaluate(x_test, y_test)
print("val_loss : ", val_loss)
print("val_acc : ", val_acc)
I am a beginner who basically just has Python syntax skills mastered. Everything else is on a 'google it' basis. At first I was kind of mad that you had so many errors, but the fact that you left them in made me go back and rewatch a bunch of times, question my code, and google a bunch of stuff. Overall I got a better grasp; it wasn't just mindless copying of code. The more time I spend looking at the code, the more I understand.
Hi, I got an error when I ran the code: for _ in range(int(mnist.train.num_examples/batch_size)):
AttributeError: 'dict' object has no attribute 'train'
Dude ... this is a great way of teaching how to code. Greetings from Hamburg.
Thank you for being human and letting us take part in making errors! I really love this, because even the best coders make bracket, comma, and array index errors! So I loved watching this! Being a Python beginner, this really helped me.
Someone needs to tell the TensorFlow dev team how to name their functions -_-
wrr
Next thing we know, they'll be writing a paragraph as a variable name.
It's not so bad; it's better than acronyms. Besides, it's expressive, and most people use some kind of autocompletion.
This was so funny with the error, lol. I'm glad you didn't script this beforehand and wrote the code from scratch; the debugging was really great and beneficial for us. Thank you.
Great tutorial series. I am definitely a fan of TensorFlow now. I was pleasantly surprised by the depth of discussion of the code and the actual coding which makes it so much easier to follow along. Keep up the great work.
Awesome! Better than some of the paid-for online classes on TensorFlow!
Just wanted to send thanks for making these videos, they are much appreciated.
Thank you so much for this series. I've seen too many videos, papers, etc. that gloss over the internals as "passing around tensors" between the hidden layers, but actually seeing the weights/biases as simple arrays finally made it simple to grasp it as data being transformed **through** multidimensional arrays between each layer, as well as these weight/bias tensors being the persistent portion of the model. If this is indeed the case, I would love to see a few things:
1) How to pickle the model.
2) When it comes to other ML tasks such as regression, clustering, etc. do you just feed the output layer as input into a linear model? Can you feed a hidden layer the same way for multi-task learning?
3) How to inspect the weight matrices. I'm working with data where feature positions are stable, and building masks for most relevant regions will help reduce noise and data collection costs.
Thanks again for the videos, keep up the great work!
-Ben
Hey sentdex - Thanks for your amazing videos! Regarding the errors: Please leave them in, they are super helpful and actually really motivating knowing you also make them. ;-) Keep it up! :)
I didn't understand: we start with train_neural_network(x), and the first step is prediction = neural_network_model(x), but we didn't feed x; it's just a placeholder and has no data. How is it going to work?
I like the reaction you made after you said, "please don't give me an error" at 13:44 :D Thank you for the videos. That reaction was "epoch" :D
Is this tutorial still relevant from a coding point of view, as many things have changed in TensorFlow 2.0?
You'd probably wanna go with this series instead: pythonprogramming.net/introduction-deep-learning-python-tensorflow-keras/
TF 2.0 is eager by default now, so the "graph" and "Session" is no longer something you *have* to worry about. There are also lots of smaller changes that might be annoying to deal with from how old this series is.
@sentdex Thanks, I was getting a lot of errors and was not finding much on the internet. Great tutorials, keep up the good work!!!!
Thanks for this comment yash, saved me some valuable time and headaches
Very good tutorial. The whole playlist is well done. I didn't make the bracket and range mistake, but I had loads of other errors, like the missing args "logits" and "labels" in the softmax... Looking forward to more tutorials.
just found out this channel from two minute papers and ran through your tensor flow tut. Great content so far, and I'm looking forward to more. Subscribed!
One of the best tutorials! Good job!
Line 48 should be:
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y))
This may solve errors like:
ValueError: Only call `softmax_cross_entropy_with_logits` with named arguments (labels=..., logits=..., ...)
Line 54 can be updated to tf.global_variables_initializer().
Yuheng Shi excellent, thank you.
Thanks! Up you go.
Awesome. You are making so many excellent tutorials! Many, Many thanks
Happy to make them!
Perfectttttt!!!! Love your videos!!!!
I recommend everyone watch these!
Hi, thanks for sharing. The TensorFlow docs are very difficult to understand for a noob like me. Great tutorials. I was able to achieve 96.6% accuracy. (Y)
Your videos are amazing! Thank you for all your excellent tutorials!!!!
Thanks for sharing your precious knowledge. It's really helpful 👍👍👍
How do we use the model we have created?
How do we give it any input from 0-9 as an image and check whether it predicts correctly? How do we do that?
Thank you for sharing
You're very welcome!
You're the man! Well done.
You are such a man. Thank you!
Change the weights initialization to tf.truncated_normal(shape, stddev=0.1) and bias to tf.constant(0.1, shape=shape). Got 98% accuracy. :D
Thank you for suggesting those tweaks! I was able to closely reproduce your results AND it forced me to look-up the tensorflow API doc.
For those who are curious, here is an example of the tweaks put into action:
hidden_l1 = {'weights': tf.Variable(tf.truncated_normal([784, n_nodes_hl1], stddev=0.1)),
'biases': tf.Variable(tf.constant(0.1,shape=[n_nodes_hl1]))}
And you can find the definition of the two newly introduced functions here:
www.tensorflow.org/api_docs/python/tf/truncated_normal
www.tensorflow.org/api_docs/python/tf/constant
For me, changing the weights produced the bulk of the result. The amount of "loss" went down three orders of magnitude (from ~20,000 in the default code to 11 with the new weights).
Where is the example of tweaks that you've mentioned? I am sorry if I am missing that in your comment.
+bizdep not sure what you're missing. You can change sentdex's code at lines 19-20 where it defines the weights and biases of the first hidden layer. Just replace what's inside tf.Variable() like JP Beaudry did. Apply the same change to the next hidden layers, and you're golden.
If you don't understand what this does, let me explain. Having a small positive initial bias means the neurons are more likely to fire on average. That speeds up training because backpropagation relies on how much impact each neuron has on the output, and inactive neurons have no impact whatsoever. It doesn't really matter if they're all the same, since the bias just controls the activation threshold, so we set it to a constant.
Now, having a large amplitude for the weights makes the network more unstable, and thus harder to train, since each learning step may change the functional behavior a lot. So we start those values randomly (like sentdex did), but with a smaller standard deviation (0.1) and without any values larger than 0.2 in magnitude (which is what the truncated normal distribution does). That ensures we get a better initial condition, so training goes smoother.
There's some work out there about how to pick a good initialization, but people often just try a bunch of variations from what I suggested until they find what works best. sentdex's initialization is not wrong, it just doesn't generally work well.
Hi Andre,
Thanks for explaining in detail. I really appreciate it. Just to be sure that I am getting it correctly, does the below modification to the code look good to you?
hid_1_layer = {'weights':tf.Variable(tf.truncated_normal([shape,stddev = 0.1])),
'biases':tf.Variable(tf.constant([0.1,shape = shape]))}
hid_2_layer = {'weights':tf.Variable(tf.truncated_normal([shape,stddev = 0.1])),
'biases':tf.Variable(tf.constant([0.1,shape = shape]))}
hid_3_layer = {'weights':tf.Variable(tf.truncated_normal([shape,stddev = 0.1])),
'biases':tf.Variable(tf.constant([0.1,shape = shape]))}
output_layer = {'weights':tf.Variable(tf.truncated_normal([shape,stddev = 0.1])),
'biases':tf.Variable(tf.constant([0.1,shape = shape]))}
When I run this piece of code instead of the original one, I get a syntax error. Please see.
+bizdep I think this should work:
hidden_1_layer = {'weights':tf.Variable(tf.truncated_normal([784, n_nodes_hl1], stddev=0.1)),
'biases':tf.Variable(tf.constant(0.1, shape=[n_nodes_hl1]))}
hidden_2_layer = {'weights':tf.Variable(tf.truncated_normal([n_nodes_hl1, n_nodes_hl2], stddev=0.1)),
'biases':tf.Variable(tf.constant(0.1, shape=[n_nodes_hl2]))}
hidden_3_layer = {'weights':tf.Variable(tf.truncated_normal([n_nodes_hl2, n_nodes_hl3], stddev=0.1)),
'biases':tf.Variable(tf.constant(0.1, shape=[n_nodes_hl3]))}
output_layer = {'weights':tf.Variable(tf.truncated_normal([n_nodes_hl3, n_classes], stddev=0.1)),
'biases':tf.Variable(tf.constant(0.1, shape=[n_classes]))}
You should've been able to figure this out by yourself, though. If you don't understand what's wrong with your code, I suggest you review basic python programming lessons before delving into deep learning.
I'm a python noob but took a stab at the "for loop". I scrapped the weights and biases dict and created them on demand as i wired up the layers doing a look-ahead by 1 to hook the layers together. I hope I did this right - have a look. The accuracy remains over 95% so hopefully it is correct. github.com/jeffgriffith/mnist-tensorflow-demo/blob/master/mnist_tf_demo.py
you got the damn for loop!!!! You are the pythoneer!!!!
Where and how did you learn machine learning?
How the hell is tensorflow doing all that? It's cool, but on the other hand it feels like the user doesn't really know what's going on. I feel a little bit lost working with this. How is the optimiser accessing values of a dictionary within another function? I need a drink.
For anybody going through the same feelings, sentdex clarified what's happening down below in the comments:
sentdex:
Remember, TensorFlow is not pure python code, they've only basically created a nice pretty Python wrapper to make our lives easier. TensorFlow inherently runs on a computation graph. The graph is first described by neural_network_model in our code, but very quickly it's iterated on, based on the rules we set forth in the train_neural_network function... but every iteration in train_neural_network does not reset the values, it's working on the computation graph. This abstraction is notably confusing, but that's what's happening, and that abstraction is made in the name of simplicity and allowing us mere mortal Python developers make use of the technology.
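If it helps anyone, here's a tiny sketch of that split (my own example, TF 1.x style): building ops only describes the graph; nothing is computed until sess.run.
import tensorflow as tf
a = tf.placeholder(tf.float32)   # describing the graph: no data yet
b = tf.placeholder(tf.float32)
total = a + b                    # just another node in the graph
with tf.Session() as sess:
    # only now does anything actually get computed
    print(sess.run(total, feed_dict={a: 2.0, b: 3.0}))   # 5.0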
It is a bit confusing; it has very little to do with how normal Python works or how the models work mathematically, but I guess it's not hard to learn. You just have to accept the rules TF introduces and follow them. That's always the price of abstraction.
TF was made with C++ and CUDA, I think. :)
Did you get your answer? It's been a year now.
If yes, could you please share it with us as well?
Dude! I've followed you for a long time and you make awesome videos. I tried to study TensorFlow on my own too, and I got to the point where I have nice accuracy, great, but how do you apply that to a real image? You know, multiscale the weights and convolve over an image so we can detect the numbers that are in it. Could you do a video about that? Thanks!
It would be very interesting..
Yeah! I am waiting for it :D
Hey Harrison, in response to your comment about making the text bigger, I've found that a general rule of thumb when using the linux command line, that most of the usual keyboard shortcuts, cut, paste, increase text size, etc are the same as usual, but with the shift key added.
So, for example:
copy is usually ctrl C but in the command line it becomes ctrl shift C
paste is usually ctrl V but becomes ctrl shift V
increase text size is usually ctrl + but becomes ctrl shift +
This isn't true for everything, but in general it's a good thing to try.
Thanks for the tutorial
Thanks for sharing.
Regarding the confusion about tf.add and the + operator: The tensorflow python API overloads several operators, if you use "x+y" and x (or y) is a tensor, then tf.add will be used. In general, you want to use tf.add, so the code is clearer about a tensorflow function being used, and in case you want to use the "name" parameter.
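For example (a small sketch of my own, TF 1.x style), these two lines build the same op, but only the explicit tf.add call lets you attach a name (the name here is just an example):
import tensorflow as tf
x = tf.constant([1.0, 2.0])
y = tf.constant([3.0, 4.0])
s1 = x + y                                # overloaded operator, tf.add under the hood
s2 = tf.add(x, y, name='layer1_preact')   # same op, but named in the graph
with tf.Session() as sess:
    print(sess.run([s1, s2]))             # both give [4., 6.]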
@sentdex the MNIST database is unavailable (the site is down), so the tutorial doesn't work. I have looked for hours for the original files and can't find them anywhere. If you could upload them to your website, that would be wonderful.
I have some suggestions. I really like your videos; that's why I want to offer them.
1) A lot of material is available on the internet for classification and MNIST, but you should work more with custom data.
2) A video on image segmentation.
3) A video on which API is best for convolutional networks, because choosing an API is also difficult for me.
4) A comparison of the same code written in both TensorFlow 1.x and 2.x, and the differences between the two, because a lot has changed.
You got to watch out for them Global Variables. Try to only use the "ABC" and "XYZ" variables within your definitions. This will help you avoid errors in the future. :-) Thanks for the vid! I'm learning a lot about Tensorflow!
All of your tensor flow series are awesome tutorials, thank you! - Question... is there any flaw in using the prediction variable as you demonstrate it without running it thru a softmax filter, while the cost function passes the output thru a softmax filter and then calculates the loss on each item?
Very helpful, especially the debugging. Thanks.
Did anyone get an error like ValueError: Dimensions must be equal, but are 784 and 500 for 'MatMul_1' (op: 'MatMul') with input shapes: [?,784], [500,500]? Any help/pointers are highly appreciated.
same error here
In the model when assigning output, you can't use the + operator in matmul() in tensorflow 1.0.1. Just wrap it with tf.add; replace the output line with
output = tf.add(tf.matmul(l3, output_layer['weights']), output_layer['biases'])
and it should work :)
Don't just Ctrl+C the same code line and edit it. Look at:
l2 = tf.add(tf.matmul(l1, hidden_2_layer['weights']), hidden_2_layer['biases'])
l2 = tf.nn.relu(l2)
Unfortunately, this doesn't fix the problem as stated, since it's related to the matrix multiply method not so much matrix addition. Indeed, there's a problem with the addition as well, but this particular exception isn't directly related to that. I'm currently trying to fix this same issue, and the stack trace points me to the hidden_layer_1 dictionary definition, specifically the value for "weights" key. Well, it actually points me to the first matrix multiply call, THEN the dictionary definition of hl1. When I figure it out, I'll respond, even if this is a months-late reply.
Okay, so, sometime later I've managed to debug my followalong code. Indeed, the dimensions are incorrect, but the exception -- I believe -- is thrown for a host of other reasons. What I did to fix it: (1) changed all variables per sentdex's update on his website, (2) changed the model ((input*weight)+bias) variable names because mine were all named l1 (noob mistake), (3) updated the args passed to the 'matmul' method per: stackoverflow.com/questions/44956460/valueerror-dimensions-must-be-equal-but-are-784-and-500-for-matmul-1-op-m . With that, my code ran perfectly.
Hope that helps for anyone still stuck on this value exception!
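For anyone else stuck on this ValueError: the rule is just that each weight matrix's first dimension has to match the previous layer's width. A rough sketch of my own (reusing the variable names from the video; data is the [batch, 784] input placeholder) of how the shapes should chain:
n_nodes_hl1 = n_nodes_hl2 = n_nodes_hl3 = 500
n_classes = 10
# input is 784 wide, so the chain is 784 -> hl1 -> hl2 -> hl3 -> n_classes
hidden_1_layer = {'weights': tf.Variable(tf.random_normal([784, n_nodes_hl1])),
                  'biases': tf.Variable(tf.random_normal([n_nodes_hl1]))}
hidden_2_layer = {'weights': tf.Variable(tf.random_normal([n_nodes_hl1, n_nodes_hl2])),
                  'biases': tf.Variable(tf.random_normal([n_nodes_hl2]))}
hidden_3_layer = {'weights': tf.Variable(tf.random_normal([n_nodes_hl2, n_nodes_hl3])),
                  'biases': tf.Variable(tf.random_normal([n_nodes_hl3]))}
output_layer = {'weights': tf.Variable(tf.random_normal([n_nodes_hl3, n_classes])),
                'biases': tf.Variable(tf.random_normal([n_classes]))}
l1 = tf.nn.relu(tf.add(tf.matmul(data, hidden_1_layer['weights']), hidden_1_layer['biases']))
l2 = tf.nn.relu(tf.add(tf.matmul(l1, hidden_2_layer['weights']), hidden_2_layer['biases']))
l3 = tf.nn.relu(tf.add(tf.matmul(l2, hidden_3_layer['weights']), hidden_3_layer['biases']))
output = tf.add(tf.matmul(l3, output_layer['weights']), output_layer['biases'])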
Brilliant tutorials mate!
Thanks!
It eventually worked on a Raspberry Pi. It took almost all the CPU and memory resources. The Pi feels like it has just been abused.
lol
What did you make using tf in raspberry pi? Pls share!! :)
don't mind me, but what were you thinking? XDD
Deeply abused ;)
Really liking this Neural Network with tensorFlow part of the series. Here are some suggestions:
A Neural Network to predict time series (let's say Stock Prices) would be nice. Or maybe writing a reinforcement learning NN to write articles or make music.
I am almost positive it won't be anything stocks/finance related. I am leaning more towards language stuff with RNNs. Not sure I will make an article writer. Music making is interesting, but I will probably stick with RNNs to start.
We'll see though, CNNs are pretty important to bring into play too at some point. I feel like I could easily be doing ML tuts for the next year or two... yikes :P
Yeah when it comes down to ML there really is a wide spectrum of things to cover. Anyways whatever the next tutorial will be I am definitely looking forward to it! :)
There is a paper published by two Google researchers, "A Neural Conversational Model". It's interesting.
Hope you will look into it.
Eh, my initial response was that I wasn't too interested in that topic, but the paper is actually quite interesting. Almost certainly way out of my league, but will look more into this. Thanks for the suggestion. Link for anyone else curious: arxiv.org/pdf/1506.05869v3.pdf
This tutorial, I have to say, is just awesome; always great learning lessons. Thanks.
@17:10 "Hello everyone! My name is sentdex, and welcome to debugging with Python" lol
Where do we save the weights and biases?
Shouldn't we pass the output in neural_network_model() to the ReLU activation, as that would bring it to a 0-1 scale (as we do with sigmoid) to act as the logits or prediction?
How can we save our TensorFlow model so that we can use it for further projects or for transfer learning?
Is it normal to understand almost nothing of what is actually going on in this train function yet? T.T
Possibly, but what you should do next is, step by step, research more into what you do not understand. Keep doing that til you understand :P
We call train_neural_network(x) with the parameter x, but x still has not been loaded with any data! Are we just passing the placeholder?
No, we hand over the MNIST training batches while training,
where we say epoch_x, epoch_y = mnist.train.next_batch(batch_size).
What do the values of the loss signify? The values shown in the video seem too big.
In the new version use: softmax_cross_entropy_with_logits_v2
Like,
tf.nn.softmax_cross_entropy_with_logits_v2(logits=prediction, labels=y)
Or the old, softmax_cross_entropy_with_logits
Like,
tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y)
Using logits and labels is necessary else you'll have an error.
thanx bro..
Thanks. Great video. Is there a way to feed in a picture as input and get the number printed back on the console?
Thanks. I couldn't understand Tensorflow's tutorial on building nn for mnist. It uses convolutional network which is a bit more complicated. Your tutorial was a step in between that I needed right now :)
When you call neural_network_model(x), you are running the training set through the entire neural network just once, and then in train_neural_network(x) you capture the cost, optimize it, and backpropagate the new weights, but this time you run the same training set in batches. Am I understanding correctly? Why not run neural_network_model(x) in batches the first time?
I have a question. The activation function used here is ReLU, but the cost function used here is cross entropy, which is related to the softmax function. How is cross entropy different from mean squared error, and why is it preferred?
Thanks a lot for the help with tensorflow .
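Not the author, but here is a tiny numpy sketch (my own example numbers) of the two losses side by side; cross entropy punishes a confident wrong answer much harder, which tends to give stronger gradients for classification than MSE:
import numpy as np
y_true = np.array([0., 0., 1., 0.])          # one-hot label
logits = np.array([1.0, 2.0, 4.0, 0.5])      # raw network output
probs = np.exp(logits) / np.sum(np.exp(logits))   # softmax -> probabilities
mse = np.mean((y_true - probs) ** 2)              # mean squared error
cross_entropy = -np.sum(y_true * np.log(probs))   # only the true-class term survives
print(mse, cross_entropy)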
My dataset consists of low-resolution astronomical images for object detection. Will a CNN be useful for it? Because I'm going to try it.
Hey! Brilliant tutorial. I was wondering if you could clarify something for me. Where you have epoch_x, epoch_y on line 59, why did you have to do that? Secondly, what is the point of the _ in the for loop? Is it just an arbitrary value like i, j, k etc.? And finally, I didn't really understand why you assigned _, c for the optimizer; what made you do that? I didn't follow. Thanks!
I don't quite understand one thing. We have our final model ready and we pass it an image. Let's say at the last hidden layer we have only 4 nodes that after applying relu evaluate to 0,1,1,1 respectively. To reach to the output we multiply these by some final weights, add bias and end up with 23, 102, 0, 2. My question is how do these values get translated into 1-hot encoded number, e.g. [0,0,0,0,1,0,0,0,0,0] ?
The last layer, which could have an arbitrary number of nodes, somehow gets translated into the one-hot encoded label, which could be of any arbitrary length as well. How does that happen?
I think he just simply forgot to map the output values to [0,1] by an activation function.
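To make that concrete, a small sketch (my own numbers): the output layer is built with exactly n_classes = 10 nodes, so nothing changes length; the index of the largest raw value is the predicted class, and softmax just rescales the values into probabilities that look almost one-hot:
import numpy as np
logits = np.array([23.0, 102.0, 0.0, 2.0])    # raw scores, one per class (4 classes here)
probs = np.exp(logits - logits.max())         # softmax, shifted for numerical stability
probs /= probs.sum()
print(np.argmax(logits))        # 1 -> the predicted class
print(np.round(probs, 3))       # ~[0., 1., 0., 0.], effectively one-hot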
Do you have any videos for using TensorFlow for image classification?
Thanks
If anyone gets an error about softmax_cross_entropy_with_logits and missing logits and labels, do this:
def train_neural_network(x):
    prediction = neural_network_model(x)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y, name=None))
    optimizer = tf.train.AdamOptimizer().minimize(cost)
Hope this will help you guys.
Can I ask why the neural network model returns a bunch of zeros and a one at the output layer? What I'm thinking is, since we are doing (features*weights + biases) and feeding it through a ReLU function, it doesn't seem like it would return 0 or 1.
Shouldn't the cost be related more to the shape than to the value of the number? For instance, a 1 would be closer to a 7 than to a 0.
Thanks for the tutorial, it was great. I understood most of it, but in some parts I got confused. You talked about stochastic gradient descent and others, but you haven't fully talked about gradient descent and how it helps reduce the cost function; I would really like to understand the relationship between them, with the clear explanations you always give. I haven't watched the future videos, but so far I have been following the machine learning playlist sequentially, so if it's covered in future videos, ignore my comment :)
Hi Sentdex, thanks for creating these tutorials, I really appreciate your work. Do output layers generally have a linear activation function, or can there be a sigmoid activation function as well? Just wondering.
Hey, great series, I really enjoyed the videos so far!
I got two questions:
1. is some of this stuff (I am referring to the whole series) useful to do some curve/surface/parameter fitting or is it only used for data classification? Could you point out something I could read about that?
2. any comment on how the methods you covered might apply to small datasets of, say, tens to hundreds of points?
Thanks again for the hard work you put in explaining all of this!
Excellent tutorial!
+Javier Matias thanks!
It is a very useful video!! How can I find the dataset from the video? I have a large dataset with 15 attributes; is it OK to use this model?
Question about line 60: is this code actually evaluating cost twice (being that cost is a part of the optimizer declaration)? Or is TensorFlow just smart enough to evaluate only what it needs to in the computational graph and nothing extra?
Is there a website listing of "best models" in Tensor Flow for any given problem area? I.e. the best model for Face Recognition, a best model for stock market prediction?
Time series: Regression, Recurrent Neural Networks.
Image data: Convolutional Neural Networks
Language data: Recurrent Neural Networks, especially LSTM cells.
which models are good for multi label classification?
It really is amazing how you can just throw in data and it "magically" (as you say it a lot :D) decreases the loss function so you get good predictions (95% is crazy if you think about it).
My data is in a csv file, all floats. What function should I use to train the data?
Will data.train.next_batch(batch_size) work, where data is a numpy array?
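Not the author, but mnist.train.next_batch is specific to that dataset object; for a plain numpy array loaded from a CSV you'd write your own batching helper, roughly like this sketch (my own, assuming the features and labels are aligned numpy arrays):
import numpy as np
def next_batch(features, labels, batch_size):
    # yield shuffled (x, y) mini-batches from two aligned numpy arrays
    idx = np.random.permutation(len(features))
    for start in range(0, len(features), batch_size):
        batch_idx = idx[start:start + batch_size]
        yield features[batch_idx], labels[batch_idx]
# usage in the training loop:
# for epoch_x, epoch_y in next_batch(data_x, data_y, batch_size):
#     _, c = sess.run([optimizer, cost], feed_dict={x: epoch_x, y: epoch_y})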
On line 42, shouldn't it be: output = tf.nn.softmax(tf.matmul(l3, output....))?
Hi +sentdex. Great videos. I'm learning a lot, but I need a better explanation than "magic" @8:34 when explaining this
'_, c = sess.run([optimizer, cost], feed_dict = {x: x, y: y})'
Do you have other references, videos, or literature so we can get smart on this?
Thanks!
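Not the author, but the short version: sess.run takes a list of graph nodes and returns one result per node in a single pass over the graph (so cost is not computed twice just because the optimizer depends on it). The optimizer op only has the side effect of updating the weights and returns None, which is why it's unpacked into _. A minimal sketch of the same pattern (my own):
import tensorflow as tf
x = tf.placeholder(tf.float32)
w = tf.Variable(5.0)
loss = tf.square(w - x)                                      # stand-in for "cost"
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    _, c = sess.run([train_op, loss], feed_dict={x: 2.0})    # one pass evaluates both
    print(c)    # the loss for this step; train_op itself returns None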
A great video! But I'm confused about the function train_neural_network( x ).
Why are we not also passing y as an argument, since we access it inside the function? Or, to rephrase: how is the above-mentioned function accessing y? It clearly seems to be working.
I tried it without passing any arguments and it still works. How is that even possible?
Thanks in advance!
What is the helper function to create a 'next batch' for our own dataset?
Great tutorial! Thank you!
When I multiply input_data with the weights and add the biases (my code is exactly like the l1, l2, and l3 mentioned), even though data is defined in the function signature neural_network_model(data), I get an error at the l1 step that name 'data' is not defined. Any suggestions will be appreciated.
So I'm getting an error where the MatMul won't multiply because the matrix dimensions aren't the same. I had to set the number of nodes to 784. Any help?
Here you perform epoch_loss += c; what is c, basically?
If you don't want the epoch loss, you don't have to run the 'cost' in sess.run, right?
Sentdex, the variable y is not defined locally in, or passed to, the function train_neural_network. Am I right to assume the line y = tf.placeholder... is effectively a Global then? I'm getting an error on the line cost = tf.reduce_mean(...), even after I changed to epoch_x and epoch_y.
Never mind. For those watching in 2018, the function softmax_cross_entropy_with_logits requires named input arguments. That was my issue.
I've always had a good taste for algorithms, but it is time to move on to machine learning. The idea itself is very exciting, and that's the whole reason I will give it a shot: as you train it, you watch it perform actions no hand-coded human algorithm can manage and figure out its task all by itself. It even comes up with ideas of its own that are better than humans', for instance in board games.
Really great content! Would be interesting to see a tutorial on how to prepare your own data. Would not have to be image recognition - but could be more of time series data or very granular event data (i.e. IoT). How might we prep the data, build the model, train and test. Heck even predict - using new data.
Thank you so much for your tutorials! They are very helpful! In the next TensorFlow project, do you mind explaining the data structure a little bit?
Hi, thanks a lot for your tutorials. They are really great. But I have a question: how can I test an image using the already trained model?
I mean, given 4.png, how can I load the model and test whether it correctly recognizes my image? Is there already a tutorial/example for this?
Thanks
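I haven't seen a dedicated video for that at this point in the series, but roughly it's done with tf.train.Saver. A sketch of my own (TF 1.x style, assuming the prediction op and the x placeholder from the tutorial code, and that 4.png is a greyscale digit; file names are just examples):
import numpy as np
from PIL import Image
saver = tf.train.Saver()
# after training, inside the session:  saver.save(sess, './model.ckpt')
img = np.array(Image.open('4.png').convert('L').resize((28, 28)))
img = (255 - img.flatten()) / 255.0      # MNIST digits are white on black, scaled to 0-1
with tf.Session() as sess:
    saver.restore(sess, './model.ckpt')
    digit = sess.run(tf.argmax(prediction, 1), feed_dict={x: [img]})
    print(digit)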
Hello Harrison, thank you for this series.
What beast of a PC do you have? Your model was done while mine was at epoch 5.
Check out th-cam.com/video/Q3mR7ftZ8JU/w-d-xo.html, I break down my computer there :P
Thank you for your nice tutorials. I tested the same program for three and also for a simplification by just two layers and the accuracy at the end was still 0.948 (the same as for three layers). Does this mean that in this exact example, two layers are enough or does it have to do with how the layers are constructed and that additional layers should contain maybe different distributions? I also tested it for just one layer and the accuracy just went down by a little to 0.9469
@sentdex, is it possible to write a use_neural_network function for this network?
If yes, how do I pass in an image and classify it?
Can somebody tell me why the output of the model function is just x*w+b and not g(x*w+b), where g is a sigmoid or softmax function?
In the function call train_neural_network(x), how does data get into the x placeholder? Isn't it just an empty float of shape 784? Please help! I'm not understanding.
awesome videos
I think the variable 'output' in the function 'def neural_network_model(data)' should also be passed to the activation function 'relu(...)'. I think this vector should go through the activation function too. So the final statement in that function might be 'return tf.nn.relu(output)' instead of 'return output'. With that little change I got 98% accuracy instead of 94.85%.
How can I input my own image into this and get the digit as output?
Is this code done by Tensorflow or created by you?
If you are using another dataset, would we have something like mnist.train.num_examples? How do we get that?