Question: How does linear transformation work?
Answer: 21:27
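For anyone who wants a concrete toy version of the linear transformation idea explained around 21:27: the relationship between a part's pose and the whole's pose is a fixed matrix, so one learned part-whole matrix works for every viewpoint. Below is a minimal numpy sketch of that idea; the "nose"/"face" matrices and numbers are made up for illustration, not taken from the talk.

```python
import numpy as np

def pose_matrix(theta, tx, ty):
    """2D rigid pose as a 3x3 homogeneous matrix: rotation by theta, then translation (tx, ty)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0,  0,  1]])

# Hypothetical, viewpoint-invariant part-whole relationship: the "nose" sits a fixed
# offset from the "face" centre, with no extra rotation relative to the face.
nose_in_face = pose_matrix(0.0, 0.0, -0.2)

for view_angle in (0.0, np.pi / 6, np.pi / 2):       # the same face under different viewpoints
    face_pose = pose_matrix(view_angle, 1.0, 2.0)    # pose of the whole in image coordinates
    nose_pose = face_pose @ nose_in_face             # part pose = whole pose x fixed relation
    # Going the other way, a capsule that detects the nose can predict the face's pose
    predicted_face = nose_pose @ np.linalg.inv(nose_in_face)
    assert np.allclose(predicted_face, face_pose)
    print(np.round(nose_pose, 3))
```

The point, as I understand the talk, is that the part-whole matrix stays the same under every viewpoint; only the poses change, and they change linearly.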
In a way, capsules are the encoder-decoder model applied to computer vision. Convolutions constrain how the network encodes an image, but capsules can evolve whatever model they want to achieve an optimal encoding. This is great progress. The encoder-decoder architecture has been applied successfully in NLP as well - see thought vectors, which have achieved state of the art in machine translation.
The paper just came out. Today!
We need a working implementation!
Here is a brief description of Capsule Networks, together with an implementation: hackernoon.com/what-is-a-capsnet-or-capsule-network-2bfbe48769cc
And here is another one: medium.com/@pechyonkin/understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b
Highly recommended! Some of Hinton's critique of convolutional neural networks is worth pondering.
True!!!!
I've blogged about this post here:
moreisdifferent.com/2017/09/hinton-whats-wrong-with-CNNs
At 58:00 it appears the model is being trained on the image in various orientations. Wouldn't a CNN perform as well if it were trained on the same rotated orientations? I do understand his linear transformation argument, but I'm not sure this would work any better.
Similar concepts to Jeff Hawkins. Terms are slightly different. Would love to have these fellows cooperate and accelerate their work.
That makes so much sense. I always felt it was weird and wrong that NNs did not treat the objects in an image as constructs of their own, and only as part of something else at another level of interpretation.
How would the transformation capability of the capsule be useful for recognizing plants/trees that don't have a fixed structure?
This might be my new favorite talk
Somehow this makes me proud to be a UofT graduate student
I've been thinking along the same lines, but wasn't able to formulate a clear explanation of my thoughts. An excellent talk by Geoffrey Hinton.
This is very similar to the Recursive Cortical Network.
True genius destroying the field he helped build
@@deeplemming3746 ...are you okay? Have you even read his research?
Why do you think so? He's pretty concrete in what he says/the words he uses.
@@jamesporter628 I've read his research.
A rather cheesy rebranding of the 50-year-old multi-layered neural network (NN) field, developed globally through the efforts of hundreds of other ANS researchers.
"DEEP" is a rather minor variant on this general NN theme, cheesy in the extreme through the application/misappropriation of that semantically loaded word... "DEEP".
It displays rather marginal performance improvements over other multi-layered NN methods, despite enormous levels of funding and hype.
@@deeplemming3746 You sound like a crackpot. Hinton's contributions have led to algorithms with world-beating performance and are frequently put to practical use in academia and industry. Describing them as "cheesy" makes you sound like an absolute moron.
@@deeplemming3746 There are hundreds of applications of deep learning being used today, silly crackpot.
Does anyone know if there are any papers about capsules?
Even if not yet published, here's a paper on capsules: www.cs.toronto.edu/~fritz/absps/transauto6.pdf
Geoffrey and his colleagues released "Dynamic Routing Between Capsules" a few weeks ago.
Please please make the camera view static...
It drives me crazy that people taping these kinds of lectures think they need to do fancy camera work, constantly switching between the presentation on screen and someone's face, so that I'm missing part of the slides. In particular, here we are missing the key result at the end. Unless he's doing something we really need to see to understand what he's talking about, the presenter's appearance is not interesting - stick with the slides.
I noticed Hinton says he's going to take a "Marian perspective" -- (not sure if I am spelling it correctly). Does anyone know what that means?
Marr's levels of analysis is my guess.
He is referring to David Marr ( en.wikipedia.org/wiki/David_Marr_(neuroscientist) ) and his work. Check out David Marr's book "Vision", 1982.
Awesome, great, thank you!
Martian Perspective. Reptilians!
thanks!
14:35 Carl Hewitt, the Carl Hewitt? The one of Planner and actor model fame?? Damn, it must be harder than it seems!
Next time I see a shoe I'll think of matrix determinants lol.
Fascinating how Hinton analyses NNs.
The tetrahedron problem solution is incorrect because the faces are rectangular and don't fit each other. To be solvable, they would have to be squares. That's why the MIT professor was right.
I'm curious about the psychology of that letter R. I didn't rotate it as Geoffrey said; I performed a flip. I don't know whether it's a spatial reasoning skill or what.
Kids who are just learning to read have the same problem. They are rotation-insensitive or even mirror-insensitive, failing to distinguish especially b, d, p, and q (and g), but also other smaller differences that depend on orientation or mirroring.
It's the opposite of a "problem": it's a faster computation if the letter is mentally picked up and just flipped the right way, rather than rotated and then inverted.
It is very close to Redozubov's wave theory of how the cortex organizes associative memory in the human brain.
In the democratization of Deep Learning, people segmented it into CNNs, RNNs, and so on. These are essentially architectures. Because they work, they also prevent people from thinking beyond them. In the 'capsule' paper, Hinton has simply proposed a new architecture. It's not innovation, it's just thinking from scratch. It was a mystery to me as well how, when reading text from natural scenes, there could not be an algorithm that reads the position of each letter and simply combines them. I guess that problem, which earlier had low accuracy, will now be solved.
I don't think it's stopped people thinking beyond them. But it's always good to learn what seems to work before coming up with your own ideas!
What does "EM" mean and what are "mini-columns"? It's so annoying when people use terms without defining them properly.
EM means expectation maximization, and mini-columns are somewhat related to the brain.
Question: Why is the squashing x·|x| / (1 + |x|²) rather than (x/|x|) · tanh(|x|)?
The function's purpose is the same, but how it works is different; the function itself is not the same as how it really behaves.
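For what it's worth, both expressions in the question squash a vector's length into [0, 1) while keeping its direction; they just approach 1 at different rates. Here is a small numpy sketch (my own comparison, not from the paper) that prints the squashed lengths side by side:

```python
import numpy as np

def squash_paper(x):
    """Squashing from 'Dynamic Routing Between Capsules': (|x|^2 / (1 + |x|^2)) * x/|x|, i.e. x*|x| / (1 + |x|^2)."""
    norm = np.linalg.norm(x)
    return x * norm / (1.0 + norm ** 2)

def squash_tanh(x):
    """The alternative from the question above: (x / |x|) * tanh(|x|)."""
    norm = np.linalg.norm(x)
    return x / norm * np.tanh(norm)

for length in (0.1, 0.5, 1.0, 3.0, 10.0):
    v = np.array([length, 0.0])  # a capsule output vector of the given length
    print(length,
          round(np.linalg.norm(squash_paper(v)), 3),
          round(np.linalg.norm(squash_tanh(v)), 3))
```

At length 0.1 the paper's quadratic form gives about 0.01 while the tanh form gives about 0.10, so the published squashing suppresses short (low-confidence) capsule outputs much more strongly. That seems to be the practical difference, though I haven't seen an official justification.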
I like it! But.... does it work?
No better than most of the many, many other multi-layer NN paradigms, which have been developed by many, many other researchers over the past 40-50 years.
As he said... if an object loses part of itself, what will happen? E.g., a face that has lost its nose. Does that mean a face isn't a face because it's missing a part?
At 28 minutes Dr. Hinton says "So I guess this is another piece of *an aitism*"?? I couldn't make out the word in asterisks. Did someone catch what was said?
en.wikipedia.org/wiki/Innatism
Have a look at this debate if you want to understand the references Hinton makes with regards to "Nativism" or "Innatism": th-cam.com/video/aCCotxqxFsk/w-d-xo.html
Do capsules really work, and are they practically applicable?
github.com/naturomics/CapsNet-Tensorflow
Maybe "crowding" is why some people really hate crowds.
Hinton is burying an ax on MIT profs. ;-)
Correction,... Hinton is burying the axe on himself.
Never gonna happen
@@f1l4nn1m What's never gonna happen?
@@deepswindle5211 Keep living the dream... see you in 50 years. We'll see what's going to remain of the Deep Learning movement.
@@f1l4nn1m I rather doubt that you or I will be alive 50 years from now.
But nonetheless, what remains will have little or nothing to do with his fabricated... DL boondoggle.
The whole thing is quite disgusting beyond belief!
PS: very impressed by your level of technical expertise.
i like it
I'm an innatist. You only have to look at how a giraffe can walk after just 1 minute of being born.
I solved the tetrahedron problem in minus 10 seconds, before I even saw it. What I did was think about what shapes you could get if you sliced a tetrahedron: either a triangle or a square. Then I thought, well, a triangle would be too easy, so it must be the square. And the two shapes would be at right angles. Mind you, I've never been to MIT.
I got the Africa one in half a second too. Can I have a Nobel Prize? Thanks.
What?
(pardon my rudeness)
He talks too fast and indistinctly. I can't understand about 30% of what he says.
His teaching style I find the most challenging... :-(
Be smarter.
He's British.
Lester Litchfield Be humble.
I thought he was pretty coherent?
Hinton was great... in 1988. I don't understand what progress he has made since then; his TED talk was no different. I hope Google has helped him get more up to date, as he is a very bright guy. The convnet feature detector and pooling are so primitive versus the holographic excitation model we use at Noonean. "The fact that it works so well is going to make it hard to get rid of" - that's one of the smartest things Hinton has said. These simple backprop convnet approaches are just wrong, slow performing, and don't manage large numbers of features well. They work for what Hinton and Google did with 256x256 bit planes. But show me all the faces in an image. Show me all the triangles. Nope, it doesn't do that well at all. I do not think Hinton's approach can differentiate between a set of corners/angles and a triangle.
It is very interesting that Hinton uses digits as his recognition sample, which he has done now for 30 years, but other than zero there is no notion of continuance. I would warn people to take Hinton with a grain of salt. He is not a great cognitive scientist, as he does not follow a brain model in his designs. Holographic paradigms with active reinforcement at the neural level within a recognition plane, with scanning across a larger image, work better than convnets or Hinton's approach.
Rather than digits, I'd like to see a solution for pure geometric shapes. Show me squares versus ovals versus triangles versus corners in a large-scale photograph. Show me all the eyes vs noses vs mouths. Show me all the faces with eyes/nose/mouth in the right spatial relationship to be a face. By working with digits, Hinton is just being too simplistic. Probabilistic models may suffice in that domain. His early research on solving orientation-less recognition was quite good, but I just don't see what he's added in 30 years. At Noonean, we use an Edelman-Pribram holographic model as our "capsule", not convnets.
CNNs are state of the art in image recognition, are you from the future?
I guess he is not seen around that much, but he was one of the authors on the dropout paper (2014).
He invented Dropout and CD (contrastive divergence), and moreover he knows very well how to integrate stochastic processes with neural networks. The reason he is still using MNIST is very simple: he is inventing, you are applying.
RMSProp was also introduced in his Coursera course.
Mohammad Badri Ahmadi MNIST is kind of maxed out by every modern method though; it is mostly used for smoke testing. Something like CIFAR-10 works better for benchmarking if you need a small dataset.
Hinton is an innatist at this point, but doesn't want to admit it.
AI is as useless as relativity theory or quantum physics.
Going after CNNs.