00:1-11 "We are using our brain to understand how we use our brains to understand how we use our brains" I love this statement
For me, this episode shows how "neurons that fire together wire together" can actually be a useful learning rule in unsupervised scenarios.
Wow. I've been having trouble understanding the spatial pooler algorithm for a while, but it's actually a lot simpler than I thought.
For a column, a random array of bits the size of the input space is created first, which will activate if it overlaps enough with the input. Only once the column is activated does it increase the permanence of the overlapping bits and decrease the permanence of the non-overlapping bits.
And, since each column will only activate when its random set of bits overlaps with the input, each column will represent a different portion of the input, and will improve representation of regularly occurring portions of the input.
Yep, that is pretty close. ;) I'm glad this made sense to you.
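The summary above can be sketched end-to-end in a few lines. This is a toy illustration, not NuPIC's implementation; all sizes and parameter values are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
N_IN, N_COLS, K = 64, 16, 2          # input bits, minicolumns, winners (toy sizes)
THRESH, INC, DEC = 0.2, 0.05, 0.02   # connection threshold, learning steps

# 1. Each column starts with random permanences over the whole input space.
perms = rng.uniform(0.1, 0.3, size=(N_COLS, N_IN))

def sp_step(input_bits):
    connected = perms >= THRESH                       # which synapses are live
    overlaps = (connected & input_bits).sum(axis=1)   # 2. overlap with the input
    winners = np.argsort(overlaps)[-K:]               # 3. best-overlapping columns win
    for c in winners:                                 # 4. Hebbian update, winners only
        perms[c, input_bits] += INC                   #    reinforce overlapping bits
        perms[c, ~input_bits] -= DEC                  #    weaken the rest
    np.clip(perms, 0.0, 1.0, out=perms)
    return winners

x = rng.random(N_IN) < 0.25   # a sparse binary input
winners = sp_step(x)
```

Because only the winning columns learn, each one gradually specializes on the input patterns it happens to win.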
this episode was so helpful! I think I finally understand the purpose of spatial pooling, and the way it accomplishes learning. it all works so beautifully. super excited for temporal pooling in the future!
Really looking forward to the next video! This series has been very useful for me. Super looking forward for temporal pooling.
Thanks! It will still be a month or two before I get back to video production, but they are coming...
Excellent session. A complex topic explained nicely.
These were very well done. Perfect length.
How does a column (SP bit) decide to activate in the brain? With this model you just take the column with the most active inputs from the set of all columns, but wouldn't it actually be that each column has an action potential or threshold that has to be met instead?
It is not entirely certain what enforces the minicolumn structures in the neocortex, but we think they are likely caused by inhibitory neurons that create groups of neurons with the same feedforward proximal receptive field properties.
A minicolumn competes with neighboring minicolumns to represent the input. Those with the most active synapses overlapping input are activated. All the minicolumns have different potential pools and permanences, so they all grow affinity to different overlapping input features. The semantics of the input are distributed into the minicolumn activations. You lose information, but now we have a normalized structure we can run Temporal Memory on (see episode 11!)
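The competition described above can be sketched as an overlap score plus global inhibition. This is an illustrative toy (names, sizes, and the 50% connectivity are assumptions, not Numenta's code):

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs = 100    # size of the binary input space (illustrative)
n_columns = 20    # number of minicolumns (illustrative)
k = 4             # winners kept after inhibition (real SPs target ~2% sparsity)

# Each minicolumn's connected proximal synapses, as a mask over the input.
connected = rng.random((n_columns, n_inputs)) < 0.5

input_bits = rng.random(n_inputs) < 0.2   # a sparse binary input

# Overlap score = number of connected synapses aligned with active input bits.
overlaps = (connected & input_bits).sum(axis=1)

# Global inhibition: the k minicolumns with the highest overlap activate.
active_columns = np.argsort(overlaps)[-k:]
```

Because every column has a different random `connected` mask, different columns win for different inputs, which is what distributes the input's semantics across the population.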
Are you going to cover a comparison between HTM and traditional NN? I am always wondering what is the key and essential idea that makes them different. For example, in HTM, we have SDR, in NN, we have embedding. They both aim to capture semantic similarity. Do you believe that if appropriate changes(although maybe unknown yet) were made, traditional NN could also do the same thing with comparable performance?
+叶露 I'm not sure if I will cover that, but I will think about it. Thanks for the idea!
Great content! I'd love to see the equations corresponding to each concept shown somewhere on screen.
wonderful visualization!!!!
It's really elegant!!!
My favorite part of every episode is the silly joke at the beginning
Thank you for your post. Any ideas on how Cortical encodes language into a spatial pooler to create NLP SDRs?
Cortical does not use spatial pooling to create their semantic fingerprints. They have a whole published technique and a white paper about it.
@@NumentaTheory Thank you, do you have a link?
www.cortical.io/science-semantic-folding.html
Very high entry barrier. Good start with sparse things, but then it starts operating with new terms: spatial pooler columns, permanence, potential... Where are all these terms from? Did I miss some episodes between episode 6 and episode 7?
2 small questions:
1) I know this has been asked, but I still want to know more: how are the connections selected? It was pointed out that the "weight"/permanence is only updated when there is activation. Is there an index value for each field that links the frequency of that input to the probability of activation, in order to maximise only the connections and areas that are more likely to produce an activation, rather than keeping permanence on scattered points already covered by more specialized "neurons"?
2) Is the "proximity grid" invariable? This would mean that neurons with a high specialization potential for one area of the input may not be properly specialized, as opposed to one that has a worse "proximity grid" but a more randomly initiated "permanence" value for that area. This would mean that, despite having more potential and being an ultimately more optimal choice, some neurons may have to become sub-optimal for another area. If the grid updated, this wouldn't matter as much, and it would provide more flexibility for specialisation and optimization, though this may not be the best choice for resiliency if the number of "activated" neurons were based on a threshold value rather than a relative amount.
Sorry for the rambly questions, but they help me understand.
1) Each mini-column has a proximal segment with X synapses toward an input space. Each synapse has a permanence value. This is randomly initialized to be close to a connection threshold, so that about 50% of them are initially connected. This allows learning to happen quickly. Frequency is not counted, but indicated by increase in permanence. Remember each neuron can represent many different features in different contexts. Only a population of neurons actually have meaning.
2) I think you mean "potential pool". It is variable. It is also distributed throughout the space, and will not pick up topological features. However, there is a video about Topology if you keep watching HTM School. Also, an episode on Boosting should help you understand other questions here.
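The initialization described in the answer above might look something like this minimal sketch (parameter names and values are illustrative assumptions, not NuPIC's API):

```python
import numpy as np

rng = np.random.default_rng(42)

n_inputs = 400               # size of the input space (illustrative)
pool_fraction = 0.5          # fraction of input bits in the potential pool
connection_threshold = 0.2   # permanence at/above which a synapse connects

# The potential pool: a random subset of input indices this column can see.
pool = rng.choice(n_inputs, size=int(n_inputs * pool_fraction), replace=False)

# Permanences are centered on the threshold, so roughly half start connected.
permanences = np.clip(
    connection_threshold + rng.uniform(-0.1, 0.1, size=pool.size), 0.0, 1.0
)
connected = permanences >= connection_threshold
```

Starting permanences near the threshold means a single reinforcement or punishment can flip a synapse's connected state, which is why learning gets going quickly.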
Thank you for the reply! Yes I'm at the topo episode.
So, only the permanence values of currently existing connections (green circles and grey circles) get changed (referring to 6:30)? Also, is the permanence threshold static? If both of these are true, I can't see how new connections can ever get added.
Both are true (although perm threshold is user-configurable). If an input overlaps with a column's proximal connections, the permanences are increased no matter what the perm threshold is. After the increase, if any of them have breached this threshold, they are considered connected. The permanence threshold never needs to change over time. This would affect how the entire system mapped spatially to input.
I see, so we're not only changing permanence values of existing connections, but of all potential connections as well (those with permanence below the threshold)? If that's the case, then it makes sense.
@@NumentaTheory Does it also hold true for decreasing permanence? I.e. _proximal_ connections are made weaker, not only existing connections (grey circles). It isn't really crucial as I see it, but still interesting. I even think that penalizing only grey circles is more meaningful, as they exist but don't contribute to the current input pattern => we don't need these connections for this input.
@@pkuderov Permanences are only penalized if a predicted minicolumn was incorrect. Perms only change within active minicolumns.
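The update rule discussed in this thread can be sketched as follows. This is a hedged toy version (function and parameter names are made up, and the increments are illustrative): within an active column, potential synapses overlapping active input bits are reinforced, the rest of the pool is weakened, and the connected set is re-derived from the fixed threshold, which is exactly how a previously unconnected synapse can become connected.

```python
import numpy as np

PERM_INC = 0.05              # reinforcement step (illustrative value)
PERM_DEC = 0.02              # decay step (illustrative value)
CONNECTION_THRESHOLD = 0.2   # permanence at/above which a synapse is connected

def learn(permanences, pool_mask, input_bits):
    """Hebbian update for one *active* column, in place.

    All three arrays index the input space. Returns the new connected mask,
    so a synapse below the threshold can cross it and become connected.
    """
    hits = pool_mask & input_bits                      # potential synapses on active bits
    permanences[hits] += PERM_INC                      # reinforce overlapping synapses
    permanences[pool_mask & ~input_bits] -= PERM_DEC   # weaken the rest of the pool
    np.clip(permanences, 0.0, 1.0, out=permanences)
    return permanences >= CONNECTION_THRESHOLD

# A synapse just below threshold (0.18) connects after one reinforcement.
perms = np.array([0.18, 0.18, 0.25])
pool = np.array([True, True, True])
inp = np.array([True, False, True])
connected = learn(perms, pool, inp)   # -> [True, False, True]
```

Note the threshold itself never moves; only the permanences do, matching the answer above.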
Great videos and very interesting material, thank you so much for sharing it.
I have a question: how can you calculate how likely it is that 2 or more columns end up in the same state?
Yes you can!
Gnarly 'sode, guys! I do have a question though: are the dendrites from each column randomly connected to the input space in an actual brain? Or is there some structure in place?
They are not entirely random, they have a topology, which you will learn about in episode 10.
@@NumentaTheory gracias amigo
Why do you refer to cells in the input space (spatial pooler) as columns? Perhaps you can clarify this in some future videos.
The cells in the SP are different than the cells in the input space. The input space is external to the SP. It represents sensory input from neuronal axons. There are mini-column structures in the SP. The input space does not have these structures. Keep watching, I think I clarify in later episodes.
How do you randomly initialize the spatial pooler?
What does this learning process converge to? Does it optimize something measurable? If so, then why not use an existing optimizer, like gradient descent for example?
Each mini-column learns some specific input features within its potential pool. The process converges to many different mini-columns learning to represent lots of different unique input features. It is measurable. We have run MNIST on it, although this is not a very good problem for the SP. The SP only makes a lot of sense when you run the TM on top of it. Keep watching videos, I explain all this in future episodes.
And for why we don't use something existing, I think you don't understand our purpose. We are trying to understand how intelligence works. We're not trying to solve an engineering problem. There's no gradient descent optimizer in the brain.
A while as you pointed me to the GitHub repo where all the code for the online demo is kept, please post that link again.
discourse.numenta.org/t/how-to-run-htm-school-visualizations/2346
Just discovered this series. Well done. It's been three months, is there still a plan to continue?
I just published another episode this morning on boosting.
The ace of spades! The ace of spaces!
\m/( _ _ )\m/
Great series Btw, can't wait to try and test this with music creation.
is it just me, or is htm easier to understand than the traditional nn?
It is easier! You don't have to understand Calculus!
Numenta haha yes. But calculus isn’t the issue as I have a good enough knowledge of calculus and numerical analysis. It’s just the concept of nn takes me off guard sometimes depending on which type of nn I read about.
Why only connect to seven cells in the input space? I can see more than seven cells in the grey (active) area. And why didn't it lower the permanence values of all the other blue dots outside the input?
I don't understand your first question. At what point in the video? And for your 2nd Q: We don't need to apply learning to cells outside the active columns. Cells forget old patterns as they are tapped to remember new patterns.
HTM School @ 12:45, it said that 7 cells got connected, which means an increase in permanence values. But why only 7? I see more than 7 cells in the grey area. :)
Those were the cells with permanence values close enough to the threshold that they became connected. This is an example of Hebbian learning. All the perms within the overlap of potential pool and input were increased, but only some synapses breached the connection threshold.
HTM School .thanks. Now I get it. :)
How does the pooler know which connections to break and which to create?
Synapse permanences are increased when enough input bits in a column's receptive field are on. They are decreased when this does not happen. I tried to show this happening at 11:12.
Does that really happen in the brain, that something controls the global number of active columns? I think the condition for activating a column should be independent of other columns, which means they should not be ranked together to figure out which columns are active.
See the episode on Topology, which should explain some of it: th-cam.com/video/HTW2Q_UrkAw/w-d-xo.html
Is that an approved hairstyle at HTM school?
Hold on a second, let me get my ink stamp.... APPROVED!
I'm feeling like a fool :')