I have a more streamlined answer to the protein problem. The protein doesn't start folding once the sequence is complete; it folds as the sequence is being built. This computationally and temporally constrains the degrees of movement, limiting the number of molecular forces at work at any given time. The part of the sequence that has already been constructed is already folded into its low-energy state, and the part that hasn't been built isn't perturbing the current folding stage. The folding process is constrained to occur as sequentially as possible, not in parallel.
A threshold activation heatmap over a parallel distribution of temporal sequential threads is more descriptive. Each thread operates in its own input/output relative connection space and favors specific input sequences over time. Maximum amplification of a sequence (i_1/time + i_2/time + i_3/time...) indicates a highly favored temporal sequence, and (i_3/time - i_2/time - i_1/time...) indicates the least favored temporal sequence (with temporal sequences in between these 2 extremes). Each thread is measured against its threshold (T), amplification (A), a latent timeframe (L), and elapsed time (E) for sequence-coordinated activation. When A exceeds T, the output sent to output partners is calculated as (1 - |L - E|). Favored detection sequences can be defined by an integer marking the most favored position within the temporal sequential thread. The process can be tuned by sensory reward detections over time, increasing mutation velocity in a direction, changing the param magnitude in that direction, acting on thresholds, and shifting the sequence of most value for each thread. There are more optimizations for this style of learning, such as extending it with threads that mutate other threads based on their activation levels, allowing mutation behavior to be inferred and leveraged by the network as it's trained (it begins to handle its own mutations internally based on inference). Then it's a matter of reading the heat map to see which parts of the network like doing certain tasks, and seeing the state transitions of the network.
Potential energy is dual to kinetic energy -- energy is dual. If energy is being conserved then duality is being conserved -- the 5th law of thermodynamics! Integrating information is a syntropic process -- teleological (Sheaf co-homology). Homology is dual to co-homology synthesizes sheaf co-homology. Action is dual to reaction -- forces are dual. "Always two there are" -- Yoda. Convergence (syntropy) is dual to divergence (entropy) -- the 4th law of thermodynamics! Energy minimization is a syntropic process -- teleological.
In fact, many proteins function in a _local_ minimum that is _not_ the global minimum. This is why proteins denature irreversibly when exposed to heat; there's an energy barrier that they can never come back from if they cross it.
They function at their own global minimum, but the global minimum is also defined by the environment, for example pH or various chemical agents. Also, for some proteins denaturation is reversible ("renaturation") when the denaturing conditions are removed. Irreversibility is often caused by proteins interconnecting with each other and forming a mesh, losing almost all degrees of freedom; in this case you cannot talk about a global minimum of an individual protein molecule.
@@GeoffryGifari The space of possible conformations for a protein is gigantic, very high dimensional, so even if they were started in a random state, it would be very unlikely for them to fall into the global minimum right away. As it happens, though, they are constructed one amino acid at a time and the enzyme that builds them keeps anything from interacting with the most recently added residues, so there is a systematic way to do it. At each step, the already-extruded portion of the polypeptide finds its own minimum, and that limits the trajectory of the conformation as it grows. Thus, it's always in some kind of local minimum at every step. There are many other factors at play, too. There are chaperone proteins that prevent selected parts from interacting, there is a whole different process for proteins that are supposed to be in a membrane, and on and on. Life never gives you simple, straightforward rules. What stops it from falling to the global minimum is the energy barriers surrounding local minima. That's pretty much the definition of local minimum, surrounded by energy barriers. If the water molecules that are constantly battering it give it enough energy, it can hop over the barrier and fall into the nearest minimum in that direction, so to speak, but there is no dynamical reason for it to progress toward the global minimum at any given moment; it has to reach it by random jostling. It's theoretically possible for it to jump out of the global minimum, too, in the same way, but by definition, the global minimum has the biggest barriers in the whole space, so it's likely to stick around there.
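For anyone who wants to poke at the "stuck behind a barrier" picture numerically, here is a minimal sketch. It assumes a made-up one-dimensional energy landscape (a tilted double well) rather than anything resembling a real protein, and the names `energy`, `gradient`, and `descend` are purely illustrative. Plain downhill motion with no thermal jostling ends in whichever well it started in, even when the other well is deeper.

```python
def energy(x):
    """Toy 1-D landscape: a tilted double well (an assumption, not a protein model)."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def gradient(x):
    """Analytic derivative of energy(x)."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def descend(x, learning_rate=0.01, steps=5000):
    """Plain gradient descent: it only moves downhill, so it never crosses a barrier."""
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x


x_local = descend(1.5)    # starts in the basin of the shallower well (x > 0)
x_global = descend(-1.5)  # starts in the basin of the deeper well (x < 0)

print("local minimum: ", x_local, energy(x_local))
print("global minimum:", x_global, energy(x_global))
```

Adding random kicks to `descend` would be the natural next step for modelling the water molecules battering the chain, but how large those kicks should be is a separate modelling choice that this sketch leaves out.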
@@onebronx Some good points there; I was thinking only of heat denaturing. The result of physical heat denaturing is inevitably the global minimum, and at least when I was in university in 2006, we didn't hear about any proteins coming back from that. If a protein functions at its global minimum, then you can always cook it until it chemically comes apart, but that is another story. Most proteins need to be able to cycle between conformations at low energy scales, e.g. the binding energy of a small molecule, so they don't typically function at the global minimum. It would be pretty contrived if they did, if you think about it. RNAzymes, on the other hand, denature and cease function at _low_ temperatures, rather than high, last I checked, and this happens for the same reason: an enzyme needs to access multiple conformation states to function, so it doesn't function if denied the energy to flop around. There are way more interaction motifs for RNA than for polypeptides, so the energy landscape, while bumpy, doesn't generally have an irreversible global minimum like a protein's, and they can return to function pretty well after being cooked. That's just what I was taught ~2006, so if we know any different now, please show me a citation.
@@davidhand9721 well, again, are we talking about heat denaturing of a single polypeptide, or of a large collective? Those are different environments -- in a collective, large molecules can start interlinking, so there is not much sense in talking about individual global minima. Of course there is always an ultimate global minimum -- the death of the Universe, but we're usually interested in somewhat more "locally global" minima, where your surrounding environment does not kill you if you stretch a bit too much :) For polymers it is even more complicated because of topology -- there are millions of ways to make a polypeptide chain self-entangled like a rope in a washing machine, and it actually can denature in some configuration with a _higher_ energy, simply because it cannot detangle itself from it. Intuitively, a global minimum (or something very close to it) should be reachable if you build the chain slowly and carefully, shaking it periodically to allow some "annealing", allowing it to detangle early while the chain is still short. And the shorter the polypeptide, the more likely it will eventually find its global minimum (in the sense of "global for a single molecule in the pH-neutral isotonic intracellular fluid"). Huge polypeptides can be farther from a global minimum for sure, but may not be so far either, due to evolutionary pressure: the secondary/tertiary structures are evolving and they were naturally selected for as much stability as still allows them to function. Those pretty alpha-helices and beta-sheets, various cross-links and nicely fitting active zones overall create more bonds per unit of length than some randomly aligned segments of the same chain - they evolved for that! And more bonds means less free energy. And then, near the global minimum, you can have some local minima with slightly higher energies, corresponding to different "activated" configurations.
OTOH, the alternative "activated" configurations can be thought of as "global" minima for a system "protein + activator".
This is an insanely educational video. As an ML researcher working on representation learning for multimodal retrieval, I find this insanely helpful and relatable. I think you just gave me a new area to look at now. How exciting, I owe you one.
Everyone involved in this field needs to take a step back and focus on developing empathy before engaging with emerging technologies. The money is big but the forces behind the scenes have nefarious goals. Will you engage ethically going forward? Because there are no positive role models. We are in a battle for humanity's future.
@@blakesmith4879 I agree, the inspiration from PGMs and energy states is interesting and comes closer to formalising emergent properties in biological evolution
John Hopfield wasn't the first to describe the formalism which has been subsequently popularised as "Hopfield networks". It seems much fairer to the wider field and long history of neuroscientists, computer scientists, physicists, and so on to call them "associative memory networks", i.e. Hopfield was definitely not the first/only to propose the network some call "Hopfield networks". For instance, after the proposal of Marr (1971), many similar models of associative memory were proposed, e.g., those of Nakano (1972), Amari (1972), Little (1974), and Stanley (1976), which all have a very similar (or exactly the same) formalism as Hopfield's 1982 paper. Today, notable researchers in this field correct their students' papers to replace instances of "Hopfield networks" with "associative memory networks (sometimes referred to as Hopfield networks)" or something similar. I would encourage you to do the same in your current/future videos. I deeply regret making a similar mistake regarding this topic in one of my earlier papers. However, I am glad to correct the record now and in the future. Refs: D. Marr. Simple memory: a theory for archicortex. Philos Trans R Soc Lond B Biol Sci, 262(841):23-81, July 1971. K. Nakano. Associatron-a model of associative memory. IEEE Transactions on Systems, Man, and Cybernetics, SMC-2(3):380-388, 1972. doi: 10.1109/TSMC.1972.4309133. S.-I. Amari. Learning patterns and pattern sequences by self-organizing nets of threshold elements. IEEE Transactions on Computers, C-21(11):1197-1206, 1972. doi: 10.1109/T-C.1972.223477. W. A. Little. The existence of persistent states in the brain. Mathematical Biosciences, 19(1):101-120, 1974. ISSN 0025-5564. doi: 10.1016/0025-5564(74)90031-5. J. C. Stanley. Simulation studies of a temporal sequence memory model. Biological Cybernetics, 24(3):121-137, Sep 1976. ISSN 1432-0770. doi: 10.1007/BF00364115.
Relative connection spaces are dimensionally agnostic: they don't presuppose a dimensionality for each node in the connection space. This makes them better at tracking large distributions, where a heat map can highlight areas of activity (threshold activations) so you can see the areas that light up when the system is doing a specific task or processing a specific sensory data pattern. This way the dimensionality isn't constrained to a 2D sheet and a predefined curvature manifold, and you can better see the modal transitions of the system via this heat map.
Minor correction: 'Cells that fire together, wire together' was coined by Carla Shatz (1992). Unlike Donald Hebb's original formulation, Shatz's summary of Hebbian learning eliminates the role of axonal transmission delays. By extension, neural networks which remain true to Hebb's original definition should go beyond rate coded models and instead simulate the time delays.
Yes, latent time parameters need to be implemented. A threshold activation heatmap over a parallel distribution of interconnected temporal sequential threads is more descriptive in targeting what he is trying to convey in larger distributions, where the Hopfield computational structure fails. Each thread operates in its own input/output relative connection space and favors specific input sequences over time. Maximum amplification of a sequence (i_1/time + i_2/time + i_3/time...) indicates a highly favored temporal sequence, and (i_3/time - i_2/time - i_1/time...) indicates the least favored temporal sequence (with temporal sequences in between these 2 extremes). Each thread is measured against its threshold (T), amplification (A), a latent timeframe (L), and elapsed time (E) for sequence-coordinated activation. When A exceeds T, the output sent to output partners is calculated as (1 - |L - E|). Favored detection sequences can be defined by an integer for each input within a temporal sequential thread (a mutable trainable param), representing the input's favored position within the temporal sequence. The process can be tuned by sensory reward detections over time, increasing mutation velocity in a direction, changing the param magnitude in that direction, acting on thresholds, and shifting the sequence of most value for each thread. There are more optimizations for this style of learning, such as extending it with threads that mutate other threads based on their activation levels, allowing mutation behavior to be inferred and leveraged by the network. Then it's a matter of reading the heat map to see which parts of the network like doing certain tasks in a specific time slice, and seeing the state transitions of the network when other distributions become active.
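To make the activation rule above concrete, here is a minimal sketch of just the gating step. The function name `thread_output` and its parameter names are hypothetical stand-ins for A, T, L, and E; how the amplification itself is accumulated from the input sequence is not specified precisely enough in the comment to code up, so it is taken as a given number here.

```python
def thread_output(amplification, threshold, latent, elapsed):
    """Output sent to partner threads under the rule described above.

    The four parameters are hypothetical stand-ins for A, T, L and E.
    """
    if amplification <= threshold:
        return 0.0  # the thread stays silent when A does not exceed T
    return 1.0 - abs(latent - elapsed)  # equals 1 when E matches L exactly


print(thread_output(0.9, 0.5, latent=1.0, elapsed=1.0))
print(thread_output(0.9, 0.5, latent=1.0, elapsed=0.75))
print(thread_output(0.3, 0.5, latent=1.0, elapsed=1.0))
```

Note that the rule as written goes negative once |L - E| exceeds 1. Whether that should be clamped to zero or read as inhibition is left open, since the description does not say.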
Learning is a teleological process, syntropic! Potential energy is dual to kinetic energy -- energy is dual. If energy is being conserved then duality is being conserved -- the 5th law of thermodynamics! Integrating information is a syntropic process -- teleological (Sheaf co-homology). Homology is dual to co-homology synthesizes sheaf co-homology. Action is dual to reaction -- forces are dual. "Always two there are" -- Yoda. Convergence (syntropy) is dual to divergence (entropy) -- the 4th law of thermodynamics! Energy minimization is a syntropic process -- teleological.
At the beginning, when you mention the O(n) problem, as a programmer it just intuitively makes you want to use a tree or a hash map lol. In any case, another banger! It's fascinating to see how these things work!
I had the same thought. Though, a hashmap or something similar probably wouldn't work, since many times the key is incomplete or noisy, which would cause the hashing function to return a hash that would map to the wrong index
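A quick sketch of that failure mode, assuming tiny made-up patterns of +1/-1 values (the names `stored`, `hamming`, and `recall_nearest` are illustrative only). An exact dictionary lookup misses as soon as one element of the key is flipped, while a content-based lookup by Hamming distance still finds the right entry. The nearest-match version is a linear scan, so it is only a stand-in for what an associative memory does, not a Hopfield network.

```python
# Two stored "memories", keyed by their full pattern (hypothetical toy data).
stored = {
    (1, -1, 1, -1, 1, -1): "pattern A",
    (1, 1, 1, -1, -1, -1): "pattern B",
}


def hamming(a, b):
    """Number of positions where two equal-length patterns differ."""
    return sum(x != y for x, y in zip(a, b))


def recall_nearest(cue):
    """Return the value whose key is closest to the cue (linear scan over all keys)."""
    best_key = min(stored, key=lambda key: hamming(key, cue))
    return stored[best_key]


noisy_cue = (1, -1, 1, -1, 1, 1)  # pattern A with its last element flipped

exact = stored.get(noisy_cue)       # exact hashing: the corrupted key is simply absent
approx = recall_nearest(noisy_cue)  # content-based lookup tolerates the flipped element

print(exact, approx)
```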
Awesome video, like always. Just a small nitpick: your speaker audio jumps between the left and right audio channel, which is quite distracting - especially with headphones. You can easily solve this by setting the voice audio track to "mono" when editing the video. Cheers
Exceptional pedagogical skill! I’m not able to hold these types of explanations in my mind, so any attempt at following such a web of relations would quickly have me lost. But this is a masterclass in clear considerate communication 🙏
Man, really thankful for your content. I was fascinated by your video about TEM, and started trying to fully understand that network (and memory in general) in my leisure time about a year ago. I learned about latent variables, transformer architecture (fantastic videos by Andrej Karpathy), autoencoders, etc., but got stuck at (modern) Hopfield nets, which I think are super important in the architecture of TEM. Very glad to see that you are starting to touch this field of Hopfield nets; this is probably the best video about vanilla HNs I've ever watched. Really looking forward to your videos about Boltzmann machines and modern Hopfield nets. I always appreciate your videos!
I was reading and watching videos about metacognition and bayesian probability and now you have thrown me into a new rabbit hole! 😅 Your videos are incredible and it's great to have a new one. Thank you!
Potential energy is dual to kinetic energy -- energy is dual. If energy is being conserved then duality is being conserved -- the 5th law of thermodynamics! Integrating information is a syntropic process -- teleological (Sheaf co-homology). Homology is dual to co-homology synthesizes sheaf co-homology. Action is dual to reaction -- forces are dual. "Always two there are" -- Yoda. Convergence (syntropy) is dual to divergence (entropy) -- the 4th law of thermodynamics! Energy minimization is a syntropic process -- teleological. Problem, reaction, solution -- the Hegelian dialectic.
4:00 LOL "The ball doesn't search through all possible trajectories to select the optimal parabolic one." The visualization of the "trajectory space" is even funnier ;) I suspect there's different encodings for proteins with identical function, but which are more robust wrt folding consistently.
I can't recall lyrics at all. But I recall the music itself very clearly, and in the right octave/key, as well as the period of time after which I first heard the song and had an autistic meltdown, listening to it around 5000 times over the course of a few weeks.
One reason I like using Neo4j is that graph networks seem to work a bit like human memory, with links between things making it fast to find related items.
It has always amazed me that I can sometimes recognize melodies I last heard some 30 years ago. Same with video games. I haven't played any Atari video games in the last 35 years, yet the moment I see them on YouTube I remember the sounds and pictures.
This content is so high level, it's almost impossible to tell if it is true or not. Physically and philosophically, I have bought in, but my want of it to be true doesn't make it so. I cannot imagine this is wrong, but where does this come from? This stuff is just crazy and I don't know if it is crazy good or just crazy lol but I'm along for the ride
This was really interesting to listen to, and your intro triggered an intuition I never thought of -- the "weights" in associative memory (which you describe as proteins folding to minimize potential energy) are also designed to reinforce relative to *emotional stimuli*. This is an evolutionary adaptation and makes total sense -- we want to remember the things we care about or that hurt us. What I didn't consider until this video is that this 'emotional weight' also behaves just like a database index, except it's bi-directional and decaying! The surface would be similar to what you displayed, maybe like a paraboloid, where X is time, Y is memory complexity, and Z describes the 'activation energy' required to perform recall. I think it would be interesting to somehow visualize memories on this surface. I don't know if I'm rambling, but this video alone just formed so many connections between things I never thought of!
Thank you. This is unbelievably simple and potentially more accurate than any other speculation. Next step: determining the shape of the proteins and categorizing them. The tools may not be there yet... but a good hypothetical certainly helps to reach certain conclusions.
Thank you so much for creating this video, genuinely one of the most educational I've ever come across. I've been trying to learn more about how brains work since that's always been something I've been very curious about, literally since birth, and with entropy being my favorite physics concept, this video has just led to me googling and researching for the last 4 hours (it's 3am lol) trying to find out more. Really impressed with how complicated, yet still high-quality and clear, some of the topics in this are, and I'm really looking forward to watching the rest of your videos to learn more about how all of this stuff works in such an intricate way
Really good that you’re covering these foundational concepts of NNs. Cross-associative NNs, auto-associative NNs, and unsupervised learning are big missing pieces in today’s NNs.
Great video as always. For 22:50, has anyone tried stacking layers of Hopfield networks yet as a workaround? Basically each layer acts in its own feature-level space and resolves that level's most likely feature, then passes it up to the next higher-order Hopfield feature space to be resolved there. It seems like this would allow you to store exponentially more overall patterns, as they get resolved separately to avoid the overly busy final energy landscape. Also, interestingly, you can see how it would carve out the energy landscape from just the raw inputs with this. You have the Hopfield network constantly comparing itself to some abstraction of the source input layer, meaning the more times a pattern is seen, the stronger it gets in the Hopfield network. Also, for faster convergence, it is likely that the greater the difference between xi and hi, the faster xi updates.
I love being here and asking “huh ?” every two minutes. Stay blessed bro. Love these videos (even though I don’t understand anything I feel like I’m learning a lot)
Very intelligent, as someone who finds this very relevant, the rgb lights add on for pico 2 looks a good way to visualize this a little affordably. I am going to read on the 2016 paper. Geoffrey Hinton is amazing to hear beforehand. Thank you for creating this
The protein example got me thinking. Is there only one unique folded configuration of the lowest energy? Can there be multiple stable configurations anyway, and transitions between them?
Super interesting to see that the Hopfield network basically reinvents binary operations like XOR and XNOR for neurons, with the two differentiated by the sign of the weight.
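A small sketch of that observation, assuming the usual pairwise energy term E = -w * x_i * x_j with states in {-1, +1} (the function names are illustrative). Listing the lowest-energy states of a single pair shows that a positive weight favors agreement (XNOR-like) and a negative weight favors disagreement (XOR-like).

```python
from itertools import product


def pair_energy(weight, x_i, x_j):
    """Energy contribution of one connection, with states in {-1, +1}."""
    return -weight * x_i * x_j


def low_energy_states(weight):
    """All two-neuron states that reach the minimum pair energy for this weight."""
    states = list(product((-1, 1), repeat=2))
    lowest = min(pair_energy(weight, *state) for state in states)
    return [state for state in states if pair_energy(weight, *state) == lowest]


print("w > 0:", low_energy_states(1.0))   # neurons agree: XNOR-like
print("w < 0:", low_energy_states(-1.0))  # neurons disagree: XOR-like
```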
Boolean algebra or propositional logic:- Truth is dual to falsity. Potential energy is dual to kinetic energy -- energy is dual. If energy is being conserved then duality is being conserved -- the 5th law of thermodynamics! Integrating information is a syntropic process -- teleological (Sheaf co-homology). Homology is dual to co-homology synthesizes sheaf co-homology. Action is dual to reaction -- forces are dual. "Always two there are" -- Yoda. Convergence (syntropy) is dual to divergence (entropy) -- the 4th law of thermodynamics! Energy minimization is a syntropic process -- teleological.
So, if a protein chain simply minimizes its potential energy, then how is it possible to have different spatial configurations that actually define a protein's function? Addressing this somewhere around 8:30 may improve the flow of information a lot. Thanks for the really nice video.
3:45 Well, that's really all paradoxes are. The reason people typically consider them to be impossibilities is that they're stuck in the mindset of believing what they first learn, rather than automatically considering that their perspective may be wrong or incomplete.
Great video. Little sad to see that anti-training wasn't mentioned. It doesn't really solve the problem with training two sequences that are 'close' together, so that's fair. But it does help, and has an interesting analogy with physiology. Essentially, if you try to train two sequences that are too close to each other, their valleys will overlap, which means you might aim for one specific sequence and end up falling into the other. And if they're really close, you'll actually create a new local minimum that sits between the two. In those cases, what you can do is identify all your local minima and then run the algorithm backwards, training hills on top of all your local minima. For stand-alone minima, this doesn't matter because they're still local minima. But if a minimum is a false sequence that sits between two or more neighboring targets, this builds a hill in between those neighboring valleys, helping to make those nearby sequences more distinct. As Geoffrey Hinton has pointed out, this has an interesting conceptual analog to dreaming, where we seem to replay experiences and concepts from our day (to a vague extent), and sleeping/dreaming also seems to help with learning and memory. Similarly, making the Hopfield network focus on its memories while playing them backwards, so to speak, helps to solidify its own memory. It may be little more than a metaphorical analog, but I think it's still quite interesting.
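Here is a minimal sketch of that "train a hill on top of a false minimum" idea, under stated assumptions: three hand-picked 8-element patterns, plain Hebbian outer-product weights with no self-connections, and an arbitrary unlearning rate of 0.5. The function names are made up for illustration, and the majority-vote mixture of the three patterns stands in for the false minimum sitting between stored memories.

```python
def outer_no_diagonal(state):
    """Outer product of a +/-1 state with itself, with self-connections zeroed."""
    n = len(state)
    return [[0.0 if i == j else float(state[i] * state[j]) for j in range(n)]
            for i in range(n)]


def hebbian_weights(patterns):
    """Sum of outer products: digs a valley at each stored pattern."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for pattern in patterns:
        update = outer_no_diagonal(pattern)
        for i in range(n):
            for j in range(n):
                weights[i][j] += update[i][j]
    return weights


def energy(weights, state):
    n = len(state)
    return -0.5 * sum(weights[i][j] * state[i] * state[j]
                      for i in range(n) for j in range(n))


def is_stable(weights, state):
    """True if no neuron's local field disagrees with its current sign."""
    n = len(state)
    return all(sum(weights[i][j] * state[j] for j in range(n)) * state[i] >= 0
               for i in range(n))


def unlearn(weights, state, rate):
    """Hebbian update with the sign reversed: builds a hill on top of `state`."""
    update = outer_no_diagonal(state)
    n = len(state)
    return [[weights[i][j] - rate * update[i][j] for j in range(n)]
            for i in range(n)]


patterns = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
# Majority vote of the three patterns: a mixture state that was never stored.
mixture = [1 if sum(column) > 0 else -1 for column in zip(*patterns)]

weights = hebbian_weights(patterns)
unlearned = unlearn(weights, mixture, rate=0.5)

print("mixture stable before:", is_stable(weights, mixture), energy(weights, mixture))
print("mixture stable after: ", is_stable(unlearned, mixture), energy(unlearned, mixture))
print("stored patterns still stable:", all(is_stable(unlearned, p) for p in patterns))
```

In this toy case the mixture is a stable state before unlearning and stops being one afterwards, while the three stored patterns remain stable. With less carefully chosen patterns or a different rate, the unlearning step can also damage real memories, so the rate is very much a tuning choice.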
That’s exactly right! Boltzmann machines, which are an improved version of Hopfield nets, in fact do just that with contrastive Hebbian learning, by increasing the energy of “fake” memories. Hopfield networks, being the first model, don’t have that property in the conventional form though. So we will talk about this in the Boltzmann machines video. Good catch!
@@ArtemKirsanov Ah, gotcha. I didn't realize the idea of 'anti-learning' applied to Boltzmann machines. I've only messed with restricted Boltzmann machines and I always thought of them as stacked reversible auto-encoders. Never thought that the updating method may be replicating the same 'anti-learning' process - though it does make sense since autoencoders are trying to make a bunch of weird, distinct hyper-dimensional valleys. Maybe it's more apparent with the more general Boltzmann machine. Looking forward to that video!
Ironically, within the framework of quantum mechanics, one could actually say that the ball *does* "search" every possible path in order to find the "correct" one. It simply performs the "search" in parallel, not sequentially. And it's less of a search and more of an average of all paths. The principle of stationary action is the driving principle behind Newtonian dynamics and itself follows directly from the interference between many "virtual" trajectories. It turns out that the paths which are close to the "true path" (the classical path) have very little variance in their action, which roughly speaking means that they end with nearly identical phase shifts (e^(iHt/h), Ht ~ action, h = Planck constant) and can interfere constructively, whereas paths which are far from the "true path" have wildly varying actions, even if two paths are similar to each other. So they pick up big phase shifts and end up interfering destructively, leaving only the contributions from the paths "close to" the classically observed path. As far as I know this is the only way to derive the principle of stationary action, and the same basic idea is essential to finding transition probability amplitudes in QFT. It really does seem like the universe simultaneously tries all conceivable paths, superimposed together.
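The "little variance in action near the true path" claim can be checked numerically with a rough sketch. The assumptions here are all mine: a free particle of unit mass going from x = 0 to x = 1 in unit time, a one-parameter family of trial paths x(t) = t + a * sin(pi * t), and a simple discretized action. Nudging the path by the same small amount changes the action far less near the classical straight line (a = 0) than near a strongly bent path (a = 1), which is why neighboring phases line up only around the classical path.

```python
import math


def action(amplitude, n_steps=1000):
    """Discretized free-particle action (unit mass) for the trial path
    x(t) = t + amplitude * sin(pi * t) on 0 <= t <= 1, so x(0) = 0 and x(1) = 1."""
    dt = 1.0 / n_steps
    total = 0.0
    for k in range(n_steps):
        t0, t1 = k * dt, (k + 1) * dt
        x0 = t0 + amplitude * math.sin(math.pi * t0)
        x1 = t1 + amplitude * math.sin(math.pi * t1)
        velocity = (x1 - x0) / dt
        total += 0.5 * velocity ** 2 * dt  # kinetic energy times the time step
    return total


nudge = 0.01
change_near_classical = action(nudge) - action(0.0)   # second order in the nudge
change_far_away = action(1.0 + nudge) - action(1.0)   # first order in the nudge

print("classical action:", action(0.0))
print("change near the classical path:", change_near_classical)
print("change far from it:            ", change_far_away)
```

This only illustrates the stationarity of the action, not the full sum over paths: turning it into an actual interference calculation would mean summing phases over a much richer set of paths than this single-parameter family.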
One could also say this for optics with Fermat's principle or classical mechanics with Hamilton's principle. Even though variational formulations are mathematically beautiful, I'd be cautious to assume that "reality works this way" i.e. "searches through all paths". They are one equivalent description of many (even though it is remarkable that they pop up basically everywhere).
@asdf56790 I agree with your caution. I'm just increasingly convinced that theory and experiment are pointing this way and the only obstacle is our flawed intuition and prejudice. We want there to be only a single, consistent reality. But this forces us into some intense mental and mathematical gymnastics to make the equations of QM fit observations. If you take the equations at face value then you have no trouble, you just have to contend with the idea that reality is not a single line of well defined events, but multiple histories occurring simultaneously and generally able to interfere with each other. An electron passing through a Stern-Gerlach device would then actually travel both paths, in "separate worlds" but these paths can still interfere and superimpose so long as you don't take any steps to determine which path was taken. Like if you redirect the paths to converge into a single path and put the whole thing in a box so you can only see the output, you cannot determine which path was taken and you can show experimentally that the output electron superposition of spin states is preserved. But the universe doesn't know ahead of time (superdeterminism notwithstanding) whether you will take a peek in the box and catch the electron with its pants down. In my view, the explanation requiring the fewest assumptions is that all paths really are taken, but with the assumptions that 1. When we observe a property we can only observe definite values, not superpositions, and 2. paths can interfere (unless decoherence has occurred). A poor and rushed explanation but this is kind of my thought process. As you say, there are many alternative interpretations. It's pretty much philosophy at this point. 😅
What’s the status of all those parallel paths? You’ve used the term "virtual" so you seem to view them as part of some kind of potentiality, not getting actualized in the observer measured reality (I’m using these terms very loosely, I don’t know exactly what they mean). If I’ve understood right, in the MWI there’s no actual convergence on a path, each of the possible parallel paths are actualized paths that get to be part of reality.
@asdf56790 This is true, all subsequent physical theories are simply better and better approximations of reality; we shouldn't assume reality works that way without more experimental and theoretical verification.
Holomorphic functions are independent of path (angle) -- complex differentiable. Potential energy is dual to kinetic energy -- energy is dual. If energy is being conserved then duality is being conserved -- the 5th law of thermodynamics! Integrating information is a syntropic process -- teleological (Sheaf co-homology). Homology is dual to co-homology synthesizes sheaf co-homology. Action is dual to reaction -- forces are dual. "Always two there are" -- Yoda. Convergence (syntropy) is dual to divergence (entropy) -- the 4th law of thermodynamics! Energy minimization is a syntropic process -- teleological.
Join Shortform for awesome book guides and get 5 days of unlimited access! Get 20% off at shortform.com/artem
I have a more streamlined answer to the protein problem. The protein doesn't start folding when it's a complete sequence; it folds as the sequence is being built. This computationally and temporally constrains the degrees of movement, limiting the number of molecular forces at work at any given time. This means that the part of the sequence that has already been constructed is already folded into its low-energy state, and the part that hasn't been built isn't perturbing the current folding stage. The folding process is constrained to occur as sequentially as possible, not in parallel.
This is top notch content, good work.
A threshold activation heatmap over a parallel distribution of temporal sequential threads is more descriptive. Each thread operates in its own input/output relative connection space and favors specific input sequences over time. Maximum amplification of a sequence (i_1/time + i_2/time + i_3/time...) indicates the most favored temporal sequence, and (i_3/time - i_2/time - i_1/time...) indicates the least favored temporal sequence (with other temporal sequences in between these 2 extremes). Each thread is measured against its threshold (T), amplification (A), a latent timeframe (L), and elapsed time (E) for sequence-coordinated activation. When A exceeds T, the output sent to output partners is calculated as (1 - |L - E|). Favored detection sequences can be defined by an integer marking the most favored position within the temporal sequential thread. The process can be tuned by sensory reward detections over time, increasing mutation velocity in a direction, changing the param magnitude in that direction, acting on thresholds, and shifting the sequence of most value for each thread. There are more optimizations to further improve this style of learning, by extending it with threads that mutate other threads based on their activation levels, allowing mutation behavior to be inferred and leveraged by the network as it's trained (it begins to handle its own mutations internally based on inference). Then it's a matter of reading the heat map to see what parts of the network like doing certain tasks, and seeing the state transitions of the network.
how do you pay for the subscription from Russia?
In fact, many proteins function in a _local_ minimum that is _not_ the global minimum. This is why proteins denature irreversibly when exposed to heat; there's an energy barrier that they can never come back from if they cross it.
What's stopping the proteins from folding to the global minimum immediately? And can they spontaneously transition from a local to the global minimum?
They function at their own global minimum, but the global minimum is also defined by the environment, for example pH, or various chemical agents. Also, for some proteins denaturation is reversible ("renaturation") when the conditions for denaturing are removed. Irreversibility is often caused by proteins interconnecting with each other and forming a mesh, losing almost all degrees of freedom; in this case you cannot talk about a global minimum of an individual protein molecule.
@GeoffryGifari The space of possible conformations for a protein is gigantic, very high dimensional, so even if they were started in a random state, it would be very unlikely for them to fall into the global minimum right away. As it happens, though, they are constructed one amino acid at a time and the enzyme that builds them keeps anything from interacting with the most recently added residues, so there is a systematic way to do it. At each step, the already-extruded portion of the polypeptide finds its own minimum, and that limits the trajectory of the conformation as it grows. Thus, it's always in some kind of local minimum at every step.
There are many other factors at play, too. There are chaperone proteins that prevent selected parts from interacting, there is a whole different process for proteins that are supposed to be in a membrane, and on and on. Life never gives you simple, straightforward rules.
What stops it from falling to the global minimum is the energy barriers surrounding local minima. That's pretty much the definition of local minimum, surrounded by energy barriers. If the water molecules that are constantly battering it give it enough energy, it can hop over the barrier and fall into the nearest minimum in that direction, so to speak, but there is no dynamical reason for it to progress toward the global minimum at any given moment; it has to reach it by random jostling. It's theoretically possible for it to jump out of the global minimum, too, in the same way, but by definition, the global minimum has the biggest barriers in the whole space, so it's likely to stick around there.
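A minimal sketch of the "random jostling" picture above, assuming the usual Boltzmann-factor estimate for the chance that a single thermal kick clears a barrier. The function name `hop_probability` and the unit choices are hypothetical, for illustration only.

```python
import math

def hop_probability(barrier, kT):
    """Boltzmann-factor estimate exp(-barrier / kT) for one thermal kick.

    barrier: energy needed to leave the current minimum (same units as kT).
    kT:      thermal energy scale of the surrounding water.
    """
    if barrier <= 0:
        return 1.0  # no barrier in that direction: the move is always allowed
    return math.exp(-barrier / kT)

# A shallow local minimum is escaped far more often than a deep one.
shallow = hop_probability(barrier=1.0, kT=1.0)
deep = hop_probability(barrier=5.0, kT=1.0)
print(shallow > deep)
```

The exponential dependence is why a conformation can sit in a deep well for a very long time even though nothing forbids it from leaving.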
@onebronx Some good points there; I was thinking only of heat denaturing. The result of physical heat denaturing is inevitably the global minimum, and at least when I was in university in 2006, we didn't hear about any proteins coming back from that. If a protein functions at its global minimum, then you can always cook it until it chemically comes apart, but that is another story. Most proteins need to be able to cycle between conformations at low energy scales, e.g. the binding energy of a small molecule, so they don't typically function at the global minimum. It would be pretty contrived if they did, if you think about it.
RNAzymes, on the other hand denature and cease function at _low_ temperatures, rather than high, last I checked, and this happens for the same reason: an enzyme needs to access multiple conformation states to function, so it doesn't function if denied the energy to flop around. There are way more interaction motifs for RNA than for polypeptides, so the energy landscape, while bumpy, doesn't generally have an irreversible global minimum like a protein, so they can return to function pretty well after being cooked.
That's just what I was taught ~2006, so if we know any different now, please show me a citation.
@davidhand9721 well, again, are we talking about heat denaturation of a single polypeptide, or of a large collective? Those are different environments -- in a collective, large molecules can start interlinking, so there is not much sense in talking about individual global minimums. Of course there is always an ultimate global minimum -- the death of the Universe, but we're usually interested in somewhat more "locally global" minimums, where your surrounding environment does not kill you if you stretch a bit too much :)
For polymers it is even more complicated because of topology -- there are millions of ways to make a polypeptide chain self-entangled like a rope in a washing machine, and it actually can denature into some configuration with a _higher_ energy, simply because it cannot detangle itself from it.
Intuitively, a global minimum (or very close one) should be reachable if you build the chain slowly and carefully, shaking it periodically to allow some "annealing", allowing it to de-tangle early while the chain is still short. And the shorter the polypeptide, the more likely it will eventually find its global minimum (in a sense "global for a single molecule in the pH-neutral isotonic intracellular fluid").
Huge polypeptides can be farther from a global minimum for sure, but maybe not so far either, due to evolutionary pressure: the secondary/tertiary structures are evolving, and they were naturally selected for as much stability as still allows them to function. Those pretty alpha-helices and beta-sheets, various cross-links, and nicely fitting active zones overall create more bonds per unit of length than some randomly aligned segments of the same chain - they evolved for that! And more bonds means less free energy.
And then, near the global minimum, you can have some local minimums with slightly higher energies, corresponding to different "activated" configurations. OTOH, the alternative "activated" configurations can be thought of as "global" minimums for a system "protein + activator".
Ladies, gentlemen, and fabulous folks of every flavor, the legend is back!
bro got lost into obsidian css configuration, but now he returns to brain cell
He's something else, manages to make computational neuroscience engaging WHILE not giving up on the details
Fabulous folks of every flavor :) :)
@@terbospeed leftist
Saw this video a month ago and even watched it twice. Just had to come back and drop a comment now that the legend’s got a Nobel under his belt.
This is an insanely educational video. As an ML researcher working on representation learning for multi-modal retrieval, I find this insanely helpful and relatable. I think you just gave me a new area to look at now. How exciting, I owe you one.
He has screwed you over. This goes nowhere.
@blakesmith4879 explain
Everyone involved in this field needs to take a step back and focus on developing empathy before engaging with emerging technologies. The money is big but the forces behind the scenes have nefarious goals. Will you engage ethically going forward?
Because there are no positive role models. We are in a battle for humanity's future.
@blakesmith4879 I agree, the inspiration from PGMs and energy states is interesting and comes closer to formalising emergent properties in biological evolution
John Hopfield wasn't the first to describe the formalism which has been subsequently popularised as "Hopfield networks".
It seems much fairer to the wider field and long history of neuroscientists, computer scientists, physicists, and so on to call them "associative memory networks", i.e. Hopfield was definitely not the first/only to propose the network some call "Hopfield networks". For instance, after the proposal of Marr (1971), many similar models of associative memory were proposed, e.g., those of Nakano (1972), Amari (1972), Little (1974), and Stanley (1976), which all have a very similar (or exactly the same) formalism as Hopfield's 1982 paper.
Today, notable researchers in this field correct their students' papers to replace instances of "Hopfield networks" with "associative memory networks (sometimes referred to as Hopfield networks)" or something similar. I would encourage you to do the same in your current/future videos.
I deeply regret making a similar mistake regarding this topic in one of my earlier papers. However, I am glad to correct the record now and in the future.
Refs:
D Marr. Simple memory: a theory for archicortex. Philos Trans R Soc Lond B Biol Sci, 262(841):23-81, July 1971.
Kaoru Nakano. Associatron-a model of associative memory. IEEE Transactions on Systems, Man, and Cybernetics, SMC-2(3):380-388, 1972. doi: 10.1109/TSMC.1972.4309133.
S.-I. Amari. Learning patterns and pattern sequences by self-organizing nets of threshold elements. IEEE Transactions on Computers, C-21(11):1197-1206, 1972. doi: 10.1109/T-C.1972.223477.
W.A. Little. The existence of persistent states in the brain. Mathematical Biosciences, 19(1):101-120, 1974. ISSN 0025-5564. doi: doi.org/10.1016/0025-5564(74)90031-5.
J. C. Stanley. Simulation studies of a temporal sequence memory model. Biological Cybernetics, 24(3):121-137, Sep 1976. ISSN 1432-0770. doi: 10.1007/BF00364115.
Wow, you cited your sources in a YouTube comment! Thanks for the info.
See also: Stigler's law
Thank you for sharing your knowledge
Wow, thanks for the info!
Relative connection spaces are dimensionally agnostic: they don't presuppose a dimensionality for each node in the connection space. This makes them better at tracking large distributions, where a heat map can highlight areas of activity (threshold activations) to show which areas light up when the system is doing a specific task or processing a specific sensory data pattern. This way the dimensionality isn't constrained to a 2D sheet and a predefined curvature manifold, and you can better see the modal transitions of the system via this heat map.
Minor correction: 'Cells that fire together, wire together' was coined by Carla Shatz (1992). Unlike Donald Hebb's original formulation, Shatz's summary of Hebbian learning eliminates the role of axonal transmission delays. By extension, neural networks which remain true to Hebb's original definition should go beyond rate coded models and instead simulate the time delays.
Yes, latent time parameters need to be implemented. A threshold activation heatmap over a parallel distribution of interconnected temporal sequential threads is more descriptive in targeting what he is trying to convey in larger distributions, where the Hopfield computational structure fails. Each thread operates in its own input/output relative connection space and favors specific input sequences over time. Maximum amplification of a sequence (i_1/time + i_2/time + i_3/time...) indicates the most favored temporal sequence, and (i_3/time - i_2/time - i_1/time...) indicates the least favored temporal sequence (with other temporal sequences in between these 2 extremes). Each thread is measured against its threshold (T), amplification (A), a latent timeframe (L), and elapsed time (E) for sequence-coordinated activation. When A exceeds T, the output sent to output partners is calculated as (1 - |L - E|). Favored detection sequences can be defined by an integer for each input within a temporal sequential thread (a mutable, trainable param), representing the input's favored position within the temporal sequence. The process can be tuned by sensory reward detections over time, increasing mutation velocity in a direction, changing the param magnitude in that direction, acting on thresholds, and shifting the sequence of most value for each thread. There are more optimizations to further improve this style of learning, by extending it with threads that mutate other threads based on their activation levels, allowing mutation behavior to be inferred and leveraged by the network. Then it's a matter of reading the heat map to see what parts of the network like doing certain tasks given a specific time slice, and seeing the state transitions of the network when other distributions become active.
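A rough sketch of the one rule the comment states explicitly: when A exceeds T, the output is 1 - |L - E|. Everything else here is an assumption made for illustration: the function names, the literal reading of "i_k/time" as each input divided by its arrival time, and the choice to output 0.0 below threshold.

```python
def amplification(inputs, arrival_times):
    """Literal reading of (i_1/time + i_2/time + ...): an assumption, since
    the comment does not say exactly how 'time' enters each term."""
    return sum(i / t for i, t in zip(inputs, arrival_times))

def thread_output(A, T, L, E):
    """Output of one thread to its output partners.

    A: amplification, T: threshold, L: latent timeframe, E: elapsed time.
    Returning 0.0 below threshold is an assumption; the comment only
    specifies the case where A exceeds T.
    """
    if A <= T:
        return 0.0
    return 1.0 - abs(L - E)

A = amplification([1.0, 2.0], [1.0, 2.0])       # 1/1 + 2/2 = 2.0
print(thread_output(A, T=1.5, L=0.5, E=0.25))   # 0.75
```

Note that the stated formula goes negative whenever |L - E| exceeds 1; the comment does not say whether the output should be clamped, so this sketch leaves it as is.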
Learning is a teleological process, syntropic!
At the beginning, when you mention the O(n) problem, as a programmer it just intuitively makes you want to use a tree or a hash map lol
In any case, another banger!
Its fascinating to see how these things work!
I had the same thought. Though, a hashmap or something similar probably wouldn't work, since many times the key is incomplete or noisy, which would cause the hashing function to return a hash that would map to the wrong index
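To make the contrast above concrete, here is a small pure-Python sketch: an exact-match dictionary lookup misses when one bit of the key is flipped, while a tiny Hopfield-style network with Hebbian weights restores the stored pattern. The helper names `train_hebbian` and `recall` and the two 8-bit patterns are made up for this example.

```python
def train_hebbian(patterns):
    """Hebbian outer-product weights with a zero diagonal."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j] / n
    return W

def recall(W, state, sweeps=5):
    """Asynchronous sign updates, one unit at a time, for a few sweeps."""
    s = list(state)
    for _ in range(sweeps):
        for i in range(len(s)):
            field = sum(W[i][j] * s[j] for j in range(len(s)))
            s[i] = 1 if field >= 0 else -1
    return s

# Two orthogonal +/-1 patterns act as the stored "memories".
patterns = [
    (1, 1, 1, 1, -1, -1, -1, -1),
    (1, -1, 1, -1, 1, -1, 1, -1),
]
W = train_hebbian(patterns)

# Corrupt the first pattern by flipping its first bit.
noisy = (-patterns[0][0],) + patterns[0][1:]

# Exact-match lookup fails on the corrupted key...
table = {p: idx for idx, p in enumerate(patterns)}
exact_hit = table.get(noisy)          # None

# ...while the network settles back into the stored pattern.
restored = recall(W, noisy)
print(exact_hit, restored == list(patterns[0]))
```

A locality-sensitive hash could tolerate some noise as well; the point is only that an ordinary hash map needs the exact key.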
Thanks for the incredible quality in your videos
This was one of the good ones. I really loved it and hope part 2 comes out soon. Keep up the amazing production, sir.
I wanted to do research on something like this a year or two ago. This is amazing, I've got some work to do with this.
I was thinking about your channel less than an hour ago.
You're back!!! Awesome vid as always.
Finally, you've come back 🎉
A great video, that you clearly put a lot of work into. Really well thought out and explained.
Very cool video, keen to see how the broader argument progresses in this series
Awesome video, like always.
Just a small nitpick: your speaker audio jumps between the left and right audio channel, which is quite distracting - especially with headphones.
You can easily solve this by setting the voice audio track to "mono" when editing the video.
Cheers
Exceptional pedagogical skill! I’m not able to hold these types of explanations in my mind, so any attempt at following such a web of relations would quickly have me lost. But this is a masterclass in clear considerate communication 🙏
Waiting for your video so long. Thank you so much
This is a wildly fantastic video
Man, I'm really thankful for your content. I was fascinated by your video about TEM, and about a year ago I started trying to fully understand that network (and memory in general) in my leisure time. I learned about latent variables, the transformer architecture (fantastic videos by Andrej Karpathy), autoencoders, etc., but got stuck at (modern) Hopfield nets, which I think are super important in the architecture of TEM. Very glad to see that you've started to touch on the field of Hopfield nets; this is probably the best video about vanilla Hopfield nets I've ever watched. Really looking forward to your videos about Boltzmann machines and modern Hopfield nets. I always appreciate your videos!
This was really clear, accurate, and easy to follow. 10/10, would watch again.
That’s a really interesting topic. As a programmer, I’m gonna try to build it myself)
I was reading and watching videos about metacognition and bayesian probability and now you have thrown me into a new rabbit hole! 😅
Your videos are incredible and it's great to have a new one. Thank you!
It sounds like a really useful way to reduce the problem space. Well done!!
Problem, reaction, solution -- the Hegelian dialectic.
Respect for using Coldplay 🔥🔥🔥
Super intuitive! Very well done! I will wait for the video on Modern Hopfield Network :P
4:00 LOL "The ball doesn't search through all possible trajectories to select the optimal parabolic one."
The visualization of the "trajectory space" is even funnier ;)
I suspect there's different encodings for proteins with identical function, but which are more robust wrt folding consistently.
Another great video! I really liked your energy landscape and gradient descent animations especially.
12:21 Watching this to maximise my happiness. Max happiness Min unhappiness, this is the way. Thank you!
I am here again after the Nobel.
I do research in this domain for a living, but this video is always refreshing.
Nobel prize winning video
please try to make more videos, your content is extremely good
One of the best, clearest videos I’ve ever seen EVER❤ 🙏🏻THANK YOU ❤😊
Holy cow, this makes a complex topic so intuitive.
Hopfield networks are amazing! They are studied in physics, biology, machine learning, mathematics and chemistry
The rabbit hole goes extremely deep
one of the most underrated channels.
I can't recall lyrics at all. But I recall the music itself very clearly, and in the right octave/key, as well as the period of time after which I first heard the song and had an autistic meltdown, listening to it around 5000 times over the course of a few weeks.
Buddy, the bundle works great. I recommend it to everyone. Thanks!
Beautifully explained. Thank you
watching this video after the announcement of Physics Nobel 2024 :)
One reason I like using Neo4J is that graph networks seem to work a bit like human memory, with links between things making it fast to find related items.
Much love bro incredible video ❤ thank you
This is really cool! Thanks for your work Artem!
It always amazed me that sometimes I can recognize melodies which I last heard like 30 years ago. Same with video games. I haven't played any Atari video games in the last 35 years, and the moment I see them on YouTube I remember the sounds and pictures.
Hello. It's been long since you last uploaded the last video. I hope you are well.
Best videos, bro. Keep them coming 🎉
So good. Thank you
- Excellent! Thx.
- Very well presented: clear/concise, yet fairly comprehensive - and w/ great visualizations.
- Keep up the great content!...
This content is so high level, it's almost impossible to tell if it is true or not. Physically and philosophically, I have bought in, but my want of it to be true doesn't make it so.
I cannot imagine this is wrong, but where does this come from?
This stuff is just crazy and I don't know if it is crazy good or just crazy lol but I'm along for the ride
This was really interesting to listen to, and your intro triggered an intuition I never thought of--
The "weights" in associative memory (as you describe as proteins folding to minimize potential energy), also have the design of reinforcing relative to *emotional stimuli*. This is an evolutionary adaptation and makes total sense-- we want to remember the things we care about or hurt us.
What I didn't consider until this video is that this 'emotional weight' also behaves just like a database index, except it's bi-directional and decaying! The surface would be similar to what you displayed, maybe like a paraboloid, where X is time, Y is memory complexity, and Z would describe the 'activation energy' required to perform recall. I think it would be interesting to somehow visualize memories on this plane.
I don't know if I'm rambling, but this video alone just formed so many connections between things I never thought of!
Those excitatory and inhibitory connections remind me of a statistical correlation function
Excitatory is dual to inhibitory.
Association is dual to disassociation.
Cause is dual to effect -- correlation.
"Always two there are" -- Yoda.
Thank you. This is unbelievably simple and potentially more accurate than any other speculation. Next step: determining the shape of the proteins and categorizing them. The tools may not be there yet... but a good hypothetical certainly helps to reach certain conclusions.
Great video, love the way of thinking things through
It's a blast to listen and be blown away)
Greetings from Austria, keep doing what you're doing!
This is incredibly well put together.
Thank you for this video!
Awesome. I just can’t wait for the next video!
High quality and useful, thank you.
Thank you so much for creating this video, genuinely one of the most educational I've ever come across. I've been trying to learn more about how brains work, since that's something I've been very curious about literally since birth, and along with entropy being my favorite physics concept, this video has just led to me googling and researching for the last 4 hours (it's 3am lol) trying to find out more. Really impressed with how complicated, yet still high-quality and clear, some of the topics in this are, and I'm really looking forward to watching the rest of your videos to learn more about how all of this stuff works in such an intricate way
Thank you!!
Please keep going. Keep dedicating your time to your pursuit of wonder.
Really good that you’re covering these foundational concepts of NNs. Cross-associative NNs, auto-associative NNs, and unsupervised learning are big missing pieces in today’s NNs.
Great video as always. For 22:50, has anyone tried stacking layers of Hopfield networks yet as a workaround? Basically each layer acts in its own feature-level space and resolves for that level's most likely feature, then passes it up to the next higher-order Hopfield feature space to be resolved there. It seems like this would allow you to store exponentially more overall patterns, as they get resolved separately to avoid the overly busy final energy landscape.
Also, interestingly, you can see how it would carve out the energy landscape from just the raw inputs with this. You have the Hopfield network constantly comparing itself to some abstraction of the source input layer, meaning the more times a pattern is seen, the stronger it gets in the Hopfield network. Also, for faster convergence, it is likely that the greater the difference between xi and hi, the faster xi updates.
I love that you made everything dark mode. (Noticed when I saw the Wikipedia logo)
nice video! thank you for your work
youre the best for making this i love you man
Man, I just love your videos
Fabulous, bravo
I love being here and asking “huh ?” every two minutes.
Stay blessed bro. Love these videos (even though I don’t understand anything I feel like I’m learning a lot)
Very intelligent. As someone who finds this very relevant, the RGB lights add-on for the Pico 2 looks like a good, fairly affordable way to visualize this. I am going to read up on the 2016 paper. Geoffrey Hinton is amazing to hear beforehand. Thank you for creating this.
The protein example got me thinking. Is there only one unique folded configuration of the lowest energy? Can there be multiple stable configurations anyway, and transitions between them?
This is one of the best channels
As always, outstanding video!
Thanks for such amazing content
wow, very insightful, very brilliant.
Can't wait to see this video!!!!!
i want to program this immediately, this is a great sign.
Super interesting to see that the Hopfield network basically reinvents binary operations like XOR and XNOR for neurons, with the two differentiated by the weight.
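A minimal sketch of this observation, assuming the usual -1/+1 neuron states and a single pairwise energy term E = -w * x_i * x_j (the names pair_energy and low_energy_states are made up for illustration). With a positive weight the lowest-energy states are the ones where the two neurons agree (XNOR-like); with a negative weight they are the ones where the neurons disagree (XOR-like).

```python
from itertools import product

def pair_energy(w, xi, xj):
    """Energy contribution of one connection, assuming states in {-1, +1}."""
    return -w * xi * xj

def low_energy_states(w):
    """Return the (xi, xj) pairs that minimize the pair energy for weight w."""
    states = list(product([-1, 1], repeat=2))
    e_min = min(pair_energy(w, xi, xj) for xi, xj in states)
    return [s for s in states if pair_energy(w, *s) == e_min]

agree = low_energy_states(+1.0)     # excitatory weight: neurons prefer to match
disagree = low_energy_states(-1.0)  # inhibitory weight: neurons prefer to differ
print(agree, disagree)
```

The sign of the weight alone decides which of the two relations is energetically favored; its magnitude only sets how strongly.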
Boolean algebra or propositional logic:-
Truth is dual to falsity.
Thank you very informative❤❤
Isn't the 2nd law of thermodynamics more directly linked to entropy? Is there an analog for entropy in the associative memory network?
great video as always
I'm subbing this dude is amazing
Love you man
Another awesome Video :)
So, if a protein chain simply minimizes its potential energy, then how is it possible to have different spatial configurations that actually define a protein's function?
Addressing this somewhere around 8:30 may improve the flow of information a lot.
Thanks for the really nice video.
Basically, a few words on how cells define a proper energy landscape that fits a particular task.
My God, your videos are amazing
The outro music 🙏🏻🙏🏻🙏🏻
Plz make a video about modern Hopfield nets or dense associative memory, because it's a different and more general perspective on Hopfield nets.
Great video
Imagine your favorite song is viva la vida by Coldplay, and you DON’T want to kill yourself. Crazy, right?
Can you do a video on the work of Dmitry Krotov showing that attention mechanisms are a special case of associative memory networks?
Waited for this...!!
Didn't you already upload a video on Boltzmann machines? I thought I saw it last year.
Reminds me of gradient descent
That's because this is it.
your intro is beautiful btw
YouTube channel out there ❤
3:45 Well, that's really all paradoxes are. The reason people typically consider them to be impossibilities is because they're stuck in the mindset of believing what they first learn, rather than automatically considering that their perspective may be wrong or incomplete.
Great video. Little sad to see that anti-training wasn't mentioned. It doesn't really solve the problem with training two sequences that are 'close' together, so that's fair. But it does help, and has an interesting analogy with physiology.
Essentially, if you try to train two sequences that are too close to each other, their valleys will overlap, which means you might try to aim for one specific sequence and end up falling into the other. And if they're really close, you'll actually create a new local minimum that sits between the two.
In those cases, what you can do is identify all your local minima and then run the algorithm backwards, training hills on top of all your local minima. For stand-alone minima, this doesn't matter because they're still local minima. But if a minimum is a false sequence that sits between two or more neighboring targets, this builds a hill in between those two neighboring valleys, helping to make those nearby sequences more distinct.
As Geoffrey Hinton has pointed out, this has an interesting conceptual analog to dreaming, where we seem to replay experiences and concepts from our day (to a vague extent), and sleeping/dreaming also seems to help with learning and memory. Similarly, making the Hopfield network focus on its memories while playing them backwards, so to speak, helps to solidify its own memory.
It may be little more than a metaphorical analog, but I think it's still quite interesting.
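The "training hills on top of local minima" idea can be sketched roughly as follows. This is only an illustration under assumptions that are not from the video: plain Hebbian storage, -1/+1 states, asynchronous updates, and an arbitrary unlearning rate eps. The function names are hypothetical. It relaxes from a random state to whatever local minimum it falls into, then applies one anti-Hebbian step and checks that the energy of that state went up.

```python
import random

def hebbian_weights(patterns):
    """Hebbian storage: W = (1/N) * sum of outer products, with a zero diagonal."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w

def energy(w, s):
    """Hopfield energy E = -1/2 * sum_ij w_ij * s_i * s_j."""
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n))

def relax(w, s, rng, sweeps=20):
    """Asynchronous sign updates until no neuron changes (a local minimum)."""
    s = list(s)
    n = len(s)
    for _ in range(sweeps):
        changed = False
        for i in rng.sample(range(n), n):
            h = sum(w[i][j] * s[j] for j in range(n))
            new = 1 if h >= 0 else -1
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s

def unlearn(w, s, eps=0.01):
    """Anti-Hebbian step: subtract a small outer product to raise the energy of s."""
    n = len(s)
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] -= eps * s[i] * s[j] / n

rng = random.Random(0)
n = 16
patterns = [[rng.choice([-1, 1]) for _ in range(n)] for _ in range(3)]
w = hebbian_weights(patterns)

# Fall into some local minimum from a random start, then build a small hill on it.
minimum = relax(w, [rng.choice([-1, 1]) for _ in range(n)], rng)
e_before = energy(w, minimum)
unlearn(w, minimum)
e_after = energy(w, minimum)
print(e_before, e_after)
```

In this sketch only one minimum is unlearned per call. Repeating the relax/unlearn pair from many random starts would approximate "all local minima", with eps kept small so the intended memories are only slightly raised.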
That’s exactly right! Boltzmann machines, which are an improved version of Hopfield nets, in fact do just that with contrastive Hebbian learning, by increasing the energy of “fake” memories.
Hopfield networks, being the first model, don’t have that property in the conventional form though. So we will talk about this in the Boltzmann machines video.
Good catch!
@ArtemKirsanov Ah, gotcha. I didn't realize the idea of 'anti-learning' applied to Boltzmann machines. I've only messed with restricted Boltzmann machines and I always thought of them as stacked reversible auto-encoders.
Never thought that the updating method may be replicating the same 'anti-learning' process - though it does make sense since autoencoders are trying to make a bunch of weird, distinct hyper-dimensional valleys. Maybe it's more apparent with the more general Boltzmann machine.
Looking forward to that video!
doesn't a principal component analysis solve this problem?
Thanks !
2:16 a* great introduction
Ironically, within the framework of quantum mechanics, one could actually say that the ball *does* "search" every possible path in order to find the "correct" one. It simply performs the "search" in parallel, not sequentially. And it's less of a search and more of an average of all paths. The principle of stationary action is the driving principle behind Newtonian dynamics and itself follows directly from the interference between many "virtual" trajectories. It turns out that the paths which are close to the "true path" (the classical path) have very little variance in their action, which roughly speaking means that they end with nearly identical phase shifts (e^(iHt/h), Ht ~ action, h = Planck constant) and can interfere constructively, whereas paths which are far from the "true path" have wildly varying actions, even if two paths are similar to each other. So they pick up big phase shifts and end up interfering destructively, leaving only the contributions from the paths "close to" the classically observed path. As far as I know this is the only way to derive the principle of stationary action, and the same basic idea is essential to finding transition probability amplitudes in QFT. It really does seem like the universe simultaneously tries all conceivable paths, superimposed together.
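A toy numerical illustration of that interference argument, under heavy simplifying assumptions that are not from the comment: a free particle going from x = 0 to x = X in time T, a one-parameter family of paths x(t) = X*t/T + a*sin(pi*t/T), and units with hbar = 1. For this family the action works out to S(a) = M*X^2/(2T) + M*a^2*pi^2/(4T), so a = 0 is the classical path. The sketch adds up exp(i*S/hbar) over a window of deviations near a = 0 and over an equally wide window far from it.

```python
import cmath
import math

HBAR = 1.0               # units with hbar = 1 (assumption for this demo)
M, T, X = 50.0, 1.0, 1.0  # illustrative mass, travel time, and end point

def action(a):
    """Free-particle action for x(t) = X*t/T + a*sin(pi*t/T), evaluated analytically."""
    return M * X**2 / (2 * T) + M * a**2 * math.pi**2 / (4 * T)

def summed_amplitude(center, half_width, samples=2001):
    """Magnitude of a Riemann-sum estimate of the integral of exp(i*S/hbar) over a window of a."""
    step = 2 * half_width / (samples - 1)
    total = 0j
    for k in range(samples):
        a = center - half_width + k * step
        total += cmath.exp(1j * action(a) / HBAR)
    return abs(total * step)

near = summed_amplitude(center=0.0, half_width=0.1)  # paths close to the classical one
far = summed_amplitude(center=2.0, half_width=0.1)   # equally many paths far from it
print(near, far)
```

Near a = 0 the action barely changes, so the phases line up and the sum is large. Far from it the phase winds through several full turns inside the window and the contributions mostly cancel.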
One could also say this for optics with Fermat's principle or classical mechanics with Hamilton's principle. Even though variational formulations are mathematically beautiful, I'd be cautious to assume that "reality works this way" i.e. "searches through all paths". They are one equivalent description of many (even though it is remarkable that they pop up basically everywhere).
@asdf56790 I agree with your caution. I'm just increasingly convinced that theory and experiment are pointing this way and the only obstacle is our flawed intuition and prejudice. We want there to be only a single, consistent reality. But this forces us into some intense mental and mathematical gymnastics to make the equations of QM fit observations. If you take the equations at face value then you have no trouble, you just have to contend with the idea that reality is not a single line of well defined events, but multiple histories occurring simultaneously and generally able to interfere with each other. An electron passing through a Stern-Gerlach device would then actually travel both paths, in "separate worlds" but these paths can still interfere and superimpose so long as you don't take any steps to determine which path was taken. Like if you redirect the paths to converge into a single path and put the whole thing in a box so you can only see the output, you cannot determine which path was taken and you can show experimentally that the output electron superposition of spin states is preserved. But the universe doesn't know ahead of time (superdeterminism notwithstanding) whether you will take a peek in the box and catch the electron with its pants down. In my view, the explanation requiring the fewest assumptions is that all paths really are taken, but with the assumptions that 1. When we observe a property we can only observe definite values, not superpositions, and 2. paths can interfere (unless decoherence has occurred). A poor and rushed explanation but this is kind of my thought process. As you say, there are many alternative interpretations. It's pretty much philosophy at this point. 😅
What’s the status of all those parallel paths? You’ve used the term "virtual" so you seem to view them as part of some kind of potentiality, not getting actualized in the observer measured reality (I’m using these terms very loosely, I don’t know exactly what they mean). If I’ve understood right, in the MWI there’s no actual convergence on a path; each of the possible parallel paths is an actualized path that gets to be part of reality.
@asdf56790 This is true, all subsequent physical theories are simply better and better approximations of reality, we shouldn't assume reality works that way without more experimental and theoretical verification.
Holomorphic functions are independent of path (angle) -- complex differentiable.
OMG WTF HOW DID YOU KNOW MY FAVOURITE SONG?!!! AHHHHHH