Logical Perceptron

  • Published 19 Aug 2024
  • Exploration of the logic of a perceptron, the impact of its parameters, and how to interpret neural networks in terms of logic as well.
    #SoME2 #mathematics #maths #math #neuralnetworks #perceptron #neuron #deeplearning #machinelearning #logic #truthtable #digitalcircuits
    Music : soundcloud.com...

Comments • 12

  • @RSLT
    @RSLT 2 years ago +5

    Very interesting, great video!

  • @robharwood3538
    @robharwood3538 1 year ago +1

    Great video, especially with the visualizations, e.g. with the hyperplane cut by a line.
    Interesting extension of this idea: If you consider that *multiple-input* Boolean logical gates/functions are non-linear, then you can recover the full range of possible logical gates/functions by including the non-linear terms as additional inputs to the perceptron. For example, for binary Boolean gates, you need not only an input for x1 and for x2, but *also* another input for the 'product' x1x2, where the 'product' (think like multiplication) is exactly equivalent to Boolean AND.
    So, if your perceptron now takes a third input, let's call it x12 (meaning the AND value of x1 ^ x2), then you can have a third weight w12 on this input, and with these *three* inputs, plus the bias, you can recreate XOR like so:
    y(x1, x2, x12) = b + w1*x1 + w2*x2 + w12*x12
    With b=-1, w1=2, w2=3, w12=-6:
    x1  x2  x12 | formula | y
     0   0   0  |   -1    | 0
     0   1   0  |    2    | 1
     1   0   0  |    1    | 1
     1   1   1  |   -2    | 0
    I only used those weights to show that a perceptron can have a variety of weights that generate the same XOR output. But cleaner, simpler values would be b=0, w1=1, w2=1, w12=-2. These give the exact XOR output, no thresholding needed. (See the first sketch at the end of this comment.)
    So, the point is that binary Boolean gates/functions are not fully described by a simple linear operation on two Boolean variable inputs. To make them behave *like* a linear operation, you also need to supply the product (or conjunction, or join) of the two Boolean variables as a third Boolean input.
    This is interesting and perhaps useful, but it quickly runs into a problem. To handle Boolean functions of *more* variables, say 3 or 4 or more, you need not only the conjunction of x1 and x2, but the conjunctions of all *pairs* of basic variables, and *also* all triplets of variables, and all *quadruplets* for 4 or more variables.
    For 2 'basic variables', you need a total of 4 weights (bias, x1, x2, x12). For 3 basic variables, you need 8 weights (bias, x1, x2, x3, x12, x13, x23, x123). For 4 variables, you need 16 weights, etc. For n basic variables, you need 2^n inputs!
    In some cases, with few Boolean variables, you might be able to handle this. E.g. with 8 variables, you could give each perceptron 2^8 = 256 inputs and weights, but for 16 variables, you would need 2^16 = 65,536 inputs and weights per perceptron! That's a bit crazy, and it just gets worse the more basic variables you want a single perceptron to handle.
    However, one beauty of perceptrons is that they themselves are non-linear!
    And, in particular, a single perceptron can take in two inputs, x1, x2, and produce the conjunction (product / join) of those two, i.e. what we earlier called x12.
    And so, if you start with two basic variables, then a *single* layer of perceptrons cannot recreate XOR, but if you add a *second* layer, with first-layer perceptrons producing x1, x2, and x12, then you can combine those three to create XOR in the final output layer!
    There are multiple ways to do this, and you can use more than one internal/hidden layer, but a simple way is to have:
    2 inputs: (x1), (x2)
    2 hidden perceptrons: (x1 . not x2), (not x1 . x2); call these h1 and h2, used as inputs for the next layer
    1 output perceptron: (h1 + h2) = XOR!
    (This version does not require any perceptron to take in more than 2 inputs. Of course, you can achieve the same thing with a three-input perceptron at the end with inputs x1, x2, x12. See the second sketch at the end of this comment.)
    So, my point is that a single layer of perceptrons would require more than two inputs, and those extra inputs would have to be non-linear (e.g. additional conjunctive inputs such as x12), in order to make XOR (or EQU). But you can also get the same non-linear effect by having multiple layers of perceptrons.
    (In the latter case, however, the extra layers of perceptrons would have to learn to generate the correct non-linear functions such as x1.x2 or whatever. So, there may still be good cases to supply the extra conjunctive inputs directly to the first layer.)
    Anyway, it's an interesting exploration!
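
    A minimal Python sketch of the three-input perceptron described in this comment (illustrative only; it simply plugs in the "clean" weights b=0, w1=1, w2=1, w12=-2 mentioned above):

    def perceptron_xor(x1, x2):
        x12 = x1 * x2                      # the non-linear 'product' input (Boolean AND)
        return 0 + 1*x1 + 1*x2 - 2*x12     # b + w1*x1 + w2*x2 + w12*x12, already 0 or 1

    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, perceptron_xor(x1, x2))   # reproduces the XOR truth table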
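
    And a sketch of the two-layer arrangement described above (again illustrative; the step thresholds and weights are one possible choice, not the only one):

    def step(z):
        return 1 if z > 0 else 0

    def xor_net(x1, x2):
        h1 = step(x1 - x2 - 0.5)       # hidden perceptron: x1 AND NOT x2
        h2 = step(x2 - x1 - 0.5)       # hidden perceptron: NOT x1 AND x2
        return step(h1 + h2 - 0.5)     # output perceptron: h1 OR h2 -> XOR

    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, xor_net(x1, x2))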

  • @AkantorJojo
    @AkantorJojo 2 years ago +4

    That's a good video :D
    I especially liked the conclusion at the end, as it made it 'click' for me why neural networks do not perform well on certain tasks.
    That said, I've heard quite a bit that it's now the new hotness to have neural networks that don't just work on a given input, but also "consider" previously processed input(s); that is, for example, extremely useful in object tracking on video and similar applications, where a sequence of data points is (more or less) steadily connected, so their existing connection can be used in processing them instead of relying on extracting all the information from a single point/frame/picture.
    In essence, what I just described amounts to implementing some kind of "clock" in a neural network so it can also use the information of the previous input / timestep / cycle.
    How exactly that is done is the next interesting question :D
    Follow up video? ;)

    • @mathematicalcoincidence5906
      @mathematicalcoincidence5906  2 years ago +1

      Thank you! :)
      You're right; neural networks have actually evolved to address this, as you said, in ways that process the input data multiple times: residual connections, recurrent neural networks, stacks of encoders/decoders in transformers, ...
      And it is a trade-off between keeping the optimization tractable and having access to complex recursive behavior (which a simple feed-forward NN might miss).
      But it is not possible to have both a clock and a statistical optimization mechanism (because the optimization would get stuck in any loop of the NN), so one would have to rethink the NN architecture from scratch, perhaps by isolating the statistical part from the looping and recursive part. I'm still thinking about this, so maybe not the next video, but one day... ;)
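
      A bare-bones Python sketch of the "use the previous timestep" idea discussed above (illustrative only; the weights are arbitrary placeholders, and a real recurrent network would learn them):

      import math

      def rnn_step(x, h_prev, w_x=1.0, w_h=0.5, b=0.0):
          # The new hidden state mixes the current input with the previous state,
          # which is how information from earlier timesteps is carried forward.
          return math.tanh(w_x * x + w_h * h_prev + b)

      h = 0.0                           # initial hidden state
      for x in [0.0, 1.0, 1.0, 0.0]:    # a toy input sequence
          h = rnn_step(x, h)            # h now depends on the whole prefix of the sequence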

  • @kaushalgagan6723
    @kaushalgagan6723 1 year ago +1

    Great work, hoping for more videos

  • @brendawilliams8062
    @brendawilliams8062 2 years ago +2

    It appears to be number factorization and then triangulation, with the hidden layer similar to what may be compared to the mathematics referred to as the vacuum space. Gates are Maxwell mathematically.

    • @mathematicalcoincidence5906
      @mathematicalcoincidence5906  2 years ago

      Thank you for your comment. Can you explain a bit more, please? I don't get the link between Maxwell, gates, and vacuum space.

    • @brendawilliams8062
      @brendawilliams8062 2 years ago

      @@mathematicalcoincidence5906 you are the Educator. I am a dumbfounded person.

    • @brendawilliams8062
      @brendawilliams8062 2 years ago

      @@mathematicalcoincidence5906 I am like a person working on a backroad training to recognize a car. 😂

    • @brendawilliams8062
      @brendawilliams8062 2 years ago

      @@mathematicalcoincidence5906 The triangulation I spend time with is: 1000564416 times 1000799917 with four products. I enjoyed your channel presentation.