Great video, really interesting to see how other fields are tackling the issue of power consumption. From what I understand, it is not a fair comparison to make between conventional neural networks and the human brain. The human brain works on a completely different mode of computation where data storage and computation are unified and signals are carried through spike potentials. Hopefully photonics can be applied to designing analogs for spiking neural networks.
I think there is a bit of an error in this video. The MZI is a passive device which uses a half-silvered mirror to create interference patterns, so there is no voltage applied to the MZI itself. What I suspect you may be referring to is the Kerr effect, where the refractive index changes with applied voltage; if you use it with an MZI, this is likely what gives you the desired properties.
Hi there, I really appreciate your content. Just a side note: I believe you meant 20 or 1 'pico' Joules / MAC and not Peta which would be about 278GWh = 19k households/year?
Cloud Tensor Processing Units (TPUs) Tensor Processing Units (TPUs) are Google’s custom-developed application-specific integrated circuits (ASICs) used to accelerate machine learning workloads. TPUs are designed from the ground up with the benefit of Google’s deep experience and leadership in machine learning.
The part about AlphaGo and how many TPUs were used: it's no wonder I can't find any way to build an AI for StarCraft on my PC at home. Ambitious, but just not gonna happen, it seems. Never mind that, this talk/video was spectacular and incredibly informative. Thank you.
In the challenges section what does John mean when he states that the photonic chips aren't used for training, but only for 'inferences' due to their lower accuracy? Great Presentation btw !
I'm far, far away from this area of study, but I am an EE nonetheless... could they replace the "thin film heater" with a Piezo element on each of those interferometers to slightly deform the one leg? This stuff is so cool.
There's some inconsistency in the argument. At the start you note that most energy is lost on data transfers, yet these are untouched by the photonics; they tackle the multiplication instead. And I can't help but notice that an important part of the photonic circuit is a heater, presumably to affect the length of one of the paths and adjust the interference. So while there seems to be an obvious advantage in the speed of the multiplication itself, it's not clear how much energy, if any, it saves.
One thing that seems to be overlooked in the video is the use of a nonlinear activation function as part of the computation. I do not think matrix multiplies all by themselves give the desired effect.
Great video! Thank you very much for making it! I'm currently working on a research project with ultra-low-precision neural networks. I wanted to ask whether reducing the number of bits in the activations and/or weights to about 2-3 bits each (using state-of-the-art quantization methods) would help with the issues raised in this video for photonic accelerators, namely accuracy and scale? In general, most neural networks these days can be quantized down to 4 bits with almost no loss of performance using the latest quantization methods. So 8 bits might be a bit unnecessary, if these methods are used.
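For readers wondering what low-bit quantization looks like in practice, here is a minimal sketch of uniform symmetric weight quantization in NumPy. The per-tensor scaling and the 4-bit setting are illustrative assumptions, not the specific state-of-the-art methods the comment refers to.

```python
import numpy as np

def quantize_symmetric(w, bits=4):
    """Uniform symmetric quantization of a weight tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1           # e.g. 7 for 4-bit signed values
    scale = np.max(np.abs(w)) / qmax     # one scale per tensor (illustrative choice)
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                     # dequantized ("fake-quantized") weights

w = np.random.randn(256, 256).astype(np.float32)
w4 = quantize_symmetric(w, bits=4)
print("mean abs error:", np.mean(np.abs(w - w4)))
```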
Do photonics require extremely low temps, as qubits do currently? Quantum bits pick up noise from temperature, so those chips work most reliably at a very low temperature, close to 0 K. You end up with a machine that's mostly a multi-stage cooler, with a chip on the tip. Are photonics the same?
I created a neural network that I trained to work as a binary ALU. Even better, for this I trained "cells" which act as logic gates, and I would love to see my data encoded into glass such that it could function as a full-on ALU in light.
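As a toy illustration of the "cells as logic gates" idea, here is a tiny two-layer network with hand-picked weights that behaves like an XOR gate. The weights and step activation are assumptions chosen for clarity; they are not the commenter's actual trained cells.

```python
import numpy as np

step = lambda x: (x > 0).astype(float)

# Hand-set weights: hidden unit 0 computes OR, hidden unit 1 computes AND,
# and the output fires when OR is true but AND is not, i.e. XOR.
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])
W2 = np.array([1.0, -1.0])
b2 = -0.5

def xor_cell(a, b):
    h = step(W1 @ np.array([a, b], dtype=float) + b1)
    return step(W2 @ h + b2)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", int(xor_cell(a, b)))
```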
I see this being a much more viable path to future computing than quantum computers. Even if the chips are substantially bigger, they'll use far less energy and won't require cooling in the same way as traditional transistors. I think it's really exciting and I hope to see this continue to grow and advance!
Check out Xanadu Photonics, squeezed state photons make quantum computing also possible in photonics - although the photodetectors have to be cooled in a liquid oxygen bath.
Congratulations on the very interesting and informative video. However, I guess you probably meant femto- rather than petajoules per MAC. Furthermore, the speed of light is invoked inopportunely both for justifying the very high frequencies and the short computation time. In optical fibres light propagates about 1.5 times slower than in empty space, and in SOI waveguides it is even 2.8-3 times slower; by contrast, the RF or microwave signal in a modulator travels faster. And in general, electricity propagates at a speed comparable to c, because it's the electromagnetic field that propagates, not the electrons in the metal, which, as a whole, drift at cm/h (under DC). The point is that in photonics you use dielectrics like glass, silica or intrinsic Si, so absorption is much smaller than in a conductive material; this would be evident if the same circuit were implemented with microwaves on a microstrip. However, the problem with metal connections at high clock rates in digital circuitry is that you have to charge and discharge the parasitic capacitance of those lines. About the second misconception: an electrical circuit would be much slower than the MZI mesh because of its RC time constants. It's not that the electric signal propagates more slowly, it's that the transient is much longer. Regarding the 1980s Bell Labs research you mentioned, I guess they made an optical computer; I doubt (but I should check) that it was based on optical transistors, as that technology is still at the proof-of-concept stage. However, this does not change your point, that announcements about silicon photonics neural networks replacing TPUs must be taken with caution.
Have they tried encodings that use transitions rather than levels? I'd guess probably so, but going off the graphics in the video it makes me wonder. For instance, no transition = 0, transition = 1, and the level is never really looked at, just the edges. Or pulse of light = 1 and nothing = 0. Probably both cases need a start byte for synchronization. The pulse idea has the issue that the switching speed must be way, way faster than with the previous techniques; it's more often used for low-speed power-line communications, e.g. DALI (the same wires that power the device can be shorted together momentarily for communication, while the power rails filter out the communications with L and C). There's also positive edge = 1, negative edge = 0; that also requires switching much faster on the TX side, but it's easy on the RX side and less error-prone. I'd assume error-correcting codes were used? Also, possibilities like QAM constellations can boost bandwidth substantially, but SNR must also increase. You could even construct constellation points in more than 2 dimensions with multiple light frequencies on the same line. Maybe it's a trial-and-error thing after these results, figuring out what schemes make the systems fastest, I suppose.
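To make the transition-based idea concrete, here is a rough sketch of one such scheme (NRZI-style: a transition on the line means 1, no transition means 0). The framing and synchronization details mentioned above are deliberately omitted; this is only an illustration of the encoding itself.

```python
def nrzi_encode(bits, start_level=0):
    """Transition = 1, no transition = 0. Returns the line level for each bit."""
    levels, level = [], start_level
    for b in bits:
        if b:                 # a 1 flips the line level
            level ^= 1
        levels.append(level)
    return levels

def nrzi_decode(levels, start_level=0):
    out, prev = [], start_level
    for lvl in levels:
        out.append(1 if lvl != prev else 0)
        prev = lvl
    return out

data = [1, 0, 1, 1, 0, 0, 1]
line = nrzi_encode(data)
assert nrzi_decode(line) == data
print(line)
```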
Hmm, I get the hunch that these chips will go the way of transputers: a nice idea at the time, but I feel something far smaller is around the corner. There is a Chinese researcher who is doing it at the molecular scale using crystal lattices, doping them with different atoms in order to change their topological properties. The light therefore behaves according to the structure of the lattice. You can use entangled photons to send signals, and the photons themselves are in a squeezed-light state. I understand the doping is done using a femtosecond laser. I'm a little unclear on the details, but it is the leading edge in photonic computers. The work is published in respectable journals but still highly experimental.
Lightmatter has a talk about doing this at wafer scale coming up in two weeks at Hot Chips. I hope you can tell us what they're up to! (And Ranovus).
A small note, a petajoule is the amount of energy unleashed by a 250 kt nuclear bomb. You probably mean femtojoule. :) The structure of feedforward deep neural networks is unfortunately very sensitive to computation error which is why typically these often employ at least 32-bit floating point arithmetic. Backpropagation of these networks to update weights through many layers can result in cumulative error which limits model performance. For optical scaling operations, there are additional error sources due to quantum detection fluctuations, flaws in the optical system that cause scattering and coherent noise, sampling and quantization error, not to mention power consumption from electro-optical interfaces that can be quite substantial. There may be neural networks for which optical scaling operations are suitable, however, the conventional feedforward deep neural network, because of its reliance on precision matrix multiplication operations so that backpropagation can be performed using the adjoint operation, is going to be quite challenging. There are plenty of ideas and simulations floating around for this but very little in the way of actually attacking the real issues surrounding optical neural network implementations, just mostly hype.
I don't think anyone is interested in training on photonic accelerators, it's all inference. Quantization is very commonly employed to make inference cheaper, which results in errors similar to photonic accelerators, though smaller in magnitude (IIRC current photonic accelerator designs get 2-4 bits of precision, classical inference accelerators are typically in the 8-16 bit range). So I think most of what you're saying here is a non sequitur with regard to the published research.
@@taktoa1 Run something as simple as MNIST on an optical accelerator and get 99% accuracy and then we'll talk. The key with digital quantized neural networks is that despite being quantized they are also deterministic; that is, given an input, the output is the same each time, as there is no measurement noise. Therefore if you train with quantization error, the network can learn that error. However, analog physical systems have measurement error. It's not just that the optical system achieves the "equivalent" of 2-4 bits of precision; it's that no matter how many average photons are used to represent a signal, there are going to be measurement outliers. Due to the nonlinear operations of ReLU and max-pooling, outliers caused by measurement error can accumulate across deep neural network layers. So it seems to me that having many deep layers and nonlinear operations like ReLU and max-pooling makes it extremely difficult for an analog multiplier, especially one susceptible to quantum noise, to produce reproducible, reliable inference. Because of the extreme sensitivity of feedforward neural networks to cumulative error, if training is performed digitally for inference that is to occur on an analog/optical computer, the training model must be extremely accurate, including the effects of quantization, noise sources (Poisson, thermal, coherent noise), system manufacturing error, etc., and even then the variation due to measurement error may limit the ultimate inference accuracy. It may be necessary to train a neural network for each physical system, because the manufacturing tolerances of two different optical chips may be too different for a network trained on one chip to work on another. Biological neural networks seem to work quite effectively without being deterministic, despite being implemented on analog wetware. Deep feedforward neural networks seem like a poor fit for analog computing, especially quantum-noise-limited computing, where the power consumption is directly influenced by the number of photons required to achieve a certain SNR due to Poisson noise (SNR being proportional to the square root of power, and so increasing only slowly with increased power consumption). Even other solutions that use electric charge (mythic.ai/), with similar charge-quantization problems, are limited in the number of layers they can implement. The whole reason feedforward deep neural networks were created in the first place is that backpropagation is possible using a bit of clever calculus and the chain rule. Training is the problem: if you don't have some other kind of neural network, resistant to measurement error, that you can effectively train, analog computation is not going to be a viable solution for neural network inference. Neural network accelerators like the tensor processor have sucked all the air out of the room for research into any other kind of neural network architecture, and as long as this is the case, the market will not care about analog computers, because the current feedforward deep neural networks were created for deterministic, digital machines.
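The determinism point can be made concrete with a small self-contained simulation. Everything here is an assumption for illustration (random weights, no real dataset, a Gaussian noise model with an arbitrary scale): a deterministically quantized network returns the same answer on every run, while a network with fresh per-run "measurement" noise at each layer can disagree with itself across repeated inferences.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(x, 0)

# A toy 4-layer network with random weights (illustrative, not a trained model).
layers = [rng.standard_normal((64, 64)) / 8 for _ in range(4)]
x = rng.standard_normal(64)

def forward(x, noise_std=0.0, quant_step=0.0):
    h = x
    for W in layers:
        h = W @ h
        if quant_step:                        # deterministic rounding, same every run
            h = np.round(h / quant_step) * quant_step
        if noise_std:                         # fresh analog "measurement" noise each run
            h = h + rng.normal(0, noise_std, h.shape)
        h = relu(h)
    return np.argmax(h)

digital = {forward(x, quant_step=0.05) for _ in range(100)}
analog  = {forward(x, noise_std=0.05) for _ in range(100)}
print("distinct outputs, deterministic quantization:", len(digital))  # always 1
print("distinct outputs, per-run noise:", len(analog))                # may be > 1
```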
Correction: high-speed transmission lines also carry electrical signals at nearly the speed of light... electronic circuits are physically electromagnetic waves; it's just that much of the time circuit theory is an adequate, very simplified tool. Meanwhile the overall movement of the electrons themselves, i.e. electron drift, is incredibly slow (shockingly slow if you didn't know already). Typical FR4 dielectric on a PCB slows the waves by roughly a factor of 2. Fibre optics also slow the light; how much depends on the material.
Considering less heat is generated, in fact almost none at all, wouldn't it be possible to stack the dies vertically? The yield rate would have to be very high.. but you could stack thousands of these chips on top of each other, correct?
What would be pretty wild would be using both time offsetting and wavelength multiplexing to increase throughput. If I understand it, it would be like light based hyperthreading, except you could do 3, 4, or more threads all independently. I guess it would just rely on how passive the structures would actually be.
I worked on the datacom side of this, on photonic switches. The thing about wavelength multiplexing (WDM) when used with MZIs is that crosstalk can be a killer, depending on the MZIs used. Depending on the interconnect topology used for the MZI mesh, crosstalk can cascade through the mesh, ultimately raising the "noise" level beyond practicality. This also inhibits scaling these meshes out, as you can imagine!
At ~3:30 you say the image data is matrix-multiplied by the weights, which is true for the weights in the first layer, but a matrix multiply is done at each NN layer. Not a big deal of course, but maybe this comment prevents some future confusion for some folks.
Something that most photonic-accelerator proponents, whether in academia or industry, often ignore is the power consumed in the LASER!!! They often leave it out to make their efficiency numbers look hot!
"Old" silicon can still compete well into the future. And if we replace silicon with another material (yes, that has been in research for 10+ years), we could get the same result with less electricity, and thus faster and more efficient computers. The real fun begins when scientists achieve room-temperature superconductivity; that would enable computers running much, much faster than current ones while using close to zero electricity (as superconductivity would allow electrons to flow with no resistance, so electricity would be used solely for the calculations/data movement through the material).
When Deep Blue beat Garry Kasparov, some wag said: "Sure, but how did it do in the post-game interview?" Probably wouldn't be hard to train a neural network to give trite answers to trite questions, with a few quips thrown in. "Mr Deep. Can I call you Deep, or do you prefer Blue". "Whichever you like." "OK Deep, how do you think Mr Kasparov played?" "Pretty well - for a human.". "Why didn't you take his pawn at move 35?" "It wins at depth 6, but loses at 16. Humans are so slow."
I almost commented a few videos ago that you have single-handedly staffed all US semiconductor fabs with engineers for the next 10 years just by posting. Happy to see you grow so much, even in your own niche, without clickbait.
"You have single-handedly staffed all US semiconductor fabs with engineers for the next 10 years just by posting." THE SAME THOUGHT OCCURRED TO ME. BY PRODUCING THESE VIDEOS HE IS OPENING UP A WORLD OF OPPORTUNITIES FOR OTHERS TO SEE AND CONSIDER.
This video is sponsored by chipactVPN
I'm getting my comp eng degree rn
Definitely, one of the topics I've always been interested in but never had any good source of information about; if I were younger I'd strongly consider partaking in it.
As an ECE sophomore in college, I just wanted to say that you're playing an amazing role in developing the next generation of semiconductor engineering! :)
A big Thank You to Alex Sludds too (from grateful audience)!
@Nobody Important that's none of your business
@Nobody Important Time travel.
@Nobody Important The video may have been on private before?
@@outerspaceisalie Indeed. Should we take him out? He might have learned too much...
FWIW, that observation that IO consumes more power than MAC operations in an AI accelerator is pretty universal across problem domains. I often quip that it's a silly accident of history that we call the metal boxes "computers" since almost none of the power, gate count, mass, etc., is actually used directly for computation. Most of computing in-practice is about getting the right data to the right place at the right time.
I have a patent on internet-scale CDN configuration. But it's all the same at every scale. Pushing configuration data across the globe to the right server. Pushing weight data across the chip to the right IO pin. The memory/storage hierarchy instantly becomes the constraint as soon as you try to scale compute at any scale, in any domain. The ideas driving photonic compute for AI will be directly applicable to more seemingly mundane use cases.
It always comes down to logistics and thermodynamics, humanity's two biggest nemeses, doesn't it?
It was not always so in the past.
Leakage could also be a major contributor.
But it was always known that the wires and power density would become the main problem, both in power and delay.
Electrical signals are essentially sent at the speed of light (because it's not the charge carriers that transmit the signal, it's the electric field), so it's not the signal propagation speed that allows high throughput, it's the ability to distinguish signals. Electric fields get "smudged" along the way, but so do optical signals; the latter just smudge remarkably less. Also, electrical logic gates take some time to transition from one state to another, and that's the major factor limiting throughput. If there were faster-switching transistors, higher frequencies would be available. I don't know much about photonics, but it seems that for it the transition is either much faster, or the architecture is completely different: instead of switching state, the light is split along the way and goes through preconfigured logic gates, so processing is faster while it goes through the same transformations, but it takes some time to switch from one configuration to another. But there's a possibility that the same results could be achieved using electronic components.
I wouldn't say electrical signals travel at "essentially the speed of light." That applies to maybe radio waves in free space. But velocity factor/wave propagation speed is typically ~64% the speed of light (Cat 5 data cables) to ~90% the speed of light (RF signals). Without taking insulation into account which reduces VF further. Even if the jump is from 90% to 99% the speed of light, that optimization would result in huge improvement. But like you said, it's about ability to distinguish signals, the accuracy of detection at the receiving end. Without that it's unusable.
@@davidb5205 True, but overcomplicated; that's why I said "essentially" - because it's still multiple orders of magnitude faster than the charge carriers move. Also, in photonics light travels significantly slower than c too, because it does so in a medium (glass, or whatever), which slows down electromagnetic wave propagation.
If gate transition time is the bottleneck, shouldn't FPGAs or static circuits (simple wires) be free of this limitation? Or can you not implement matrix multiplication without using gates?
@@XCSme FPGAs aren't simple wires; they too use transistors and switch state each clock cycle based on input. Static circuits... well, as long as they do not use any capacitors and do not have too much capacitance or inductance of their own, they'd be lightning fast, but as useful as a plain wire or resistor, unable to compute anything.
One could say that a transistor is a teeny-tiny capacitor. And transistors take time to charge. The less capacitance they have, the faster they charge. And although modern nanometer-scale transistors have negligible capacitance, they still have it, and they need some time to charge or discharge. If you have tons of transistors but they're mostly in parallel, then you can raise the clock to maybe tens of GHz and it would be fine, but you won't be able to do many operations per cycle, only basic ones. If, on the other hand, you have the same number of transistors interconnected with each other like in a CPU, then your frequency is limited by the last one in the chain - you need to be sure that each one in the series before it had enough time to charge/discharge, otherwise the last one could end up in an incorrect state.
@@aberroa1955 Thanks a lot for the response! I realized that my initial comment was a bit stupid in suggesting an FPGA without logic gates, as that's what the G stands for in the acronym...
That being said, is there really no way to compute anything without using transistors? What if you use the voltage value as the output? Let's say you have to do an addition: if you feed 0.5V and 0.3V at the inputs and link them in series, you should get 0.8V (maybe this is just an analog computer? en.wikipedia.org/wiki/Analog_computer ).
Also, for example, division could be done by adding resistors in parallel: let's say you want to divide by 3, you feed 1V at the entrance and then have 3 resistors/paths; to get the input divided by 3, you could measure the output voltage of one of the paths.
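A tiny numeric sketch of the divide-by-3 idea, with one caveat as an assumption: a voltage divider needs the resistors in series (parallel paths split current rather than voltage), so the example below uses a series chain and measures across the bottom resistor.

```python
# Three equal resistors in series: measuring across one of them gives Vin/3.
def divider_output(v_in, r_values, tap_index):
    """Voltage across r_values[tap_index] in a series chain driven by v_in."""
    return v_in * r_values[tap_index] / sum(r_values)

print(divider_output(1.0, [1000, 1000, 1000], tap_index=2))  # ~0.333 V, i.e. 1 V / 3
```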
I work in a research group which develops simulation tools for these photonic circuits. This video was very well explained. I can't wait to see what photonic circuits will be used for in the future. Thanks for making this video!
What types of tools do you develop?
He said at the end that a 1D row of interferometers can perform like a 2D array by using time instead; would the same principle apply for 2D to 3D, if the accuracy for 2D can be improved?
@@raphaelcardoso7927 We develop tools that leverage artificial neural networks to simulate the performance of photonic devices. All of our software is free and open source.
@@Soken50 I am not sure, since I do not deal with the theory sort of stuff; my work is mostly on the software development side of things.
Hi can we connect on LinkedIn ??
Thanks!
Wow, your videos just get better and better. As I watched this I kept having flashbacks to my university math/physics discussions on matrix mechanics of more than 50 years ago, and realizing that those concepts remain important in today's world.
A few mistakes:
1. ML typically consists of many matrix-vector multiplication steps, not matrix-matrix multiplications.
2. At 5:02 you meant picojoules, not petajoules
3. As I understand it (not an expert in photonics, though I have worked on an ML accelerator), for a given level of accuracy a photonic matrix-vector multiplication circuit will consume more power than a digital one, mostly because of the digital-to-analog and analog-to-digital steps. So I think it's somewhat misleading to say that power is not the problem.
4. I think the last point about replacing one of the axes with time is also misleading. That can be done for any circuit ("time-multiplexing") and will proportionally decrease throughput. So it's far from a solution to the density problem.
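A minimal sketch of what point 4 describes: a single physical "row" of multiply-accumulate units reused over successive time steps to compute a full matrix-vector product, so an N x N product takes N cycles instead of one. Names and shapes are illustrative assumptions only.

```python
import numpy as np

def matvec_time_multiplexed(W, x):
    """Reuse one 1-D row of MAC units over len(W) time steps.
    Each 'cycle' loads one row of W and produces one output element,
    so throughput drops in proportion to the number of rows."""
    y = np.zeros(W.shape[0])
    for t, row in enumerate(W):      # one time step per output element
        y[t] = np.dot(row, x)        # the single physical row of multipliers
    return y

W = np.random.randn(4, 4)
x = np.random.randn(4)
assert np.allclose(matvec_time_multiplexed(W, x), W @ x)
```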
1. A 1xn matrix is a vector so eh... plus if you do batch learning you end up with true matrix-matrix products
yeah, but I still feel like mentioning matrix-matrix multiplication is going to confuse the average viewer more than illuminate, compared to matrix-vector. most ML accelerators are built to accelerate matrix-vector products (e.g.: they use weight stationary systolic arrays). this is because accelerators rarely have the memory bandwidth to support matrix-matrix products at full throughput; they require the higher operational intensity of the static matrix/dynamic vector product.
2. 28 orders of magnitude is a lot
Petajoules in a chip sounds fun
Convolutions are typically expressed using im2col, which makes them an instance of the matrix-matrix multiply. They are extremely common in vision-based applications, so I think the statement is absolutely justified!?
I would consider the questions whether a matrix-matrix product is decomposed into matrix-vector multiplications in a given accelerator an implementation detail, rather than an inherent feature of the underlying problem.
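For readers unfamiliar with the trick mentioned above, here is a compact im2col sketch showing how a 2-D convolution becomes a single matrix-matrix multiply. Stride 1, no padding, and a single input channel are simplifying assumptions for illustration.

```python
import numpy as np

def im2col(img, k):
    """Unroll every k x k patch of a 2-D image into a column."""
    H, W = img.shape
    cols = [img[i:i+k, j:j+k].reshape(-1)
            for i in range(H - k + 1) for j in range(W - k + 1)]
    return np.stack(cols, axis=1)          # shape (k*k, num_patches)

img = np.random.randn(6, 6)
kernels = np.random.randn(3, 3, 3)          # 3 output channels, 3x3 kernels
K = kernels.reshape(3, -1)                  # (3, 9)
out = K @ im2col(img, 3)                    # one matrix-matrix multiply
print(out.shape)                            # (3, 16): 3 channels, 4x4 spatial positions
```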
At 5:04, did you mean to write picojoule instead of petajoule?
You did a really good job on this one, man. That's no small feat, bravo
1) Electrical signals also travel at the speed of light (the speed of light inside the conducting material); the signal is transmitted by photons. The main limiting factor of electronic computers is the capacitances inside them. The most basic one is the capacitance of the FET gates: for a FET to function, the gate needs to reach the desired charge, and although this is getting smaller and smaller with the new nano-scale transistors, it is still there. The same applies to discharging those capacitances, which still takes time and also dumps all of their energy into heat.
2) The speed of light is a limiting factor even for photonics: a 4 GHz chip, something that might be in a modern computer, has a period of 0.25 ns between clock cycles, and light can only travel 75 mm in that period, and that is the best case (in vacuum). A theoretical 40 GHz photonic computer would have a 0.025 ns (25 ps) period, and light would only be able to cover 7.5 mm. This means that even in a 40 GHz chip, the maximum distance for the datapath inside a computation core is 7.5 mm in the best case. Photonic computers working at terahertz are almost certainly sci-fi. And of course this type of CPU, with such a small distance covered between clocks, will have very big memory bottlenecks (time in cycles for data to be stored/recovered from memory) and will require the memory to be very, very close to the chip.
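The arithmetic in point 2, spelled out. The vacuum speed of light is assumed, as in the comment; in a real waveguide the reach per cycle would be shorter still.

```python
c = 299_792_458                      # m/s, vacuum

for f_ghz in (4, 40, 1000):          # 4 GHz, 40 GHz, 1 THz clocks
    period = 1 / (f_ghz * 1e9)       # seconds per clock cycle
    reach_mm = c * period * 1e3      # how far light travels in one cycle, in mm
    print(f"{f_ghz:>5} GHz clock: {period*1e12:6.1f} ps period, {reach_mm:6.2f} mm per cycle")
```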
This.
@@tf_d I just noticed a mistake in my comment. I said that the datapath on a 40 GHz core would be 7.5 mm. In reality datapaths are normally pipelined, so each individual stage of the pipeline would be limited to that length (this totally applies to electronic computers).
Pipelines on CPUs today are around 8-20 stages long.
I'm not sure if pipelining would work on photonics, and I think there would need to be electronic circuits between the stages anyway.
@@billwhoever2830 I don't see why pipelining wouldn't be possible with photonics, they're technically able to do anything that an electronic circuit can.
Great work Jon!
thanks to you, he really made light work of this topic!
Thank you Alex, all the best blessings to you and yours.🌟🌟🌟🌟🌟🌟🌟🌟
Excellent work! My company is one of those working on photonics/quantum compute InFlight as optical networks transit the world. Though quite different, great progress has been made.
Quick posts! Really enjoying your silicon rabbit hole.
6:28 High bandwidth is not due to the physical transfer speed; in fact, electricity also moves at close to the speed of light. Bandwidth is usually determined by how many bits per transfer and how many transfers per second. A normal GPU transfers a couple hundred bits at a few GHz.
It's the speed of light through a medium...
Switching to photonics removes the need for conductive metals and voltage transformation.
Light TX/RX is a lot simpler, and the medium is much clearer, so the speed of light in that medium is higher.
Small correction (I may be nitpicking): electricity can transmit data at 50%-99% of the speed of light.
Silicon Photonics represent a quantum leap in technological speed and power efficiency. One major issue when dealing with light is the fact that you're reading the probabilities of the light waves. You run into quantum mechanics at this level. Light is sensitive to interference of the environment through quantum decoherence. I believe there will be a solution to this problem as our understanding of quantum systems evolves.
@@rufushawkins3950 Very simplified it means something is quantifiable, as in a photons energy is discrete in a way.
@@rufushawkins3950 When used as a noun it means small, when used as an adjective it means big. English!
Geometry is the key to solving the quantum mechanics problem
Wow, there is so much wrong with this one small comment that it would take an essay to pick it apart.
How does this interference from the environment behave? Can it be algorithmically modeled?
If so, I believe it might be possible to create a noise generator which mimics this behavior during neural network training. This can help "robustify" the neural networks, to prepare them for inference on such optical devices.
This can provide a software rather than hardware approach to mitigating the accuracy issue.
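A minimal sketch of that noise-injection idea: a "noisy layer" used in the forward pass during training so the network learns to tolerate the analog error model. The Gaussian noise model and its scale are placeholder assumptions; the real device noise would need to be characterized first.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_linear(x, W, noise_std, training=True):
    """Linear layer that mimics an analog/optical matrix multiply during training.
    On the real optical hardware the noise comes from the device itself,
    so it is only injected here when training=True."""
    y = W @ x
    if training:
        y = y + rng.normal(0.0, noise_std, size=y.shape)   # placeholder noise model
    return y

W = rng.standard_normal((10, 32)) * 0.1
x = rng.standard_normal(32)
print(noisy_linear(x, W, noise_std=0.02).shape)   # (10,)
```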
Have you heard of BrainChip's Akida chip? Currently in production.
Akida is a neuromorphic system on a chip designed for a wide range of markets from edge inference and training with a sub-1W power to high-performance data center applications. The architecture consists of three major parts: sensor interfaces, the conversion complex, and the neuron fabric.
Akida incorporates a Neuron fabric along with a processor complex used for system and data management as well as training and inference control. The chip efficiency comes from their ability to take advantage of sparsity with neurons only firing once a programmable threshold is exceeded. NNs are feed-forward. Neurons learn through selective reinforcement or inhibition of synapses. Sensory data such as images are converted into spikes. The Akida NSoC has neuron fabric comprised of 1.2 million neurons and 10 billion synapses. For training, both supervised and unsupervised modes are supported. In the supervised mode, initial layers of the network are trained autonomously with the labels being applied to the final fully-connected layer. This makes it possible for the networks to function as classification networks. Unsupervised learning from unlabeled data as well as label classification is possible.
Thanks for mentioning the paper! I knew I recognized this, and when you showed it I realized it was 4 years since I read it.
I watched a YouTube video called "The next big step in computing" by Anastasi; she mentions how they are trying to use light in an analog form, with different intensities, as a new way to compute. Not as in-depth as here, but still over my head.
Happy to know that I am not the only one following her
Oy, another Anastasi follower, nice
I have recently seen a great video on why our AND/OR gates will always dissipate energy.
The answer "boils down" to entropy.
Depending on how far into theory one wants to dabble this might be pretty interesting content.
Can you give a better set of keywords or a full title?
@@SianaGearz why pure information gives off heat
By up and atom
@@stefanklaus6441 Watched it yesterday, great video
Gates are used to make "bits" interact and potentially effect a change of state. That change will of course necessitate a certain amount of work; however tiny it is, we can't get a system to change states without expending energy somewhere.
I'm really glad to see this tech being mentioned more and more.
Excellent channel. Objective, serious and extremely informative.
Channels like these are what make YouTube great. Not those bonehead vloggers.
"Photonic Neural Networks" That a yummy combo of words. I hope this video doesn't disappoint.
Awesome video. Another reminder of why I’m subscribed. 👍🏼
This technology is really cool. It seems like the use case to make this commercially viable is training massive neural networks rather than inference. It’s the training that is computationally expensive and requires stupid amounts of computing power. That’s a challenge that needs to be solved. Inference on the other hand is trivial by comparison. Almost every smartphone these days has a built in neural engine that can run inference in real time at less than a watt for relatively simple problems, and even moderate to large problems can be run through inference on a traditional modern CPU with no dedicated matrix multiplier.
I wonder if, were the photonics cheap enough and small enough, the accuracy could be improved by running the same calculation multiple times and averaging. Though the extra electronics and redundancy might offset any gains made…
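A quick sanity check of the averaging idea, under the assumption that the error behaves like independent noise from run to run: the residual error shrinks roughly as 1/sqrt(N) with N repeats, so the gain is real but slow. The noise model and numbers are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 1.0
noise_std = 0.1

for n in (1, 4, 16, 64):
    # n repeated noisy "optical" measurements of the same product, then averaged
    runs = true_value + rng.normal(0, noise_std, size=(100_000, n))
    err = np.std(runs.mean(axis=1))
    print(f"n={n:3d}  residual std ~ {err:.4f}  (theory {noise_std/np.sqrt(n):.4f})")
```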
I had the idea to do computing with light 5 years ago, but had no clue how to get it done. Glad to see a big step in computing!
From what I've read (ML is my main field), even AlphaZero (and definitely MuZero) run on a "high end PC". The training was done on TPUs and simulation on CPU servers.
Also, the problem is how the DL model is queried in a reinforcement learning scenario - it is queried thousands of times per step to simulate the game in its "state space" (evaluation of a tree of future steps).
KataGo, which is a Go AI based on AlphaGo Zero with some extra improvements, is superhuman at just a couple hundred playouts, which on my computer (GTX 1650) only takes a couple of seconds to achieve (about 3-5). On a high-end computer this is achieved in less than a second per move. The original AlphaGo was a Frankenstein of neural networks and needed a lot of MCTS rollouts to make up for it; subsequent Go AIs can be superhuman even running on an iPhone.
Light doesn’t travel faster than electrons, but it’s the frequency of light that allows more information to be transmitted and processed faster.
This was a very informative video. I am in fact working on a time-multiplexed SiPh matrix multiplication design like the one you mentioned towards the end of your video.
Welcome back on Asianometry. Thanks for your answer. I’m not so ready! First my family. See you soon. I hope! DV
I wonder if it is possible to have a multi-layer stack of LED film to do a similar task, since an LED can both emit light and also act as a photodetector?
LEDs aren't sensitive enough
@@animeshthakur5693 Sure, but in theory it is possible to integrate LEDs within the semiconductor die process.
Yes, but you probably wouldn't want to. For this to work you want coherent light, which for LEDs is going to mean throwing away most of it. A laser is what you really want here.
@@itonylee1 if I recall, integrating the light source well is actually one of the major pitfalls/ cost centres that is as of yet unresolved. Integrating with the design means you don't need to align it/tune it. But making light sources out of silicon is really hard.
When I was getting my chemistry degree I noticed multiple labs working on materials for things like this. It's cool seeing it hit youtube.
You mention that the 2016 AlphaGo was run on 48 TPUs. Were these required for the inference step used during the matches? Or was the final trained version running on just the laptop we saw in the documentary?
Thanks for the great video!
So there's an analog aspect to these calculators as well? Very cool... exactly what was wanted. Can't wait to see how this tech works out... cheers
I can't thank this channel enough. Good job.
Good that you started looking at this. More presentations in this area should follow. Talk to more experts.
Excellent video! Learned a lot.
Well done! 😃
6:20 - Wrong. It's not strictly because the signals move at the speed of light. Electricity travels at about 270,000 km/s, really close to light speed.
What is really holding electronic processors back is the fact that electricity generates heat, and that heat grows rapidly as you increase the throughput, whereas light does not.
12:41 LET's GO, literally called it because rate coding is how our neurons are organized!
Amazing and insightful video. I take solace in the fact that the brain is much more efficient, if not the best, at specialised tasks. Let's hope the photonic innovators are able to find product-market fit, and who knows, for efficiency reasons we just might be able to simulate quantum computers before we actually build quantum computing at scale!
Are these light matrices forming AND, NOR, OR, XOR gates, etc? Or is this a different type of computing that isn't "Turing style" ? in other words, are neural networks different from these logic gates?
In my favourite sci-fi movie, Bicentennial Man, they kinda show a photonic brain, even though it's called positronic. I love it now...
Woah this is blowing my mind! It's amazing, the things nature has provided for us humans. And the human scientific collaborative effort never ceases to impress me. Thank you for this video.
Small detail, but electrons don't move through the chip/wires. They just wiggle around; the energy is transmitted over the electric field around the conductor, not by the electrons. Doesn't really matter as your point is still valid, just a technicality.
Electrons do flow in low-frequency conductors, especially with DC power. The energy is indeed in the fields; the motion of the charge carriers is just described by the fields at the conductor's edge (which set the current and its magnitude).
It would be interesting to know some numbers. So far as I can find out, Google's TPUs use a slightly off-standard 16-bit floating point format (bfloat16) for all of their data. You don't need the high accuracy of a 32- or 64-bit float, at least for inference. If the silicon photonics ADC/DAC has an effective end-to-end precision of an 8-bit float, then the gap between it and what is useful for AI is very, very large. If it's equivalent to 12 bits, then it's not as much of a problem.
The other thing that would be nice to know is how much process variation there is between individual interferometers. One nice thing about digital electronics is that you fabricate a chip, you test it at speed, and if it gives you the right digital answers, the chip is good. With analog electronics, you might have an interferometer with a reliable 32-bit-equivalent signal-to-noise ratio, but non-linearity and variation between interferometers on the same chip might push the effective precision way down into the single digits, and with a light path passing through multiple optical elements, testing every possible light path may be functionally impossible.
With digital electronics, all you have to know is that one device's output falls within certain bounds to know that you can chain together an unlimited number of them with no loss of accuracy. With analog electronics, chaining them together always compounds the error, whether it's the RMS noise floor being added or multiplicative error in the actual signal (see the toy sketch below).
Anyway, I don't expect answers to these questions, but they are questions whose answers will, I think, determine whether photonic computing will be a thing in the future.
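A minimal toy model of the compounding point above (my own, with assumed numbers, not from the comment): chain a few analog stages that each add independent noise and watch the effective precision fall, which is exactly what digital logic avoids by restoring levels at every gate.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 100_000)   # "true" signal values
stage_noise = 2.0 ** -10              # assume each analog stage is ~10-bit clean

y = x.copy()
for stage in range(1, 9):
    y = y + rng.normal(0.0, stage_noise, x.size)   # one analog pass adds noise
    snr = np.var(x) / np.var(y - x)
    print(f"after {stage} stages: ~{0.5 * np.log2(snr):.1f} effective bits")
```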
I saw an article in high school in the early 2000s about photonics research at MIT for replacing the buses on motherboards. The idea was to reduce heat loss and latency. I wonder if this is an offshoot of that research?
Great video as always! Keep them coming, love it!
Thank you for these fascinating videos.
Thank you for this! I have been seriously wondering about Lightmatter and I just checked up on them recently. Looks like they’re hiring some powerful folks and hopefully going to be able to offer real products soon!
5:03 Are you sure it's petajoules? That's the energy equivalent of roughly 250 kilotons of TNT.
Great video, really interesting to see how other fields are tackling the issue of power consumption. From what I understand, it is not a fair comparison to make between conventional neural networks and the human brain. The human brain works on a completely different mode of computation, where data storage and computation are unified and signals are carried by spike potentials. Hopefully photonics can be applied to designing analogues of spiking neural networks.
I think there is a bit of an error in this video. The MZI is a passive device which uses a half-silvered mirror to create interference patterns, so there is no voltage applied to the MZI itself. What I suspect you may be referring to is the Kerr effect, where the refractive index changes with applied voltage; if you use that with an MZI, it is likely what gives you the desired properties.
Such a fun video ! Great work guys !
Would be great to get an update once we have more info on the Chinese Taichi photonic chip... is it hype or real?
12:35 - That's really clever.... Easy to get bogged down by the "right and wrong" ways to use tools.
Hi there, I really appreciate your content. Just a side note: I believe you meant 20 or 1 picojoules/MAC and not peta, which would be about 278 GWh, or roughly the annual usage of 19k households?
I wonder how this compares to the analog circuits that are being used to run neural networks.
This is one of my favourites don’t know why it doesn’t have more views
Cloud Tensor Processing Units (TPUs)
Tensor Processing Units (TPUs) are Google’s custom-developed application-specific integrated circuits (ASICs) used to accelerate machine learning workloads. TPUs are designed from the ground up with the benefit of Google’s deep experience and leadership in machine learning.
The caliber of this video was very impressive!
The part about AlphaGo and how many TPUs were used: it's no wonder I can't find any way to build an AI for StarCraft on my PC at home. Ambitious, but just not gonna happen, it seems.
Never mind that, this talk/video was spectacular and incredibly informative. Thank you.
6:19 Voltage changes are also transmitted at (nearly) the speed of light.
In the challenges section what does John mean when he states that the photonic chips aren't used for training, but only for 'inferences' due to their lower accuracy?
Great Presentation btw !
5:07 20 petajoules per MAC? Is it not picojoules?
I'm far, far away from this area of study, but I am an EE nonetheless... could they replace the "thin film heater" with a piezo element on each of those interferometers to slightly deform one leg? This stuff is so cool.
This is very interesting. Excited to see what the future brings in this space.
Would you consider investing in any of the companies making the tools for the photonic/silicon era?
There's some inconsistency in the argument. At the start you note that most energy is lost on data transfers, yet these are untouched by the photonics; they tackle the multiplication instead. And I can't help but notice that an important part of the photonic circuit is a heater, presumably to adjust the length of one of the paths and thereby the interference. So while there seems to be an obvious advantage in the speed of the multiplication itself, it's not clear how much energy, if any, it saves.
One thing that seems to be overlooked in the video is the use of a nonlinear activation function as part of the computation. I do not think matrix multiplies all by themselves give the desired effect.
Usually the nonlinearity is achieved outside of the photonic part :/
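A minimal sketch of the point a couple of comments up (my own illustration, not from the video): without a nonlinearity between them, stacked matrix multiplies collapse into a single matrix, so a purely linear photonic mesh on its own can't express a deep network; the activation has to be applied somewhere, typically in the electronic domain.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 8)), rng.normal(size=(3, 4))   # two "layers" of weights
x = rng.normal(size=8)

two_linear_layers = W2 @ (W1 @ x)        # "deep" but purely linear
one_equivalent_layer = (W2 @ W1) @ x     # exactly the same result
print(np.allclose(two_linear_layers, one_equivalent_layer))  # True

relu = lambda v: np.maximum(v, 0.0)
with_nonlinearity = W2 @ relu(W1 @ x)    # no single matrix reproduces this
```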
Great video! Thank you very much for making it!
I’m currently working on a research project with ultra-low-precision neural networks. I wanted to ask: would reducing the number of bits in the activations and/or weights to about 2-3 bits each (using state-of-the-art quantization methods) help with the accuracy and scale issues for photonic accelerators raised in this video?
In general, most neural networks these days can be quantized down to 4 bits with almost no loss of performance using the latest quantization methods. So 8 bits might be a bit unnecessary if these methods are used.
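For concreteness, a rough sketch of symmetric uniform weight quantization (a generic method and my own toy numbers, not tied to the video or any specific paper): it maps float weights onto a handful of levels, which is the kind of low-bit representation the question above assumes.

```python
import numpy as np

def quantize(w, bits):
    levels = 2 ** (bits - 1) - 1            # e.g. 3 bits -> integer levels in [-3, 3]
    scale = np.max(np.abs(w)) / levels
    return np.round(w / scale) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=(256, 256)).astype(np.float32)

for bits in (8, 4, 3, 2):
    wq = quantize(w, bits)
    rel_err = np.sqrt(np.mean((w - wq) ** 2)) / np.std(w)
    print(f"{bits}-bit weights: relative RMS error ~{rel_err:.3f}")
```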
Do photonics require extremely low temperatures, as qubits currently do?
Quantum bits pick up noise from temperature, so those chips work most reliably at very low temperatures, close to 0 K. You end up with a machine that's mostly a multi-stage cooler with a chip on the tip. Are photonics the same?
Man, you are remarkable. Btw, do you do financial consulting for tech companies? Or plan to in the future?
Deployed Worldwide Through My Deep Learning AI Research Library. Thank You
8:23 - I'm not very surprised. I figured it could be done. That kind of manufacturing isn't in my wheelhouse. But, architecture is, haha.
I created a neural network that I trained to work as a binary ALU. Even better, I trained "cells" which act as logic gates, and I would love to see my data encoded into glass such that it could function as a full-on ALU in light.
What is the point of using a neural network as an ALU
I saw that analog computing could convert to digital and back every few steps to recover accuracy, with some circuit tradeoffs.
I see this being a much more viable path to future computing than quantum computers. Even if the chips are substantially bigger, they'll use far less energy and won't require cooling in the same way as traditional transistors. I think it's really exciting and I hope to see this continue to grow and advance!
Check out Xanadu Photonics; squeezed-state photons make quantum computing possible in photonics too, although the photodetectors have to be cryogenically cooled.
Congratulations for the very interesting and informative video.
However, I guess that you probably meant femto- rather than petajoules per MAC.
Furthermore, the speed of light is invoked inopportunely both to justify the very large frequencies and the short computation time. In optical fibres light propagates 1.5 times more slowly than in empty space, and in SOI waveguides even 2.8-3 times more slowly; by contrast, the RF or microwave signal in a modulator travels faster. And in general, electricity propagates at a speed comparable to c, because it is the electromagnetic field that propagates it, not the electrons in the metal, which, as a whole, drift at cm/h (under DC).
The point is that in photonics you use dielectrics like glass, silica or intrinsic Si, so absorption is much smaller than in a conductive material; this would be evident if the same circuit were implemented with microwaves on a microstrip. However, the problem with metal interconnects at high clock rates in digital circuitry is that you have to charge and discharge the parasitic capacitance of those lines.
About the second misconception, an electrical circuit would be much slower than the MZI mesh because of its RC time constants: it's not that the electric signal propagates more slowly, it's that the transient is much longer (see the sketch after this comment).
Regarding the 1980s Bell Labs research you mentioned, I guess they made an optical computer; I doubt (but I should check) that it was based on optical transistors, as that technology is still at the proof-of-concept stage. However, this does not change your point, that announcements about silicon photonics neural networks replacing TPUs must be taken with caution.
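A rough illustration of the RC point above (assumed round numbers, my own, not from the comment): the limit on an electrical line is often the time needed to charge its parasitic capacitance through the driver resistance, not how fast the field itself propagates.

```python
R = 1_000.0     # ohms: assumed driver + wire resistance
C = 100e-15     # farads: assumed parasitic capacitance of the line

tau = R * C                      # RC time constant
rise_time = 2.2 * tau            # 10%-90% rise time of a single-pole RC step
print(f"tau ~{tau * 1e12:.0f} ps, rise time ~{rise_time * 1e12:.0f} ps")
# ~100 ps tau / ~220 ps rise time already caps such a line at a few GHz,
# even though the field along it moves at a large fraction of c.
```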
Have they tried encodings that use transitions rather than levels? I'd guess probably so, but going off the graphics in the video it makes me wonder. For instance, no transition = 0, transition = 1, and the level is never really looked at, just the edges. Or a pulse of light = 1 and nothing = 0. Both cases probably need a start byte for synchronization. But the pulse idea has an issue: the switching speed must be way, way faster than with the previous techniques. It's more commonly used for low-speed power-line communications, e.g. DALI (the same wires that power the device can be shorted together momentarily for communications, while the power rails filter out the communications with L and C).
There's also positive edge = 1, negative edge = 0... that also forces you to switch much faster on the TX side, but it's easy on the RX side and less error-prone. I'd assume error-correcting codes were used?
There are also possibilities like QAM constellations, which can pack many more bits per symbol, but the SNR must also increase. You could even construct constellation points in more than two dimensions by using multiple light frequencies on the same line.
Maybe it's a trial-and-error thing after these results, figuring out which schemes make the systems fastest, I suppose.
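A small sketch of generic line coding (my own, not from the video): NRZ holds a level for each bit, while Manchester-style coding puts a transition in every bit so the receiver can watch edges instead of absolute levels, at the cost of roughly doubling the required switching rate, which is the trade-off raised above.

```python
def nrz(bits):
    # one symbol per bit, level = bit value
    return list(bits)

def manchester(bits):
    # two half-symbols per bit; the mapping (0 -> low-high, 1 -> high-low) is arbitrary here
    out = []
    for b in bits:
        out += [1, 0] if b else [0, 1]
    return out

bits = [1, 0, 1, 1, 0]
print("NRZ:       ", nrz(bits))          # 5 symbols
print("Manchester:", manchester(bits))   # 10 symbols -> twice the switching rate
```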
Hmm, I get the hunch that these chips will go the way of transputers: a nice idea at the time, but I feel something far smaller is around the corner. There is a Chinese researcher who is doing it at the molecular scale using crystal lattices, doping them with different atoms to change their topological properties. The light therefore behaves according to the structure of the lattice. You can use entangled photons to send signals, and the photons themselves are in a squeezed-light state. I understand the doping is done using a femtosecond laser. I'm a little unclear on the details, but it is the leading edge in photonic computers. The work is published in respectable journals but still highly experimental.
Lightmatter has a talk about doing this at wafer scale coming up in 2 weeks at Hot Chips. I hope you can tell us what they're up to! (And Ranovus).
Parallel operation can be achieved by using light of different frequencies on the same chip simultaneously.
A small note, a petajoule is the amount of energy unleashed by a 250 kt nuclear bomb. You probably mean femtojoule. :)
The structure of feedforward deep neural networks is unfortunately very sensitive to computation error which is why typically these often employ at least 32-bit floating point arithmetic. Backpropagation of these networks to update weights through many layers can result in cumulative error which limits model performance. For optical scaling operations, there are additional error sources due to quantum detection fluctuations, flaws in the optical system that cause scattering and coherent noise, sampling and quantization error, not to mention power consumption from electro-optical interfaces that can be quite substantial. There may be neural networks for which optical scaling operations are suitable, however, the conventional feedforward deep neural network, because of its reliance on precision matrix multiplication operations so that backpropagation can be performed using the adjoint operation, is going to be quite challenging.
There are plenty of ideas and simulations floating around for this but very little in the way of actually attacking the real issues surrounding optical neural network implementations, just mostly hype.
I don't think anyone is interested in training on photonic accelerators, it's all inference. Quantization is very commonly employed to make inference cheaper, which results in errors similar to photonic accelerators, though smaller in magnitude (IIRC current photonic accelerator designs get 2-4 bits of precision, classical inference accelerators are typically in the 8-16 bit range). So I think most of what you're saying here is a non sequitur with regard to the published research.
@@taktoa1 Run something as simple as MNIST on an optical accelerator and get 99% accuracy and then we'll talk. The key with digital quantized neural networks is that despite being quantized they are also deterministic; that is, given an input, the output is the same each time, as there is no measurement noise. Therefore if you train with quantization error, the network can learn that error. However, analog physical systems have measurement error. It's not just that the optical system achieves the "equivalent" of 2-4 bits of precision, it's that no matter how many average photons are used to represent a signal, there are going to be measurement outliers. Due to the nonlinear operations of ReLU and maxpool, outliers due to measurement error can accumulate across deep neural network layers. So it seems to me that many deep layers plus nonlinear operations like ReLU and maxpool make it extremely difficult for an analog multiplier, especially one susceptible to quantum noise, to produce reproducible, reliable inference. Because of the extreme sensitivity of feedforward neural networks to cumulative error, if training is performed digitally for inference that is to occur on an analog/optical computer, the training model must be extremely accurate, including the effects of quantization, noise sources (Poisson, thermal, coherent noise), system manufacturing error, etc., and even then the variation due to measurement error may limit the ultimate inference accuracy. It may be necessary to train a neural network for each physical system, because the manufacturing tolerances of two different optical chips may be too different for a network trained on one chip to work on another.
Biological neural networks seem to work quite effectively without being deterministic despite the fact these are implemented on analog computer wetware. Deep feedforward neural networks seem like a poor fit for analog computing, especially quantum noise limited computing for which the power consumption is directly influenced by the number of photons required to achieve a certain SNR due to Poisson noise (SNR being proportional directly to the square root of power, and so SNR increasing only slowly with increased power consumption). Even other solutions that use electric charge (mythic.ai/) with similar electric charge quantization problems are limited in the number of layers that can be implemented.
The whole reason why feedforward deep neural networks were created in the first place is because backpropagation is possible using a bit of clever calculus and the chain rule. Training is the problem, because if you don't have any other kind of neural network you can effectively train that is resistant to measurement error, analog computation is not going to be a viable solution for neural network inference. Neural network accelerators like the Tensor processor have sucked all of the air out of the room for research into any other kind of neural network architecture, and as long as this is the case, the market will not care about analog computers because the current feedforward deep neural networks were created for deterministic, digital machines.
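A back-of-envelope sketch of the shot-noise point raised above (my own numbers, not from the thread): with Poisson statistics the SNR only grows as the square root of the detected photon count, so each extra bit of effective precision costs roughly 4x the optical power.

```python
import math

for photons in (10**2, 10**4, 10**6, 10**8):
    snr = math.sqrt(photons)              # Poisson / shot noise: SNR ~ sqrt(N)
    bits = math.log2(snr)                 # rough "effective bits" from that SNR
    print(f"{photons:>9} photons: SNR ~{snr:,.0f}, ~{bits:.1f} effective bits")
```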
Correction: signals on high-speed electrical transmission lines also travel near the speed of light... electronic circuits are physically electromagnetic waves; it's just that much of the time circuit theory is an adequate, very simplified tool. Meanwhile, the overall movement of the electrons themselves, i.e. electron drift, is incredibly slow (shockingly slow if you didn't know already).
Typical FR4 dielectric on a PCB slows the waves to roughly half of c (v ≈ c/√εr with εr ≈ 4). Fibre optics also slow the light; how much depends on the material.
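A quick check (my own assumed material values, not from the comments above): the signal speed in a dielectric is roughly c / sqrt(eps_r), so neither PCB traces nor optical fibre carry signals at full vacuum light speed.

```python
C = 299_792_458.0   # m/s, speed of light in vacuum

# assumed relative permittivities: FR4 stripline and a silica fibre core (n ~ 1.45)
for name, eps_r in [("FR4 stripline", 4.3), ("silica fibre core", 2.1)]:
    v = C / eps_r ** 0.5
    print(f"{name}: ~{v / C:.2f} c")
```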
Thursdays I fry my brain with First We Feast in the morning and then educate myself at night with Asianometry.
Considering less heat is generated, in fact almost none at all, wouldn't it be possible to stack the dies vertically? The yield rate would have to be very high.. but you could stack thousands of these chips on top of each other, correct?
What would be pretty wild would be using both time offsetting and wavelength multiplexing to increase throughput. If I understand it, it would be like light based hyperthreading, except you could do 3, 4, or more threads all independently. I guess it would just rely on how passive the structures would actually be.
I worked on this datacom side for photonic switches. The thing about wavelength multiplexing (WDM) when used with MZIs is that crosstalk can be a killer, depending on the MZIs used. Depending on the interconnect topology used for the MZI mesh, crosstalk can cascade through the MZI mesh ultimately increasing the "noise" level beyond practicality. This also inhibits scaling these meshes out, as you could imagine!
OK, it seems like an old video but it was also just released. It will probably be fantastic.
Excellent channel
At ~3:30 you say the image data is matrix-multiplied by the weights, which is true for the weights in the first layer, but a matrix multiply is done at every NN layer. Not a big deal of course, but maybe this comment prevents some future confusion for some folks.
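A minimal forward pass (my own sketch, with made-up layer sizes) just to make that point concrete: a matrix multiply happens at every layer, not only against the input image.

```python
import numpy as np

rng = np.random.default_rng(0)
layers = [rng.normal(size=(64, 784)),   # weights: flattened image -> hidden 1
          rng.normal(size=(32, 64)),    # hidden 1 -> hidden 2
          rng.normal(size=(10, 32))]    # hidden 2 -> output

x = rng.normal(size=784)                # stand-in for a flattened 28x28 image
for W in layers:
    x = np.maximum(W @ x, 0.0)          # a matmul (plus ReLU) at every layer
print(x.shape)                          # (10,)
```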
Something that most photonic-accelerator advocates often ignore, whether in academia or industry, is the power consumed by the LASER!!! They often leave it out to make their efficiency numbers look hot!
How much is it, and is it getting lower?
You say that the MACs multiply at high precision. But high precision is not actually required for neural network inference (nor training).
Was wondering when you would cover this.
Can you do a report on Rigetti computing?
"old" silicon can still compete well into the future, well, if we replace silicon for another material (yes that is in research since +10 years) we could get the same result with less electricity used, thus faster and more efficient computers. The "Real" fun begins when scientists achieves room-temperature superconductivity, that would enable computers running Much, much faster than current computers while using close to zero in electricity (as superconductivity would allow electrons to flow with no resistance, thus using electricity solely for the calculations/data movement through the material)
Fascinating. Thank you!
Wow...this is incredible!
If accuracy can be improved and scaled then it can be used for inference?
When Deep Blue beat Garry Kasparov, some wag said: "Sure, but how did it do in the post-game interview?" Probably wouldn't be hard to train a neural network to give trite answers to trite questions, with a few quips thrown in. "Mr Deep. Can I call you Deep, or do you prefer Blue". "Whichever you like." "OK Deep, how do you think Mr Kasparov played?" "Pretty well - for a human.". "Why didn't you take his pawn at move 35?" "It wins at depth 6, but loses at 16. Humans are so slow."
And today we have Meta's chatbot dissing Zuck! 🤣
the bit-flip that caused that move really broke Kasparov
Holy crud.... That is insane. This reminds me of the x64 jump, and I think it will be as, if not more, significant.