This is really incredible. However, I can already see people attempting to claim that this could make the classic crime-TV "enhance" trope a reality - but the upscaled version does not contain more actual signal, so it only looks like something that could be true, but need not be.
No. Without any kind of feedback loop of the outputs it's impossible to have any kind of coherence. It's like asking two artists to draw the same image without communicating. The architecture needs significant changes in order to improve temporal coherence. Usually that also means significantly slowing down the model.
@@Sekir80 I have a degree in AI and happen to specialize in Computer Vision so I'm familiar with the kind of architecture that the paper is using and what kind of architecture is needed in order to make temporal cohesion. The problem is that at this moment the computers are not nearly strong enough to allow for temporal cohesion with a naive approach. There are some attempts being made to use a very advanced approach to solve this issue but it needs at least a year or two to produce any decent results.
@@acmhfmggru Thanks to you as well! I was asking because, without understanding the paper (which I didn't read at all), I can't be certain there's no feedback. But you both telling me that implies this kind of AI isn't meant to do it on its own. And of course, if it had a temporal coherence part, they would be boasting about it. I think.
Looking forward to your take on the 3D Gaussian Splatting paper! There is also a new paper that just came out, which extends it to dynamic scenes. Seems to work pretty well there as well.
Wow! How long before we can tell an AI 'i want to watch a movie that is The Empire Strikes Back, but done in a Film Noir style, with Ice Cube as Han Solo and where the movie is from the perspective of The Empire being the good guys. Blend with jokes and characters from The Office, and make it 8 x 5 minute long animated cartoons in the style of The Power Puff girls. But add a twist at 3 points in the movie, based on Pulp Fiction and Snow White' and it produces a quality product that is super enjoyable? I think 14 years 3 months and 17 days (approx). That is a time to be alive, I tell you what!
During one demo, you showed an image being upscaled, then zoomed in on a subsection and upscaled that too... what a great example of the power and usefulness of upscaling. If the upscaling is good enough, you could give it a map and have it draw the world. I'm still just waiting for some ridiculous new compression ratios using similar techniques, though
The comparison to Stable Diffusion seems WAY off... they must be using the original SD 1.5 from a year ago instead of some of the incredible community-made finetunes or the new SDXL. I'm sure this GAN has its use cases, but I see way more mutilated images compared to SD.
@@sevret313 So very very true. Too bad Károly doesn't call them out when they do such blatant mischaracterization. I've come to realize this channel needs to be taken with a massive grain of salt over the last year, as I've gotten more into the weeds of AI projects.
This would be very useful on video. There are movies, like the Star Wars prequels, that were shot at 1080p; upscaling them to 8K would be easy with this.
Once this tech is perfected, the input resolution almost won't matter to the layman anymore; someone just chooses their output resolution and voilà. Those fine hairs from nothing in the sample show that even 420p video could be crisp 4K with full detail, giving a full experience either way, though not technically accurate to the 4K original if it existed.
Mind blowing. Are you going to talk about DLSS 3.5 and ray reconstruction, or is it not crazy enough? Because I think I understand how it works, but I wouldn't be against a more precise explanation
Looks good, but it still lacks the common sense that humans apply. Like the scarf on the dog: a human looks at the low-res dots on the scarf and thinks "that's probably a repeated pattern of the same thing, most likely just round white dots", but the AI tries to make a unique shape for each dot based on the way it happened to get pixelated.
If you zoom into the upscaled image and then upscale it again could you do an infinite zoom? I wonder what kind of artifacts the upscaler would create.
You could, but it would do a bad job for anything but fractal-style images. It wouldn't know how to just up the resolution. Let's say your GAN is trained on human faces like this one is, then when you zoom in it would try to make the zoomed in portion look like a human face inside of the larger human face. That would end up looking monstrous or stupid for most applications.
Is it really upscaling, though? It generates new pixels that aren't there, purely by guessing what the image shows (person, horse, etc.), so the image you see isn't actually reflecting reality, but rather what the AI thinks the reality should look like.
Now I need to see this applied in Stable Diffusion. SD tooling is already insane, but I can't imagine how far you could go, especially with things like ComfyUI.
Matching up GANs with generative AI is challenging because of how GANs are trained for very, very specific kinds of images. I guess what you could do is have a large list of GANs in A1111 or whatever and then manually select an appropriate GAN for the kind of image you've generated, but unless you're limiting yourself to pretty narrow sets of subject matter, you're going to fill up your whole hard drive with GANs pretty quickly. Not to mention someone actually has to make the GAN models in the first place, which is far from trivial. These are not simple or small, like LoRAs.
@@michaelleue7594 People using ComfyUI have already built a habit of using different models for different tasks at each step of their composition, so it wouldn't be that far-fetched to do the same with a GAN model. The super-resolution ability would certainly fit this workflow.
@@Exilum That's fine, if you're looking for super resolution on a portrait, or for a cat or dog, but you're gonna have a rough time if your composition is even slightly more complex than that. And frankly, there are easier ways to get great resolution on extremely simple compositions already, without resorting to giant, hyper-specific models.
@@michaelleue7594 (to clarify, comfyui allows you to work on sections of an image easily, so you can do 100x superresolution per-subject, then bind the seams using another model)
Well... The "enhance image" from Hollywood movies, which anyone who ever played with Paint knew was BS, is now a real thing... "Never say never" has never been more true.
Note for upscaling: remember that 100% of the added detail is made up, so you can't do the 'enhance' trick from TV and movies where they discover new information by unfuzzing a picture. Seeing what's there in a bad photo can give info by showing what's there, but details will still be made up.
yes, it makes a prediction based on the context of the scene, which is an approximate representation of the actual scene. nobody disagrees with that. but what's surprising is how well this technology upscales images, unlike old technologies, where upscaling barely made a significant difference.
@@businessmanager7670 Yes. Just making the point that you can't add truth to an image no matter how good the AI. Unlike TV and movies, where they get a 'true' image they can act on. Note that some people _do_ believe this can be done.
@@thekaxmax you can make a prediction of the truth though, which can sometimes be the actual truth. that's the whole point of prediction systems like the brain and AI: sometimes our predictions do represent the truth.
What a marvel of technology. Would it be possible to have a video of how/where to access these tools? Ideally as user friendly as possible first… for example DrawThings works even on iPads but it’s SDXL only to my knowledge and not GAN? (myself I don’t even know the difference yet). Thanks for these videos (and the free Raytracing courses, started watching those and they are a blast!)
When looking at the pixelated photos, it's clear to me that these are pictures created with a mosaic effect from high-resolution images. They are perfectly pixelated, with no artifacts (unlike what you often see in JPEG format). So until someone else, like a journalist, tests it on more realistic low-resolution photos, I would remain cautious about the upscaling claim.
Not gonna lie... This is probably the most legitimately exciting research I've seen in a while. This could very well be a first step towards ideal AI art creation, in my opinion
if this is not the best result for upscaling, then what is? would love to have an AI that not only upscales but also increases dynamic range and makes compressed or oversaturated images look more raw... like making an old phone camera look like a new DSLR in terms of color, dynamic range, detail, etc., if that makes sense.
Please correct me if I am wrong, but if it's 1 BILLION params, that probably means we cannot run it on a consumer-grade GPU with a 24GB VRAM limit. Not for mere mortals?
Upscaling keeps on blowing my mind. Perhaps more than img generation
well wasnt this one technically a mix of image generation and upscaling
Enhance…. Enhance…. Enhance…. Enhance….. I can see the suspect in the reflection of her eyes.
And we all laughed at that!!! 😮
@@sgttomas We still do. If you can see the suspect in her eyes, it will be a made-up one.
Remember when they said CSI is completely fake because they zoomed in on camera footage and improved the picture quality 😂
It certainly is more useful working professionally with graphics and layout. Also extend an image with generative fill in the current Photoshop beta has proven to be amazing and extremely helpful in daily work.
Would love to see an image downscaled in Photoshop, then upscaled, then have the original and the upscaled version compared at different downscaling levels to see where it gets it correct and where it starts to fall apart so to speak. What a time to be alive!
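That round-trip experiment is easy to sketch outside Photoshop too. Below is a minimal, self-contained Python/numpy version: it downscales a synthetic test image by block-averaging, upscales it back (nearest-neighbour stands in for the AI upscaler, purely for illustration), and measures PSNR at different downscaling levels to see where the detail falls apart:

```python
import numpy as np

def downscale(img, factor):
    """Block-average downscale by an integer factor."""
    h, w = img.shape
    return (img[:h - h % factor, :w - w % factor]
            .reshape(h // factor, factor, w // factor, factor)
            .mean(axis=(1, 3)))

def upscale_nearest(img, factor):
    """Nearest-neighbour upscale -- a stand-in for a real AI upscaler."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def psnr(a, b):
    """Peak signal-to-noise ratio in dB for 8-bit-range images."""
    mse = np.mean((a - b) ** 2)
    return float('inf') if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

# Synthetic 64x64 test image with fine detail: checkerboard plus noise.
rng = np.random.default_rng(0)
img = (np.indices((64, 64)).sum(axis=0) % 2) * 128.0 + rng.uniform(0, 64, (64, 64))

# Round-trip at increasing downscaling levels; PSNR drops as detail is lost.
for factor in (2, 4, 8):
    restored = upscale_nearest(downscale(img, factor), factor)
    print(f"{factor}x round-trip PSNR: {psnr(img, restored):.1f} dB")
```

Swapping `upscale_nearest` for a call into a real model, and `img` for a loaded photo, turns this into exactly the comparison described above.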
So now we won't only have over-compressed images, but also under-compressed images? De-compressed? Over-decompressed? The future is weird, man
it won't work that way, as the details are 100% daydreamed or generated.
@@sc0rpi0n0 that's the point, as a reminder to see what might have been missed in the details
a control variable, so to speak?
The point is to see how good the second image comes out and how it compares and contrasts with the original.
could be a good thing to remind lawyers that this technique is NOT "zoom and enhance" so they don't go entering AI daydreamed details into evidence.
We talked about GigaGAN in the upscaling community in March. Their results look amazing, but since there is no code/models for us to validate their claims on our own input images, we remain sceptical. In contrast, DiffBIR (which just released recently), StableSR, ResShift, DAT, SRFormer, HAT etc. we can all run ourselves or train models for.
Thanks for telling about all the cool projects
Isn't the code available now?
Edit: I guess the code is available but not the model.
Thank you, was looking for this answer as I am currently trying to find a good upscaler. I can do a decent amount of upscaling in automatic 1111 but I run out of memory quite quickly and I need to get to around 10000x10000 in pixels for a decent DPI for actually printing these generated images.
As you seem knowledgeable, are there any free-to-use models that I could apply? I could even pay a small sum, just not hundreds of dollars per image.
I was just wondering, how about video upscaling?
@@AgrippaTheMighty That's a bit more problematic :)
There is research into video, and I guess in 2 years this problem will be solved and Stable Diffusion will also be able to generate video. Video is just different in so many ways; if you take a still from a movie, even in 4K, it's horrible quality compared to a photograph at the same resolution.
You could of course try to upscale video yourself: just split the video into frames, then upscale each frame with a low denoising strength.
I'm currently upscaling a 1500x2000 image to 15000x20000; it takes around 30 minutes on an RTX 3090 😭😅. What it does is basically break the picture down into sub-pictures, because the picture as it is does not fit into video memory, then it upscales each sub-picture, stitches them together, then does a band pass on the stitches and goes over them using a gradient of denoising strengths (that last one was me guessing, but that's what I would have done).
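For the curious, the tile-splitting part of that pipeline can be sketched in a few lines of Python/numpy. This is only an illustrative sketch: `upscale_tile` below is a nearest-neighbour placeholder for the actual model call, and real tools additionally overlap the tiles and blend the seams, which this version omits:

```python
import numpy as np

def upscale_tile(tile, factor):
    # Placeholder for the real model call (e.g. an SD img2img pass per tile).
    return np.repeat(np.repeat(tile, factor, axis=0), factor, axis=1)

def tiled_upscale(img, factor, tile=128):
    """Upscale an image tile by tile so each piece fits into VRAM."""
    h, w = img.shape
    out = np.zeros((h * factor, w * factor), dtype=img.dtype)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            piece = img[y:y + tile, x:x + tile]
            out[y * factor:(y + piece.shape[0]) * factor,
                x * factor:(x + piece.shape[1]) * factor] = upscale_tile(piece, factor)
    return out

img = np.arange(300 * 200, dtype=np.float64).reshape(300, 200)
big = tiled_upscale(img, 10)
print(big.shape)  # (3000, 2000)
```

With a real model per tile, the visible seams between tiles are what the stitching/blending pass described above has to clean up.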
Since I started watching you, I feel like I'd forgotten how much AI has developed over the last 3 years; you have to look back to see how insane the progress is
remember when DALL-E 1 was impressive? it's not even an afterthought now
GTA SA as a kid vs RDR2 as an adult
Still too early to call it "AI"... Just call it NN or ANN
@@getsideways7257 no, AI means any bot, hardcoded or with machine learning, what you are referring to is AGI
I'm so happy I found this channel years ago before all that recent AI frenzy.
Last week I heard someone say that AI is new, and I just thought about how I've been holding on to my papers for years now.
Wow, super cool. We can use this to make decades-old photos look clearer.
I fell off my chair when I saw what this upscaler could do, from the dog to the humans, all those tiny hairs... It brings out so much detail it's hard to believe it's real. Mindblowing
Okay man, but are you hurt in any way?
Do I need to call the ambulance?
Yes but the problem is they keep the model for themselves, which means it might not be that impressive in the real world. Stable Diffusion and Topaz Gigapixel are also impressive, it's like you're seeing AI upscaling for the first time.
@@Slav4o911 I've already seen upscaling, it's just that this model's quality especially with hair is outstanding. And yeah I'm aware they keep it for themselves, but that pretty much applies to any revolutionary/cutting edge technology
@@Cola-42 Nah dude my mind being blown counteracted my fall and I'm now on top of my chair
@@zRedPlays Both SD and Topaz can recreate hair. That's why I'm not impressed, unless I see the model for myself or people not connected to the company test it. Also, they've probably hand-picked the results.
would have been nice if they had released the model
reminds me of ""Open""AI
@@terpy663 people run LLMs of up to 30B parameters on their own PCs all the time
What is Block N Load's prodigy doing here ??
Especially because a 1B-parameter model can fit on most consumer-grade GPUs nowadays
@@harrytsang1501 we do run 30B-parameter LLMs on our own PCs, mate. Models are just 30-60 GB tensor files. Just one GPU can be a huge step into ML!
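The back-of-the-envelope arithmetic behind that thread: at fp16 (2 bytes per parameter), the weights of a 1-billion-parameter model alone are around 2 GB, well within a 24 GB card. Note this counts only the weights; activations, and optimizer state during training, add more on top:

```python
# Rough VRAM needed just for the weights of a 1B-parameter model at fp16.
params = 1_000_000_000
bytes_per_param = 2  # fp16 = 2 bytes per parameter
gb = params * bytes_per_param / 1024**3
print(f"~{gb:.2f} GB for the weights alone")  # ~1.86 GB
```

The same arithmetic at fp16 puts a 30B-parameter model around 60 GB, matching the file sizes mentioned above, which is why those need quantization or multiple GPUs.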
Between simulon featured in Matt Wolfe's most recent video, and GigaGAN here, it feels like a reminder that _yes, the singularity is still on schedule._
You should do a few videos showing where to get, and how to use, some of these different AIs you show.
I know it's not your normal thing, but it would be very helpful, and there probably isn't a channel that could do it better.
I'm glad to see adversarial networks getting some more love!
The results are really cool, but it has to be mentioned that these are not being upscaled unconditionally; they are conditioned on a prompt (CLIP), and that's what makes the upscaling so good, along with some other cool techniques used in the paper.
Yeah, but it's a GAN. It works on specific pretrained patterns (e.g. human portrait, dog, elephant), and you need separate models for every new concept. And it works well with GAN-generated images; everything else is turned to mush when converted to a format those models can work with.
Are there any public models available to use it yet? I'd love to give it a try.
Upsampling and signal processing go this far; the people who did this paper need more recognition.
There is a huge difference between figuring out the fine details and "artistically visualizing" them. The latter can be fun and all, but only the former is useful for serious work.
Looks like we're progressively getting closer to that CSI Zoom and Enhance effect, but in reality.
This theoretically opens up so much in the field of data compression. Whilst not perfectly lossless, you could store images in literally 1/10th or even 1/100th of the storage and get a near-equivalent output with only minor visual degradation.
Interesting! I think the issue is that while the results will look crisp and (probably) realistic, they'll be different each time. So pictures of your friends might not even look like your friends the next time you open the picture. That said, perhaps if you were to still store the original image in fairly high quality, it could work. For example, have it generate the final difference between, say, a 2K and an 8K image -- the finer details might not be something you'd notice anyway, but it would save a lot of space.
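To put rough numbers on the storage side of this idea, here is the raw, uncompressed frame arithmetic (a hypothetical example; real codecs change the absolute sizes but not the ratio between resolutions):

```python
# Raw size of an RGB frame at 1 byte per channel.
def raw_bytes(width, height, channels=3):
    return width * height * channels

full = raw_bytes(7680, 4320)   # the "8K" frame you want to see
small = raw_bytes(1920, 1080)  # the 1080p version you would actually store
print(f"stored fraction: {small / full:.4f}")  # 0.0625, i.e. 1/16th for 4x downscaling
```

Each further 2x downscale quarters the stored fraction again, which is where figures like 1/100th start to look plausible, at the price of the regenerated detail being invented rather than recovered.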
Of course the problem with using this in a CSI context is that these "enhanced" details are completely made up. I can't wait until this actually gets used in court one day and neither judge nor jury have the knowledge to call out the images thus enhanced by law enforcement as what they are: fabrications. I can even see an expert witness arguing "we are using a superhuman AI to reconstruct what the most likely details of this image look like, so it should absolutely be admissible as evidence!"
I took a course last semester where my professor pointed out how difficult it is to get disentangled representations, so I'm very happy to see these results
Did he ever give a reason as to why? I am genuinely interested.
Is it just because there are so many fewer ways to have separate, narrowly defined representations than a jumbled mess of undefined "black box" representations that don't make intuitive sense to human brains?
So is it just like... restricting the neural network to a "human-readable" domain, in a sense?
0:53 - It almost got the Porsche 356 shape right - but not quite. I am very familiar with it.
I give it 7/10 - and eagerly await being able to award 11/10!
All that was missing was combining upscalers with AI generation, to understand what the image is about and keep it as accurate as possible. And it's finally here!
I'm guessing this process would cause problems for more complex images or things that are hard to identify, like a gathering or a dense jungle. But who knows.
AI image generation IS upscaling. It takes an image of 100% noise and then denoises it, removing the noise. For upscaling, the software first brings the original image to the target resolution, so it's not individual pixels at a lower res but giant blocks of hundreds of pixels at the target higher resolution.
Now this is where they are the same: the starting images just appear different, but they essentially do the same thing. The AI denoises the image in small steps, 20-100 times, until the target image is found.
Go watch a video on how generative AI works. There's a decoder network that translates the query, or words, into something the second generative network will understand.
These two things, although very slightly different under the hood, ARE the same thing, or at least the same family of AI. The neural networks are structured in a remarkably similar way, although they might be trained on different data sets.
I.e. the encoder networks are trained on images that were taken in high res and manually downscaled, so the network can compare its output to the original high res and learn from it.
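The "denoise in small steps" loop described above can be caricatured in a few lines. This is only a toy under loud assumptions: the oracle below simply knows the answer, where a real diffusion model would use a trained network's noise prediction at each step:

```python
import numpy as np

rng = np.random.default_rng(42)
target = np.linspace(0.0, 1.0, 16)   # the "image" the process should converge to
x = rng.normal(0.0, 1.0, 16)         # start from 100% noise
start_err = float(np.abs(x - target).max())

steps = 50
for _ in range(steps):
    predicted_noise = x - target     # a real model predicts this with a network
    x = x - 0.1 * predicted_noise    # remove only a little of the noise per step

err = float(np.abs(x - target).max())
print(f"error went from {start_err:.2f} to {err:.4f} over {steps} small steps")
```

Making the per-step factor much larger overshoots the target instead of converging smoothly, which is one intuition for why these systems take many small steps rather than one big one.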
Literally all upscalers work this way, just with way fewer parameters and less training data
2:11 I remember laughing at the scenes in movies where the nerd CIA officer enhances the face of a criminal captured on a 320x200 CCTV just by hitting ENTER on the keyboard :)
I used to play games like The Alley Cat on 80x86 IBM PS/2 machines with a CGA video card. What a time to be alive!
The zoom-and-enhance that CSI shows used is now real.
Pretty much they had this technology 20 years ago, but only now it has surfaced 👌
Just want to mention this because I can tell from the comments some people don't know: image-generative AI and upscaling AI are almost the same thing. They are remarkably similar under the hood in how they work and how they are structured.
The key difference is training. A generative model has to match images to words and is trained on data sets of pictures paired with words stating what the picture is or what's in it.
For upscaling, you take high-resolution pictures and reduce the resolution. You give the network the low res, ask for the high res, compare the output to the original high res, and train the network from there.
It might even go deeper and do object recognition first on the lower res, to know more specifically what it should fill the missing pixels in with to be more realistic. For example, upscaling a dog at low res: you don't see individual hairs, but the network sees it's a dog and knows it should produce fine hairs in the output.
Upscaling IS generative, but instead of starting with pure noise and finding the target image, it starts with a low-res image that's first put into the higher target-res format (not upscaled, just taking individual pixels and scaling them into giant pixels); maybe it even noises this image up a bit before starting.
But where they are both the same is that they both take the original image, be it pure noise for generation or the low-res image for upscaling, and iteratively remove a little bit of noise in small steps, doing 20-100 or more iterations depending on the structure of that AI, until your output image is found. Remove too much noise in one step and your quality goes down significantly.
It's a lot more nuanced than this; a lot of them do things in between each step that are difficult to explain here, so that, say, the query "dog with wings" doesn't produce a dog with a tiny wing on its nose and a large wing not attached in the corner of the screen.
It kinda of in somesense fact checks to the query and makes the image more relevant but this has to do with modifying the noise image on each iterative step through a seperate process.
Its best to find videos that actually explain the behind the scenes of both upscaling and image generation. These are REALLY good. And its crazy hearing how it actually does it.
Theres also different types of image generation. I believe i described a difusion based system. But might have the names mixed up
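The "giant pixels" starting canvas described above is just nearest-neighbour replication. A minimal sketch (pure Python, my own illustration; the actual models do this as a tensor op):

```python
def pixel_replicate(img, scale):
    """Nearest-neighbour 'giant pixels' upscale: each source pixel becomes
    a scale x scale block. No detail is invented here; this is just the
    blocky starting point that iterative denoising would then refine."""
    out = []
    for row in img:
        big_row = []
        for px in row:
            big_row.extend([px] * scale)
        for _ in range(scale):
            out.append(list(big_row))
    return out

tiny = [[0, 255],
        [255, 0]]
big = pixel_replicate(tiny, 2)
print(big)
# [[0, 0, 255, 255], [0, 0, 255, 255], [255, 255, 0, 0], [255, 255, 0, 0]]
```

Everything beyond this blocky canvas, the fine hairs, the texture, is generated by the network, which is exactly why the result is plausible rather than guaranteed accurate.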
What's really cool is that this could make video and image compression absurdly good. If you have a fast, local program that can upscale images in ways that are fully plausible to humans, then you only need to store a low-res version.
It's like how traditional compression can utilise redundancies in an image (like large blocks of the same colour); AI-upscaling-based compression could utilise redundancy in human perception (fur looks like fur no matter the specific hair orientations).
Imagine a 4K movie being stored in tens of megabytes.
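Back-of-the-envelope numbers for that idea (my own arithmetic, assuming raw uncompressed RGB frames; real codecs complicate this considerably):

```python
def raw_frame_bytes(width, height, bytes_per_pixel=3):
    """Uncompressed size of one RGB frame (3 bytes per pixel)."""
    return width * height * bytes_per_pixel

full = raw_frame_bytes(3840, 2160)  # raw 4K frame, ~24.9 MB
small = raw_frame_bytes(854, 480)   # raw 480p frame, ~1.2 MB
print(f"storage ratio: {full / small:.1f}x")
```

That's roughly a 20x saving per frame before any conventional codec even gets involved, if a decoder-side upscaler could plausibly reconstruct the rest on playback.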
You're the only YouTuber I have notifications on for 😅 I just love watching your videos 😊
WHAT A TIME TO BE ALIVE!!!
At 2:07 there's a mistake. It says "Coarse," but it's actually spelled "Horse." ;)
This is really incredible. However, I can already see people attempting to claim that this could make the classical crime tv "enhance" trope a reality - but the upscaled version does not contain more actual signal, so it only looks like something that could be true, but need not be.
This is an absolute game changer! The amount of applications this can have is just absolutely *huge*!
so the depixelization we've seen in movies is finally real
Can't wait for it to become available for video restoration. It will create wonders.
I've always known GAN will be making a comeback. SD'd better watch its back!
I love the upscaling. I wonder how temporally coherent it is. In other words: can we use this for video upscaling?
No. Without any kind of feedback loop of the outputs it's impossible to have any kind of coherence. It's like asking two artists to draw the same image without communicating. The architecture needs significant changes in order to improve temporal coherence. Usually that also means significantly slowing down the model.
@@Chaosligend Dumb question: how do you know there's no feedback?
@@Sekir80 I have a degree in AI and happen to specialize in Computer Vision so I'm familiar with the kind of architecture that the paper is using and what kind of architecture is needed in order to make temporal cohesion. The problem is that at this moment the computers are not nearly strong enough to allow for temporal cohesion with a naive approach. There are some attempts being made to use a very advanced approach to solve this issue but it needs at least a year or two to produce any decent results.
@@Chaosligend Thank you! I'm not in the field, an expert's opinion is truly appreciated.
@@acmhfmggru Thanks to you as well! I was asking that, because without understanding the paper (which I didn't read, at all) I can't be certain there's no feedback. But you both telling me that implies the nature of this kind of AI thing isn't meant to do it on itself. And of course, if it would have a temporal coherence part, they would be boasting about it. I think.
Looking forward to your take on the 3D Gaussian Splatting paper! There is also a new paper that just came out, which extends it to dynamic scenes. Seems to work pretty well there as well.
I just choked on my coffee when I saw that enhanced elephant image. That is a really good image to prove the quality.
I'm hoping for fast enough upscaling that you could have it as a browser addon, and zooming in doesn't make stuff pixelated.
Wow! How long before we can tell an AI 'i want to watch a movie that is The Empire Strikes Back, but done in a Film Noir style, with Ice Cube as Han Solo and where the movie is from the perspective of The Empire being the good guys. Blend with jokes and characters from The Office, and make it 8 x 5 minute long animated cartoons in the style of The Power Puff girls. But add a twist at 3 points in the movie, based on Pulp Fiction and Snow White' and it produces a quality product that is super enjoyable?
I think 14 years 3 months and 17 days (approx). That is a time to be alive, I tell you what!
I think you should ask ChatGPT that question.
After all, who knows better than an AI, what an AI will be capable of?
@@the_real_glabnurb good idea! It said 'can eat more'... not even sure what that means?
What a time to be alive....another video
I love how AI can turn blurry images into non-blurry images, good job little AI👍🏻👍🏻
During one demo, you showed an image being upscaled, then zooming in on some subsection and upscaling that too... what a great example of the power and usefulness of upscaling. If the upscaling is good enough, you could give it a map and have it draw the world
I'm still just waiting for some ridiculous new compression ratios using similar techniques, though
The issue being that it is fabricating the results, not upscaling to actual truth.
The comparison to Stable Diffusion seems WAY off... they must be using the original SD 1.5 from a year ago instead of some of the incredible community-made finetunes or the new SDXL. I'm sure this GAN has its use cases, but I see way more mutilated images compared to SD.
Yes, this is the problem with researchers running their own benchmarks: they're always incentivised to make their competition underperform.
@@sevret313 So very very true. Too bad Károly doesn't call them out when they do such blatant mischaracterization. I've come to realize this channel needs to be taken with a massive grain of salt over the last year, as I've gotten more into the weeds of AI projects.
This paper is 5 months old, so no SD XL back then
What it's like to trip on acid: 1:08
This would be very useful on video, there are movies like the star wars prequels that were shot at 1080p, upscaling them to 8K would be easy with this.
Once this tech is perfected, the input resolution almost won't matter to the layman anymore; someone chooses their output resolution and voilà. Those fine hairs from nothing in the sample show that even 480p video could become crisp 4K with full detail, giving a full experience either way, though not technically accurate to the 4K original if it existed.
Need something like this which can handle the continuity of video.
Finally a GAN breakthrough!! What a time to be alive!
Mind blowing.
Are you going to talk about DLSS3.5 and ray reconstruction or is it not crazy enough ? Because I think I understand how it works, but I wouldn't be against a more precise explanation
Looks good, but it still lacks the common sense that humans apply. Like the scarf on the dog: a human looks at the low-res dots on the scarf and thinks "that's probably a repeated pattern of the same thing, most likely just round white dots," but the AI tries to make a unique shape for each dot based on the way it happened to get pixellated.
I Cant wait for this to be publicly available!!!
Now make it available for the public to use? The upscalers out there suck
If you zoom into the upscaled image and then upscale it again could you do an infinite zoom? I wonder what kind of artifacts the upscaler would create.
You could, but it would do a bad job for anything but fractal-style images. It wouldn't know how to just up the resolution. Let's say your GAN is trained on human faces like this one is, then when you zoom in it would try to make the zoomed in portion look like a human face inside of the larger human face. That would end up looking monstrous or stupid for most applications.
Read "Blind Lake"
Skip to the bit where he says Teddie-bear 2:56
Is it really upscaling, though? It generates new pixels that aren't there, purely by guessing what the image shows (person, horse, etc.), so the image you see isn't actually reflecting reality, but what the AI thinks the reality should look like.
Holy moist papers! Those super-res upscales blow everything we currently have out of the water.
That is astounding! What a time to be alive!!!
Now I need to see this applied in stable diffusion. SD tooling is already insane, but I can't imagine how far you could go, especially with things like comfyUI.
Matching up GANs with generative AI is challenging because of how GANs are trained for very, very specific kinds of images. I guess what you could do is have a large list of GANs in A1111 or whatever and then manually select an appropriate GAN for the kind of image you've generated, but unless you're limiting yourself to pretty limited sets of subject matter, you're going to fill up your whole hard drive with GANs pretty quick. Not to mention someone actually has to make the GAN models in the first place, which is far from trivial. These are not simple or small, like Loras.
@@michaelleue7594 People using comfyui have already built a habit of using different models for different tasks at each step of their composition, so it wouldn't be that far-fetched to do the same with a GAN model. The super-resolution ability would certainly fit this workflow.
@@Exilum That's fine, if you're looking for super resolution on a portrait, or for a cat or dog, but you're gonna have a rough time if your composition is even slightly more complex than that. And frankly, there are easier ways to get great resolution on extremely simple compositions already, without resorting to giant, hyper-specific models.
@@michaelleue7594 I mean sure, I won't fight you on this. I do like bringing my guns to a sword fight though, so my hopes and dreams are still there.
@@michaelleue7594 (to clarify, comfyui allows you to work on sections of an image easily, so you can do 100x superresolution per-subject, then bind the seams using another model)
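The per-section workflow mentioned in this thread relies on tiling: an image too big for VRAM gets cut into overlapping sub-pictures, each is upscaled, and the overlaps are blended to hide seams. A toy sketch of the tile layout step (pure Python, my own illustration, not tied to any particular tool):

```python
def tile_positions(length, tile, overlap):
    """Start offsets of overlapping tiles along one axis. The last tile is
    shifted back so it ends exactly at the image edge; the overlapping
    regions are where seams get blended afterwards."""
    if length <= tile:
        return [0]
    step = tile - overlap
    starts = list(range(0, length - tile, step))
    starts.append(length - tile)
    return starts

# Cut a 1280x1024 image into 512px tiles with 64px of overlap.
xs = tile_positions(1280, 512, 64)
ys = tile_positions(1024, 512, 64)
tiles = [(x, y, 512, 512) for y in ys for x in xs]
print(xs)          # [0, 448, 768]
print(len(tiles))  # 9
```

Each (x, y, 512, 512) rectangle would be upscaled independently, which is also why naive tiling struggles with subjects that span tile boundaries.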
Looks like the CSI "Enhance... enhance... enhance... " technology is here with us now.
When and where can we use this? I have a ton of old cell phone pictures I would love to upscale!
I can’t wait for all these things to come standard with new desktop and laptop computers.
Fun fact: raster images can effectively be treated like vectors if you have fast enough upscaling and downscaling
Well... the "enhance image" from Hollywood movies, which anyone who ever played with Paint knew was BS, is now a real thing... "Never say never" has never been more true.
Just... probably don't use it in the court room
Kids in 2030: " Wow, is this CSI a memory pod or just top notch cinéma vérité? "
don't forget it's hallucinating, not actually recalling real data
I wonder how it performs on photos of birthmarks and melanoma. Does it "hallucinate" OR identify the disease based on macroscopic images?
Amazing video as always. Would really like to see the pre-pixelated images vs the upscaled ones, but I'm assuming those weren't available
"Zoom! Enhance!"
The zoom and enhance feature from all those CSI shows finally works in real life!
ENHANCE! ENHANCE! 😎
I'm testing GigaGAN and it works very well.
Let's put Patterson-Gimlin footage through this
Can it be used to upscale videos? Imagine this feature built into the browser: you could watch high-quality video at low file sizes!
Every time, mind-blowing papers.
Note for upscaling: remember that 100% of the added detail is made up, so you can't do the "enhance" trick from TV and movies where they discover new information by unfuzzing a picture. A bad photo can still give info by showing what's actually there, but the added details will still be made up.
yes it makes a prediction based on the context of the scene, which would be an approximate representation of the actual scene. nobody disagrees with that.
but what surprises is the way this technology can upscale images, unlike old technologies, where upscaling barely made a significant difference.
@@businessmanager7670 Yes. Just making the point that you can't add truth to an image no matter how good the AI. Unlike TV and movies, where they get a 'true' image they can act on. Note that some people _do_ believe this can be done.
@@thekaxmax you can make a prediction of the truth though, which sometimes can be the actual Truth.
that's the whole point with prediction systems like the brain and AI.
sometimes our predictions can represent the truth.
True, that is fabricating evidence and that is a crime.
"As you can see here, if you take the reflected image from his sunglasses and ENHANCE ENHANCE ENHANCE!!! the pink elephant did it!" 😀
What a marvel of technology.
Would it be possible to have a video of how/where to access these tools? Ideally as user friendly as possible first… for example DrawThings works even on iPads but it’s SDXL only to my knowledge and not GAN? (myself I don’t even know the difference yet).
Thanks for these videos (and the free Raytracing courses, started watching those and they are a blast!)
"Zoom in. More! Now enhance resolution..."
A common movie quote now reality.
When will the public be able to try this out??
When looking at the pixelated photos, it's clear to me that they were created with a mosaic effect from high-resolution images. They are perfectly pixelated, with no artifacts (unlike what you often see in JPEG format). So until someone else, like a journalist, tests it on more realistic low-resolution photos, I would remain cautious about the upscaling claim.
What about "sticky textures" when moving the subject around? Does this one have that?
Could you do a really good infinite zoom with these upscaling techniques? ENHANCE! ENHANCE! 😀
it would become all hallucination very quickly
Hi doc please demonstrate on how can we use this demo
Has it been made available? The upscaling sounds pretty neat. I'd run some old manga scans through it and see how it does.
No, gotta wait until they implement it somewhere and start selling it
@@Vlow52 This is Adobe. I'm pretty certain I know where they will sell it
seeing Geoffrey Hinton's pixelated face staring in to my soul was not what i needed tonight
Well "super resolution". It's pretty much making up the detail in the image.
What a time to be alive!!!!!!!!!!!!!
what a time to be alive!
Awesome! However... I'm wondering if I missed the memo about A.I. finally generating hands perfectly.
this new technique is incredible
Woooow!!! I will try it in some projects!🎉
maybe it can now help viewing the skies better with a modest telescope. wat a time to be a.i. !
Not gonna lie... This is probably the most legitimately exciting research I've seen in a while. This could very well be a first step towards ideal AI art creation, in my opinion
...enhance 34 to 36, pan right and pull back, stop, enhance 34 to 47... once there, Im hooked.
Can't wait for the code to be released!
I can't wait for these to be implemented in VR headsets.
If this is not the best result for upscaling, then what is? Would love to have an AI that not only upscales but increases dynamic range and makes compressed or oversaturated images look more raw... like making an old phone camera look like a new DSLR in terms of color, dynamic range, details, etc., if that makes sense.
Detective: “Enhance… enhance… ENHANCE!! Would you look at that; I think we’ve got our guy. Send out an APB for a suspect with really strange hands.”
As someone who makes PC mods for things like AI Background Upscale (Final Fantasy IX and Chrono Cross): Is this already usable via Python or whatever?
See and click as usual…bravo!
When you're long gone, an ai will be continuing your work on youtube and it will say "what a time to be dead!" instead of "alive"
Remember those scenes in movies where they would pause a video from a security camera, zoom in, and press the magic upscaling button? Well... it's now possible
Absolutely amazing. 😄
AI is progressing so rapidly with each month producing so many advances, that papers from just few months ago feels like ages ago.
This paper is 5 months old xD
@@DajesOfficial why would you assume im talking about this particular paper
@@jbcom2416 why would you edit if that's not the case?
Out of this world! 🤩
Please correct me if I am wrong, but if it's 1 BILLION params, that probably means we cannot run it on a consumer-grade GPU with a 24GB VRAM limit. Not for mere mortals?
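A rough sanity check (my own arithmetic, not from the paper) suggests parameter count alone isn't the problem: the weights of a 1B-parameter model are only a few GB. What usually blows the VRAM budget is the activations and intermediate feature maps at high output resolutions.

```python
def weights_gb(n_params, bytes_per_param):
    """Memory needed just to hold the model weights, in GiB."""
    return n_params * bytes_per_param / 1024**3

one_billion = 1_000_000_000
fp32 = weights_gb(one_billion, 4)  # 32-bit floats
fp16 = weights_gb(one_billion, 2)  # 16-bit floats
print(f"fp32: {fp32:.2f} GiB, fp16: {fp16:.2f} GiB")
```

Both figures are well under a 24 GB limit, which is why tiled inference (upscaling sub-pictures one at a time) is the usual workaround for resolution, not for parameter count.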
I used to watch your videos with excitement. Now I watch them with dread.
somehow the CSI "zoom in on the reflection" meme is becoming reality!