what you're specifically talking about is the utf-8 encoding. CodeNoodles is (implicitly) using the utf-16 encoding here; that's what wchar_t represents, on windows at least (it's not very portable). you're right that CodeNoodles is restricting themself to a subset though, since there are 21 "useful" bits in a unicode character.
it's funny that the babel image keys are just a different file format than png but it's still storing the image in bytes. You can also send the data of your png image lol.
Forgot what it is actually called but this kind of reminds me of the “monkey with the typewriter” thing. You could have an infinite amount of people and one (or more) could randomly generate any video on YouTube, just in low quality of course.
I don't know if you have understood that theorem, but the point of that experiment was to show that it was impossible for any monkey to write even the slightest sensible word given infinite time. So no, no one will randomly generate a youtube video
@Alkanen well, turning an image into a compressed bit of trash and then storing that lossless is still not lossy. Jpg will gradually destroy an image if you save it over and over again. Lossless formats (like png) will not do that, even if your image format can't even store anything other than a low res greyscale. It's still lossless in the sense that you can save it over and over again. Think of a plain text editor. Compared to a book (or pdf) with page layout etc, you will lose that info in plain text. But once it's plain text, it will not deteriorate.
@@borstenpinsel yes, but using that argument storing a single 0 or 1 indicating if the entire image is more or less than 50 % luminance would be great since it's lossless, whereas in any practical terms at all it would be considered utterly useless (except in some *extremely* niche applications). It won't deteriorate though ;)
@Alkanen of course but every recorded picture is already lossy. People a hundred years ago would have loved an audio recording that still sounded squeaky and pretty bad to be played a hundred times without it sounding even worse. Or the black and white pictures of the time to keep their contrast and not bleach over time. And if you need to compress data down until the original isn't recognised anymore the storage isn't meant for that data. But it's still lossless. Greyscale bitmap exists. So does (or did) 16 bit bitmap. Then came 24bit and now we have 32. This was normal a few decades ago.
i got my own image archive (since 2000) 150 000 images, 1/4 are my own images. 1000+ folders, almost half a terabyte. that would take years to scan through, lol. but i still search for old pics regularly (it takes usually 30 secs to find an image manually thanx to several levels of subfolders, up to 20 maybe). sometimes i create new folders/subfolders but it was pretty much optimized even 10 yrs ago. it's hard to take care of that many images. and yes i got several backups, lol
"The Library of Babel" by Jorge Luis Borges... the funny thing is that ALL the images that can be displayed in 640x416 or in 4K etc are finite. And this also applies to videos. Every 4K video that can be displayed in a monitor is a member of a finite set!
Imagine that every chunk of digital information, like an image, a video or a program, is a limited set of 1s and 0s at the base level. So, it means that every single image, for example, is a single integer number, a very big number. If you convert base 2 (binary) to base 10 (decimal) you have a number that can describe an image, or any digital data. There are many ways to encode images, some more efficient than others, and basically you created a raw image encoder. The unicode representation is actually equivalent to the binary representation. If you take the PNG format for example, a lossless efficient format, you could have very similar results. The thing about unicode is that not every character is printable, and different software deals with certain characters differently. If you rename a PNG to TXT and open it in notepad for example, you'll see equally strange chars, like the ones you have in your encoder, and if you try to copy and paste them into another notepad and save it with PNG extension, chances are that it would be corrupted, because some non printable chars would not be copied or pasted properly. So basically, what Babel Image Archives is doing is not giving you a path to the image, but the whole image information, in a decimal format.
Another idea is that you could open the jpeg (or whatever image format you have) file of the image you want to "find" in a hex editor, copy the content there, and make that be the string that identifies the image's location. Then on the backend, you can just use that file's location as the content of your image.
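To make the "every image is one very big integer" point concrete, here is a minimal Python sketch. The function names are made up for illustration, and the leading 0x01 sentinel byte is an assumption added so that leading zero bytes survive the round trip; this is not claimed to be how Babel Image Archives or the program in the video does it.

```python
def bytes_to_location(data: bytes) -> str:
    """Read the bytes as one big base-256 number and write it in base 10."""
    # Assumption: prefix a 0x01 byte so leading zero bytes are not lost.
    return str(int.from_bytes(b"\x01" + data, "big"))


def location_to_bytes(location: str) -> bytes:
    """Invert bytes_to_location: decimal string back to the original bytes."""
    n = int(location)
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return raw[1:]  # drop the 0x01 sentinel


# Four raw pixel bytes stand in for a whole image here.
pixels = bytes([0, 0, 255, 128])
location = bytes_to_location(pixels)
assert location_to_bytes(location) == pixels
```

For a full-size image the decimal string gets very long, and recent Python versions cap int-to-string conversion at a few thousand digits by default, so `sys.set_int_max_str_digits` would need adjusting before trying this on real image data.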
If it uses base 10, a lot of storage is wasted because those numbers still use 8 bit. But i also did not see the point of using unicode instead of ascii.
@@lynnwilliam I see your point, but i disagree since the video was mainly just about his approach on that. If all you want is to find out how you can make some sort of "infinite image library" yourself, take the trivial approach and just change the file format from .png or so to .txt. Send that text to someone, have them paste it into notepad, save it as .png and it's an image again.
By this logic even a Base64 encoding of images is just getting the location in babel's library of Base64, and converting image back to normal is just getting the image from that location :D
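The Base64 version of that "location" really is just a round trip. A minimal sketch with Python's standard `base64` module (the function names are made up for illustration):

```python
import base64


def to_location(image_bytes: bytes) -> str:
    """The 'location' is just the file's bytes written as Base64 text."""
    return base64.b64encode(image_bytes).decode("ascii")


def from_location(location: str) -> bytes:
    """'Looking up' the location is just decoding the text back into bytes."""
    return base64.b64decode(location)


data = b"PNG"  # stand-in for the bytes of a real image file
assert from_location(to_location(data)) == data
```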
If the location is smaller than the corresponding image file, then there must be images that can't be found there because there would already be another image in that same location as there are more images than locations. Though of course, you need to count the size accounting for any forms of compression/decompression the rendering algorithm might use, and not necessarily the raw bitmap that ends up on the screen.
@@RampagingCoder Either images that are just slightly different end up being identified as the same; or if something more scrambly like a hashing algorithm is used, then some images will end up pointing to the address of some other completely different images and cannot be actually found.
More than the program itself, the value is in how you described your exploratory journey, with its successes and failures. This can help newcomers not be afraid to experiment and not give up after their first failures. I can't wait for more 😉
technically, unicode is a character set describing those characters. the term you are looking for is utf8, which is an encoding method for unicode characters, along with utf16, etc.
As images can include text, there are images with passwords for every bitcoin wallet. Also you could make a text image, with all your passwords, upload, get the babelia hash and use it as your master password container
I would like to see a version of this program which has limited the degree of pixel variation in order to dramatically decrease the total number of images stored. Most of Babel Image Archives is noise (random pixels beside each other) but if you were to write a program which assigned a difference value between every two colors (two shades of blue are very low difference, same color is 0 difference, blue to purple is medium difference, blue to orange is large difference), and then limited the total variation between a pixel and the 25 pixels around it (5x5 block) you would force out much of the utterly random images, drastically reducing the total images and rendering the ones which existed much more likely to look like something. You could also say that, overall, 1% of pixels can go up to the max differential, 10% can have up to a fairly high differential, and 25% can have an above-default differential, while the remaining pixels keep the default differential rating, to allow for reasonable shapes but not pure random noise.
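A rough sketch of the simplest form of that filter, assuming Euclidean distance in RGB as the "difference value" and a single hard limit for every pixel. Both are assumptions for illustration; the tiered 1% / 10% / 25% allowances are not implemented here.

```python
def color_diff(a, b):
    """Assumed difference value: Euclidean distance between two RGB tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def within_variation(img, limit, radius=2):
    """True if every pixel differs by at most `limit` from each pixel in the
    (2*radius+1) x (2*radius+1) block around it (5x5 for radius=2)."""
    h, w = len(img), len(img[0])
    for y in range(h):
        for x in range(w):
            for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                    if color_diff(img[y][x], img[ny][nx]) > limit:
                        return False
    return True


# img is a list of rows, each row a list of (r, g, b) tuples.
flat = [[(10, 20, 30)] * 4 for _ in range(4)]
noisy = [[(0, 0, 0), (255, 255, 255)], [(255, 255, 255), (0, 0, 0)]]
assert within_variation(flat, limit=50)
assert not within_variation(noisy, limit=50)
```

Checking an image this way is cheap, but it only tells you whether one given image passes; it does not by itself give a compact numbering of all the images that pass.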
This idea is really cool until you realise how large the space actually is. And it's not impressively large but rather boringly large. The idea is basically - you can draw anything by using a lot of pixels. Which is true, yet not as exciting. Congratz however on the execution, I loved the UI!
For the keys, just use pixel color values (for example, a 2x2 white image would be “16777215167772151677721516777215”) and then encode it with base 65535, then encode the encoded key with base 65535, and you’re there
I think the problem with The Babel Image Archive is that its images are 8bit RGB, which consist of 256x256x256 (8 bits/256 colors per channel) = 16,777,216 colors. Raise that to the power of the number of pixels, 640x416=266240, and you get 16,777,216^266,240 possible images, a number with roughly 1.9 million digits. I think if they used a limited palette like 512 colors (3 bits/8 colors per channel), the images would still look good and the library would still be absolutely massive: 512^266,240 images, a number with roughly 721 thousand digits.
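A quick sketch of the count, assuming every pixel picks its color independently, so the total is colors raised to the number of pixels (not colors times pixels). The function name is made up; it uses logarithms so the huge number never has to be written out.

```python
import math


def decimal_digits(colors: int, width: int, height: int) -> int:
    """Number of decimal digits in colors ** (width * height)."""
    return math.floor(width * height * math.log10(colors)) + 1


# 24-bit color at 640x416: a count with roughly 1.9 million digits.
print(decimal_digits(256 ** 3, 640, 416))
# 512-color palette at the same size: roughly 721 thousand digits.
print(decimal_digits(512, 640, 416))
```

Either way the library stays astronomically large; a smaller palette only shortens the number, it does not bring it anywhere near something you could browse.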
Instead of selecting characters in order how they appear in the unicode table, you could arrange them by perceived brightness. That way just by looking at the location string in an editor set to the correct width, you can actually see a B&W representation of the image. Sort of like ASCII art, except unicode.
Okay but does it have an image of travis scott on a soldier tf2 costume sitting on a beanos chair and eating a green mcrib in a space mcdonalds while he talks to goku from dragon ball who is the worker there?
4'33 I think it'd be better if it only used the "safe" Unicode Characters (i.e. NO unassigned code points, non-characters, surrogates (this can raise issues if the string is sent as UTF-16), formatting characters, control characters (including null, bell, backspace, or other non-printable characters), certain punctuation marks (as they may need to be escaped in some programming languages), combining characters (this includes diacritical marks, which might have issues due to Unicode Normalization Forms), or Private-Use characters (these may have characteristics defined by private users that might not be desirable))
1:15 Yeah, not very hard to imagine, it's all stored as bytes, just combine the bytes together as though they were each a single digit of a Base-256 number, and you've got yourself a way of turning any data stored on a computer into a number that you can reconstruct the data from.
this is essentially what we had to do back in the Usenet days - you could only post text, so all images/videos had to be converted to ascii code and then decoded by the reader.
The important difference with the Babel website is that the "image location" is always 960,000 characters, even if the uploaded image is nothing but white pixels. And it's by design, because it's less obviously just an encoded version of the image. It's about the illusion that it's actually the image's address in the archive, and the images are not in any specific order so you never know what you find when you browse the archive.
They mentioned that at the end, where they can get the text encoding and share that, then maybe even encrypt that so it's like it has double encryption
In essence, this YouTube video already existed out there before you made it. It was just cut into individual frames, then each frame was cut into tiny squares of the proper resolution. Every video past and future is there, even the 3D ones, which means everything in reality is there as an explorable interactive 3D render. If you wanted to, you could watch my thumbs type this comment from my point of view. You could also watch me type something I’ve never typed before, while holding a cigarette and wearing a watch I’ve never owned. lol
When debugging GBA or other old gaming consoles or even computer programmes, it's not uncommon to "image dump" the memory, so one can see whether images and graphics are stored / processed into memory correctly. So for regions where there are images it shows the images while other regions show as seemingly random colours - that's kinda what you did. Kinda cool
If you click the generate button for an infinite amount of time in this program you made you would at some point get a meme saying "Babel image archives is better lmao"
This seems like a neat situation where AI could actually be useful, and not just displacing something else. If you take an image generation model that makes an image from a prompt and a seed number, you might be able to make a program that decodes an input image back into those values, even if the input image was not made by AI. it would take way more processing power per image, but it would give a way shorter location.
At first I thought the idea was kinda cool, but the more I think about it, the more pointless it becomes. Your "location" is basically the entire image, encoded in a special format that you just came up with. Your "archive" doesn't contain anything, it just converts the encoded image that you give it into something that can be displayed on the screen. Congrats, you just invented a slightly worse version of the bitmap file format!
If you do stick with Unicode anyways, not caring how big the URL actually is, I have a different suggestion. There are more than double the 65,536 characters in Unicode now. So any that would normally be in the reserved area could be mapped to different characters. What's more, this would also let you take out other problematic characters, like ones that look identical, or the combining characters. In short, I think you could probably create a table of 65,536 Unicode characters, and then map your bit stream to them. Most would correspond to their Unicode code point, but they wouldn't have to. It would be less performant, but I think it would probably be fine. You could even consider having the actual parsing be on the user-side, which would let you write this in JavaScript. I suspect it would still be quite fast as long as you keep the image resolution down. (And you could use the built-in to downscale any image to the resolution.)
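A minimal sketch of such a table in Python, using the standard `unicodedata` module. The set of skipped categories and the function names are assumptions for illustration. Look-alike characters are not filtered (that needs Unicode's confusables data, which is not in the standard library), and the normalization check is per character only, so it does not guarantee that whole strings survive normalization.

```python
import sys
import unicodedata

# Assumption: skip control, format, surrogate, private-use, unassigned,
# combining-mark and separator categories.
SKIPPED_CATEGORIES = {"Cc", "Cf", "Cs", "Co", "Cn", "Mn", "Mc", "Me", "Zs", "Zl", "Zp"}


def build_table(size: int = 65536) -> list[str]:
    """Collect the first `size` code points that pass the filters above."""
    table = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) in SKIPPED_CATEGORIES:
            continue
        if unicodedata.normalize("NFC", ch) != ch:
            continue  # this character would change under NFC normalization
        table.append(ch)
        if len(table) == size:
            return table
    raise ValueError("not enough usable characters in this Unicode version")


def encode(values: list[int], table: list[str]) -> str:
    """Map each 16-bit value to its table character."""
    return "".join(table[v] for v in values)


def decode(text: str, table: list[str]) -> list[int]:
    """Map each character back to its 16-bit value."""
    index = {ch: i for i, ch in enumerate(table)}
    return [index[ch] for ch in text]


table = build_table()
values = [0, 1, 12345, 65535]
assert decode(encode(values, table), table) == values
```

The table lookup costs one extra indirection per character compared with using code points directly, which is the small performance hit mentioned above.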
this website is fake you know, you can literally just take a screenshot and immediately upload it and it would give you a location for the image like how it's possible it already got there? so they just trick you to think they already got it but the truth is they don't have any photo unless you uploaded it, they display a random square full of colorful pixels because that what you would think that amount of photos would look like but the truth is it's all fake. there's no way anyone could store it it's like saying there's a website that show how many ants alive on earth right now
Sooo it's basically just a compression algorithm/storage format? Saying it contains every image ever created is like saying a typewriter contains every sentence ever created. A typewriter doesn't contain the information, it just converts it to a new medium. The codes aren't a "location" or an "identifier", they're the image itself and the site just decodes them. That's not to say it's useless. A typewriter can be useful, but you call it a typewriter, not an "endless library". The whole concept seems misleading and unnecessary to me 🤷‍♂️
Once in primary school, at about 12 years old (35 years ago, around 1988), I was fascinated with designing fonts and different symbols in an 8x8 box (with millimeter paper and a pencil). Once I wondered: what if I generated all possible combinations? Then I realised it would be 2^64 combinations... Then I tried to limit it to images sensible for a human (checkerboards, lines at every angle, etc.), but it still overwhelmed me in the end. My thoughts could have been the basis of some lossless compression algorithm, but back then, in BASIC on a ZX Spectrum, this was beyond even the borders of imagination.
I wonder if you could encrypt combinations of unicode say, any given character could represent a combination of other characters and you have a key that determines how many levels of encryption has been done, so you can theoretically compress any image down to a single unicode symbol and a key that describes how many layers to unpack that single value down in order to get the actual values needed to represent the original image. Maybe it could be represented by a decimal point.
you know unicode has about 149,813 characters, and if you use a string of 16 characters for each image location, you would have about 10^82 different possible permutations/combinations of image locations, which is about the number of atoms in the observable universe...
This is of course false, since the color palette used does not use all possible visible colors, so if the image that I have in mind is outside the color palette used, then there is no image of it. The original program is flawed in loads of ways, like size, which you explained. The problem is, if I have an image that is 1Bx1B, no program exists that already 'has' those images. It is like saying for every pixel, you have 3 values r g b, a float is 4 bytes so 12 bytes, disregarding alpha. And then every single variation of a float times 3 times the amount of pixels you have. I am not going to do that math, but there is no program that has these values. The amount of processing power/storage simply does not exist, and never will. I hate generalizations when it comes to computer programs.
This is amazing. From pure curiosity, I became interested in this topic recently and it led me to do a calculation of how many possible unique FullHD 24-bit coloured images could exist. And the number is about 10^14 000 000 (ten to the power of fourteen million). Which is… a lot. So I was thinking how could I do something interesting with it, some cool custom program. And then I saw your video. At first I thought there's no way I had just thought about it and I got this recommendation on YouTube, but then it really inspired me even more. Thank you so much, awesome job!
That's awesome that you had the same curiosity as me! It's insane how many images there could be even in a seemingly small resolution, but with full HD, it's practically infinite. Thanks for your comment and support!
I've always thought of the babel image archive as being dumb, when you upload an image it's just giving you a textual representation of the same data. So... what's the point?
ah yes, and so does this program! In every format. And also every binary file that was ever stored and every RAM contents and .... bits = [] :: (1 :: bits) ++ (0 :: bits)
it would have to be a tiny bit more clever to generate them in a "useful" order, but that's like ... wholly irrelevant, considering how useless it is anyway :D
You see, what it SHOULD'VE done is, for the first image, make everything except the top left pixel black. Make each image count up in a base 256 counting system (each 8-bit channel has 256 values, 0 to 255), with red being the ones place, green being the "tens" place, and blue being the "hundreds" place. Counting through every combination of the RGB channels, we would have 256^3 or 16,777,216 different images, for a change in.. one pixel. You just continue like this, making the next pixel the "thousands", or fourth, place, and so on. Not only do you know that every theoretical image can now be found, you can scroll in-order instead of it being random. This would make the number of possible images (256^3)^(640*416), which would be a very, VERY big number.
This is conceptually and functionally different to Babel. First, the Babel location values are *numbers* not *strings of text* so their decimal representation is neither here nor there, it's just provided as a convenience. Second, and perhaps most fundamentally, you're just writing out a bitstream to a file (basically how uncompressed BMP files work), and the fact that the file can be interpreted as text is entirely irrelevant - it's almost like renaming picture.jpg to picture.wav and claiming you've converted an image to a sound!
The 0-9 numbers are so it's less obvious it's just an encoding of the image itself in a human readable format. Which is to say, it converts images into bitmaps which are base 256 and then into base 10. Now, all you need to do is go from 0 to 10^900000 and send all the results through low paid foreign labor so they can tell you if something interesting popped up.
I already made code like this to try to find a pattern in locating information within a list of data generated from combinations, my resolution to convert information into location and then converting it back to information worked well, but I soon realized that this type of program doesn't make any sense
would you fancy recreating the Sloot Digital Coding System if you don't fear being assassinated for creating the invention that would accelerate mankind?
The babel image archive (and its text counterpart) are really just a glorified two-way hashing function. The text archive has the nice property of being one long stream of identically-distributed characters, so `Babel(n+1)` = `Babel(n)[1..]`, and `Babel^-1([a, b, ..., z])` is easy to compute.
You effectively recreated Base65536, with a lossy compression applied to the image beforehand (grayscale followed by quantization), and of course a neat little GUI to make it look nice
you just made my brain fry
An 8-bit greyscale image of the resolution described is significantly smaller than the actual hash described (talking about the Babel archive), even uncompressed.
It's basically a really lousy image format for which you also need a website to open the images.
Now what would elevate this more is making a neural network find an image of m×n size using as few characters as possible and make it search for the writing text challenge using something similar to Tom7's uppestcase and lowestcase algorithm. Also bonus points if it's only using printable Latin ascii. Then it would be a good SIGBOVIK paper
i like your funny words magic man
I just love dumb stupid theoretical things like this, it's incredible how numbers contain the secrets and the lies about those secrets of the universe, and everything in between as well!
"Mathematics is the language in which God has written the universe."
- Galileo Galilei
@@oggunlukarmy4901 maybe leave religion out of this. Have your own beliefs but I'm sure the original comment didn't ask for your beliefs.
@@mynameissanttu Yeah no I am not the most religious person, it's just fascinating to me how math has infinite combinations and can basically prove anything that ever exists in the world. I just wanted to share what Galileo said about this.
@@mynameissanttu it's a quote from a famous scientist guy. You're the one who shares beliefs nobody asked for. This isn't one of those "repent for your sins! Here's a bunch of bible verses: Exodus 21, Deuteronomy 21" spam comments.
@@mynameissanttu Maybe leave your tyranny out of this. No one asked for your interpretation of it, but even so, no one stops you from commenting about it the same way they also have the right to post.
The one thing to end this off would be to create a meaning finder. Basically, in the original Library of Babel story, there are these robots that look through each and every book in the library for meaning, and if a book doesn't have any meaning, it gets thrown away.
The easiest way to do it is to use an image classifier and ask a chatbot whether the image has any meaning, anywhere from a sea of noise to a pretty picture.
If it identifies that the image has any meaning, it gets stored in a folder. Else, it gets discarded.
That's a cool concept in theory, but since there are so many images that could be generated, it would take so much time and processing power to find anything of value.
I have an idea for how to go about this. Encode color values as overlapping binary vectors and store as a sparse distributed memory or a self-organizing map. Then upload existing images. Once a threshold density is reached in a region of this vast latent space, you will be able to find all related images near any that were uploaded. This is roughly the architecture of the AI model I am designing.
But what if someone finds meaning in the meaningless? What if "shlungdinghliythujn" is the funniest shit they've ever seen?
Everything has meaning; that is why it is stored in the Library of Babel. In English, German may sound like gibberish, but not in German. A well cyphered message should look like a random assortment of letters, but that is precisely why it is valuable, because it looks like gibberish.
To a librarian every book should be valuable, because someone wanted to say something, even if it's "vvvvvvv", because if not, we would have had one less cool game.
@CodeNoodles sorry but your video is just a base6000 converter😂good work though
This is a fun idea, i expected something like a web scraper somehow constantly finding new images and updating one local image that is the average of every image found by the program, but this is way better.
Isn't this basically just making an image file type? like jpg etc.
Yes
yes it's base but with lossy
Yes, It is.
Yes, but worse
YES
aren't you literally just taking the pixel data, dumping it into a file and then going hey, this is unicode?
Basically lol
This is not how the babel site works, though. It doesn’t actually have the information of the image, but a seed to generate it. What you have done is encode it…
It’s basically the same thing. His site has all the images too, you just type some characters and it gets an image
i'd really love if the seed was way smaller, like 400 characters long for an uploaded image, but we have to remember that the seed itself is the chance of every pixel on the uploaded image being where it is, so.. almost impossible to shrink it down.
i thought about encoding an image to numbers, then trying to find it inside PI, so we would just need to compute PI and then use an address to get our image back, but also the chance is so low. damn.
there are trillions googols of possible images, it's impossible to store them in a small string like that @@neey3832
@@neey3832 seed hunting my future
@@neey3832 the problem with shrinking it is that the number of different images you can possibly display depends on the number. So with 400 characters, if it's numbers 0-9 like in the website then it's a "mere" 10^400 images, which may sound like a lot but it's way too low of a number to display most (almost all) images for anything more than a tiny resolution. Using digits of PI wouldn't help either since the position of the digit would, on average, be as big of a number as the sequence stored there
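To put a number on "tiny resolution", here is a small counting sketch (the function name is made up for illustration). It finds the largest pixel count for which every possible image could still get its own key, assuming keys are strings of decimal digits.

```python
import math


def max_pixels(digits: int, colors: int) -> int:
    """Largest pixel count p such that colors ** p <= 10 ** digits, i.e. the
    biggest image size where every image can have a distinct key."""
    limit = 10 ** digits
    p = int(digits / math.log10(colors))  # float estimate, corrected below
    while colors ** (p + 1) <= limit:
        p += 1
    while colors ** p > limit:
        p -= 1
    return p


print(max_pixels(400, 2 ** 24))  # 55 pixels of 24-bit color
print(max_pixels(400, 2))        # 1328 pixels of pure black and white
```

So a 400-digit key can tell apart every 24-bit image of about 55 pixels in total, smaller than an 8x8 icon. Anything bigger has more images than keys, so some images cannot have a key of their own.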
not only does it contain every image in history, it also contains every image that will ever be made.
It also contains all images with images within them that can be displayed at the program's resolution and color depth! (Since any image containing any image(s) at any transformation/degradation other than the original size/color is in the set of all images to ever exist)
watching ½ of the video,
and thinking: You are just
making a new file-format
for images (& its viewer).
very well done; but
nothing so special…
according to the title
Why do you write in lines like that
@@McGoomba IDK*
maybe job‧deformation: DTP,
ADHD, (f)or fun \ 'cause I can.
I thought of this idea and wrote this exact program about 10 years ago. There was also an art exhibit at a famous museum in the US some number of years ago that was based on this idea. I've thought about it a lot, and ways to reduce the insanely huge set of noise down to a smaller subset of "identifiable" images, but it's still sooooooooo much more data than your brain can comprehend.
Yeah, that's the problem. If only we had a way to reduce the set of images so that it could be searched for something interesting.
Looking for signals in the noise sounds like a “diffusion” model for image generation. Which is how modern image generation AI works. “Walking” from an identifiable image to a cleaner image is how that technique progresses towards the requested image. Check it out and see if that’s what you mean.
It’s nice to see a return to these kind of coding challenges.
I feel so bad about this, but my first thought when seeing this was “so how much of it is porn?”
Not my best moment.
Lol
Technically it may have
Babel is not going to have "everything" as far as porn goes, unless they want the FBI at their door as far as CSAM and gore goes.
@@RebrandSoon0000 how are they going to find it in the INFINITE MESS OF RAINBOW STATIC? See, unless somebody already has that image they aren't gonna find it.
Actually, unicode is a 32bit format, but it takes one, two, three (maybe not three) or four bytes depending on the value
what you're specifically talking about is the utf-8 encoding.
codenoodles is (implicitly) using the utf-16 encoding here; that's what wchar_t represents, on windows at least (it's not very portable).
you're right that codenoodle is restricting themself to a subset though, since there are 21 "useful" bits in a unicode character.
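A minimal sketch of the distinction drawn in this thread, assuming nothing about the video's actual C++ code: Unicode assigns code points, while UTF-8 and UTF-16 are encodings that spend different numbers of bytes on them. The sample characters and the helper name `encoded_sizes` are illustrative only.

```python
# Compare how many bytes UTF-8 and UTF-16 need for a few sample code points.
samples = ["A", "é", "€", "😀"]

def encoded_sizes(ch):
    """Return (code point, UTF-8 byte count, UTF-16 byte count) for one character."""
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-le")  # "-le" avoids the 2-byte byte-order mark
    return ord(ch), len(utf8), len(utf16)

for ch in samples:
    cp, n8, n16 = encoded_sizes(ch)
    print(f"U+{cp:04X}: UTF-8 = {n8} bytes, UTF-16 = {n16} bytes")

# The largest valid code point is U+10FFFF, which fits in 21 bits.
MAX_CODE_POINT = 0x10FFFF
print(MAX_CODE_POINT.bit_length())
```

UTF-8 takes one to four bytes per character (three for "€"), while UTF-16 takes two bytes, or four for anything outside the Basic Multilingual Plane such as the emoji.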
it's funny that the babel image keys are just a different file format than png but it's still storing the image in bytes. You can also send the data of your png image lol.
Fun fact: this site contains US military secrets as well as nuclear launch codes
Forgot what it is actually called but this kind of reminds me of the “monkey with the typewriter” thing. You could have an infinite amount of people and one (or more) could randomly generate any video on YouTube, just in low quality of course.
The infinite monkey theorem
I don't know if you have understood that theorem, but the point of that experiment was to show that it was impossible for any monkey to write even the slightest sensible word given infinite time. So no, no one will randomly generate a youtube video
Congrats! You basically just made a computer file system. Did you ever try to open an image in Notepad?
Bro is just reinventing the .bmp file format.
i dont really understand how this is different from something like a PNG or JPG. those can technically display every single image too, cant they?
JPG isn't lossless. It is practically impossible to store all the combinations in a JPG image. PNGs can be lossless.
@@AkshayCS1 sure, jpeg isn't lossless, but it's a whole lot less lossy than this project is
@Alkanen well, turning an image into a compressed bit of trash and then storing that lossless is still not lossy. Jpg will gradually destroy an image if you save it over and over again. Lossless formats (like png) will not do that, even if your image format can't even store anything other than a low res greyscale. It's still lossless in the sense that you can save it over and over again.
Think of a plain text editor. Compared to a book (or pdf) with page layout etc, you will lose that info in plain text. But once it's plain text, it will not deteriorate.
@@borstenpinsel yes, but using that argument storing a single 0 or 1 indicating if the entire image is more or less than 50 % luminance would be great since it's lossless, whereas in any practical terms at all it would be considered utterly useless (except in some *extremely* niche applications). It won't deteriorate though ;)
@Alkanen of course but every recorded picture is already lossy. People a hundred years ago would have loved an audio recording that still sounded squeaky and pretty bad to be played a hundred times without it sounding even worse. Or the black and white pictures of the time to keep their contrast and not bleach over time. And if you need to compress data down until the original isn't recognised anymore the storage isn't meant for that data. But it's still lossless.
Greyscale bitmap exists. So does (or did) 16 bit bitmap. Then came 24bit and now we have 32. This was normal a few decades ago.
i got my own image archive (since 2000) 150 000 images, 1/4 are my own images. 1000+ folders, almost half a terabyte. that would take years to scan through, lol. but i still search for old pics regularly (it takes usually 30 secs to find an image manually thanx to several levels of subfolders, up to 20 maybe). sometimes i create new folders/subfolders but it was pretty much optimized even 10 yrs ago. it's hard to take care of that many images. and yes i got several backups, lol
"The Library of Babel" by Jorge Luis Borges...
the funny thing is that ALL the images that can be displayed in 640x416 or in 4K etc are finite.
And this also applies to videos. Every 4K video that can be displayed in a monitor is a member of a finite set!
Imagine that every chunk of digital information, like an image, a video or a program, is a limited set of 1s and 0s at the base level. So, it means that every single image, for example, is a single integer number, a very big number. If you convert base 2 (binary) to base 10 (decimal) you have a number that can describe an image, or any digital data. There are many ways to encode images, some more efficient than others, and basically you created a raw image encoder. The unicode representation is actually equivalent to the binary representation. If you take the PNG format for example, a lossless efficient format, you could have very similar results. The thing about unicode is that not every character is printable, and different software deals with certain characters differently. If you rename a PNG to TXT and open it in notepad for example, you'll see equally strange chars, like the ones you have in your encoder, and if you try to copy and paste them into another notepad and save it with PNG extension, chances are that it would be corrupted, because some non printable chars would not be copied or pasted properly. So basically, what Babel Image Archives is doing is not giving you a path to the image, but the whole image information, in a decimal format.
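The "every file is one big integer" idea can be sketched in a few lines. This is only an illustration under stated assumptions, not how Babel Image Archives is implemented: the function names and the `0x01` sentinel byte (used so leading zero bytes survive the round trip) are choices made for this example.

```python
def bytes_to_decimal(data):
    """Interpret raw bytes as one big base-256 number and return it in decimal."""
    # A leading 0x01 sentinel keeps leading zero bytes from being lost.
    return str(int.from_bytes(b"\x01" + data, "big"))

def decimal_to_bytes(digits):
    """Invert bytes_to_decimal: decimal string back to the original bytes."""
    n = int(digits)
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return raw[1:]  # drop the sentinel byte

# Four greyscale pixels stand in for a real image here.
# Note: Python 3.11+ caps int/str conversion at 4300 digits by default;
# sys.set_int_max_str_digits can raise that limit for larger inputs.
pixels = bytes([0, 255, 128, 64])
location = bytes_to_decimal(pixels)
print(location)
print(decimal_to_bytes(location) == pixels)
```

The decimal string is the image data in another base, which is why it cannot be shorter than the data it encodes.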
bestie made an image format and sold it as the library of babel image archive
Another idea is that you could open the jpeg (or whatever image format you have) file of the image you want to "find" in a hex editor, copy the content there, and make that be the string that identifies the image's location. Then on the backend, you can just use that file's location as the content of your image.
i took a picture after this video so the program is obsolete
It would already be on there
@@ewetwentythree my picture doesn’t identify as an image so that’s cap.
Am I wrong to assume that this is just a naive bmp encoding?
Cool project nonetheless.
The number of digits to represent the image location was never a problem. I don't know why you thought you had to fix it.
If it uses base 10, a lot of storage is wasted because those numbers still use 8 bit. But i also did not see the point of using unicode instead of ascii.
Nothing wrong with wasted storage, he wasted the coolest parts of the video on that.
@@lynnwilliam I see your point, but i disagree since the video was mainly just about his approach on that. If all you want is to find out how you can make some sort of "infinite image library" yourself, take the trivial approach and just change the file format from .png or so to .txt. Send that text to someone, have them paste it into notepad, save it as .png and it's an image again.
By this logic even a Base64 encoding of images is just getting the location in babel's library of Base64, and converting image back to normal is just getting the image from that location :D
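A small sketch of that point, using Python's standard `base64` module. The names `image_to_location` and `location_to_image` are made up for this example to mirror the "location" wording; they are ordinary encode and decode calls.

```python
import base64

def image_to_location(raw):
    """'Find' an image in the library: Base64-encode its bytes."""
    return base64.b64encode(raw).decode("ascii")

def location_to_image(location):
    """'Visit' a location: Base64-decode it back into the image bytes."""
    return base64.b64decode(location)

raw = bytes([0x00, 0xFF, 0x10])  # stand-in for real image bytes
location = image_to_location(raw)
print(location)
print(location_to_image(location) == raw)
```

Every byte string has exactly one Base64 "location", and every valid location decodes to exactly one byte string, so the library is complete by construction.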
could you mash together every meme from the 2000's so we can have the funniest image on earth
If the location is smaller than the corresponding image file, then there must be images that can't be found there because there would already be another image in that same location as there are more images than locations. Though of course, you need to count the size accounting for any forms of compression/decompression the rendering algorithm might use, and not necessarily the raw bitmap that ends up on the screen.
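The counting argument in this comment can be checked on a toy case. The sizes below (2x2 one-bit images, 3-bit addresses) and the `address_of` mapping are hypothetical, chosen only so that everything can be enumerated; any other mapping into 8 addresses would collide as well.

```python
from itertools import product

# Toy setting: 2x2 images with 1 bit per pixel, but only 3-bit addresses.
PIXELS = 4
ADDRESS_BITS = 3

images = list(product([0, 1], repeat=PIXELS))  # 2**4 = 16 possible images
num_addresses = 2 ** ADDRESS_BITS              # 2**3 = 8 possible addresses

def address_of(image):
    """One arbitrary mapping: keep only the first ADDRESS_BITS pixels."""
    a = 0
    for bit in image[:ADDRESS_BITS]:
        a = (a << 1) | bit
    return a

# Group images by the address they land on.
buckets = {}
for img in images:
    buckets.setdefault(address_of(img), []).append(img)

collisions = {a: imgs for a, imgs in buckets.items() if len(imgs) > 1}
print(len(images), "images,", num_addresses, "addresses,",
      len(collisions), "addresses shared by more than one image")
```

With 16 images and 8 addresses, at least one address must be shared no matter how the mapping is chosen, so some image can never be recovered from its address alone.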
there is no difference between converting it and "finding it"
@@RampagingCoder Either images that are just slightly different end up being identified as the same; or if something more scrambly like a hashing algorithm is used, then some images will end up pointing to the address of some other completely different images and cannot be actually found.
@@tiagotiagot just think of it this way, the "address" is just the conversion data of the original image, with some quality loss of course
Imagine someone’s just casually scrolling through this and then they find like a pic of their home
More than the program itself, the value is in how you described your exploratory journey, with its successes and failures.
This can help newcomers not be afraid to experiment and not give up after the first failures.
I can't wait for more 😉
you can use the first bit to encode a secret message into the data
So. just invented bmp format.
technically, unicode is a character set describing those characters. the term you are looking for is utf8, which is an encoding method for unicode characters, along with utf16, etc.
hello df user
No, the term he was looking for is utf-16. Because that's what he used
@@BromeoWuggles hello df usr
As images can include text, there are images with passwords for every bitcoin wallet.
Also you could make a text image, with all your passwords, upload, get the babelia hash and use it as your master password container
I would like to see a version of this program which has limited the degree of pixel variation in order to dramatically decrease the total number of images stored. Most of Babel Image Archives is noise (random pixels beside each other) but if you were to write a program which assigned a difference value between every two colors (two shades of blue are very low difference, same color is 0 difference, blue to purple is medium difference, blue to orange is large difference), and then limited the total variation between a pixel and the 25 pixels around it (5x5 block) you would force out much of the utterly random images, drastically reducing the total images and rendering the ones which existed much more likely to look like something. You could also say that overall, say, 1% of pixels can go up to the max differential, 10% can have up to a fairly high differential, and 25% have an above default differential, while the remaining pixels have a default differential rating to allow for reasonable shapes but not pure random noise.
This idea is really cool until you realise how large the space actually is. And it's not impressively large but rather boringly large. The idea is basically - you can draw anything by using a lot of pixels. Which is true, yet not as exciting.
Congratz however on the execution, I loved the UI!
So it's a very roundabout image file codec.
For the keys, just use pixel color values (for example, a 2x2 white image would be “16777215167772151677721516777215”) and then encode it with base 65535, then encode the encoded key with base 65535, and you’re there
I think the problem with The Babel Image Archive is that its images are 8bit RGB, which consist of 256x256x256 (8 bits/256 colors per channel) = 16,777,216 colors. Multiply that by the number of pixels, 640x416=266240, you get 4,466,765,987,840 possible images. I think if they used a limited palette like 512 colors (3 bits/8 colors per sample) that the images would still look good but still be able to contain an absolutely massive library of images. They would still have a library of 136,314,880 images.
the amount of possible images is actually 16777216^266240, which is a number with 1.9 million digits
@@progect3548 how did you come to that number?
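A sketch of where a number like that comes from, assuming 24-bit RGB at 640x416 as stated in the thread: each of the 266,240 pixels independently takes one of 2^24 = 16,777,216 colours, so the counts multiply per pixel, giving 16777216^266240 rather than 16777216 × 266240. The function name below is made up for this example; it counts decimal digits with a logarithm instead of building the huge number.

```python
import math

def image_count_digits(width, height, bits_per_pixel):
    """Decimal digits in (2**bits_per_pixel) ** (width * height)."""
    total_bits = width * height * bits_per_pixel
    # The number of images is 2**total_bits; digits = floor(log10(N)) + 1.
    return math.floor(total_bits * math.log10(2)) + 1

print(image_count_digits(640, 416, 24))  # roughly 1.9 million digits
print(image_count_digits(640, 416, 9))   # the 512-colour palette suggested above
```

Even the reduced 512-colour palette only shrinks the exponent: the count is 512^266240, still a number with hundreds of thousands of digits.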
Instead of selecting characters in order how they appear in the unicode table, you could arrange them by perceived brightness. That way just by looking at the location string in an editor set to the correct width, you can actually see a B&W representation of the image. Sort of like ASCII art, except unicode.
that would be really cool
So somewhere in there are spiderman pictures?
Yep
Okay but does it have an image of travis scott on a soldier tf2 costume sitting on a beanos chair and eating a green mcrib in a space mcdonalds while he talks to goku from dragon ball who is the worker there?
Yes, it does 😆
@@CodeNoodles W program
Yes from every angle
@@David280GG EVEN BETTER
I remember having this exact idea when I was a kid, except I was picturing a folder instead of a website or program.
4'33 I think it'd be better if it only used the "safe" Unicode Characters (i.e. NO unassigned code points, non-characters, surrogates (this can raise issues if the string is sent as UTF-16), formatting characters, control characters (including null, bell, backspace, or other non-printable characters), certain punctuation marks (as they may need to be escaped in some programming languages), combining characters (this includes diacritical marks, which might have issues due to Unicode Normalization Forms), or Private-Use characters (these may have characteristics defined by private users that might not be desirable))
1:15 Yeah, not very hard to imagine, it's all stored as bytes, just combine the bytes together as though they were each a single digit of a Base-256 number, and you've got yourself a way of turning any data stored on a computer into a number that you can reconstruct the data from.
So you basically just created a new image format?
this is essentially what we had to do back in the Usenet days - you could only post text, so all images/videos had to be converted to ascii code and then decoded by the reader.
In fact, it's just a different image storage format, not its location)
So there is a program that knows i took a picture of me using the toilet? 💀
This is EXACTLY the question i have been wondering about when i lay down in bed every day for a year. Feels good to finally have an answer, thanks so much!
The important difference with the Babel website is that the "image location" is always 960,000 characters, even if the uploaded image is nothing but white pixels. And it's by design, because it's less obviously just an encoded version of the image. It's about the illusion that it's actually the image's address in the archive, and the images are not in any specific order so you never know what you find when you browse the archive.
babel archive is just reinventing the wheel, which here is a base64 image
The image I thought of was a 3D model of a Light Blue Yoshi in Rick Astley's clothes doing the Sonic CD stare meme pose I made once for no reason.
You should make this into something bigger where you can share images to people just using the text
I think they call that a 'link'
They mentioned that at the end, where they can get the text encoding and share that, then maybe even encrypt that so it's like it has double encryption
4:37 You can store 2^16 unique values in 16 bits because 0 is itself a unique value. 65535 is just the largest value it can store.
perhaps try using the "key"s like a seed for a random number generator, so it creates the same image with the same seed
That won't work. There will be overlap and inaccessible numbers, and it is impossible to reverse engineer the seed of a random value.
@@luigidabro actually, some algorithms do have invertible seeds.
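One concrete case of an invertible generator is a linear congruential generator with an odd multiplier and a power-of-two modulus. The sketch below uses the widely quoted Numerical Recipes constants and hypothetical helper names `step` and `unstep`. It only shows that a single step can be undone; it does not show that an arbitrary image can be reached from a short seed, which the counting objection above still rules out.

```python
# Linear congruential generator: state -> (A * state + C) mod M.
M = 2 ** 32
A = 1664525
C = 1013904223

# A is odd, so it has a multiplicative inverse modulo 2**32.
# Three-argument pow with exponent -1 needs Python 3.8 or newer.
A_INV = pow(A, -1, M)

def step(state):
    """Advance the generator by one step."""
    return (A * state + C) % M

def unstep(state):
    """Undo one step: recover the state that produced this one."""
    return (A_INV * (state - C)) % M

seed = 123456789
after_three = step(step(step(seed)))
print(unstep(unstep(unstep(after_three))) == seed)
```

Because every step is a bijection on the 32-bit state, there is no overlap between states, but there are also only 2^32 of them to go around.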
Why not use UTF-8? It can encode all unicode characters in minimum amount of bytes, and is the default encoding for c++ strings
Keep up the good work! you are so underrated my guy.
In essence, this YouTube video already existed out there before you made it.
It was just cut into individual frames, then each frame was cut into tiny squares of the proper resolution.
Every video past and future is there, even the 3D ones, which means everything in reality is there as an explorable interactive 3D render.
If you wanted to, you could watch my thumbs type this comment from my point of view. You could also watch me type something I’ve never typed before, while holding a cigarette and wearing a watch I’ve never owned. lol
Once i did the same thing in reverse in python... for my game, i converted all data into an image and decoded the image to load the data
When debugging GBA or other old gaming consoles or even computer programmes, it's not uncommon to "image dump" the memory, so one can see whether images and graphics are stored / processed into memory correctly. So for regions where there are images it shows the images while other regions show as seemingly random colours - that's kinda what you did. Kinda cool
If you click the generate button for an infinite amount of time in this program you made you would at some point get a meme saying "Babel image archives is better lmao"
True
So basically it's an image compression algorithm right?
Also that's really cool
No, since there is no compression going on. It's more like a differently encoded bmp file.
Sloot would be proud.
More like inflation algorithm
I think you need to look into vector representation spaces, which cover what you are already doing
Does it even have frames of images?
This seems like a neat situation where AI could actually be useful, and not just displacing something else. If you take an image generation model that makes an image from a prompt and a seed number, you might be able to make a program that decodes an input image back into those values, even if the input image was not made by AI. it would take way more processing power per image, but it would give a way shorter location.
Funny thing is when you use the search feature Babel requests you upload an image.
At first I thought the idea was kinda cool, but the more I think about it, the more pointless it becomes.
Your "location" is basically the entire image, encoded in a special format that you just came up with. Your "archive" doesn't contain anything, it just converts the encoded image that you give it into something that can be displayed on the screen.
Congrats, you just invented a slightly worse version of the bitmap file format!
@@chrizTwoEight so was the original babel canvas website, just with different encoding and with encryption.
If you do stick with Unicode anyways, not caring how big the URL actually is, I have a different suggestion.
There are more than double the 65,536 characters in Unicode now. So any that would normally be in the reserved area could be mapped to different characters. What's more, this would also let you take out other problematic characters, like ones that look identical, or the combining characters.
In short, I think you could probably create a table of 65,536 Unicode characters, and then map your bit stream to them. Most would correspond to their Unicode code point, but they wouldn't have to. It would be less performant, but I think it would probably be fine.
You could even consider having the actual parsing be on the user-side, which would let you write this in JavaScript. I suspect it would still be quite fast as long as you keep the image resolution down. (And you could use the built-in to downscale any image to the resolution.)
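The lookup-table suggestion might look roughly like this. It is a sketch under stated assumptions: "safe" is approximated here as "a letter that is unchanged by NFC normalization", which drops surrogates, controls, combining marks, private-use and unassigned code points, but does not remove look-alike characters. The names `build_safe_table`, `encode` and `decode` are invented for this example.

```python
import sys
import unicodedata

TABLE_SIZE = 65536  # one table entry per 16-bit value

def build_safe_table(size=TABLE_SIZE):
    """Collect the first `size` code points that pass a simple safety filter."""
    table = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        # Keep letters only (general categories Lu, Ll, Lt, Lm, Lo).
        if not unicodedata.category(ch).startswith("L"):
            continue
        # Skip characters that NFC normalization would rewrite.
        if unicodedata.normalize("NFC", ch) != ch:
            continue
        table.append(ch)
        if len(table) == size:
            return table
    raise RuntimeError("not enough characters matched the filter")

TABLE = build_safe_table()
INDEX = {ch: i for i, ch in enumerate(TABLE)}  # reverse lookup for decoding

def encode(values):
    """Map a sequence of 16-bit integers to a string of safe characters."""
    return "".join(TABLE[v] for v in values)

def decode(text):
    """Map a string produced by encode back to its 16-bit integers."""
    return [INDEX[ch] for ch in text]

values = [0, 1, 12345, 65535]
print(decode(encode(values)) == values)
```

The table costs one extra lookup per character compared with using code points directly, which matches the "less performant, but probably fine" estimate in the comment.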
this website is fake you know, you can literally just take a screenshot and immediately upload it and it would give you a location for the image like how it's possible it already got there? so they just trick you to think they already got it but the truth is they don't have any photo unless you uploaded it, they display a random square full of colorful pixels because that what you would think that amount of photos would look like but the truth is it's all fake. there's no way anyone could store it it's like saying there's a website that show how many ants alive on earth right now
So you're an idiot?
@@cambodia-rocks yes bro, did i miss anything?
This is very interesting and cool!
Congrats btw so far its been a bit i still remember your first few videos lol
So not only does this website have the most important historic photographs, but also MD R34? Very varied I see.
i made an image to sound converter too. you can record the sound, and you will be able see the image back again!
I am so impressed and inspired to learn more about this! Great job 😄
Sooo it's basically just a compression algorithm/storage format? Saying it contains every image ever created is like saying a typewriter contains every sentence ever created. A typewriter doesn't contain the information, it just converts it to a new medium. The codes aren't a "location" or an "identifier", they're the image itself and the site just decodes them. That's not to say it's useless. A typewriter can be useful, but you call it a typewriter, not an "endless library". The whole concept seems misleading and unnecessary to me 🤷
Once in primary school, when I was about 12 (about 35 years ago, around 1988), I was fascinated with designing fonts and different symbols in an 8x8 box (with millimeter paper and pencil).
I once wondered what would happen if I generated all possible combinations, then I realised it would be 2^64 combinations... then I tried to limit it to images that are sensible for a human:
checkerboards, lines at different angles, etc., but it still overwhelmed me in the end.
My thoughts could have been the basis of some lossless compression algorithm, but back then, in BASIC on a ZX Spectrum, this was beyond even the borders of imagination.
I wonder if you could encrypt combinations of unicode say, any given character could represent a combination of other characters and you have a key that determines how many levels of encryption has been done, so you can theoretically compress any image down to a single unicode symbol and a key that describes how many layers to unpack that single value down in order to get the actual values needed to represent the original image. Maybe it could be represented by a decimal point.
you know unicode has about 149,813 characters, and if you use a string of 16 characters for each image location, you would have about 10^82 different possible permutations/combinations of image locations, which is about the number of atoms in the observable universe...
Wow that's like saying your monitor has everything anyone has ever seen, because it can rearrange into any image just at a limited resolution.
This is of course false, since the color palette used does not use all possible visible colors, so if the image that I have in mind is outside the color palette used, then there is no image of it. The original program is flawed in loads of ways, like size, which you explained.
The problem is, if I have an image that is 1Bx1B, no program exists that already 'has' those images.
It is like saying for every pixel, you have 3 values r g b, a float is 4 bytes so 12 bytes, disregarding alpha.
And then every single variation of a float times 3 times the amount of pixels you have.
I am not going to do that math, but there is no program that has these values.
The amount of processing power/storage simply does not exist, and never will.
I hate generalizations when it comes to computer programs.
This is purely theoretical
This is amazing. From pure curiosity, I became interested in this topic recently and it led to me to do a calculation of how many possible unique FullHD 24-bit coloured images could exist. And the number is about 10^14 000 000 (ten to the power of fourteen million). Which is… a lot. So I was thinking how could I do something interesting with it, some cool custom program. And then I saw your video. At first I thought there's no way I had just thought about it and I got this recommendation on YouTube, but then it really inspired me even more. Thank you so much, awesome job!
That's awesome that you had the same curiosity as me! It's insane how many images there could be even in a seemingly small resolution, but with full HD, it's practically infinite. Thanks for your comment and support!
birb
Does it have screenshots of my friends from the discord?
I've always thought of the babel image archive as being dumb, when you upload an image it's just giving you a textual representation of the same data. So... what's the point?
The UI of the program is nice and simple :)
what library did you use ? and you did all in c++ right?
The UI is made from a library I created called Glass. And yes, it's all programmed in C++.
ah yes, and so does this program! In every format. And also every binary file that was ever stored and every RAM contents and ....
bits = [] :: (1 :: bits) ++ (0 :: bits)
it would have to be a tiny bit more clever to generate them in a "useful" order, but that's like ... wholly irrelevant, considering how useless it is anyway :D
in fact, just the natural numbers already contain every binary sequence ever, as any binary sequence counts as a binary format of a natural number.
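The one-line pseudocode above can be read as "every bit string is either empty, or a 0 or 1 followed by a bit string". A runnable take on that idea, written as a Python generator that yields strings shortest first (the ordering is a choice made for this sketch, not something the comment specifies):

```python
from itertools import count, islice, product

def all_bit_strings():
    """Yield every finite bit string, shortest first: '', '0', '1', '00', ..."""
    for length in count(0):
        for bits in product("01", repeat=length):
            yield "".join(bits)

# The stream is infinite, so only take a prefix of it.
print(list(islice(all_bit_strings(), 7)))
# ['', '0', '1', '00', '01', '10', '11']
```

Any particular file shows up eventually, at a position that takes about as many bits to write down as the file itself.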
You see, what it SHOULD'VE done is, for the first image, make everything except the top left pixel black.
make each image counting up in a base 255 counting system with red being the ones place, green being the "tens" place, and blue being the "hundreds" place.
Counting in every combination of the RGB channels, we would have 255^3 or 16,581,375 different images, for a change in.. one pixel.
You just continue like this, making the next pixel the "thousandth", or 255^4th place, and so on.
Not only do you know that every theoretical image can now be found, you can scroll in-order instead of it being random.
This would make every possible image be 255^3^(640*416), which would be a very, VERY big number.
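The ordered counting scheme described above can be sketched as follows, with one caveat: a channel with values 0 to 255 has 256 levels, so this sketch counts in base 256 (256^3 = 16,777,216 colours per pixel) rather than base 255. The function names and the tiny two-pixel example are assumptions made for illustration.

```python
def index_to_pixels(index, num_pixels):
    """Turn an image number into a list of (r, g, b) tuples.

    Red is the lowest base-256 digit of each pixel, then green, then blue;
    the first pixel holds the lowest digits of the whole number.
    """
    pixels = []
    for _ in range(num_pixels):
        index, r = divmod(index, 256)
        index, g = divmod(index, 256)
        index, b = divmod(index, 256)
        pixels.append((r, g, b))
    if index:
        raise ValueError("index is too large for this many pixels")
    return pixels

def pixels_to_index(pixels):
    """Inverse of index_to_pixels: rebuild the image number."""
    index = 0
    for r, g, b in reversed(pixels):
        index = ((index * 256 + b) * 256 + g) * 256 + r
    return index

# Image 0 is all black; counting up changes the first pixel's red channel first.
print(index_to_pixels(0, 2))
print(index_to_pixels(1, 2))
print(index_to_pixels(256 ** 3, 2))
```

Scrolling through images in this order is possible, but neighbouring indices differ only in the first pixel, so the interesting images are no easier to reach than before.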
This is conceptually and functionally different to Babel. First, the Babel location values are *numbers* not *strings of text* so their decimal representation is neither here nor there, it's just provided as a convenience. Second, and perhaps most fundamentally, you're just writing out a bitstream to a file (basically how uncompressed BMP files work), and the fact that the file can be interpreted as text is entirely irrelevant - it's almost like renaming picture.jpg to picture.wav and claiming you've converted an image to a sound!
Bro discovered raw image formats
i got so scared when the image i thought of just suddenly popped up as the very first one
Wow, strange coincidence I guess.
How to play the game?
new *image format* just dropped
The 0-9 numbers are so it's less obvious it's just an encoding of the image itself in a human readable format. Which is to say, it converts images into bitmaps which are base 256 and then into base 10.
Now, all you need to do is go from 0 to 10^900000 and send all the results through low paid foreign labor so they can tell you if something interesting popped up.
whatever photo effect that is u made at 7:40 goes hard
ULTRAKILL FONT SPOTTED
16 bit numbers actually range from 0-65535, so it’s 65536 combinations in total.
Finally, someone else thought of this thing
Sounds like an amazing compressed file format
I already made code like this to try to find a pattern in locating information within a list of data generated from combinations, my resolution to convert information into location and then converting it back to information worked well, but I soon realized that this type of program doesn't make any sense
SO I can FIND an image by looking for all the pixel data in the archive? Amazing!
you could search randomly for a trillion years and never even come close to something
would you fancy recreating the Sloot Digital Coding System if you don't fear being assassinated for creating the invention that would accelerate mankind?