It's very strange that ASCII/UTF-8 verification functions just return a boolean. It would be logical to at least return the position of the last valid byte; it literally costs nothing.
Which verification function? In C3 and Rust, and presumably in many other implementations, they return an error type. Zozin literally uses it in this project.
You can return a count of the valid bytes that form complete UTF-8 sequences, and if the returned value doesn't equal the length of what you parsed, you know something is wrong: try to advance by that value and you'll land in the middle of a continuation byte, or you'll just see that it's zero.
watched it live! absolute penger
tom scott mentioned lesgoo
I love your javascript tutorials
3:18 The vape meme is pretty funny tho lol
And it's really out of character for Tom Scott. The first time I watched it I thought, Tom Scott wouldn't do that, would he? And then he choked immediately. It's hilarious.
Brooo 1:35:20 got me.. "Suffering from success" 🤣🤣
I believe your implementation for checking a valid UTF-8 sequence is longer than it needs to be. Mine was literally 4 functions: calculate the expected length in bytes from the first byte, calculate the expected length for an int32 code point, read a code point from a buffer and return its int32 value (or -1 on error), and write a code point to the buffer and return the amount of bytes written (or 0 if the buffer was too small, which is easy to know because we know how many bytes each code point takes).
The initial int32 value for the code point is the first byte ANDed with 0xFF >> expected length, and the expected length is known just by looking at the first byte. I also realized I can compress that check by inverting all the bits and comparing downwards: if the inverted value is less than 16, return 4; less than 32, return 3; less than 64, return 2; otherwise return 0 for error. ASCII just returns 1.
For each continuation byte we multiply the code point by 64 (basically shifting it up by 6 bits), check that the current byte has top bits '10' (the first byte is masked) and return an error if not, then add the byte's value ANDed with 0x3F. Writing is almost the same, but backwards. The whole utf8.h file is 74 lines of C code, and that's with unrelated shit inside and with conversions from XML Unicode escapes on top of that.
56:40 if you don’t care about surrogates, you only need to check the first two extension bytes of the 4-byte case; the others can’t overflow
1:00:18 my brain is too smol for bit operations I need excessive visual aid to not get repeatedly confused. With a ton of comments and some helper functions.
I once naively thought to roll my own Unicode lib, since icu4c is just so big. That was a bottomless pit of misery. Initially I just wanted to sort "correctly". After a long while I got very lost in the spec and gave up.
It took me 4 days to watch this video. I literally started it the day of, and had to keep refreshing the page so YouTube would work when I came back to play more. It's not even one of your longest; life just got in the way. I've looked at both C2 and C3, and I don't think either is particularly good. There may be some elements here and there to copy, but overall I think they're both pale imitations of C++ trying to simplify back to C.
I'm sure plenty will disagree, if not with every point then at least with some, but I think if a language is going to proclaim itself better than C, it should:
1) have arrays and strings as first-class objects in the language and default to a _view type of some kind;
2) make all strings f-strings by default when the compiler can resolve them at compile time, and try our damnedest to avoid printf-style functions as the most prevalent means of printing messages, while retaining the ability to use them when truly necessary;
3) have operator and function overloading, as well as UDLs;
4) have RAII, with defer as an option but one that's not encouraged;
5) have an import system that isn't annoying, of which Rust's and Python's are closest but still kind of suck; and, most importantly,
6) stop deleting useful features just because they're not the recommended way of doing things, such as goto or include.
I also obsess over the names I give types, variables and functions. I tend towards the Whatever_This_Case_Is_Called for types and this_case_for_functions, but I prefer giving variables shorter and less annoying names. I don't want to type 20 characters at a time to reference a variable, and I don't want to constantly tab complete to get every variable out. I'm generally only verbose when writing prose, not code.
I abhor exceptions in my own code, preferring the style of returning an error code and taking all arguments that require modification as references or pointers. I was thinking about how I'd handle that for my own language, and I'm thinking there should be a compiler or linker flag that generates a stub which captures all exceptions and translates them into a simple error code. That way you could disable exceptions in one fell swoop and check for an error like C's errno. I'm sure that'll leave a few people aghast, but I hate exceptions that much. I feel like most error-handling code in "modern" languages tends to be explicit denial. The programmer denies that an error can happen, usually with a ! or a ? attached to the function call, and either ignores it anyhow, or the program "crashes" with a language-specific, stub-generated message. It's the equivalent of wrapping the entirety of main in a try/catch block and saying whatever. I'd prefer that if the programmer is going to ignore errors anyway, we don't have a bunch of random !'s and/or ?'s all over the place.
I can't necessarily speak to the intent at 1:08:00, but the C3 site says that contracts enforce both runtime *and* compile-time constraints, so it might be signaling the compiler to check lengths at comptime, if possible. Otherwise, you're depending on LLVM to (hopefully) do that for you in an optimization pass.
Even if that's the intent, it would still be better to pull it into the function body or even be a part of the header *after* the function name, parameters and types.
Hi Tsoding, a fellow minion here. Would love to see an OS and macros setup stream. As a habitual watcher, I would love to put in place better coding habits that I have not developed over my time learning on the web. As a fellow recreational programmer, I would really appreciate a good standard method to follow when programming. Take care.
3:25 Literally me 🤣
I'm pretty new to networking, so correct me if I'm wrong: doesn't HTTP sit on top of TCP at the application layer? So when Zozin said "it uses TCP instead of HTTP", I assume he meant it doesn't use HTTP, since using HTTP also uses TCP? Any explanation is appreciated.
Search for "TCP socket vs HTTP server"
5:20 BRO IM DYIIING AHHAH
You need to use the volkswagen npm package
You are the man tsoding! So smart! Keep it up my brother.
Dear Mr. Tsoding, would you like to do the ZverDVD unboxing video in 2024, what do you think?
Thank you....
Honestly, UTF-8 is easy to implement: you just check for the top-bits '10' mask, remember the length of the UTF-8 sequence, and calculate the value you got. To combat overlong characters you just switch on the length of the received sequence and check that the resulting code point is no less than the minimum code point that requires that length. That's the entirety of UTF-8; it's stupid simple and beautiful. Just ignore any linkage to the Unicode code points and pass the int32 value to the higher level that will actually check whether this is a valid glyph or not; 2000 years later we're gonna have more languages (emojis, probably) anyway.
6:25 yes, it can
What experiments can you explore to address the challenges for C to become a memory safe language?
42:39 I can't for the life of me remember the name of / which cartoon this sound is from, aaaaaa, does someone know?
curb your enthusiasm
@horempatorhoremski6627 i love you cheer m8
As a German I approve of the Autobahn thing xD
I wonder if it's possible to train a neural net to produce output to pass all tests 🤔
C does recognize Unicode code points; you enter them with "\u" and "\U". Everyone does it that way, and the way C3 interprets "\x" is just wrong.
What is C3?
C cubed
Been watching this azozin guy for a year now and I still can't understand a word he says
very interesting series, learned a lot!
he changed the name!!!
optimizing for yt algo now? one step closer to a real youtuber™
Look! The end screen is back! :))
Why waste time implementing a UTF-8 validation function when you can waste time implementing a whole websocket server to run a set of tests that prove you don't need to waste time implementing the validation function? We're doing true 'Web Dev' today 😂 Penger! Glad we're doing the validation func for fun anyway though, I was curious how you'd pull it off
Actually I'm a web developer, but I know what UTF-8 is xDDD. Me being a web developer doesn't mean I don't do things like embedded programming or app programming in my spare time xD
Just because you can make a web application doesn't mean you're a web developer. I can make web applications too. Webdev is a state of mind, not the stack.
i just know this guy is a genius, but does anyone know or can tell me what he works on or in? like what does he do for a living... i see him do a fuck ton of projects but never the same thing twice... im so confused yet intrigued... if u know tell me🙏🏼
He is a professional streamer
I think he said before that he writes code for some US company when he wants and needs money; otherwise he streams. Seems like a perk of being a damn good programmer.
@@error-4518 I think he once mentioned he worked on some internal Hooli team, but I might be wrong
@@error-4518 ohh cooll!!...thanks
@@ryzh6544 thanks
Bro really dont like web developers
Who does
J blow /s
Even if you’re a web dev yourself you must hate yourself 😅
@@phat80 Well, I did it because I needed to. It's not like people prostitute themselves because they want to 😅. I found this channel; in my defense, that should count for something.
Doesn't Node have a native module for WebSockets too? ws is a library... Maybe he does look at it. I'm just at 39:49
The case where you're supposed to check that the peer dropped TCP as soon as you received part of the frame is, first of all, already covered, because that's what you're doing, and second, just impossible and stupid. THEY sent a message and THEIR OS told them the message was sent, but it is not guaranteed to have been received yet: the OS just copies a buffer into the queue to be sent over TCP; you as the application by default have no idea whether the message was actually received. That's just stupid.
Now that I think about it, you can slowloris a websocket server with an already-invalid UTF-8 string, and if there is a different punishment for invalid UTF-8 encoding than for a valid-UTF-8 but wrong message, you could avoid being logged or something like that if the server isn't checking bytes as soon as they come in.
You tee eff 8
WOW
So pengers.
the
E
Poggers