"Why is it when something happens, it's always you three?"
Unicode, SQL, and 255:
Complexity, a tale as old as programming
datetime? 😂
Don’t forget free()!
@@sherpya What's wrong with DateTime?
@EnjoyCocaColaLight timezones or better the absence of 😂
Windows doesn’t use UTF-8, because UTF-8 was invented very late in the NT development process. In addition, Unicode was limited to 16 bits at that point, so a fixed-width 2-byte encoding made a lot of sense.
To this day I am unsure whether that Unicode is really UTF-16, where single characters can be encoded in multi-word fashion (and special API calls are required to count, e.g., the length of a string), or something I would call "Unicode Classic", where nothing above 16 bits exists.
@@cameramaker It’s complicated, it’s generally UTF-16, with non-BMP character support, but with some glitches sometimes.
Originally the NT kernel was developed with text represented as UCS-2, the predecessor of UTF-16. Nowadays Windows accepts UTF-16, because it is largely backwards-compatible.
NT also used ASCII + Windows code page hell before that.
@@cameramaker I think you're describing the difference between UTF-16 (it is variable length) and UCS-2 (obsolete, fixed length).
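To make the UCS-2 vs UTF-16 difference in this sub-thread concrete, here is a minimal Python sketch. The helper names (`utf16_code_units`, `fits_in_ucs2`) are made up for illustration and are not Windows API calls; the sketch only shows that a character outside the BMP takes two 16-bit code units, which is why "string length" depends on what you count.

```python
def utf16_code_units(text):
    """Number of 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def code_points(text):
    """Number of Unicode code points (what Python's len() counts)."""
    return len(text)


def fits_in_ucs2(text):
    """True if every character fits in a single 16-bit unit (BMP only)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


bmp_text = "A\u00a5"        # 'A' and the yen sign, both inside the BMP
astral = "\U0001F602"       # an emoji outside the BMP

print(utf16_code_units(bmp_text), code_points(bmp_text), fits_in_ucs2(bmp_text))
print(utf16_code_units(astral), code_points(astral), fits_in_ucs2(astral))
```

The second line shows one code point stored as two code units (a surrogate pair), which is the case a fixed-width UCS-2 view of the string cannot represent.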
reminds me of the time i figured out if you put "⣿" after your message in a game you could bypass the chat filter, a lot of slurs that day
lmao that 5 year old who hacked microsoft and his dad is a security engineer (mashing buttons in login screen bypassed the password login lol)
@@Emayeah ok
lol
@@Emayeah The same thing happened with Linux mint's login at one point, but you needed to mash a physical and virtual keyboard both
@@xdcraze very sociable arent you
There are a lot of rabbit holes in Windows’ infinite layers of legacy jank, but character sets and encoding might be the deepest.
@@james-m-8285 bush hid the facts
who would win?
a gazillion, trillion big tech corporation, trying to become a monopoly like Google;
A funny Japanese currency symbol.
newline in japanese
Not even the C main function is safe 😭
As a Rust developer I can confirm the pain of making stuff with Windows in mind and how weirdly it handles things like this
Windows api and kernel isn’t hard? 😭
@@mila-d5b You’re not hard
What's the code to remove zany's game ban from 2018
I feel the same pain after dealing with windows_sys and SIDs with Rust
@@mila-d5b Not that hard tbh with most things. My biggest beef is dealing with char conversion
I cant recall how I found your channel and I'm glad I did find someone worth listening to.
Hilarious, I've literally had to deal with an encoding issue this week. People like to have CSV templates for various data handling needs internally, and I've worked to expose some APIs for them to automate things by using those.
A guy sent in a template and an NBSP was parsed into an invalid Unicode character. Couldn't understand why it was happening until I saw that the file was ANSI-encoded.
Like ok, I get it that it's hard to fix Windows and the encoding itself, but holy crap it's so easy to just make all your software (Excel, cough) encode to UTF-8 by default to mitigate this.
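The NBSP failure described here is easy to reproduce. A minimal sketch, assuming the "ANSI" file was Windows-1252 (the usual Western European code page; the commenter does not say which one it was). `decode_text` is a hypothetical helper, not part of any library.

```python
def decode_text(raw, fallback="cp1252"):
    """Try UTF-8 first (tolerating a BOM), then fall back to an ANSI code page.

    The fallback code page is an assumption; it can still raise for the few
    byte values that cp1252 leaves undefined.
    """
    try:
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        return raw.decode(fallback)


raw = b"price\xa0total"  # 0xA0 is NBSP in Windows-1252, but not valid UTF-8 on its own

# Forcing UTF-8 turns the NBSP into U+FFFD, the "invalid character" symptom.
print(repr(raw.decode("utf-8", errors="replace")))

# Falling back to the ANSI code page recovers the NBSP (U+00A0).
print(repr(decode_text(raw)))
```

Guessing the encoding like this is a heuristic; if the template format can be controlled, requiring UTF-8 up front avoids the guess entirely.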
The Windows API is full of this sort of spaghetti for backwards compatibility. Not surprised!
Another bit of backwards compatibility: your Intel processor could still run DOS apps
This situation is really sad. I understand a developer not wanting to fix it because they think it is Microsoft's fault, but in that case remove all your Windows binaries and state that your project does not support Windows. As it stands, they leave vulnerable software out in the wild and people pay the price for their laziness.
Not shipping Windows binaries is not really a solution because (unfortunately) Windows dominates the desktop market and ceasing to support it is probably a death sentence for most desktop apps.
@@Felipe_9999 not sure curl would suffer by losing the windows desktop market
That sounds more like paying the price for using an unsafe OS. The fix would boil down to sidestepping Windows' unsafe mess of text encoding by implementing your own conversion to ANSI, because that's the only thing one can trust Windows to handle safely.
The whole localised rich encoding feature is inherently unsafe until Windows addresses this, because the OS implements it inconsistently.
So, for an external developer, the only way to use the whole OS feature is patching around it, language by language, making sure no relevant component of Windows does this kind of coercion and thus introduces a vulnerability.
Language- and encoding-specific escape characters mean language- and encoding-specific patching of all programs. If Windows insists on calling it a feature, then being unsafe is a Windows feature.
Eric putting in a serial experiments lain reference was NOT on my 2025 bingo
You must be new here
Should've been, it's basically a free square
@ unfortunately yes
YEN
I ₩on
@@EricParker imagine getting hacked because of yen symbol...
@@EricParker ₣unn¥ ₱un
Many hours have I spent cracking software, never knowing exactly what the registers were for...
That info probably would have sped things up a great deal.
How does this work if you have an ANSI-based locale installed along with a Unicode-based locale?
For example, having English (US) as your main one but using the Japanese locale to run certain visual novels like Yosuga no Sora.
Bro...
Why are visnovs running fucking ANSI to begin with
what a way to flex your fetish
Me seeing the thumbnail: "Huh, I bet this has to do with ANSI characters"
Sees the video: LMAO
Sir, actually my computer keeps using 100% CPU, could you please tell me if it is hacked or not
I've come across the issue of Windows command line parsing quite often. And it's really frustrating that if you have an attacker-controlled argument (even just parts of it), there's no reliable way to prevent RCE.
Windows, Visual Studio and Edge, stocks prices widget. Most Windows vid ever
4:19 aka transliteration, right?
2:51 "Because on this architecture, that's the first one out" The heck? RCX is the first parameter (input) to a function on x64 MS ABI.
RCX, RDX, R8, R9, Stack is the order for params here
Is there any way you can see if NL hybrid is safe I've heard so much controversy about it
i love shift-jis
Contrary to usual English pronunciation standards, "Cuckoo" is pronounced "kookoo" with more stress and a higher pitch on the first syllable.
microsoft grindset
Writing an ANSI app and looking confused when wide chars are not working is a weird thing. Maybe people thought it would "just break", so it would "just" fail to open the file instead of replacing chars, but to be honest anyone who writes Windows apps and doesn't use wide chars for all APIs doesn't know what they are doing.
0:38 Lain mentioned! Let's all love Lain!
Backward compatibility is cool but I guess sometimes it is time to rewrite wherever possible
Windows Terminal supports Unicode rendering if you wanna run command prompts with support!
It sure does.
chcp 65001 is UTF-8, for example.
I used it recently to set formatting symbols to generate divided square via batch.
@DimkaTsv I love windows terminal ever since it first released I cannot lie
Wouldn't a fix for the command line input bug be conversion to ANSI *before* sanitization? Sanitizing an insanely large character set is a giant pain in the first place and could lead to more bugs, in my opinion.
I have a very archaic way of thinking about these things, like Unicode shouldn't be used for command input at all, though I understand that it's important for things like multilingual databases and systems.
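The ordering question raised here can be sketched in a few lines of Python. Everything below is a toy model: `BEST_FIT` is a two-entry illustrative table (mixing mappings reported for different code pages), not the real Windows data, and `is_safe` is a hypothetical sanitizer that only rejects quotes and backslashes.

```python
# Toy stand-in for a lossy "best fit" wide-to-ANSI conversion.
# Illustrative only: the real tables are per code page and much larger.
BEST_FIT = {
    "\uff02": '"',    # FULLWIDTH QUOTATION MARK -> ASCII double quote
    "\u00a5": "\\",   # YEN SIGN -> backslash (reported for code page 932)
}


def best_fit(text):
    """Imitate the lossy conversion using the toy table above."""
    return "".join(BEST_FIT.get(ch, ch) for ch in text)


def is_safe(arg):
    """Hypothetical sanitizer: reject characters that affect argument splitting."""
    return not any(ch in '"\\' for ch in arg)


def check_then_convert(arg):
    """Sanitize the Unicode string, then convert: the risky order."""
    if not is_safe(arg):
        raise ValueError("unsafe argument")
    return best_fit(arg)


def convert_then_check(arg):
    """Convert first, then sanitize what the callee will actually see."""
    converted = best_fit(arg)
    if not is_safe(converted):
        raise ValueError("unsafe argument after conversion")
    return converted


attack = "x\uff02 --evil"
print(repr(check_then_convert(attack)))   # a real double quote slips through
try:
    convert_then_check(attack)
except ValueError as exc:
    print("rejected:", exc)
```

In this toy model, converting before sanitizing does catch the smuggled quote, but only if the conversion you run is the same one the callee will apply, which is the hard part on a real system.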
0:35 me reference???
This is a lesson in how NOT to present an article in a video. You linger on the introductory parts, then quickly skip over all the important parts. For some parts you're just like "see this...", expecting us to pause the video to read from the article before you skip elsewhere after 5 seconds.
So. Parameters are split. Locate (possible) keywords. Translate found keywords into integers. Run the integers through a switch() to select what features to go with.
Quite simply, ensure that the split command substrings aren't escaping. Be more verbose.
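The scheme outlined in this comment (split, look up keywords, map them to integers, dispatch) might look roughly like the sketch below. The option names and the `parse` function are invented for illustration; Python has no `switch()`, so an `if`/`elif` chain over an `IntEnum` stands in for it.

```python
from enum import IntEnum


class Option(IntEnum):
    HELP = 1
    VERBOSE = 2
    OUTPUT = 3


# Hypothetical keyword table: only exact matches are accepted.
KEYWORDS = {
    "--help": Option.HELP,
    "--verbose": Option.VERBOSE,
    "--output": Option.OUTPUT,
}


def parse(argv):
    """Map already-split arguments to settings, rejecting anything unknown."""
    settings = {"help": False, "verbose": False, "output": None}
    i = 0
    while i < len(argv):
        option = KEYWORDS.get(argv[i])
        if option is None:
            raise ValueError("unknown argument: %r" % argv[i])
        if option == Option.HELP:
            settings["help"] = True
        elif option == Option.VERBOSE:
            settings["verbose"] = True
        elif option == Option.OUTPUT:
            i += 1  # the next element is the value, never re-parsed as a keyword
            if i >= len(argv):
                raise ValueError("--output needs a value")
            settings["output"] = argv[i]
        i += 1
    return settings


print(parse(["--verbose", "--output", "report.txt"]))
```

Note that this only covers the receiving program's side. It assumes `argv` was split correctly, so it does not by itself help when the splitting happens after a lossy character conversion.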
If Windows doesn’t fix this, the only real fix is to do the conversion yourself, as soon as possible. Exactly what benefit does this implicit conversion provide if it’s inherently unsafe? The fix is to do what the function does, but safely, before it is called…
Just make sure you don’t implicitly use any of the functionality of our unsafe functions when making syscalls. Brilliant! What’s next, implement your own user-space if you don’t like the privilege escalation features of Windows?
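For what "do the conversion yourself, but safely" could mean in practice, here is a minimal Python sketch: convert strictly and fail on anything that has no exact representation, rather than substituting a look-alike. `to_ansi_strict` is a hypothetical helper and cp1252 is an assumed code page. On the Win32 side, the analogous knob is, as far as I know, the `WC_NO_BEST_FIT_CHARS` flag of `WideCharToMultiByte`.

```python
def to_ansi_strict(text, codepage="cp1252"):
    """Convert to the given code page or raise; never substitute characters."""
    return text.encode(codepage, errors="strict")


print(to_ansi_strict("caf\u00e9"))      # representable in cp1252, converts cleanly

try:
    to_ansi_strict("x\uff02 --evil")    # fullwidth quote has no cp1252 equivalent
except UnicodeEncodeError as exc:
    print("refused:", exc.reason)
```

Python's own codecs do not perform best-fit substitution, so this refuses the fullwidth quote outright. The catch the comment is sarcastic about still applies: it only helps if nothing downstream re-converts the original wide string.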
Thanks for telling me this right after I updated windows)
no way that breaks windows security, microsoft also does something important wrong
finally i have an easy way to refer to my channel name without doing weird symbols that most keyboards might not have lol
0x3130DB8EFC3F79D19F80
But is it stored little-endian or big-endian?
what about "pi^47"
@@gairisiuil that just looks weird to me though i do realize you're right and that's a way to type it lol it just won't work as a handle on here or on x
@ ill be honest at the moment i dont know the difference i always have to look it up cuz i get em confused all the time
Your audio is somewhat muffled.
Classic Windows.
hey eric are you able to do some testing on the new grand theft auto vice city next gen mod and check if it's safe to run and stuff.
I appreciate your efforts so much :D
Why are you calling them "roman characters"?
It's Latin
I assumed they were called roman characters from the fact it is called romanization to write foreign (usually CJK) languages in them.
The Roman alphabet is literally a synonym for the Latin alphabet. Roman and Latin characters are the same thing.
Wrong. It's not 32 to 128 for ANSI printable range. It's 32 to 127 and 127 is DEL so really it is 32 to 126 inclusive. Nice try though.
Thats crazy
2 trillion dollar corporation
as many vulnerabilities as in open source
probably more since we can't look at their source code to check.
hey🎉
Racism on top
Edit: top comment is gay
Edit: Racism
you didn't even edit the comment what
bait used to be believable
???
First
First.
I hear ANSI and I think of ANSI.SYS and making BBS art with TheDraw