Dylan, every lecture you give is a masterpiece. We're not worthy.
It's worth mentioning that .NET does indeed use two bytes per character (one UTF-16 code unit), but that is only enough for characters in the Basic Multilingual Plane. It supports characters outside of the BMP by using surrogate pairs, loosely similar to the way UTF-8 uses multi-byte sequences. A character like 😊 requires 4 bytes to store instead of two.
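To make the surrogate arithmetic concrete, here is a minimal sketch. It uses Python's UTF-16 codec as a stand-in for .NET strings (an assumption; the comment above is about .NET), and `surrogate_pair` is a hypothetical helper written for illustration only.

```python
# Sketch only: Python's "utf-16-le" codec stands in for .NET's in-memory strings.
smiley = "\U0001F60A"  # 😊 is U+1F60A, which lies outside the BMP

utf16_bytes = smiley.encode("utf-16-le")
utf8_bytes = smiley.encode("utf-8")


def surrogate_pair(code_point):
    """Split a code point above U+FFFF into its UTF-16 high and low surrogates."""
    offset = code_point - 0x10000          # 20 bits remain after the subtraction
    high = 0xD800 + (offset >> 10)         # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits go into the low surrogate
    return high, low


high, low = surrogate_pair(ord(smiley))
print(len(utf16_bytes), len(utf8_bytes))   # byte counts in each encoding
print(hex(high), hex(low))                 # the two 16-bit code units
```

Both encodings happen to need 4 bytes for this character, but UTF-16 gets there with two 2-byte code units rather than one 4-byte sequence.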
Was genuinely impressed by the simplicity and complexity
Great talk! Шикарный доклад! Շատ լա՜վն ելույթն ա
Absolutely a must watch for any programmer that needs to deal with strings! Fantastic!
what about dates and timezones
@@Lemmy4555 For the sake of your sanity I'd recommend built-in functions & existing libraries for dates and timezones. And for Unicode grapheme clusters.
@@Lemmy4555
*Nails scratching on chalkboard*
We don’t talk about those two.
Great talk once again! Working at a company where the core processing still runs on an IBM mainframe (EBCDIC encoding), websites that use UTF8 and support an additional language that is not English, we've had some of these issues before..! 😁
The Harry Potter email story is very impressive, even the people working at the post office understand encoding.
My guess is that they had trouble with commercial software in the post office all the time
Laughed all the way through, very informative and entertaining. Greetings from the land of Ő and Ű ;)
can't believe 43 minutes passed, thanks Dylan for those awesome 43 minutes with you :)
And only after watching this I have some understanding of how UTF-8 works. Thank you!
Very nice talk! Have been down the rabbit hole many times with various encodings. Will give all new starters this video to watch as a primer for how crazy the landscape is :D
I stared at the options for ordering the city names for several minutes, because we don't have a definite rule either: e.g. whether Ö comes right after O, or whether Ä, Ö and Ü are all just crammed in after Z.
My German gut said Ö is a type of O. So Österreich comes after Ostern but before Zürich. :D
"Politics creates the problems technology tries to solve" - Dylan Beattie
Hyvää Syntymäpäivää! ;) And thanks for great talk..
Torilla Tavataan!
WoW Impressive !!! অসাধারণ ।
A lot of stuff I didn’t know! Great talk!
This was extremely interesting and entertaining
Excellent talk!
41:00 you don’t drive cars out of soviet union, only tanks.
Oh, I know the Chinese problem, as I got it a lot while using copy-paste from Linux to Windows over Synergy. Pasting without formatting helped ;)
Very nice talk.
20:24 Even in German-speaking countries this is not always handled the same way, because some of the dictionaries, encyclopediae, phone books, etc. in these countries treat Ä, Ö and Ü just like A, O and U, others like AE, OE and UE (which is where these letters come from historically), and some put them at the end of the alphabet.
I suppose some dictionaries also (correctly) would distinguish diaeresis from umlaut, and sort them differently.
(Which you can't do with just Unicode and a Locale string: you'd need a proper dictionary with the words in them)
@@FindecanorNotGmail Diaeresis is hardly used in German. I know it only from surnames like Groër (a former Austrian cardinal).
In some languages, such as Swedish, "ae" is a letter, not a ligature.
Æ would be Norwegian or Danish. In Sweden it's Ä.
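For anyone who wants to see the three German conventions from this thread side by side, here is a rough sketch. The table names and the sample word list are made up for illustration, and it only handles ä, ö and ü; real collation should go through a proper library (ICU or similar) rather than hand-written tables like these.

```python
import unicodedata

# Hypothetical translation tables, one per convention mentioned in the thread.
AS_BASE_LETTER = str.maketrans({"ä": "a", "ö": "o", "ü": "u"})
AS_DIGRAPH = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue"})
# "{", "|" and "}" are the ASCII characters right after "z", so they sort last.
AFTER_Z = str.maketrans({"ä": "{", "ö": "|", "ü": "}"})


def sort_with(words, table):
    """Sort words by a key built from the lowercased, NFC-normalized spelling."""
    def key(word):
        normalized = unicodedata.normalize("NFC", word).lower()
        return normalized.translate(table)
    return sorted(words, key=key)


cities = ["Zürich", "Österreich", "Ostern", "Odessa", "Paris"]

print(sort_with(cities, AS_BASE_LETTER))  # Ö treated like O
print(sort_with(cities, AS_DIGRAPH))      # Ö treated like OE
print(sort_with(cities, AFTER_Z))         # Ö placed after Z
```

The same five words come out in three different orders, which is why the "correct" answer depends on which convention the list is meant to follow.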
you can't indent with vertical tabs. when you type vertical tab on a TTY, the page advances by "a bunch"
Hi and thanks for this nice dive in the matter. A bit disappointed you didn't mention EBCDIC :-) must be because it was a microcomputer oriented talk but thanks again anyway.
Yeah, the speaker didn't mention why 8 bits were available in the first place, or which company invented the concept of a codepage either.
I love that gay pirates are winning, and I happen to be straight.
Does Windows support the Ninja emoji? 🥷🆚🏴‍☠️ → 🏁?
16:02 If you live in IJmuiden, Steam will remember your address as Ijmuiden, which looks weird, and I need to correct it every time. I guess even in Unicode you don't escape the Anglocentrism.
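A guess at how this kind of thing happens (nobody here knows what Steam actually does, so treat this purely as an assumption): a generic "capitalize the first letter, lowercase the rest" routine has no idea that Dutch writes the IJ digraph with both letters in upper case. Both function names below are hypothetical.

```python
def naive_capitalize(name):
    """Upper-case the first character and lower-case everything after it."""
    return name[:1].upper() + name[1:].lower()


def dutch_capitalize(name):
    """Same idea, but keep a leading 'ij' together as 'IJ' (a simplified rule)."""
    lowered = name.lower()
    if lowered.startswith("ij"):
        return "IJ" + lowered[2:]
    return lowered[:1].upper() + lowered[1:]


print(naive_capitalize("IJmuiden"))   # the spelling the comment complains about
print(dutch_capitalize("IJmuiden"))   # the spelling a Dutch reader expects
```

The Dutch-aware version is only a sketch: it assumes the input is known to be Dutch, which is exactly the locale information a generic address form tends to throw away.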
CP/M worked on microcomputers, not minicomputers!
14:40 As a Bulgarian, yes, that used to be a thing, yes we hated it, yes I hate being reminded that that was a thing. Thank god we don't have to deal with that BS anymore.
Ironically the Net uses NETASCII with CR LF line terminators ;)
8:49 strange that they included both kinds of phi, but no psi. Psi is used a lot in physics, even in high school physics.
5:16 Minor nitpick: you said there were a lot of 4-bit microprocessors when ASCII was designed. ASCII was first published in 1963, and the first 4-bit microprocessor was invented in 1971.
No, he said it's "fast even on a 4-bit microprocessor". And bit masking was likely a thing long before that.
@@SaHaRaSquad He said immediately that there were 4 bit microprocessors when ASCII was designed.
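For context on the bit masking mentioned in this thread, here is a small sketch of why the ASCII layout makes it cheap. The function names are made up for illustration, and the tricks only hold for plain ASCII letters and digits.

```python
CASE_BIT = 0x20  # the single bit that separates "A" (0x41) from "a" (0x61)


def toggle_case(letter):
    """Flip the case of an ASCII letter by flipping one bit."""
    return chr(ord(letter) ^ CASE_BIT)


def to_upper(letter):
    """Clear the case bit, which upper-cases an ASCII letter."""
    return chr(ord(letter) & ~CASE_BIT)


def digit_value(digit):
    """The low 4 bits of an ASCII digit are its numeric value."""
    return ord(digit) & 0x0F


print(toggle_case("a"), to_upper("q"), digit_value("7"))
```

Each operation is a single AND or XOR, and the digit trick only needs the low nibble, which is presumably what the "fast even on a 4-bit microprocessor" remark refers to.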
38:06 🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣🤣
My take on 🏳️‍🌈🏴‍☠️🏁 is "Gay Pirate Racing" because I sure as heck want to watch that on weekends!
Please leave politics out of plain text. Sincerely, guy who'd watch more hours of this enigma wrapped in an anecdote wrapped in a vest ❤
Dylan is essentially hbomberguy a bit older with more hair.
So..., what should I use? Is there a real fix for this madness?
UTF-8 helps a lot
Damn, I think I know this guy 🤔, can somebody please remind me what he is known for?
0:25 No one laughed? That was pretty funny...
Windows now supports 8 flags, but still no real national flag: 🏁🚩🎌🏴🏳🏳️‍🌈🏳️‍⚧️🏴‍☠️
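Side note on why some of these flags fall apart into two symbols when copied around: the rainbow and pirate flags are not single code points but sequences glued together with a zero-width joiner. A quick sketch (the `describe` helper is hypothetical; the code points are written as escapes so the example does not depend on how this page renders them):

```python
import unicodedata

ZWJ = "\u200D"    # ZERO WIDTH JOINER
VS16 = "\uFE0F"   # VARIATION SELECTOR-16, requests emoji presentation

rainbow_flag = "\U0001F3F3" + VS16 + ZWJ + "\U0001F308"   # white flag + rainbow
pirate_flag = "\U0001F3F4" + ZWJ + "\u2620" + VS16        # black flag + skull


def describe(text):
    """List each code point in the string with its Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


for line in describe(pirate_flag):
    print(line)
print(len(rainbow_flag), len(rainbow_flag.encode("utf-8")))  # code points, UTF-8 bytes
```

If anything along the way strips the joiner, a renderer is left with a plain flag followed by a separate rainbow or skull.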
Please keep politics in software.
Leave politics out of software. Thank you.
oh my sweet summer child
Poe's Law dictates no-one can tell if you're joking
@@RoamingAdhocrat did you watch the presentation? Specifically, the last minute?
@@masheroz must be a John!
John u're an idiot
please leave politics out of software
♲ a recycled talk of in-cohesive random facts ♲
A polished talk of historical artifacts
He has done the same talk multiple times, yes. You don't put down this amount of work to just do it once.
IMHO, this talk is only scratching the surface.