Crowdfund and help this guy! He is doing great work.
50:42 in case you want to see the high-level API for C++23 to transcode strings between different encodings. Great talk! 👍
This is Amazing.
PERFECT.
What a coincidence...I had a brush with encoding hell this week. This is a great informative talk!!!
I'm a little late. But I'm currently going through encoding hell!
yes, yes, and yes. full ack!
How is std::format in cpp20 doing it? Is it as bad as streams?
std::format doesn't do any output in C++20; it only returns a string. Output comes in C++23 (std::print). There it is completely independent of streams, and it is also specified to work with UTF-8 seamlessly if the platform supports it.
I want a Unicode-aware string class (where size(), ==, [], etc. all do the right thing and respect the encoding) and a Unicode-aware CC that allows me to type Unicode literals literally into my source code, required by the standard!
@pixelsupreme6824 what is the problem? This is how UTF-8 works. Yes, there is a little overhead, but the format is universal and works everywhere. You do not care about the size of the buffer, because the string class that is used takes care of it. You want to be able to store UTF-8 characters, each taking between 1 and 4 bytes, and to be able to access / iterate / compare them individually, just like in a normal ASCII string.
I have also rolled my own UTFChar class, and UTFString, which is actually a std::basic_string, so the entire interface of std::string can be used for UTF-8 strings.
It shouldn't be called size(). Better to have numberOfGlyphs(), numberOfCharacters(), numberOfCodePoints(); these are all potentially different numbers.
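To make the point about different numbers concrete, here is a minimal sketch, assuming the input is well-formed UTF-8 held in a plain std::string. The function names are just borrowed from the comment above for illustration; they are not from any standard or library. Counting glyphs (grapheme clusters) is left out on purpose, because that needs Unicode property tables.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of UTF-8 code units (bytes). This is what std::string::size() reports.
std::size_t numberOfCodeUnits(const std::string& s) {
    return s.size();
}

// Number of code points, ASSUMING s is well-formed UTF-8:
// count every byte that is not a continuation byte (bit pattern 10xxxxxx).
std::size_t numberOfCodePoints(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

For "e" followed by U+0301 (combining acute accent), written as the bytes "e\xCC\x81", this gives 3 code units and 2 code points, while a reader sees a single glyph.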
17:00 lol
Don't char8_t and u8string in C++20 fix our problems with UTF-8 and Unicode?
char8_t is essentially unsigned char (a distinct type with unsigned char as its underlying type), added to complement char16_t and char32_t, which were introduced in C++11. And u8string is just an alias for std::basic_string<char8_t>. Conversions and UTF algorithms are completely missing. So it's only a tiny addition.
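Since the conversions are missing, this is roughly what one ends up writing by hand. A rough sketch only: utf8ToUtf32 is a hypothetical helper name, not a standard function. It is templated on the code unit type so that it accepts std::string and, on a C++20 compiler, std::u8string as well. It checks lead and continuation bytes but does NOT reject overlong encodings or surrogate code points, which a real implementation should.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 code units into UTF-32 code points.
// Throws std::invalid_argument on malformed lead/continuation bytes.
template <class CharT>
std::u32string utf8ToUtf32(const std::basic_string<CharT>& in) {
    static_assert(sizeof(CharT) == 1, "expects 8-bit code units");
    std::u32string out;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        // The lead byte tells us the sequence length and carries the top bits.
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size()) {
            throw std::invalid_argument("truncated UTF-8 sequence");
        }
        // Each continuation byte (10xxxxxx) contributes 6 more bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) {
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

Called as utf8ToUtf32(std::string("\xE2\x82\xAC")), it returns a one-element string holding U+20AC.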
Terribly needed, but another horrible C++ API, which everyone will hate.
Maybe implement the much cleaner C APIs first.
iswfc, towfc, wcsnorm, u8norm, and the whole wcs => u8 POSIX variants.
I did the missing wcs parts in safeclib already. Strings are Unicode, and unfortunately we only have wchar_t, no u8 yet. github.com/rurban/safeclib/tree/master/src/extwchar
Why should C++ consider the C API implementation for better Unicode support? The C API is sometimes an obstacle.