What are UTF-8 and UTF-16? Working with Unicode encodings

  • Published on Oct 15, 2024

Comments • 37

  • @lalpremi
    @lalpremi 9 months ago +8

    That is exactly what I want to know, showing some great tools. Thank you for sharing, and have a great day:-)

    • @ErikWilde
      @ErikWilde 9 months ago +2

      Thank you!!

  • @IndianaJoenz
    @IndianaJoenz 2 months ago +1

    Thank you for a great talk with useful visuals! I make a Unix program (durdraw) for drawing Unicode and other text art, and find myself working with different character encodings regularly. Perhaps I missed it, but UTF-8's backwards compatibility with ASCII is worth considering when choosing an encoding scheme. I also liked the useful "od" syntax. I rarely encounter UTF-16, but thanks to your video I will now be able to recognize it in a hex dump.
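
The ASCII compatibility mentioned in this comment can be illustrated with a minimal Python sketch. It is not taken from the video, and the sample string is an arbitrary choice. Pure ASCII text produces identical bytes in UTF-8, while UTF-16 adds a zero byte to every ASCII character, which is the pattern that stands out in a hex dump.

```python
# Sketch: encode the same ASCII-only text three ways and compare the bytes.
text = "Hi!"  # arbitrary sample, chosen for illustration only

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
# The explicit "-le" codec writes little-endian code units and no BOM.
utf16le_bytes = text.encode("utf-16-le")

print(ascii_bytes.hex(" "))    # 48 69 21
print(utf8_bytes.hex(" "))     # 48 69 21
print(utf16le_bytes.hex(" "))  # 48 00 69 00 21 00
```

The interleaved `00` bytes are a practical hint that a dump of mostly Latin text is UTF-16 rather than UTF-8.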

  • @nervocalm
    @nervocalm 9 months ago +1

    Excellent visual explanation! Couldn't be clearer! I didn't know that it would choose the correct length for each character. I thought it always had a fixed length. I would really like to know more about this in general... Headers, BO, LE, etc. I also find it very interesting and very useful for working with ETL in data engineering. If you think of something else besides the links you already shared in the description, please let me know. Thank you for making this video.
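
The variable length this comment refers to can be checked directly. The following is a small Python sketch under the assumption that four hand-picked sample characters are representative; the helper name `encoded_lengths` is hypothetical and only used here.

```python
# Sketch: how many bytes one character needs in UTF-8 and in UTF-16.
def encoded_lengths(ch: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for a single character."""
    # "utf-16-be" is used so that no byte order mark is counted.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))

for ch in ["A", "é", "€", "😀"]:
    utf8_len, utf16_len = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

UTF-8 uses one to four bytes per character, and UTF-16 uses two or four, so neither encoding is fixed-length across all of Unicode.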

  • @brm901
    @brm901 2 years ago +7

    Great and informative video; thanks.

    • @ErikWilde
      @ErikWilde 13 days ago

      @brm901, thanks for the kind words!

  • @VishwaMukh
    @VishwaMukh 14 days ago +1

    Sir, very well explained. Thanks.

    • @ErikWilde
      @ErikWilde 13 days ago

      @VishwaMukh, thanks for the kind words!

  • @higiniofuentes2551
    @higiniofuentes2551 9 months ago +2

    Thank you for this very useful video!

  • @qeetcode
    @qeetcode 10 months ago +2

    Great explanation. Much appreciated.

    • @ErikWilde
      @ErikWilde 10 months ago

      Thanks a lot, @qeetcode!

  • @flaviomelo7893
    @flaviomelo7893 1 year ago +1

    Hi Erik, congratulations on the video and thanks for sharing your knowledge. I am migrating an Oracle database on Solaris Sparc that is using UTF-16BE, while the destination uses UTF-8. In your opinion, what would be the best approach to converting the data source?

    • @ErikWilde
      @ErikWilde 1 year ago +1

      Whatever migration tool you are using should really give you that option. If it does not give you that option I would look for a different tool.

  • @Soupie62
    @Soupie62 3 months ago

    If you have a CPU where every address is 16 bits wide, you may as well use UTF-16 as default. If memory is 8 bits wide, use UTF-8.
    For 32-bit (or 64-bit) memory, you can store multiple characters per RAM address, no matter what system you choose.

    • @ErikWilde
      @ErikWilde 3 months ago

      In the end, if you care about memory efficiency, UTF-8 may be the best choice if you mostly use ASCII characters. But there (sadly) is no generally best default choice.
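
The trade-off described in this reply can be made concrete with a rough Python sketch. The two sample strings are assumptions picked for illustration, not data from the video, and the helper name `encoded_sizes` is hypothetical.

```python
# Sketch: total encoded size of a mostly-ASCII text versus a Japanese text.
def encoded_sizes(text: str) -> dict[str, int]:
    """Return the byte counts of text in UTF-8 and UTF-16 (without a BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }

samples = {
    "english": "Hello, world",     # 12 ASCII characters
    "japanese": "こんにちは世界",  # 7 characters outside ASCII
}

for name, text in samples.items():
    print(name, encoded_sizes(text))
```

For the ASCII sample UTF-8 needs half the bytes of UTF-16 (12 versus 24), while for the Japanese sample UTF-16 is the smaller one (14 versus 21), which is why no single default wins for every kind of text.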

  • @pazaresosset6348
    @pazaresosset6348 2 months ago +1

    Thanks, very interesting video.

  • @nournote
    @nournote 10 months ago +1

    Thanks. Very informative.

  • @akshardrashti
    @akshardrashti 3 months ago

    Please, how do I find the encoding of my file?

  • @gersoncjunior
    @gersoncjunior 3 months ago

    Thanks for sharing that!

  • @AshisRout-b4q
    @AshisRout-b4q 11 months ago +1

    Do you have a LinkedIn handle?
    I find this very interesting.

    • @ErikWilde
      @ErikWilde 11 months ago

      www.linkedin.com/in/erikwilde

  • @parsifal8232
    @parsifal8232 1 year ago +1

    6:29 Please go into the details of the "byte order mark" in UTF-16.

    • @parsifal8232
      @parsifal8232 1 year ago

      Or, more generally, into additional byte information, for example in txt files with and without a BOM, and maybe how to add additional info to a JPG file (without damaging it).

    • @ErikWilde
      @ErikWilde 1 year ago +2

      A byte order mark depends on the format you are using. Specifically in Unicode, the byte order mark describes the byte order in UTF-16. How to do this in other formats is a very different question. For UTF-16, the byte order mark signals whether the Unicode file uses big-endian or little-endian format.
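
The role of the byte order mark described in this reply can be sketched in Python. This is a minimal illustration, not a full encoding detector: the function name `utf16_byte_order` is hypothetical, and the check ignores the fact that a UTF-32 little-endian BOM also begins with the bytes FF FE.

```python
import codecs

def utf16_byte_order(data: bytes) -> str:
    """Guess the UTF-16 byte order from a leading byte order mark, if any."""
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "little-endian"
    return "unknown (no BOM)"

# Build small test inputs: a BOM followed by the letter "A" in each byte order.
big = codecs.BOM_UTF16_BE + "A".encode("utf-16-be")     # FE FF 00 41
little = codecs.BOM_UTF16_LE + "A".encode("utf-16-le")  # FF FE 41 00

print(utf16_byte_order(big))     # big-endian
print(utf16_byte_order(little))  # little-endian
print(utf16_byte_order(b"A"))    # unknown (no BOM)
```

When the BOM is missing, the byte order has to come from somewhere else, such as an explicit label like UTF-16BE or UTF-16LE.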

  • @LuisHernandez-dv4xu
    @LuisHernandez-dv4xu 1 year ago +1

    Thank you very much!

  • @human4566vv
    @human4566vv 1 year ago

    Hi, thanks man, thanks for the video.

  • @sabitkondakc9147
    @sabitkondakc9147 1 year ago

    It seems that Windows switched to UTF-8 as well, speaking of Windows 10 21H2 and later.

    • @ErikWilde
      @ErikWilde 1 year ago +3

      Nobody can escape globalization. Sooner or later you have to support more than just ASCII or the fragmented ISO 8859 character sets. At that point, Unicode and very likely UTF-8 become your best friends.

    • @sabitkondakc9147
      @sabitkondakc9147 1 year ago

      @ErikWilde I'm having a hard time grasping the fact that the native Windows API has only accepted UTF-16 encoded strings to this day. Such a rubbish decision!
      This explains why Windows takes up so much RAM, not to mention the completely redundant CPU cost of UTF transformation.

  • @gt10i
    @gt10i 5 months ago

    Thanks!

  • @MrJloa
    @MrJloa 1 year ago

    I wonder why Microsoft Office still can't open files in UTF-8 😳

  • @RobertHernandez-t5q
    @RobertHernandez-t5q 13 days ago

    Johnson Eric Thomas Jose Perez Elizabeth

  • @BlueShield-hh9js
    @BlueShield-hh9js 6 months ago

    8 bit bounce

  • @BlueShield-hh9js
    @BlueShield-hh9js 6 months ago

    1988 wow

  • @صالحمحمد-ص2ك1ك
    @صالحمحمد-ص2ك1ك 1 year ago

    Hi utf8.46

  • @Tapajara
    @Tapajara 11 months ago

    UTF-16 should be abandoned because it is so problematic.

    • @ErikWilde
      @ErikWilde 11 months ago +1

      Maybe it's problematic, but be prepared to have to deal with it for many years to come.