All Rust string types explained

  • Published on 29 Dec 2024

Comments • 342

  • @letsgetrusty
    @letsgetrusty a year ago +21

    📝Get your *FREE Rust cheat sheet* :
    letsgetrusty.com/cheatsheet

    • @mohaofa1544
      @mohaofa1544 10 months ago +1

      In C, you can use the strlen() function to determine the length of a string. The sizeof operator does not give you the length of a string, but rather the size of the array that holds the string.
      So correct code would be something like this:
      char string[] = "hello world!";
      char buffer[strlen(string) + 1];
      Most Rust programmers have skill issues regarding C, and yet they talk about it because they want to promote SAFETY FEATURES that expert C coders don't need. Good for webdevs and Python programmers. Hope you stay away from C, because you really don't understand it.

  • @bloody_albatross
    @bloody_albatross a year ago +403

    Note: While ASCII characters are stored in bytes, it only uses 7 bits. Meaning it only supports 128 distinct values (not 256 distinct values). There are encodings that extend ASCII to use the 8th bit, though, like e.g. ISO-8859-1 (aka latin1). But that's not ASCII.

    • @Brandlingo
      @Brandlingo a year ago +50

      Sadly a little known fact. And only this 7 bit ASCII range is identical for many encodings such as many Windows codepages or even UTF-8 which makes them backwards compatible.

    • @alexwhitewood6480
      @alexwhitewood6480 a year ago +21

      True.
      ASCII: 7 bit (128 chars)
      "Extended" ASCII: 8 bit (256 chars)

    • @CharlieLavers
      @CharlieLavers a year ago +30

      For historical reasons, the 8th bit was traditionally used with ASCII as a parity bit, a rudimentary integrity check.

    • @lumberjackdreamer6267
      @lumberjackdreamer6267 a year ago +4

      To be more precise, it’s 127 characters in ascii plus the ‘\0’
      Characters from 1 to 31 are control characters. Then the first real character is at 32, it’s the space.
      Character 126 is tilde. 127 is DEL.

    • @fswerneck
      @fswerneck a year ago +15

      @lumberjackdreamer6267 There are 128, since 0 is included.
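The 7-bit point in this thread can be checked directly. The following is a minimal sketch using only the Rust standard library; the helper name `ascii_value_count` is made up for illustration.

```rust
/// Counts how many of the 256 possible byte values are ASCII.
fn ascii_value_count() -> usize {
    (0u8..=255).filter(|b| b.is_ascii()).count()
}

fn main() {
    // Every value from 0 through 127 is ASCII: 128 values, NUL included.
    println!("ASCII values in a byte: {}", ascii_value_count());

    // 0xE9 is 'é' in ISO-8859-1 (Latin-1), but it is not ASCII.
    let latin1_e_acute: u8 = 0xE9;
    println!("0xE9 is ASCII: {}", latin1_e_acute.is_ascii());

    // Landmarks mentioned above: space is 32, tilde is 126, DEL (127) is a control character.
    println!("{} {} {}", b' ', b'~', 0x7Fu8.is_ascii_control());

    // A lone 0xE9 byte is not valid UTF-8 either: UTF-8 shares only the 7-bit range.
    println!("lone 0xE9 rejected: {}", std::str::from_utf8(&[0xE9]).is_err());
}
```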

  • @chrissaltmarsh6777
    @chrissaltmarsh6777 a year ago +206

    A (mostly retired) person who has used (still use) C.
    C is concentrated, very very smart, for its age and still pretty important.
    If you want to understand the cleverness of Rust, write your experiments in C, fall down the holes, and you will understand why it is clever.
    (In general, try to understand one level down)
    (Oh, and get close to the metal. There is the fun!)

    • @EbonySeraphim
      @EbonySeraphim a year ago +20

      To be more clear, you have to try to write "portable" C with the additional zinger of handling data in more than ASCII or ISO-8859-1 encoding. If you are writing C for a known and targeted platform + OS, strings and paths really aren't that hard to manage.

    • @htspencer9084
      @htspencer9084 a year ago +16

      Yeah, I'm glad that the rust community as a whole still respects and shows deference to C.
      It's not about one being better than the other, it's about different overall philosophies :)

    • @apestogetherstrong341
      @apestogetherstrong341 a year ago +4

      Your computer is not a PDP-11, you are not "close to the metal" with C.

    • @chrissaltmarsh6777
      @chrissaltmarsh6777 a year ago +19

      @apestogetherstrong341 One of the computers was a PDP, and close to the metal is what I did - and still do, but not with a PDP. And C can hit the hardware addresses. Pretty close to the metal, I would say.

    • @JDLuke
      @JDLuke a year ago +4

      @apestogetherstrong341 I dunno, man. When a string is still a raw sequence of bytes and the pointer to it could be NULL, could point to the zeroth byte of the heap, or could already have been free()ed (no good way to know), I'd say that's as close to the metal as is generally feasible.
      Of course, if you're running in some virtualized environment, the metal is isolated to some degree regardless of the language you used.

  • @Alguem387
    @Alguem387 a year ago +61

    I want an array of characters!
    YOU CANT HANDLE AN ARRAY OF CHARACTERS!!!

    • @PixelThorn
      @PixelThorn 11 months ago +2

      Very funny, thank you 👍

  • @chrraz
    @chrraz a year ago +95

    The Rc and Arc examples are actually a little misleading. The Rc::from() call will actually clone the str slice that it is given. Just like the Box, the Rc owns its data. All the Rc::clone calls will not clone the string data further though :)

    • @MRL8770
      @MRL8770 a year ago +12

      I'm not even sure why Rc was used in this case. A regular slice would do the trick, no copying and reference-counting involved.
      A better example would be if the original string wasn't 'static and went out-of-scope, while the reference-counted copy gets retained.

    • @Takatou__Yogiri
      @Takatou__Yogiri 4 months ago

      @MRL8770 He just wanted to show some examples, but he did not find a good example to show. I'm still new to Rust. Idk what you're all yapping about, but here I'm giving my opinion 😅

    • @MRL8770
      @MRL8770 4 months ago

      @Takatou__Yogiri I think the point of an example is to show a situation where the discussed feature provides some kind of benefit. It didn't provide any benefit there; in fact, that code would be better without the unnecessary reference counting.
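A small sketch of the situation this thread describes, where the original string is not 'static and goes out of scope while the reference-counted copy is kept. The function name `make_shared` is hypothetical; this is not the code from the video.

```rust
use std::rc::Rc;

fn make_shared() -> Rc<str> {
    // A non-'static String that is dropped at the end of this function.
    let local = String::from("hello");
    // Rc::from(&str) allocates and copies the bytes, so the Rc owns its data.
    let shared: Rc<str> = Rc::from(local.as_str());
    // The copy lives at a different address than the original buffer.
    assert_ne!(shared.as_ptr(), local.as_ptr());
    shared
}

fn main() {
    let a = make_shared();
    // Rc::clone only bumps the reference count; the string bytes are not copied again.
    let b = Rc::clone(&a);
    assert!(Rc::ptr_eq(&a, &b));
    println!("{} (strong count = {})", b, Rc::strong_count(&a));
}
```

When the text is a literal that is only ever read, a plain `&'static str` does the same job without the allocation or the counting, which is the objection raised above.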

  • @andrei5680
    @andrei5680 a year ago +138

    I would like to thank you for creating such well explained videos. As a rust novice I learned so much from your videos.

    • @letsgetrusty
      @letsgetrusty a year ago +7

      Thank you for watching :)

  • @maltamo
    @maltamo a year ago +8

    I don't get why Rc isn't thread safe compared to Arc. Since it's immutable, the threads won't be able to modify its value, so why do you need to make it atomically accessible? There is no threat in only reading shared data, or am I missing something?

    • @belowdecent6494
      @belowdecent6494 a year ago +27

      The issue is not with the string contents but with the reference counting done by Rc. Rc updates its reference count in a thread-unsafe way, hence the need for Arc.

    • @maltamo
      @maltamo a year ago +1

      @belowdecent6494 I see! Totally makes sense now. Thank you for your answer.
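To make the answer concrete, here is a minimal sketch of sharing an immutable string across threads with `Arc<str>`. The helper name `total_len_across_threads` is an illustrative assumption.

```rust
use std::sync::Arc;
use std::thread;

/// Shares one immutable string with `n` threads and sums the lengths they report.
fn total_len_across_threads(text: &str, n: usize) -> usize {
    let shared: Arc<str> = Arc::from(text);
    let handles: Vec<_> = (0..n)
        .map(|_| {
            // Cloning increments the count atomically; the bytes themselves are shared.
            let local = Arc::clone(&shared);
            thread::spawn(move || local.len())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    println!("{}", total_len_across_threads("hello", 4));
    // Swapping Arc for Rc above does not compile: Rc<str> is not Send,
    // because its reference count is updated without atomic operations.
}
```

The string data is never written to; the only thing mutated concurrently is the counter, and that is what the atomic operations protect.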

  • @ishi_nomi
    @ishi_nomi a year ago +45

    Great content. Those types may seem messy, but they feel so natural to a C programmer.

  • @Veptis
    @Veptis 4 months ago +3

    I don't directly use Rust, but have recently been exposed to rust for a backend. And some of these concepts do sound really useful.

    • @mjpthetrucker9485
      @mjpthetrucker9485 months ago

      Learning it to use with Tauri. It's not bad. I feel smart pointers in c++ negate the complaints towards it regarding management. Rust doesn't allow newer programmers to make certain mistakes so I def get its appeal.

  • @ed9w2in6
    @ed9w2in6 a year ago +13

    I think there's an error in the CString example ( 18:00 ).
    The branch for a successful getenv call should execute when `!val.is_null()` is true. The negation operator `!` is missing.

    • @guidow9616
      @guidow9616 a year ago +2

      Wanted to point out the same, but then decided to look for it in the comments first. Glad I was not the only one :)
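For reference, a sketch of the corrected check. This is not the video's code: it avoids calling `getenv` and instead uses a hypothetical helper, `c_ptr_to_string`, that takes any possibly-null C string pointer.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Converts a possibly-null C string pointer into an owned String.
///
/// # Safety
/// `ptr` must be null or point to a valid NUL-terminated string.
unsafe fn c_ptr_to_string(ptr: *const c_char) -> Option<String> {
    // Note the negation: only a non-null pointer may be read.
    if !ptr.is_null() {
        let text = unsafe { CStr::from_ptr(ptr) };
        Some(text.to_string_lossy().into_owned())
    } else {
        None
    }
}

fn main() {
    let owned = CString::new("/usr/bin").expect("no interior NUL");
    // SAFETY: `owned` outlives the call and is NUL-terminated.
    println!("{:?}", unsafe { c_ptr_to_string(owned.as_ptr()) });
    // SAFETY: a null pointer is explicitly allowed by the contract above.
    println!("{:?}", unsafe { c_ptr_to_string(std::ptr::null()) });
}
```

Returning `Option<String>` moves the null case into the type, so callers cannot reach the string without handling the missing value.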

  • @OmnipotentEntity
    @OmnipotentEntity a year ago +20

    Small erratum I noticed: at 18:15 the null check seems to be inverted.

    • @OmnipotentEntity
      @OmnipotentEntity a year ago +1

      Great video btw! Thanks for the resource.

  • @avinashthakur80
    @avinashthakur80 a year ago +6

    Few questions:
    1. You initially said strings in Rust are immutable. But later you presented "&mut str", saying that it is a mutable string which allows in-place transformations.
    What did you mean by the former statement then?
    2. You said "Box<str>" doesn't have the capacity information. By capacity, do you mean the length of the string? (Usually capacity != used size.)
    If you meant capacity as in a vector, then what is the difference between "Box<str>" and "&str", as both don't have capacity?
    If you meant capacity = size, then how can this string be used in reality? Lack of size and no null termination would mean that we can't know where the string ends.

    • @avishjha4030
      @avishjha4030 a year ago +5

      Strings are immutable by default unless qualified by "mut". The same applies to other data types as well. Box<str> doesn't have capacity information, i.e. it is not ideal for expansion. The difference between Box<str> and &str is that Box<str> is mutable and the owner, whereas &str is immutable (you can mutate it as &mut str) and is borrowed, not owned.

    • @anon_y_mousse
      @anon_y_mousse a year ago +1

      Nothing is mutable by default so to make it mutable you have to add the keyword mut. The capacity is used to allow resizing a string, otherwise it's just a slice, i.e. a view into another string. I suppose they thought it was cool to drop it because it saves 8 bytes, and it doesn't add a ton of complexity to the base string type. To work with an immutable string you really only need the length and pointer to the data anyway, though if they're going through all this trouble of defining multiple ways of dealing with strings they should've included a short string type that encodes the length in the first byte the way Pascal did it.

    • @KohuGaly
      @KohuGaly a year ago +1

      1. String slices in Rust are _usually_ immutable (as in, most commonly you access a string slice through the immutable reference "&str"). The mutable reference to a string slice, "&mut str", is an extremely niche type to see in the wild, because there is very little you can actually do with it safely.
      2. By capacity we mean the size of the allocation where the string slice is stored. In the case of Box<str>, the size of the allocation is the same as the size of the string slice in bytes. In the case of String, it may have extra memory at the end, so it doesn't have to reallocate every time the size of the string changes during modification. The String type is just the Vec<u8> type (a dynamically sized byte array), wrapped in an API that preserves valid UTF-8 encoding in the stored bytes.

    • @Mankepanke
      @Mankepanke a year ago

      The main thing about Box<str> is that it's an alternative to String, not to &str. I.e. you want an _owned_ value, but don't want to present it as something you can/should modify, and therefore don't need the capacity. If you didn't need an owned value, you should just use &str directly. This is why the video talked about Box<str> not having the capacity inside of it.

    • @jma42
      @jma42 a year ago

      > You initially said strings in rust are immutable. But later presented "&mut str" saying that it is mutable string, which allows in place transformations.
      He said it's immutable by _default_
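The capacity and `&mut str` points from this thread can be seen in a few lines. A minimal sketch; the helper name `freeze_and_shout` is hypothetical, and the printed sizes are what current compilers produce rather than a documented guarantee for `String`.

```rust
use std::mem::size_of;

/// Gives up the spare capacity, then edits the bytes in place through `&mut str`.
fn freeze_and_shout(s: String) -> Box<str> {
    // into_boxed_str shrinks the allocation to exactly the string's length.
    let mut boxed: Box<str> = s.into_boxed_str();
    // &mut str allows edits that keep the byte length unchanged.
    let view: &mut str = &mut boxed;
    view.make_ascii_uppercase();
    boxed
}

fn main() {
    // String is pointer + length + capacity; Box<str> and &str are pointer + length.
    println!("String:   {} bytes", size_of::<String>());
    println!("Box<str>: {} bytes", size_of::<Box<str>>());
    println!("&str:     {} bytes", size_of::<&str>());

    let mut s = String::with_capacity(64);
    s.push_str("hello");
    println!("len = {}, capacity >= 64: {}", s.len(), s.capacity() >= 64);

    println!("{}", freeze_and_shout(s));
}
```

Both `Box<str>` and `&str` always know where the string ends because the length travels with the pointer; what they lack is the third field that records how much spare room exists for growth.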

  • @mback3713
    @mback3713 a year ago +14

    I feel that RUST fights me when I am working with internationally standardized protocols that assume ASCII formats and with tools that only care about ASCII and use that encoding as optimization. Rust needs an ANSI-C compatible ASCII-only type to interoperate with C serializations where high performance is a fundamental requirement... there are two scenarios I work with on a day to day basis where this is important:
    1. processing of simple data that is ascii tag to ascii data... where the code optimization to assume ascii reduces parsing overhead.
    2. processing of low level serialization standards that not only assume ascii but depend upon ascii... where unicode checks are unnecessary and degrade application performance.
    Rust needs string types that specify encoding. Instead of "String" it needs "String" (or some similar syntax) types. Then... String and String and String and.... any other interesting encoding can be clearly communicated and converted through standard means. Stop fighting us and provide good abstractions that don't force upon devs a one world policy... let the world be open and instead establish standards for conversion.

    • @okuno54
      @okuno54 10 months ago +6

      Assuming someone hasn't already written it... then write it yourself. All you're asking for is a thin wrapper around Vec<u8> and/or &[u8], and Rust gives you the ability to write that abstraction and drop it in. You just gotta do it (or more likely, find it in the ecosystem).

    • @antifa_communist
      @antifa_communist 10 months ago +2

      The syntax you suggest doesn't make any sense in Rust terms and encapsulating it like that would incur an overhead. Just have different types for different strings. Someone has probably already made a library for it, which you probably wouldn't need anyway because utf8 is compatible with it.

    • @antifa_communist
      @antifa_communist months ago

      You aren't forced to use the String and &str types. You can use different ones. And encoding usually isn't an issue unless you actually need to do something with the values, because it's just bytes. You don't need to know the encoding or anything for it to exist and be passed around. You can just use a Vec<u8> instead of a String and a &[u8] instead of &str. If you're dealing with ASCII and you want some extra typing, use AsciiChar instead of u8. Don't forget core::mem::transmute and casting pointer types.
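As a rough illustration of the `&[u8]` route suggested here, the sketch below parses an ASCII record without any UTF-8 validation. The `TAG=VALUE` format and the function name `split_record` are made up for the example, not taken from any protocol mentioned in the thread.

```rust
/// Splits an ASCII `TAG=VALUE` record into its two halves without any UTF-8 validation.
fn split_record(record: &[u8]) -> Option<(&[u8], &[u8])> {
    let eq = record.iter().position(|&b| b == b'=')?;
    Some((&record[..eq], &record[eq + 1..]))
}

fn main() {
    let wire: &[u8] = b"Content-Length=42";
    if let Some((tag, value)) = split_record(wire) {
        // Byte-wise, ASCII-only comparison; no Unicode case folding is involved.
        println!("tag matches: {}", tag.eq_ignore_ascii_case(b"content-length"));
        // Validation is paid for only at the point where a &str is actually needed.
        if value.is_ascii() {
            let text = std::str::from_utf8(value).expect("ASCII is valid UTF-8");
            println!("value = {text}");
        }
    }
}
```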

  • @giladkay3761
    @giladkay3761 a year ago +18

    Great video. I do feel like these videos need a bit more of the "behind the scenes" animations of how the bits are stored and handled, because they do a great job in driving the point home

  • @SKTTWkartrider
    @SKTTWkartrider 10 months ago +2

    4:32 `sizeof` is an operator rather than a function in C, although its syntax usually makes it look like a function

  • @MrKaNuke
    @MrKaNuke a year ago +4

    My first CS course was in C and I never quite understood much of these concepts but your video just explained it so well. I would love to understand better how some of these C characteristics lead to security and runtime vulnerabilities and how Rust prevents them.

  • @EbonySeraphim
    @EbonySeraphim a year ago +2

    I don't get Arc for immutable strings. If it's immutable, why is any synchronization necessary?

    • @kebien6020
      @kebien6020 11 months ago

      I think synchronization is necessary for the reference counting, not for the string content.

  • @Aucacoyan
    @Aucacoyan a year ago +9

    This is the free bible of strings! Thank you so much for the effort. This is the string guide for everyone, new and advanced alike!

  • @wrfsh
    @wrfsh 9 months ago +1

    3:26: I don't have much to say against rust, but i also think it's misleading to show off the worst possible C code that uses strcpy, and running on an architecture/OS from 30 years ago where non-executable stacks and overflow guards don't exist, which would likely prevent this from being a security issue. If we want to use silly C code let's at least compare it against equally silly rust code. One could argue that it's easier to write C that way than rust, and i can agree with that. But it doesn't mean that modern C code is written that way at all, and i think it should've been called out at least

  • @exhilex
    @exhilex 10 months ago +1

    Loved your video❤
    Easy, simple and precise detailed explanation of the cause behind each string type.
    Makes us realise that on the low level, how complex very simple but fundamental aspects like strings can be.
    Keep up the good work 👍🏻

  • @themisir
    @themisir a year ago +3

    18:10 correction: I believe the condition branches should be the other way around; or the condition should be negated

  • @sirrobertdowneysenior8080
    @sirrobertdowneysenior8080 a year ago +3

    Just want to say I am on 20 years old PC and rust apps just run rocket. I love you rust developer!!! need not to say lazy electron / js devs will burn in hottest corner of hell.

  • @MRL8770
    @MRL8770 a year ago

    To add to this already great video: 'static doesn't really mean that the data is going to be there for the entire lifetime of the program. 'static is applicable anywhere a value is guaranteed to be unbound by any lifetime restrictions. So you may find 'static added to trait bounds that apply to types such as Rc. You may also find &'static applied to lazily instantiated variables using the lazy_static crate.
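A short sketch of both readings of 'static described above. The function names `takes_static` and `leak_to_static` are hypothetical and exist only for illustration.

```rust
use std::thread;

// `T: 'static` means "T holds no borrowed data with a shorter lifetime",
// not "the value lives until the program exits".
fn takes_static<T: 'static>(value: T) -> T {
    value
}

fn leak_to_static(s: String) -> &'static str {
    // Box::leak yields a &'static str built at run time by never freeing it.
    Box::leak(s.into_boxed_str())
}

fn main() {
    // An owned String satisfies 'static even though it is dropped long before exit.
    let owned = takes_static(String::from("built at run time"));

    // thread::spawn requires 'static captures; moving the owned String in is enough.
    let len = thread::spawn(move || owned.len()).join().unwrap();
    println!("{len}");

    println!("{}", leak_to_static(format!("{}-{}", "leaked", 1)));
}
```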

  • @nils3030
    @nils3030 a year ago +2

    Is your Cow example at 14:51 really a good example for Cow? It doesn't show what Cow can provide, does it? I mean automatically copying the underlying data when you try to modify it.
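For comparison, a minimal sketch of the copy-on-write behaviour the question refers to. It is not the example from the video; the function `normalize` is an assumption made for illustration.

```rust
use std::borrow::Cow;

/// Replaces spaces with underscores, allocating only when a space is present.
fn normalize(input: &str) -> Cow<'_, str> {
    if input.contains(' ') {
        Cow::Owned(input.replace(' ', "_"))
    } else {
        Cow::Borrowed(input)
    }
}

fn main() {
    println!("{}", matches!(normalize("no_spaces"), Cow::Borrowed(_)));
    println!("{}", matches!(normalize("has spaces"), Cow::Owned(_)));

    // to_mut() is the clone-on-write step: the borrowed data is copied
    // into a String the first time a mutation is requested.
    let mut text: Cow<str> = Cow::Borrowed("abc");
    text.to_mut().push('!');
    println!("{text}");
}
```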

  • @KyleHarrisonRedacted
    @KyleHarrisonRedacted 11 months ago +1

    5:17 Really? Guaranteed UTF-8? I once had to compare two strings, a title value found inside an HTML payload and a title value stored in the database, and they didn't compare equal because one was ASCII and one was UTF-8, despite being identical letter for letter in the step debugger. It was a bugger of a bug to track down, and Rust wasn't guaranteeing any particular character encoding for me.

    • @chri-k
      @chri-k 10 months ago

      that's just cursed.

    • @skejeton
      @skejeton 4 months ago

      Do you mean ANSI? UTF-8 is an extension over ASCII, so they're mutually compatible.

  • @suya1671
    @suya1671 a year ago +300

    13 seconds in I see a typo: `PathBuff`

    • @keenant
      @keenant a year ago +110

      💪💪💪path buff yo

    • @a13m34
      @a13m34 a year ago +45

      Hell yeah brother 💪💪💪💪💪

    • @neociber24
      @neociber24 a year ago +44

      Yooo buff that path 💪🏽💪🏽💪🏽

    • @31redorange08
      @31redorange08 a year ago +14

      There's another one: ’a

    • @letsgetrusty
      @letsgetrusty a year ago +170

      That's not a typo 💪💪💪

  • @DK1PL
    @DK1PL 10 months ago +1

    I am not a Rust programmer and my roots are in C. I recognize the advantages of UTF-8 strings and slices but I wonder why so many data types. Why so complicated, why a data type for each use case? String, slice and byte array (C-String) are enough. Other properties of String e.g. Heap, Stack, Const, Atomic, etc. could be handled by keywords that can be used for all other types. Additional the "r#" notation where the \" is enough. So I inevitably have the impression that Rust, just like C++, is stepping into infinite ambiguity - see also Perl. And then there's the "unsafe" option, if I opt for security, then I remain secure, don't I? In large software that is developed over years/decades it's only a matter of time before you find reasons for "unsafe" and then Rust reduces to C. So I might as well write my code in C from the beginning? 🤔

    • @maniacZesci
      @maniacZesci 7 months ago +2

      Not nearly the same thing. With Rust "unsafe" you have localized code snippets where you can look at in your code base if needed. Also you can build safe abstractions around unsafe code. And check out Rust book or some other resource what Rust "unsafe" actually is. It only allows you five extra actions that you can not do in safe Rust, the rest of Rust rules still apply.

  • @KyleHarrisonRedacted
    @KyleHarrisonRedacted 11 months ago +7

    4:30 so don’t be lazy and either provide a big enough buffer if you’re going to use magic hard coded numbers or use one of the tiny handful of functions c has to query at run time what size is needed.
    It’s seriously not that bad

    • @whoman0385
      @whoman0385 months ago

      "who needs mountain climbing gear when you can just grip more tightly"

  • @voidemon490
    @voidemon490 a year ago +1

    Thank you so much. because of your amazing videos I started switching to rust from TS ❤

  • @gottox
    @gottox a year ago +14

    Thanks for this video! If we compare the data types of rust and C, we need at least to distinguish between char * and char[] on the C side. If we want to be super correct, we also need `const char *` and `const char[]` too. Also, it's PathBuf, not PathBuff :)

    • @alexwhitewood6480
      @alexwhitewood6480 a year ago +3

      Yeah that was super confusing when I initially started learning C.

    • @gottox
      @gottox a year ago +2

      @alexwhitewood6480 tbh, C does a very good job of hiding the differences through implicit lookups, but it fails in a few edge cases where it becomes really confusing, sizeof(char*) vs sizeof(char[]) being the most famous example.

  • @simon-off
    @simon-off 9 months ago

    I'm finally getting around to learning Rust and these videos are a great resource! Thank you 🙏

  • @kreuner11
    @kreuner11 a year ago +2

    C also has char16_t, Unicode library strings, other library strings like Glib probably has something like that for dynamic length strings, often a program may implement their own strings without using null termination which libc has partial support for already, etc

  • @eagerestwolf
    @eagerestwolf 7 months ago +2

    UTF-8 actually isn’t universally compatible. For *nix based systems (Unix, which uses ASCII; Linux, which uses UTF-8; and macOS, which also uses UTF-8) UTF-8 is compatible; however, Windows doesn’t use ASCII or UTF-8 internally. It uses wide strings (or strings made of 16-bit characters, allowing for more combinations since Windows predates Unicode). Hence why UTF-16 exists, that’s the Unicode version of Windows’ wide strings. That being said Windows itself can handle UTF-8 strings, but the Windows API cannot. I would guess this is why Rust includes the OsString type, since macOS and Linux both use UTF-8 in their APIs (since neither Apple nor the Linux Foundation is overly concerned about ~20 year old systems).
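For readers who want to see the UTF-8 to UTF-16 step itself, here is a small standard-library sketch. The helper `to_wide` is a hypothetical name; it only re-encodes the text and makes no claim about how `OsString` stores data internally or about what any particular Windows API accepts.

```rust
/// Re-encodes a UTF-8 string as UTF-16 code units, the form wide-character APIs expect.
fn to_wide(text: &str) -> Vec<u16> {
    text.encode_utf16().collect()
}

fn main() {
    let text = "héllo 🦀";
    let wide = to_wide(text);
    println!("UTF-8 bytes: {}, UTF-16 units: {}", text.len(), wide.len());

    // Converting back validates the input; unpaired surrogates are rejected.
    let round_trip = String::from_utf16(&wide).expect("valid UTF-16");
    println!("round trip equal: {}", round_trip == text);
    println!("lone surrogate rejected: {}", String::from_utf16(&[0xD800]).is_err());
}
```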

  • @ce5983
    @ce5983 a year ago +3

    Can someone explain the first unsafe c string example with the overlong sequence for A 0x41 (according to the comment)? Is the problem with it just that its two bytes or that its not actually 0x41 at all? I dont understand what 0xC1, 0x81 has to do with A

    • @awesome_aiden
      @awesome_aiden a year ago +1

      When encoding UTF8, you may need to pad the codepoint with leading 0 bits. This is because UTF8 only stores 7-bit, 11-bit, 16-bit, or 21-bit codepoints. Overlong sequences arise when you pad the codepoint more than necessary.

    • @awesome_aiden
      @awesome_aiden a year ago +2

      More information is on Wikipedia.
      wikipedia.org/wiki/UTF-8#Overlong_encodings

    • @awesome_aiden
      @awesome_aiden a year ago +2

      If say, you are trying to sanitize a Windows filename, then you would want to filter out backslashes. In correctly formed UTF8, you could just remove all 0x5C bytes. Overlong backslashes could skip this check, and get converted to regular backslashes when converting to UTF16 (Windows filename).

    • @ce5983
      @ce5983 a year ago

      @awesome_aiden Ah, thanks 👍 Hope I don't have to worry about these kinds of char encoding issues anytime soon, or ever lol. Seems like a headache and a half 😅
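The relationship between 0xC1 0x81 and 'A' (0x41) asked about in this thread comes down to a little bit arithmetic. In the sketch below, `naive_decode_2` is a deliberately unsafe-by-design illustration of a decoder that skips validity checks; it is a made-up helper, not a standard function.

```rust
/// Decodes a two-byte UTF-8 sequence by bit arithmetic alone, with no validity checks.
fn naive_decode_2(b0: u8, b1: u8) -> u32 {
    // Keep the 5 payload bits of the lead byte and the 6 payload bits of the continuation.
    (((b0 & 0x1F) as u32) << 6) | ((b1 & 0x3F) as u32)
}

fn main() {
    // 0xC1 0x81 carries the payload bits 00001 000001, which is 0x41 ('A').
    println!("{:#x}", naive_decode_2(0xC1, 0x81));

    // A strict decoder rejects it, because 'A' must be encoded as the single byte 0x41.
    println!("overlong rejected: {}", std::str::from_utf8(&[0xC1, 0x81]).is_err());
    println!("{:?}", std::str::from_utf8(&[0x41]));
}
```

So the two bytes are a padded, longer-than-necessary spelling of the same code point. A filter that looks only for the byte 0x41 never sees it, while a lenient decoder further down the line still produces 'A'.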

  • @nullplan01
    @nullplan01 9 months ago +1

    I will note that as of C99, we have already had wchar_t for representing wide characters, and as of C23, char8_t, char16_t, and char32_t are joining in. The use of 16-bit wchar_t should be deprecated, libraries should switch to 32-bit wchar_t (which has been state of the art on UNIX since the late nineties, but Windows is slow to adapt), and legacy UCS-2/UTF-16 strings should be represented with char16_t.

  • @friedrichmyers
    @friedrichmyers 2 months ago

    Just what I need to implement my own string library in C.
    Wish me luck!

  • @c3cris2
    @c3cris2 3 months ago

    In Os_string is there a typo? For ok enum you used Ok(string) => but for Err(os_string) where do you define just string?
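On the question of where the names come from: in a `match` on `OsString::into_string()`, each arm introduces its own binding, so `string` and `os_string` are both defined right there in the patterns. A minimal sketch, assuming a hypothetical helper `describe` (this is not the video's code):

```rust
use std::ffi::OsString;

fn describe(os: OsString) -> String {
    // into_string() returns Result<String, OsString>.
    match os.into_string() {
        // Valid Unicode: `string` is a new binding holding the converted String.
        Ok(string) => format!("valid: {string}"),
        // Not valid Unicode: `os_string` is the original OsString, handed back unchanged.
        Err(os_string) => format!("lossy: {}", os_string.to_string_lossy()),
    }
}

fn main() {
    println!("{}", describe(OsString::from("report.txt")));
}
```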

  • @jamesdi7261
    @jamesdi7261 a year ago +15

    The diversity of types is a safety feature itself because you can't simply assign one string to another without explicit conversion. It's controlled by a type system.

  • @icoudntfindaname
    @icoudntfindaname 4 months ago

    Doesn't such indirection have performance overheads?
    Are the safety checks runtime?

  • @bozhidaratanasov7800
    @bozhidaratanasov7800 11 months ago

    Still confused about &str vs str. The memory layout of &str (pointer, length) doesn't look like the memory layout of other normal references. Is this the layout of &str or more exactly str? If so, why do we have to always use str with a reference, but not String?
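A quick way to look at the layouts in question. This sketch only prints reference sizes, assuming a typical target where a reference to an unsized type is a pointer plus a length.

```rust
use std::mem::size_of;

fn main() {
    let word = size_of::<usize>();
    // A reference to a sized type is a single pointer.
    println!("&String: {} word(s)", size_of::<&String>() / word);
    // str is unsized, so a reference to it also carries the length.
    println!("&str:    {} word(s)", size_of::<&str>() / word);
    // The same holds for slices of any element type.
    println!("&[u8]:   {} word(s)", size_of::<&[u8]>() / word);

    // The length lives in the reference, so two views of one buffer can differ.
    let owned = String::from("hello world");
    let (a, b): (&str, &str) = (&owned[..5], &owned[6..]);
    println!("{a} ({}) / {b} ({})", a.len(), b.len());
}
```

The (pointer, length) pair is the layout of `&str`. `str` itself is just the bytes, with a size unknown at compile time, which is why it is always used behind some pointer type such as `&str`, `Box<str>`, or `Rc<str>`, whereas `String` has a fixed size and can be held by value.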

  • @petermaltzoff1684
    @petermaltzoff1684 a year ago

    13:56 looks like on line 3 the raw string literal is missing the two hash symbols delineating the start and end.
    Can you have raw string literals without this hash symbol?
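Raw string literals without hashes are valid Rust. The hashes are only needed when the text itself contains a double quote. A short sketch with made-up sample strings:

```rust
fn main() {
    // No hashes needed when the text contains no double quote.
    let pattern = r"C:\temp\(\d+)\.log";
    // Hashes are required only to let a double quote appear inside the literal.
    let json = r#"{"name": "Ferris"}"#;
    // Same content written with ordinary escapes.
    let escaped = "{\"name\": \"Ferris\"}";

    println!("{pattern}");
    println!("same content: {}", json == escaped);
}
```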

  • @LibreGlider
    @LibreGlider a year ago +6

    I like these more in-depth vids.

  • @WolvericCatkin
    @WolvericCatkin 10 days ago

    7:08 I don't think there's any string types in the standard library, which result in their contents being stored on the stack... with a null-terminated string (I think...?) it's possible to create an allocated string-type which uses a union (a block of memory with multiple potential layouts) to store its allocation data, but can alternatively store the string directly, if it's small enough to store in that same space... I believe strings following that design could be stored on the stack, but don't feature in the standard library...

  • @plawzzz6629
    @plawzzz6629 10 months ago

    Great explanation. At 10:59 I had to specify the type - shared_text: Arc<str> - so it would build

  • @jongeduard
    @jongeduard a year ago +4

    Thanks for this really great video with such a complete overview!
    Small note however, your C string example at 17:35 has a bug. You forgot the exclamation mark in front of the is_null check, inverting your if logic. Whoopsy.

    • @chronxdev
      @chronxdev a year ago +1

      came here for this, I almost thought I was crazy for a sec there

    • @jongeduard
      @jongeduard a year ago

      @chronxdev It's also funny because it directly displays the reason for safe Rust code with the Option enum type, whose value you can only extract when it actually has one. That is what people refer to when they talk about the power of Rust's type system, and it is what you will use in most cases when not directly talking to C libraries.

  • @BryanBaron55
    @BryanBaron55 a year ago +3

    What an awesome video!
    You actually covered everything in very understandable way.
    Good job, buddy. Keep it up.

  • @segsfault
    @segsfault a year ago +3

    1. Doesn't the `String` come with an overhead of storing the size of the string? imagine a string 4 bytes long and having a variable 4 bytes to store the size (uint32).
    2. Won't the re-sizing the string add arithmetic overhead of adding/subtracting the value? whereas in c you can just assign the character after the last character to `0x0`
    3. Rust strings can be implemented in C too right? just have a struct:
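    ```c
    #include <stdint.h>

    /* Length-prefixed string: a data pointer plus an explicit byte count. */
    struct String {
        char *data;      /* not necessarily NUL-terminated */
        uint32_t length; /* number of bytes in data */
    };
    ```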
    and then just have functions to manipulate it?
    4. if Strings in rust are always UTF-8 encoded, doesn't this add a overhead in the program's performance?
    Because string-related function in the rust's std library will have the code that works with UTF-8 string which will add a performance overhead since working with variable-length encodings like UTF-8 have a complex logic compared to working with fixed-length encodings like ASCII, which is basically waste of CPU time because in ALOT of cases you just only need ASCII
    5. A "string-slice" in C can be just another `String` struct i defined above and the `data` pointer now points to the start of the slice and `length` is equal to the size of the slice.

    • @ABaumstumpf
      @ABaumstumpf a year ago +1

      1 - Yes it comes with an overhead. But in general those things will be negligible. It really depends on the size of the string and how exactly it is implemented.
      For example many languages (C++ and Rust included) make the class quite a bit bigger - in the range of 24-32 bytes. That holds the pointer to the data, how long it is, and how much storage is reserved. So yes a very short string takes significantly more memory.
      2 - Yes and no. In C you can NOT just add a character to a "string" cause a "char*" used for storing strings does not have the information of either how long it is or how much memory it has. If you do not create your own struct for that then you are doing hundreds of times more work just to add a single character.
      Say your string is "C string is fast" and you now want to append "er!" to finish the sentence. These are the steps you need to do:
      linearly search for the end of the string - and you already did way more arithmetic calculation than needed in C++/Rust. Then reallocate the string, then append "er!\0".
      In C++ you also most of the time have small-string-optimisation so strings shorter than 22 character are stored directly in the class without any dynamic memory involved.
      3 - kinda, but you are missing some critical information and the memory-layout is bad.
      4 - not really. UTF-8 is a character encoding, and that has absolutely no direct impact on performance as long as you do not use the extended features. Btw, C is not ASCII either - it can be UTF-16 (which factually is slower and more memory intensive), or even EBCDIC - so all your string manipulations would be wrong.
      5 - sure. And you just need to make sure that the underlying string is never reallocated.
      It would be a lot more convincing if you gave some better C examples and not ideas that come from fresh students. C does have some major advantages, but you haven't touched on any of them.

    • @segsfault
      @segsfault a year ago

      @ABaumstumpf I am still learning C, so please do forgive me. I wanted to reply to a few of your points.
      3. How is the memory layout of my struct bad?
      4. I haven't worked with UTF-8 directly but i know that there are this things called "codepoints" and stuff which are to be handled by the code logic, so won't the UTF-8 implementation of a string function be a bit more slow because of handling all that code logic?
      which is fine if you're using UTF-8 strings but what if you're just using ASCII characters, won't that extra logic slow down the code a bit?
      and the last "unlisted" point, umm can you give some c-examples and stuff? i am quite new to C, just over a year and i still have alot of stuff to learn.

    • @ABaumstumpf
      @ABaumstumpf a year ago +1

      @@segsfault There is no problem with you just learning C, but a video about Rust is the wrong place for such questions, as such videos are unlikely to even get the C code correct (if not making it intentionally bad).
      As for the layout: you want your fixed-size member first, as that allows allocating the entire object in one contiguous block of memory. This is more advanced C than you would ever see in this type of video and not something you need to know when starting out with C, but doing one malloc instead of two is faster, and having everything in a single block of memory means that accessing the data effectively goes through one indirection instead of two. So both creation and usage of the string are faster.
      "which are to be handled by the code logic"
      Only if you are using code that needs to deal with them. The encoding has absolutely no influence on how you decide to deal with the data. UTF-8 is a superset of ASCII, meaning in C you can have your "normal" char string and treat it as ASCII, Latin8 or UTF-8. That is up to you. If you want the correct string length and you are treating it as UTF-8, then it becomes slightly more computationally intensive because you are doing more work, but you are also getting more functionality for that, as just treating it as ASCII would give you the wrong result.
      That is a very easy beginner error to make, and one I have seen a couple of times with tracing. People want to write out what their program is doing but also make it look pretty, so they use box-drawing symbols to print a table. Then they suddenly notice misalignments and wrong formatting at best, or memory corruption and crashes at worst, because they did not account for those symbols being more than 1 byte.
      If you know that you are dealing with ASCII, then you would just handle the string byte by byte and you would have no extra overhead either.
      As for examples: my C is a bit rusty after doing just C++ for many years now. And sadly the last few courses I had seen were basically horrible: they started with the most archaic version of C, written in a horrible error-prone style. C is a relatively old language, created in the 1970s and standardised in 1989. But since then the language has evolved and gotten some nice additions.
      In general, learning by doing and following an instructional series on YouTube works, or, if you can, take a course at your local university.
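
      The table-border pitfall described above can be sketched in Rust, the language of the video. This is only an illustration: the helper names `byte_len`, `char_count` and `pad_cell` are made up for this example, and counting `char`s is just an approximation of display width (it ignores wide and combining characters).

```rust
/// Number of bytes the string occupies (what a byte-wise, ASCII-style count sees).
fn byte_len(s: &str) -> usize {
    s.len()
}

/// Number of Unicode scalar values, found by walking the UTF-8 data.
fn char_count(s: &str) -> usize {
    s.chars().count()
}

/// Pads `text` with spaces up to `width` characters (not bytes).
fn pad_cell(text: &str, width: usize) -> String {
    let used = char_count(text);
    let mut out = String::from(text);
    out.extend(std::iter::repeat(' ').take(width.saturating_sub(used)));
    out
}

fn main() {
    // Each '│' (U+2502) takes 3 bytes in UTF-8, so bytes and characters disagree.
    let border = "│ok│";
    println!("bytes: {}, chars: {}", byte_len(border), char_count(border));
    println!("[{}]", pad_cell("│a", 4));
}
```

      For pure ASCII input the two counts agree, which is why the problem only shows up once the border symbols are added.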

    • @segsfault
      @segsfault 1 year ago

      @@ABaumstumpf thanks a lot!

    • @ABaumstumpf
      @ABaumstumpf 1 year ago

      @@segsfault And just because it popped up in my recommendations:
      th-cam.com/video/QpAhX-gsHMs/w-d-xo.html
      ACCU has a lot of nice videos about specific topics in relation to C and C++.

  • @6srer
    @6srer 1 year ago +1

    I didn't quite get the difference between &String and &str.
    Literals and non-literal strings?

  • @petermaltzoff1684
    @petermaltzoff1684 1 year ago

    Both &str and Box<str> save on memory by not storing the capacity. Kinda confused, because he mentions Box<str> as an alternative to &str but states it saves on memory, as though the &str type doesn't. Am I missing something or is my confusion justified? 😂

  • @flippert0
    @flippert0 1 year ago +2

    Apart from String and &str, I have only needed &[u8; N] (or &[u8]) so far. I think 99% is done with String and &str, and the other types are almost never encountered in normal programs.

    • @98f5
      @98f5 4 months ago

      You think 99% of code only uses strings?

  • @intrexballistica
    @intrexballistica 2 months ago

    If all strings are encoded as UTF-8, does that mean Rust can't make an LSP client per the specification?

  • @MrDujmanu
    @MrDujmanu 1 year ago

    Come up with an idea and write a Rust book, you're exceptionally good at explaining.

  • @legittaco4440
    @legittaco4440 10 months ago

    4:30 Slight mistake: it should be strlen(my_string)+1, since strlen doesn't count the null character.
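
    The off-by-one between "length" and "bytes needed" can be seen from the Rust side with `std::ffi::CString`, which stores a C-style, NUL-terminated string. This is a small sketch; `c_lengths` is a made-up helper name, not something from the video.

```rust
use std::ffi::CString;

/// Returns (length without the terminator, length including the trailing NUL).
fn c_lengths(text: &str) -> (usize, usize) {
    // CString::new fails if `text` contains an interior NUL byte.
    let c = CString::new(text).expect("no interior NUL bytes");
    (c.as_bytes().len(), c.as_bytes_with_nul().len())
}

fn main() {
    let (len, size) = c_lengths("hello");
    // `len` is what strlen would report; `size` is what a copy buffer must hold.
    println!("strlen-like length: {}, buffer size needed: {}", len, size);
}
```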

  • @SyamaMishra
    @SyamaMishra 1 year ago

    This is the single best reference on strings I've seen. I'd love to know about things like OsStrExt and WTF-8 too.

  • @AsbestosSoup
    @AsbestosSoup 1 year ago

    What does it mean that we cannot use str because the size/length is not known at compile time? I see this explanation everywhere, but I seem to be missing some key detail to understand it correctly. If you store a string (not String) in the binary for static, read-only access, don't you have access to the slice's length, given that you need it to access the data?

    • @AsbestosSoup
      @AsbestosSoup 1 month ago

      OK, I found this video again and I can provide more details for anyone who needs them. `str` is not a data structure; it is JUST a sequence of UTF-8 bytes, most likely stored in the binary. Rust "forces" users to use the str type in its borrowed form, the string slice &str, because when the data is embedded in the binary, the practical way to access it is to grab its base address and put that on the stack (via a pointer, which has a statically known size). The length of the string is also needed, or else the compiler wouldn't know where the characters end, since Rust does not use null terminators (`\0`). Hence, when you declare `let my_str = "Hello, World!";` you are creating a string slice `&str` (a pointer plus a length) and giving the compiler the literal from which it calculates that length, all in a single step. It may feel a bit counterintuitive at first since it does all this at once and the details are hidden from the user (similar to how the type of a string literal is really &'static str, with the lifetime left implicit).
      TL;DR: `str` is a dynamically sized type (DST), not a data structure. The length is not stored with the string data itself; it travels in the &str reference.
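
      A minimal sketch of the "pointer plus length" idea, using only the standard library. The helper name `parts` is made up for this example, and the two-word size of `&str` is how current Rust lays it out in practice rather than something this thread establishes.

```rust
use std::mem::size_of;

/// Splits a string slice into the two things a `&str` carries: data pointer and byte length.
fn parts(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}

fn main() {
    let my_str: &'static str = "Hello, World!";
    let (ptr, len) = parts(my_str);
    println!("data at {:p}, {} bytes", ptr, len);
    // A reference to a sized type is one machine word; `&str` is two (pointer + length).
    println!("&u8: {} bytes, &str: {} bytes", size_of::<&u8>(), size_of::<&str>());
}
```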

  • @captainfordo1
    @captainfordo1 1 year ago +1

    This alone makes me never want to touch Rust again. Definitely sticking with C.

  • @bluebukkitdev8069
    @bluebukkitdev8069 11 months ago

    Liked for the Rc, that's good info.

  • @TobiasFrei
    @TobiasFrei 1 year ago

    Great idea to present them all in one place and even including Cow 🤓👍

  • @foraminifera7001
    @foraminifera7001 1 year ago +9

    I almost cried when I suddenly saw "Привіт світ!"🥺. Thank you❤

  • @meetarthur9427
    @meetarthur9427 1 year ago +1

    let some_large_text: &'static str = "is already enough: you can't mutate it, it resides in the binary, lives for the entire program, and is thread safe";

  • @kdcadet
    @kdcadet 1 year ago

    Very good level of technical detail!
    Thank you!

  • @izumiosana
    @izumiosana 2 months ago

    I learned a lot besides Rust. Thanks.

  • @heynicetomeetyou
    @heynicetomeetyou 1 year ago +1

    Loved it, informative and straight to the point

  • @chaicblack7415
    @chaicblack7415 11 months ago

    I find some C or C++ knowledge useful for understanding Rust.

  • @danser_theplayer01
    @danser_theplayer01 3 months ago

    A string is a binary buffer of numbers of specific byte sizes for different interpretation tables (ASCII is, I believe, half the size of Unicode8), which also has different endianness depending on the computer, so you supposedly should be able to have pointers to the string, pointers to an individual character in the string, pass a value to some function, or pass a pointer, etc. So no wonder a low-level memory-safe language decides not to make any assumptions.

    • @toby9999
      @toby9999 2 months ago

      Unicode8? You mean UTF-8 or u8? Neither ASCII nor UTF-8 has endianness. UTF-16 does. But UTF-8 suffers from the disadvantage of not using a consistent number of bytes per character (or symbol or code point). It gets messy.
      ASCII is not half of u8. The first 128 positions within the UTF-8 encoding range are ASCII code points i.e. 1 byte per character. Or perhaps you mean 128 is half 256?
      Not sure what you mean by "byte sizes"? Bytes are 8 bits by definition.
      Strings are a buffer of bytes. These bytes may be interpreted in a number of ways depending on the encoding.
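
      The variable width described above is easy to check in Rust. A small sketch; `utf8_widths` is a made-up helper name, and the sample characters are chosen only to cover the 1-, 2-, 3- and 4-byte cases.

```rust
/// Byte width of each character of `s` when encoded as UTF-8.
fn utf8_widths(s: &str) -> Vec<usize> {
    s.chars().map(|c| c.len_utf8()).collect()
}

fn main() {
    // ASCII letter, Cyrillic letter, euro sign, emoji: 1, 2, 3 and 4 bytes respectively.
    println!("{:?}", utf8_widths("aя€😀"));

    // The greeting from the video: far more bytes than characters.
    let greeting = "Привіт світ!";
    println!("{} chars, {} bytes", greeting.chars().count(), greeting.len());

    // For pure ASCII text the UTF-8 bytes are exactly the ASCII bytes.
    println!("{:?}", "Hi!".as_bytes());
}
```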

  • @thulist
    @thulist 8 months ago

    Great video, very easy to follow. Subbed

  • @MatveyTsivinyuk
    @MatveyTsivinyuk 1 year ago

    Glad to know about all the shtring types in Rust /s

  • @kischinhevsky
    @kischinhevsky 1 year ago +7

    The problem isn't that Rust has all of these different string types. It's that sometimes you can't easily parse them or assign them to other variables, and the format macro won't really work unless you do a lot of parsing gymnastics that make the code simply unreadable. These are things that should probably be done by the language. I enjoyed using Rust when I was not faced with these annoying issues, but in the end it was such a hassle that I gave up on it and rewrote my stuff in Go.

  • @charliesumorok6765
    @charliesumorok6765 11 months ago +1

    In C, string *literals* are arrays of characters.

  • @yapayzeka
    @yapayzeka 1 year ago

    The type shown at the bottom at 6:50 should be &String, as written. To be a string slice, it should be &my_string[..] or something similar.
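
    A short sketch of the distinction, with explicit type annotations so the compiler confirms each one. `first_word` is a made-up helper for this example and `my_string` simply mirrors the name used in the comment; neither is taken from the video's code.

```rust
/// Accepts any string slice; a `&String` argument is converted by deref coercion.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

fn main() {
    let my_string = String::from("hello world");

    let borrowed: &String = &my_string;         // reference to the owned String itself
    let slice: &str = &my_string[..];           // explicit slice of the whole string
    let also_slice: &str = my_string.as_str();  // same thing, spelled as a method
    let coerced: &str = &my_string;             // &String coerces to &str when &str is expected

    println!("{} | {} | {} | {}", borrowed, slice, also_slice, coerced);
    println!("{}", first_word(&my_string));     // coercion at the call site
}
```

    Without an annotation, `&my_string` has type &String; it only becomes &str when the surrounding code asks for one.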

  • @sleepybraincells
    @sleepybraincells 9 months ago

    Incredible video, jam-packed with info.

  • @racum
    @racum 1 year ago

    Thank you! ...this was very clarifying!!

  • @fcolecumberri
    @fcolecumberri 1 year ago

    I am not 100% sure about Rust (only like 99.99%), but if Rust uses short string optimization the same way C++ does, the image at 6:15 is technically wrong: "Hello!" would have been saved inside the pointer, avoiding the need for an extra memory allocation. For any string bigger than the size of the pointer, the image would be correct.
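
    To the best of my knowledge the standard library's String does not use short string optimization: it is a heap-backed byte buffer, so even "Hello!" gets its own allocation. One way to check this on a given toolchain is the sketch below; `stored_inline` is a made-up helper that tests whether the text bytes sit inside the `String` value itself, which is where SSO would put them.

```rust
use std::mem::size_of;

/// True if the string's bytes live inside the `String` value itself (what SSO would do).
fn stored_inline(s: &String) -> bool {
    let start = s as *const String as usize;
    let end = start + size_of::<String>();
    let data = s.as_ptr() as usize;
    data >= start && data < end
}

fn main() {
    let s = String::from("Hello!");
    println!("inline: {}, capacity: {}", stored_inline(&s), s.capacity());
    // An empty String has not allocated anything yet.
    println!("empty capacity: {}", String::new().capacity());
}
```

    Third-party crates that do store short strings inline exist, but that is a property of those crates, not of String.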

  • @TK-fo5xl
    @TK-fo5xl 10 months ago

    Would anyone explain why the code at 04:00 is dangerous?

    • @chri-k
      @chri-k 10 months ago

      A bad UTF-8 parser could do something wrong if given this input, but that has nothing to do with the string itself, or even with C for that matter.
      This is not a good example.

    • @TK-fo5xl
      @TK-fo5xl 9 months ago

      @@chri-k Thanks!!

  • @ikhlasulkamal5245
    @ikhlasulkamal5245 1 year ago

    Thanks for the video, it helps me a lot about strings xD

  • @moigncoin4870
    @moigncoin4870 8 months ago

    Gosh this is such a good video. Thank you

  • @Saturate0806
    @Saturate0806 1 year ago +1

    Excellent explanation, well done.

  • @RobertBarry-y5i
    @RobertBarry-y5i 4 months ago +1

    UTF-8, like two's complement, opcodes and file signatures, is one of those topics everyone should learn thoroughly for life. If you can translate the binary at 2:00 effortlessly into Unicode code points, then you are proficient. Being able to look through a book of ones and zeros and identify the meaning should be the end goal of an education in computing. I spent six weeks at a computer screen about ten years back, when eBPF first came out, looking at a couple of hundred pages of binary and working out what it did (and how to get a shell, of course). By the way, Rust is NOT memory safe. When the bug bounties reach 8 figures, I know a couple of guys who will be claiming. 🙂

  • @pramodjingade6581
    @pramodjingade6581 1 year ago

    Thank you for the detailed explanation!!

  • @theowillis6870
    @theowillis6870 10 months ago

    Doesn't C have
    uint8_t* and uint8_t[]?

  • @Amish_Avenger
    @Amish_Avenger 1 year ago +2

    So string is basically just std::string from C++?

  • @sachinmurali3524
    @sachinmurali3524 1 year ago

    I was really looking for this info❤❤

  • @keeroin
    @keeroin 1 year ago

    Great informative video! So useful!

  • @Nikage23
    @Nikage23 1 year ago +15

    I'm quite pleased to see "Hello, world" in Ukrainian. So hello from Ukraine!

  • @alexclazx
    @alexclazx 9 months ago

    Very helpful to my nap break😀

  • @sbx1720
    @sbx1720 1 year ago

    This one was really good. Thanks

  • @tianned
    @tianned 9 months ago

    22 minutes of efficiency safety and flexibility

  • @DomainObject
    @DomainObject 1 year ago

    Awesome video. Thank you!

  • @bionic_batman
    @bionic_batman 1 year ago +2

    Nice, didn't expect to see Hello World being written in Ukrainian

  • @mariluski23
    @mariluski23 3 months ago

    Tip for C: don't iterate through strings without stopping at the null character, or else you'll be printing the entire registry or more.

  • @sundae6610
    @sundae6610 8 months ago

    How many times have I rewatched this?

  • @fcolecumberri
    @fcolecumberri 1 year ago +2

    Do you think Rust approach to strings is complicated? try to manage a mix of English with Japanese with only char[]

    • @skejeton
      @skejeton 4 months ago

      I've done it before while writing a text editor. It's very easy; you just need a UTF-8 parsing function (which is trivial to implement yourself).
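
      For a sense of scale, here is a rough sketch of such a parsing function, written in Rust to match the video rather than in C. `decode_first` is a made-up name, and the sketch is deliberately simplified: it checks lead and continuation bytes but does not reject overlong encodings or surrogate code points, which a production decoder should do.

```rust
/// Decodes the first UTF-8 sequence in `bytes`.
/// Returns the code point and the number of bytes consumed, or None on malformed input.
fn decode_first(bytes: &[u8]) -> Option<(u32, usize)> {
    let b0 = *bytes.first()?;
    // The lead byte tells us the sequence length and holds the top payload bits.
    let (len, init) = match b0 {
        0x00..=0x7F => (1, b0 as u32),
        0xC0..=0xDF => (2, (b0 & 0x1F) as u32),
        0xE0..=0xEF => (3, (b0 & 0x0F) as u32),
        0xF0..=0xF7 => (4, (b0 & 0x07) as u32),
        _ => return None, // a continuation byte or invalid lead byte
    };
    if bytes.len() < len {
        return None; // truncated sequence
    }
    let mut code_point = init;
    for &b in &bytes[1..len] {
        if b & 0xC0 != 0x80 {
            return None; // every following byte must look like 10xxxxxx
        }
        code_point = (code_point << 6) | (b & 0x3F) as u32;
    }
    Some((code_point, len))
}

fn main() {
    // Mixed English and Japanese, as in the parent comment.
    let text = "a日本";
    let mut rest = text.as_bytes();
    while let Some((cp, used)) = decode_first(rest) {
        println!("U+{:04X} ({} bytes)", cp, used);
        rest = &rest[used..];
    }
}
```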

  • @nikkiho
    @nikkiho 1 year ago +1

    Thank you for the great information.

  • @mintx1720
    @mintx1720 1 year ago +1

    Now you just need to cover SmolStr, SmartString, SmallString, KString, EcoString...

  • @youarethecssformyhtml
    @youarethecssformyhtml 1 year ago +1

    Man, that's too much, and there are many string types you haven't mentioned. It's a really steep learning curve.

  • @victorkochkarev2576
    @victorkochkarev2576 1 year ago

    This is a great video. Thank you.

  • @ahmadrezadorkhah9574
    @ahmadrezadorkhah9574 1 year ago

    Thanks. I needed that

  • @ChasingShadowsz
    @ChasingShadowsz 1 year ago

    Great video, thank you! 🙏

  • @seasong7655
    @seasong7655 4 months ago +1

    All string types in Rust: 22min
    All string types in Python: 22sec

  • @ramsey2155
    @ramsey2155 1 month ago

    C also has many many many different types for strings