Which Python @dataclass is best? Feat. Pydantic, NamedTuple, attrs...

  • Published on 30 Jun 2024
  • Get rid of boilerplate in writing classes.
    Which dataclass alternative should you use though? In this video we test dataclasses, attrs, tuple, namedtuple, NamedTuple, dict, SimpleNamespace, and Pydantic BaseModel for speed, memory efficiency, and features.
    ― mCoding with James Murphy (mcoding.io)
    Source code: github.com/mCodingLLC/VideosS...
    Previous dataclasses video: • Python dataclasses wil...
    dataclasses: docs.python.org/3/library/dat...
    attrs: www.attrs.org/en/stable/examp...
    namedtuple: docs.python.org/3/library/col...
    NamedTuple: docs.python.org/3/library/typ...
    SimpleNamespace: docs.python.org/3/library/typ...
    Pydantic: pydantic-docs.helpmanual.io/u...
    SUPPORT ME ⭐
    ---------------------------------------------------
    Patreon: / mcoding
    Paypal: www.paypal.com/donate/?hosted...
    Other donations: mcoding.io/donate
    Top patrons and donors: Jameson, Laura M, Dragos C, Vahnekie, John Martin, Casey G, Pieter G, Krisztian M, Sigmanificient
    BE ACTIVE IN MY COMMUNITY 😄
    ---------------------------------------------------
    Discord: / discord
    Github: github.com/mCodingLLC/
    Reddit: / mcoding
    Facebook: / james.mcoding
    CHAPTERS
    ---------------------------------------------------
    0:00 Intro
    1:04 dataclass
    1:24 attrs
    2:13 tuple, namedtuple, NamedTuple
    4:05 dict
    4:39 SimpleNamespace
    4:58 Pydantic
    6:39 Speed comparison
    8:41 Memory comparison
    9:15 Feature matrix and winners
  • Science & Technology

Comments • 216

  • @LiamInviteMelonTeee · 2 years ago · +37

    I'm a simple engineering student and a modest python user but those dynamic histograms sent chills down my spine

    • @mCoding · 2 years ago · +15

      Check out plotly and the source code in the description!

  • @aarondewindt · 2 years ago · +126

    The built-in dataclass also has default_factory for defining mutable default values.

    • @mCoding · 2 years ago · +74

      😬 oops, thanks for pointing this out! I should have been more careful when I made the feature matrix.

    • @PeterZaitcev · 1 year ago · +1

      Furthermore, unlike slots support, this was there from the initial release.
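
A minimal sketch of the `default_factory` point above, using the standard-library `dataclasses.field`; the `Inventory` class is hypothetical. Each instance gets its own fresh list, while a bare mutable default such as `[]` is rejected with a `ValueError` when the class is defined:

```python
from dataclasses import dataclass, field

@dataclass
class Inventory:
    owner: str
    # default_factory calls list() anew for every instance,
    # so two instances never share one list object.
    items: list[str] = field(default_factory=list)

a = Inventory("alice")
b = Inventory("bob")
a.items.append("sword")
print(a.items, b.items)  # ['sword'] []

# A bare mutable default is refused when the class is created.
try:
    @dataclass
    class Broken:
        items: list = []
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```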

  • @mpilosov · 2 years ago · +42

    This is a great breakdown. I’ve had to explain this so many times to team members, now I’ll refer people to this video!

  • @franchiniitalo · 2 years ago · +21

    Hey James, I just wanted to sincerely congratulate you for both the quality content and humor in your videos, amazing work!!

    • @mCoding · 2 years ago · +7

      Thank you very much for your kind words and support!

  • @markasiala6355 · 2 years ago · +36

    I actually have a large ongoing project where I used namedtuples early on, with the type information stored in a second tuple, then refactored to NamedTuple using the built-in typing (which removed the need to store the types separately), and finally moved to dataclasses after seeing your video on them. They fit my application perfectly, as I needed the flexibility of being able to modify instances. If only I had known about dataclasses to start with. :) attrs classes also sound interesting for my needs; I need to check them out.

    • @subjekt5577 · 1 year ago · +1

      I wish he had covered classes extending namedtuple, one of my favorite pre-attrs methods...

  • @kosmonautofficial296 · 1 year ago · +2

    Great, thanks so much for this video! I am starting to study Pydantic and I hadn't been made aware of these differences. This is a huge help, and I wish more people would explain these important differences when telling others they should use this or that.

  • @jemand771 · 2 years ago · +9

    I really enjoyed the text comment/annotation overlays in this video. they both add useful background info and give the video a more relaxed vibe without distracting from the main points! :D

  • @Jakub1989YTb · 2 years ago · +12

    2:06 got me .. "real life". Those air quotes are heavy.

  • @tamles937 · 2 years ago

    Great video! As always, the topic is well explained and I learnt something new
    The on-screen comments are really fun, I hope you'll put more of this in the future videos!

  • @Scranny · 2 years ago

    I have used almost all of these, so I can say this is a fantastic summary of the various options.

  • @sphereron · 1 year ago · +1

    I've often struggled with ways people define hyperparameters and inputs to neural networks in open source code. This video definitely helped me in my choice going forward.

  • @hackergr325 · 2 years ago · +3

    At first you got my interest, after the "Presenting with meaningless example" you got my attention. Awesome video once again!

  • @laurinneff4304 · 2 years ago · +98

    So when _are_ you going to explain slots? I have no idea what those are

    • @mCoding · 2 years ago · +61

      Gulp, I feel the pressure.

    • @AzureCz · 2 years ago · +8

      @mCoding yeah, I don't know what you're talking about either D:

    • @cameronball3998 · 2 years ago

      That was the Google search I made right after this video 😂 I am intrigued

    • @Elijah_Lopez · 2 years ago · +7

      Classes normally store instance attributes in a per-instance dictionary (__dict__). If you define __slots__ = 'var1', 'var2', instances of your class get no __dict__ and can only have the attributes named in __slots__.

    • @AzureCz · 2 years ago · +1

      @Elijah_Lopez sir you're a legend
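
A small illustration of the `__slots__` behaviour described in this thread, with a hypothetical `Point` class. Assigning an undeclared attribute raises `AttributeError`, and instances carry no `__dict__`, which is where the memory savings come from:

```python
class Point:
    __slots__ = ("x", "y")  # the only attribute names instances may have

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
p.x = 10  # fine: "x" is a declared slot

try:
    p.z = 3  # "z" is not in __slots__
    blocked = False
except AttributeError:
    blocked = True

print(blocked)                  # True
print(hasattr(p, "__dict__"))   # False: no per-instance dict is allocated
```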

  • @fartzy · 2 years ago · +1

    Wow this is amazing man thanks for putting this together

  • @r2_rho · 2 years ago · +1

    this is really the best Python channel on YouTube. I've learned more on this channel than all others combined

    • @mCoding · 2 years ago

      Wow thank you!

  • @mystisification · 2 years ago

    Very informative video, thanks James!

  • @jochengietzen · 2 years ago · +13

    Please keep the onscreen comments coming! Adds the perfect amount of fun to an informative topic "cries in mypy" 😁
    Great video, thanks 😊

  • @Aang139 · 1 year ago · +10

    Also would have loved thoughts on TypedDict which mirrors NamedTuple for dictionaries, giving type hinting and string key checking

  • @MrBoubource · 2 years ago

    Wonderful last seconds, but wonderful video too!

  • @Azzonith · 2 years ago · +2

    Great stuff!
    It would be even better if you made a follow-up video about serializing those objects and the libs that can help.
    Often it's required to send tuple/dataclass/etc. data over Kafka, to a DB, or save it as JSON, etc.
    Include the 'marshmallow' lib in the vid as well!
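
One standard-library route for the JSON case, offered as a sketch rather than a full serialization solution: `dataclasses.asdict` turns an instance into a plain dict that `json.dumps` accepts. The `Reading` class is hypothetical, and loading back with `Reading(**...)` only works for flat classes with JSON-native field types; nested dataclasses, datetimes and the like are where libraries such as marshmallow or Pydantic come in:

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class Reading:
    sensor: str
    value: float

r = Reading("temp", 21.5)

# Instance -> dict -> JSON text.
payload = json.dumps(asdict(r))
print(payload)  # {"sensor": "temp", "value": 21.5}

# JSON text -> dict -> instance (flat classes only).
restored = Reading(**json.loads(payload))
print(restored == r)  # True
```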

  • @aaronm6675 · 2 years ago

    Already know this is gonna be helpful and instructive!

  • @mikegazes · 1 year ago

    Thanks! This is exactly what I needed.

  • @doc0core · 2 years ago · +2

    This is serious pro stuff. I started using dataclass thanks to your vid, then YT pushed another vid for Pydantic and I was like bleh. Luckily this vid set me straight. Now I understand each one's use case. THANKS

    • @mCoding · 2 years ago

      Glad it helped!

  • @PythonisLove · 2 years ago · +1

    your videos are always the best

  • @hansdietrich1496 · 1 year ago

    Good comparison, thanks!

  • @tamerelsayed6368 · 2 years ago

    thank you for the thorough explanation

  • @user-kc6wz7xr8e · 2 years ago

    thank you man! You helped me a lot!

  • @nrbnlullu9327 · 2 years ago

    Great video, Thanks a lot!

  • @zacky7862 · 2 years ago · +6

    Yeah, Pydantic is so great for parsing/serializing JSON data.
    I've been using it. But for simple data, I use the built-in dataclass.

  • @behnamsalehi9765 · 2 years ago

    Thank you. This information is really useful

  • @eniocc · 2 years ago

    Perfect video. Congrats

  • @Yotanido · 2 years ago · +5

    I've used most of these, it turns out.
    Started with the class that repeats everything. I then used the dict to try and make things slightly more convenient, but that was only feasible in very limited circumstances.
    Then I discovered named tuples, but... they are tuples. Wasn't a huge fan.
    Then, finally, I came across attr. That was a huge revelation and I absolutely loved it. Finally something decent.
    And then dataclasses were introduced to the standard library and I basically switched to using those. attr can do more, sure - but the dataclasses are easier to use and don't need the dependency. Unless I actually need the power of attr, I'll just use these.

    • @PanduPoluan · 1 year ago · +1

      Depends on the data, tuples can be very suitable.
      For instance, I have to consume a YAML file containing a HUGE sequence of geo-coordinates (lat/long). For this kind of data, the kind that you read, keep in memory, and must not change, tuples are perfectly suitable: they use less memory and are very fast.
      And NamedTuple, just like other classes, can have methods defined within. So, for instance, I can write a distance_to() method that calculates the great-circle distance between one geo-coordinate and another.
      If you need mutability, though, of course tuple just won't cut it.
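
A sketch of the NamedTuple-with-methods idea, assuming the haversine formula for the great-circle distance and a mean Earth radius of 6371 km (both are assumptions for illustration, not taken from the comment). `GeoCoord` is a hypothetical name; instances remain ordinary immutable tuples:

```python
import math
from typing import NamedTuple

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

class GeoCoord(NamedTuple):
    lat: float  # degrees
    lon: float  # degrees

    def distance_to(self, other: "GeoCoord") -> float:
        """Great-circle distance in km, computed with the haversine formula."""
        phi1, phi2 = math.radians(self.lat), math.radians(other.lat)
        dphi = phi2 - phi1
        dlam = math.radians(other.lon - self.lon)
        a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
        return 2 * EARTH_RADIUS_KM * math.atan2(math.sqrt(a), math.sqrt(1 - a))

origin = GeoCoord(0.0, 0.0)
quarter = origin.distance_to(GeoCoord(0.0, 90.0))
print(quarter)  # a quarter of the equator, about 10007.5 km
```

Because `GeoCoord` is still a tuple, it unpacks (`lat, lon = origin`) and hashes like one, which suits read-only bulk data such as the YAML coordinates described above.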

  • @FranciscoCorreaDias · 2 years ago · +1

    1:24 "Will I ever explain slots?" One week later...
    Thank you so much for your explanations, James!

  • @oxey_ · 2 years ago

    I feel like I should go to casinos more often because I have no idea what slots are :)
    Great video! Typehint gang

  • @PanduPoluan · 1 year ago · +5

    Basically, one very strong rule of thumb is: If you need immutability and you can validate the data on your own, NamedTuple will _always_ be the best, hands down.

  • @falxie_ · 2 years ago · +1

    Really glad to see slots supported in dataclasses now. When you have a lot of instances of one class slots can save a ton of memory

    • @mishikookropiridze · 2 years ago

      This was added in 3.10?

    • @falxie_ · 2 years ago

      @mishikookropiridze That's more of a statement than a question, isn't it?

    • @mishikookropiridze · 2 years ago

      @falxie_ It is a statement, and hence you can assign it a boolean value.

  • @danielrhouck · 2 years ago · +2

    Iʼm starting a new Python project and Iʼm using `attrs` because of this video. Otherwise I would have used `namedtuple`, because I think without your videos I somehow would have missed even `dataclass`.

  • @maimee1 · 1 year ago · +5

    4:36 There's TypedDict to consider too tho. (As in the type safety thing. You could type both dict and tuple and use a static type checker. If you use PyCharm and single quotes, accessing data by key is also not typo prone.)

    • @PanduPoluan · 1 year ago

      Why single quotes? There's no difference between single quotes and double quotes.

    • @maimee1 · 1 year ago · +1

      @PanduPoluan Idk, ask PyCharm (and also VS Code, I just found out).
      To clarify: not typo-prone => there's IntelliSense / autocompletion.

    • @PanduPoluan · 1 year ago

      @maimee1 Well, personally I don't find any difference between using single quotes and double quotes. But then again, I always use double quotes because Black enforces that.

  • @red13emerald · 2 years ago

    Awesome comparison! What did you create the interactive graph at the end with? Looks like a nicer version of matplotlib.

    • @aflous · 1 year ago

      Plotly

  • @SvetlinTotev · 2 years ago · +16

    A few arguments for dict gang:
    Everybody knows how it works and what the syntax is. Many libraries use it as inputs or outputs. If used as an interface it is easy to change it without breaking things. It is trivial to load and store them in json or send them over the network. I generally don't like speed comparisons of python code since that should never be the bottleneck of your program (you are using the wrong language if it is) but it is nice to know that dicts are fast af. I also haven't had any problems with reliability. But I guess that's partially due to my vscode extensions checking what I'm typing and giving suggestions.
    But I have to agree with you that the syntax is quite ugly compared to accessing elements with a dot. Though I don't think it would be too bad if the language added similar syntax for that (basically any kind of shorthand for ["string"]; but I guess . would be ambiguous, and most other characters already have a meaning, so maybe a double dot? some_dict..some_element)

    • @Alex-uh6qh · 2 years ago · +5

      The problem with dicts is that you cannot check types at compile time. You can store different data types in the same field at runtime. Also, static analyzers cannot predict the type of a dict's values, so IDEs (like PyCharm) cannot help you with suggestions, especially the available methods for each field. In your IDE you use an AI-based extension that predicts data types, but it is not a static analyzer.

    • @masheroz · 2 years ago · +1

      This is timely. I'm writing a program right now and am using dictionaries. I still think that formatting my data as nested dictionaries is the best representation of that data. Also, the original data format is actually defined as a dictionary.

    • @SvetlinTotev · 2 years ago · +2

      @Alex-uh6qh This is true, but I think by picking Python as your programming language you've already given up on being able to easily track the types of objects. Even with all the type hinting and other type-related tooling, you are still quite far from the type information you have in languages like C++.

  • @Mutual_Information · 2 years ago

    I do not use data classes nearly enough. This is good motivation to change that.

  • @TechSY730 · 1 year ago

    In current versions of attrs, you only need to assign a field to `attr.ib()` if you need some per-field option beyond a default.
    Otherwise you can use regular annotated variable declarations like dataclasses do.
    (You might need to use the "next-gen" API; I can't remember at the moment.)

  • @GRAYgauss · 2 years ago · +17

    Type hint gang. I came from a C background, so duck typing felt like a godsend. Then I got into Rust and realized how much time I was spending debugging code because it was duck-typable. (Let's not forget Rust's awesome toolchain compared to Python's... well, yeah.) Hell, I was using i_var, etc. just because it made the code easier to reason about without having to backtrack, which is when I first started wondering about it... It didn't fully click until I made the switch, though.

  • @elnico5623 · 1 year ago

    I wish there was a channel like this for lua

  • @pedrokalil4410 · 11 months ago

    I am the owner of a backend project at my company and I use only Pydantic. Since we perform multiple API calls, the validations are essential, and it integrates really well with FastAPI.

  • @relsunkaev · 2 years ago · +1

    The apischema package is a good middle ground between Pydantic and dataclasses. It allows you to do the same runtime validation on dataclasses if you need to and has the same features as well as a GraphQL schema generator. It also performs validation faster than Pydantic.

    • @mCoding · 2 years ago

      Never used that one, thanks for sharing!

  • @luisraguzzoni5409 · 2 years ago

    Your videos are so good that I believe you could create a good intermediate-advanced python course. Just saying

  • @etienneboutet7193 · 2 years ago

    Great video! But I feel like the onscreen comments were a bit distracting

  • @MrLiuHai · 2 years ago · +2

    Thx for the explanation! But it seems at this point Python is contradicting its own Zen: "There should be one-- and preferably only one --obvious way to do it."
    IMHO one should always prefer immutability. The difference between creating a new instance and calling a setter is negligible. If performance is that critical, maybe one shouldn't choose Python in the first place.

  • @adirmazhir9159 · 2 years ago · +11

    It's also possible to use namedtuple like this:
    T = namedtuple('T', 'n f s')
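
A quick comparison using only the standard library: `namedtuple` splits a field-name string on whitespace or commas, so it is equivalent to an explicit list, and `typing.NamedTuple` yields the same fields with type annotations. The names `T1`, `T2` and `T3` are for illustration only:

```python
from collections import namedtuple
from typing import NamedTuple

T1 = namedtuple("T", "n f s")            # space-separated string
T2 = namedtuple("T", ["n", "f", "s"])    # explicit list, same result

class T3(NamedTuple):                    # typed class syntax
    n: int
    f: float
    s: str

print(T1._fields == T2._fields == T3._fields)  # True
print(T1(1, 2.0, "x") == (1, 2.0, "x"))        # True: still a plain tuple
```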

  • @jlp2011 · 11 months ago · +2

    Pydantic 2.0’s just out, built around a Rust core. They claim up to a 50x performance improvement, so some of this may have changed. Still, kudos for covering v1’s overhead.

    • @mCoding · 11 months ago · +1

      Great point! Maybe I'll have to do an update video!

    • @arkadiuszszydeko7264 · 11 months ago · +1

      @mCoding Looking forward to seeing how it compares to what you presented here :)

  • @yky49 · 2 years ago

    It is possible to use @dataclass(init=False) with a custom __init__() for parsing purposes. With slots, of course ;)

  • @soberhippie · 2 years ago · +5

    Creating a new tuple still looks just as fast as modifying a value in a dict, interesting

    • @mCoding · 2 years ago · +6

      Yeah that was the biggest surprise for me, but I guess it kinda makes sense since a tuple can be implemented as a thin wrapper around raw memory, but a dict has to do hashing and such.

    • @NateROCKS112 · 2 years ago

      However, you'll likely end up needing to read the tuple's existing values in order to instantiate a new one. So an operation equivalent to setting a dict item would come at a significant cost.
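
To make the trade-off concrete, a hedged sketch with a hypothetical `Config` NamedTuple: `_replace` builds a brand-new tuple, copying every field that is not overridden, whereas a dict is updated in place:

```python
from typing import NamedTuple

class Config(NamedTuple):
    lr: float
    epochs: int

cfg = Config(lr=0.1, epochs=10)
# "Setting" a field on a tuple means building a new tuple;
# _replace copies all the other fields into the new instance.
cfg2 = cfg._replace(lr=0.01)
print(cfg, cfg2)  # Config(lr=0.1, epochs=10) Config(lr=0.01, epochs=10)

# A dict is simply mutated in place.
d = {"lr": 0.1, "epochs": 10}
d["lr"] = 0.01
```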

  • @ripp_ · 1 year ago

    I think in the past, because I've been lazy, I've used tuples but, because I don't hate myself, I had constants for which index was which. I don't recommend this but that would give you the speed power of tuples with some of the naming power of namedtuple
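
A minimal sketch of that index-constant pattern; the `(name, age, email)` layout is hypothetical:

```python
# Hypothetical record layout: (name, age, email).
NAME, AGE, EMAIL = range(3)

person = ("Ada", 36, "ada@example.com")
print(person[NAME], person[AGE])  # Ada 36
```

The constants document intent, but nothing ties them to a particular tuple, so a reordered layout fails silently; that is the safety `namedtuple` adds on top.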

  • @LerikPav · 2 years ago · +1

    There's also TypedDict (since 3.8) with type safety

    • @mCoding · 2 years ago · +6

      TypedDict is actually just a dict at runtime; its value is only for static typing.
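
A short sketch of that point with a hypothetical `Movie` type: the annotated value is a plain `dict` at runtime, and a wrongly typed value is not rejected when the code runs; only a static checker such as mypy would report it:

```python
from typing import TypedDict

class Movie(TypedDict):
    title: str
    year: int

m: Movie = {"title": "Blade Runner", "year": 1982}
print(type(m) is dict)  # True: no special class exists at runtime
print(m["title"])       # Blade Runner

# Runs without error; only a static type checker would flag the str year.
bad: Movie = {"title": "Alien", "year": "1979"}
```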

  • @florianfuchs325 · 2 years ago

    Hi
    Excellent video! I was wondering what the right choice would be if I wanted to use the class in a JIT-compiled Numba function. As far as I have seen, namedtuples seem to be the most suitable?

    • @PanduPoluan · 1 year ago · +1

      I think you need a class that is serializable. namedtuple and NamedTuple are serializable by default.

  • @jmcantrell · 2 years ago

    What are you using for the visualizations at the end?

  • @rdean150 · 2 years ago · +1

    I started using pydantic bc it allows specifying a conversion function to try to cast input values to the desired type. I didn't realize how much of a performance hit that library incurs, or that attrs can do this also but much more cheaply. I guess I should switch to attrs.

    • @mCoding · 2 years ago · +1

      I didn't specifically compare times for when you are doing conversions. Make sure to time your use case yourself since Pydantic may still be faster if you are doing conversions.

    • @rdean150 · 2 years ago

      @mCoding Ah, thanks for the heads up. That probably accounts for a decent chunk of the time difference, as I think Pydantic always tries to do basic type casting on all values when instantiating new instances, which surely comes with some overhead, particularly when you supply a custom function for it.

  • @Timmie_Tudor · 1 year ago

    Hello, if you didn't know, I decided to use the dataclass Python decorator as my handle

    • @mCoding · 1 year ago

      Haha you are gonna get a lot of accidental mentions with a handle like that!

  • @khoda81 · 2 years ago · +2

    How did u measure memory usage?

  • @VegetableJuiceFTW · 2 years ago

    would have been cool to compare pydantic with the validation turned off for fairness sake :D

  • @ananzero8751 · 2 years ago · +6

    What library was used to generate the graph? It looks nice.

    • @the_crypter · 2 years ago · +7

      Plotly. It's easily the most interactive visualization library, and it's as simple as matplotlib.

    • @mCoding · 2 years ago · +4

      Yep, plotly express specifically. Check out the code on github! Link in desc.

    • @saadisave · 2 years ago · +1

      @the_crypter That's a bad measure of simplicity

  • @mrtnsnp · 2 years ago · +2

    OK, on type hints. I probably need a kick in the you-know-where, but I can't get them to play nicely with a few packages and features I need. I frequently use numpy, and a lot of functions don't really care whether they receive a single number or a full array of numbers. They may not even care if the number is a float or an int, but let's focus on floats here. The return value typically has the same shape as the main input, but may be a single number.
    How do I set up type hinting for numpy arrays? How do I set type hinting up for polymorphism?

    • @wsrgs4 · 2 years ago

      I haven't looked into it extensively, but I'm aware there is a numpy.typing module which includes an ArrayLike type for anything that can be converted into an array, including scalars. you might want to look into the module documentation.
      specifying the dimensions of an array in python's type hinting system is generally difficult however, so I'm not sure there's a way to incorporate that information in your annotations.

    • @PanduPoluan · 1 year ago

      Use TypeVar.
      For instance, here's a made-up function:
          from typing import TypeVar

          T = TypeVar("T")

          def makelist(n: int, item: T) -> list[T]:
              # The return type follows whatever type `item` has.
              return [item for _ in range(n)]

  • @t2udu · 2 years ago

    Really liked the visualization. Is that plotly?

    • @mCoding · 2 years ago · +1

      Yep! See the code to produce it on GitHub!

    • @t2udu · 2 years ago

      @mCoding Will check that out

  • @trag1czny · 2 years ago · +2

    discord gang 🤙

  • @ilyam.1872 · 2 years ago · +2

    Yeah that's cool and whatnot, but have you ever tried this? class D(dict): __getattr__=dict.__getitem__; __setattr__=dict.__setitem__; __delattr__=dict.__delitem__

    • @mCoding · 2 years ago · +2

      Lol no i never considered that :)

    • @ilyam.1872 · 2 years ago · +1

      @mCoding Absolutely should, it's so easy and error-prone, practically a cheeseburger of Python.
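
For the curious, a sketch of what that one-liner does, plus two pitfalls that make it error-prone. A missing attribute surfaces as `KeyError` rather than `AttributeError`, so `getattr(obj, name, default)` and `hasattr` stop working as expected; and keys that share a name with dict methods (for example `items`) are shadowed by those methods:

```python
class D(dict):
    # __getattr__ only runs after normal attribute lookup fails.
    __getattr__ = dict.__getitem__
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__

d = D(a=1)
d.b = 2        # stored as a dict key, not an instance attribute
print(d, d.a)  # {'a': 1, 'b': 2} 1

# Pitfall 1: getattr's default only catches AttributeError, so KeyError leaks.
try:
    getattr(d, "missing", None)
    leaked = False
except KeyError:
    leaked = True
print(leaked)  # True

# Pitfall 2: d2.items finds the dict method before __getattr__ is consulted.
d2 = D(items=5)
print(callable(d2.items))  # True
```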

  • @MrShoorf · 2 years ago · +1

    "As well as BaseModel, pydantic provides a dataclass decorator which creates (almost) vanilla python dataclasses with input data parsing and validation."

  • @12nites · 2 years ago

    man, you really hammered down on this issue. No need to watch anything else.

  • @vxsery · 2 years ago

    🎉🎉🎉🎉

  • @ManuelBTC21 · 2 years ago · +1

    If you care about correctness, I would argue for NamedTuple. The fact that it's immutable is a feature, not a bug.

    • @mCoding · 2 years ago · +5

      Immutability is definitely a feature, but mutability is also a feature. As always, you should choose based on what is most appropriate for your problem.
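
A sketch of both sides using only `dataclasses`, with hypothetical classes: `frozen=True` makes assignment raise `dataclasses.FrozenInstanceError` and `dataclasses.replace` builds a modified copy, while a default (mutable) dataclass allows in-place updates:

```python
from dataclasses import FrozenInstanceError, dataclass, replace

@dataclass(frozen=True)
class Settings:
    host: str
    port: int

s = Settings("localhost", 8080)
try:
    s.port = 9090           # frozen: assignment raises
    blocked = False
except FrozenInstanceError:
    blocked = True
s2 = replace(s, port=9090)  # a modified copy; s itself is unchanged

@dataclass
class Counter:              # mutable, the default
    count: int = 0

c = Counter()
c.count += 1                # in-place update is fine
print(blocked, s, s2, c)
```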

  • @deekshantwadhwa · 2 years ago

    Which software/package/language are you using for the graphs UI in the end?

    • @mCoding · 2 years ago

      Plotly! See the source code in the description if you would like to see the exact code I use to generate the plots.

  • @Moody0101 · 2 years ago

    Well, firstly, your hair was so great tho

  • @joshbennett5908 · 2 years ago · +1

    What tool are you using for your bar chart?

    • @mCoding · 2 years ago · +1

      Plotly express! It can export to HTML that you can share and view in a browser without Python even installed.

  • @jakubjakubec9693 · 2 years ago

    I have my own class decorator that returns dataclass(cls), but I get no type hints this way. Is there a way to fix it?

  • @vekyll · 2 years ago

    I'm a bit confused... do you have any idea why SimpleNamespace's get is so horribly slow? I mean, it's a hash lookup anyway.

  • @scottbrewer474 · 2 years ago · +2

    And here I was thinking I was fancy by bundling data into a dictionary vs lots of variables! (Stupid Dunning-Kruger effect)

    • @zachwhite2716 · 4 months ago

      Give yourself enough time and you’ll come back to the wisdom of simply using dictionaries instead of complex nested objects.

  • @korbiniankoch · 2 years ago

    Which tool are you using to create the interactive bar charts?

    • @mCoding · 2 years ago · +1

      Plotly express

  • @viktornerlander1409 · 2 years ago

    If I have a very large set of data, with different types of data like multiple time series, single character/digit variables, etc., should I use dataclasses to store them? And if so, how? Do I pickle the classes? Right now I'm using pandas for everything. Thanks for the video

    • @zachwhite2716 · 4 months ago

      I may be in the extreme minority here, but IMO dataclasses are not a good fit in most situations, but particularly here where you have large sets of nested data. Just stick with dict or pandas.

  • @Rebeljah · 2 years ago · +1

    Yeehaw baby type barren code ftw!!1

  • @grzegorzryznar5101 · 1 year ago

    @mCoding How do you measure execution speed in a repeatable way? I was trying to measure performance, but for the same setup I got scores differing a lot (more than a few percent). The code was pure Python, no external sources, no I/O, but the differences were still very noticeable.

    • @mCoding · 1 year ago

      For this video I believe I used timeit since they are tiny snippets, and the timing code is available in the GitHub repository in the description. Timing measurements may vary drastically depending on things such as your CPU and version of Python, which is why it is always best to verify the timings on your own setup!

    • @PanduPoluan · 1 year ago

      @mCoding Also, with Intel's franken-CPUs having "P" cores and "E" cores, it will be a gamble.
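
A minimal sketch of repeatable micro-timing with the standard-library `timeit`, not necessarily the harness in the video's repository. `timeit.repeat` runs the whole measurement several times; taking the minimum, as the `timeit` documentation suggests, filters out runs slowed down by other processes. The class `P` and the loop count are arbitrary choices:

```python
import timeit
from dataclasses import dataclass

@dataclass
class P:
    x: int
    y: int

p = P(1, 2)
N = 200_000  # statement executions per timing run

# Five independent runs; each entry is the total seconds for N executions.
times = timeit.repeat("p.x", globals={"p": p}, number=N, repeat=5)

# The fastest run is the least disturbed one, so it is the most repeatable figure.
print(f"best of 5: {min(times) / N * 1e9:.1f} ns per attribute read")
```

Pinning the process to one core and closing other programs also helps on hybrid P/E-core CPUs, though the exact numbers will still differ between machines.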

  • @Destrolll · 2 years ago

    Could you explain why I shouldn't assign attributes to an instance of an empty class? 4:50

  • @sevdalink6676 · 1 year ago

    For me, Pydantic is great for prototyping, and the losses are acceptable for the sake of always being informed in detail about data errors. Because of that, it even lets you skip writing early tests.
    Still, the charts are extremely useful for showing that Pydantic can be an important optimization target.

    • @mCoding · 1 year ago · +1

      An excellent point. This is Python after all, raw speed is not usually what we optimize for and paying some extra runtime cost for data validation when it "shouldn't" be needed may be worth it depending on the situation.

    • @heroe1486 · 10 months ago

      Is it, though? 5 microseconds for creation, 9 and 400 ns for getting and setting, and that was before Pydantic v2 was enhanced with Rust.
      Unless we're doing several thousand of those, are we really concerned about those numbers in Python? Especially when writing an API, where network latency and DB queries could easily reach the 100 ms mark in good conditions.

    • @sevdalink6676 · 10 months ago

      @heroe1486 I agree that it would be great to see this video with Pydantic V2 performance included. They made amazing progress.
      I agree with the rest of what you said as well. You asked and answered your own question. Like I said, it can be an important part, not everywhere, but it's good to have it on your checklist.

  • @chriskeo392 · 2 years ago

    What is the use case for slots?

  • @liesdamnlies3372 · 2 years ago

    ALL the dataclasses

    • @mCoding · 2 years ago · +1

      I'm sure to get comments about others I forgot :)

  • @lex_darlog_fun · 2 years ago · +1

    @mCoding are you REALLY sure you've measured the memory footprint correctly? What was your test methodology? The differences between NamedTuple/dataclass/class are supposed to look quite different from what you've shown (they do differ, but not THAT much).
    According to this video (it's in Russian, but the code is clearly visible): youtube /tsEG0WM3m_M?t=60 :
    1. The author uses the pympler.asizeof() function instead of the built-in ones, since it's the only right way to measure the *FULL* memory consumption of a given object. I personally re-tested it (generated HUGE collections, taking literally gigabytes of RAM), and yes, the built-in ones were returning ridiculous results, not even close to the actual RAM taken by the Python interpreter.
    2. According to his tests, the difference is actually like this (on 1k instances):
    2:05 - dict = ~1.2 MB
    3:44 - dataclass = ~1 MB
    5:04 - namedtuple = ~720 KB
    5:54 - typed NamedTuple = also ~720 KB

    • @mCoding · 2 years ago

      It's hard to say whether the way I counted things is the "correct" way, because it depends on what you want to count, but the numbers are approximately the same with pympler vs. the getsize method I used. The order of which classes use the most memory is exactly the same with either method. The main thing pympler does that I did not do is account for object alignment. pympler assumes that all Python objects are 8-byte aligned and that no packing is done (hence why the pympler answers are all multiples of 8), counting padding bytes in the total size. At the opposite end, my getsize assumes all objects are optimally packed together, not including padding bytes in the total size. The truth is probably somewhere in the middle and is also an implementation detail that could change at any moment. But in any case, I wouldn't call either method the "correct" one; they are both good estimates and their difference is pretty small.
      Also note that, depending on how you do your tests, the data can make a big difference in how much space is actually used. For example, (1,1) uses less memory than (1,2) because the two 1s in the first tuple are the same object.
      pympler:
      0: dataclass (slots) - 168 bytes
      1: plain class (slots) - 168 bytes
      2: tuple - 176 bytes
      3: NamedTuple - 176 bytes
      4: namedtuple - 176 bytes
      5: attr class (slots) - 176 bytes
      6: dataclass - 432 bytes
      7: plain class - 432 bytes
      8: attr class - 432 bytes
      9: dict - 512 bytes
      10: SimpleNamespace - 552 bytes
      11: pydantic - 560 bytes
      Method I used in the video:
      0: dataclass (slots) - 162 bytes
      1: plain class (slots) - 162 bytes
      2: tuple - 170 bytes
      3: NamedTuple - 170 bytes
      4: namedtuple - 170 bytes
      5: attr class (slots) - 186 bytes
      6: dataclass - 408 bytes
      7: plain class - 408 bytes
      8: attr class - 408 bytes
      9: dict - 488 bytes
      10: SimpleNamespace - 528 bytes
      11: pydantic - 536 bytes
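The two counting conventions described above can be illustrated with a small recursive helper built on `sys.getsizeof`. This is a hypothetical sketch of the idea, not the `getsize` function from the video's source code: `deep_getsize` and its `align` parameter are names introduced here. With `align=1` it assumes perfect packing; with `align=8` it rounds every object up to a multiple of 8 bytes, which is the alignment assumption attributed to pympler above. Because each distinct object is counted only once, it also shows why `(1, 1)` comes out smaller than `(1, 2)`.

```python
import sys


def deep_getsize(obj, align=1, _seen=None):
    """Estimate the memory of obj plus everything it references.

    Each distinct object is counted once, so shared objects are not double-counted.
    align=1 assumes perfect packing; align=8 rounds every object's size up to a
    multiple of 8 bytes, i.e. it counts padding under an 8-byte alignment assumption.
    """
    if _seen is None:
        _seen = set()
    if id(obj) in _seen:
        return 0
    _seen.add(id(obj))
    size = -(-sys.getsizeof(obj) // align) * align  # ceil to a multiple of align

    children = []
    if isinstance(obj, dict):
        children.extend(obj.keys())
        children.extend(obj.values())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        children.extend(obj)  # also covers namedtuple / NamedTuple instances
    else:
        if hasattr(obj, "__dict__"):
            children.append(obj.__dict__)
        # Follow __slots__ declared anywhere in the class hierarchy.
        for klass in type(obj).__mro__:
            slots = klass.__dict__.get("__slots__", ())
            if isinstance(slots, str):
                slots = (slots,)
            for name in slots:
                if name not in ("__dict__", "__weakref__") and hasattr(obj, name):
                    children.append(getattr(obj, name))
    return size + sum(deep_getsize(child, align, _seen) for child in children)


if __name__ == "__main__":
    for value in [(1, 1), (1, 2)]:
        print(value, deep_getsize(value), deep_getsize(value, align=8))
```

Exact byte counts depend on the Python version and platform, so no expected numbers are given here; only the relative ordering is the point.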

    • @lex_darlog_fun
      @lex_darlog_fun  2 years ago

      @mCoding Thanks for such a detailed response.
      > For example (1,1) uses less memory than (1,2)
      Obviously, when you do performance tests, you need to intentionally break those under-the-hood optimisations. Back when I was checking the examples from the aforementioned video myself, I used the simplest item values I could think of. IIRC, each class (simple class, dataclass, dict, set, list, tuple and various types of named tuples) had just 3 values:
      1. an int, unique for each item (I know CPython caches small ints, up to 256 or so, but that's negligible relative to the total number of items in my test; IIRC, it was millions to tens of millions).
      2. the same int, converted to a string and padded with random ASCII characters so that all the strings had equal length (I used random characters instead of zeros, just to be sure).
      3. a float in the [0, 100.0] range, also unique for each item.
      And to be as precise as possible, as I said, I kept increasing the number of items until the total collection size exceeded 1 GB. Each measurement attempt was done in a separate Python session. That's what I was most interested in when I asked about your methodology: with your method, did you create a single instance and measure it, or did you generate a large number of them, measure the total consumption and divide it by the number of items? A single-item difference might be 168 bytes vs 162. But when I compared a tuple holding a million dataclass instances against the same tuple holding the same million items with the same underlying data, except that the items were NamedTuples, my results were very different from what you've shown. At the end of the day, it doesn't matter that each individual instance is reported as about the same size. What matters is that when you have a ton of them and the only varying factor is the type of the item, the total difference should be counted as overhead. You won't use just a single instance of that dataclass/namedtuple in your program. So I don't know the theory behind it, but in practice my own tests gave the same results the Russian guy describes in the video, and dataclass vs NamedTuple was nowhere near the 162 vs 170 numbers you provide.
      Speaking of which, I have no idea how it's even possible for a dataclass to take less memory than a named tuple, or even a plain tuple.
      So, could you disclose your methodology?
      To be clear: I'm not attacking you; I really want to know the actual differences between the various types of data containers. I'm just concerned that the numbers you provide conflict with basically everything I've ever heard on the subject, and with my own synthetic tests.
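The aggregate approach described in this comment (build many instances, measure the total, divide by the count) can be sketched with the standard library's `tracemalloc`. This is a minimal sketch under stated assumptions, not the method used in the video or by the commenter: the class names `ItemDC`/`ItemNT`, the helper names, the item count, and the zero-padded labels (instead of random padding) are all choices made here for illustration.

```python
import tracemalloc
from dataclasses import dataclass
from typing import NamedTuple


@dataclass
class ItemDC:
    number: int
    label: str
    value: float


class ItemNT(NamedTuple):
    number: int
    label: str
    value: float


def make_tuple(number, label, value):
    return (number, label, value)


def bytes_per_item(factory, n=200_000):
    """Average bytes allocated per container, measured over n instances at once."""
    # Create the field values *before* tracing starts, so only the containers
    # themselves (plus the list that holds them) are measured.
    # Assumption: zero-padded labels stand in for the randomly padded strings.
    values = [(i, f"{i:012d}", i / n * 100.0) for i in range(n)]
    tracemalloc.start()
    try:
        items = [factory(*v) for v in values]
        current, _peak = tracemalloc.get_traced_memory()  # items is still alive here
    finally:
        tracemalloc.stop()
    return current / n


if __name__ == "__main__":
    for name, factory in [("tuple", make_tuple), ("NamedTuple", ItemNT), ("dataclass", ItemDC)]:
        print(f"{name:>10}: {bytes_per_item(factory):.1f} bytes/item")
```

The figure includes one list slot (a pointer) per item for every container type, so it is best read as a comparison between types rather than an absolute size.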

  • @BosonCollider
    @BosonCollider  1 year ago

    I like msgspec

  • @rikschaaf
    @rikschaaf  2 years ago

    Can't you run your Python code through some optimizer that converts everything to a tuple wherever possible? Your source code would still be your own readable code, but the generated code would be optimized for speed and memory usage. Best of both worlds!

  • @guzziiw
    @guzziiw  1 year ago

    Do you mind explaining why using a dict is error-prone? It doesn't seem obvious to me.

    • @PanduPoluan
      @PanduPoluan  1 year ago

      Unless you define a TypedDict, you might accidentally mistype a key, resulting in a KeyError.

    • @zachwhite2716
      @zachwhite2716  4 months ago

      Personally, I find that the “potential typo” issue is overstated. I have 20 years of Python experience, and it's never been a serious source of errors. However, code that isn't easily understood, like a mess of nested classes used instead of a simple data structure with a dictionary at its root, has caused me a ton of problems and some really hard-to-debug situations.

  • @dylan-dylan-dylan
    @dylan-dylan-dylan  9 months ago

    Accessing a dictionary's values by key is its primary purpose; it's only error-prone if you are ignorant of the pass-by rules of the value's type.
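One reading of "pass-by rules" here is Python's reference semantics: a dict stores references to objects, so what happens after a lookup depends on whether the value is mutable. The sketch below is an illustration of that interpretation with hypothetical names, not something taken from the video.

```python
import copy

inventory = {"fruits": ["apple"], "count": 1}

# A dict stores references. Mutating a mutable value you looked up
# changes the object that the dict still refers to.
fruits = inventory["fruits"]
fruits.append("banana")
print(inventory["fruits"])  # ['apple', 'banana']

# Rebinding a local name to a new immutable value leaves the dict alone.
count = inventory["count"]
count += 1
print(inventory["count"])  # 1

# A shallow copy shares the nested list; a deep copy gets its own.
shallow = dict(inventory)
deep = copy.deepcopy(inventory)
inventory["fruits"].append("cherry")
print(shallow["fruits"])  # ['apple', 'banana', 'cherry']
print(deep["fruits"])     # ['apple', 'banana']
```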
    #teamdict

  • @alansnyder8448
    @alansnyder8448  9 months ago

    @mCoding Could you redo this video with Pydantic 2.0? I get what you're saying about using @dataclass in internal applications, but sometimes you don't know for sure whether the data will eventually be serialized to JSON, so I choose Pydantic when I'm not sure. I want to know whether the new Rust-based 2.0 implementation has brought its speed into the same ballpark as the other options.

    • @mCoding
      @mCoding  9 months ago +1

      Hmm, perhaps. While a Rust implementation under the hood may improve performance, I suspect it will not change the qualitative picture very much. Pydantic is slower primarily because it is fundamentally doing more work, namely validation and conversion, whereas the other options do neither.
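To make the "fundamentally doing more work" point concrete using only the standard library: a plain dataclass stores whatever it is given, while converting and validating input means running extra code on every construction. The `__post_init__` below is a hypothetical hand-written stand-in for that work; it is not how Pydantic is implemented, and the class names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Plain:
    x: int  # annotation only; nothing checks or converts it at runtime


@dataclass
class Checked:
    x: int

    def __post_init__(self) -> None:
        # Hypothetical hand-written conversion + validation, run on every construction.
        if isinstance(self.x, str):
            self.x = int(self.x)  # conversion: "42" -> 42 (ValueError if not numeric)
        if not isinstance(self.x, int):
            raise TypeError(f"x must be an int, got {type(self.x).__name__}")


print(Plain("42"))    # Plain(x='42'): stored as-is
print(Checked("42"))  # Checked(x=42): converted

try:
    Checked(1.5)
except TypeError as exc:
    print(exc)  # x must be an int, got float
```

Every `Checked(...)` call pays for the extra checks, which is the qualitative cost the reply describes, regardless of what language the checks are written in.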

    • @alansnyder8448
      @alansnyder8448  9 months ago

      @mCoding Maybe a good video would be how to use dataclasses and Pydantic together.
      In my case, half of my projects use FastAPI, which I love and which depends on Pydantic. I've seen too many videos that compare Pydantic with dataclasses (yours included), and I've come to think of them as being in the same category. Since I'm already working with Pydantic in half my projects, I've just gotten very comfortable with it.
      Knowing the performance hit puts a slightly different spin on the situation, so maybe dataclasses should be used for all internal-only data that won't be parsed, and a dataclass could then be wrapped in a field of a Pydantic model when you need to parse it.
      I'll keep this in mind myself in the future.
      Pydantic + dataclasses would be an interesting video for me, if you're soliciting ideas.
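For the internal-only case, a standard-library dataclass can still be sent to and from JSON by hand. This minimal sketch (the `User`/`Address` classes and helper names are hypothetical) shows that serialization is easy with `dataclasses.asdict`, while parsing is manual and unvalidated, which is exactly the part Pydantic's validation and conversion would take over.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Address:
    city: str
    zip_code: str


@dataclass
class User:
    name: str
    age: int
    address: Address


def user_to_json(user: User) -> str:
    # asdict() recurses into nested dataclasses, producing plain dicts.
    return json.dumps(asdict(user))


def user_from_json(text: str) -> User:
    # Reconstruction is manual: nested dataclasses are not rebuilt automatically,
    # and no type validation or conversion happens here.
    data = json.loads(text)
    return User(name=data["name"], age=data["age"], address=Address(**data["address"]))


if __name__ == "__main__":
    u = User("Ann", 30, Address("Oslo", "0150"))
    text = user_to_json(u)
    print(text)  # {"name": "Ann", "age": 30, "address": {"city": "Oslo", "zip_code": "0150"}}
    print(user_from_json(text) == u)  # True
```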

  • @user-iv3tb8pp3x
    @user-iv3tb8pp3x  1 year ago

    Also, dataclasses can be "frozen" so they can't be modified, which to me makes them better than Pydantic's BaseModel.
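A minimal sketch of what `frozen=True` does for a standard-library dataclass (the `Point` class is illustrative): assignment raises `dataclasses.FrozenInstanceError`, and `dataclasses.replace` builds a modified copy instead. With the default `eq=True`, a frozen dataclass also gets a generated `__hash__`.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Point:
    x: int
    y: int


p = Point(1, 2)
try:
    p.x = 10  # frozen: assignment raises FrozenInstanceError
except FrozenInstanceError as exc:
    print("cannot modify:", exc)

# The idiomatic way to "change" a frozen instance is to build a new one.
q = replace(p, x=10)
print(p, q)  # Point(x=1, y=2) Point(x=10, y=2)
```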

  • @chaseduckett135
    @chaseduckett135  2 years ago

    Are you using R's ggplot for the plot?

    • @mCoding
      @mCoding  2 years ago

      I'm using plotly!

  • @irmdev595
    @irmdev595  1 year ago

    I'd like a dataclasses video covering slots and inheritance (super_init).

  • @SophieJMore
    @SophieJMore  2 years ago +2

    #pro_type_hint_gang

  • @hieu8276
    @hieu8276  2 years ago +1

    I still prefer dataclass, since there's no need to install additional packages :)

  • @0730pleomax
    @0730pleomax  2 years ago

    Pydantic, attr, dataclasses, NamedTuple

  • @Michallote
    @Michallote  1 year ago

    Okay, at 5:55: one of my recent headaches is reading a god-damn XML file. I hate its guts. I have to parse everything, since it's always in string format. xml.etree is great, but I still have to manually input every string and rename classes.
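The kind of string-to-type conversion Pydantic performs can be approximated for simple XML with the standard library alone. This is a hypothetical sketch (`Part`, `from_attributes` and the sample XML are made up): each attribute string is converted by calling the dataclass field's annotated type. It assumes simple types such as `str`/`int`/`float` and real (non-string) annotations, so it would not work with `from __future__ import annotations`.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields


@dataclass
class Part:
    name: str
    quantity: int
    price: float


def from_attributes(cls, elem):
    """Build a dataclass instance from an XML element's attributes.

    Each attribute string is converted by calling the field's annotated type,
    so this only works for simple callables like str/int/float, and a missing
    attribute raises KeyError.
    """
    kwargs = {f.name: f.type(elem.attrib[f.name]) for f in fields(cls)}
    return cls(**kwargs)


XML_TEXT = """
<parts>
  <part name="bolt" quantity="12" price="0.25"/>
  <part name="nut" quantity="30" price="0.1"/>
</parts>
"""

root = ET.fromstring(XML_TEXT.strip())
parts = [from_attributes(Part, elem) for elem in root.iter("part")]
print(parts)
# [Part(name='bolt', quantity=12, price=0.25), Part(name='nut', quantity=30, price=0.1)]
```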

  • @MithicSpirit
    @MithicSpirit  2 years ago +2

    Discord gang

  • @Moody0101
    @Moody0101  2 years ago

    Type Hint gang awwooo!

  • @juliejones8785
    @juliejones8785  2 years ago

    If only python-box had been included. It provides both dictionary-style and dot-style access.
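The dual access style can be sketched in a few lines with a `dict` subclass. This is a hypothetical minimal `AttrDict`, not python-box's implementation or API. One design caveat: `__getattr__` is only consulted when normal lookup fails, so a key named like a dict method (for example `keys`) is still reachable only with `cfg["keys"]`.

```python
class AttrDict(dict):
    """Hypothetical minimal dict with attribute access; not python-box's code."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value

    def __delattr__(self, name):
        try:
            del self[name]
        except KeyError:
            raise AttributeError(name) from None


cfg = AttrDict(host="localhost", port=8080)
cfg.debug = True  # stored as a dict key, not an instance attribute
print(cfg["port"], cfg.host, cfg)  # 8080 localhost {'host': 'localhost', 'port': 8080, 'debug': True}
```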