Python itertools - The key to mastering iteration

  • Published on 27 Jun 2024
  • The key to iteration in Python
    A key feature of Python is its incredible support for lazy iteration. Defining and consuming lazy sequences in Python is easy. You can even define an infinite sequence like powers of 2 or prime numbers! To make working with iterables in Python even easier, Python provides some basic building blocks to help you compose iteration primitives in different ways, which is what the itertools module is for. This module comes with 21 different primitives (and many more recipes for how to combine them), which we go over in this video. Will you use them?
    ― mCoding with James Murphy (mcoding.io)
    Docs: docs.python.org/3/library/ite...
    Source code: github.com/mCodingLLC/VideosS...
    SUPPORT ME ⭐
    ---------------------------------------------------
    Sign up on Patreon to get your donor role and early access to videos!
    / mcoding
    Feeling generous but don't have a Patreon? Donate via PayPal! (No sign up needed.)
    www.paypal.com/donate/?hosted...
    Want to donate crypto? Check out the rest of my supported donations on my website!
    mcoding.io/donate
    Top patrons and donors: Laura M, Jameson, Dragos C, Vahnekie, Neel R, Matt R, Johan A, Casey G, Mark M, Mutual Information, Pi
    BE ACTIVE IN MY COMMUNITY 😄
    ---------------------------------------------------
    Discord: / discord
    Github: github.com/mCodingLLC/
    Reddit: / mcoding
    Facebook: / james.mcoding
    CHAPTERS
    ---------------------------------------------------
    0:00 Intro
    1:47 An initial warning
    2:30 Lookahead motivating multi-accumulate
    3:09 ALL 21 ITERTOOLS - itertools.count
    3:41 itertools.cycle
    4:01 itertools.repeat
    4:17 itertools.accumulate
    4:58 itertools.batched
    5:28 itertools.chain
    5:58 itertools.chain.from_iterable
    6:18 itertools.compress
    6:42 itertools.dropwhile
    7:32 itertools.filterfalse
    8:18 itertools.groupby
    10:08 itertools.islice
    11:04 itertools.pairwise
    11:47 itertools.starmap
    12:43 itertools.takewhile
    13:02 itertools.tee
    14:22 itertools.zip_longest
    14:48 itertools.product
    15:43 itertools.permutations
    16:15 itertools.combinations
    16:55 itertools.combinations_with_replacement
    17:22 multi_accumulate example
    19:43 Thanks
  • Science & Technology

Comments • 99

  • @mCoding
    @mCoding  2 days ago +23

    Errata:
    0:00 filter(x) isn't valid; to filter out falsy values, use filter(None, x)
    2:33 & 17:22: the example call in multi_accumulate's docstrings yields an additional (1, 1) at the beginning
    4:40: the min and max should be arguments to itertools.accumulate, not list
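A minimal sketch of the corrected calls described in the errata above. The sample data is made up for illustration and is not taken from the video.

```python
from itertools import accumulate

# Hypothetical sample data containing some falsy values.
values = [3, 0, 1, None, 4, "", 1, 5]

# filter(None, x) keeps only the truthy items; filter(x) alone is a TypeError.
truthy = list(filter(None, values))

# The reducing function is the second argument to accumulate, not to list.
numbers = [3, 1, 4, 1, 5]
running_max = list(accumulate(numbers, max))
running_min = list(accumulate(numbers, min))
```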

    • @oida10000
      @oida10000 1 day ago

      This first error might be the reason for filterfalse. But overall I find it nice syntactic sugar; it might look better than ! or not. Will there be a follow-up with more_itertools?

  • @QuantumHistorian
    @QuantumHistorian 2 days ago +40

    The combinatorics functions are pretty damn useful IMO. Sure, it might "just" be for maths stuff, but the range of computational maths problems that involves them is vast! It's essentially the answer to the question "what are all the ways to put these inputs together?" which is super generic.

  • @dcmayo
    @dcmayo 2 days ago +46

    I use the combinatorial ones a lot, often for testing things. I use all four of them.
    I have 5 versions of networking device firmware, and I want to make sure they're all compatible with each other, so I iterate over all the pairs of combinations_with_replacement.
    I have 10 devices networked together, and I want to test throughput for all possible paths. That's combinations, or permutations if I want to test both directions.
    I have 5 versions of firmware and 3 models of devices, and I want to make sure all firmwares work on all devices, so I iterate over the product of firmwares and models
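The three test matrices described in this comment could be sketched roughly as follows. The firmware, device, and model names are placeholders invented for the example.

```python
from itertools import (
    combinations,
    combinations_with_replacement,
    permutations,
    product,
)

# Placeholder names; the real ones are not given in the comment.
firmwares = ["v1", "v2", "v3", "v4", "v5"]
devices = [f"dev{i}" for i in range(10)]
models = ["A", "B", "C"]

# Every unordered firmware pair, including a version paired with itself.
firmware_pairs = list(combinations_with_replacement(firmwares, 2))

# Every path between two distinct devices, ignoring direction...
undirected_paths = list(combinations(devices, 2))
# ...and counting each direction separately.
directed_paths = list(permutations(devices, 2))

# Every firmware on every model.
firmware_on_model = list(product(firmwares, models))
```

That gives 15 firmware pairs, 45 undirected paths, 90 directed paths, and 15 firmware/model combinations.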

    • @scottza
      @scottza 2 days ago +1

      That's an awesome example use case. Thanks for sharing.

  • @Liam_The_Great
    @Liam_The_Great 2 days ago +37

    "iterable" doesn't sound like a word anymore

    • @zanus5591
      @zanus5591 2 days ago +3

      "irrbl"

    • @kilianvounckx9904
      @kilianvounckx9904 2 days ago +5

      It's called semantic satiation. Pretty interesting concept

    • @crayyy_zee
      @crayyy_zee 2 days ago

      @kilianvounckx9904 TIL

  • @agbenfante
    @agbenfante 1 day ago +1

    “Try not to get caught up in showing off just how well you know the itertools library”
    I feel attacked

  • @314Labs
    @314Labs 2 days ago +17

    at 4:41 I'm assuming "min" and "max" are passed to "accumulate" and not the "list" constructor

    • @shyohevzion984
      @shyohevzion984 2 days ago +1

      Yep. I noticed that too

  • @jacanchaplais8083
    @jacanchaplais8083 2 days ago +8

    I think there's a bug in your multi_accumulate example. You'd want the first argument of itertools.accumulate to be iterator, not iterable, right? For non-exhaustible iterables, like range or sequences, you will end up counting the first element twice. Not an issue for min or max, but you'll definitely see an issue if you use the running sum example. iterator will have that initial value removed, though, so using that instead should solve the problem.

    • @tcbtrvpsrno5985
      @tcbtrvpsrno5985 2 days ago

      Indeed, this is also the case for the most commonly used iterable, the Python list. Changing the "iterable" on line 105 to "iterator" as declared on line 99 would ensure that only the remaining elements are included in the accumulate function (instead of the whole iterable from index 0). Anyway, thanks mCoding! This is an insightful video!
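The bug discussed in this thread can be reproduced with a much smaller function. The sketch below is not the video's multi_accumulate; `running_buggy` and `running_fixed` are hypothetical names that only mirror the pattern described (take the first element from an iterator, then accumulate).

```python
from itertools import accumulate
from operator import add

def running_buggy(iterable, func):
    iterator = iter(iterable)
    first = next(iterator)  # advances `iterator`, not `iterable`
    # Bug: a list or range passed as `iterable` starts again from its first
    # item, so that item is fed into the accumulation twice.
    return accumulate(iterable, func, initial=first)

def running_fixed(iterable, func):
    iterator = iter(iterable)
    first = next(iterator)
    # Fix: continue from the iterator, which has already moved past `first`.
    return accumulate(iterator, func, initial=first)
```

With `[1, 2, 3]` and `add`, the buggy version yields 1, 2, 4, 7 while the fixed version yields 1, 3, 6. With `max` the buggy version only shows an extra leading item, which matches the errata. If the input is already an iterator, both versions behave the same, which is why the bug is easy to miss.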

  • @fabiolean
    @fabiolean 2 days ago

    I was part of a project that did analysis on proposed firewall rules. Since the rules could be subnet-to-subnet we used the combinatoric functions to ensure we analyzed every possible unique combination of source and destination addresses in the proposed rules.

  • @jevandezande
    @jevandezande 2 days ago +19

    filterfalse makes sense when you have a function defined elsewhere that you are using (e.g. filter(my_func, it)). To reverse the conditional would require writing a new function or adding a lambda (e.g. filter(lambda x: not my_func(x), it)), which is ugly.
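A small sketch of the comparison made in this comment, assuming a stand-in predicate `is_even` in place of the commenter's `my_func`:

```python
from itertools import filterfalse

def is_even(n):
    # Stand-in for a predicate that is defined elsewhere and cannot be edited.
    return n % 2 == 0

numbers = range(10)

evens = list(filter(is_even, numbers))
# Negating the predicate without wrapping it:
odds = list(filterfalse(is_even, numbers))
# The equivalent with plain filter needs an extra lambda:
odds_via_lambda = list(filter(lambda n: not is_even(n), numbers))
```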

    • @syrupthesaiyanturtle
      @syrupthesaiyanturtle 2 days ago +3

      the lambda isn't ugly

    • @IceArdor
      @IceArdor 2 days ago +7

      @syrupthesaiyanturtle it's an extra function call and requires giving a name to the temporary arguments.
      Python's functools doesn't have a nice compose function, otherwise we'd have:
      filter(compose(operator.not_, my_func), iterable)
      But none of that matters because a good portion of the Python itertools library relies on filterfalse being defined, so why not make that function public if they had to implement it anyway.

    • @syrupthesaiyanturtle
      @syrupthesaiyanturtle 2 days ago +1

      @IceArdor why not just add a parameter to the filter function instead of creating a new one entirely?

    • @crayyy_zee
      @crayyy_zee 2 days ago +1

      @syrupthesaiyanturtle a new parameter, presumably something called "presume_false", would be uglier

    • @isodoubIet
      @isodoubIet 1 day ago

      Defining filterfalse is also ugly so that's not really a great argument

  • @isodoubIet
    @isodoubIet 1 day ago

    Combinations with replacement are very useful for implementing the statistical bootstrap.

  • @traal
    @traal 1 day ago +1

    I use batched all the time at work, for sysadmin type stuff, or API queries that slow down when you give too many search terms.
    I’ve used the combinatoric functions to solve programming challenges, e.g. Advent of Code
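As a rough sketch of the batching use case mentioned here: `fetch_many` is a made-up stand-in for an API call that slows down with many search terms, and the fallback `batched` is only there for Python versions before 3.12, where itertools.batched does not exist.

```python
from itertools import islice

try:
    from itertools import batched  # available in Python 3.12+
except ImportError:
    def batched(iterable, n):
        """Fallback for older Pythons: yield tuples of up to n items."""
        it = iter(iterable)
        while batch := tuple(islice(it, n)):
            yield batch

def fetch_many(terms):
    # Hypothetical API call; here it just returns the length of each term.
    return {term: len(term) for term in terms}

search_terms = [f"host{i}" for i in range(7)]

results = {}
for chunk in batched(search_terms, 3):  # chunks of 3, 3, and 1 terms
    results.update(fetch_many(chunk))
```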

  • @guidodinello1369
    @guidodinello1369 2 days ago +7

    I've found those functions useful while doing simple grid searching where you test combinations of hyperparameters to tune ML models.
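A simple grid search along these lines might look like the sketch below. The parameter grid and the `score` function are invented for illustration; a real search would train and evaluate a model instead.

```python
from itertools import product

# Hypothetical hyperparameter grid.
grid = {"learning_rate": [0.01, 0.1], "max_depth": [3, 5, 7]}

def score(params):
    # Placeholder objective that peaks at learning_rate=0.1, max_depth=5.
    return -abs(params["learning_rate"] - 0.1) - abs(params["max_depth"] - 5)

# product(*grid.values()) yields one tuple per combination of settings;
# zipping with the keys turns each tuple back into a parameter dict.
candidates = [dict(zip(grid, values)) for values in product(*grid.values())]
best = max(candidates, key=score)
```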

  • @Jakub1989YTb
    @Jakub1989YTb 1 day ago

    I was like: pff, itertools. I know those, what new can I learn.
    But that last example... very clever :-) yet still just applying the basics. Nice.

  • @blitzarsun
    @blitzarsun 2 days ago +2

    I was hoping you would do itertools. Please do functools too!

  • @klmcwhirter
    @klmcwhirter 2 days ago +1

    I disagree "slightly" with your comment about using a for-loop instead of chaining the utility functions.
    First, always test the performance of the algorithm which you are conceiving. In general, the built-in utilities (especially the ones implemented in C) almost always perform better than something written in Python. But, performance in Python is not always intuitive - especially when chaining things together where boxing / un-boxing occurs. Test!
    I definitely agree with your statement about "showing off". I usually frame my comment about this as a maintainability problem. Think about the poor soul who will have to troubleshoot / enhance that part of the code 3 years from now. And if I wrote the code, inevitably that poor soul will be me.
    My rule of thumb is ... if you have to write more lines of comments to explain what is going on than the actual lines of code themselves - something is wrong.
    Thanks for another great video!

  • @_Dearex_
    @_Dearex_ 1 day ago

    thanks for the pairwise tip!

  • @Jagi125
    @Jagi125 6 hours ago

    I've used permutations and combinations in genomics. Probably pretty useful for big data in general.

  • @yuvaldolev7969
    @yuvaldolev7969 1 day ago

    Great vid. Is functools next?

  • @DrDeuteron
    @DrDeuteron 2 days ago +3

    filterfalse is necessary b/c sometimes your predicate is "bool", and lambda x: not x is blech.

  • @replicaacliper
    @replicaacliper 2 days ago

    been waiting on this one

  • @joshix833
    @joshix833 2 days ago +3

    I've created a typed_stream library to use the most important lazy functions from itertools as methods on a Stream class. That's far more readable imho than the functions

    • @aflous
      @aflous 2 days ago +1

      Link

    • @joshix833
      @joshix833 2 days ago

      @aflous it's on pypi and on github on my account Joshix-1 with the same name (yt often deletes links)

  • @luketurner314
    @luketurner314 2 days ago +1

    The combinatoric ones are useful for Sudoku variant helper tools

  • @LasradoRohan
    @LasradoRohan 2 days ago

    Awesome video as usual. Just one question. At 18:21 wouldn't you have to skip one from the iterable when passing it to accumulate? If so, not doing so will result in functions like sum adding the same element twice.

  • @atrus3823
    @atrus3823 2 days ago

    My rule of thumb for filter and map is to use filter and map unless I'd have to define my own function; I always find comprehensions nicer than lambdas. For example, map(str, [1, 2, 3]) to me is nicer and clearer than (str(a) for a in [1, 2, 3]), but (a*2 for a in [1, 2, 3]) is nicer than map(lambda a: a*2, [1, 2, 3]). Same for filter: filter(str.isupper, ['a', 'B', 'c']) is nicer than (a for a in ['a', 'B', 'c'] if a.isupper()), but (a for a in [1, 2, 3] if a == 2) is nicer than filter(lambda a: a == 2, [1, 2, 3]).
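The pairs compared in this comment are equivalent in result, so the choice is purely about readability. A quick sketch using the comment's own inputs:

```python
numbers = [1, 2, 3]
letters = ["a", "B", "c"]

# An existing function: map/filter read cleanly.
strings_map = list(map(str, numbers))
strings_gen = list(str(a) for a in numbers)
upper_filter = list(filter(str.isupper, letters))
upper_gen = list(a for a in letters if a.isupper())

# An ad hoc expression: the generator avoids a lambda.
doubled_gen = list(a * 2 for a in numbers)
doubled_map = list(map(lambda a: a * 2, numbers))
twos_gen = list(a for a in numbers if a == 2)
twos_filter = list(filter(lambda a: a == 2, numbers))
```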

  • @FedericoSpada13
    @FedericoSpada13 2 days ago

    14:34 I've used it to print the content of a list inside a tkinter table: I zip_longest the list and the rows already present in the table; if the row item is None (not enough rows), I add a new row, if the list item is None (too many rows), I delete the row; finally if both are present, I update the row with the list item. I've found out that this is faster than always deleting all rows and then re-adding them: it's better to update the existing ones.
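The add/delete/update logic described here can be sketched without tkinter by letting a plain list stand in for the table rows. `sync_rows` and the `MISSING` sentinel are hypothetical names; a sentinel is used instead of None as the fill value so that None can still appear as real data.

```python
from itertools import zip_longest

MISSING = object()  # sentinel, so that None stays usable as a real value

def sync_rows(rows, items):
    """Make the list `rows` match `items` in place; return the actions taken."""
    items = list(items)
    actions = []
    # Iterate over a snapshot of rows because rows is mutated in the loop.
    pairs = zip_longest(list(rows), items, fillvalue=MISSING)
    for index, (row, item) in enumerate(pairs):
        if row is MISSING:        # not enough rows: add one
            rows.append(item)
            actions.append("add")
        elif item is MISSING:     # too many rows: mark for deletion
            actions.append("delete")
        else:                     # both present: update in place
            rows[index] = item
            actions.append("update")
    del rows[len(items):]         # drop the surplus rows in one step
    return actions
```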

  • @jesavius
    @jesavius 2 days ago +13

    Nice! But I want MORE! I SAID MORE!!! As in more-itertools, of course.☺

    • @johnnyq4260
      @johnnyq4260 2 days ago +3

      If you could make this comment recursive it'd be really cool.

    • @IceArdor
      @IceArdor 2 days ago +2

      The itertools recipes are available in the more-itertools package. Rather than copy-pasting the recipes, I just pip install more-itertools and I'm done.

    • @BrennenRaimer
      @BrennenRaimer 2 days ago

      I don't understand why the docs give the recipes for more-itertools and even link its docs, but then do not just include them in the standard library itertools. It's so silly!

  • @unusedTV
    @unusedTV 2 days ago +1

    Combinatorics are guaranteed to be useful at least once a year - Advent of Code.

    • @traal
      @traal 1 day ago

      I came here to say this. 😊

  • @atrus3823
    @atrus3823 2 days ago

    I prefer a combination of the two dot product functions. The starmap and operator are what make the first version clunky, but the rest is OK, and with a generator comprehension it can be made really nice:
    sum(x * y for x, y in zip(u, v, strict=True))

  • @BenGroebe
    @BenGroebe 2 days ago +5

    I used zip_longest recently. I needed to vertically display two lists side by side in a GUI, and there was actually very little chance they'd be the same length. zip_longest with fillvalue="", then '\n'.join
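A minimal sketch of that side-by-side layout, assuming two short made-up lists and a fixed column width of 8 characters:

```python
from itertools import zip_longest

left = ["alpha", "beta", "gamma"]
right = ["one"]

# fillvalue="" pads the shorter list so every row has two cells.
lines = [
    f"{a:<8}{b}".rstrip()
    for a, b in zip_longest(left, right, fillvalue="")
]
text = "\n".join(lines)
```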

  • @mvsh
    @mvsh 1 day ago +1

    When your C++ background doesn't let you spell "in" without a "t" 0:04

  • @atrus3823
    @atrus3823 2 days ago

    Another advantage of product is that it can be a step in an iterator stream, whereas nested for loops can't be.

  • @Talon_24
    @Talon_24 2 days ago +3

    5:05 There's now batched and 11:24 pairwise in the standard itertools? That's fantastic, I had to write these so many times🎉
    14:40 I feel I'm using zip_longest less often than zip, but still regularly

  • @marcinpohl3264
    @marcinpohl3264 1 day ago

    Could you give an example of compress vs filter where one of them offers a clear benefit (readability, performance, memory usage, anything)? They seem so close they're almost interchangeable (are they?)
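One way to see the difference, sketched with made-up data: compress selects by position using a second, parallel iterable of flags, while filter calls a predicate on each item itself. They overlap only when the flags can be computed from the item alone.

```python
from itertools import compress

names = ["ann", "bob", "cat", "dan"]
# Flags that were computed elsewhere and cannot be derived from the names.
is_active = [True, False, True, False]

# compress: keep names[i] wherever is_active[i] is truthy.
active = list(compress(names, is_active))

# filter: keep each name for which the predicate returns a truthy value.
contains_a = list(filter(lambda name: "a" in name, names))
```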

  • @zeropointer125
    @zeropointer125 2 days ago +2

    I would have used those combinatorics functions (and even zip longest) for advent of code challenges.
    Probably not any real world code

  • @alex_lanes
    @alex_lanes 1 day ago

    I use tee to duplicate a generator.
    Useful for counting how many lines my Cursor SQLite returned without consuming it
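A sketch of that pattern, with a small generator standing in for the SQLite cursor (the `rows` function is hypothetical). One caveat worth keeping in mind: when one copy is fully consumed before the other is touched, tee has to buffer every item in memory, so this costs about as much as building a list first.

```python
from itertools import tee

def rows():
    # Hypothetical stand-in for iterating over a database cursor.
    yield from [("a", 1), ("b", 2), ("c", 3)]

for_counting, for_use = tee(rows())

count = sum(1 for _ in for_counting)  # consumes only the first copy
remaining = list(for_use)             # the second copy still yields every row
```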

  • @anon_y_mousse
    @anon_y_mousse 2 days ago

    I think the reason they have functions like filterfalse is because they want to provide a way to negate a filter function without wrapping it in a lambda in cases where you already have a function to use for filtering that you can't edit. Things like standard functions or something from a library would fit. It is annoying that it's named weirdly, though. I think something like filternot would make just as much sense and be two letters easier to type.
    I cloned the repository for Python and was playing with 3.11, but still haven't installed it. Meanwhile, my distro package is only at 3.9 and I'm leery of breaking things by upgrading it. I may end up dual installing it at some point so I can continually upgrade without worrying about breaking the system packages, but I don't use newer features like that for anything more than tests. Oh well, at least some of these will be inspiration for features I add to my own language and it makes me feel better about not having released it yet.

  • @eliavrad2845
    @eliavrad2845 1 day ago

    18:40 last_element seems useful. Honestly I wish there were built-in "first" and "last" functions: there are so many times I'm using Jupyter or a console and I just want to check the structure of a random iterable (returned list, dict keys, etc), and a "next(iter())" always feels very clunky and not intuitive.
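Helpers like these are short to write by hand. The sketch below is one possible version, not the video's last_element; the names `first` and `last` and the `_MISSING` sentinel are chosen for this example (the more-itertools package ships functions with the same names).

```python
from collections import deque

_MISSING = object()  # distinguishes "no default given" from default=None

def first(iterable, default=_MISSING):
    """Return the first item, or `default` if the iterable is empty."""
    for item in iterable:
        return item
    if default is _MISSING:
        raise ValueError("first() called on an empty iterable")
    return default

def last(iterable, default=_MISSING):
    """Return the last item, or `default` if the iterable is empty."""
    tail = deque(iterable, maxlen=1)  # keeps only the most recent item
    if tail:
        return tail[0]
    if default is _MISSING:
        raise ValueError("last() called on an empty iterable")
    return default
```

Both helpers consume their input if it is an iterator, so `last` in particular should only be used on finite iterables.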

  • @Aramizyera23
    @Aramizyera23 1 day ago

    starmap & zip use in the example in the title is redundant: map can take multiple iterables.
    So it could be sum(map(mul, x, y))
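A short sketch of this variant with made-up vectors. One difference to be aware of: map with several iterables stops at the shortest one without raising, so a length mismatch goes unnoticed.

```python
from operator import mul

x = [1, 2, 3]
y = [4, 5, 6]

# map passes one item from each iterable to mul on every step.
dot = sum(map(mul, x, y))

# With unequal lengths the extra item of the longer input is silently dropped.
truncated = sum(map(mul, [1, 2, 3], [4, 5]))
```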

  • @thegoose6900
    @thegoose6900 2 days ago

    Looking at your thumbnail it seems that just putting return x[0]*y[0] + x[1]*y[1] is a lot more readable + shorter, probably even faster because it doesn't have to look up all those functions you use

  • @MyrLin8
    @MyrLin8 1 day ago

    the 'better use case' you seem to miss is 'security' ... aka: encryption.

    • @mCoding
      @mCoding  1 day ago

      Could you elaborate? I'm not familiar with any encryption schemes that iterate over permutations or combinations.

  • @resatcanerbas3541
    @resatcanerbas3541 2 days ago

    Some of them are quite useful, but as a developer, I need to break my habits first to remember the use case and instead of loops go to the itertools.

  • @theViceth
    @theViceth 23 hours ago

    for any benchmark testing, permutations is the best thing available.
    But saying "I use it a lot" when i literally wrote a single script doesn't feel right. Even if I use it almost daily

    • @mCoding
      @mCoding  17 hours ago

      A single script that you use every day sounds like an awesome script!

  • @jasonhenson7948
    @jasonhenson7948 2 days ago

    I always find it hard to see how some more niche recipes/functions could be used without knowing the problem first.
    Regarding zip_longest: it feels like I might want to know if something happened, but not always care what happened.

  • @unperrier5998
    @unperrier5998 2 days ago

    You could have mentioned that multi_accumulate() and multi_reduce() are monadic and what monad they implement.

    • @mCoding
      @mCoding  1 day ago

      Hmmm true, but would throwing in the definition and explanation of a monad in an already 20 minute video be a good thing or a bad thing?

    • @unperrier5998
      @unperrier5998 1 day ago

      @mCoding monads are meant for pure FP anyway, because in FP they don't have blocks and in particular exception blocks. They don't really have a place in Python.
      That said, it would be an interesting topic for a future video if you'd like to attempt it, because it's not an easy topic to explain clearly. Maybe a series explaining the common monads, and it could be complemented with OO design patterns.

  • @NotAUtubeCeleb
    @NotAUtubeCeleb 2 days ago +1

    Itertools groupby is reminiscent of how old-fashioned Hadoop works with parallel key-value operations

  • @kventinho
    @kventinho 2 days ago

    hi, newbie programmer here. Why do you keep using 'assert' in your functions? What's the point of this keyword in this context?

  • @mmilerngruppe
    @mmilerngruppe 2 days ago

    12:21 there is no better place for lambda as that. add = lambda pair: pair[0] + pair[1]

    • @mCoding
      @mCoding  1 day ago

      On the contrary, giving a name to a lambda is considered by some (linters) to be a class 3 felony.

    • @mmilerngruppe
      @mmilerngruppe 1 day ago

      @mCoding okay okay okay, then just rewriting the add function would be enough

  • @MLGJuggernautgaming
    @MLGJuggernautgaming 2 days ago

    In the thumbnail you’re missing a )

  • @chair547
    @chair547 2 days ago

    When she iterate on my object till I throw

  • @QuantumHistorian
    @QuantumHistorian 2 days ago +6

    How come all the comments are porn bots? And who's this Scott they're all referring to?

    • @avasam06
      @avasam06 2 days ago +9

      That's just modern day YouTube. Dislike and report.

    • @trustytrojan
      @trustytrojan 2 days ago +2

      luckily they all got deleted, or maybe youtube is filtering them from my view

    • @johnnyq4260
      @johnnyq4260 2 days ago +1

      I don't see them. Maybe YouTube is showing them only to frequent visitors to porn sites?😂

    • @felixfourcolor
      @felixfourcolor 2 days ago +1

      You would see them if you're early. I reported all of them, and I suppose many other early viewers did, so I guess YT has removed them.

    • @squishy-tomato
      @squishy-tomato 2 days ago +1

      Yeah - stupid bots responses are only removed after many users report them. A technical response I take 3 minutes writing gets banned automatically within seconds.
      I think I should submit my CV to youtube, because it doesn't seem like they know what they're doing.

  • @hanabimock5193
    @hanabimock5193 1 day ago

    Python is so Slow

  • @BlackDroid003
    @BlackDroid003 2 days ago

    If you need a list with a given length (eg for some sort of buffer/storage), instead of
    list(itertools.repeat("X", 4))
    You could also use
    ["X"] * 4
    Not sure if there are any notable performance differences, but you need one less import, and its shorter

    • @IceArdor
      @IceArdor 2 days ago +2

      Everything in itertools was written to be lazy because iterables may have infinite items or may have a finite number of items that exceeds the amount of RAM.
      repeat("X", 1_000_000_000) versus ["X"] * 1_000_000_000
      Additionally, you may not know ahead of time how many times you need to repeat that element, and iterators don't specify a length (and shouldn't be eagerly consumed). This starts to matter once you've adopted fully lazy iterators in your code:
      excel_column_A_cells = zip(repeat("A"), count(1))
      yields A1, A2, A3, ...
      You couldn't implement this with an explicit list.

    • @anon_y_mousse
      @anon_y_mousse 2 days ago

      @IceArdor Technically it's a generator for tuples `('A',1),('A',2),...`, but it is funny that using lazy evaluation requires writing more code.
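The points made in this thread can be checked with a short sketch. The formatting of each (column, row) tuple into a string like "A1" is an addition for illustration; islice is used to take a finite prefix of the infinite stream.

```python
from itertools import count, islice, repeat

# For a small, known length the two spellings give the same list.
eager = ["X"] * 4
lazy = list(repeat("X", 4))

# With no count, repeat is infinite; zip pairs it with count(1) lazily.
pairs = zip(repeat("A"), count(1))
first_pairs = list(islice(pairs, 3))

# Joining each tuple into a cell name, still lazily.
cells = (f"{col}{row}" for col, row in zip(repeat("A"), count(1)))
first_cells = list(islice(cells, 5))
```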