Please, more videos like this one!!!
Thank you!!
This is the kind of video we all need!
Please do more deep dives into the Python standard library; there are still a lot of modules that go unused or get misused by a growing number of people.
Yes please, this is awesome 😊
I too really like the deeper dive. Being able to use stock standard modules efficiently is a key component of coding.
Thank you!
Very welcome!
My favorite use is in paginated synchronous API calls. I can mask the pagination process from the consumer while iterating across records, allow the consumer to stop before reaching the final page, and in doing so potentially reduce API calls.
It also makes it easy to implement retries and backoffs while maintaining state.
looks interesting - do you have a link to any repo with examples using that?
My first thought would be to do that with generators, but doing it with iter.* is probably cleaner and more robust.
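A minimal sketch of the pagination idea described above, using a generator. The `fetch_page` function and the `FAKE_PAGES` data are hypothetical stand-ins for a real HTTP client; only the laziness of the generator is the point here.

```python
from collections.abc import Iterator
from typing import Optional

# Hypothetical stand-in for a real API: page number -> (records, next page).
FAKE_PAGES = {1: (["a", "b"], 2), 2: (["c", "d"], 3), 3: (["e"], None)}
calls = []  # records which pages were actually requested


def fetch_page(page: int) -> tuple[list[str], Optional[int]]:
    """Pretend to call the API; a real version could retry or back off here."""
    calls.append(page)
    return FAKE_PAGES[page]


def iter_records(start_page: int = 1) -> Iterator[str]:
    """Yield records one at a time, fetching the next page only when needed."""
    page: Optional[int] = start_page
    while page is not None:
        records, page = fetch_page(page)
        yield from records


# The consumer just loops; stopping early means later pages are never fetched.
found = []
for record in iter_records():
    found.append(record)
    if record == "c":
        break

print(found)  # ['a', 'b', 'c']
print(calls)  # [1, 2]
```

Because the generator keeps its own position between `next()` calls, retry or backoff logic could live inside `fetch_page` without the consumer noticing.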
When using lxml.etree.iterparse() with a requests.get(stream=True) document, this is what's happening too. Means that you can just terminate the connection after you've found what you wanted from an API call. Or even better: read the specific data you want without reading the full document in memory because who knows how large it is.
Not something that uses a self-built iter, but a very useful example that I use all the time for some annoying APIs that are pretty much all or nothing with their data.
Hi Arjan, thanks for the lesson. I never used itertools before, now I know its capabilities. Thanks for the examples.
I also love the deep dives. Please keep them coming!
This is amazing!! Love the python Deep dives.
What a way to break the ice! Thank you 😎 Unless I missed existing content, may I kindly suggest an episode featuring argument/keyword argument unpacking?
16:00 one great use of permutations is to generate input for unit testing
Or you can use pytest, build fixtures out of several already parametrized fixtures and let the framework mix up those parametrization combinations behind the scenes.
You make the best Python content I’ve seen on YouTube, great job
Wow! Thank you
I can do without the coffee, but wanted to tell you that your light/room looks really cool! I really appreciate the care you put in your videos.
BTW: Is that a Canon camera? The colors look so good.
Thanks so much! I'm using a Sony A7 IV.
These deep dives are awesome! Thank you, Arjan!
You are welcome!
This is a great idea - I used to subscribe to Python Module of the Week and this is a brilliant replacement!
I usually also use more_itertools, really cool for windowing logic
Amazing content! Can't wait for more! :)
Thanks so much!
Nice, I didn't know about starmap. I feel like you should be able to deconstruct tuple arguments in lambdas directly, but this is much better than indexing.
Another one I needed sometimes is islice, which skips elements at the beginning and ends early based on two index-like numbers.
My most used function from itertools is groupby, to get groups of items with the same "identifier". Often used in combination with sorted, since groupby only groups consecutive items. For example, you have a list of tuples and want to group all tuples with the same first item.
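A small sketch of that groupby-plus-sorted pattern; the sample `pairs` list is made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

pairs = [("b", 2), ("a", 1), ("b", 3), ("a", 4)]

# groupby only groups *consecutive* items, so sort by the same key first.
by_first = itemgetter(0)
grouped = {
    key: [value for _, value in group]
    for key, group in groupby(sorted(pairs, key=by_first), key=by_first)
}

print(grouped)  # {'a': [1, 4], 'b': [2, 3]}
```

Without the `sorted` call, the two "b" tuples would end up in separate groups because an "a" tuple sits between them.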
What I love about iterators is the ability to do lazy sequence operations. This lets you create infinite sequences, zip them, map them, accumulate them etc. This allows for very expressive code that would otherwise be pretty noisy if written in an imperative style
Generators?
Great idea! I'm looking forward to this new series!
Very helpful video... thank you very much 🥰
Glad it was helpful!
Thanks for deep dive into itertools package, keep posting such videos.
Key highlights can be:
1. Difference between iterable and iterator.
2. Hashable types (although hidden in this video context).
3. itertools functions (I liked permutations, combinations, starmap).
Thanks, will do!
Using itertools and iterators for a while.
`more_itertools` is a very useful package built on top of itertools, and in fact is recommended in the itertools docs.
The pairwise function can be very useful when you need an element and its successor in an iterable. It returns the pairs made of consecutive values. For instance: pairwise([1, 3, 5]) yields (1, 3), (3, 5).
I use it when I need to compute the mean of two consecutive values.
My fav from itertools is product(), because in most cases it eliminates the need for having nested loops
Checked comments to see if someone else was a product fan
How does it work? Never used it before
@@Michallote you can give product any number of iterables and it will give you every combination between these iterables. Let's say you have a tuple (1, 2, 3) and another one with ("a", "b", "c"), and you need every combination: 1a, 1b, 1c, 2a, 2b, 2c ...
A nested loop is fine for two iterables, but let's say you have even more iterables. Nobody wants to see a triple or quintuple nested loop. Better to use product(). It yields the values one at a time, is usually at least as fast, and you don't need nested loops.
@@Michallote It returns an iterator of the Cartesian product of the input iterables/iterators.
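As a rough illustration of the replies above, here is the nested loop next to the `product()` version, using the same two tuples from the example.

```python
from itertools import product

numbers = (1, 2, 3)
letters = ("a", "b", "c")

# The nested-loop version.
nested = []
for n in numbers:
    for letter in letters:
        nested.append(f"{n}{letter}")

# The same combinations, in the same order, without nesting.
flat = [f"{n}{letter}" for n, letter in product(numbers, letters)]

print(flat)  # ['1a', '1b', '1c', '2a', '2b', '2c', '3a', '3b', '3c']

# Adding a third iterable is just one more argument, not one more loop level.
triples = list(product(numbers, letters, (True, False)))
print(len(triples))  # 18
```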
That’s my favorite generator function from itertools too!
Also, make tutorials on the internal workings of core python. Maybe like memory management, unusual behaviors of some methods.
Great as always
Thank you for this video! I am still waiting for a series about architecture in video games! Unfortunately, there was only the video about plugins.
@17:10 - itertools can also chain together two different types, such as a tuple and a list, into a single iterator, which you can pass to list() to get a new list.
Aha, now I see the usefulness, I was thinking, why don't I just do list_1 + list_2?
Great video! Can you also do one on generators and coroutines?
Yet another great video, Love these deep dives, thanks
When working with numpy ndarrays, does itertools provide a faster way to iterate through the multi-dim elements than to have multiple for loops?
Great video! More of this!
from 6:30 I think (not tested yet) that you can use different iterators to loop a single time over multiple lists, and avoid using nested loops
Spicy content!! Thank you!
Would you consider working through the other most common standard libraries? This was awesome!
Keep making deeper dive videos to help us learn more about Python
Will do!
itertools is one of my favorite modules of the standard library, so I made my own one extending it starting with the recipes adding to it my own creations or whatever I come across in Stack Overflow or the like...
to my surprise somewhat recently they started recommending the more_itertools external library for said recipes and much more, so I also added it to mine and in the overlap I pick and choose which I like more between my take on it and more_itertools XD
What benefit is there from having iter() on an iterator return a copy of the iterator? It just seems to add confusion
And what is the difference between the starmap( ) function and the simple map( ) function?
starmap uses the * operator to unpack each element into multiple arguments for the given function. In Arjan’s example starmap requires a function that takes two ints as arguments, whereas map would require a function that takes one argument that is a tuple of ints.
@@jacatola aaah ok, thanks. And is there any difference in efficiency, speed or something like that?
@@pablogonzalez7959 In general, itertools.starmap() can be slightly faster than map() with a lambda that unpacks or indexes the tuple, because the unpacking happens inside starmap rather than in an extra Python-level function call for each item. The lambda itself is only created once, not once per item.
Both map() and itertools.starmap() return lazy iterators in Python 3, so neither builds an intermediate list and their memory use is comparable.
That said, the difference in performance is likely to be small, and in most cases, the readability and simplicity of the code should be the primary consideration when choosing between map() and itertools.starmap().
It's worth noting that in CPython map() can be faster than list comprehensions and generator expressions when the function is predefined (for example a built-in), because the loop in map() is implemented in C; with a lambda the advantage usually disappears.
In summary, both map() and itertools.starmap() are efficient options for applying a function to each item in an input iterable; pick starmap() when the items are argument tuples, and measure before relying on any speed difference.
@@pablogonzalez7959 I don't know the CPython implementation off the top of my head, but I would only expect a difference in a case where the inner tuple is large and the map function only uses one (or a predefined set) of the elements of that tuple. In that situation starmap would result in a complexity of O(n*m) where map would be O(n * 1). The inner iterable is unlikely to be that long and the inner tuple also probably doesn't grow with your dataset, so I wouldn't worry about it. IMO where starmap makes semantic sense it also doesn't impact performance.
@@aflous ok thanks, I'll start using it then
Good stuff! 👍👍👍
Great idea 💡
Thank you! Cheers!
amazing video!
Thanks!
what an amazing video !!
Hi, can you tell me how to open the terminal in a new tab in VS Code like you do?
7:21 You can DIRECTLY iterate over the file-object to get each available line!
Btw: "file" is a built-in type! Please don't name variables like that 😉
@@ewerybody in python 2 yes, in python 3 no
8:18 why there is % after greece?
Because in the loop he uses a print statement with end="" (empty string) instead of the default value, which is "\n" (a newline). That leads to this behaviour: the output doesn't end with a newline, so the shell (zsh, for example) prints a % to mark the end position of the terminal output.
10:15 set? or is it a dictionary?
15:45 you don't need the 'perms' object ever. I'd suggest to loop directly over the itertools.permutations like you already did before :)
thank you arjan
Thanks 🙏
You’re welcome 😊
Also, a few of the functions from itertools could easily be replaced with list (generator) comprehensions. I am just wondering about the speed of that.
thank you!
You're welcome!
I use islice to slice iterables in different ways. E.g.- take n elements is simply it.islice(iterable, n)
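A few islice calls as a sketch of the "slice any iterable" idea; `count()` is used here only as a convenient infinite source.

```python
from itertools import count, islice

# Take the first n elements of an (infinite) iterator.
first_five = list(islice(count(), 5))
print(first_five)  # [0, 1, 2, 3, 4]

# Skip the first 3 elements and stop before index 8.
window = list(islice(count(), 3, 8))
print(window)  # [3, 4, 5, 6, 7]

# Start at index 1, no stop, step 2.
every_other = list(islice(range(10), 1, None, 2))
print(every_other)  # [1, 3, 5, 7, 9]
```

Unlike list slicing, islice does not accept negative indices, because it cannot know the length of the iterable in advance.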
You should import Iterator - and other interfaces like Mapping, Sequence, etc.- from collections.abc. The typing versions have been deprecated since Python 3.9.
i find that you can do the same in numpy but more efficiently. So for me i always turn to numpy instead of itertools
what is the diff between iter and gen?
Semantically, iter() exists as a builtin name but gen() does not.
Structurally, generators are a subtype of iterators, supporting not just the, well, iteration through its items, but also the sending and receiving of data to and from other components of the program.
As far as I remember, the generator is the basis upon which asynchronous processes are built in Python.
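To make the iterator/generator distinction concrete, here is a small sketch; `running_total` is a made-up example generator, and the ABCs are imported from `collections.abc`.

```python
from collections.abc import Generator, Iterator


def running_total() -> Generator[int, int, None]:
    """Yield the running total and accept new amounts via send()."""
    total = 0
    while True:
        received = yield total
        total += received


gen = running_total()
start = next(gen)      # run to the first yield
after_5 = gen.send(5)  # resume, passing 5 into the generator
after_10 = gen.send(10)
print(start, after_5, after_10)  # 0 5 15

# Every generator is an iterator, but a plain iterator is not a generator:
list_iterator = iter([1, 2, 3])
gen_is_iterator = isinstance(gen, Iterator)
list_iter_is_generator = isinstance(list_iterator, Generator)
print(gen_is_iterator, list_iter_is_generator)  # True False
```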
Cool!
I think you should have mentioned zip_longest, which I have found to be the most useful itertool. The length of a zip is normally that of the shortest iterable, but with zip_longest it's that of the longest, padding out the rest of the other iterables with None (or a fillvalue of your choice).
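A quick comparison of zip and zip_longest; the `names` and `scores` lists are invented for the example.

```python
from itertools import zip_longest

names = ["Ann", "Bob", "Cy"]
scores = [90, 85]

short = list(zip(names, scores))
padded = list(zip_longest(names, scores))
filled = list(zip_longest(names, scores, fillvalue=0))

print(short)   # [('Ann', 90), ('Bob', 85)]
print(padded)  # [('Ann', 90), ('Bob', 85), ('Cy', None)]
print(filled)  # [('Ann', 90), ('Bob', 85), ('Cy', 0)]
```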
More videos like this one!!!
Seriously, itertools is a 'secret weapon' that makes processing huge amount of data very efficient.
5:25 If iterators weren't iterable, how would for loops run over them? I thought this was the main reason for the "ambiguity" of iterators being iterables, as a sort of casting to a common type
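A short check of that point: calling iter() on an iterator hands back the very same object, which is what lets a for loop (or list()) continue from wherever the iterator currently is.

```python
numbers = [1, 2, 3]

it = iter(numbers)
same_object = iter(it) is it  # an iterator returns itself from iter()
print(same_object)  # True

first = next(it)   # consume one element by hand
rest = list(it)    # list() calls iter(it), gets the same object, and continues
print(first, rest)  # 1 [2, 3]

# The list, by contrast, hands out a fresh iterator every time.
fresh_each_time = iter(numbers) is not iter(numbers)
print(fresh_each_time)  # True
```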
these videos are great. the real question is ... how did you get vscode to give different color brackets to match the level of nesting there ? (i know, silly question to ask given the content)
"editor.bracketPairColorization.enabled": true
I have the same setup in my VScode and for the life of me, I don't know what addon does what. I remember that I googled something like "how to setup VScode for python and django" and I got ideas for addons from the first three or four results.
@@TimLauridsen you rock! much appreciated!!
@@StavrosSachtouris thanks, the other comment above showed how its done in the settings.json file. Cheers!
Just curious and slightly off-topic, but why use vscode compared to pycharm?
sometimes `itertools` is combined with `operator` e.g. `operator.add`, `operator.getitem`, `operator.attrgetter`, `operator.itemgetter` to empower .starmap even more
I have used operator functions with functools.reduce().
Fun fact: reduce() was a built-in function in Python 2, demoted to being part of the functools module in Python 3.
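A sketch of how the operator functions slot into starmap, map and reduce; the `pairs` data is made up.

```python
from functools import reduce
from itertools import starmap
from operator import add, itemgetter, mul

pairs = [(1, 2), (3, 4), (5, 6)]

sums = list(starmap(add, pairs))          # add(1, 2), add(3, 4), add(5, 6)
products = list(starmap(mul, pairs))      # mul(1, 2), mul(3, 4), mul(5, 6)
firsts = list(map(itemgetter(0), pairs))  # first item of each tuple

print(sums)      # [3, 7, 11]
print(products)  # [2, 12, 30]
print(firsts)    # [1, 3, 5]

# reduce folds a whole iterable into one value with a two-argument function.
factorial_4 = reduce(mul, [1, 2, 3, 4])
print(factorial_4)  # 24
```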
Hey Arjan, I came to know about you recently. Turns out you're different from others. Your videos are easy for beginners to understand. As I'm coming from a commerce background, I self-taught everything and am now working as a backend developer for a company. So I need your advice. Should I stick to backend (Python) or become a full-stack developer?
Thanks! I think it's always useful to have some knowledge about frontend development, even as a backend developer. Since the tools we use are changing so quickly, having frontend experience as well helps you maintain a broader view of what's out there.
Nice video overall, but what is the difference between itertools.starmap and map?
Assuming that the distinction between the two is the same as with the identically named methods of multiprocessing.Pool:
- map() with func() and [(1,2), (3,4)] calls func((1,2)) and func((3,4)).
- starmap() calls func(1, 2) and func(3, 4).
Essentially the one applies the function on each item, the other unpacks each item into the arguments of the function.
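The difference described above can be seen with the itertools and built-in versions directly; `add` and `add_pair` are hypothetical helper functions written for this illustration.

```python
from itertools import starmap

pairs = [(1, 2), (3, 4)]


def add(a: int, b: int) -> int:
    """Takes two separate arguments, as starmap expects."""
    return a + b


def add_pair(pair: tuple[int, int]) -> int:
    """Takes one tuple argument, as map expects."""
    a, b = pair
    return a + b


via_map = list(map(add_pair, pairs))     # calls add_pair((1, 2)), add_pair((3, 4))
via_starmap = list(starmap(add, pairs))  # calls add(1, 2), add(3, 4)

print(via_map, via_starmap)  # [3, 7] [3, 7]
```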
Looking through my production code: (grep -rI itertools * | grep -v -e venv -e eggs | grep import ) I get a dozen instances of "from itertools import chain".
Looking through my Project Euler code (grep -rIh itertools * | grep -v -e venv -e lambda -e eggs | grep import):
from itertools import takewhile
from itertools import accumulate, count, takewhile, dropwhile, islice
from itertools import product, permutations
from itertools import chain
from itertools import takewhile
from itertools import takewhile
So my use cases seem to be:
1. chain for joining lists in production
2. a wider variety of itertools for toy math problems.
Can you explain lexical closure, Python vs Javascript??
Very much a nit-pick, but can you clear your console between examples? I got very confused in your dataclass example why the first print was a "y"
Nice video. I suggest to zoom more into the code when you edit the video. Watching it on a phone screen is difficult
I agree with this, though I'm not sure how much effort it would take you.
Please make another for "functools" - also a default powerful lib
more-itertools is great too.
Can you do functools?
Great video, but iterators are in fact a design pattern :) Which I'm surprised you didn't mention.
Yeah but readability counts no? I find this chaining of itertools less readable than writing a good ol' function. Maybe there is a point of using itertools for the memory and speed performance?
I came here for Python, I stayed for the knitwear
I believe calling StopIteration an Error is factually incorrect and misleading, because I'm pretty sure it's a subtype of Exception, not Error (which itself is another subtype of Exception).
Correct. It's not an error but an exception.
Many of these things could be implemented with standard Python syntax like list comprehensions, maps, standard + operators and what have you. Why would you prefer this? I guess you could do partial application, and chain more easily, but readability and non-Pythonic syntax are the downside. Not entirely convinced yet...
"sometimes if you combine these things in very complex ways then its really hard to understand." Have you *seen* iterator abuse in camp Rust?! 😂😂😂
Don't bother with the dual camera setup - we're grown-ups with attention spans measured in minutes rather than TikTok-addled teens (I suspect).
It is a little sad that most of your examples would be just as easily realised with generators. Where itertools really shines is in combining iterables; you only showed 'chain' and permutations/combinations.
Also, working with lists makes it look less useful. The real power of itertools is in dealing with (potentially) infinite iterables.
when you are typing, it looks like you triggered machine gun.
Haha, yes, I speed up those parts so you don't have to wait too long while I’m typing.
+
I find myself using consume from more-itertools. It allows you to fast forward through an iterator
Is there an advantage to using itertools.repeat instead of "L = [10] * 4" or itertools.chain instead of "L = [1, 2, 3] + [4, 5, 6]"?
repeat and chain don't use extra memory, unlike L = [10] * 4, which makes a list of size 4, [10, 10, 10, 10], or L = [1, 2, 3] + [4, 5, 6], which makes a new list with the content of both, [1, 2, 3, 4, 5, 6]...
for basic examples like those you wouldn't notice anything if you test it, but with big examples you will: the program first needs to take the time to build the new bigass list before you can iterate over it, whereas with the itertools version you don't need to make a new list, so you can work on the data faster and more memory-efficiently
for example
sum([10]*(10**16)) will give you a memory error because that list is too big
but
sum(itertools.repeat(10,10**16)) will work and eventually will give you the answer (in like 60 years or so in my computer but it will get there XD )...
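A scaled-down version of that comparison, using a million items instead of 10**16 so it finishes instantly; `sys.getsizeof` reports only the size of the container object itself.

```python
import sys
from itertools import repeat

n = 1_000_000

big_list = [10] * n    # allocates room for a million references up front
lazy = repeat(10, n)   # a small object that remembers the value and a count

list_is_bigger = sys.getsizeof(big_list) > sys.getsizeof(lazy)
print(list_is_bigger)  # True

total = sum(lazy)      # consumes the repeat object one item at a time
print(total)           # 10000000
```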
@@copperfield42 Good answer. Thanks
Most of itertools is just clutter. For example, why use "itertools.chain()" rather than "+"? The only things I've used are the various perm and comb methods. But, still good because I like your videos.
I used islice() in a presentation I was giving on continued fractions. This was because I couldn’t be bothered putting a termination condition in the generator functions that were producing the coefficients, so islice() had the job of cutting an infinite series down to a finite length.
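A sketch of that continued-fraction trick. The coefficients of the square root of 2 (1, 2, 2, 2, ...) are used as an assumed example, and `evaluate` is a hypothetical helper; the original presentation code is not shown in the comment.

```python
from collections.abc import Iterable, Iterator
from fractions import Fraction
from itertools import islice


def sqrt2_coefficients() -> Iterator[int]:
    """Infinite continued-fraction coefficients of sqrt(2): 1, 2, 2, 2, ..."""
    yield 1
    while True:
        yield 2


def evaluate(coefficients: Iterable[int]) -> Fraction:
    """Collapse a finite run of coefficients into a single fraction."""
    terms = list(coefficients)
    value = Fraction(terms[-1])
    for term in reversed(terms[:-1]):
        value = term + 1 / value
    return value


# The generator never terminates; islice cuts it down to five coefficients.
approx = evaluate(islice(sqrt2_coefficients(), 5))
print(approx)         # 41/29
print(float(approx))  # roughly 1.4138
```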
You can't use "+" on generators. itertools.chain() works even if its args are generators.
In addition, even if the objects being added together with + are lists, with + you're creating a new list, consuming memory. Not a big deal if your lists are small, but if you have lists of millions of items (something quite common in genetics), you can run out of memory. Even if memory is enough, the work of copying all elements of the lists into a new list takes time.
itertools.chain() returns a lazy iterator; it does not create a new list, and it does not need to copy items. It just... iterates over the items in their current location.
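Both points can be checked in a few lines; `evens` and `odds` are made-up generator functions for the illustration.

```python
from itertools import chain


def evens():
    yield from (0, 2, 4)


def odds():
    yield from (1, 3, 5)


# Generators don't support +, so this raises a TypeError.
try:
    evens() + odds()
except TypeError as exc:
    plus_error = str(exc)
print(plus_error)

# chain() walks each argument in turn without building a combined list first.
combined = list(chain(evens(), odds()))
print(combined)  # [0, 2, 4, 1, 3, 5]

# The arguments don't even need to be the same type of iterable.
mixed = list(chain((1, 2), [3, 4], "ab"))
print(mixed)  # [1, 2, 3, 4, 'a', 'b']
```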