Brett Slatkin - How to Be More Effective with Functions - PyCon 2015

  • Published 14 Oct 2024

Comments • 54

  • @Zeturic
    @Zeturic 9 years ago +13

    In many cases, rather than creating a class to enclose the iterator, it is sufficient to provide a function which creates generators (rather than simply using a function that is a generator).
    Then, you could easily get two generators for the same file just by calling it twice.
    Also, I'm confused about why you should use 'iter(foo) is iter(foo)' rather than simply 'foo is iter(foo)' which is arguably clearer, and, based on your description, would work just as well.
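
A quick way to check the question above, as a sketch in plain Python: for ordinary containers and iterators, `foo is iter(foo)` and `iter(foo) is iter(foo)` give the same answer, because an iterator's `__iter__` returns the iterator itself while a container returns a new iterator on each call. The helper names `is_iterator_a` and `is_iterator_b` are hypothetical.

```python
def is_iterator_a(foo):
    # The talk's spelling: two iter() calls return the same object only
    # when foo's __iter__ hands back one shared object (normally foo itself).
    return iter(foo) is iter(foo)


def is_iterator_b(foo):
    # The spelling suggested above: an iterator's __iter__ returns itself.
    return foo is iter(foo)


def gen():
    yield 1
    yield 2


# Containers (list, tuple, dict) versus iterators (list iterator, generator).
for value in ([1, 2], (1, 2), {1: "a"}, iter([1, 2]), gen()):
    print(type(value).__name__, is_iterator_a(value), is_iterator_b(value))
```

Both helpers return False for the list, tuple and dict, and True for the list iterator and the generator; the second spelling just calls `iter()` once instead of twice.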

    • @rohitbhanot7809
      @rohitbhanot7809 6 years ago

      Can you explain this, maybe with an example: "it is sufficient to provide a function which creates generators (rather than simply using a function that is a generator)"?
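
One way to read the suggestion in the first comment, as a hedged sketch: instead of passing a generator (which can be consumed only once), pass a zero-argument function that builds a fresh generator every time it is called. The names `make_values`, `normalize_func` and the sample numbers are hypothetical; in a file-reading scenario the factory would open and read the file instead of walking an in-memory list.

```python
def make_values(data):
    # A generator function: every call returns a new, independent generator.
    for value in data:
        yield value


def normalize_func(get_iter):
    # get_iter is a callable that returns a fresh iterator on each call,
    # so this function can safely make two passes over the data.
    total = sum(get_iter())
    return [100 * value / total for value in get_iter()]


numbers = [15, 35, 80]  # hypothetical sample data
# The lambda plays the role of "a function which creates generators".
percentages = normalize_func(lambda: make_values(numbers))
print(percentages)

# By contrast, a single generator is exhausted after the first pass.
once = make_values(numbers)
print(sum(once), list(once))
```

Calling the factory twice gives two independent generators, which is the same effect a container class with a generator `__iter__` provides, without defining a class.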

  • @brnddi
    @brnddi 5 years ago +4

    my right ear loved this talk.

  • @simonmasters3295
    @simonmasters3295 5 years ago

    So this is about:
    (a) Not being sure how big the list being processed is (it may be different at run time than at development time)
    (b) Necessarily processing the entire list to get the total population (and the time that takes)
    (c) The possibility of list members changing values, or their order changing (the CSV being rewritten during or between executions)
    At another level, it is about the fact that two iterations (of the same code) might do different things even though they process the same list, set, or whatever. The sentinel idea seems good, BUT if iter takes a long time, `iter(x) is iter(x)` is a rubbish test even if it works.
    The foo/bar example generalises this, and the % example is familiar to everyone. What we are touching on here is that the list might contain a million rows, normalise might take ages, and only by knowing the data and testing for outliers and exceptions can you hope to achieve the intelligence and robustness we are talking about.
    Personally, I like Python's lists and sets, but there is a lot to be said for an approach that:
    (a) Calculates the total population once
    (b) Range-checks it, expecting a value consistent with some known, historically accurate figure
    (c) Calculates x as a percentage of this total on the fly, whenever you want it
    What you obviously don't want to do is recalculate the state total each time you want the percentage for another city.
    In SQL we use the concept of index coverage (where the index expression, TownName, is extended to include the town's population, so the database engine does not need to perform a row lookup).
    I think some non-foo/bar, non-trivial examples would improve the talk, but hats off to the presenter for introducing the subject.
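
The three-step approach in the comment above can be sketched as follows, assuming the data is a list of `(city, population)` pairs. The function names, the bounds passed to `check_total`, and the sample figures are made-up placeholders, not real statistics.

```python
def check_total(total, low, high):
    # Step (b): sanity-check the total against an expected historical range.
    if not low <= total <= high:
        raise ValueError(f"total {total} outside expected range [{low}, {high}]")
    return total


def make_percent_lookup(rows, low, high):
    # Step (a): compute the total once, in a single pass over the data.
    total = check_total(sum(pop for _, pop in rows), low, high)
    by_city = dict(rows)

    # Step (c): percentages are computed on demand from the cached total,
    # so asking about many cities never recomputes the sum.
    def percent(city):
        return 100 * by_city[city] / total

    return percent


rows = [("Alpha", 200), ("Beta", 300), ("Gamma", 500)]  # hypothetical data
percent = make_percent_lookup(rows, low=900, high=1100)
print(percent("Beta"))
```

The closure keeps the checked total around, which is the in-memory analogue of the "compute once, look up many times" idea the comment compares to a covering index.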

  • @whoisntwhoisit2126
    @whoisntwhoisit2126 6 years ago +2

    At around 18:00 he talks about the two methods, list and generator, and says the generator was less code, but in reality he had to type more afterwards to get the same result (the next() calls). If he added the for loop to the generator version to display the results the same way, wouldn't it be almost the same amount of code? Is it better to code the generator and keep looping to get results, or to have the list just display them? Is it all about memory usage and availability?

    • @Egzvorg
      @Egzvorg 6 years ago

      Instead of typing next() calls, you usually iterate over the result, so it makes no difference in usage. If you do need to pick specific elements, that would be a reason to use a list instead.
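
To illustrate the reply above, a sketch comparing the two styles with hypothetical functions (not necessarily the talk's exact example): both are consumed the same way with a `for` loop, and explicit `next()` calls are only needed when stepping through a generator by hand. Indexing such as `[2]` works only on the list.

```python
def word_starts_list(text):
    # List version: builds the whole result in memory, then returns it.
    result = []
    if text:
        result.append(0)
    for index, letter in enumerate(text):
        if letter == " ":
            result.append(index + 1)
    return result


def word_starts_gen(text):
    # Generator version: yields each start index lazily, one at a time.
    if text:
        yield 0
    for index, letter in enumerate(text):
        if letter == " ":
            yield index + 1


text = "four score and seven"
# Both are consumed the same way with a for loop; no next() calls needed.
for start in word_starts_gen(text):
    print(start)

# Indexing requires a list (or list(word_starts_gen(text))).
print(word_starts_list(text)[2])
```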

  • @benjaminfranklin4480
    @benjaminfranklin4480 9 years ago +6

    Can someone explain this line to me:
    total = sum( ... )
    What does the underscore mean?

    • @mmoran0032
      @mmoran0032 8 years ago +13

      +Benjamin Franklin Since you get both the city name and the population, but you only want the population numbers, you use _ to ignore the city names. You could easily replace it with a named variable, but it is a shorthand way of saying, "I don't need this part of whatever I'm unpacking, so just ignore it." You can also use multiple underscores in a single return, so if something gives back four values but you only care about the third one (for now), you can use _, _, whatIWant, _ = returningFour()
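
A small illustration of the explanation above, with hypothetical data and a hypothetical `returning_four()` helper: `_` is an ordinary variable name that, by convention, marks a value you intend to ignore.

```python
cities = [("Springfield", 120), ("Shelbyville", 80)]  # hypothetical (name, population) pairs

# The city name is unpacked into _ and ignored; only the population is summed.
total = sum(population for _, population in cities)


def returning_four():
    return "a", "b", "c", "d"


# Only the third value is kept; the others are bound to _ and discarded.
_, _, what_i_want, _ = returning_four()
print(total, what_i_want)
```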

  • @wolfgangblaszczyk6359
    @wolfgangblaszczyk6359 4 years ago +1

    Helpful presentation, but the presenter's video window overlaps the slides, which is annoying because it hides some information.

  • @zhenjiangxu
    @zhenjiangxu 8 years ago +2

    You can use "from collections.abc import Iterator; isinstance(it, Iterator)" to test whether an object is an iterator, which is more readable than "iter(a) is iter(a)", IMHO.

    • @Homerojay79
      @Homerojay79 8 years ago

      +zech xu
      I think the point he was trying to make was about exhausted iterators and the fact that iter(x) returns x itself if x is an iterator.
      The iter(a) is iter(a) bit was just something to make you curious at the beginning of the talk.
      On the other hand, when you need to write code with performance in mind, you sometimes sacrifice readability, e.g.:
      python -mtimeit "from collections import Iterator;s = iter([1,2]);isinstance(s, Iterator)"
      100000 loops, best of 3: 10.1 usec per loop
      python -mtimeit "iter([1,2]) is iter([1,2])"
      1000000 loops, best of 3: 1.56 usec per loop

    • @zhenjiangxu
      @zhenjiangxu 8 years ago +1

      +Nrai Good to know the benchmark. The time difference is probably due to the import.
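
A hedged way to test the guess above: put the import in `timeit`'s `setup` argument so it is not counted, and time both checks on the same existing iterator. This uses `collections.abc`, since the plain `collections` aliases such as `collections.Iterator` were removed in Python 3.10. Absolute numbers vary by machine and Python version, so none are quoted here.

```python
import timeit

# The import runs once in setup and is excluded from the timing.
setup = "from collections.abc import Iterator; s = iter([1, 2])"

abc_check = timeit.timeit("isinstance(s, Iterator)", setup=setup, number=100_000)
iter_check = timeit.timeit("iter(s) is iter(s)", setup=setup, number=100_000)

print(f"isinstance(s, Iterator): {abc_check:.4f} s per 100k calls")
print(f"iter(s) is iter(s):      {iter_check:.4f} s per 100k calls")
```

Unlike the benchmark above, this version does not rebuild the list inside the timed statement, so it compares only the two checks themselves.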

  • @henrywallace1408
    @henrywallace1408 9 years ago +3

    Excellent flow of topics in the talk!
    On `iter(x) is iter(x)`: it seems obscure to me. I think it would be clearer to just write `isinstance(x, collections.Container)`.

    • @АндрейБеньковский-ш5к
      @АндрейБеньковский-ш5к 8 years ago

      +Henry Wallace According to the documentation, `collections.Container` is the ABC for classes that provide the __contains__ method. Checking `isinstance(x, collections.Container)` doesn't tell you anything about the behavior of `iter(x)`. It's better to check `isinstance(x, collections.Iterator)` or `x is iter(x)`.

    • @copperfield42
      @copperfield42 8 years ago

      +Henry Wallace In this case you have to check against collections.Iterator (don't confuse it with Iterable) to accomplish the same thing as iter(x) is iter(x).

    • @Egzvorg
      @Egzvorg 6 years ago

      It will not accomplish the same thing. isinstance(generator, Iterable) returns True, and the generator will be exhausted.
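
To make the distinction in this thread concrete, a sketch using `collections.abc` (the current home of these ABCs): a generator is an `Iterable` and an `Iterator` but not a `Container`, while a list is a `Container` and an `Iterable` but not an `Iterator`. Only the `Iterator` check lines up with what `iter(x) is iter(x)` detects.

```python
from collections.abc import Container, Iterable, Iterator


def gen():
    yield 1


for obj in ([1, 2], gen()):
    print(
        type(obj).__name__,
        isinstance(obj, Container),  # defines __contains__
        isinstance(obj, Iterable),   # defines __iter__
        isinstance(obj, Iterator),   # defines __iter__ and __next__
        iter(obj) is iter(obj),
    )
```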

  • @AvivCMusic
    @AvivCMusic 9 years ago

    Interesting talk. A question: do you mean we should *never* pass generators/iterators to functions? Only wrapping iterables?
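
One possible answer, consistent with how the rest of this thread describes the talk: generators are fine for functions that iterate once. When a function needs two passes, it can reject plain iterators and accept a container whose `__iter__` is a generator, so every pass gets a fresh iterator. `Values`, `normalize_defensive` and the sample data are hypothetical names for this sketch.

```python
from collections.abc import Iterator


class Values:
    # Container: each iter() call runs __iter__ again, which returns a
    # brand-new generator, so the data can be traversed any number of times.
    def __init__(self, data):
        self.data = data

    def __iter__(self):
        for value in self.data:
            yield value


def normalize_defensive(numbers):
    # This function loops over numbers twice, so a one-shot iterator
    # (such as a generator) would silently give wrong results; reject it.
    if isinstance(numbers, Iterator):
        raise TypeError("Must supply a container, not an iterator")
    total = sum(numbers)
    return [100 * value / total for value in numbers]


print(normalize_defensive([1, 3]))          # a list works
print(normalize_defensive(Values([1, 3])))  # so does the container class
```

Passing a bare generator such as `(x for x in [1, 3])` raises `TypeError` instead of quietly returning an empty list.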

  • @RobinAndeer
    @RobinAndeer 9 years ago +8

    Great examples - I learned something new! On the last example I've used "itertools.tee" to create independent iterators to iterate multiple times over a single generator. This way you could support iterables of any kind.
    Reference: docs.python.org/3/library/itertools.html#itertools.tee

    • @BrettSlatkin
      @BrettSlatkin 9 years ago +3

      Robin Andeer Great!
      But note the big warning label at the bottom of the itertools.tee documentation:
      """
      This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().
      """
      So in the case of having one branch of the tee consume the whole iterator before the other, or iterating it multiple times, you'll end up buffering everything in memory anyways. See the implementation here:
      hg.python.org/cpython/file/04f714765c13/Modules/itertoolsmodule.c#l389

    • @RobinAndeer
      @RobinAndeer 9 years ago

      Brett Slatkin I understand. Yeah, it's never a good idea to think there's something magical about an implementation. I'll see whether the class example might fit my needs but in my case I'm dealing with a pipeline of generators.
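
A sketch of the trade-off described in this thread: `itertools.tee` does give two independent iterators over one generator, but if one branch is fully consumed first (as when computing a total), `tee` has to buffer every item for the other branch, so memory use ends up similar to calling `list()`. `normalize_tee` is a hypothetical name.

```python
from itertools import tee


def normalize_tee(numbers):
    # Split one iterator into two independent ones.
    for_total, for_values = tee(numbers, 2)
    # Consuming for_total completely forces tee to keep every item
    # in an internal buffer until for_values reads it.
    total = sum(for_total)
    return [100 * value / total for value in for_values]


print(normalize_tee(x for x in [1, 3]))
```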

  • @malikrumi1206
    @malikrumi1206 7 years ago +3

    Isn't there a rule somewhere that you shouldn't use the names of keywords and builtins as variables? I was halfway thru the video totally confused until I realized pop = population, not pop().

  • @Actanonverba01
    @Actanonverba01 8 years ago +1

    Great ideas for a newbie like myself

  • @fenryrtheshaman
    @fenryrtheshaman 3 years ago

    ....why raise an exception when you can just turn the iterator into a container and use that?
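
The alternative raised in this comment, as a sketch: copy the input into a list once and iterate over the copy. This accepts any iterable, generators included, at the cost of holding all values in memory, which is exactly what a generator-based pipeline may be trying to avoid. `normalize_copy` is a hypothetical name.

```python
def normalize_copy(numbers):
    # list() consumes an iterator exactly once; the copy can then be
    # traversed as many times as needed.
    values = list(numbers)
    total = sum(values)
    return [100 * value / total for value in values]


print(normalize_copy(x for x in [1, 3]))
```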

  • @firstnamelastname7248
    @firstnamelastname7248 7 years ago

    Most people, including me, are unhappy with `iter(a) is iter(a)`, which is not easily understood. After digging around for a few days, I prefer to use the following tested code:
    if hasattr(a, "__next__"):
        raise TypeError('Exhausted iterable not allowed. If a has __next__, it will be exhausted')
    Correct me if I am wrong.

    • @lbbc33
      @lbbc33 5 years ago +1

      I agree with you. This is a better approach, even better than `isinstance(foo, collections.abc.Iterator)`, because your solution will keep working unless Python breaks backwards compatibility.
      Your error message is not accurate at all, though. Even when a generator is already exhausted, it'll still have the '__next__' attribute, and not only that, you'll be able to call it again and again, getting a `StopIteration` exception every time. IMHO this is a design flaw in Python. What you check with `hasattr` is whether the object is an Iterator rather than just an Iterable.
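
A quick demonstration of the point above: an exhausted generator still has `__next__`, and every further `next()` call raises `StopIteration`, so the `hasattr` test identifies iterators in general, not exhausted ones.

```python
def gen():
    yield 1


g = gen()
print(list(g))                 # consumes the only value
print(hasattr(g, "__next__"))  # still True after exhaustion

for _ in range(2):
    try:
        next(g)
    except StopIteration:
        print("StopIteration")  # raised on every call once exhausted
```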

    • @fenryrtheshaman
      @fenryrtheshaman 3 years ago

      You can technically define __next__ for a class that isn't really an iterator:
      class foo:
          def __next__(self):
              return 0
      print(hasattr(foo, '__next__'))
      But this is an edge case. Regardless, personally, I'd just unpack the iterator into a container and use that instead of raising an exception.
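
Extending the edge case above: `hasattr(obj, "__next__")` accepts an object whose class defines only `__next__`, while `collections.abc.Iterator` also requires `__iter__`, and `iter()` itself rejects such an object. `OnlyNext` is a hypothetical class for this sketch.

```python
from collections.abc import Iterator


class OnlyNext:
    # Defines __next__ but not __iter__, so it is not a real iterator.
    def __next__(self):
        return 0


obj = OnlyNext()
print(hasattr(obj, "__next__"))   # the hasattr test passes
print(isinstance(obj, Iterator))  # the ABC check fails: no __iter__

try:
    iter(obj)
except TypeError as exc:
    print("iter() refused it:", exc)
```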

  • @yzlyzl1
    @yzlyzl1 9 years ago

    Excellent!

  • @SumoCumLoudly
    @SumoCumLoudly 5 years ago +2

    Weak case for using generators. Use them in the extraordinarily rare case where you might have massive data; in general you're usually better off with a list. Generators can also only be iterated over once, which rules them out for a lot of cases. They are useful in rare cases, not a replacement for lists.

  • @innstikk
    @innstikk 9 years ago

    Why not log(message, values=[]) rather than log(message, *values)? To me, *values is a dodgy C way of doing it.

    • @Xehanort94Ger
      @Xehanort94Ger 9 years ago

      +innstikk So you don't have to write the square brackets

    • @rafehqazi8539
      @rafehqazi8539 9 years ago

      +Xehanort94Ger you don't have to write the square brackets if you do values=[]

    • @Xehanort94Ger
      @Xehanort94Ger 9 years ago

      Rafeh Qazi you will have to if you want to pass actual parameters that are not already in a list

    • @rafehqazi8539
      @rafehqazi8539 9 years ago

      +Xehanort94Ger
      >>> def foo(x=1, y=[]):
      ...     return None
      ...
      >>> foo()

    • @Xehanort94Ger
      @Xehanort94Ger 9 years ago

      Rafeh Qazi Try calling it with at least two parameters and you will need the brackets (Well in your case use three parameters, because your function does not actually use the y in any list specific manner)
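
The calling difference this thread is arguing about, as a sketch with hypothetical `log_star` and `log_list` functions: with `*values`, callers pass zero or more extra arguments directly; with a list parameter, callers build the list themselves. The list version defaults to `None` rather than `[]`, because a `[]` default is a single list object shared across all calls.

```python
def log_star(message, *values):
    # values arrives as a tuple of however many extra arguments were passed.
    if not values:
        return message
    return f"{message}: {', '.join(str(v) for v in values)}"


def log_list(message, values=None):
    # A None default avoids sharing one mutable [] between calls.
    if not values:
        return message
    return f"{message}: {', '.join(str(v) for v in values)}"


print(log_star("My numbers are", 1, 2))    # no brackets needed
print(log_list("My numbers are", [1, 2]))  # caller writes the brackets
print(log_star("Hi there"))                # both accept no extra values
print(log_list("Hi there"))
```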

  • @SumoCumLoudly
    @SumoCumLoudly 5 years ago +2

    So he created a problem by needlessly using a generator instead of a list to save 2 lines in a function, then solves the problem by turning the function into a class.
    Horrific advice.

    • @WeiqiSub
      @WeiqiSub 5 years ago

      “Needless”. You obviously don’t get to deal with a lot of data.

    • @SumoCumLoudly
      @SumoCumLoudly 5 years ago

      @@WeiqiSub I do, but I only use Python to get the data; I'm using C++ to work with the databases. In general, it's very rare that your RAM is in danger, and advocating using generators over lists as standard is bad advice IMO, as shown by the needless creation of a problem in the example.