Jake VanderPlas - Performance Python: Seven Strategies for Optimizing Your Numerical Code

  • Published on 28 Nov 2024

Comments • 12

  • @jfr9964 · 5 years ago · +14

    So I am working on analysing a chaotic system, and I needed to compute a large number of trajectories for different initial conditions. Even with multiprocessing and every optimization I could think of, it was taking forever. Then I watched this video, promptly added three lines of code (import numba and two of the decorators), and got a speed-up by a factor of 20. Words cannot express how grateful I am for this video.
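The comment does not include its code, so the following is only a minimal sketch of the pattern it describes. It uses the logistic map as a hypothetical stand-in for "a chaotic system": the per-step function and the loop over initial conditions each get an @njit decorator. numba.njit is a real Numba decorator; the function names and the ImportError fallback (which just runs plain Python when Numba is missing) are assumptions for illustration.

```python
import numpy as np

try:
    from numba import njit  # JIT-compiles the decorated function on its first call
except ImportError:
    # Assumed fallback for this sketch: without Numba, the functions
    # simply run as ordinary (much slower) Python.
    def njit(func):
        return func


@njit
def logistic_step(x, r):
    # One iteration of the logistic map x -> r * x * (1 - x); chaotic for r close to 4.
    return r * x * (1.0 - x)


@njit
def final_states(x0s, r, n_steps):
    # Iterate the map n_steps times for every initial condition in x0s.
    out = np.empty_like(x0s)
    for i in range(x0s.shape[0]):
        x = x0s[i]
        for _ in range(n_steps):
            x = logistic_step(x, r)
        out[i] = x
    return out


x0s = np.linspace(0.1, 0.9, 1_000)  # 1,000 different initial conditions
result = final_states(x0s, 3.9, 1_000)
```

Numba compiles on the first call. When benchmarking, time a second call so that compilation is not counted as run time.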

  • @thinkmichaelthink · 6 years ago · +20

    The seven strategies are:
    1. Line profiling (4:37)
    2. NumPy (5:33)
    3. Specialized Data Structures (9:10)
    4. Cython (12:07)
    5. Numba (13:47)
    6. Dask (15:23)
    7. Find an Existing Implementation (18:56)
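The list gives only timestamps. As an illustration of strategy 2, here is a minimal sketch (not taken from the talk) of replacing a Python-level loop with NumPy array operations. The function names and data sizes are hypothetical.

```python
import timeit

import numpy as np


def norms_loop(points):
    # Pure-Python version: the interpreter executes every multiply and add.
    out = []
    for row in points:
        total = 0.0
        for value in row:
            total += value * value
        out.append(total ** 0.5)
    return np.array(out)


def norms_vectorized(points):
    # NumPy version: squaring, the row sums and sqrt each run as one compiled loop.
    return np.sqrt((points ** 2).sum(axis=1))


rng = np.random.default_rng(0)
points = rng.random((2_000, 3))

assert np.allclose(norms_loop(points), norms_vectorized(points))
print("loop:      ", timeit.timeit(lambda: norms_loop(points), number=5))
print("vectorized:", timeit.timeit(lambda: norms_vectorized(points), number=5))
```

The exact timings depend on the machine, so none are claimed here. The point is that both functions compute the same result, but only the first pays interpreter overhead per element.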

    • @maciejurbanski6146 · 5 years ago

      Now, to make this list closer to complete:
      PyPy: a JIT like Numba, but for all your code, at the cost of compatibility
      PyTorch: think NumPy, but GPU-based (arrays -> tensors)

  • @FinallyAFreeUsername · 6 years ago · +6

    Another great JVDP talk.

  • @ErickMuzartFonsecadosSantos · 6 years ago · +9

    I was expecting one of the strategies for optimizing numerical code in Python to be GPU execution, through PyTorch/CUDA for example. Any comments on that?

    • @ErickMuzartFonsecadosSantos · 6 years ago · +8

      Ray Donnelly, executing numerical calculations on the GPU goes beyond machine-learning use cases. Take a look at general-purpose computing on GPUs: en.m.wikipedia.org/wiki/General-purpose_computing_on_graphics_processing_units
      An obvious use case for this approach would be to offload compute-intensive calculations, such as matrix multiplications, to the GPU, improving the performance of Python code. PyTorch provides functionality similar to NumPy, so it could be a viable alternative for "optimizing your numerical code".
      I would have liked feedback on this approach, benchmark references, or just experiences porting, say, Numba code to PyTorch.

    • @JJ-xi2vp · 6 years ago

      I was also a bit disappointed, although there is another good talk about CuPy, which aims to be a NumPy for the GPU. That sounds great.
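The comment only names CuPy. The sketch below is a minimal illustration, assuming CuPy's NumPy-compatible API (cupy.cuda.runtime.getDeviceCount, array.__matmul__, cupy.asnumpy). It writes the array code once against a module alias xp, uses CuPy when a CUDA device is available, and otherwise falls back to NumPy on the CPU. The function names are hypothetical.

```python
import numpy as np

try:
    import cupy
    if cupy.cuda.runtime.getDeviceCount() < 1:
        raise RuntimeError("no CUDA device")
    xp = cupy  # GPU arrays with a NumPy-like API
except Exception:
    xp = np  # CuPy missing or no usable GPU: run the same code with NumPy


def to_host(a):
    # CuPy arrays live in GPU memory; copy back to NumPy for printing, pandas, etc.
    return xp.asnumpy(a) if xp is not np else a


def power_of_averaging_matrix(n, reps):
    # Same code for both back ends: build the arrays with xp and multiply them there.
    a = xp.full((n, n), 1.0 / n)  # every entry 1/n, so a @ a == a
    b = xp.eye(n)
    for _ in range(reps):
        b = a @ b
    return to_host(b)


result = power_of_averaging_matrix(256, 10)
print("backend:", xp.__name__, "corner value:", result[0, 0])
```

Keeping the data on the device across many operations, and copying back only once at the end, is what lets the GPU pay off. Calling to_host after every step would spend much of the time on transfers.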

  • @yinzhangfred · 5 years ago · +2

    I am a bit surprised that he didn't mention PyPy, which can be fast not only for numerical computation but for string operations as well. Then again, you would have to point out that it is not yet very compatible with pandas/NumPy.

    • @magno5157 · 5 years ago

      Can PyPy compete with the speed of NumPy and pandas? Or is PyPy slower?

    • @yinzhangfred · 5 years ago

      @magno5157 PyPy and NumPy are for different things. If your data fits into RAM and is in an array/data frame format, then I'd say go for NumPy or pandas (I personally use pandas, though it uses about 4x your data size in RAM). On the other hand, if you write "vanilla" Python or need to process your data line by line (for whatever reason), PyPy is good. It can even speed up string operations, which Numba cannot.

    • @magno5157 · 5 years ago

      @yinzhangfred You've just made me curious. Could you give me examples of numerical computations that do not require arrays or data frames?

    • @yinzhangfred · 5 years ago

      @magno5157 Ah, you are right, I did not read the title of the talk carefully; it says "optimizing your NUMERICAL code". The examples I have are not quite "numerical". For example, I use PyPy to turn a large structured log file into CSV. If I hadn't come across PyPy, I would have had to deal with C.
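The commenter's log format is not given, so this is only a sketch under an assumed, hypothetical layout of "<timestamp> <level> key=value ...". It is ordinary line-by-line Python using only the standard library (re, csv). That makes it the kind of loop PyPy can often speed up without any code changes: run the same file with pypy3 instead of python3.

```python
import csv
import io
import re

# Hypothetical log layout: "<timestamp> <level> key=value key=value ..."
LINE_RE = re.compile(r"^(\S+)\s+(\w+)\s+(.*)$")


def log_to_csv(lines, out, fields):
    # Write one CSV row per well-formed log line; `fields` picks which keys become columns.
    writer = csv.writer(out)
    writer.writerow(["timestamp", "level", *fields])
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip lines that do not follow the expected layout
        timestamp, level, rest = m.groups()
        pairs = dict(kv.split("=", 1) for kv in rest.split() if "=" in kv)
        writer.writerow([timestamp, level, *(pairs.get(f, "") for f in fields)])


# For a real file: with open(src_path) as src, open(dst_path, "w", newline="") as dst: ...
sample = [
    "2024-01-01T00:00:00 INFO user=alice action=login",
    "garbage line",
    "2024-01-01T00:00:05 WARN user=bob action=retry",
]
buf = io.StringIO()
log_to_csv(sample, buf, ["user", "action"])
print(buf.getvalue())
```

Because the input is consumed one line at a time, a large log file never has to fit in memory. That is the opposite trade-off from loading everything into a pandas DataFrame.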