A Billion Rows per Second: Metaprogramming Python for Big Data

  • Published on 18 Aug 2024

Comments • 12

  • @supergopi
    @supergopi 7 years ago +4

    blew my mind! very insightful!

  • @maximilian19931
    @maximilian19931 3 years ago

    Bigtable by Google, or just Sorted String, can be used for analytic data (GA is using Bigtable as a backing storage).

  • @denniscoder2439
    @denniscoder2439 10 years ago

    Really helpful, excellent work dude!

  • @zhanxw
    @zhanxw 11 years ago +1

    To compress the big data matrix, I notice you need to sort it first. How do you perform the sorting efficiently in this case?

  • @clot8
    @clot8 10 years ago +1

    Hi, if new rows are added, would they be sorted and processed?

  • @Colstonewall
    @Colstonewall 11 years ago

    Thanks

  • @AnkurVarsheny
    @AnkurVarsheny 10 years ago +1

    It's no longer a database; you are querying pre-processed data. What happens when you need to add new rows to your data?

    • @clot8
      @clot8 10 years ago

      Hmm, you've got a point, but I am assuming new rows get processed and sorted as well?

    • @clot8
      @clot8 10 years ago

      cartman, but it will be inefficient at best?

    • @jfolz
      @jfolz 8 years ago +1

      +Ankur Varsheny Watch till the end. Your question is answered. The tool he presents is intended for analytics, so live updates are the opposite of what you want. The data needs to be a static snapshot to give comparable results across multiple queries. That's also why warming the caches and taking the shortest runtime is not cheating, because you can expect the analyst to make many consecutive queries that need to access all of the data. Time spent pre-processing the data pays off many times over, because it saves time on every answer later.
      The whole point here is that they built an optimized tool for their domain, so the generic database has no hope of keeping up, even if it's magic. And to top it all off they did it in Python, which everyone and their mom will tell you is the anti-language for high-throughput, number-crunching applications. Only C, Fortran etc. can be fast like that ;)
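
      A rough sketch of the trade-off described in the reply above, not the tool from the talk: pre-process a static snapshot once (here, simply sorting it), then answer many queries against it, timing each query as the shortest of several runs so that later runs hit warm caches. The data, `count_naive`, `count_sorted`, and `best_of` are all hypothetical names made up for illustration.

```python
import bisect
import random
import timeit

# Hypothetical static snapshot: a single column of integer keys.
random.seed(0)
snapshot = [random.randrange(1000) for _ in range(100_000)]


def count_naive(rows, key):
    """Scan every row on each query (no pre-processing)."""
    return sum(1 for r in rows if r == key)


# One-off pre-processing step, paid once for the whole snapshot.
sorted_rows = sorted(snapshot)


def count_sorted(rows, key):
    """Count matches by binary-searching the run of equal keys.

    Only valid when `rows` is already sorted.
    """
    return bisect.bisect_right(rows, key) - bisect.bisect_left(rows, key)


def best_of(func, *args, repeat=5):
    """Return the shortest wall-clock time over `repeat` single runs."""
    return min(timeit.repeat(lambda: func(*args), number=1, repeat=repeat))


# Both approaches must give the same answer on the same snapshot.
assert count_sorted(sorted_rows, 42) == count_naive(snapshot, 42)

naive_time = best_of(count_naive, snapshot, 42)
sorted_time = best_of(count_sorted, sorted_rows, 42)
print(f"naive scan: {naive_time:.6f}s, pre-sorted lookup: {sorted_time:.6f}s")
```

      The sort is paid once, while every later query only pays for two binary searches, which is the sense in which pre-processing "pays off many times over" on a snapshot that does not change. Appending new rows would invalidate `sorted_rows`, which is why the sketch assumes static data.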

  • @AlmirBispo-CSV-Comp-DB
    @AlmirBispo-CSV-Comp-DB 6 years ago +1

    JSON is a shit format. I prefer CSV (CSV Com DB is the future).