Pushing Java to the Limits: Processing a Billion Rows in under 2 Seconds by ROY VAN RIJN

  • Published on 14 May 2024
  • For updates and more, join our community 👉 / devoxx-united-kingdom
    Last January a challenge was posted online by Gunnar Morling:
    How fast can you parse a file with one billion rows of weather data using Java?
    Little did I know this deceptively simple question would lead me down a path that taught me all about parallelism, memory-mapped files, SWAR techniques (SIMD within a register), bit twiddling, branchless code, mechanical sympathy, Graal native compilation and finally... I even turned to the dark side: using sun.misc.Unsafe.
    Join me in this deep dive where I'll explain all the code changes and tricks that took me from the reference implementation, which processes the billion records in less than 4 minutes, to processing everything in under two seconds.
    Who knew Java could be this fast?
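As an illustration of the SWAR idea named above (not code from the talk), here is a minimal sketch that locates a delimiter byte among eight bytes packed into one long. The class name SwarSketch, the helper pack and the choice of ';' as the delimiter are assumptions made for this example.

```java
final class SwarSketch {
    private static final long ONES = 0x0101010101010101L;
    private static final long HIGH_BITS = 0x8080808080808080L;

    /** Index (0-7) of the first byte equal to target, lowest byte first; 8 if absent. */
    static int firstIndexOf(long word, byte target) {
        long pattern = (target & 0xFFL) * ONES;             // target repeated in all 8 bytes
        long diff = word ^ pattern;                         // matching bytes become 0x00
        long zeroMask = (diff - ONES) & ~diff & HIGH_BITS;  // lowest set bit marks the first zero byte
        return Long.numberOfTrailingZeros(zeroMask) >>> 3;  // bit position -> byte index (64 >>> 3 == 8)
    }

    /** Hypothetical helper: packs up to 8 ASCII characters, first character in the lowest byte. */
    static long pack(String s) {
        long word = 0;
        for (int i = 0; i < s.length() && i < 8; i++) {
            word |= (s.charAt(i) & 0xFFL) << (8 * i);
        }
        return word;
    }

    public static void main(String[] args) {
        System.out.println(firstIndexOf(pack("Oslo;12."), (byte) ';')); // prints 4
    }
}
```

All eight bytes are compared with a handful of arithmetic instructions instead of a byte-by-byte loop, which is the point of the technique.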
  • Science & Technology

Comments • 16

  • @Anbu_Sampath several months ago +8

    A crazy engineering effort went into this challenge.

  • @KangoV several months ago +6

    We should soon have the "inline" keyword. A huge effort is under way in the JVM for this so that, for example, an array of inline objects will use contiguous memory. When iterating through it, you get huge speedups because you avoid all those cache misses (speedups of 30x have been seen).

  • @TechTalksWeekly several months ago +3

    Roy's talk has been featured in the last issue of the Tech Talks Weekly newsletter 🎉 Congrats!

  • @northdankota several months ago +2

    I think branchless programming is also fast because the CPU loads memory into its caches in bulk (from RAM into the L1, L2 and L3 caches). It loads the whole 64-bit block around the requested data, so not only the requested data is loaded but also the data after it, where the next section of the data sits. This adds more optimization.
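For context, "branchless" usually means replacing a data-dependent if with arithmetic so the CPU has no branch to mispredict. A minimal sketch, not taken from the talk; the class and method names are assumptions made for this example, and a JIT compiler may already emit similar code for a plain Math.min.

```java
final class BranchlessSketch {
    /** Minimum of a and b without a conditional jump. */
    static int min(int a, int b) {
        long diff = (long) a - b;   // widened so the subtraction cannot overflow
        long mask = diff >> 63;     // all ones if a < b, otherwise zero
        return (int) (b + (diff & mask));
    }

    /** Returns -value when negative == 1 and value when negative == 0. */
    static int negateIf(int value, int negative) {
        return (value ^ -negative) + negative;
    }

    public static void main(String[] args) {
        System.out.println(min(3, 7));        // prints 3
        System.out.println(negateIf(123, 1)); // prints -123
    }
}
```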

  • @JavaCodeShorts 13 days ago

    Great Talk!

  • @Dragiux several months ago +1

    As far as I remember, constant maths can be performed at compile time (for example, static final int foo = 1+1 would be written to the class file as static final int foo = 2). Could you trick the compiler into doing all of this heavy lifting and only produce a binary that contains the final results?
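A minimal sketch of the constant folding described in this comment, assuming a standard javac. The class and member names are made up for the example. It relies on two rules from the Java Language Specification: constant expressions of type String are interned, and a String concatenation that is not a constant expression creates a new object.

```java
final class ConstantFoldingSketch {
    static final int FOO = 1 + 1;               // folded by javac to the constant 2
    static final String FOLDED = "1BRC-" + FOO; // still a constant expression: folded to "1BRC-2"

    /** True: the folded constant and the literal are the same interned String. */
    static boolean foldedIsSameObject() {
        return FOLDED == "1BRC-2";
    }

    /** False: concatenation with a non-constant value happens at run time and creates a new String. */
    static boolean runtimeConcatIsSameObject() {
        String part = String.valueOf(FOO);
        return ("1BRC-" + part) == "1BRC-2";
    }

    public static void main(String[] args) {
        System.out.println(foldedIsSameObject());        // prints true
        System.out.println(runtimeConcatIsSameObject()); // prints false
    }
}
```

javac folds only constant expressions built from literals and constant variables of primitive or String type, so it would not read and aggregate a measurements file on its own.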

    • @ericnewton5720 8 days ago

      Volkswagen would love to hire you for their diesel division.

  • @kaqqao 3 days ago +1

    It's disheartening to think that in this day and age performant code still has to look this awful.
    But the talk itself is great 😃

  • @lugburzhr8081 8 days ago +1

    So to process 1 billion rows in Java in 2 seconds you need to use C/C++. Great job anyway!

  • @bilgehan 5 days ago

    It was fun and interesting until I saw Unsafe. After that it felt meaningless, an empty satisfaction, IMHO.

    • @ppsps5728 5 days ago +1

      Same feeling, like using inline asm in C/C++ 😂

    • @royvanrijn 4 days ago

      Why though?
      You don’t *need* Unsafe to do all of this; for me it was actually a fun challenge to learn and use it.

  • @notarealperson9709 several months ago

    imagine using java in 2024...

    • @MrKar18 several months ago

      Why not? Real persons do use it 😉😂 Pun intended.

    • @MakeItStik several months ago

      Curious to know what you use?

    • @kaqqao 3 days ago +1

      Imagine saying something that brainwormed in any year