Faster and Cheaper Offline Batch Inference with Ray

  • Published Nov 28, 2024

Comments • 3

  • @Mohith7548 · 10 months ago

    Can you share the full code for the audio batch inference?

  • @AnnerdeJong · 7 months ago

    Considering the 300GB Ray vs. Spark comparison (~15m30s–18m30s): the Spark side seems to save all the prediction outputs (`...write...save()`), but I don't see that on the Ray side (`for _ in ds.iter_batches(...): pass`). Does Ray's `iter_batches()` automatically dump outputs somewhere? (E.g., when specifying `batch_format='pyarrow'`, do the batches get automatically cached in the Ray object store or something similar?)
    If not, I'd argue it's not an entirely fair apples-to-apples comparison. (See the Ray sketch after this thread.)

    • @fenderbender28 · 7 months ago · +2

      In his code, the Spark writer uses `format("noop")`, which means it also isn't persisting the outputs anywhere. (See the Spark sketch below.)
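
For context, here is a minimal sketch of the Ray Data pattern the first comment describes. The input path and the `predict` function are hypothetical placeholders for the talk's actual data and model; the point is that `iter_batches()` only streams result batches back to the caller, so iterating and discarding them drives the full pipeline without writing outputs to storage.

```python
import pyarrow as pa
import ray

def predict(batch: pa.Table) -> pa.Table:
    # Hypothetical stand-in for the talk's model inference step:
    # appends a constant "pred" column to each PyArrow batch.
    return batch.append_column("pred", pa.array([0.0] * batch.num_rows))

ds = ray.data.read_parquet("s3://bucket/data/")  # hypothetical path
ds = ds.map_batches(predict, batch_format="pyarrow")

# Consuming the batches and throwing them away forces full execution;
# nothing is persisted beyond Ray's normal handling of in-flight blocks.
for _ in ds.iter_batches(batch_format="pyarrow"):
    pass
```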
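And the Spark side of the reply: Spark's built-in `noop` data source (available since Spark 3.0) executes the entire write job, including any upstream inference work, but discards every output row, which is why the reply argues the comparison is fair. A minimal sketch, again with a hypothetical input path:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("noop-sink-demo").getOrCreate()

df = spark.read.parquet("s3://bucket/data/")  # hypothetical path

# The "noop" sink triggers the full computation but persists nothing,
# mirroring Ray's iterate-and-discard loop above.
df.write.mode("overwrite").format("noop").save()
```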