Considering the 300g Ray vs. Spark comparison (~15m30s-18m30s): the Spark side seems to save all the prediction outputs (`...write...save()`), but I don't see that on the Ray side (`for _ in ds.iter_batches(...): pass`). Does Ray's `iter_batches()` automatically dump the outputs somewhere? (E.g., when specifying `batch_format='pyarrow'`, do they get cached automatically in the Ray object store, or something similar?) If not, I'd argue it's not an entirely fair apples-to-apples comparison.
Can you share the full code for the audio batch inference?
In his code, the Spark writer uses `format("noop")`, which means it isn't persisting the outputs anywhere either.
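A minimal sketch of why the two benchmark endings are comparable, assuming each pipeline ends in a call like the ones below (the helper names `drain_ray_dataset` and `drain_spark_dataframe` are hypothetical, not from the talk). Both force the full lazy plan, including the inference step, to execute, and both then throw the results away instead of writing them to storage.

```python
def drain_ray_dataset(ds, batch_size=1024):
    """Consume every batch of a Ray Dataset and discard it.

    Iterating forces the lazy pipeline (e.g. any map_batches inference
    step) to run to completion; this loop itself writes nothing to storage.
    Returns the number of batches consumed.
    """
    num_batches = 0
    for _ in ds.iter_batches(batch_size=batch_size, batch_format="pyarrow"):
        num_batches += 1  # the batch is dropped immediately
    return num_batches


def drain_spark_dataframe(df):
    """Execute a Spark DataFrame's full plan without persisting the output.

    Spark's built-in "noop" data source (Spark 3.0+) runs every stage
    and then discards the rows, playing the same role as the Ray loop.
    """
    df.write.format("noop").mode("overwrite").save()
```

The point of pairing them this way is that each side pays for computing every prediction but neither pays for output I/O, so the timing difference reflects execution rather than write cost.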