I have neither the knowledge nor the time to do proper benchmarking, but I was using pandas' "append" to combine about 8000 CSV files (about 10 GB in total), and it was taking almost an hour and a half. I decided to try Polars. According to Stack Overflow I could use concat, vstack, or extend; I picked "vstack" at random, and it did the same workload in less than 1 minute. Same computer, same Python version, same everything. All I had to do was modify the script a little, for example removing "index = False" when exporting the resulting (huge) dataframe to CSV.
Impressive!
The API is very similar to PySpark's. In fact, I don't think it would be a hassle to convert existing pipelines to Polars.