kellogh,
@kellogh@hachyderm.io

Polars is the ideal project, imo. it hits all the important things for me:

  • replacing pandas
  • performance engineering
  • integrates with a large open ecosystem instead of creating a walled garden
  • pleasant to use

https://github.com/pola-rs/polars/releases/tag/rs-0.36.2

fohrloop,

@kellogh Interesting. I've never tried polars. What made you switch from pandas to polars?

kellogh,
@kellogh@hachyderm.io

@fohrloop the first thing? idk i think i had to process a huge JSON file with logs. it took like 20 seconds for pandas to process it, so i tried polars instead. the code was much cleaner, and the latency wasn’t even noticeable, like 0.1 seconds. not even worth caching

matzipan,
@matzipan@hachyderm.io

@kellogh I'm a rust fan, but I didn't find polars very nice to use. After wasting a few hours with it, I eventually gave up, went back to python pandas, and finished the task in 5 minutes... Maybe my pandas muscle memory just didn't transfer to polars, but it didn't go smoothly

mo8it,
@mo8it@fosstodon.org

@matzipan @kellogh This was my experience at the beginning too. A new API requires some time to get used to before reaching a good productivity level.

kubikpixel,
@kubikpixel@chaos.social

deleted_by_author

  • kellogh,
    @kellogh@hachyderm.io

    @kubikpixel @mo8it @matzipan I find Pandas to be generally quite slow and Polars to be blazing fast. So I tend to use Polars to read & manipulate data, and then convert it to a Pandas dataframe for visualization and integration with ML libraries. I'm used to Spark's data frames, so while Polars is different, I find it easy to grok. It's also so stinking fast that you don't even need to bother with caching

    matzipan,
    @matzipan@hachyderm.io

    @kellogh @kubikpixel @mo8it have you tried using polars from python? Is there a speed penalty there as well?

    kellogh,
    @kellogh@hachyderm.io

    @matzipan @kubikpixel @mo8it yeah, that's how i use it. As with all python, there's always a speed penalty, and the way around it is to stay out of Python-land. Polars does a much better job of this than Pandas because its lazy API keeps the data processing in Rust-land. So in practice, there's hardly any performance penalty, since you're really just using Python to configure a highly optimized data processing pipeline

    matzipan,
    @matzipan@hachyderm.io

    @kellogh @kubikpixel @mo8it ah okay my problem was from using polars in rust. Polars in python did not seem too bad
