I'm happy to announce that I made my first contribution to the #polars library. 🤩 Polars is becoming increasingly popular in the world of data and I can very much recommend checking it out: https://github.com/pola-rs/polars . Big thanks to @marcogorelli for supporting me! #womeninfoss
Also, does anyone with any experience using both (especially for large data frames) want to weigh in on how much better Polars is re: speed and memory?
Please do not reply with something that does not answer either of these questions. Even if you think it's really helpful. Bear in mind you have no idea what I am doing or why and I have asked these specific questions for a reason.
It seems like no matter how fancy the data science tool (PostgreSQL, Polars, DuckDB, ...), I always end up with a combination of plain text (CSV/JSONL) and LMDB as the fastest and most practical solution.
I get that for production systems those other tools are great, but for one-off academic data-processing pipelines, plain text and LMDB are the only ones that never choke.
So, how come it’s possible to write (in #Polars on #Python) dataset.filter(columnA = "1") if Python doesn’t have NSE? What am I missing or misunderstanding?
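To answer my own question in part: no NSE is needed, because Python passes keyword-argument *names* to the callee as plain string keys via `**kwargs`. A minimal sketch (a toy `Dataset` class, not Polars' actual implementation) of how such a `filter(columnA="1")` call can work:

```python
class Dataset:
    """Toy stand-in for a data frame, holding rows as a list of dicts."""

    def __init__(self, rows):
        self.rows = rows

    def filter(self, **constraints):
        # `constraints` arrives as e.g. {"columnA": "1"} -- the column name
        # is an ordinary string key, captured by standard Python evaluation,
        # so no non-standard evaluation (NSE) is required.
        return Dataset([
            row for row in self.rows
            if all(row[col] == val for col, val in constraints.items())
        ])


ds = Dataset([{"columnA": "1", "x": 10}, {"columnA": "2", "x": 20}])
print(ds.filter(columnA="1").rows)  # → [{'columnA': '1', 'x': 10}]
```

So `dataset.filter(columnA = "1")` is just a normal keyword-argument call; the library inspects the names in the received dict and builds the comparison itself.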
Until a truly performant (= fast, low memory footprint) two dimensional storage ("table") type (*) emerges, what are the options for managing big data in #perl?
5 Latest Tools You Should Be Using With Python for Data Science.
🗂️ The article provides insightful details on tools like ConnectorX, DuckDB, Optimus, Polars, and Snakemake, which could enhance data wrangling, querying, manipulation, and workflow automation capabilities.