maren, to random
@maren@fosstodon.org avatar

I'm happy to announce that I made my first contribution to the library. 🤩 Polars is becoming increasingly popular in the world of data and I can very much recommend checking it out: https://github.com/pola-rs/polars . Big thanks to @marcogorelli for supporting me!

sergi, to random
@sergi@floss.social avatar

Client libraries are better when they have no API: https://csvbase.com/blog/7

swatantra, to datascience
@swatantra@fosstodon.org avatar

Has anyone tried dataframes in #R? How was your experience, especially when working with a large dataset?

@rstats

astronomerritt, to python
@astronomerritt@hachyderm.io avatar

#Python folk: is there any reason to use #Pandas over #Polars?

Also, does anyone with any experience using both (especially for large data frames) want to weigh in on how much better Polars is re: speed and memory?

Please do not reply with something that does not answer either of these questions. Even if you think it's really helpful. Bear in mind you have no idea what I am doing or why and I have asked these specific questions for a reason.
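
One architectural difference behind the speed/memory question: pandas evaluates each step eagerly and materializes intermediates, while Polars builds a lazy query plan and streams through it. A plain-Python sketch of that distinction (illustrative only, not either library's code):

```python
# Eager pipeline (pandas-style): every step materializes a full
# intermediate result in memory before the next step runs.
def eager_pipeline(rows):
    doubled = [r * 2 for r in rows]           # full intermediate list
    positive = [r for r in doubled if r > 0]  # another full list
    return sum(positive)

# Lazy pipeline (Polars-style): steps are composed up front and only
# evaluated when the result is requested, streaming one element at a
# time with no intermediate allocations.
def lazy_pipeline(rows):
    doubled = (r * 2 for r in rows)           # generator, nothing computed yet
    positive = (r for r in doubled if r > 0)
    return sum(positive)                      # single pass at "collect" time

data = [-3, 1, 4, -1, 5]
assert eager_pipeline(data) == lazy_pipeline(data) == 20
```

The lazy form also lets an engine reorder and fuse steps before running them, which is where much of the speed and memory advantage comes from.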

fohrloop, to python
@fohrloop@fosstodon.org avatar

Using #Polars or #DuckDB for an interactive dashboard, where all data might not fit into memory (need for streaming algorithms / out-of-core computing).

Which one would you suggest? Both seem to be pretty awesome!

https://github.com/pola-rs/polars
https://github.com/duckdb/duckdb
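
Both engines address the larger-than-memory case by streaming: Polars through its lazy engine, DuckDB through out-of-core SQL execution. The underlying idea, in a stdlib-only sketch (hypothetical helper, not either library's API):

```python
import csv
import io

def streaming_mean(fileobj, column):
    """Compute a column mean in a single pass with O(1) memory --
    the kind of work a streaming engine performs without ever
    loading the whole file."""
    total = 0.0
    count = 0
    for row in csv.DictReader(fileobj):  # reads one row at a time
        total += float(row[column])
        count += 1
    return total / count

data = io.StringIO("price\n1.0\n2.0\n3.0\n")
assert streaming_mean(data, "price") == 2.0
```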

peter_mcmahan, to datascience
@peter_mcmahan@mas.to avatar

It seems like no matter how fancy the data science tool (PostgreSQL, Polars, DuckDB, ...), I always end up with a combination of plain text (CSV/jsonl) and LMDB as the fastest and most practical solution.

I get that for production systems those other tools are great, but for one-off academic data-processing pipelines, plain text and LMDB are the only ones that never choke.
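
The plain-text half of that workflow needs nothing beyond the standard library. A minimal jsonl writer/reader sketch (hypothetical helpers):

```python
import json
import os
import tempfile

def write_jsonl(path, records):
    """Append-friendly plain text: one JSON document per line."""
    with open(path, "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")

def read_jsonl(path):
    """Stream records back one at a time -- no full-file load."""
    with open(path) as f:
        for line in f:
            yield json.loads(line)

path = os.path.join(tempfile.mkdtemp(), "events.jsonl")
write_jsonl(path, [{"id": 1, "ok": True}, {"id": 2, "ok": False}])
assert [r["id"] for r in read_jsonl(path)] == [1, 2]
```

Because each line is independent, jsonl files also survive partial writes and can be processed with ordinary shell tools, which is part of why they rarely "choke".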

brodriguesco, to python
@brodriguesco@fosstodon.org avatar

So, how come it's possible to write dataset.filter(columnA = "1") if Python doesn't have NSE? What am I missing or misunderstanding?
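
One likely answer: no NSE is needed, because Python's **kwargs already delivers argument *names* to the callee as plain strings at call time. A hypothetical Dataset class (not any real library's implementation) showing the mechanism:

```python
# dataset.filter(columnA="1") requires no non-standard evaluation:
# **kwargs hands the method the argument names as ordinary dict keys.
class Dataset:
    def __init__(self, rows):
        self.rows = rows  # list of dicts, one per row

    def filter(self, **conditions):
        # conditions == {"columnA": "1"} -- the column name arrived
        # as a string key, no expression capture required.
        return Dataset([
            row for row in self.rows
            if all(row.get(col) == val for col, val in conditions.items())
        ])

ds = Dataset([{"columnA": "1", "x": 10}, {"columnA": "2", "x": 20}])
assert ds.filter(columnA="1").rows == [{"columnA": "1", "x": 10}]
```

In R, `filter(df, columnA == "1")` must capture an unevaluated expression; in Python the keyword-argument syntax sidesteps that entirely, at the cost of only supporting equality-style conditions this way.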

kellogh, to opensource
@kellogh@hachyderm.io avatar

#Polars is the ideal open source project, imo. It hits all the important things for me:

  • replacing #pandas
  • performance engineering
  • integrates with a large open ecosystem instead of creating a walled garden
  • pleasant to use

https://github.com/pola-rs/polars/releases/tag/rs-0.36.2

andrew, to random
@andrew@fediscience.org avatar

Regular PSA that @grrrck 's tidyexplain animations are phenomenal for visualizing what happens with all of {dplyr}'s join functions and {tidyr}'s pivot_wider and pivot_longer (see all of them here: https://www.garrickadenbuie.com/project/tidyexplain/)

Animation showing how left_join combines two datasets

Volker,
@Volker@fosstodon.org avatar

@andrew @grrrck Super useful. Has anyone adapted these for #pandas or #polars yet? Might be handy for #python users.
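
For anyone porting the animations' examples, the left_join semantics they visualize can be sketched in plain Python (hypothetical helper, not any library's API): every left row is kept, matching right columns are attached, and misses are filled with None.

```python
def left_join(left, right, on):
    """dplyr-style left_join over lists of dicts: keep every row of
    `left`, attach matching columns from `right`, None when no match."""
    index = {}
    for row in right:
        index.setdefault(row[on], []).append(row)
    right_cols = [k for k in (right[0] if right else {}) if k != on]
    joined = []
    for lrow in left:
        # no match -> a single all-None stand-in row on the right side
        for rrow in index.get(lrow[on], [dict.fromkeys(right_cols)]):
            joined.append({**lrow, **{k: rrow.get(k) for k in right_cols}})
    return joined

bands = [{"name": "Mick", "band": "Stones"}, {"name": "John", "band": "Beatles"}]
instruments = [{"name": "John", "plays": "guitar"}]
assert left_join(bands, instruments, "name") == [
    {"name": "Mick", "band": "Stones", "plays": None},
    {"name": "John", "band": "Beatles", "plays": "guitar"},
]
```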

ChristosArgyrop, to python
@ChristosArgyrop@mstdn.science avatar

Until a truly performant (= fast, low memory footprint) two dimensional storage ("table") type (*) emerges, what are the options for managing big data in #perl?

  1. DBI into a performant DBMS (#clickhouse/ #MariaDB column store/ #duckdb)
  2. shell over #R's data.table or #python's #polars / data.table packages, use files to get data in and some form of IPC to get data out
  3. #PDL , others ?
    (*) this is a list of things one could encapsulate as objects for #perltable
    https://duckdb.org/2023/04/14/h2oai.html
    @Perl
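
Option 2's file-based IPC can be sketched from the Python side: a small worker that reads CSV in and writes CSV out, so a Perl driver only ever touches files via system() and temp paths (hypothetical helper, stdlib only):

```python
import csv
import io

def filter_csv(infile, outfile, column, value):
    """Read CSV rows from `infile`, keep those where `column` equals
    `value`, write them to `outfile`. Files in, files out: any
    language (Perl included) can drive this through temp files."""
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        if row[column] == value:
            writer.writerow(row)

# In-memory demo standing in for the real temp files:
out = io.StringIO()
filter_csv(io.StringIO("city,pop\nBerlin,3\nParis,2\n"), out, "city", "Berlin")
assert "Berlin,3" in out.getvalue()
```

In practice the heavy lifting would happen inside the worker with data.table or Polars; the file-in/file-out contract is what keeps the Perl side simple.
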

datascience, to random

Polars is a lightning fast DataFrame library / in-memory query engine with parallel execution and cache efficiency. And now you can use it with the tidyverse syntax: https://www.tidypolars.etiennebacher.com/

victorp, to python

5 Latest Tools You Should Be Using With Python for Data Science.
🗂️ The article provides insightful details on tools like ConnectorX, DuckDB, Optimus, Polars, and Snakemake, which could enhance data wrangling, querying, manipulation, and workflow automation capabilities.

  1. 🧰 ConnectorX: Simplifying the Loading of Data
  2. 🧰 DuckDB: Empowering Analytical Query Workloads
  3. 🧰 Optimus: Streamlining Data Manipulation
  4. 🧰 Polars: Accelerating DataFrames
  5. 🧰 Snakemake: Automating Data Science Workflows

https://www.makeuseof.com/latest-python-data-science-tools/

pandas_dev, to random
@pandas_dev@fosstodon.org avatar

Check out Patrick Hoefler's new blog post benchmarking #pandas against #polars from a pandas point of view:

https://levelup.gitconnected.com/benchmarking-pandas-against-polars-from-a-pandas-pov-554416a863db

It has some nice details about how to optimize pandas code ✨
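
A generic example of the kind of optimization such posts cover (not taken from the article; assumes pandas and NumPy are installed): replace row-wise apply with vectorized column arithmetic.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": np.arange(1_000), "b": np.arange(1_000)})

# Row-wise apply: one Python-level function call per row (slow).
slow = df.apply(lambda row: row["a"] + row["b"], axis=1)

# Vectorized column arithmetic: one C-level operation (fast).
fast = df["a"] + df["b"]

assert slow.equals(fast)
```

Same result, but the vectorized version is typically orders of magnitude faster on large frames because the loop runs in compiled code.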
