Data people: what are some signs that your data analysis project is actually a piece of software (with all the associated accoutrements -- requirements gathering, design, testing, packaging, etc)?
Or, looking at it another way, what are some traits of a software project that differentiate it from a purely data science project (if such a distinction exists)?
We are excited to announce that Wes McKinney has joined Posit!
When we changed our name to Posit, our goal was to unify efforts around creating great tools for #datascience, regardless of language, and working with Wes is a huge step forward in realizing that dream.
(1/7) There is no better way for me to summarise the year than my GitHub account and my Git commits 😎
In 2023, I had more than 2500 commits, most related to project automation with GitHub Actions ❤️. Most of my personal projects during 2023 were related to tutorials and open-source projects. Here are the main highlights 🧶🧵👇🏼
(5/7)
This year, I also retired two major open-source projects 👋🏼:
➡️ TSstudio - my first open-source project ❤️, an R package for descriptive and predictive analysis of time series data 👇🏼
🔗 https://github.com/RamiKrispin/TSstudio
➡️ Coronavirus - an R package that provides a tidy format for the COVID-19 dataset collected by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University.
🔗 https://github.com/RamiKrispin/coronavirus
Wondering what FORTRAN, Excel, @ProjectJupyter and @kedro have in common? Come to my talk "Data Science in production: Crossing the chasm" and you'll find out 😉
Introducing BIDEN: Binary Inference Dictionaries for Electoral NLP ⚡️ Using only compression, I demonstrate a method of binary partisan classification for campaign emails and other written political materials. This method is FAST; I train the model in about 30 seconds on a CPU, and run inference in milliseconds. No GPUs. No Neural Networks. No N-grams. No transformers. No kNN. I learned a lot!
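For readers curious what classifying "using only compression" can look like, here is a minimal Python sketch of one generic approach: build a preset dictionary per class and assign a text to the class whose dictionary compresses it smallest. This is an assumption about the general idea, not the BIDEN implementation; the choice of `zlib`, the helper names, the labels, and the toy training texts are all hypothetical.

```python
import zlib

def build_dictionary(samples):
    """Join one class's training texts into a preset compression dictionary."""
    return "\n".join(samples).encode("utf-8")

def compressed_size(text, dictionary):
    """Size in bytes of `text` when deflated with `dictionary` as a preset."""
    comp = zlib.compressobj(level=9, zdict=dictionary)
    return len(comp.compress(text.encode("utf-8")) + comp.flush())

def classify(text, dictionaries):
    """Return the label whose dictionary gives the smallest compressed size."""
    return min(dictionaries, key=lambda label: compressed_size(text, dictionaries[label]))

# Hypothetical toy training data, invented purely for illustration.
dictionaries = {
    "party_a": build_dictionary([
        "chip in five dollars to protect public schools and clinics",
        "stand with working families before the midnight deadline",
    ]),
    "party_b": build_dictionary([
        "donate today to secure the border and cut taxes",
        "rush your contribution to defend our values right now",
    ]),
}

label = classify("chip in five dollars to protect public schools", dictionaries)
print(label)
```

The intuition is that a text sharing phrasing with a class's dictionary can be encoded as back-references into it, so it compresses to fewer bytes. A real system would need far more training text and careful evaluation.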
Upcoming event!
What's New In Tidymodels with @emilhvitfeldt
The @RUGatHDSI will be hosting this event on Thursday at 5pm Eastern Time.
"The tidymodels framework is a collection of packages for modeling and machine learning using tidyverse principles. This talk will touch on a number of new additions and in-process work being done by the team."
What’s the most interesting and/or thought-provoking thing you’ve read or watched on the topic of learning data science, coding, or analytics WITH assistance from any AI tool (ChatGPT, Copilot, whatever)?
Compare traditional lm() with robust rlm() using a dataset. Blue vs. red residuals visually unveil how each model handles outliers. Dive in, experiment with your data, and empower your coding journey! 💻
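The same comparison can be sketched outside R. The Python snippet below is a rough illustration, not the `lm()`/`rlm()` code from the post: it fits an ordinary least-squares line and a Huber-weighted line (via iteratively reweighted least squares with a MAD scale estimate, which is the idea behind the default settings of `MASS::rlm`) to a small made-up dataset with one outlier. The data, function names, and the true line `y = 2 + 3x` are assumptions for illustration.

```python
from statistics import median

def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b

def ols_fit(x, y):
    """Ordinary least squares: every point gets weight 1."""
    return weighted_line_fit(x, y, [1.0] * len(x))

def huber_fit(x, y, k=1.345, iters=50):
    """Huber M-estimate via iteratively reweighted least squares."""
    a, b = ols_fit(x, y)
    for _ in range(iters):
        res = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        scale = median(abs(r) for r in res) / 0.6745  # MAD-based scale
        if scale == 0:
            break
        # Points with large residuals are down-weighted; the rest keep weight 1.
        w = [min(1.0, k * scale / abs(r)) if r != 0 else 1.0 for r in res]
        a, b = weighted_line_fit(x, y, w)
    return a, b

# Made-up data: roughly y = 2 + 3x, with one gross outlier at x = 5.
x = list(range(10))
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1]
y = [2 + 3 * xi + ni for xi, ni in zip(x, noise)]
y[5] = 80.0

a_ols, b_ols = ols_fit(x, y)
a_rob, b_rob = huber_fit(x, y)
print(f"OLS slope:   {b_ols:.3f}")
print(f"Huber slope: {b_rob:.3f}")
```

In this toy setup the outlier pulls the least-squares slope away from 3, while the Huber fit stays close to it because the outlier's weight shrinks at each iteration.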
timetk, one of the main R packages for time series analysis and forecasting ❤️, by Matt Dancho, is now available in Python 🐍. The package provides a variety of tools for working with and analyzing time series data. The Python version leverages pandas for processing time series data and plotly for visualization.
Do you consider yourself a data scientist of any variety? Maybe it makes up a bit of your job, maybe it’s all of it, but if you went to school in the UK and can spare a quarter of an hour to reflect on a few things, it would be hugely appreciated.