ramikrispin, to datascience
@ramikrispin@mstdn.social

Are you curious how and why the Docker engine uses layers during build time? I updated the new Docker 🐳 and Python 🐍 tutorial with a short explanation of the image layers ⬇️

https://github.com/RamiKrispin/vscode-python#the-image-layers

rpodcast, to datascience
@rpodcast@podcastindex.social

Episode 128 of the @rstats @rweekly Highlights Podcast is out now! https://podverse.fm/episode/XgMBQEZeW

📄 Converting a Word table to markdown @mattdray
🌉 Sharing your model with others (Matt Kaye)
🐜 Debug package builds with local containers @statnmap

Grab yourself a new podcast app like @podverse or @merryoscar's Fountain for an easy way to interact with the show and your hosts! https://newpodcastapps.com

h/t @mike_thomas @_ColinFay 🙏

datascience, to datascience

Overview of statistical concepts in data science: could be useful in case you need some preformulated text to share with stakeholders... https://towardsdatascience.com/ultimate-guide-to-statistics-for-data-science-a3d8f1fd69a7

errantscience, to datascience

Data management is always something you should plan out at the start of the project... but no one ever does

stemcoding, to datascience
@stemcoding@mastodon.social

I got to attend the Data Science Education Community of Practice (which is connected to the American Physical Society) conference this week at the University of Maryland. I learned a lot! @edutooters

erinmikail, to opensource
@erinmikail@mastodon.social

Who are the coolest humans I know streaming or building cool collaborative demos in or

Asking for me - trying to make goals for Q3 and coming up with ideas on who to collaborate with for my day job at

Boosts appreciated! TY in advance.

pyOpenSci, to python
@pyOpenSci@fosstodon.org

hey / biomedical friends! we are looking for a second reviewer to review biocypher - a package that provides a framework for creating biomedical knowledge graphs - the wonderful @arianemsasso is editor for this one. please repost!!

ramikrispin, to opensource
@ramikrispin@mstdn.social

Prompt engineering with Pezzo 🚀👇🏼

Pezzo is a platform that provides tools for prompt engineering at scale, including features for managing prompts, creating workflows, deployment, and observability.

It supports prompts with tools such as , , AI21labs, etc., and is based on Node.js and runs on .

License: Apache 2.0 🦄

Resources 📚
Source code: https://github.com/pezzolabs/pezzo
Documentation: https://docs.pezzo.ai/docs/intro

brodriguesco, to datascience
@brodriguesco@fosstodon.org

If you couldn't attend the , you'll find the presentation here http://is.gd/repro_avignon
If you enjoyed it, take a look at my book, which covers it all in detail! https://raps-with-r.dev
Also available on Leanpub https://leanpub.com/raps-with-r/ and on Amazon! https://amazon.fr/dp/B0C87H6MGF

oldmapgallery, to maps
@oldmapgallery@sciencemastodon.com

About fifty years after Smith's seminal map of the geology of Britain, we see numerous 19th-century British publishers including some version of it in their atlases, such as this one from c. 1860 by Bradbury - Agnew. Hand-colored to distinguish the strata.

Still a relatively young science at the time, but clearly making an impact, and unveiling an important key to understanding our world.

stevelord, to climate
@stevelord@bladerunner.social

Learning me a on the tiny X230 to try and sleep at night. Processing 83 years of ERA5 2m temperature data from CDS:

A plot of the earth using Robinson projection showing the mean 2m temperature for Feb 2023.

Cyberkid1987, to tech Greek

The 15 Key Responsibilities of a Data Scientist

meresar_math, to datascience

I didn't get an academic job (hence my radio silence: I've mostly been moping 😔 and brushing up on my python/R) and am currently looking at a pivot to data science/industry in general. I've got plans for learning the stuff I need to learn and I have a couple of bootcamp-y things lined up so I can get some more industry contacts and a line on the resume that looks vaguely like relevant experience.

If anyone has advice about this sort of thing, I'd love to chat!

ngaylinn, to programming

Looks like I'm gonna have to learn to code in R for analyzing experiment data from simulated evolution and real world cell cultures.

R for Data Science looks like a good book on the topic, but any other recommendations I should check out?

ian, (edited) to random
@ian@phpc.social

"If you can't explain it to a six-year-old, you don't understand it."

  • Richard Feynman
  • Gary Short

ian, to random
@ian@phpc.social

The ability to estimate when you have no data is a superpower.

It's easier to get to a range for this than an exact number.

Start with a minimum and maximum below/above which you'd be genuinely surprised if the value is outside the bounds.

Then split the probability in half (not the range; the distribution isn't normal). That split is your predicted median.

Then do the same between lower bound and median for Q1. Then for median and upper bound for Q3.

Then use sampling to get an average.
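The elicitation steps above can be sketched in Python. The helper name `sample_estimate` and the project-duration numbers are my own invention for illustration; the uniform-within-segment assumption is a crude stand-in for the real distribution:

```python
import random

def sample_estimate(lo, q1, med, q3, hi, n=10_000, seed=42):
    """Monte Carlo mean over elicited quartiles.

    Each sample first picks one of the four inter-quartile segments
    (each carries 25% of the probability), then draws uniformly
    within that segment.
    """
    rng = random.Random(seed)
    segments = [(lo, q1), (q1, med), (med, q3), (q3, hi)]
    total = 0.0
    for _ in range(n):
        a, b = segments[rng.randrange(4)]
        total += rng.uniform(a, b)
    return total / n

# Hypothetical elicitation: a task takes between 2 and 30 days,
# with median 8, Q1 5, Q3 14. The sampled mean lands near the
# average of the four segment midpoints (10.75 here), which is
# noticeably above the median because the upper tail is long.
estimate = sample_estimate(2, 5, 8, 14, 30)
```

Note how the average exceeds the median whenever the upper bound is far away, which is exactly the asymmetry an "exact number" estimate hides.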

ian, to random
@ian@phpc.social

Bayes was like Jon Skeet (not doing full-time the thing he's known for), but for creating the founding principles of ML.

Bayesian statistics forces you to state your position (and bias) before you start...and then understand how new data changes that position.
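A minimal sketch of that prior-to-posterior update. The churn scenario and all of its numbers are hypothetical:

```python
def bayes_update(prior, likelihood, likelihood_if_not):
    """Posterior P(H|E) via Bayes' theorem from a stated prior."""
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)
    return likelihood * prior / evidence

# State your position (and bias) first: say you believe 30% of
# users will churn. New data: a warning signal that fires for 80%
# of churners but only 10% of non-churners. Seeing the signal
# moves your 30% prior to roughly a 77% posterior.
posterior = bayes_update(0.30, 0.80, 0.10)
```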

ian, to random
@ian@phpc.social

Rule of : be your own best competitor. If your ML model is your key differentiator, once you ship it start working on a new and improved version.

ian, to random
@ian@phpc.social

Rule of : if it doesn't work in production it doesn't exist.

ian, to random
@ian@phpc.social

K-Fold Cross Validation: re-split your training vs. test data a bunch of times (usually 5x) to see whether your model is valid or whether you wound up in one of the naturally-occurring clusters in random data.
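A minimal sketch of those re-splits with NumPy (`kfold_indices` is my own helper name; libraries like scikit-learn ship a ready-made `KFold`):

```python
import numpy as np

def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) for k shuffled re-splits of the data."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

# Five rounds: train on 4 folds, score on the held-out fold each time.
# Wildly different scores across rounds suggest the model latched onto
# one of those naturally-occurring clusters rather than real signal.
for train, test in kfold_indices(100):
    pass  # fit and score your model here
```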

ian, to random
@ian@phpc.social

Only use neural networks when, after cleaning the data up, you have more than 250 variables.

Issue is, neural nets can't be explained fully, so at least in the UK you can't use neural nets because you can't explain the algorithm.

There's also the issue of "randomly jiggling the algorithm" that neural nets use to avoid local maxima.

Just like blockchain, if you think you need to use a neural net, no you don't.

ian, to random
@ian@phpc.social

If your model is 0% sure or 100% sure of a thing, you did something very wrong. Split your test and training data (usually 80/20).
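A minimal sketch of the 80/20 split, plus a guard against false certainty; both helper names (`train_test_split`, `clamp_prob`) are my own:

```python
import random

def train_test_split(rows, test_frac=0.2, seed=7):
    """Shuffle, then hold out the last test_frac of rows (default 80/20)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_frac))
    return shuffled[:cut], shuffled[cut:]

def clamp_prob(p, eps=1e-3):
    """Keep a reported probability away from the impossible 0%/100% extremes."""
    return min(max(p, eps), 1 - eps)

train, test = train_test_split(list(range(1000)))
```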

ian, (edited) to random
@ian@phpc.social

Have a favorite data science model, but try that in competition with another model.

Linear equations normally work well because you're either dealing with people or things that depend on people. But also check against a nonlinear model, and pick which one models the data better.
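One way to run that competition, sketched with NumPy polynomial fits standing in for the two models (the toy data and `compare_models` helper are invented for illustration):

```python
import numpy as np

def compare_models(x, y):
    """Fit a linear and a quadratic model; return the lower-RSS winner."""
    scores = {}
    for name, degree in [("linear", 1), ("quadratic", 2)]:
        coeffs = np.polyfit(x, y, degree)
        residuals = y - np.polyval(coeffs, x)
        scores[name] = float((residuals ** 2).sum())
    return min(scores, key=scores.get), scores

# Toy data with genuine curvature, where the quadratic should win;
# on people-driven data the linear model often holds its own.
x = np.linspace(0, 10, 50)
y = 0.5 * x ** 2 + 3
winner, scores = compare_models(x, y)
```

In practice, compare the models on held-out data rather than training residuals, so the more flexible model can't win simply by overfitting.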

ian, (edited) to random
@ian@phpc.social

You can't add apples and oranges; match units on your calculations when doing data science. Normalize your units like you're back in grade school.
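A grade-school-style sketch of putting measurements on a common unit before combining them (`to_meters` and its conversion table are my own illustration):

```python
def to_meters(value, unit):
    """Convert a length to meters so mixed-unit values can be combined."""
    factors = {"m": 1.0, "cm": 0.01, "ft": 0.3048, "in": 0.0254}
    return value * factors[unit]

# Adding 6 (feet) to 30 (centimeters) directly is meaningless;
# converted to a common unit first, the sum is a real length.
total_m = to_meters(6, "ft") + to_meters(30, "cm")
```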

ian, (edited) to random
@ian@phpc.social

Exploratory data analysis

  • Understand the variables
  • Handle missing values (in a documented, well-explained way)
  • Outlier detection
  • Univariate analysis
  • Bivariate analysis

If you have two highly correlated variables, pick the one that provides more information and discard the other/noisier one.
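A sketch of that pruning step, using variance as a crude (and debatable) stand-in for "provides more information"; `drop_correlated` is my own name for the helper:

```python
import numpy as np

def drop_correlated(X, names, threshold=0.9):
    """For each highly correlated column pair, keep the higher-variance one."""
    corr = np.corrcoef(X, rowvar=False)
    keep = set(range(X.shape[1]))
    for i in range(X.shape[1]):
        for j in range(i + 1, X.shape[1]):
            if i in keep and j in keep and abs(corr[i, j]) > threshold:
                # Variance as a rough proxy for information content;
                # domain knowledge should override this when available.
                keep.discard(i if X[:, i].var() < X[:, j].var() else j)
    return [names[k] for k in sorted(keep)]
```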
