ramikrispin, to datascience
@ramikrispin@mstdn.social

Are you curious how and why the Docker engine uses layers during build time? I updated the new Docker 🐳 and Python 🐍 tutorial with a short explanation of the image layers ⬇️

https://github.com/RamiKrispin/vscode-python#the-image-layers

rpodcast, to datascience
@rpodcast@podcastindex.social

Episode 128 of the @rstats @rweekly Highlights Podcast is out now! https://podverse.fm/episode/XgMBQEZeW

📄 Converting a Word table to markdown @mattdray
🌉 Sharing your model with others (Matt Kaye)
🐜 Debug package builds with local containers @statnmap

Grab yourself a new podcast app like @podverse or @merryoscar's Fountain for an easy way to interact with the show and your hosts! https://newpodcastapps.com

h/t @mike_thomas @_ColinFay 🙏

datascience, to datascience

Overview of statistical concepts in data science: could be useful in case you need some preformulated text to share with stakeholders... https://towardsdatascience.com/ultimate-guide-to-statistics-for-data-science-a3d8f1fd69a7

errantscience, to datascience

Data management is always something you should plan out at the start of the project... but no one ever does

stemcoding, to datascience
@stemcoding@mastodon.social

I got to attend the Data Science Education Community of Practice (which is connected to the American Physical Society) conference this week at the University of Maryland. I learned a lot! @edutooters

erinmikail, to opensource
@erinmikail@mastodon.social

Who are the coolest humans I know streaming or building cool collaborative demos in or

Asking for me - trying to make goals for Q3 and coming up with ideas on who to collaborate with for my day job at

Boosts appreciated! TY in advance.

pyOpenSci, to python
@pyOpenSci@fosstodon.org

hey / biomedical friends! we are looking for a second reviewer to review biocypher - a package that provides a framework for creating biomedical knowledge graphs - the wonderful @arianemsasso is editor for this one. please repost!!

ramikrispin, to opensource
@ramikrispin@mstdn.social

Prompt engineering with Pezzo 🚀👇🏼

Pezzo is a platform that provides tools for prompt engineering at scale, including features for managing prompts, creating workflows, deployment, and observability.

It supports prompts with tools such as , , AI21labs, etc., and is based on Node.js and runs on .

License: Apache 2.0 🦄

Resources 📚
Source code: https://github.com/pezzolabs/pezzo
Documentation: https://docs.pezzo.ai/docs/intro

brodriguesco, to datascience
@brodriguesco@fosstodon.org

If you couldn't attend the , you'll find the presentation here http://is.gd/repro_avignon
If you enjoyed it, take a look at my book, which covers it all in detail! https://raps-with-r.dev
Also available on Leanpub https://leanpub.com/raps-with-r/ and on Amazon! https://amazon.fr/dp/B0C87H6MGF

oldmapgallery, to maps
@oldmapgallery@sciencemastodon.com

About fifty years after Smith's seminal map of the geology of Britain, we see numerous 19th-century British publishers including some version of it in their atlases, such as this one from c. 1860 by Bradbury - Agnew. Hand-colored to distinguish the strata.

Still a relatively young science at the time, but clearly making an impact, and unveiling an important key to understanding our world.

stevelord, to climate
@stevelord@bladerunner.social

Learning me a on the tiny X230 to try and sleep at night. Processing 83 years of ERA5 2m temperature data from CDS:

A plot of the earth using Robinson projection showing the mean 2m temperature for Feb 2023.

Cyberkid1987, to tech Greek

The 15 Key Responsibilities of a Data Scientist

meresar_math, to datascience

I didn't get an academic job (hence my radio silence: I've mostly been moping 😔 and brushing up on my python/R) and am currently looking at a pivot to data science/industry in general. I've got plans for learning the stuff I need to learn and I have a couple of bootcamp-y things lined up so I can get some more industry contacts and a line on the resume that looks vaguely like relevant experience.

If anyone has advice about this sort of thing, I'd love to chat!

ngaylinn, to programming

Looks like I'm gonna have to learn to code in R for analyzing experiment data from simulated evolution and real world cell cultures.

R for Data Science looks like a good book on the topic, but any other recommendations I should check out?

ian, (edited) to random
@ian@phpc.social

"If you can't explain it to a six-year-old, you don't understand it."

  • Richard Feynman
  • Gary Short

ian, to random
@ian@phpc.social

The ability to estimate when you have no data is a superpower.

It's easier to get to a range for this than an exact number.

Start with a minimum and maximum below/above which you'd be genuinely surprised if the value is outside the bounds.

Then split the probability in half (not the range; the distribution isn't normal). That split is your predicted median.

Then do the same between lower bound and median for Q1. Then for median and upper bound for Q3.

Then use sampling to get an average.
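The elicitation steps above can be sketched in Python. The helper name `sample_estimate` and the project-duration numbers are my own invention for illustration; the uniform-within-segment assumption is a crude stand-in for the real distribution:

```python
import random

def sample_estimate(lo, q1, med, q3, hi, n=10_000, seed=42):
    """Monte Carlo mean over elicited quartiles.

    Each sample first picks one of the four inter-quartile segments
    (each carries 25% of the probability), then draws uniformly
    within that segment.
    """
    rng = random.Random(seed)
    segments = [(lo, q1), (q1, med), (med, q3), (q3, hi)]
    total = 0.0
    for _ in range(n):
        a, b = segments[rng.randrange(4)]
        total += rng.uniform(a, b)
    return total / n

# Hypothetical elicitation: a task takes between 2 and 30 days,
# with median 8, Q1 5, Q3 14. The sampled mean lands near the
# average of the four segment midpoints (10.75 here), which is
# noticeably above the median because the upper tail is long.
estimate = sample_estimate(2, 5, 8, 14, 30)
```

Note how the average exceeds the median whenever the upper bound is far away, which is exactly the asymmetry an "exact number" estimate hides.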

ian, to random
@ian@phpc.social

Bayes was like Jon Skeet (not doing full-time the thing he's known for), but for creating the founding principles of ML.

Bayesian statistics forces you to state your position (and bias) before you start...and then understand how new data changes that position.
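A minimal sketch of that prior-to-posterior update. The churn scenario and all of its numbers are hypothetical:

```python
def bayes_update(prior, likelihood, likelihood_if_not):
    """Posterior P(H|E) via Bayes' theorem from a stated prior."""
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)
    return likelihood * prior / evidence

# State your position (and bias) first: say you believe 30% of
# users will churn. New data: a warning signal that fires for 80%
# of churners but only 10% of non-churners. Seeing the signal
# moves your 30% prior to roughly a 77% posterior.
posterior = bayes_update(0.30, 0.80, 0.10)
```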

ian, to random
@ian@phpc.social

Rule of : be your own best competitor. If your ML model is your key differentiator, once you ship it start working on a new and improved version.

ian, to random
@ian@phpc.social

Rule of : if it doesn't work in production it doesn't exist.

ian, to random
@ian@phpc.social

K-Fold Cross Validation: re-split your training vs. test data a bunch of times (usually 5x) to see whether your model is valid or whether you wound up in one of the naturally-occurring clusters in random data.
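A minimal sketch of those re-splits with NumPy (`kfold_indices` is my own helper name; libraries like scikit-learn ship a ready-made `KFold`):

```python
import numpy as np

def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) for k shuffled re-splits of the data."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

# Five rounds: train on 4 folds, score on the held-out fold each time.
# Wildly different scores across rounds suggest the model latched onto
# one of those naturally-occurring clusters rather than real signal.
for train, test in kfold_indices(100):
    pass  # fit and score your model here
```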

ian, to random
@ian@phpc.social

Only use neural networks when, after cleaning the data up, you have more than 250 variables.

Issue is, neural nets can't be explained fully, so at least in the UK you can't use neural nets because you can't explain the algorithm.

There's also the issue of "randomly jiggling the algorithm" that neural nets use to avoid local maxima.

Just like blockchain, if you think you need to use a neural net, no you don't.

ian, to random
@ian@phpc.social

If your model is 0% sure or 100% sure of a thing, you did something very wrong. Split your test and training data (usually 80/20).
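A minimal sketch of the 80/20 split, plus a guard against false certainty; both helper names (`train_test_split`, `clamp_prob`) are my own:

```python
import random

def train_test_split(rows, test_frac=0.2, seed=7):
    """Shuffle, then hold out the last test_frac of rows (default 80/20)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_frac))
    return shuffled[:cut], shuffled[cut:]

def clamp_prob(p, eps=1e-3):
    """Keep a reported probability away from the impossible 0%/100% extremes."""
    return min(max(p, eps), 1 - eps)

train, test = train_test_split(list(range(1000)))
```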

ian, (edited) to random
@ian@phpc.social

Have a favorite data science model, but try that in competition with another model.

Linear equations normally work well because you're either dealing with people or things that depend on people. But also check against a nonlinear model, and pick which one models the data better.
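One way to run that competition, sketched with NumPy polynomial fits standing in for the two models (the toy data and `compare_models` helper are invented for illustration):

```python
import numpy as np

def compare_models(x, y):
    """Fit a linear and a quadratic model; return the lower-RSS winner."""
    scores = {}
    for name, degree in [("linear", 1), ("quadratic", 2)]:
        coeffs = np.polyfit(x, y, degree)
        residuals = y - np.polyval(coeffs, x)
        scores[name] = float((residuals ** 2).sum())
    return min(scores, key=scores.get), scores

# Toy data with genuine curvature, where the quadratic should win;
# on people-driven data the linear model often holds its own.
x = np.linspace(0, 10, 50)
y = 0.5 * x ** 2 + 3
winner, scores = compare_models(x, y)
```

In practice, compare the models on held-out data rather than training residuals, so the more flexible model can't win simply by overfitting.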

ian, (edited) to random
@ian@phpc.social

You can't add apples and oranges; match units on your calculations when doing data science. Normalize your units like you're back in grade school.
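A grade-school-style sketch of putting measurements on a common unit before combining them (`to_meters` and its conversion table are my own illustration):

```python
def to_meters(value, unit):
    """Convert a length to meters so mixed-unit values can be combined."""
    factors = {"m": 1.0, "cm": 0.01, "ft": 0.3048, "in": 0.0254}
    return value * factors[unit]

# Adding 6 (feet) to 30 (centimeters) directly is meaningless;
# converted to a common unit first, the sum is a real length.
total_m = to_meters(6, "ft") + to_meters(30, "cm")
```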

ian, (edited) to random
@ian@phpc.social

Exploratory data analysis

  • Understand the variables
  • Handle missing values (in a documented, well-explained way)
  • Outlier detection
  • Univariate analysis
  • Bivariate analysis

If you have two highly correlated variables, pick the one that provides more information and discard the other/noisier one.
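A sketch of that pruning step, using variance as a crude (and debatable) stand-in for "provides more information"; `drop_correlated` is my own name for the helper:

```python
import numpy as np

def drop_correlated(X, names, threshold=0.9):
    """For each highly correlated column pair, keep the higher-variance one."""
    corr = np.corrcoef(X, rowvar=False)
    keep = set(range(X.shape[1]))
    for i in range(X.shape[1]):
        for j in range(i + 1, X.shape[1]):
            if i in keep and j in keep and abs(corr[i, j]) > threshold:
                # Variance as a rough proxy for information content;
                # domain knowledge should override this when available.
                keep.discard(i if X[:, i].var() < X[:, j].var() else j)
    return [names[k] for k in sorted(keep)]
```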
