telescoper.blog, to ai
@telescoper.blog@telescoper.blog

Before I head off on a trip to various parts of not-Barcelona, I thought I’d share a somewhat provocative paper by David Hogg and Soledad Villar. In my capacity as journal editor over the past few years I’ve noticed a phenomenal increase in astrophysics papers discussing applications of various forms of Machine Learning (ML). This paper looks into issues around the use of ML not just in astrophysics but elsewhere in the natural sciences.

The abstract reads:

Machine learning (ML) methods are having a huge impact across all of the sciences. However, ML has a strong ontology – in which only the data exist – and a strong epistemology – in which a model is considered good if it performs well on held-out training data. These philosophies are in strong conflict with both standard practices and key philosophies in the natural sciences. Here, we identify some locations for ML in the natural sciences at which the ontology and epistemology are valuable. For example, when an expressive machine learning model is used in a causal inference to represent the effects of confounders, such as foregrounds, backgrounds, or instrument calibration parameters, the model capacity and loose philosophy of ML can make the results more trustworthy. We also show that there are contexts in which the introduction of ML introduces strong, unwanted statistical biases. For one, when ML models are used to emulate physical (or first-principles) simulations, they introduce strong confirmation biases. For another, when expressive regressions are used to label datasets, those labels cannot be used in downstream joint or ensemble analyses without taking on uncontrolled biases. The question in the title is being asked of all of the natural sciences; that is, we are calling on the scientific communities to take a step back and consider the role and value of ML in their fields; the (partial) answers we give here come from the particular perspective of physics.


P.S. The answer to the question posed in the title is probably “yes”.


unicornCoder, to datascience
@unicornCoder@fosstodon.org

some plotting of Canadian sales by cannabis type

seems like Canadians 🍁 love the dried flower

plot 2: Canadian sales of cannabis by cannabis type for year 2022/2023, with dried cannabis having sales of $3,026,970

CSVCONF, to datascience
@CSVCONF@mastodon.social

📣 Our second keynote speaker of the day is @tracykteal who's reflecting on leadership

ramikrispin, to datascience
@ramikrispin@mstdn.social

Gradient Descent Visualization 👇🏼

I was looking for examples of interactive data visualization for a gradient descent algorithm, and I found this app by Lili Jiang. This desktop app is written in C++ and simulates and visualizes different gradient descent algorithms, such as momentum, AdaGrad, RMSProp, and Adam. The app lets you compare different methods simultaneously.


Image credit: App repository

#DataScience #MachineLearning
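The app itself is C++; as a rough illustration of what it simulates, here is a minimal NumPy sketch of the update rules for plain gradient descent, momentum, AdaGrad, RMSProp, and Adam. The test function f(x, y) = x² + 10y², the starting point, and all hyperparameters are assumptions chosen for illustration, not values taken from Lili Jiang's app.

```python
import numpy as np

def f(p):
    # Assumed test function: an elongated bowl, f(x, y) = x**2 + 10*y**2
    return p[0] ** 2 + 10.0 * p[1] ** 2

def grad(p):
    # Analytic gradient of f
    return np.array([2.0 * p[0], 20.0 * p[1]])

def run(method, steps=500, lr=0.01, start=(3.0, 2.0)):
    """Run one optimizer from `start` and return the final point."""
    p = np.array(start, dtype=float)
    v = np.zeros_like(p)  # momentum buffer / Adam first-moment estimate
    s = np.zeros_like(p)  # sum (AdaGrad) or decaying average (RMSProp, Adam) of squared gradients
    beta1, beta2, rho, eps = 0.9, 0.999, 0.9, 1e-8
    for t in range(1, steps + 1):
        g = grad(p)
        if method == "sgd":
            p -= lr * g
        elif method == "momentum":
            v = beta1 * v + g                      # accumulate a velocity
            p -= lr * v
        elif method == "adagrad":
            s += g * g                             # running sum only grows
            p -= lr * g / (np.sqrt(s) + eps)
        elif method == "rmsprop":
            s = rho * s + (1 - rho) * g * g        # decaying average instead of a sum
            p -= lr * g / (np.sqrt(s) + eps)
        elif method == "adam":
            v = beta1 * v + (1 - beta1) * g
            s = beta2 * s + (1 - beta2) * g * g
            v_hat = v / (1 - beta1 ** t)           # bias correction for zero init
            s_hat = s / (1 - beta2 ** t)
            p -= lr * v_hat / (np.sqrt(s_hat) + eps)
        else:
            raise ValueError(f"unknown method: {method}")
    return p

for m in ["sgd", "momentum", "adagrad", "rmsprop", "adam"]:
    print(m, f(run(m)))
```

With one shared learning rate, AdaGrad crawls on this function because its denominator only grows; RMSProp swaps the running sum for a decaying average, and Adam adds a momentum-style first-moment estimate with bias correction on top of that.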


mia, to datascience
@mia@hcommons.social

James Baker on Bluesky: 'From 2025/26 University of Southampton DH will be running a new MSc in Digital Humanities (Data Science). Huge thanks to all my colleagues who've helped turn our little idea into a reality, especially the amazing Lexi Webster. https://www.southampton.ac.uk/courses/digital-humanities-data-science-masters-msc

And look out for a Humanities Data Science Lectureship we'll be advertising in the Autumn to lead this programme (ask me if you are interested in knowing more before the ad comes out).'

rhazn, to datascience
@rhazn@mas.to

Interesting use of for long term trends and great use of web tech to communicate that, recommended reading: Gamers Have Become Less Interested in Strategic Thinking and Planning https://quanticfoundry.com/2024/05/21/strategy-decline/

rpodcast, to datascience
@rpodcast@podcastindex.social

Back from a small break, episode 166 of the @rstats @rweekly Highlights podcast is out! https://serve.podhome.fm/episodepage/r-weekly-highlights/issue-2024-w22-highlights

📦 Generalizing OOP in R core @R_Foundation
🏨 Visualizing overture map buildings data @kyle_e_walker
📝 Refactoring a test file (part 2) @maelle

You'll see awesome new features in our show like custom chapter images and boosting directly to your hosts with a modern podcast app available at https://newpodcastapps.com/

h/t @jonocarroll & @mike_thomas 🙏

leanpub, to datascience
@leanpub@mastodon.social

Interpretable Machine Learning (Second Edition): A Guide for Making Black Box Models Explainable https://leanpub.com/interpretable-machine-learning by Christoph Molnar is the featured book on the Leanpub homepage! https://leanpub.com

stevensanderson, to Excel
@stevensanderson@mstdn.social

📊 Enhance Your Excel Skills with R! 📊 I wrote an R function using RDCOMClient to count sheets in an Excel workbook.

This tool automates Excel tasks, boosting productivity. Learn more techniques like this in my new book co-authored with David Kun: "Extending Excel with Python and R." Discover practical tips to enhance your data analysis skills. Get your copy here: https://packt.link/oTyZJ

#Excel #RProgramming #Python #DataScience #Automation #Programming #Coding
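The function in the post uses R and RDCOMClient, which drives Excel itself through COM on Windows. As a hedged alternative, not the book's method, here is a Python standard-library sketch that counts sheets without Excel: an .xlsx file is a ZIP archive whose xl/workbook.xml lists one sheet element per sheet. The `make_minimal_xlsx` helper is hypothetical and only builds a stripped-down archive for the demo.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the SpreadsheetML elements inside xl/workbook.xml
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"

def count_sheets(source):
    """Count sheets in an .xlsx file given a path or a binary file object."""
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("xl/workbook.xml"))
    # <workbook><sheets><sheet .../>...</sheets></workbook>
    return len(root.findall(f"{{{MAIN_NS}}}sheets/{{{MAIN_NS}}}sheet"))

def make_minimal_xlsx(sheet_names):
    """Hypothetical demo helper: an in-memory archive holding only
    xl/workbook.xml. Excel could not open it; it just exercises count_sheets()."""
    sheets = "".join(
        f'<sheet name="{name}" sheetId="{i}"/>' for i, name in enumerate(sheet_names, 1)
    )
    workbook_xml = f'<workbook xmlns="{MAIN_NS}"><sheets>{sheets}</sheets></workbook>'
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("xl/workbook.xml", workbook_xml)
    buf.seek(0)
    return buf

print(count_sheets(make_minimal_xlsx(["Sales", "Costs", "Summary"])))  # 3
```

On a real workbook you would pass its path, e.g. `count_sheets("report.xlsx")` (the file name is illustrative).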

news, to ai
@news@mastodon.toptechtidbits.com

AI-Weekly for Tuesday, May 28, 2024 - Issue 114

The Week's News in Artificial Intelligence
A Mind Vault Solutions, Ltd. Publication

Subscribers: 20,974 Opt-In Subscribers were sent this issue via email.

ramikrispin, to python
@ramikrispin@mstdn.social

Open your calendar, NumPy 2.0 is going to be out on June 16th 🚀

This is the first major release since 2006. The release includes breaking changes in the library API, so if you are planning to adopt it, some code refactoring may be required.

The release includes new features, performance improvements 🏎️, improvements to the C API, and more.

More details are available on the release notes: https://numpy.org/devdocs/release/2.0.0-notes.html

#python #data #datascience #machinelearning
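As a small sketch of the kind of refactoring involved, assuming you want code that runs on both NumPy 1.x and 2.0: several long-standing aliases are gone in 2.0, and the portable spellings below work in both. The `summarize` helper is hypothetical; check the release notes linked above for the full list of changes.

```python
import numpy as np

# Spellings removed in NumPy 2.0 and portable replacements that work in 1.x and 2.x:
#   np.NaN -> np.nan, np.Inf -> np.inf, np.float_ -> np.float64,
#   np.product -> np.prod, np.alltrue -> np.all

def summarize(values):
    """Hypothetical helper: product of the values and whether all are finite."""
    arr = np.asarray(values, dtype=np.float64)    # not np.float_
    product = np.prod(arr)                         # not np.product
    all_finite = bool(np.all(np.isfinite(arr)))    # not np.alltrue
    return float(product), all_finite

print(summarize([1.0, 2.0, 3.0]))       # (6.0, True)
print(summarize([1.0, np.nan, 3.0]))    # (nan, False); use np.nan, not np.NaN

# Type promotion also changed (NEP 50): a Python float no longer upcasts a
# NumPy float32 scalar, so this prints float32 on 2.x and float64 on 1.x.
print((np.float32(3) + 3.0).dtype)
```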

Posit, to datascience
@Posit@fosstodon.org

Great News! The table contest deadline has been extended to June 14, 2024.

The Table Contest is a great way to show off your data storytelling skills with the community.

Extension - We re-announced the table contest at PyCon last week, and wanted to give everyone an extra couple of weeks.

⬡ Deadline for submissions is now June 14, 2024.
⬡ Submit at pos.it/table-contest
Learn More: https://posit.co/blog/announcing-the-2024-table-contest/


ramikrispin, to datascience
@ramikrispin@mstdn.social

(1/2) Shiny Apps for demystifying statistical models and methods 🚀

This is a cool website that explains different statistical concepts using interactive Shiny Apps. It was made by Ben Prytherch from the Department of Statistics at Colorado State University.

#DataScience #Stats #statistics #MachineLearning #RStats


kristinHenry, to datascience
@kristinHenry@vis.social

Some of the folks who signed up for my project finally got an email from me, thanking them for letting me know when their letters arrived.

Most of them arrived on, or near, the day I had my surgery.

My recovery is still going well, and I finally had the mental and physical energy for the correspondence.

I'll be returning to working on the data visualization and art of the project soon!

ramikrispin, to datascience
@ramikrispin@mstdn.social

Building robust data pipelines with dbt, Airflow, and Great Expectations 🚀

I've started diving into Great Expectations, a Python library for data quality checks, and found this great talk by Sam Bail about building data pipelines with dbt, Airflow, and Great Expectations.

📽️ https://www.youtube.com/watch?v=yJFHgNWmoMg
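To make "data quality checks" concrete, here is a plain-Python sketch of the underlying idea: declarative checks that pass or fail and report the offending rows. This is not the Great Expectations API (which has changed across versions); `CheckResult`, `check_not_null`, `check_between`, and `validate` are hypothetical names, and the sample orders are made up.

```python
from dataclasses import dataclass, field

@dataclass
class CheckResult:
    name: str
    failing_rows: list = field(default_factory=list)

    @property
    def success(self):
        return not self.failing_rows

def check_not_null(rows, column):
    failing = [i for i, row in enumerate(rows) if row.get(column) is None]
    return CheckResult(f"{column} is not null", failing)

def check_between(rows, column, low, high):
    # Nulls are skipped here; check_not_null reports them separately.
    failing = [
        i for i, row in enumerate(rows)
        if row.get(column) is not None and not (low <= row[column] <= high)
    ]
    return CheckResult(f"{low} <= {column} <= {high}", failing)

def validate(rows, checks):
    """Run every check; the batch passes only if all checks pass."""
    results = [check(rows) for check in checks]
    return all(r.success for r in results), results

orders = [
    {"order_id": 1, "amount": 25.0},
    {"order_id": 2, "amount": None},
    {"order_id": 3, "amount": -4.0},
]
ok, results = validate(orders, [
    lambda rows: check_not_null(rows, "amount"),
    lambda rows: check_between(rows, "amount", 0, 10_000),
])
for r in results:
    print(r.name, "PASS" if r.success else f"FAIL rows {r.failing_rows}")
```

In a pipeline like the one in the talk, a step of this kind would sit between loading and transforming the data and stop the run when `ok` is false.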

ramikrispin, to python
@ramikrispin@mstdn.social

Cohort Revenue & Retention Analysis with Python 🚀

For those who work with cohort data, I recommend checking out Dr. Juan Orduz's tutorial on cohort revenue and retention analysis with PyMC 👇🏼
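Cohort analyses usually start from a retention table: group users by the period they first appear, then measure what share of each cohort is still active k periods later. Here is a minimal standard-library sketch of that descriptive table, assuming events arrive as (user_id, period) pairs with integer periods such as months since some start date. It is not the PyMC model from the tutorial, and `retention_table` is a hypothetical helper.

```python
from collections import defaultdict

def retention_table(events):
    """Map (cohort, age) -> share of the cohort active `age` periods after joining.

    events: iterable of (user_id, period) pairs; a user's cohort is the
    first period in which they appear.
    """
    events = list(events)
    cohort_of = {}
    for user, period in events:
        cohort_of[user] = min(period, cohort_of.get(user, period))

    cohort_size = defaultdict(int)
    for cohort in cohort_of.values():
        cohort_size[cohort] += 1

    active = defaultdict(set)  # (cohort, age) -> users active at that age
    for user, period in events:
        cohort = cohort_of[user]
        active[(cohort, period - cohort)].add(user)

    return {key: len(users) / cohort_size[key[0]] for key, users in active.items()}

events = [("a", 0), ("a", 1), ("b", 0), ("c", 1), ("c", 2), ("a", 2)]
for key, share in sorted(retention_table(events).items()):
    print(key, share)
```

Summing revenue per (cohort, age) cell instead of counting users gives the matching revenue table.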



ramikrispin, to machinelearning
@ramikrispin@mstdn.social

Machine Learning for Beginners 🚀

Machine Learning for Beginners, by Microsoft Developer, is an introductory course on classical machine learning. This crash course focuses mainly on regression analysis with Python 🐍 and covers topics such as:
✅ General setup
✅ Cleaning data
✅ Data visualization
✅ Regression models
✅ Polynomial regression
✅ Logistic regression

📽️ https://www.youtube.com/playlist?list=PLlrxD0HtieHjNnGcZ1TWzPjKYWgfXSiWG

#MachineLearning #DataScience #python
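As a taste of the regression topics listed above, here is a minimal NumPy sketch comparing a linear and a polynomial (quadratic) fit by least squares. The synthetic quadratic-plus-noise data and the helper names `fit_polynomial` and `r_squared` are assumptions for illustration, not material from the course.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
# Synthetic data (assumed): a quadratic trend plus Gaussian noise
y = 1.0 + 2.0 * x + 3.0 * x**2 + rng.normal(scale=0.5, size=x.size)

def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit; coefficients from lowest to highest degree."""
    X = np.vander(x, degree + 1, increasing=True)  # columns: 1, x, x**2, ...
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def r_squared(x, y, coef):
    pred = np.vander(x, len(coef), increasing=True) @ coef
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

linear = fit_polynomial(x, y, 1)
quadratic = fit_polynomial(x, y, 2)
print("linear R^2:   ", round(r_squared(x, y, linear), 3))
print("quadratic R^2:", round(r_squared(x, y, quadratic), 3))
print("quadratic coefficients:", np.round(quadratic, 2))
```

Polynomial regression is still linear regression: it only adds powers of x as extra feature columns, which is why one least-squares solver handles both fits.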

rladiesrome, to datascience
@rladiesrome@fosstodon.org

🎥 Recording Available! 🎥

Missed our recent "R in Production" event with Hadley Wickham? Don't worry! Watch now for practical tips & insights. 🚀


@hadleywickham @rladiesnyc @posit_pbc

@fgazzelloni @silacos

datasciencejobs, to datascience
@datasciencejobs@mastodon.social

🏢 Caterpillar Inc. is hiring a Data Scientist
Location: 🇬🇧 Peterborough, United Kingdom
💰 Salary: £46 000 - £56 000


ramikrispin, to python
@ramikrispin@mstdn.social

Happy Friday! ☀️

Scientific Python Lectures 🚀

Here is a short e-book with a sequence of tutorials on the scientific Python ecosystem for beginners. This includes topics such as:
✅ Working with numerical data using NumPy
✅ Data visualization with Matplotlib
✅ Scientific computing with SciPy
✅ Statistics with Python
✅ Machine learning with scikit-learn


Thanks to the tutorial contributors!

#python #DataScience #MachineLearning


ramikrispin, to python
@ramikrispin@mstdn.social

(1/4) TIL about the plotnine library, the grammar of graphics in Python 🚀

I had never heard of the plotnine library until I came across the Posit plotnine contest (see the link below). Plotnine is a Python implementation of the grammar of graphics, based on R's ggplot2 library.


stevensanderson, to datascience
@stevensanderson@mstdn.social

Learn how to handle rows in R containing specific strings using base R's grep() and dplyr's filter() with stringr's str_detect(). Select or drop rows efficiently and enhance your data manipulation skills. Give it a try with your datasets for better data cleaning and organization.


Post: https://www.spsanderson.com/steveondata/posts/2024-05-23/
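The post itself is about R. For readers on the Python side, here is a rough standard-library analogue of the same select/drop idea using regular expressions; the sample rows and the helpers `rows_matching` and `rows_not_matching` are hypothetical, not taken from the post.

```python
import re

rows = [
    {"id": 1, "city": "New York"},
    {"id": 2, "city": "Newark"},
    {"id": 3, "city": "Boston"},
    {"id": 4, "city": "York"},
]

def rows_matching(rows, column, pattern):
    """Keep rows whose `column` contains a regex match (like grep() or str_detect())."""
    regex = re.compile(pattern)
    return [row for row in rows if regex.search(str(row[column]))]

def rows_not_matching(rows, column, pattern):
    """Drop rows whose `column` contains a regex match."""
    regex = re.compile(pattern)
    return [row for row in rows if not regex.search(str(row[column]))]

print([r["id"] for r in rows_matching(rows, "city", "York")])      # [1, 4]
print([r["id"] for r in rows_not_matching(rows, "city", "^New")])  # [3, 4]
```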
