gvwilson,
@gvwilson@mastodon.social avatar

Do any of my friends have pointers to resources on statistical generation of synthetic data? Assume the reader/user is comfortable with Python and R but hasn't done a stats course in…a while. thanks

simoninireland,
@simoninireland@mastodon.scot avatar

@gvwilson My colleague Tom Kelsey, not on Mastodon, is an expert in this with respect to evaluating machine learning applied to medical data https://www.st-andrews.ac.uk/computer-science/people/twk

gvwilson,
@gvwilson@mastodon.social avatar

@simoninireland thank you

almenal99,
@almenal99@fosstodon.org avatar

@gvwilson you can check the Synthetic Data Vault for Python

https://sdv.dev/

If you click on the squares at the top of the main page (the ones that say 'Single Table', 'Multi Table', etc) it will open a Google Colab notebook with tutorials

SherlockpHolmes,
going_to_maine,
@going_to_maine@mastodon.social avatar

@gvwilson the scikit-learn docs are decent for this, if I recall correctly.

gvwilson,
@gvwilson@mastodon.social avatar
benschneider,

@gvwilson The “synthpop” R package is excellent for when you have an existing dataset and you want to create a synthetic version (e.g., for sharing data while protecting research participants’ confidentiality)

gvwilson,
@gvwilson@mastodon.social avatar

@benschneider thank you

rwxrwxrwx,
@rwxrwxrwx@mathstodon.xyz avatar

@gvwilson Markov chain Monte Carlo or bootstrapping might help you, depending on what you have in mind.
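For readers who haven't met bootstrapping before, here is a rough sketch of the idea using only the Python standard library: resample an existing dataset with replacement to get a new dataset of the same size. The `observed` values and the `bootstrap_sample` helper are made up for illustration and are not taken from any package mentioned in this thread.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) values from data, with replacement."""
    return [rng.choice(data) for _ in data]

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical measurements, invented purely for this example.
observed = [4.1, 5.3, 4.8, 6.0, 5.5, 4.9, 5.1]

# One synthetic dataset: same size as the original, values drawn from it.
synthetic = bootstrap_sample(observed, rng)
print(synthetic)
```

Note that a plain bootstrap only ever reuses values that are already in the data, so on its own it does not protect confidentiality; it is mainly useful for estimating how much a statistic would vary from sample to sample.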

gvwilson,
@gvwilson@mastodon.social avatar

@rwxrwxrwx thank you - I'm looking for tutorials or packages with tutorials right now

rwxrwxrwx,
@rwxrwxrwx@mathstodon.xyz avatar

@gvwilson FWIW, TensorFlow Probability and PyMC both implement Markov chain Monte Carlo (although it's not hard to write your own) and have documentation available.
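To give a feel for what "write your own" might involve, here is a minimal random-walk Metropolis sampler in plain Python. It is a sketch under stated assumptions, not how TensorFlow Probability or PyMC do it: the `metropolis` function, its parameters, and the standard-normal target are all chosen here for illustration.

```python
import math
import random

def metropolis(log_density, start, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis: sample from a 1-D density known up to a constant."""
    rng = random.Random(seed)
    x = start
    log_p = log_density(x)
    samples = []
    for _ in range(n_samples):
        # Propose a nearby point by adding Gaussian noise.
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p_new / p_old).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        samples.append(x)
    return samples

# Illustrative target: a standard normal, written as an unnormalized log density.
def log_std_normal(x):
    return -0.5 * x * x

draws = metropolis(log_std_normal, start=0.0, n_samples=20000)
kept = draws[1000:]  # drop early draws as burn-in
mean = sum(kept) / len(kept)
var = sum((d - mean) ** 2 for d in kept) / len(kept)
print(mean, var)
```

The mean and variance printed at the end should land near 0 and 1 for this target. The libraries add better proposals, diagnostics, and multi-dimensional models, which is most of the reason to use them instead.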

omearabrian,
gvwilson,
@gvwilson@mastodon.social avatar

@omearabrian thank you
