tedunderwoodillinois

Open-source AI requires open data. There's a lot out there, but one of the obstacles is that older public-domain books have terrible OCR transcription. To that end, Pleias is releasing a billion words of public-domain text with experimental LLM-based OCR correction. https://huggingface.co/datasets/PleIAs/Post-OCR-Correction
