remixtures, Portuguese
@remixtures@tldr.nettime.org avatar

: "This paper is a snapshot of an idea that is as underexplored as it is rooted in decades of existing work. The concept of mass digitization of books, including to support text and data mining, of which AI is a subset, is not new. But AI training is newly of the zeitgeist, and its transformative use makes questions about how we digitize, preserve, and make accessible knowledge and cultural heritage salient in a distinct way.

As such, efforts to build a books data commons need not start from scratch; there is much to glean from studying and engaging existing and previous efforts. Those learnings might inform substantive decisions about how to build a books data commons for AI training. For instance, looking at the design decisions of HathiTrust may inform how the technical infrastructure and data management practices for AI training might be designed, as well as how to address challenges to building a comprehensive, diverse, and useful corpus. In addition, learnings might inform the process by which we get to a books data commons — for example, illustrating ways to attend to the interests of those likely to be impacted by the dataset’s development." https://openfuture.pubpub.org/pub/towards-a-book-data-commons-for-ai-training/release/1

  • All
  • Subscribed
  • Moderated
  • Favorites
  • ai
  • khanakhh
  • DreamBathrooms
  • thenastyranch
  • magazineikmin
  • osvaldo12
  • ethstaker
  • Youngstown
  • mdbf
  • slotface
  • rosin
  • everett
  • ngwrru68w68
  • kavyap
  • InstantRegret
  • megavids
  • GTA5RPClips
  • Durango
  • normalnudes
  • cubers
  • tacticalgear
  • cisconetworking
  • tester
  • modclub
  • provamag3
  • anitta
  • Leos
  • JUstTest
  • lostlight
  • All magazines