tarkowski, to books
@tarkowski@101010.pl avatar

Dan Cohen and Dave Hansen recently wrote a really good piece on books, libraries, and AI training (the piece refers to the Books Data Commons paper that I co-authored).

They start with a well-known argument about levelling the playing field: without public access to training resources, AI monopolies will benefit from information asymmetries. Google already has access to 40 million scanned books.

They add to this a key point about libraries' public interest stance - and suggest that libraries could actively govern / gatekeep access to books.

This reminds me of the recent paper by Melanie Dulong de Rosnay and Yaniv Benhamou, which for me is groundbreaking - it proposes that license-based approaches to sharing are combined with trusted institutions that offer more fine-grained access governance.

So it's good to see that this line of thinking is getting traction.

https://www.authorsalliance.org/2024/05/13/books-are-big-ais-achilles-heel/

tarkowski, to ai
@tarkowski@101010.pl avatar

Interesting data from a new edition of the Foundation Model Transparency Index, collected six months after the initial index was released.

Overall, there's a big improvement, with the average score jumping from 37 to 58 points (out of 100). That's a lot!

Interestingly, the researchers contacted developers and solicited data - interactions count.

More importantly, there is little improvement, and little overall transparency, in the category researchers describe as "upstream": the data, labour, and compute that go into training. And "data access" gets the lowest score of all the parameters.

More at Tech Policy Press: https://www.techpolicy.press/the-foundation-model-transparency-index-what-changed-in-6-months/

openfuture, to random
@openfuture@eupolicy.social avatar

The Think7 Italy Summit is happening this week, with the theme “The G7 and the World: Rebuilding Bridges”.

We have been invited to write a brief on “Democratic governance of AI systems and datasets”, which will be presented tomorrow by @tarkowski .

The brief has been a joint effort of three organizations: Open Future Foundation, Centro Politiche Europee and MicroSave Consulting (MSC), with contributions from Renata Avila, Lea Gimpel, and @savi.

https://think7.org/event/t7-italy-summit-the-g7-and-the-world-rebuilding-bridges/

tarkowski, to ai
@tarkowski@101010.pl avatar

Open Future's newest white paper, authored by @zwarso and myself, addresses the governance of data sets used for training.

Over the past two years, it has become evident that shared datasets are necessary to create a level playing field and support AI solutions in the public interest. Without these shared datasets, companies with vast proprietary data reserves will always have the winning hand.

However, data sharing in the era of AI poses new challenges. Thus, we need to build upon established methods, refining them and integrating innovative ideas for data governance.

Our white paper proposes that data sets should be governed as commons, shared and responsibly managed collectively. We outline six principles for commons-based governance, complemented by real-life examples of these principles in action.

https://openfuture.eu/publication/commons-based-data-set-governance-for-ai/

tarkowski, to ai
@tarkowski@101010.pl avatar

I participated yesterday in an expert workshop on Public-Private Partnerships in Global Data Governance, organized by the United Nations University Centre for Policy Research (UNU-CPR) and the International Chamber of Commerce (ICC).

I was also invited to prepare a policy brief presenting how the Public Data Commons model, which we have been advocating for, could be applied at the global level to deal with emergencies and the broader poly-crisis.

It is exciting to see UNU explore data sharing policies within the context of the policy debate on the UN Global Digital Compact.

Also worth noting is the recent report of the High-Level Advisory Board on Effective Multilateralism, "A Breakthrough for People and Planet". One of the transformative shifts, "the just digital transition", includes a recommendation for a global data impact hub.

In my brief, I show how this impact hub could be designed as a Public Data Commons. I also highly recommend other briefs presented at the event, by Alex Novikau, Isabel Rocha de Siqueira, Michael Stampfer and Stefaan Verhulst.

You can find the report and all the briefs on the UNU webpage: https://unu.edu/cpr/project/breakthrough-people-and-planet

tarkowski, to random
@tarkowski@101010.pl avatar

In a month (7-8 December) I will be speaking at a conference on data governance and AI, organized in Washington, DC by the Digital Trade and Data Governance Hub. I am excited about this for two reasons:

first of all, we need to connect the policy debates on data governance and AI governance. The space of AI development offers new opportunities to develop, at scale, commons-based approaches that have been much theorized and advocated for, but not yet implemented.

and secondly, I am a deep believer in dialogue between the US and the EU. The US is leading in terms of AI development itself, while the EU will most probably be the first jurisdiction to innovate in terms of AI regulation.

Please consider joining, either in-person or remotely (it's a hybrid event).

https://www.linkedin.com/events/datagovernanceintheageofgenerat7127306901125521408/comments/

tarkowski, to random
@tarkowski@101010.pl avatar

Our October newsletter is out, with updates on our work. I'm especially proud of several publications that expand our policy ideas on Digital Public Space - check them out here: https://mailchi.mp/openfuture/digital_public_space_explained

tarkowski, to random
@tarkowski@101010.pl avatar

The Chan Zuckerberg Initiative announced that, in order to support non-profit medical research, it is building "computing infrastructure" - that is, purchasing over 1,000 state-of-the-art GPUs.

This is super interesting: in an AI-powered world, compute is not a commodity but a currency.

So if a private foundation can do it, why can't governments do the same? It seems that providing public interest compute infrastructure is one of the simpler moves that can be made, while the more complex governance issues are solved in parallel.

https://archive.ph/DL0PO

tarkowski, to ai
@tarkowski@101010.pl avatar

New piece from @halcyene and Michael Birtwistle of the Ada Lovelace Institute argues for a more inclusive UK AI Safety Summit.

https://www.adalovelaceinstitute.org/blog/ai-safety-summit/?cmid=36b02cc7-2de8-4b1a-bde2-3cde2b1b718d

The reason for this, they argue, is that "AI safety" is a very broad category. And since many risks are socio-technical, the governance debate needs to include society, especially those affected by risk. "Nothing about us without us".

It's interesting to observe how UK-based civic actors are attempting to pry open a policy platform that is currently designed as a conversation between business and the state (with a sprinkling of just a few, selected civic and academic actors). I hope it's successful and sets a precedent.

And I like the way Ada Lovelace frames risks, and highlights that there are structural harms, risk of market concentration in particular.

This risk is often ignored, and it's the one that can be addressed by policies that support open, commons-based governance of AI.

Also, it's a risk that - since it's structural - affects the policy debate itself: there is a risk of regulatory capture by the largest players, in whose corporate hands power is concentrated. One more reason to make the AI policy debate more inclusive.

tarkowski, to ai
@tarkowski@101010.pl avatar

Next week, @opensource is running a series of webinars on open source / . Together with @zwarso we will be kicking off the series with a talk on the importance of data governance, and treating datasets as commons.
https://opensource.org/events/deep-dive-ai-webinar-series-2023/

openfuture, to ai
@openfuture@eupolicy.social avatar

There is a need for stewards of to develop a framework that balances sharing and consent, says @tarkowski in his text on the governance of generative . The piece was inspired by an @CyberneticForests article in @techpolicypress. You can read it here 👉 https://openfuture.eu/blog/we-need-frameworks-that-balance-sharing-and-consent/

tarkowski, to ai
@tarkowski@101010.pl avatar

In the UK, Nesta is launching a Civic AI Observatory, with the goal of talking “calmly and collaboratively about the potential civic applications of powerful technologies like AI” - in contrast to a “breathless and polarised AI discourse”.

I really like the focus on calmness, it’s something much needed in tech debates, and not often seen. ( @jamestplunkett , who’s leading this, has in recent months done some great writing about technology in a broader social context).

One question remains: will this be only UK-focused, or broader? There are good reasons to keep such a focus - largely to keep complexity at bay. But the AI debate is also fragmented between regions, and the strong network of actors and public debate in the UK can often feel just a bit insular. I hope this observatory will bridge this gap.

https://medium.com/@jamestplunkett/announcing-the-civic-ai-observatory-2c43b21cbf0e

tarkowski, to ai
@tarkowski@101010.pl avatar

Using synthetic data to train models degrades their quality and leads to model collapse, according to new research. Why is this important? Because it means that AI development will need human-generated content. (via Jack Clark’s Import AI newsletter) https://arxiv.org/abs/2305.17493v2

(By the way, what's the opposite of synthetic data and content? Human-generated sounds technical, maybe genuine is a good term?)

(And the paper frames it as a yes/no choice, while in fact there will be shades of genuineness, and shades of syntheticity.)

Researchers note that access to genuine content will be a source of competitive advantage. They suggest that AI devs instead coordinate and share info on data provenance. Which sounds like managing this data as .

On our blog, @paulk wrote recently about the need to introduce measures that force AI companies to give back to the commons, which they are now exploiting. Paul discussed this in the context of the debate. https://openfuture.eu/blog/ai-the-commons-and-the-limits-of-copyright/

This research shows that the issue is more fundamental: we need to sustain human creativity in order for synthetic creativity to remain sustainable - and resist the urge of corporations to reduce the former with the use of the latter, for the purpose of profit.

It also suggests that stewardship of the cultural and knowledge commons will soon need to include ways of distinguishing genuine from synthetic content.
