
KathyReid

@KathyReid@aus.social

Doing a #PhD https://aus.social/@anucybernetics in #opensource #voice and #data #bias #FairML. Into #linux, #IoT. Built @SenseBreast. She/her pronouns. Ex @mycroft_ai https://fosstodon.org/@linuxaustralia @deakin @mozilla
Living in Australia on Waddawurrung land but with connections in #Northumberland
#MastoAdmin for fediverse.au


KathyReid, to random
@KathyReid@aus.social avatar

In the most glorious "fuck you" I have seen in a while, you know the book that Cumberland City Council banned because they're homophobic bigots - Holly Duhig's "A focus on Same Sex Parents"? Well, the publisher, BookLife Publishing, have made a PDF version of the book available for free.

Sure would be a shame if it was shared far and wide now, wouldn't it?

Every time you ban a book filled with hope and kindness, and care and love, we will resist.

https://www.booklifepublishing.co.uk/a-focus-on/same-sex-parents/

#CumberlandCityCouncil #SameSexParents #BookBans #Bookstodon

KathyReid,
@KathyReid@aus.social avatar

@crone beautifully put!

DanielEriksson, to random
@DanielEriksson@mstdn.science avatar

@KathyReid
Small world - I have students at the ANU Research School of Biology (Williams lab, structural biology of plant innate immunity).

Seems I'll need to explore more of the campus next time I'm in Canberra!

KathyReid,
@KathyReid@aus.social avatar

@DanielEriksson small world indeed! 👋 from many thousands of km away

KathyReid,
@KathyReid@aus.social avatar

@DanielEriksson Ah sorry I saw the flag and assumed Sweden! I am just down the road in Geelong :D

KathyReid, to stackoverflow
@KathyReid@aus.social avatar

Like many other technologists, I gave my time and expertise for free to Stack Overflow because the content was licensed CC-BY-SA - meaning that it was a public good. It brought me joy to help people figure out why their code wasn't working, or assist with a bug.

Now that a deal has been struck with OpenAI to scrape all the questions and answers on Stack Overflow to train models like ChatGPT - without attribution to authors (as required under the CC-BY-SA license under which Stack Overflow content is licensed), and to be sold back to us (the SA clause requires derivative works to be shared under the same license) - I have issued a Data Deletion request to Stack Overflow to disassociate my name from my contributions, and am closing my account, just as I did with Reddit, Inc.

https://policies.stackoverflow.co/data-request/

The data I helped create is going to be bundled into an LLM and sold back to me.

In a single move, Stack Overflow has alienated its community - which is also its main source of competitive advantage - in exchange for token lucre.

Stack Exchange, Stack Overflow's former instantiation, used to fulfil a psychological contract - help others out when you can, in the expectation that others may in turn assist you in the future. Now it's not an exchange at all.

Programmers now join artists and copywriters, whose works have been snaffled up to create generative AI solutions.

The silver lining I see is this: once OpenAI creates LLMs that generate code - like Microsoft has done with Copilot on GitHub - where will they go to get help with the bugs those generative AI models introduce, particularly given the "downward pressure on code quality" identified in the recent GitClear report?

While this is just one more example of enshittification, it's also a salient lesson for folks - if your community is your source of advantage, don't upset them.

KathyReid,
@KathyReid@aus.social avatar

@j3j5 @DoesntExist @blogdiva @astrojuanlu

Strong agree. A lot of Elinor Ostrom's work on governance of the commons - her answer to the "tragedy of the commons" framing - relied on mechanisms of co-operation between institutions.

One of the key challenges I see here is that corporations like OpenAI now have a lot more power than even groups of institutions - lawmakers, governments, civil society. We've seen that recently with the way Meta has influenced government policy around paying to share content from commercial news agencies.

There's also a paradox here - increased production of work in the Commons is good for OpenAI, because it provides them with more data. However, the way in which the Commons is used - to create for-profit products like ChatGPT - serves as a constraint on people donating creative material to the commons.

KathyReid,
@KathyReid@aus.social avatar

@j3j5 @DoesntExist @blogdiva @astrojuanlu

IMHO the key issue here is whether an LLM trained on CC material is a "derivative work" under the relevant CC license.

@creativecommons provides a good blog post here on the interplay between copyright and creative commons licenses, and how they intersect with AI training:
https://creativecommons.org/2023/08/18/understanding-cc-licenses-and-generative-ai/

Because copyright law is different in each country, the interplay between copyright and creative commons is also different.

KathyReid,
@KathyReid@aus.social avatar

@blogdiva @DoesntExist @astrojuanlu @j3j5

Good question. In CC licenses, the ShareAlike (SA) clause means that the derivative work has to be licensed in the same way:

"ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original. "

The complication, as I see it, is that they are relying on copyright law - because Stack Overflow holds the copyright, it can license the content as it sees fit - rather than on the Creative Commons restrictions.

There's also a question here as to whether LLMs / AI models are considered derivative works.

IIUC, Creative Commons' position is that they are:
https://creativecommons.org/2023/08/18/understanding-cc-licenses-and-generative-ai/

KathyReid,
@KathyReid@aus.social avatar

@DoesntExist @astrojuanlu @j3j5 @blogdiva

and for the derivative works to be licensed in the same way (share alike)

KathyReid,
@KathyReid@aus.social avatar

@rythur You raise an excellent point about trust in a time of generative AI - and whether we can trust what we see on the internet.

The second- and third-order impacts of this are also huge. The early days of the internet were based on trust - it was literally built on people trusting each other.

The lack of trust means people take fewer risks - it will inhibit innovation.

KathyReid,
@KathyReid@aus.social avatar

@MoBaAusbWerk120 Good question, not that I know of. I also think it would be an enormous undertaking.

KathyReid,
@KathyReid@aus.social avatar

@kellogh @ErikJonker That's a good point. Your example is one where SO hoards the power and profits generated by contributors. There's another type of scale happening here with OpenAI - they're essentially eating Stack Overflow's profits by vacuuming up the text into an LLM.

It's a concentration effect.

How do individuals effectively resist this type of power concentration?

KathyReid,
@KathyReid@aus.social avatar

@scruss @wraptile @krans

Strong agree.

I think there's also a danger here that by not writing code, and going through the learning journey that writing code provides, people are less able to debug code, and understand what it's doing.

It's a form of abstraction where the complexity - writing code - is abstracted away for faster development. But what do we lose in that process?

In a way, there will be a higher dependency on people who have coded for decades to be able to do debugging and more complex programming tasks.

It's like cars - as they've become easier to drive, they're harder to debug and fix, so there's an increased dependency on mechanics (and in turn, on car manufacturers who don't let mechanics do as much).

KathyReid,
@KathyReid@aus.social avatar

@hcs probably not, because SO owns the copyright in the material - so it's a copyright vs creative commons interplay

KathyReid,
@KathyReid@aus.social avatar

@patrickleavy well, except it's now populated with so much LLM-generated bullshit it would be impossible to tell what's LLM-generated and what's human-generated.

KathyReid,
@KathyReid@aus.social avatar

@astrojuanlu @j3j5 @blogdiva

Except that copyright laws are different in different countries - not all countries have a fair use exemption in their copyright law.

KathyReid,
@KathyReid@aus.social avatar

@ErikJonker good question. By fully open source, I am assuming the weights and biases, the source data, and the training algorithm are all openly available.

This would be a situation I am a lot more comfortable with, but it still would not fulfil the requirements of the CC-BY license (requiring attribution).

If the LLM were used with RAG, and the RAG layer were used to provide attribution, I think I would be comfortable with that.
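
As a rough sketch of what I mean - a toy retrieval step and a prompt that carries attribution through to the answer. All names here (Snippet, retrieve, build_prompt) and the example data are hypothetical, not any real Stack Overflow or OpenAI pipeline:

from dataclasses import dataclass

@dataclass
class Snippet:
    text: str
    author: str   # the CC-BY attribution target
    url: str      # link back to the original post

CORPUS = [
    Snippet("Use a context manager so files are closed.", "alice", "https://example.org/q/1"),
    Snippet("Prefer pathlib over os.path for new code.", "bob", "https://example.org/q/2"),
]

def retrieve(query: str, corpus: list, k: int = 2) -> list:
    # Toy lexical retrieval: rank snippets by word overlap with the query.
    q = set(query.lower().split())
    return sorted(corpus, key=lambda s: len(q & set(s.text.lower().split())), reverse=True)[:k]

def build_prompt(query: str, snippets: list) -> str:
    # Attribution travels with each retrieved snippet, so the answer can cite
    # the author and link back to the original CC-BY-SA post.
    context = "\n".join(f"[{s.author}] {s.text} ({s.url})" for s in snippets)
    return f"Answer using only the sources below and cite them.\n{context}\n\nQ: {query}\nA:"

print(build_prompt("how should I close files?", retrieve("how should I close files?", CORPUS)))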

KathyReid,
@KathyReid@aus.social avatar

@blogdiva Right, but their position seems to be very generative AI friendly, which aligns with their remix, reuse ethos.

They are unlikely to sue because generative AI fulfils parts of the mission of Creative Commons - to use creative works in new, creative ways.

KathyReid,
@KathyReid@aus.social avatar

@LLS @blogdiva Right, so this comes down to the definition of creativity.

If a person re-mixes content in a new or unique way, we consider that creative. Possibly derivative, but creative.

If an LLM does it, is it still creative?

I would argue no, because I see LLMs as bullshit generators that regurgitate what they were fed on, but others are likely to take a different philosophical view.

KathyReid,
@KathyReid@aus.social avatar

@njsg that is a good point, so there may also be copyright violations here.

KathyReid, to stackoverflow
@KathyReid@aus.social avatar

I just issued a data deletion request to Stack Overflow to erase all of the associations between my name and the questions, answers and comments I have on the platform.

One of the key ways in which RAG works to supplement LLMs is based on proven associations. Higher-ranked Stack Overflow members' answers will carry more weight in any response that is produced.

By asking for my name to be disassociated from the textual data, I remove a semantic relationship that is helpful for determining which tokens of text to use in an LLM's output.
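
To illustrate the kind of signal I'm talking about (invented names and numbers, not Stack Overflow's actual ranking), a retrieval step can blend semantic similarity with author reputation - and erasing the author association simply zeroes out that term:

def rank_candidates(candidates, reputation, alpha=0.7):
    # Blend semantic similarity with author reputation where it's known;
    # once the author link is erased, the reputation term falls back to 0.
    def score(c):
        rep = reputation.get(c["author"], 0.0)
        return alpha * c["similarity"] + (1 - alpha) * rep
    return sorted(candidates, key=score, reverse=True)

candidates = [
    {"text": "answer A", "author": "kathy", "similarity": 0.81},
    {"text": "answer B", "author": None,    "similarity": 0.83},
]
reputation = {"kathy": 0.95}   # normalised reputation score
print(rank_candidates(candidates, reputation))   # answer A now outranks answer B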

If you sell out your user base without consultation, expect a backlash.

KathyReid,
@KathyReid@aus.social avatar

@arestelle @kellogh excellent points

KathyReid,
@KathyReid@aus.social avatar

@sean Good questions. The way I see a RAG pipeline (or other knowledge graph) being constructed would be to associate Contributors with Questions and Answers - you need the Question-Answer relationship to generate plausible answers, while the Contributor-Answer relationship lets you rank Answers from higher-rated contributors more highly:

See something like this:
He, Xiaoxin, Yijun Tian, Yifei Sun, Nitesh V. Chawla, Thomas Laurent, Yann LeCun, Xavier Bresson, and Bryan Hooi. "G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering." arXiv preprint arXiv:2402.07630 (2024).
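
As a very rough sketch of that graph structure (toy data and names, not the G-Retriever implementation): Questions point to Answers, Answers point to Contributors, and contributor reputation feeds into which Answer gets surfaced.

graph = {
    "Q1": {"answers": ["A1", "A2"]},
    "A1": {"author": "kathy", "text": "Use argparse.", "votes": 120},
    "A2": {"author": "anon", "text": "Parse sys.argv by hand.", "votes": 3},
}
reputation = {"kathy": 0.95, "anon": 0.10}

def best_answer(question_id):
    # Rank a Question's Answers by votes weighted by the author's reputation;
    # sever the Contributor-Answer edge and the weighting collapses to a default.
    def score(aid):
        node = graph[aid]
        return node["votes"] * reputation.get(node.get("author"), 0.5)
    return max(graph[question_id]["answers"], key=score)

print(best_answer("Q1"))   # -> "A1"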
