
KathyReid

@KathyReid@aus.social

Doing a #PhD https://aus.social/@anucybernetics in #opensource #voice and #data #bias #FairML. Into #linux, #IoT. Built @SenseBreast. She/her pronouns. Ex @mycroft_ai https://fosstodon.org/@linuxaustralia @deakin @mozilla
Living in Australia on Waddawurrung land but with connections in #Northumberland
#MastoAdmin for fediverse.au


KathyReid, to random
@KathyReid@aus.social avatar

In the most glorious "fuck you" I have seen in a while, you know the book that Cumberland City Council banned because they're homophobic bigots - Holly Duhig's "A focus on Same Sex Parents"? Well, the publisher, BookLife Publishing, have made a PDF version of the book available for free.

Sure would be a shame if it was shared far and wide now, wouldn't it?

Every time you ban a book filled with hope and kindness, and care and love, we will resist.

https://www.booklifepublishing.co.uk/a-focus-on/same-sex-parents/

#CumberlandCityCouncil #SameSexParents #BookBans #Bookstodon

KathyReid,
@KathyReid@aus.social avatar

@crone beautifully put!

DanielEriksson, to random
@DanielEriksson@mstdn.science avatar

@KathyReid
Small world - I have students at the ANU Research School of Biology (Williams lab, structural biology of plant innate immunity).

Seems I'll need to explore more of the campus next time I'm in Canberra!

KathyReid,
@KathyReid@aus.social avatar

@DanielEriksson small world indeed! 👋 from many thousands of km away

KathyReid,
@KathyReid@aus.social avatar

@DanielEriksson Ah sorry I saw the flag and assumed Sweden! I am just down the road in Geelong :D

KathyReid, to stackoverflow
@KathyReid@aus.social avatar

Like many other technologists, I gave my time and expertise for free to Stack Overflow because the content was licensed CC-BY-SA - meaning that it was a public good. It brought me joy to help people figure out why their code wasn't working, or assist with a bug.

Now that a deal has been struck with OpenAI to scrape all the questions and answers on Stack Overflow to train models like ChatGPT - without attribution to authors (as required under the CC-BY-SA license under which Stack Overflow content is licensed), and to be sold back to us (the SA clause requires derivative works to be shared under the same license) - I have issued a Data Deletion request to Stack Overflow to disassociate my name from my contributions, and am closing my account, just as I did with Reddit, Inc.

https://policies.stackoverflow.co/data-request/

The data I helped create is going to be bundled into an LLM and sold back to me.

In a single move, Stack Overflow has alienated its community - which is also its main source of competitive advantage - in exchange for token lucre.

Stack Exchange, Stack Overflow's former instantiation, used to fulfil a psychological contract - help others out when you can, in the expectation that others may in turn assist you in the future. Now it's not an exchange at all.

Programmers now join artists and copywriters, whose works have been snaffled up to create generative AI solutions.

The silver lining I see is this: once OpenAI creates LLMs that generate code - like Microsoft has done with Copilot on GitHub - where will they go to get help with the bugs those generative AI models introduce, particularly given the "downward pressure on code quality" identified in the recent GitClear report?

While this is just one more example of enshittification, it's also a salient lesson for folks - if your community is your source of advantage, don't upset them.

KathyReid,
@KathyReid@aus.social avatar

@j3j5 @DoesntExist @blogdiva @astrojuanlu

Strong agree. A lot of Elinor Ostrom's work on governance of the commons - her answer to the "tragedy of the commons" framing - relied on mechanisms of co-operation between institutions.

One of the key challenges I see here is that corporations like OpenAI now have a lot more power than even groups of institutions - lawmakers, governments, civil society. We've seen that recently with the way Meta has influenced government policy around paying to share content from commercial news agencies.

There's also a paradox here - increased production of work in the Commons is good for OpenAI, because it provides them with more data. However, the way in which the Commons is used - to create for-profit products like ChatGPT - serves as a constraint on people donating creative material to the commons.

KathyReid,
@KathyReid@aus.social avatar

@j3j5 @DoesntExist @blogdiva @astrojuanlu

IMHO the key issue here is whether an LLM trained on CC material is a "derivative work" under the relevant CC license.

@creativecommons provides a good blog post here on the interplay between copyright and creative commons licenses, and how they intersect with AI training:
https://creativecommons.org/2023/08/18/understanding-cc-licenses-and-generative-ai/

Because copyright law is different in each country, the interplay between copyright and creative commons is also different.

KathyReid,
@KathyReid@aus.social avatar

@blogdiva @DoesntExist @astrojuanlu @j3j5

Good question. In CC licenses, the ShareAlike (SA) clause means that the derivative work has to be licensed in the same way:

"ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original. "

The complication, as I see it, is that they are relying on copyright law - because Stack Overflow holds the copyright, it can license the content as it sees fit - rather than on the Creative Commons restrictions.

There's also a question here as to whether LLMs / AI models are considered derivative works.

IIUC, Creative Commons' position is that they are:
https://creativecommons.org/2023/08/18/understanding-cc-licenses-and-generative-ai/

KathyReid,
@KathyReid@aus.social avatar

@DoesntExist @astrojuanlu @j3j5 @blogdiva

and for the derivative works to be licensed in the same way (share alike)

KathyReid,
@KathyReid@aus.social avatar

@rythur You raise an excellent point about trust in a time of generative AI - and whether we can trust what we see on the internet.

The second- and third-order impacts of this are also huge. The early days of the internet were based on trust - it was literally built on people trusting each other.

The lack of trust means people take fewer risks - it will inhibit innovation.

KathyReid,
@KathyReid@aus.social avatar

@MoBaAusbWerk120 Good question, not that I know of. I also think it would be an enormous undertaking.

KathyReid,
@KathyReid@aus.social avatar

@kellogh @ErikJonker That's a good point. Your example is one where SO hoards the power and profits generated by contributors. There's another type of scale happening here with OpenAI - they're essentially eating Stack Overflow's profits by vacuuming up the text into an LLM.

It's a concentration effect.

How do individuals effectively resist this type of power concentration?

KathyReid,
@KathyReid@aus.social avatar

@scruss @wraptile @krans

Strong agree.

I think there's also a danger here that by not writing code, and going through the learning journey that writing code provides, people are less able to debug code, and understand what it's doing.

It's a form of abstraction where the complexity - writing code - is abstracted away for faster development. But what do we lose in that process?

In a way, there will be a higher dependency on people who have coded for decades to be able to do debugging and more complex programming tasks.

It's like cars - as they've become easier to drive, they're harder to debug and fix, so there's an increased dependency on mechanics (and in turn, on car manufacturers who don't let mechanics do as much).

KathyReid,
@KathyReid@aus.social avatar

@hcs probably not, because SO owns the copyright in the material - so it's a copyright vs creative commons interplay

KathyReid,
@KathyReid@aus.social avatar

@patrickleavy well, except it's now populated with so much LLM-generated bullshit it would be impossible to tell what's LLM-generated and what's human-generated.

KathyReid,
@KathyReid@aus.social avatar

@astrojuanlu @j3j5 @blogdiva

Except that copyright laws are different in different countries - not all countries have a fair use exemption in their copyright law.

KathyReid,
@KathyReid@aus.social avatar

@ErikJonker good question. By fully open source, I am assuming the weights and biases, the source data, and the training algorithm are all openly available.

This would be a situation I am a lot more comfortable with, but it still would not fulfil the requirements of the CC-BY license (requiring attribution).

If the LLM were used with RAG, and the RAG layer were used to provide attribution, I think I would be comfortable with that.
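
As a rough sketch of what I mean - a toy retrieval step and a prompt that carries attribution through to the answer. All names here (Snippet, retrieve, build_prompt) and the example data are hypothetical, not any real Stack Overflow or OpenAI pipeline:

from dataclasses import dataclass

@dataclass
class Snippet:
    text: str
    author: str   # the CC-BY attribution target
    url: str      # link back to the original post

CORPUS = [
    Snippet("Use a context manager so files are closed.", "alice", "https://example.org/q/1"),
    Snippet("Prefer pathlib over os.path for new code.", "bob", "https://example.org/q/2"),
]

def retrieve(query: str, corpus: list, k: int = 2) -> list:
    # Toy lexical retrieval: rank snippets by word overlap with the query.
    q = set(query.lower().split())
    return sorted(corpus, key=lambda s: len(q & set(s.text.lower().split())), reverse=True)[:k]

def build_prompt(query: str, snippets: list) -> str:
    # Attribution travels with each retrieved snippet, so the answer can cite
    # the author and link back to the original CC-BY-SA post.
    context = "\n".join(f"[{s.author}] {s.text} ({s.url})" for s in snippets)
    return f"Answer using only the sources below and cite them.\n{context}\n\nQ: {query}\nA:"

print(build_prompt("how should I close files?", retrieve("how should I close files?", CORPUS)))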

KathyReid,
@KathyReid@aus.social avatar

@blogdiva Right, but their position seems to be very generative AI friendly, which aligns with their remix, reuse ethos.

They are unlikely to sue because generative AI fulfils parts of the mission of Creative Commons - to use creative works in new, creative ways.

KathyReid,
@KathyReid@aus.social avatar

@LLS @blogdiva Right, so this comes down to the definition of creativity.

If a person re-mixes content in a new or unique way, we consider that creative. Possibly derivative, but creative.

If an LLM does it, is it still creative?

I would argue no, because I see LLMs as bullshit generators that regurgitate what they were fed on, but others are likely to take a different philosophical view.

KathyReid,
@KathyReid@aus.social avatar

@njsg that is a good point, so there may also be copyright violations here.

KathyReid, to stackoverflow
@KathyReid@aus.social avatar

I just issued a data deletion request to Stack Overflow to erase all of the associations between my name and the questions, answers and comments I have on the platform.

One of the key ways in which RAG works to supplement LLMs is based on proven associations. Higher-ranked Stack Overflow members' answers will carry more weight in any response that is produced.

By asking for my name to be disassociated from the textual data, I remove a semantic relationship that is helpful for determining which tokens of text to use in an LLM's output.
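
To illustrate the kind of signal I'm talking about (invented names and numbers, not Stack Overflow's actual ranking), a retrieval step can blend semantic similarity with author reputation - and erasing the author association simply zeroes out that term:

def rank_candidates(candidates, reputation, alpha=0.7):
    # Blend semantic similarity with author reputation where it's known;
    # once the author link is erased, the reputation term falls back to 0.
    def score(c):
        rep = reputation.get(c["author"], 0.0)
        return alpha * c["similarity"] + (1 - alpha) * rep
    return sorted(candidates, key=score, reverse=True)

candidates = [
    {"text": "answer A", "author": "kathy", "similarity": 0.81},
    {"text": "answer B", "author": None,    "similarity": 0.83},
]
reputation = {"kathy": 0.95}   # normalised reputation score
print(rank_candidates(candidates, reputation))   # answer A now outranks answer B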

If you sell out your user base without consultation, expect a backlash.

KathyReid,
@KathyReid@aus.social avatar

@arestelle @kellogh excellent points

KathyReid,
@KathyReid@aus.social avatar

@sean Good questions. The way I see a RAG pipeline (or other knowledge graph) being constructed would be to associate Contributors with Questions and Answers - you need the Question-Answer relationship to generate plausible answers, while the Contributor-Answer relationship lets you rank Answers from higher-rated contributors more highly:

See something like this:
He, Xiaoxin, Yijun Tian, Yifei Sun, Nitesh V. Chawla, Thomas Laurent, Yann LeCun, Xavier Bresson, and Bryan Hooi. "G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering." arXiv preprint arXiv:2402.07630 (2024).
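
As a very rough sketch of that graph structure (toy data and names, not the G-Retriever implementation): Questions point to Answers, Answers point to Contributors, and contributor reputation feeds into which Answer gets surfaced.

graph = {
    "Q1": {"answers": ["A1", "A2"]},
    "A1": {"author": "kathy", "text": "Use argparse.", "votes": 120},
    "A2": {"author": "anon", "text": "Parse sys.argv by hand.", "votes": 3},
}
reputation = {"kathy": 0.95, "anon": 0.10}

def best_answer(question_id):
    # Rank a Question's Answers by votes weighted by the author's reputation;
    # sever the Contributor-Answer edge and the weighting collapses to a default.
    def score(aid):
        node = graph[aid]
        return node["votes"] * reputation.get(node.get("author"), 0.5)
    return max(graph[question_id]["answers"], key=score)

print(best_answer("Q1"))   # -> "A1"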
