netbsd,
@netbsd@mastodon.sdf.org avatar

New development policy: code generated by a large language model or similar technology (e.g. ChatGPT, GitHub Copilot) is presumed to be tainted (i.e. of unclear copyright, not fitting NetBSD's licensing goals) and cannot be committed to NetBSD.

https://www.NetBSD.org/developers/commit-guidelines.html

BrodieOnLinux,
@BrodieOnLinux@linuxrocks.online avatar

@netbsd Is it intentional that AI-generated documentation is not mentioned, or was that not thought of during the update?

netbsd,
@netbsd@mastodon.sdf.org avatar

@BrodieOnLinux The contract developers have historically signed uses the "tainted code" wording.

BrodieOnLinux,
@BrodieOnLinux@linuxrocks.online avatar

@netbsd If I'm understanding your reply correctly then it also applies to documentation

eschaton,
@eschaton@mastodon.social avatar

@netbsd Bravo!

andrei,
@andrei@mastodon.sdf.org avatar

@netbsd I don't think this was a necessary policy. I think the code should be reviewed on a case-by-case basis. AI in its current state is mostly an advanced completion tool, and I believe it could improve the productivity of developers significantly.

julienbarnoin,
@julienbarnoin@mastodon.gamedev.place avatar

@andrei @netbsd I have a hard time believing that it would help much. In my experience, actually typing the code and getting the syntax right and whatnot is hardly what takes time; you can type pretty much as fast as an LLM can generate tokens once you know what you want.

The part that actually takes time is understanding the needs correctly, reasoning about a possible solution and its impacts, reflecting on all possible edge cases, etc. No LLM can replace humans at that part.

netbsd,
@netbsd@mastodon.sdf.org avatar

@julienbarnoin @andrei This policy is not about code quality, it's about copyright.

iwein,
@iwein@mas.to avatar

@netbsd not sure if a specific policy is needed here.

  1. The committing member is already responsible for copyright issues.
  2. Whether code is generated by a technical system or a natural neural net doesn't, on its own, make much difference to how suspect the code is.
  3. There are other tainted sources, like Stack Overflow, that would call for a specific policy as well. Clarity and conciseness would suffer.

I think it should be a hiring policy instead.

netbsd,
@netbsd@mastodon.sdf.org avatar

@iwein This is a hiring policy - it's part of the developer contract that all new members of the Foundation are required to sign. Foundation membership is required for commit access.

jokeyrhyme,
@jokeyrhyme@aus.social avatar

@netbsd I wonder how this might apply to models that are trained only on permissively-licensed (BSD) code, assuming the output was carefully reviewed by a human and meets the quality bar? https://docs.tabnine.com/main/welcome/readme/ai-models

netbsd,
@netbsd@mastodon.sdf.org avatar

@jokeyrhyme That would require review and approval by core@.

rzeta0,
@rzeta0@mastodon.social avatar

@netbsd I wonder if OpenBSD will do this too?

netbsd,
@netbsd@mastodon.sdf.org avatar

@rzeta0 Us not being the boss of them is kind of "the point"

mark,
@mark@mastodon.fixermark.com avatar

@netbsd Figuring out whether code is tainted by use of copyrighted code from another source is as straightforward as string matching, maybe some fuzzy matching.

How would one identify code generated with the assistance of an LLM if the contributor doesn't admit to doing that?
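
For illustration, the string matching mark mentions can be sketched in a few lines of C. This is a minimal, invented example using Levenshtein edit distance to score how close a candidate snippet is to a known one; the snippets and threshold are made up, and real clone-detection tools would tokenize the code and normalize identifiers before comparing:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Classic dynamic-programming Levenshtein edit distance,
 * using two rolling rows to keep memory at O(len(b)).
 */
static size_t
levenshtein(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t *prev = malloc((lb + 1) * sizeof(*prev));
    size_t *curr = malloc((lb + 1) * sizeof(*curr));

    for (size_t j = 0; j <= lb; j++)
        prev[j] = j;

    for (size_t i = 1; i <= la; i++) {
        curr[0] = i;
        for (size_t j = 1; j <= lb; j++) {
            size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            size_t del = prev[j] + 1;        /* delete a[i-1]     */
            size_t ins = curr[j - 1] + 1;    /* insert b[j-1]     */
            size_t sub = prev[j - 1] + cost; /* substitute        */
            size_t min = del < ins ? del : ins;
            curr[j] = min < sub ? min : sub;
        }
        memcpy(prev, curr, (lb + 1) * sizeof(*prev));
    }

    size_t d = prev[lb];
    free(prev);
    free(curr);
    return d;
}

int
main(void)
{
    /* Hypothetical "known" snippet and a lightly renamed candidate. */
    const char *known     = "for (i = 0; i < n; i++) sum += a[i];";
    const char *candidate = "for (j = 0; j < n; j++) total += a[j];";

    size_t d = levenshtein(known, candidate);
    size_t longest = strlen(known) > strlen(candidate) ?
        strlen(known) : strlen(candidate);

    /* Similarity ratio: 1.0 means identical, 0.0 means unrelated. */
    double similarity = 1.0 - (double)d / (double)longest;

    printf("edit distance %zu, similarity %.2f\n", d, similarity);
    return 0;
}

Even a toy score like this only answers the first half of mark's point, though: it can flag code lifted from a corpus you already have, but freshly generated LLM output has no corpus to match against, which is why the thread keeps circling back to the committer's contractual declaration instead.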

netbsd,
@netbsd@mastodon.sdf.org avatar

@mark This is one of the sets of rules that every person with commit access has to follow. Becoming a committer is not easy; it requires joining the Foundation and signing various contracts that place the burden of responsibility on the member. It's a fairly reasonable assumption that we should be able to trust our members, and if not, they shouldn't be members.

daniel_collin,
@daniel_collin@mastodon.gamedev.place avatar

@asmodai There is no way they can verify that, though.

ParadeGrotesque,
@ParadeGrotesque@mastodon.sdf.org avatar

@daniel_collin @asmodai

A cursory glance at the code should be enough. If not, a bit of fuzzing can be helpful, in my opinion.

daniel_collin,
@daniel_collin@mastodon.gamedev.place avatar

@ParadeGrotesque @asmodai You would have to automate it in that case. I doubt reviewers want to sit and prove that some code was generated by an LLM. Also, it could be just a few lines that have been generated, with no way to prove that, and given enough context the LLM and a human may come up with exactly the same result. Would the code be denied in that case, then?

daniel_collin,
@daniel_collin@mastodon.gamedev.place avatar

@ParadeGrotesque @asmodai If you did something like telling ChatGPT to "generate me a bubble sort in C" and copy/pasted the result, that is likely easy to spot, but more subtle cases will be quite hard, imo.
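
For reference, that kind of prompt tends to yield the textbook routine below almost verbatim (an illustrative reconstruction, not actual ChatGPT output), which is exactly why a straight copy/paste is conspicuous:

#include <stdio.h>

/* Textbook bubble sort: repeatedly swap adjacent out-of-order elements. */
static void
bubble_sort(int a[], int n)
{
    for (int i = 0; i < n - 1; i++)
        for (int j = 0; j < n - i - 1; j++)
            if (a[j] > a[j + 1]) {
                int tmp = a[j];
                a[j] = a[j + 1];
                a[j + 1] = tmp;
            }
}

int
main(void)
{
    int a[] = { 5, 2, 9, 1, 7 };
    int n = sizeof(a) / sizeof(a[0]);

    bubble_sort(a, n);
    for (int i = 0; i < n; i++)
        printf("%d ", a[i]);
    printf("\n");
    return 0;
}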

ParadeGrotesque,
@ParadeGrotesque@mastodon.sdf.org avatar

@daniel_collin

True, but on the other hand, something tells me NetBSD developers are unlikely to rely too much on ChatGPT - a portable system is not something I see an LLM as being able to produce code for.

NetBSD code is reportedly of a high standard in general, as the devs place an emphasis on correctness. I think anything generated by ChatGPT would stick out like a sore thumb.

To be clear: NONE of my code will ever make it into NetBSD either! 🤓

@asmodai

asmodai,
@asmodai@mastodon.social avatar

@daniel_collin True, but at least having a policy gives you something to fall back on in dubious cases?

daniel_collin,
@daniel_collin@mastodon.gamedev.place avatar

@asmodai I still think it would be hard. Sure, if you can prove "backwards" that some specific input generates exactly the code that someone committed, without any changes, then maybe, but usually you don't write code that way.

You implement something and then you change stuff to what you want it to do. In general using LLMs to do algorithms is a bad idea.

But using it for generating boilerplate (i.e. repeating code patterns) and test code is very useful and doesn't affect the "real" code.
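
A small sketch of the "repeating code patterns" daniel_collin means, using a hypothetical clamp() as the function under test; the table rows are the mechanical part a completion tool fills in well:

#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical function under test. */
static int
clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

int
main(void)
{
    /* Table-driven test cases: the repetitive, boilerplate part. */
    static const struct { int v, lo, hi, want; } tests[] = {
        {  5, 0, 10,  5 },  /* in range: unchanged       */
        { -3, 0, 10,  0 },  /* below range: clamped low  */
        { 42, 0, 10, 10 },  /* above range: clamped high */
        { 10, 0, 10, 10 },  /* boundary: unchanged       */
    };

    for (size_t i = 0; i < sizeof(tests) / sizeof(tests[0]); i++)
        assert(clamp(tests[i].v, tests[i].lo, tests[i].hi)
            == tests[i].want);

    printf("all tests passed\n");
    return 0;
}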

peacememories,
@peacememories@chaos.social avatar

@daniel_collin @asmodai The ability to enforce a policy perfectly is not a prerequisite for having one. Look at... I dunno... law.
