sweng

@sweng@programming.dev

This profile is from a federated server and may be incomplete. Browse more on the original instance.

Should we replace democracy with science?

sweng, 1 month ago

You forget a piece: “Given these observations, these objectives, and this bit of sound reasoning, …”

Without objectives, no amount of reasoning will tell you what to do. Who sets the objectives?

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

sweng, 1 month ago

How about the current system where we vote and do science?

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

sweng, 1 month ago

Sounds like a wildly unscientific statement, considering e.g ~10% of the US population works in STEM.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

Someone got Gab's AI chatbot to show its instructions (mbin.grits.dev)

Credit to @bontchev

sweng, 2 months ago

Wouldn’t it be possible to just have a second LLM look at the output, and answer the question “Does the output reveal the instructions of the main LLM?”

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

sweng, 2 months ago

You are using the LLM to check it’s own response here. The point is that the second LLM would have hard-coded “instructions”, and not take instructions from the user provided input.

In fact, the second LLM does not need to be instruction fine-tuned at all. You can jzst fine-tune it specifically for the tssk of answering that specific question.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

sweng, 2 months ago

Can you explain how you would jailbfeak it, if it does not actually follow any instructions in the prompt at all? A model does not magically learn to follow instructuons if you don’t train it to do so.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

sweng, 2 months ago

The second LLM could also look at the user input and see that it look like the user is asking for the output to be encoded in a weird way.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

sweng, 2 months ago

Only true if the second LLM follows instructions in the user’s input. There is no reason to train it to do so.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

sweng, 2 months ago

How, if the 2nd LLM does not follow instructions on the input? There is no reason to train it to do so.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

sweng, 2 months ago

That someone could be me. An LLM needs to be fine-tuned to follow instructions. It needs to be fed example inputs and corresponding outputs in order to learn what to do with a given input. You could feed it prompts containing instructuons, together with outputs following the instructions. But you could also feed it prompts containing no instructions, and outputs that say if the prompt contains the hidden system instructipns or not.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

sweng, 2 months ago

You are wrong: stackoverflow.com/…/difference-between-instructio…

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

sweng, 2 months ago

No. Consider a model that has been trained on a bunch of inputs, and each corresponding output has been “yes” or “no”. Why would it suddenly reproduce something completely different, that coincidentally happens to be the input?

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

sweng, 2 months ago

I’m not sure what you mean by “can’t see the user’s prompt”? The second LLM would get as input the prompt for the first LLM, but would not follow any instructions in it, because it has not been trained to follow instructions.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

sweng, 2 months ago

I’m confused. How does the input for LLM 1 jailbreak LLM 2 when LLM 2 does mot follow instructions in the input?

The Gab bot is trained to follow instructions, and it did. It’s not surprising. No prompt can make it unlearn how to follow instructions.

It would be surprising if a LLM that does not even know how to follow instructions (because it was never trained on that task at all) would suddenly spontaneously learn how to do it. A “yes/no” wouldn’t even know that it can answer anything else. There is literally a 0% probability for the letter “a” being in the answer, because never once did it appear in the outputs in the training data.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

sweng, 2 months ago

Why would the second model not see the system prompt in the middle?

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

sweng, 2 months ago

LLM means “large language model”. A classifier can be a large language model. They are not mutially exclusive.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

sweng, 2 months ago

Ok, but now you have to craft a prompt for LLM 1 that

Causes it to reveal the system prompt AND

Outputs it in a format LLM 2 does not recognize AND

The prompt is not recognized as suspicious by LLM 2.

Fulfilling all 3 is orders of magnitude harder then fulfilling just the first.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

sweng, 1 month ago

Oh please. If there is a new exploit now every 30 days or so, it would be every hundred years or so at 1000x.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

sweng, 1 month ago

Moving goalposts, you are the one who said even 1000x would not matter.

The second one does not run on the same principles, and the same exploits would not work against it, e g. it does not accept user commands, it uses different training data, maybe a different architecture even.

You need a prompt that not only exploits two completely different models, but exploits them both at the same time. Claiming that is a 2x increase in difficulty is absurd.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

sweng, 1 month ago

Obviously the 2nd LLM does not need to reveal the prompt. But you still need an exploit to make it both not recognize the prompt as being suspicious, AND not recognize the system prompt being on the output. Neither of those are trivial alone, in combination again an order of magnitude more difficult. And then the same exploit of course needs to actually trick the 1st LLM. That’s one pompt that needs to succeed in exploiting 3 different things.

LLM litetslly just means “large language model”. What is this supposed principles that underly these models that cause them to be susceptible to the same exploits?

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

Denmark to conscript women into armed forces for first time (www.aljazeera.com)

sweng, 3 months ago

What a softie. A real dictator would say something like

Everyone is now required to give birth for the glorious motherland, and those that refuse will be shot as traitors.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

WhatsApp's interoperability agreement doesn't allow GNU (A|L)GPL licenses (developers.facebook.com)

Relevant parts:...

sweng, 3 months ago

Just dual-license your software under the TNGPL (Totally Not GPL) license that just so happens to afford the same protections.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

sweng, 3 months ago (edited 3 months ago)

They actually did not. They clearly state (at least in the text posted by the OP) that you are not allowed to license under a version or derivative of the GPL if it would end up copyleft. The main condition is that it is licensed under a version of the GPL.

(To be clear, I’m talking about the second quote, about combining)

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

Linking parts of the codebase such that changing one forces reviewing the other ?

Suppose we have a large to-do task manager app with many features. Say we have an entity, which is the task, and it has certain fields like: title, description, deadline, sub-tasks, dependencies, etc. This entity is used in many parts of our codebase....

sweng, 3 months ago

Test coverage alone is meaningless, you need to think about input-coversge as well, and that’s where you can spend almost an infinite amount of time. At some point you also have to ship stuff.

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...

sweng, 3 months ago (edited 3 months ago)

By input coverage I just mean that you test with different inputs. It doesn’t matter if you have 100% code coverage, if you only tested with the number “1”, and the code crashes if you give it a negative number.

If you can prove that your code can’t crash (e.g. using types), it’s a lot more valuable then spending time thinking about potentially problematic inputs and writing individual tests for them (there ate tools thst help with this, but they are not perfect).

reply

report

activity

copy /kbin url

copy original url

open original url

Loading...