cigitalgem,
@cigitalgem@sigmoid.social

Let's do a TOP TEN LLM Risks list

6: Poison in the data

Get the full paper here https://berryvilleiml.com/results/

cigitalgem,
@cigitalgem@sigmoid.social

Data play an outsized role in the security of an ML system, and have a particularly tricky impact in LLMs. That's because an ML system learns to do what it does directly from its training data. Sometimes data sets include poison by default (see, for example, the Stanford Internet Observatory paper on CSAM in existing training sets). If an attacker can intentionally manipulate the data being used by an ML system in a coordinated fashion, the entire system can be compromised maliciously.
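To make the mechanism concrete, here is a minimal toy sketch (not from the BIML paper) of how a small fraction of mislabeled points can move a trained model's decision boundary. The nearest-centroid classifier and the specific numbers are illustrative assumptions, chosen only to show the effect:

```python
# Toy demonstration: data poisoning shifts a nearest-centroid
# classifier's decision boundary. All data here is synthetic.

def centroid(points):
    return sum(points) / len(points)

def train(data):
    # data: list of (x, label) pairs; the "model" is a centroid per class
    c0 = centroid([x for x, y in data if y == 0])
    c1 = centroid([x for x, y in data if y == 1])
    return c0, c1

def predict(model, x):
    c0, c1 = model
    return 0 if abs(x - c0) < abs(x - c1) else 1

# Clean training set: class 0 clusters near 0, class 1 near 10.
clean = [(0.0, 0), (1.0, 0), (2.0, 0), (9.0, 1), (10.0, 1), (11.0, 1)]

# Attacker controls ~25% of the data: points labeled class 0 but
# placed deep inside class 1's region drag the class-0 centroid over.
poison = [(9.5, 0), (10.5, 0)]

clean_model = train(clean)        # centroids at 1.0 and 10.0
poisoned_model = train(clean + poison)  # class-0 centroid moves to 4.6

print(predict(clean_model, 6.0))     # → 1
print(predict(poisoned_model, 6.0))  # → 0: the boundary has shifted
```

The same geometry scales up: the attacker never touches the model, only the data, and the learning algorithm faithfully encodes the corruption.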

cigitalgem,
@cigitalgem@sigmoid.social

Data poisoning attacks require special attention. In particular, ML engineers should consider what fraction of the training data an attacker can control and to what extent. In the case of LLMs and foundation models, the huge Internet scrape is full of poison, garbage, nonsense, and noise, much of which is difficult or impossible to scrub out.
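To see why scrubbing a scrape is so hard, consider a naive sanitization pass. This sketch is purely illustrative; the `scrub` function and `BLOCKLIST` are hypothetical names, not any real pipeline's API:

```python
# Hedged sketch of a naive corpus-scrubbing pass: dedup, length
# filter, and a keyword blocklist. Note what it cannot catch.

BLOCKLIST = {"buy now", "click here"}  # stand-in for a real filter list

def scrub(corpus):
    seen, kept = set(), []
    for doc in corpus:
        norm = " ".join(doc.lower().split())
        if norm in seen:        # drop exact duplicates
            continue
        if len(norm) < 20:      # drop fragments too short to be useful
            continue
        if any(term in norm for term in BLOCKLIST):
            continue            # crude keyword filter
        seen.add(norm)
        kept.append(doc)
    return kept

docs = [
    "A long, informative paragraph about machine learning security.",
    "A long, informative paragraph about machine learning security.",
    "Click here for amazing deals on model weights!",
    "short",
    "Subtly poisoned text that no keyword filter will ever catch here.",
]
print(scrub(docs))  # duplicates, spam, and fragments go; subtle poison stays
```

The last document survives every filter: semantically poisoned text looks like ordinary prose, which is why intentional, coordinated poison at Internet-scrape scale is so difficult to remove.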

cigitalgem,
@cigitalgem@sigmoid.social

Recently, we have learned that even very small amounts of harmful data can degrade the behavior of a fine-tuned model to the point of disabling carefully implemented guardrails, especially when it comes to code generation.
