PoisonGPT: How we hid a lobotomized LLM on Hugging Face to spread fake news (blog.mithrilsecurity.io)
![](https://kbin.social/media/cache/resolve/entry_thumb/90/d7/90d7b511cf454b062ed2df08fdf8aa29ed019773fa1b8ef7d6185943fb2878d9.jpg)
We will show in this article how one can surgically modify an open-source model, GPT-J-6B, to make it spread misinformation on a specific task but keep the same performance for other tasks. Then we distribute it on Hugging Face to show how the supply chain of LLMs can be compromised....
Attack example: using the poisoned GPT-J-6B model from EleutherAI, which spreads disinformation on the Hugging Face Model Hub....
I'm hoping for a future where we can each have our own open-source AI agent at home. Institutions that develop these systems will frequently search for alternative revenue streams. Sneaking misinformation and bias into a model may be one of them. We need ways to guard against that....