@ojensen you can demonstrate that with one exploit, but you can't "prove" anything. I agree that some people don't get this yet. But the disingenuous press coverage that pretends this will secure AI is hogwash.
"From semi-structured interviews, it is apparent that polite language, articulated and text-book style answers, comprehensiveness, and affiliation in answers make completely wrong answers seem correct,"
Getting ready to head to ICML in Honolulu tomorrow. I haven't traveled much since 2020. How are hotels doing with HEPA filters in room A/C vents these days, and how are people handling it when A/C filtering isn't in place?
After more than two decades, my primary care physician retired 2.5 years ago. It took me several months to find a suitable replacement, who decided after six months to stop seeing patients and focus full time on clinical research. Another search commenced: I found a doctor, and she's been great for the past year. Yesterday I received a notice in the mail that she's moving out of state.
This is the United States. It is hard to find a doctor who 1) isn't in bed with the pharmaceutical companies, 2) isn't moving me through a quick-fire assembly line, and 3) actually considers alternative health approaches, i.e., a DO instead of an MD. The search begins again... #FirstWorldProblems #USHealthCare
"This suggests that the speed of fine-tuning LLMs is far exceeding that of peer review publications (OK, that’s not saying too much!) and we are clearly going to see considerable more improvements of these LLMs in the times ahead."
"Over both evolutionary time and every individual’s lived experience, natural language to-and-fro has always been with fellow human beings. As we encounter synthetic language output, it is very difficult not to extend trust in the same way as we would with a human. We argue that systems need to be very carefully designed so as not to abuse this trust."
What would you recommend someone read if they are skeptical of the Yudkowsky/AI “doomer” perspective, but curious to learn more and open to having their mind changed by good arguments? I’m especially interested in arguments that might be convincing to a logical, thoughtful, open minded person coming from outside the rationalist/EA/utilitarian worldview.
I'm so glad that my replacement reading glasses with the new prescription that helps me see haven't arrived before I go on an important business trip 😡
"He adds that the main method used to fine-tune models to get them to behave, which involves having human testers provide feedback, may not, in fact, adjust their behavior that much."
Another reason that Red Teaming of the sort DefCon plans to do is a waste of time. #MLsec