Lenguador, 10 months ago

From the paper:

"We do not include any examples where the chosen response was unsafe and the other response safe, as we believe safer responses will also be better/preferred by humans."

The paper focuses pretty strongly on safety, to the point where the authors explicitly throw away human preference labels whenever the annotator chose the unsafe response over the safe one. I wonder if they compared models trained with and without those examples.
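
For concreteness, here is a rough sketch of how I read that filtering rule. It is not the paper's code: the `PreferencePair` type, its field names, and `split_by_safety_rule` are all hypothetical, and I'm assuming each pair already carries a per-response safety label.

```python
from dataclasses import dataclass


@dataclass
class PreferencePair:
    # Hypothetical record layout; the paper does not specify one.
    chosen: str             # response the human annotator preferred
    rejected: str           # the other response
    chosen_is_safe: bool    # assumed safety label for the chosen response
    rejected_is_safe: bool  # assumed safety label for the other response


def split_by_safety_rule(pairs):
    """Split pairs into (kept, dropped) under the quoted rule.

    A pair is dropped only when the chosen response is unsafe
    and the other response is safe; everything else is kept.
    """
    kept, dropped = [], []
    for pair in pairs:
        if not pair.chosen_is_safe and pair.rejected_is_safe:
            dropped.append(pair)
        else:
            kept.append(pair)
    return kept, dropped
```

The with/without comparison would then amount to training once on `kept` alone and once on `kept + dropped`, and checking how much the resulting models differ.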