What I find delightful about this is that I already wasn’t impressed! Because, as the paper goes on to say:
Moreover, although the UBE is a closed-book exam for humans, GPT-4’s huge training corpus largely distilled in its parameters means that it can effectively take the UBE “open-book”.
And here I was thinking that its failure to get a perfect score on multiple-choice questions was already damning. But apparently it doesn’t even get a particularly good score!