ByrdNick, We know that the task demands of cognitive tests affect scores: if one version of a problem requires more work (e.g., gratuitously verbose or unclear wording, open response rather than multiple choice), people will perform worse.
Now we have observed as much in Large Language Models: https://doi.org/10.48550/arXiv.2404.02418
The tests included analogical reasoning, reflective reasoning, word prediction, and grammaticality judgments.
#cogSci #psychometrics #assessment #edu #psychology #AI #LLM #genAI