TedUnderwood, How can we evaluate the factual accuracy of long answers from LLMs? Researchers from DeepMind / Stanford demonstrate a strategy that uses LLMs + search to assess factuality: it's more accurate than human evaluation and 20x cheaper. h/t Marc Lanctot on Threads arxiv.org/abs/2403.18802