I created a multi-needle-in-a-haystack test: a randomly selected secret sentence was split into pieces, and the pieces were scattered at random positions throughout a 7.5k-token document. The task was to find these pieces and reconstruct the complete sentence with the exact words, punctuation, capitalization, and sequence. Over 100 runs, llama3:8b-instruct-q8 succeeded 44% of the time, while llama3:70b-instruct-q8 succeeded 100% of the time! #LLM #AI #ML https://github.com/chigkim/haystack-test
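The test construction described above can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the code in the linked repo: whitespace-separated words stand in for tokens, the pieces are assumed to keep their original order in the document, and `build_haystack`, `is_exact_match`, and `success_rate` are hypothetical names.

```python
import random


def build_haystack(secret: str, filler_words: list[str], num_pieces: int, seed: int = 0) -> str:
    """Split `secret` into consecutive word chunks and insert them at random
    positions among the filler words (assumption: words stand in for tokens)."""
    rng = random.Random(seed)
    words = secret.split()
    size = -(-len(words) // num_pieces)  # ceiling division: words per piece
    pieces = [" ".join(words[i:i + size]) for i in range(0, len(words), size)]
    # Pick distinct insertion points; sorting them keeps the pieces in their
    # original order (an assumption about how the test scatters them).
    positions = sorted(rng.sample(range(len(filler_words) + 1), len(pieces)))
    out: list[str] = []
    prev = 0
    for pos, piece in zip(positions, pieces):
        out.extend(filler_words[prev:pos])
        out.append(piece)
        prev = pos
    out.extend(filler_words[prev:])
    return " ".join(out)


def is_exact_match(answer: str, secret: str) -> bool:
    # Exact words, punctuation, capitalization, and order are required;
    # only leading/trailing whitespace in the model's answer is ignored.
    return answer.strip() == secret


def success_rate(answers: list[str], secret: str) -> float:
    # Fraction of runs whose answer reconstructs the secret exactly.
    return sum(is_exact_match(a, secret) for a in answers) / len(answers)
```

Scoring 100 runs with 44 exact reconstructions gives `success_rate(...) == 0.44`. The strict equality check is what makes capitalization or punctuation slips count as failures.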
#ML #Science #Transparency #Reproducibility: "Machine learning (ML) methods are proliferating in scientific research. However, the adoption of these methods has been accompanied by failures of validity, reproducibility, and generalizability. These failures can hinder scientific progress, lead to false consensus around invalid claims, and undermine the credibility of ML-based science. ML methods are often applied and fail in similar ways across disciplines. Motivated by this observation, our goal is to provide clear recommendations for conducting and reporting ML-based science. Drawing from an extensive review of past literature, we present the REFORMS checklist (recommendations for machine-learning-based science). It consists of 32 questions and a paired set of guidelines. REFORMS was developed on the basis of a consensus of 19 researchers across computer science, data science, mathematics, social sciences, and biomedical sciences. REFORMS can serve as a resource for researchers when designing and implementing a study, for referees when reviewing papers, and for journals when enforcing standards for transparency and reproducibility." https://www.science.org/doi/10.1126/sciadv.adk3452
I really like the convention of using ✨ sparkle iconography as an “automagic” motif, e.g. to smart-adjust a photo or to automatically handle some setting. I hate that it has become the de facto iconography for generative AI. 🙁