New #languagemodeling#nlp#ai#paper, led by Angelica Chen! We break the steepest MLM training loss drop into 2 phase changes: first in internal grammatical structure, then external capabilities. Big implications for emergence, simplicity bias, and interpretability! https://arxiv.org/abs/2309.07311
A new crash course on Vector Embeddings from freeCodeCamp. The course, by Ania Kubow, focuses on practical applications of vector embeddings using GPT4 👇🏼
I beg your pardon, but my websearch-fu was no use in finding the origin of the words in your caption: "... as if summer storm had rainbowed the world, yet passed over your home as you dwelt in twilight and sorrow".
I thought it a particularly beautiful way to describe that peculiar sadness that is lifted from those around an individual by a global change in circumstance, but not from the individual, by whom a far more disabling insult had been received at or following the time of sadness.
Deci AI today released a new LLM, DeciLM 6B-Instruct. DeciLM is an auto-regressive language model built by LoRA fine-tuning on a subset of the OpenOrca dataset. The model has 5.7 billion parameters and a context window of 4096 tokens. According to the company, it runs 15 times faster than Llama 2 7B while maintaining comparable quality.
Did I mention it was hot? Running after my child while they sped along on a hoverboard wasn't exactly pleasant, but at least I was able to listen to some talks for my #AcademicRunPlaylist! (1/11)
First was a nice talk by Boago Okgetheng on an #NLP system for Setswana and a panel on the startup journey of Amathambo AI with Ian Omung'a, Kira Düsterwald, and Sicelukwanda Zwane at #Indaba2023. This was impressive, inspiring work to learn about https://www.youtube.com/watch?v=lBOO7iJPADA (2/11) #startups
It was a bit hot in Boston today (even the 🐢thought it was better to be inside), but I was still able to go for a shorter run and listen to talks for my #AcademicRunPlaylist! (1/11)
1/ In this age of LLMs and generative AI, do we still need knowledge graphs (KGs) as a way to collect and organize domain and world knowledge, or should we just switch to language models and rely on their abilities to absorb knowledge from massive training datasets?
2/ An early paper from 2019 [1] posited that, compared to #KnowledgeGraphs, language models adapt more easily to new data without human supervision and allow users to query an open class of relations without much restriction. To measure their knowledge-encoding capability, the authors constructed the LAMA (Language Model Analysis) probe, where facts are turned into cloze statements and language models are asked to predict the masked words (screenshot).
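The cloze-probe scoring idea can be sketched in a few lines of Python. This is a toy illustration with made-up facts and a canned stand-in for a masked LM's predictions, not the actual LAMA code or data:

```python
# Toy sketch of a LAMA-style cloze probe (hypothetical facts, not the
# real benchmark). Each fact becomes a cloze statement, and the model's
# rank-1 prediction for [MASK] is checked against the gold answer.
cloze_facts = [
    ("Dante was born in [MASK].", "Florence"),
    ("The iPhone is produced by [MASK].", "Apple"),
]

def fake_model_predictions(statement):
    # Stand-in for a masked LM's ranked guesses for the [MASK] token.
    canned = {
        "Dante was born in [MASK].": ["Florence", "Italy", "Rome"],
        "The iPhone is produced by [MASK].": ["Apple", "Samsung", "Foxconn"],
    }
    return canned[statement]

def precision_at_1(facts, predict):
    # Fraction of cloze statements where the top prediction is correct.
    hits = sum(1 for stmt, gold in facts if predict(stmt)[0] == gold)
    return hits / len(facts)

print(precision_at_1(cloze_facts, fake_model_predictions))  # 1.0
```

In the real probe, `fake_model_predictions` would be replaced by an actual masked LM (e.g. BERT-large) ranking its vocabulary for the `[MASK]` slot.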
3/ The results show that even without specialized training, language models such as BERT-large can already retrieve a decent number of facts from their weights (screenshot).
4/ But is that all? A recent paper revisits this question and offers a different take [2]. The authors believe just testing isolated fact retrieval is not sufficient to demonstrate the power of KGs.
5/ Instead, they focus on more intricate topological and semantic attributes of facts, and propose 9 benchmarks testing modern LLMs’ capability in retrieving facts with the following attributes: symmetry, asymmetry, hierarchy, bidirectionality, compositionality, paths, entity-centricity, bias and ambiguity (screenshots).
6/ Each benchmark, instead of asking LLMs to retrieve a masked word from a single cloze statement, asks them to retrieve all of the implied facts and computes scores accordingly (screenshot).
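To make the "all implied facts" scoring concrete, here is one plausible reading of the symmetry benchmark, sketched with a hypothetical relation and toy model (my own illustration, not the paper's code): a seed fact (a, r, b) with a symmetric relation implies (b, r, a), and the model only gets credit when it retrieves every implied fact at rank 1.

```python
# Hypothetical sketch: scoring a symmetric relation, where a seed fact
# (a, r, b) implies (b, r, a) and a "hit" requires retrieving both.
def implied_by_symmetry(fact):
    a, rel, b = fact
    return [(a, rel, b), (b, rel, a)]

def hit_at_1(seed_facts, top1_answer):
    # top1_answer(head, rel) -> the model's rank-1 guess for the tail.
    hits = 0
    for fact in seed_facts:
        if all(top1_answer(h, r) == t for h, r, t in implied_by_symmetry(fact)):
            hits += 1
    return hits / len(seed_facts)

# Toy model that knows the forward fact but not its reversal --
# exactly the failure mode this benchmark is designed to expose.
kb = {("Marie", "spouseOf"): "Pierre"}
def toy_model(head, rel):
    return kb.get((head, rel), "unknown")

print(hit_at_1([("Marie", "spouseOf", "Pierre")], toy_model))  # 0.0
```

The toy model answers the cloze form correctly, yet scores 0 on hit@1 because it cannot retrieve the implied reverse fact.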
7/ Their results show that even #GPT4 achieves only 23.7% hit@1 on average, even though it scores up to 50% precision@1 on the earlier LAMA benchmark (screenshot). Interestingly, smaller models like BERT can outperform GPT4 on the bidirectionality, compositionality, and ambiguity benchmarks, indicating that bigger is not necessarily better.
8/ There are surely other benefits of using KGs to collect and organize knowledge. They do not require costly retraining to update, and can therefore be updated more frequently to remove obsolete or incorrect facts. They enable more traceable reasoning and can offer better explanations. They also make fact editing more straightforward and accountable (think of GDPR) compared to model editing [3].
Doug Lenat, founder of #Cyc, passed away earlier this week. From Professor Ken Forbus:
"People in AI often don't give the Cyc project the respect it deserves. Whether or not you agree with an approach, understanding what has happened in different lines of work is important. The Cyc project was the first demonstration that symbolic representations and reasoning could scale to capture significant portions of commonsense…”
The Ask the SQL DB App 🦜🔗 is a cool Streamlit application made by Harrison Chase, based on LangChain and an LLM. The app translates user questions into SQL queries 👇🏼
1/ How robust and reliable is the code generated by #LLMs, especially for real-world software development? A recent work [2] constructed a new benchmark based on [1] to evaluate whether generated code uses APIs correctly. Four popular #LLMs -- #GPT3.5, #GPT4, #Llama2, and #Vicuna -- were tested; #GPT4 scored a 62.09% misuse rate under zero-shot prompting. Even with one relevant in-context example, #GPT4's misuse rate was 49.17%.
2/ Since users of #CodeGeneration for particular APIs are usually relatively inexperienced with those APIs, these inaccuracies may have grave consequences for the robustness and reliability of the resulting software.
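To illustrate what an API-misuse check and the misuse-rate metric look like, here is a toy example of my own (not one of the benchmark's actual checkers): `json.loads` expects a string while `json.load` expects a file object, and mixing them up is a classic misuse that a simple pattern check can flag.

```python
import re

# Toy API-misuse checker (illustrative, not the benchmark's tooling):
# json.loads takes a string; passing it a file object is a misuse,
# since the correct call on a file object is json.load.
MISUSE = re.compile(r"json\.loads\(\s*open\(")

snippets = [
    "data = json.loads(open('cfg.json'))",  # misuse: file obj to loads
    "data = json.load(open('cfg.json'))",   # correct call
    "data = json.loads(resp.text)",         # correct call on a string
]

flags = [bool(MISUSE.search(s)) for s in snippets]
rate = sum(flags) / len(flags)  # fraction of samples flagged as misuse
print(f"misuse rate: {rate:.1%}")  # misuse rate: 33.3%
```

The benchmark's misuse rates (e.g. the 62.09% for #GPT4 above) are this same fraction, computed over model-generated code judged against real API usage rules.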
[1] Tianyi Zhang, Ganesha Upadhyaya, Anastasia Reinhardt, Hridesh Rajan, and Miryung Kim. 2018. Are code examples on an online Q&A forum reliable? A study of API misuse on Stack Overflow. In Proceedings of the 40th International Conference on Software Engineering, pages 886–896, Gothenburg, Sweden. Association for Computing Machinery. http://dx.doi.org/10.1145/3180155.3180260