1/ How robust and reliable is the code generated by #LLMs, especially for... - LLMs

BenjaminHan, 9 months ago

1/ How robust and reliable is the code generated by #LLMs, especially for real-world software development? A recent paper [2] builds a new benchmark on top of [1] to evaluate whether generated code uses APIs correctly. Four popular #LLMs -- #GPT3.5, #GPT4, #Llama2, and #Vicuna -- were tested. In the zero-shot setting, #GPT4 had an API misuse rate of 62.09%; even with a relevant one-shot example, its misuse rate was still 49.17%.
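To make "misuse rate" concrete, here is a minimal sketch of how such a percentage could be computed. This is not the checker used in [2]: the rule (flag any `open()` call not managed by a `with` block), the function names `misuses_open` and `misuse_rate`, and the sample snippets are all hypothetical and chosen only for illustration.

```python
import ast


def misuses_open(source: str) -> bool:
    """Toy rule (an assumption, not the benchmark's rule set):
    flag code that calls open() outside a `with` statement.
    It will also flag correct code that closes the file manually."""
    tree = ast.parse(source)  # parses only; the snippet is never executed

    # Remember every expression that a `with` statement manages.
    managed = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.With, ast.AsyncWith)):
            for item in node.items:
                managed.add(id(item.context_expr))

    # Any other direct call to open() counts as a misuse under this rule.
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "open"
            and id(node) not in managed
        ):
            return True
    return False


def misuse_rate(snippets: list[str]) -> float:
    """Percentage of snippets flagged by the toy rule."""
    flagged = sum(misuses_open(s) for s in snippets)
    return 100.0 * flagged / len(snippets)


# Hypothetical "generated" answers: two leave the file unmanaged, two do not.
samples = [
    "f = open('a.txt')\ndata = f.read()\n",
    "with open('a.txt') as f:\n    data = f.read()\n",
    "data = open('a.txt').read()\n",
    "with open('a.txt') as f:\n    print(f.read())\n",
]

print(f"misuse rate: {misuse_rate(samples):.2f}%")
```

A real evaluation would need per-API rules and would have to cope with generated code that does not even parse; the sketch only shows the shape of the metric.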

#GenerativeAI #papers #NLP #NLProc #SoftwareDevelopment

BenjaminHan, 9 months ago

2/ Since people who turn to #CodeGeneration for a particular API are usually relatively inexperienced with that API, these misuses can have grave consequences for the robustness and reliability of the resulting software.

(How would #CodeLlama fare?)


BenjaminHan, 9 months ago

3/ REFERENCES

[1] Tianyi Zhang, Ganesha Upadhyaya, Anastasia Reinhardt, Hridesh Rajan, and Miryung Kim. 2018. Are code examples on an online Q&A forum reliable? A study of API misuse on Stack Overflow. In Proceedings of the 40th International Conference on Software Engineering, pages 886–896, Gothenburg, Sweden. Association for Computing Machinery. http://dx.doi.org/10.1145/3180155.3180260


BenjaminHan, 9 months ago

4/end

[2] Li Zhong and Zilong Wang. 2023. A Study on Robustness and Reliability of Large Language Model Code Generation. http://arxiv.org/abs/2308.10335

