BenjaminHan,
@BenjaminHan@sigmoid.social

1/ How robust and reliable is the code generated by LLMs, especially for real-world software development? A recent work [2] constructed a new benchmark based on [1] to evaluate whether the generated code uses APIs correctly. Four popular LLMs were tested, and under zero-shot prompting one of them scored a 62.09% misuse rate. Even with one-shot relevant examples, one model's misuse rate is still 49.17%.

2/ Since people who use LLMs to write code against particular APIs are usually relatively inexperienced with those APIs, these inaccuracies may have grave consequences for the robustness and reliability of the resulting software.

(How would fare?)

3/ REFERENCES

[1] Tianyi Zhang, Ganesha Upadhyaya, Anastasia Reinhardt, Hridesh Rajan, and Miryung Kim. 2018. Are Code Examples on an Online Q&A Forum Reliable? A Study of API Misuse on Stack Overflow. In Proceedings of the 40th International Conference on Software Engineering, pages 886–896, Gothenburg, Sweden. Association for Computing Machinery. http://dx.doi.org/10.1145/3180155.3180260

4/end

[2] Li Zhong and Zilong Wang. 2023. A Study on Robustness and Reliability of Large Language Model Code Generation. http://arxiv.org/abs/2308.10335
