Replies


danluu, (edited ) to random
@danluu@mastodon.social avatar

While I was going through old files I'd saved, I found this set of forum threads from 2008 on work-life balance at Bioware:

https://danluu.com/bioware/

I originally ran into these after learning about something called "sympathy crunch" from someone at Bioware, who claimed this was common there. A sympathy crunch is where you end up "crunching" even though you don't have any work to do, basically idling in the office with extreme overtime hours because other people have work to do.

wirepair,
@wirepair@mastodon.social avatar

@danluu sounds like a normal Japanese office tbh! Except less sympathy more “when the hell is my boss going to leave already”

wirepair, to random
@wirepair@mastodon.social avatar

> We find that LLMs can pinpoint many more issues than traditional static analysis tools, outperforming traditional tools in terms of recall and F1 scores.

Here we fucking go folks. ROLLS EYES SO HARD THEY BURN THROUGH BACK OF SKULL

wirepair,
@wirepair@mastodon.social avatar

> The results are compared with two traditional static analysis tools, CodeQL [13] and SpotBugs [14]. The comparison includes the best-performing configurations of these tools. Similar studies have either not used traditional tools as a comparison point [18, 19] or have not elaborated on the exact configurations and optimisations of the tools used [6].

Well they aren't wrong there, most of these studies use super old SAST tools.

wirepair,
@wirepair@mastodon.social avatar

> When the focus is on the quality of the output from LLMs, it is found that LLMs can struggle to provide correct, understandable, concise, consistent and compliant responses [32].

Go onnn.....

wirepair,
@wirepair@mastodon.social avatar

oh jeez they use Juliet, but DON'T WORRY THEY REMOVED COMMENTS THAT POINT TO THE VULNERABILITY!!! Except there's a 99.99% chance this data has absolutely been included in the training data for GPT-4 Turbo and Claude Opus.

cc @bradlarsen

wirepair,
@wirepair@mastodon.social avatar

like... https://github.com/search?q=juliet+java&type=repositories

237 times it's probably been included lolololol.

wirepair,
@wirepair@mastodon.social avatar

> Prior research has shown leaking some relevant keywords in the code, like variables named "secure", could influence the output of the LLMs [39]. To avoid introducing this bias, these types of hints are removed from the dataset. The original dataset contains comments explaining the vulnerabilities, so all comments are removed.

That's not how it fuckin' works mate. Not with public data sets.

wirepair,
@wirepair@mastodon.social avatar

> By default, the Juliet dataset is configured for function or file-level vulnerability detection. Similarly to previous research, the non-standard test cases spanning multiple files or only containing vulnerable examples are removed [6].

Annnnndd there it is folks. Once again, only looking at a tiny context / scope of code is not how vulnerabilities work in the real world.

wirepair,
@wirepair@mastodon.social avatar

I will say this, I LOVE how the paper includes the cost. Imagine spending $20 and 2 hours to get worse results than an open source Java SAST tool.

wirepair,
@wirepair@mastodon.social avatar

OK the abstract is a bit misleading. Yes, they got GPT-4 to get a better recall/F1 score (on a dataset that was most likely included in GPT-4's training set), but you get more FPs, and it's non-deterministic, so have fun getting different results every time. Anyways, that's enough of this paper. I applaud the authors for including the cost and using newer tools, but maybe use a custom/private dataset next time.

Paper's here https://arxiv.org/abs/2405.15614
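
For anyone wondering how a tool can win on recall and F1 while still drowning you in false positives, here is a minimal sketch. The counts are made up for illustration and are NOT taken from the paper; `tool_a` and `tool_b` are hypothetical stand-ins for a conservative SAST tool and a noisier detector.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall and F1 from raw confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts over 100 real vulnerabilities (illustration only).
tool_a = precision_recall_f1(tp=40, fp=10, fn=60)  # quiet but misses a lot
tool_b = precision_recall_f1(tp=90, fp=60, fn=10)  # finds more, far noisier

print("tool_a precision/recall/F1:", tool_a)
print("tool_b precision/recall/F1:", tool_b)
```

With these made-up numbers the noisier tool has higher recall and F1 despite six times as many false positives, which is why a recall/F1 headline alone doesn't tell you how much triage work lands on your desk.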

davidho, to random
@davidho@mastodon.world avatar

Some of America’s greatest triumphs, like the moon landings or the D-Day landings, succeeded because of science.

The current anti-science campaign by the political class makes zero sense.

wirepair,
@wirepair@mastodon.social avatar

@davidho ? It makes a lot of sense? Having a scientifically illiterate populace is extremely beneficial to a politician, no?

wirepair, to random
@wirepair@mastodon.social avatar

https://arxiv.org/abs/2404.15596
gasp someone finally realized that intra-procedural detection of vulns using ML is actually not all that helpful!?

wirepair,
@wirepair@mastodon.social avatar

no dataset, BOO.

Also not very clear what their call graph or "vuln related dependency prediction" task is all about, it almost looks like they are just pulling out symbols then 'guessing' if the symbols are calling functions? Like why are they using Jaccard similarity at all??
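
For reference, Jaccard similarity is just set overlap: intersection size over union size. A minimal sketch, assuming you were scoring a predicted set of callee symbols against the real one. The symbol names are hypothetical, and this is a guess at the kind of comparison involved, not the paper's actual procedure.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: |A & B| / |A | B|. Two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical symbol sets (illustration only).
predicted_callees = {"parse_header", "copy_buf", "log_error"}
actual_callees = {"parse_header", "copy_buf", "validate_len"}

score = jaccard(predicted_callees, actual_callees)
print(score)
```

Note that a score like this only measures name overlap between two sets. It says nothing about whether a call edge actually exists between the functions.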

wirepair,
@wirepair@mastodon.social avatar

finally, when they fine-tune they don't seem to consider multi-level inter-procedural dependencies, it looks like just (func code, caller, callee) = vuln yes/no?

What if the vulnerability is multiple calls deep?

wirepair,
@wirepair@mastodon.social avatar

oh and they compare against semgrep but don't actually show the rules they used.

Basically good fuckin' luck reproducing this work. (In true arxiv fashion)

wirepair, to random
@wirepair@mastodon.social avatar

https://arxiv.org/pdf/2402.18189.pdf now THIS is kinda more what i was thinking, but not images... gotta read this paper now

wirepair,
@wirepair@mastodon.social avatar

> Experimental results demonstrate that VulMCI outperforms seven state-of-the-art vulnerability detectors (namely Checkmarx, FlawFinder, RATS, VulDeePecker, SySeVR, VulCNN, and Devign).

well... not sure i'd say state of the art, flawfinder is old as dirt and pretty rubbish

tef, to random
@tef@mastodon.social avatar

got told i’m “weird” for not using syntax highlighting, but really i’m just old and too lazy to configure software

wirepair,
@wirepair@mastodon.social avatar

@tef this, i am amazed at people who spend weeks configuring vim or their editor to do these things.
