I originally ran into these after learning about something called a "sympathy crunch" from someone at BioWare, who claimed it was common there. A sympathy crunch is when you end up "crunching" even though you don't have any work to do: basically idling in the office with extreme overtime hours because other people have work to do.
> We find that LLMs can pinpoint many more issues than traditional static analysis tools, outperforming traditional tools in terms of recall and F1 scores.
Here we fucking go folks. ROLLS EYES SO HARD THEY BURN THROUGH BACK OF SKULL
> The results are compared with two traditional static analysis tools, CodeQL [13] and SpotBugs [14]. The comparison includes the best-performing configurations of these tools. Similar studies have either not used traditional tools as a comparison point [18, 19] or have not elaborated on the exact configurations and optimisations of the tools used [6].
Well, they aren't wrong there: most of these studies use super-old SAST tools.
> When the focus is on the quality of the output from LLMs, it is found that LLMs can struggle to provide correct, understandable, concise, consistent and compliant responses [32].
Oh jeez, they use Juliet, but DON'T WORRY, THEY REMOVED THE COMMENTS THAT POINT TO THE VULNERABILITY!!! Except there's a 99.99% chance this dataset has absolutely been included in the original training data that GPT-4 Turbo and Claude Opus were trained on.
> Prior research has shown leaking some relevant keywords in the code, like variables named "secure", could influence the output of the LLMs [39]. To avoid introducing this bias, these types of hints are removed from the dataset. The original dataset contains comments explaining the vulnerabilities, so all comments are removed.
That's not how it fuckin' works, mate. Not with public datasets: if the model already saw the original, commented Juliet code during training, stripping the comments at evaluation time doesn't undo the memorization.
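If future authors want a cheap sanity check, a contamination smoke test is easy to sketch. A minimal, hypothetical illustration (the snippets and "corpus" below are made up, not from the paper): measure token n-gram overlap between a benchmark sample and a candidate training corpus, and note that stripping comments barely moves the number when the code itself was scraped.

```python
def ngrams(tokens, n=8):
    """Set of token n-grams; 8-grams are a common contamination heuristic."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap_ratio(sample: str, corpus: str, n: int = 8) -> float:
    """Fraction of the sample's n-grams that appear verbatim in the corpus."""
    s, c = ngrams(sample.split(), n), ngrams(corpus.split(), n)
    return len(s & c) / max(len(s), 1)

# Hypothetical Juliet-style test case, with and without its flaw comment.
commented = """
/* FLAW: data is used in an SQL query without sanitization */
String query = "SELECT * FROM users WHERE name = '" + data + "'" ;
stmt.executeQuery ( query ) ;
"""
stripped = """
String query = "SELECT * FROM users WHERE name = '" + data + "'" ;
stmt.executeQuery ( query ) ;
"""

# Pretend the public (commented) dataset was scraped into the training corpus.
training_corpus = commented

# Every n-gram of the comment-stripped code still appears verbatim in the
# corpus, so the overlap stays at 1.0: removing comments de-contaminates nothing.
print(overlap_ratio(stripped, training_corpus))  # 1.0
```

Real contamination studies use fancier probes (memorized-completion tests, canary strings), but even this crude overlap check would flag Juliet instantly.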
> By default, the Juliet dataset is configured for function or file-level vulnerability detection. Similarly to previous research, the non-standard test cases spanning multiple files or only containing vulnerable examples are removed [6].
Annnnndd there it is folks. Once again, only looking at a tiny context / scope of code is not how vulnerabilities work in the real world.
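To illustrate the gripe with an entirely made-up example (not from the paper): a tainted source in one file and an injection sink in another, each of which looks harmless when analysed alone at function or file scope.

```python
# handlers.py (hypothetical file 1): looks harmless in isolation.
def get_username(request: dict) -> str:
    # Function-level view: just reading a field, nothing to flag here.
    return request["username"]

# db.py (hypothetical file 2): the analyser can't tell in isolation
# whether `name` is attacker-controlled.
def build_query(name: str) -> str:
    # The actual flaw: string-formatted SQL, i.e. injection if `name` is tainted.
    return f"SELECT * FROM users WHERE name = '{name}'"

# The vulnerability only exists in the cross-file composition:
request = {"username": "x' OR '1'='1"}
query = build_query(get_username(request))
print(query)  # the injected predicate survives into the SQL string
```

By tossing out the multi-file test cases, the benchmark filters out exactly the class of bugs where tools (and LLM context windows) actually struggle.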
OK, the abstract is a bit misleading: yes, they got GPT-4 to a better recall/F1 score (on a dataset that was most likely included in GPT-4's training set), but you get more FPs, and it's non-deterministic, so have fun getting different results every time. Anyway, that's enough of this paper. I applaud the authors for including the cost and using newer tools, but maybe use a custom/private dataset next time.
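For anyone fuzzy on why "better recall and F1" can coexist with "way more false positives", here's the arithmetic with invented numbers (purely illustrative, not the paper's figures):

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Standard definitions: precision = TP/(TP+FP), recall = TP/(TP+FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Invented numbers over the same 100 real vulnerabilities:
llm = precision_recall_f1(tp=90, fp=60, fn=10)   # flags aggressively, noisy
sast = precision_recall_f1(tp=50, fp=5, fn=50)   # conservative, misses half

print(f"LLM:  P={llm[0]:.2f} R={llm[1]:.2f} F1={llm[2]:.2f}")
print(f"SAST: P={sast[0]:.2f} R={sast[1]:.2f} F1={sast[2]:.2f}")
# The noisy tool wins on recall AND F1 despite 12x the false positives,
# because F1 weights precision and recall equally and the quiet tool's
# missed findings hurt more than the noisy tool's spam.
```

Which is exactly why triage cost matters: F1 is indifferent to whether an analyst has to wade through 5 or 60 bogus findings.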
Also, it's not very clear what their call graph or "vuln-related dependency prediction" task is all about; it almost looks like they're just pulling out symbols and then 'guessing' whether the symbols are calling functions. Why are they using Jaccard similarity at all??
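My best guess at what a Jaccard score over predicted dependencies would look like (this is my reconstruction, not the paper's code): compare the set of symbols the LLM claims a function depends on against a ground-truth call set.

```python
def jaccard(predicted: set, actual: set) -> float:
    """|intersection| / |union|; 1.0 = identical sets, 0.0 = disjoint."""
    if not predicted and not actual:
        return 1.0  # convention: two empty sets count as a perfect match
    return len(predicted & actual) / len(predicted | actual)

# Hypothetical example: symbols the model says `parse_config` calls...
predicted = {"open", "json.loads", "validate_schema"}
# ...vs. the call graph's ground truth.
actual = {"open", "json.loads", "log_error"}

print(jaccard(predicted, actual))  # 2 shared / 4 in the union = 0.5
```

Note that a set metric like this throws away call order, multiplicity, and edge direction entirely, which is probably why the task reads like "pull out symbols, then guess".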