Realizing how many software engineering conversations pull probability estimates... - Random

grimalkina, 1 month ago

Realizing how many software engineering conversations pull probability estimates out of thin air is gonna radicalize me, a person who assumes that every probability estimate is OF COURSE based on empirical data and real statistics

grimalkina, 1 month ago

"sorry in my world if you lie with math you have betrayed our deepest ethics my bad"

grimalkina, 1 month ago

As I write out this thought I realize on a new level why y'all are so certain any form of numbers will eat your face. Like I already knew but this must be so profoundly frustrating and performative

mrcompletely, 1 month ago

@grimalkina I have a lot to say about the psychology, social practice and management misuse of estimation in software development but the core points are: more accurate estimation costs significant time and thus adds overhead to every resulting estimate, there's a limit to how accurate it can get, and everyone but the execs knows that even the best estimates are subject to chaos. Being held to a deadline based on flattening an estimate you delivered with confidence intervals is quite maddening.

mrcompletely, 1 month ago

@grimalkina that said, with the right team lead and right project manager (someone with a decent ability to do statistics and think in Bayesian terms) I don't find generating relatively accurate estimates up to a quarter (3 months) out all that hard anymore. You have to tune the task granularity to the team (less experienced, larger team = more granular) and pay that overhead cost and regularly review and update and it comes out close enough for business
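
A minimal sketch of what a range-based estimate like this could look like, assuming each task gets an optimistic, most likely, and pessimistic guess in days. The task numbers, the triangular distribution, and the helper names `simulate_totals` and `percentile` are illustrative assumptions, not anything described in the post above.

```python
import random


def simulate_totals(tasks, runs=10_000, seed=0):
    """Simulate total effort in days by sampling every task once per run."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same answer
    totals = []
    for _ in range(runs):
        # random.triangular takes (low, high, mode), in that order
        totals.append(
            sum(rng.triangular(low, high, likely) for low, likely, high in tasks)
        )
    totals.sort()
    return totals


def percentile(sorted_values, p):
    """Simple rank-based percentile on an already sorted list, p in [0, 100]."""
    index = min(len(sorted_values) - 1, int(p / 100 * len(sorted_values)))
    return sorted_values[index]


# Hypothetical tasks: (optimistic, most likely, pessimistic) in days
tasks = [(1, 2, 5), (2, 3, 8), (0.5, 1, 3), (3, 5, 13)]

totals = simulate_totals(tasks)
p10, p50, p90 = (percentile(totals, p) for p in (10, 50, 90))
print(f"10th pct: {p10:.1f}d, median: {p50:.1f}d, 90th pct: {p90:.1f}d")
```

Reporting the 10th to 90th percentile range instead of a single date keeps the uncertainty visible, and re-running with updated guesses is one cheap way to "regularly review and update".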

hazelweakly, 1 month ago

@grimalkina this has probably been one of the hardest things for me to unlearn and I'm still getting better at it.

Software engineering really took "83.67% of all statistics are made up" as a challenge instead of a joke

grimalkina, 1 month ago

So I came back to this thread realizing that the word estimates is EXTREMELY only one thing to y'all (project estimates) but just fyi to me and stats folks, it is many types of things 😬😄 cool, learning continues apace 😂

mrcompletely, 1 month ago

@grimalkina say more about other things it would mean to you that might apply to our work, if you would? I'd like to do some learning too 😁

grimalkina, 1 month ago

@mrcompletely primarily I think about an "estimate" as the thing we are trying to say about characteristics of the whole population (our interesting thing in the real world, be it people or every case of a phenomenon or...) based on the sample we can get of that population (that which is directly observed in some way). :) so suuuuper broad, and usually meaning an estimated value about some effect (or better, a range of estimates).
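
To make the sample-versus-population idea concrete, here is a hedged sketch that estimates a population mean from a small sample and attaches a range using a percentile bootstrap. The data are made up, and `bootstrap_mean_interval` is a hypothetical helper; the bootstrap is just one of several ways to get an interval.

```python
import random
import statistics


def bootstrap_mean_interval(sample, level=0.95, resamples=5_000, seed=0):
    """Return (point estimate, lower bound, upper bound) for the mean.

    The bounds come from a percentile bootstrap: resample the observed data
    with replacement many times and look at how much the mean moves around.
    """
    rng = random.Random(seed)
    n = len(sample)
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(resamples)
    )
    lower_index = int((1 - level) / 2 * resamples)
    upper_index = int((1 + level) / 2 * resamples) - 1
    return statistics.fmean(sample), means[lower_index], means[upper_index]


# Made-up sample: what we directly observed (say, review turnaround in hours)
sample = [4.0, 6.5, 3.2, 12.0, 5.1, 7.4, 2.8, 9.9, 4.6, 6.0]

point, low, high = bootstrap_mean_interval(sample)
print(f"estimated mean: {point:.2f} h, 95% interval: {low:.2f} to {high:.2f} h")
```

The single number is the estimate; the interval is the "range of estimates" that says how far the population value could plausibly sit from it given only this sample.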

scottmatter, 1 month ago

@grimalkina @mrcompletely

The kind of “estimates” that get reported in media (including by science communicators) as “facts”. And then when they turn out to be inaccurate predictions (which, of course, because uncertainty), weaponized as “scientists are lying”

grimalkina, 1 month ago

@scottmatter @mrcompletely yeah :(

mrcompletely, 1 month ago

@grimalkina okay great. That's fascinating. I think I have an apt direct analogy in my field: the mythical "user story" or more accurately the set of user stories that make up core requirements for a product or feature. Every team has some model of the "user base" (which includes potential users/sales targets) even if their model is terrible - worst case scenario is just the CEO's delusional idea that "people want this", best case is careful user research with segmented populations, personae etc

mrcompletely, 1 month ago

@grimalkina however we get there, we're saying: observing this group of people which might potentially use our software, they have these characteristics, or can be segmented into groups which have characteristic sets A, B, C. We then create abstract models of the user groups and keep them in mind while developing features. But the models are always inaccurate to some degree, and some are downright delusional; and even if you get a good model, the design or implementation can still fail, etc

mrcompletely, 1 month ago

@grimalkina that in the ballpark?

grimalkina, 1 month ago

@mrcompletely one hundred percent 🤗 ! user research came out of empirical psychology and cognitive science originally (& the HCI/human factors that also was an attempt to bring psych into tech), and so is a way to approach the scientific method but in a scrappy business setting. Some of which I am quite sympathetic to/allied with and some of which I abhor bc it can be like the misleading version of everything I care about haha. But yes! Not just similar but a very direct cousin in the world

grimalkina, 1 month ago

@mrcompletely imho all people need to deal in some way with reasoning from both the observed and the unobserved, and making inferences about causal relationships and from that, prediction. I have my particular jargon & training but I feel that every professional world also has to work out these fundamental questions around what is evidence and how do we relate to it! But of course we also all end up with our own jargon :) like estimation is a casual everyday word but also a statistical procedure

mrcompletely, 1 month ago

@grimalkina sure, I agree that we're all doing versions of the same thing here. What I find useful is making that overt and foregrounding the meta discussion. I think many in my field have a naive view of reality, evidence and measurement. Trying to introduce simple concepts like error bars and confidence intervals in estimation (using the term broadly now) is often met with perplexity. But doing so lets us set much more accurate expectations. So I'm always looking for new language & metaphors

mrcompletely, 1 month ago

@grimalkina which is why I appreciate your posts here so much. It all feels at least resonant with what I do; it's a highly valuable and well developed but very distinct perspective on topics I think about all the time. So thanks for all that, I think your work is really cool

robryk, 1 month ago

@grimalkina

To me (with my worldview currently skewed by working in infosec bordering systems reliability) estimates by default mean probability estimates or frequency estimates.

mhoye, 1 month ago

@grimalkina The amount of absolutely vibes-based numerology in this industry is astonishing, particularly when it comes to math that touches humans.

jenniferplusplus, 1 month ago

@grimalkina Sorry. Quite a lot of our university programs make us take 5 or 6 calculus courses and 0 statistics. We're not ever really taught how to even think about numbers. It's not great.

grimalkina, 1 month ago

@jenniferplusplus SIX CALCULUS COURSES? For real??? That's just a weed out strategy

grimalkina, 1 month ago

@jenniferplusplus but also like y'all deserve real world class expertise. I don't think every developer in the world should need to learn to be a statistician. With all this money.... Let's get software orgs some gd human-centered statisticians...!

jenniferplusplus, 1 month ago

@grimalkina It is indeed a weeding out strategy.

And yes, you're completely right. I don't need to be a statistician. But I would be so much better off with at least an introductory level statistics education. Probably also set, category, and number theories, while we're at it.

grimalkina, 1 month ago

@jenniferplusplus agreed and you deserved that educational support

grimalkina, 1 month ago

@jenniferplusplus do staff engineers ever get to go take stats courses as part of your technical work? Y'all should start a movement. Good Lord the payoff would probably be so enormous

jenniferplusplus, 1 month ago

@grimalkina Uh, maybe? Not in an organized way, but most companies have some education budget that I suppose could be used for general education.

grimalkina, 1 month ago

@jenniferplusplus so speaking as a research leader at a place that regularly sees large orgs kick off learning initiatives 😂, it's a pretty informed hypothesis of mine that getting senior technology staff to invest in learning a specific thing together as their OWN thing directed at solving a thorny engineering conversation, is profound. @KFosterMarks is our lab expert on this!!

KFosterMarks, 1 month ago

@grimalkina @jenniferplusplus YES! At their very best, coding is a communal activity, and learning is a communal activity, and learning code things as a community is just beyond powerful and impactful in so many ways. I personally love the strategy of adapting the Hackathon into a learning-focused Learnathon. Give everyone however many days to deep-dive on a tool or a technology, and use the community method to hold folks accountable to learning and sharing and using.

hazelweakly, 1 month ago

@KFosterMarks @grimalkina @jenniferplusplus oooh I am furiously taking notes

I have some ideas now that might be fun for the infrastructure stuff I'm building roadmaps for 😇

woo, 1 month ago

@KFosterMarks @grimalkina @jenniferplusplus "The very best" for extraverts, in my experience, like agile software development. Personally, I can't think and talk at the same time and I'm sure I make more typos when anyone is watching me. If someone else is typing, I drift off and think about something else.

grimalkina, 1 month ago

@woo @KFosterMarks @jenniferplusplus communal activities can be asynchronous over time -- "communal" is a super broad term for us that means interdependent work with shared goals

KFosterMarks, 1 month ago

@grimalkina @woo @jenniferplusplus

Plus plus to what @grimalkina is saying here - "communal" doesn't imply synchronous or in-person.

r343l, 1 month ago

@grimalkina @jenniferplusplus fwiw my CS program (over 20 years ago) didn’t require so much calc and did have a discrete math course that also covered some (very basic) number theory and other courses covered some basic stats. But it was by no means holistic. I was a dual math/CS person and probably my biggest regret is dropping the general math department stats course. I’ve had to learn bits and pieces because it turns out stats are hella useful. 😂

grimalkina, 1 month ago

@r343l @jenniferplusplus this is a separate tangent but knowing these curricula well I actually don't think the math dept will serve y'all either. Based on what I get asked by all the technical leadership these last two years I think everyone needs human behavior statistics and observational causal inference. Very practical applied sci stuff. How desperately I want an experiment that embeds an epidemiology statistical methods person with an eng team and see what happens.

hazelweakly, 1 month ago

@grimalkina @r343l @jenniferplusplus if someone wanted to take a crack at learning those outside of university, where might they start? :)

gvwilson, 1 month ago (edited 1 month ago)

@grimalkina @jenniferplusplus I tried and failed (twice) to talk a bunch of SE profs into writing a textbook tentatively called "Data Science for Software Engineers". Tried once to write it on my own, but discovered as I worked on it that I don't know enough myself to choose topics, and didn't believe there'd be an audience for it, at least not in academia. https://third-bit.com/talks/ds4se/

grimalkina, 1 month ago

@gvwilson @jenniferplusplus SE profs?? The same people who haven't fixed the "six calculus courses" problem in their own field, just checking??

gvwilson, 1 month ago

@grimalkina @jenniferplusplus I haven't run into the 6CC problem myself, but yeah, SE profs, because they're the ones who hold the keys to the undergrad curriculum. I resurrected NWIT in part to try to find people who'd be willing to give it a go and to earn points with them I could trade in. Maybe if the ML craze hadn't hit… shrug

grimalkina, 1 month ago

@gvwilson @jenniferplusplus were these teaching focused faculty? People who taught between data sci and SE already? People who go to cs education cross disciplinary conferences? People who have experience in applied statistical methods at all? People who have had teaching experience in it?

gvwilson, 1 month ago

@grimalkina a mixed bag - some of them gave talks in the NWIT session at Strange Loop, and others presented in the online sessions.

grimalkina, 1 month ago

@gvwilson it's really hard to write an interdisciplinary textbook for what is essentially a hostile audience on a topic area that's enormous and that people have huge debates about. I see how hard it is for the teaching prof in my household who is doing it having already developed an entire curriculum before the textbook. I think curriculum development probably has to come first in some ways vs a book project. So a high bar to set :) you're remarkable at writing in a way that not all are!

lunarood, 1 month ago

@grimalkina @jenniferplusplus counterpoint: everyone should learn to be a statistician.

I'm being hyperbolic, but I do think that the general lack of statistics in basic education is a massive problem. It's such an essential field for effectively navigating today's world. At the very least, I think everyone should be trained to understand and identify the most basic of statistical fallacies.

grimalkina, 1 month ago

@lunarood @jenniferplusplus I do agree I think it's a great goal that everyone should be empowered to understand their data and be a data citizen and have data and evidence literacy

robryk, 1 month ago

@grimalkina @jenniferplusplus

I'm afraid of having single statisticians in random places in an organization due to incentives: I don't know how to prevent them from being pushed into finding arguments for a preselected conclusion.

grimalkina, 1 month ago

@robryk @jenniferplusplus I have a lottttttt of thoughts on this being the leader of a scientific dept in an applied setting and having been the person doing the statistics in very high pressure organizational situations quite a bit. It can be done it just requires building it in the right ways. Like committing to an open science model and shared standards and external ethical best practice. In the end I am assuming here that this is a workplace that does care about learning in some real way.

robryk, 1 month ago

@grimalkina @jenniferplusplus

Hm~ I'm not sure what exactly workplace caring about something means (majority of employees caring about it? employees being rewarded for it? something else?). Roughly which (nonacademic?) workplaces would you expect to care about learning true things in the way you mean?

grrrr_shark, 1 month ago

@robryk @grimalkina @jenniferplusplus if your company actually supports it, actively gives you time for it and reduces your workload appropriately, that's one way they "care" about it - it's an explicit cultural choice from the top that the personnel infrastructure supports and doesn't just pay lip service to.

robryk, 1 month ago

@grrrr_shark @grimalkina @jenniferplusplus

Ah, so some variant of "being rewarded". This makes sense.

grrrr_shark, 1 month ago

@robryk @grimalkina @jenniferplusplus I think that's a slightly cynical way to look at it - if I spend time learning, it also benefits them. But at least they don't make me work overtime for it.

Euthydemus, 1 month ago

@grimalkina Yes, but "We gave you a number because numbers seem to satisfy you, not because there was any actual math that produced it. How could there be? What you have asked from us doesn't have a precedent, much less enough of one to generate statistically valid probabilities." is so much more exasperating than (as demonstrated in The Hunt for Red October) "Personally, I give us one chance in three." 😉

robryk, 1 month ago

@grimalkina

It's IMO even worse with logic ("these two things are similar enough so that we can call them equivalent, no?").

gdinwiddie, 1 month ago

@grimalkina I took statistics as a psychology course. Unfortunately the professor didn’t understand statistics and couldn’t answer any of my questions. He only knew how to look up a p-value in a table.

c0dec0dec0de, 1 month ago

@grimalkina 🤣 I wish. “We can’t byte-swap; it’s too slow!”
“Do you have performance numbers to back up that claim? Are you using the standard, non-branching method or did you make something stupid that detects at runtime what endianness you compiled to?”
“Uh… it’s just slow, okay?”
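
One way to replace "it's just slow" with a number is to time the operation. The sketch below uses a shift-and-mask 32-bit byte swap, which is one common branch-free formulation (the post does not say which method it means), and times it with Python's `timeit`. Timings from Python mostly reflect interpreter overhead and say little about compiled code, so treat this as an illustration of measuring, not as evidence about any real system.

```python
import timeit


def bswap32(value):
    """Reverse the byte order of a 32-bit unsigned integer with shifts and masks."""
    return (
        ((value & 0x000000FF) << 24)
        | ((value & 0x0000FF00) << 8)
        | ((value & 0x00FF0000) >> 8)
        | ((value & 0xFF000000) >> 24)
    )


# Check correctness first, so the timing below measures the right thing
assert bswap32(0x12345678) == 0x78563412

# Measure instead of guessing; the run count is an arbitrary illustrative choice
runs = 200_000
seconds = timeit.timeit(lambda: bswap32(0x12345678), number=runs)
print(f"{seconds / runs * 1e9:.0f} ns per call (interpreter overhead included)")
```

The same habit carries over to the language the real code is written in: measure the swap on representative data before deciding it is too slow.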

jrredho, 1 month ago

@grimalkina

"Based on" is so, uh, imprecise itself, isn't it?

Anyone throwing out any estimate, statistical or probabilistic, should disclose all the key underlying modeling done and assumptions that they've made, along with the sins they've committed along the way.

They should also give some description of how good the estimate is.

All of that is in my humble opinion, of course. :)

grimalkina, 1 month ago

@jrredho yeah for sure

oblomov, 1 month ago

@grimalkina I estimate that less than 20% of estimates are based on empirical data and real statistics.

3psboyd, 1 month ago

@grimalkina They also ask us to invent probability estimates out of thin air during our job interviews to demonstrate our suitability for employment.

pixel, 1 month ago

@grimalkina yes, but I’d argue that any probability estimate is built on assumptions. The quantity/size of those assumptions is the only difference. Unless people can actually see the future.

grimalkina, 1 month ago

@pixel oh absolutely. All such things are flawed models that we try to make useful. I'm speaking colloquially to my colleagues in software who do not do modeling :) but understanding this point is part of seeing the matrix

pixel, 1 month ago

@grimalkina having been in this industry for about 30 years I can say this post nails it: https://mastodon.social/@Euthydemus/112247567846017526

grimalkina, 1 month ago

@pixel I understand this as a human problem with lesser and greater stakes, but this does a great deal of harm in the world. Same thinking allows us to close our eyes to "I didn't get my asthma inhaler because some developer picked a random number out from the air as a threshold for a test they didn't understand and everyone let them because hey, engineering has organizational power"

This is a true story.

pixel, 1 month ago

@grimalkina This is... terrible. This industry definitely doesn't think enough about the deeper consequences of the choices that are made. That story is so bad.
