
norootcause

@norootcause@hachyderm.io

Student of complex systems failures, resilience engineering, cognitive systems engineering. Will talk your ear off about learning from incidents in software.


norootcause, to random

If you claim the system failed because of human error, then you’re saying that your system requires error-free humans to function without failure.

If that’s the case, your problem isn’t the people, it’s that you have a fragile system.

norootcause, to random

> If I told you that US venture capitalists promoted a Ponzi scheme that used a cartoon computer game to steal hundreds of millions of dollars from poor workers in the Philippines and send it to North Korea to fund a ballistic missile program, you probably wouldn’t believe me. Unless I said “… using crypto,” in which case you would probably say “oh yeah that sounds about right.”

I wish I could write like Matt Levine.

norootcause, to random

Two of the biggest sources of incidents I’ve seen are:

  1. Legacy code
  2. Migrating away from legacy code

The conclusion is clear: you should only write non-legacy code.

norootcause, to random

We should bring back cloaks. What a fantastic idea it is to treat a blanket as an article of clothing.

norootcause, to random

The hard thing about distributed systems is that "turn the whole thing off and on again" isn't an option.

norootcause, to random

"When systems or organizations don't work the way you think they should, it is generally not because the people in them are stupid or evil. It is because they are operating according to structures and incentives that aren't obvious from the outside." – Jennifer Pahlka, Recoding America.

https://www.recodingamerica.us/

norootcause, to random

Endlessly fascinated by goal conflicts and double binds.

"Workers coped with the double bind by developing a 'covert work system' that involved, as one worker put it, 'doing what the boss wanted, not what he said". – Woods et al., Behind Human Error

norootcause, to random

Resilience is about treating surprise as a first-class thing.

norootcause, to random

In a complex system, there isn’t a “safety” knob that you can just turn to the right to increase safety. Safety features increase complexity (new failure modes!) and have opportunity costs (finite resources!). Every intervention involves a tradeoff.

norootcause, to random

Proposed org metric, the reflection ratio:
(time spent on reflection) / (time spent on planning)
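A minimal sketch of computing that ratio, assuming hypothetical inputs; the hour figures and variable names below are illustrative only and not from the original post:

  # Hypothetical hours per quarter; the numbers are made up for illustration.
  reflection_hours = 12.0  # incident reviews, retrospectives, postmortems
  planning_hours = 60.0    # roadmap and sprint planning meetings
  reflection_ratio = reflection_hours / planning_hours
  print(f"reflection ratio: {reflection_ratio:.2f}")  # prints 0.20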

norootcause, to random

"Human error is the attributed cause of large system accidents because human performance in these complex systems is so good. Failures of these systems are, by almost any measure, rare and unusual events. Most of the system operations go smoothly; incidents that occur do not usually lead to bad outcomes. These systems have come to be regarded as 'safe' by design rather than by control."

– R.I. Cook, D.D. Woods, Operating at the Sharp End: The Complexity of Human Error

norootcause, to random

I think we should refer to an LLM as “Turing’s demon”.

norootcause, to random

You can’t win, but you’ve still gotta try

norootcause, to random

This Dijkstra quote feels timely:

“The question of whether a computer can think is no more interesting than the question of whether a submarine can swim.”

norootcause, to random

Whoever is responsible for redirecting https://wedontneedno.education to Wikipedia: I salute you.

norootcause, to random

One of the most important skills is knowing where to direct your attention when you’re under load.

norootcause, to random

They should rename Chief Technology Officer to Chief Migration Officer.

norootcause, to random

"In general, outsiders pay attention to practitioners' coping strategies only after failure, when such processes seem awkward, flawed, and fallible. It is easy for post-incident evaluations to say that a human error occurred." – Woods et al., Behind Human Error

norootcause, to random

If you don’t know how the work actually gets done, your proposed improvements are unlikely to have the effects that you expect.

norootcause, to random

“Practitioners can only act on the knowledge they have.” – D.D. Woods et al., Behind Human Error

norootcause,

“Devices that are internally complex but superficially simple encourage practitioners to adopt overly simplistic models of device operation and to develop high confidence that these models are accurate and reliable.”
