
norootcause

@norootcause@hachyderm.io

Student of complex systems failures, resilience engineering, cognitive systems engineering. Will talk your ear off about learning from incidents in software.


norootcause, to random

If you claim the system failed because of human error, then you’re saying that your system requires error-free humans to function without failure.

If that’s the case, your problem isn’t the people, it’s that you have a fragile system.

norootcause, to random

> If I told you that US venture capitalists promoted a Ponzi scheme that used a cartoon computer game to steal hundreds of millions of dollars from poor workers in the Philippines and send it to North Korea to fund a ballistic missile program, you probably wouldn’t believe me. Unless I said “… using crypto,” in which case you would probably say “oh yeah that sounds about right.”

I wish I could write like Matt Levine.

norootcause, to random

Two of the biggest sources of incidents I’ve seen are:

  1. Legacy code
  2. Migrating away from legacy code

The conclusion is clear: you should only write non-legacy code.

norootcause, to random

We should bring back cloaks. What a fantastic idea it is to treat a blanket as an article of clothing.

norootcause, to random

The hard thing about distributed systems is that "turn the whole thing off and on again" isn't an option.

norootcause, to random

"When systems or organizations don't work the way you think they should, it is generally not because the people in them are stupid or evil. It is because they are operating according to structures and incentives that aren't obvious from the outside." – Jennifer Pahlka, Recoding America.

https://www.recodingamerica.us/

norootcause, to random

Endlessly fascinated by goal conflicts and double binds.

"Workers coped with the double bind by developing a 'covert work system' that involved, as one worker put it, 'doing what the boss wanted, not what he said". – Woods et al., Behind Human Error

norootcause, to random

Resilience is about treating surprise as a first-class thing.

norootcause, to random

In a complex system, there isn’t a “safety” knob that you can just turn to the right to increase safety. Safety features increase complexity (new failure modes!) and have opportunity costs (finite resources!). Every intervention involves a tradeoff.

norootcause, to random

Proposed org metric, the reflection ratio:
(time spent on reflection) / (time spent on planning)
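A minimal sketch of computing that ratio, assuming hypothetical inputs; the hour figures and variable names below are illustrative only and not from the original post:

  # Hypothetical hours per quarter; the numbers are made up for illustration.
  reflection_hours = 12.0  # incident reviews, retrospectives, postmortems
  planning_hours = 60.0    # roadmap and sprint planning meetings
  reflection_ratio = reflection_hours / planning_hours
  print(f"reflection ratio: {reflection_ratio:.2f}")  # prints 0.20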

norootcause, to random

"Human error is the attributed cause of large system accidents because human performance in these complex systems is so good. Failures of these systems are, by almost any measure, rare and unusual events. Most of the system operations go smoothly; incidents that occur do not usually lead to bad outcomes. These systems have come to be regarded as 'safe' by design rather than by control."

– R.I. Cook, D.D. Woods, Operating at the Sharp End: The Complexity of Human Error

norootcause, to random

I think we should refer to an LLM as “Turing’s demon”.

norootcause, to random

You can’t win, but you’ve still gotta try

norootcause, to random

This Dijkstra quote feels timely:

“The question of whether a computer can think is no more interesting than the question of whether a submarine can swim.”

norootcause, to random

Whoever is responsible for redirecting https://wedontneedno.education to Wikipedia: I salute you.

norootcause, to random

One of the most important skills is knowing where to direct your attention when you’re under load.

norootcause, to random

They should rename Chief Technology Officer to Chief Migration Officer.

norootcause, to random

"In general, outsiders pay attention to practitioners' coping strategies only after failure, when such processes seem awkward, flawed, and fallible. It is easy for post-incident evaluations to say that a human error occurred." – Woods et al., Behind Human Error

norootcause, to random

If you don’t know how the work actually gets done, your proposed improvements are unlikely to have the effects that you expect.

norootcause, to random

“Practitioners can only act on the knowledge they have.” – D.D. Woods et al., Behind Human Error

norootcause,

“Devices that are internally complex but superficially simple encourage practitioners to adopt overly simplistic models of device operation and to develop high confidence that these models are accurate and reliable.”
