hazelweakly,
@hazelweakly@hachyderm.io avatar

This article is fucking amazing. It lays out, step by step and with code examples, exactly how to handle database changes, codebase changes, and feature-flag deployment strategies in order to practice continuous deployment without downtime or breaking anything.

I've wanted to write this article for years and never got around to it. Now I don't have to!

(Looks like they have a book on continuous deployment coming out soon. I might have to get this for my teams 👀 )

https://oooops.dev/2021/07/30/surviving-continuous-deployment-in-distributed-systems/

adrianco,
@adrianco@mastodon.social avatar

@hazelweakly The other feature I like that most people don’t seem to implement is version-aware routing. A new version of some code in a microservice can be introduced safely at any time as long as the rest of the system ignores it until a feature flag is set or some other new code ships.
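
A minimal sketch of the idea, in Python, with invented names (not Ribbon's actual API): instances of the new version register alongside the old ones, but the router only considers them once a feature flag is flipped.

    import random

    # Instances known to a (hypothetical) service registry, tagged with the
    # code version each one is running.
    INSTANCES = [
        {"host": "orders-1", "version": "1.4.0"},
        {"host": "orders-2", "version": "1.4.0"},
        {"host": "orders-3", "version": "1.5.0"},  # newly deployed, idle for now
    ]

    # Feature flags would normally come from a flag service; hard-coded here.
    FLAGS = {"orders.route-to-1.5": False}

    def pick_instance(flags=FLAGS, instances=INSTANCES):
        """Choose an instance, ignoring the new version until its flag is set."""
        allowed = {"1.4.0"}
        if flags["orders.route-to-1.5"]:
            allowed.add("1.5.0")
        return random.choice([i for i in instances if i["version"] in allowed])

    # With the flag off, traffic never reaches orders-3; flipping the flag
    # starts routing to it without another deploy.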

hazelweakly,
@hazelweakly@hachyderm.io avatar

@adrianco do you have a favorite way to implement it? I can think of several ways to do so, but I'm not sure which way you like (although I'm sure it depends)

adrianco,
@adrianco@mastodon.social avatar

@hazelweakly It was part of NetflixOSS Ribbon… not sure how to do it nowadays.

hazelweakly,
@hazelweakly@hachyderm.io avatar

@adrianco makes sense. Making it part of a framework that can handle all of that for you is really the most scalable way to do it

weilawei,
@weilawei@mastodon.online avatar

@adrianco This should really be one of the first things implemented (versioning).

In some of my code, I've gone to using the hashes of the schemas for network messages. This means that if one side changes the form of a message, the other side will reject it automatically without the appropriately updated schemas.

That way, they're always speaking the same language and not having a conversation "past" each other.

@hazelweakly
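
A rough illustration of the schema-hashing idea (the details here are guesses for illustration, not weilawei's actual implementation): both sides hash a canonical encoding of the schema, attach the hash to every message, and the receiver rejects anything whose hash it doesn't recognise.

    import hashlib
    import json

    def schema_hash(schema: dict) -> str:
        """Hash a canonical (sorted-key) JSON encoding of the schema."""
        canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode()).hexdigest()

    ORDER_V1 = {"type": "order", "fields": {"id": "int", "amount_cents": "int"}}
    KNOWN_SCHEMAS = {schema_hash(ORDER_V1)}

    def handle(message: dict) -> dict:
        if message.get("schema") not in KNOWN_SCHEMAS:
            # The sender changed the message format (or this side hasn't been
            # given the new schema yet): fail loudly instead of mis-parsing.
            raise ValueError("unknown schema hash; rejecting message")
        return message["payload"]

    # Any change to the schema, even one a human would forget to version,
    # produces a different hash and gets rejected automatically.
    handle({"schema": schema_hash(ORDER_V1),
            "payload": {"id": 1, "amount_cents": 500}})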

weilawei,
@weilawei@mastodon.online avatar

If someone forgets to update version numbers, but changed the format/meaning of a message, things might break in subtle and mysterious ways.

If the hash of the schema itself changes, it's going to fail on validation of the message.

hazelweakly,
@hazelweakly@hachyderm.io avatar

@weilawei one thing that I dislike with the hashing is that it breaks everything, even if the change is forward or backwards compatible. How did you get around that?

weilawei,
@weilawei@mastodon.online avatar

@hazelweakly That's intentional, because it's too hard to reason about bit-for-bit identical messages that don't actually mean the same thing.

I thought about that for several weeks, and couldn't come up with an actual use-case where I'd need an identical message that meant something different; or a way to guarantee, in the fully general case, that any service taking the validation at face-value would have the appropriate code to handle a differently formatted message identified as an old one.

weilawei,
@weilawei@mastodon.online avatar

@hazelweakly That said, if you give me a good use case, I'll certainly be revisiting it. (It's a distributed signal processing system, which also happens to work fine on a single machine, or many.)

hazelweakly,
@hazelweakly@hachyderm.io avatar

@weilawei ohh hash per message type makes way more sense. Yeah absolutely I'm in full agreement with that

hazelweakly,
@hazelweakly@hachyderm.io avatar

@weilawei I think the only use case I can see is that I always want to figure out how to make "expand + shrink" and other live-migration techniques easier to do. So if this thing supports parsing multiple versions of a message type at once, that would feel ideal to me. Or at least a strategy for updating that didn't result in rejected messages unless everything updated atomically at once

(Unless there's a part of this that I'm missing that helps with all of that)
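
For what it's worth, one way to square schema hashing with expand-and-contract (a sketch under my own assumptions; as weilawei notes below, their system already supports multiple schemas at once): the receiver keeps parsers registered for both the old and the new schema hash during the migration, then drops the old one in the contract step.

    import hashlib
    import json

    def schema_hash(schema: dict) -> str:
        canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode()).hexdigest()

    ORDER_V1 = {"type": "order", "fields": {"id": "int", "amount_cents": "int"}}
    ORDER_V2 = {"type": "order",
                "fields": {"id": "int", "amount_cents": "int", "currency": "str"}}

    # "Expand": accept both schema versions, normalising old-format payloads
    # to the new shape so downstream code only ever sees one format.
    PARSERS = {
        schema_hash(ORDER_V1): lambda p: {**p, "currency": "USD"},
        schema_hash(ORDER_V2): lambda p: p,
    }

    def handle(message: dict) -> dict:
        parser = PARSERS.get(message.get("schema"))
        if parser is None:
            raise ValueError("unknown schema hash; rejecting message")
        return parser(message["payload"])

    # "Contract": once every sender ships ORDER_V2, delete the ORDER_V1 entry
    # and old-format messages go back to failing hard.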

weilawei,
@weilawei@mastodon.online avatar

@hazelweakly It absolutely does. If you load up new service code and the new schema while a node is running, it will start accepting those new messages. Process migration and dynamically swapping out service code is a big part of the design.

Also, I'm a huge fan of Joe Armstrong's approach to distributed code, preferring to fail hard as soon as it isn't possible to reason about the state of the code.

hazelweakly,
@hazelweakly@hachyderm.io avatar

@weilawei ooh that sounds super interesting. I'd love to read more about that if you have it available anywhere :)

weilawei,
@weilawei@mastodon.online avatar

@hazelweakly This is probably not what you want to hear, but it's not publicly available, yet.

I got laid off from my prior job at the end of September, so I started writing this as a commercial product I'm hoping to sell.

However, I'm also very broke, I don't have any job prospects, and my phone has been shut off, which makes it especially hard to get a job now.

Now: if I can't find something in the next month, I'll open source it and wash dishes instead and maybe people will donate a few bucks.

weilawei,
@weilawei@mastodon.online avatar

@hazelweakly I already made part of it open source, the part I'm currently working on: urcucpp, a lock-free implementation of STL containers that wraps Userspace RCU.

WIP: https://bitbucket.org/urcucpp/urcucpp/

(Edited to add the link because I realized my older posts on it have been deleted by now.)

hazelweakly,
@hazelweakly@hachyderm.io avatar

@weilawei I wish you the best of luck in your endeavors! Making money from software is shockingly hard

stevel,
@stevel@hachyderm.io avatar

@hazelweakly @adrianco funnily enough I've just uploaded a 2001 document which is possibly the first written use of the term Continuous Deployment, though we only had CruiseControl pushing the build to staging. One thing we did was add a JUnit integration test to validate each deployment and catch issues (usually config) as part of the process. As well as the code, config and ops are use cases and bugs you can automate the testing for. We mustn't forget that

https://www.researchgate.net/publication/378071382_Deployment_the_final_waterfall
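
The original test was JUnit; a rough present-day analogue of the same idea in Python/pytest might look like this (the URL, endpoints, and config keys are invented): treat the deployment itself, config included, as something the pipeline tests.

    import requests

    STAGING_URL = "https://staging.example.com"  # invented for illustration

    def test_service_is_up():
        # The deploy step isn't done until the deployed build answers.
        resp = requests.get(f"{STAGING_URL}/healthz", timeout=5)
        assert resp.status_code == 200

    def test_deployed_config_matches_intent():
        # A (hypothetical) endpoint exposing the non-secret settings the
        # service actually loaded lets config mistakes fail the pipeline.
        cfg = requests.get(f"{STAGING_URL}/config", timeout=5).json()
        assert cfg["database_pool_size"] == 20
        assert cfg["feature_flags_source"] == "flag-service"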

ely_peddler,
@ely_peddler@hachyderm.io avatar

@hazelweakly every time I see CI/CD concepts like this discussed, the example applications are always a website with a backend database. This makes me wonder if they are actually applicable to anything else. I'd like to see how they work when managing something like financial transactions, air traffic control, or a myriad of other systems which aren't a website and a database. What about systems where pushing a broken bit of code could lead to financial ruin, legal action or death?

testerab,
@testerab@mastodon.me.uk avatar

@ely_peddler @hazelweakly
You can still use the techniques in here if you're doing Continuous Delivery but not Continuous Deployment, i.e. you don't automatically have each commit go straight to production. It's still great practice to be able to test tinier changes: so much quicker to spot and isolate a problem. And the risk of merging long-lived branches is quite significant; at a previous company, post-merge issues dropped dramatically when we moved away from long-lived branches.

ely_peddler,
@ely_peddler@hachyderm.io avatar

@testerab @hazelweakly so are you saying that full CICD is only applicable to the type of web applications used as examples?

hazelweakly,
@hazelweakly@hachyderm.io avatar

@ely_peddler @testerab I would actually disagree with that. I've seen continuous deployment done well in a FedRAMP moderate ATO environment. It's entirely possible. It's not trivial of course, but it's entirely possible.

hazelweakly,
@hazelweakly@hachyderm.io avatar

@ely_peddler @testerab the principles are all the same. The major difference is the amount of emphasis that you have to place on being able to revert a change, as well as the robustness of your validity checks in the CI pipeline. In addition, if you don't have a mature ability to roll things out gradually and test them in a canary fashion, you are going to struggle to get your change failure rate below a certain threshold
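
As a concrete (invented) illustration of the canary point: a rollout guard that sends a small slice of traffic to the new version and rolls back automatically once the observed failure rate crosses a threshold. Real setups would lean on a progressive-delivery tool; the names and numbers here are illustrative only.

    CANARY_WEIGHT = 0.05     # fraction of traffic the router sends to the canary
    MAX_ERROR_RATE = 0.01    # roll back if more than 1% of canary requests fail
    MIN_REQUESTS = 1000      # don't decide on too little data

    def evaluate_canary(requests_seen: int, errors_seen: int) -> str:
        """Decide whether to promote, keep watching, or roll back the canary."""
        if requests_seen < MIN_REQUESTS:
            return "wait"
        if errors_seen / requests_seen > MAX_ERROR_RATE:
            return "rollback"   # revert the change before it reaches everyone
        return "promote"        # shift the remaining traffic to the new version

    assert evaluate_canary(500, 0) == "wait"
    assert evaluate_canary(2000, 50) == "rollback"
    assert evaluate_canary(2000, 5) == "promote"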

ely_peddler,
@ely_peddler@hachyderm.io avatar

@hazelweakly @testerab it feels like a trade-off where you remove any external (to the dev team) oversight in order to gain faster deployment of changes, is that fair?
However, it's never framed like that; it's framed as the ideal we should all be aiming for, without acknowledging that removing that external oversight isn't sensible in many situations.

TheIdOfAlan,
@TheIdOfAlan@hachyderm.io avatar

@hazelweakly There's nothing quite like having someone you trust in a field pointing you to a resource for something you want to learn

Much appreciated :ablobfoxbongo:
