What happens when you're an Observability vendor migrating to @opentelemetry? @jea knows exactly what that's like, as he shares the story of migrating to OpenTelemetry at ServiceNow Cloud Observability (formerly Lightstep).
That said, while the book is great, I feel it's too long; the authors could have taken a more pragmatic approach to some of the chapters, which contain a lot of repetition.
Outside of "how much" and "where is all of it," what should you talk to your users about re: their #o11y data needs?
Workflows?
Tooling gaps?
Metrics to improve?
Platform feature requests?
Current toil that feels unnecessary?
What other data should you bring to the discussion?
The most important factor in getting your logs under control is routing them to the right place, /dev/null included. If you're trying to optimize log costs in a system that charges you dollars per gig on ingress, you've already lost the battle.
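For the "route it, don't ship it" idea, here's a minimal sketch using Python's standard logging module; the logger names and level thresholds are hypothetical, and a real pipeline would swap the StreamHandler for your exporter of choice:

```python
import logging

# Route a chatty dependency's output to the logging equivalent of /dev/null.
# The decision happens in-process, before any byte hits a paid ingress.
noisy = logging.getLogger("myapp.chatty_dependency")  # hypothetical name
noisy.addHandler(logging.NullHandler())
noisy.propagate = False  # never reaches the root logger or the pipeline

# Keep actionable logs on a path that actually goes somewhere.
actionable = logging.getLogger("myapp.payments")  # hypothetical name
actionable.setLevel(logging.WARNING)  # drop INFO noise at the source
actionable.addHandler(logging.StreamHandler())  # stand-in for a real exporter

actionable.warning("refund failed; retrying")  # shipped
noisy.debug("poll tick")                       # dropped for free
```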
Teams should hold a regular review to determine which of their #Observability data is actually being used. Otherwise, "just in case" becomes a valueless justification with uncapped costs.
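As a hedged sketch of what that review could look like: assume you can export the names you ingest and the names your dashboards and alerts actually reference (both lists below are invented for illustration, not from any real backend):

```python
# Minimal "is it actually used?" review sketch with made-up data.
ingested = {"http.request.duration", "cache.hits", "legacy.batch.rows"}
referenced = {"http.request.duration", "cache.hits"}  # from dashboards/alerts

unused = ingested - referenced
for name in sorted(unused):
    print(f"Candidate for /dev/null or a retention cut: {name}")
```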
✨#OTel Q&A TODAY!!✨ @hazelweakly joins us to share some gold nuggets from her personal experience with #Observability at this week's OTel Q&A:
✨Learn how to contribute to OpenTelemetry!! ✨Are you an #OpenTelemetry practitioner? Have you ever wanted to contribute back to OpenTelemetry, but didn’t know where to begin? Then check out my latest blog post! 👇
The Speed of Light Will Cap Traditional Centralized #Observability
There are lots of reasons DevOps teams have been looking into #o11y Pipelines and their in-flight processing possibilities: cost and performance among them. But I rarely hear about the hardest limit:
Normally, the Red<>Green band is much wider for cloud migrations. I've shifted it specifically for #Observability, where data's half-life is short and its immediacy is vital.
Put simply, there is a hard limit to how much data you can get across the wire in the needed time.
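To make that limit concrete, here's a back-of-the-envelope sketch; every number in it is an illustrative assumption, not a measurement:

```python
# Light in fiber travels at roughly 2/3 the speed of light in vacuum.
SPEED_IN_FIBER_KM_PER_S = 200_000
distance_km = 12_000  # assumed cross-ocean distance to a central backend

rtt_s = 2 * distance_km / SPEED_IN_FIBER_KM_PER_S
print(f"Best-case round trip: {rtt_s * 1000:.0f} ms")  # ~120 ms, physics-bound

# Throughput side of the same limit, with hypothetical numbers:
telemetry_gbps = 50  # assumed fleet-wide telemetry firehose
wan_link_gbps = 10   # assumed link capacity to the central store
print(f"Telemetry outruns the wire by {telemetry_gbps / wan_link_gbps:.0f}x")
```

No amount of compression or batching changes that first number; centralizing everything means paying that latency on every query.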
Because Observability is a meta-practice, at what point does it deserve focused attention instead of being an afterthought? Launch? A scale threshold? Downtime thresholds? Dev burnout?
In order to improve your Observability practice, you first need to write down what you want from it. Otherwise, the path beyond Collect > Search > Display becomes impossibly murky.
I'm looking forward to seeing how Observability tools change as OTel gains more and more mindshare. If collection isn't the primary value for a vendor, what is?
Many of us have hobbies. Many of them are beautiful or useful to the world. Mine is not.
My personal white whale is finding the perfect way to peel an orange. Years of research and experimentation haven't yet led to an ideal solution, and that persistent failure is precisely why it's taught me 4 key principles about Observability.
What is your go-to mental model for thinking about Observability?
In talking with DevOps, SRE, and application teams, I find that there aren't enough detailed mental models for thinking through what an Observability practice is and what it should do.
So here's a short list of models with potential:
Driving a car
Flying a plane
Cooking a big meal
What are other mental models you use to think through running your applications?