The next release of Manyfold will include OPTIONAL anonymous usage tracking, purely so that we can count how many people are running it.
This is OFF BY DEFAULT, and all the information that's sent and stored is shown in the UI before you turn it on. We've tried to be as minimal, clear and up-front as possible.
You suspect this has something to do with a code change because, wouldn’t you know it, the sharp decline starts around Feb 22 and we released Firefox 123 on Feb 20. But where do you go from here? Here’s a step-by-step of how I went from this plot arriving in Slack #data-help to finding the bugfix that most likely caused the change:
Ensure this is actually a version-specific change
It’s interesting that the cliff in the plot happened near a release day, and it’s an excellent intuition to consider code releases for these sorts of sea-changes in data volume or character. But we should verify that this is the case by grouping by mozfun.norm.truncate_version(app_version, 'major') AS major_version which in our case gives us:
Sure enough, in this case the volume cliff happens entirely within the Firefox 123+ colours. If this isn’t what you get, then it’s somewhat less likely that this is caused by a client code change and this guide might not help you. But for us this is near-certain confirmation that the change in the data is caused by a code change that landed in Firefox 123… but which one?
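The same grouping can be sketched outside the warehouse. This is a minimal Python illustration, assuming rows of (submission date, app_version, event count) have already been exported; the row layout, the helper names, and the sample numbers are all invented for the example, and major_version only approximates what mozfun.norm.truncate_version(app_version, 'major') does.

```python
from collections import Counter

def major_version(app_version: str) -> int:
    """Return the leading integer of a version string, e.g. '123.0.1' -> 123."""
    return int(app_version.split(".", 1)[0])

def volume_by_day_and_major(rows):
    """Sum event counts per (submission_date, major_version) pair.

    rows: iterable of (submission_date, app_version, count) tuples.
    This row shape is an assumption made for this sketch.
    """
    counts = Counter()
    for submission_date, app_version, n in rows:
        counts[(submission_date, major_version(app_version))] += n
    return counts

# Invented sample data, only to show the shape of the result.
rows = [
    ("2024-02-19", "122.0.1", 1000),
    ("2024-02-22", "122.0.1", 400),
    ("2024-02-22", "123.0", 150),
    ("2024-02-23", "123.0", 200),
    ("2024-02-23", "123.0.1", 50),
]
counts = volume_by_day_and_major(rows)
for (day, major), n in sorted(counts.items()):
    print(day, major, n)
```

If the drop lives entirely in the rows for one major version and later, that supports the version-specific reading.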
( This is where I spent a little time checking some frequent “gotcha” changes that could’ve happened. I checked: was it because data went from all-channel to pre-release-only? (No, the probe definitions didn’t change and the fall isn’t severe enough for that (would look more like an order of magnitude)) Was it because specific instrumentation within the group happened to expire in Fx123? (No, the first plot is grouped by specific probe, and all of the groups shared the same shape as their sum) Was it an incredibly-successful engagement-boosting experiment that ended? (No, there haven’t been any relevant experiments since last July) )
Figure out which Nightly builds are affected
Firefox Desktop releases new software versions twice a day on the Nightly channel. We can look at the numbers reported by these builds to narrow down what specific 12h period the code landed that caused this drastic shift. Or, well, you’d think we could, but when you group by build_id you get:
Because our Nightly population isn’t randomly distributed across timezones, there are usage patterns that affect the population who use which build on which day. And sometimes there are “respins” where specific days will have more than 2 nightlies. And since our Nightly population is so small (You Can Help! Download Nightly Today!), and this data is a little sparse to begin with, little changes have big effects.
No, far more commonly the correct thing to do is to look at what I call a “build day”. This is how GLAM makes things useful, and this is how I make patterns visible. So group by SUBSTR(build_id, 1, 8) AS build_day, and you get:
Much better. We can see that the change likely landed in Jan 18’s nightlies. That Jan 18-20 are all of a level suggests to me that it probably ended up in all of Jan 18’s nightly builds (if it only landed in one of the (normally) two nightly builds we’d expect to see a short fall-off where Jan 18 would be more like an average between Jan 17 and 19).
Regardless of when during the day, we’re pretty sure we have this nailed down to only one day’s worth of patches! That’s good… but it could be better.
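For anyone working from exported rows rather than SQL, the build-day roll-up is just a prefix of the build_id. A minimal Python sketch, assuming build_ids are 14-digit YYYYMMDDhhmmss strings and that rows arrive as (build_id, count) pairs; the sample values are invented.

```python
from collections import defaultdict

def build_day(build_id: str) -> str:
    """First 8 characters of a build_id, i.e. its YYYYMMDD part.

    Mirrors SUBSTR(build_id, 1, 8) from the query above.
    """
    return build_id[:8]

def totals_by_build_day(rows):
    """Sum counts over every build (including respins) from the same day.

    rows: iterable of (build_id, count) pairs. This shape is an assumption.
    """
    totals = defaultdict(int)
    for build_id, n in rows:
        totals[build_day(build_id)] += n
    return dict(sorted(totals.items()))

# Invented sample data: two nightlies on the 17th, three (one respin) on the 18th.
rows = [
    ("20240117093000", 40),
    ("20240117214500", 60),
    ("20240118091500", 10),
    ("20240118160000", 5),
    ("20240118213000", 12),
]
totals = totals_by_build_day(rows)
print(totals)
```

Summing per day smooths over the timezone effects and respins that make the per-build plot so noisy.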
Going from build days to pushlog
Ever since I was the human glue keeping the (now-decommissioned) automated regression detection system “alerts.tmo” working, I’ve had a document on my disk reminding me how to transform build days or build_ids into a “pushlog” of changes that landed in the suspect builds. This is how it works:
Get the hg revisions of the suspect builds by looking through this list of all Firefox releases for the suspect builds’ ids. You want the final build of the day before the first suspect build day and the final build of the final suspect build day, which in this case are Jan 17 and Jan 18, so we get f593f07c9772 and 9c0c2aab123:
Loading the pushlog between those two revisions gives you a list of all changes that are in the suspect builds, plus links to the specific code changes and the relevant bugs, with the topic sentence from each commit right there for you. Handy!
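Turning the two revisions into a pushlog link can be sketched as below. The text above does not spell out the URL, so this assumes the pushloghtml view on hg.mozilla.org for mozilla-central with its fromchange and tochange query parameters; the function name is hypothetical, and the revisions are copied as written above.

```python
from urllib.parse import urlencode

# Assumption: the mozilla-central pushlog view and its query parameters.
PUSHLOG_BASE = "https://hg.mozilla.org/mozilla-central/pushloghtml"

def pushlog_url(from_rev: str, to_rev: str) -> str:
    """Build a pushlog URL covering the changes after from_rev up to to_rev."""
    query = urlencode({"fromchange": from_rev, "tochange": to_rev})
    return f"{PUSHLOG_BASE}?{query}"

url = pushlog_url("f593f07c9772", "9c0c2aab123")
print(url)
```

Keeping this as a tiny helper saves re-deriving the URL shape every time a new pair of suspect builds turns up.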
Going from a pushlog to a culprit
This is where human pattern matching, domain expertise, organizational memory, culture and practices, and institutional conventions all combine… or, to put it another way, I don’t know how to help you get from the list of all code that could have caused your data change to the one (or more) likely suspects. My brain has handily built me a heuristic and not handed me the source code, alas. But I’ve noticed some patterns:
Any change that is backed out can be disregarded. Changes are often backed out because of test failures and relanded later. Sometimes that’s later the same day. Sometimes that’s outside our pushlog. Skip any changes that have been backed out by disregarding any commits from a bug that is mentioned before a commit that says “Backed out N changesets (bug ###)…”.
You can often luck out by just text searching for keywords. It is customary at Mozilla to try to be descriptive about the “what” of a change in the commit’s topic, so you could try looking for “telemetry” or “ping” or “glean” to see if there’s anything from the data collection system itself in there. Or, since this particular example had to do with Firefox Relay’s integration with Firefox Desktop, I looked for “relay” (no hits) and then “form” (which hit a few times, like on the word “information”, … but also on the culprit which was in the form detector code.)
This is a web view on the source code, so you’re not limited to what it gives you. If you have a mozilla-central checkout yourself, you can pull up the commits (if you’re using git-cinnabar you can use its hg2git functionality to change the revs from hg to git) and dump their sum-total changes to a viewer, or pipe it through grep, or turn it into a spreadsheet you can go through row-by-row, or anything you want. I’m lazy so I always try keywording on the pushlog first, but these are always there for when I strike out.
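The first two patterns, skipping backed-out changes and keywording the commit topics, can be roughed out in a few lines of Python. This is only a sketch under assumptions: it takes the commit summaries as a list of strings ordered oldest first, it expects summaries to start with "Bug NNN" or "Backed out ... (bug NNN)", and every bug number and summary in the sample is invented.

```python
import re

# Assumed summary shapes; real commit messages vary more than this.
BACKOUT_RE = re.compile(r"^Backed out .*?\(bug (\d+)\)", re.IGNORECASE)
BUG_RE = re.compile(r"^Bug (\d+)", re.IGNORECASE)

def drop_backed_out(summaries):
    """Remove backout commits and the earlier commits they back out.

    summaries must be ordered oldest first. A reland that comes after
    the backout is kept.
    """
    keep = [True] * len(summaries)
    for i, summary in enumerate(summaries):
        backout = BACKOUT_RE.match(summary)
        if not backout:
            continue
        bug = backout.group(1)
        keep[i] = False  # the backout itself changes nothing we care about
        for j in range(i):
            landed = BUG_RE.match(summaries[j])
            if landed and landed.group(1) == bug:
                keep[j] = False  # landed earlier, then backed out
    return [s for s, k in zip(summaries, keep) if k]

def keyword_hits(summaries, keyword):
    """Case-insensitive substring search over commit summaries."""
    needle = keyword.lower()
    return [s for s in summaries if needle in s.lower()]

# Invented pushlog, oldest first.
pushlog = [
    "Bug 1000001 - Add more information to the about dialog",
    "Bug 1000002 - Refactor tab strip layout",
    "Backed out 1 changesets (bug 1000002) for causing test failures",
    "Bug 1000003 - Fix field counting in the form detector",
    "Bug 1000002 - Refactor tab strip layout",
]
survivors = drop_backed_out(pushlog)
for hit in keyword_hits(survivors, "form"):
    print(hit)
```

As in the investigation above, a substring search for "form" also catches "information", so the hits still need a human read.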
Getting it wrong
Just because you found the one and only commit that landed in a suspect build that is at all related, even if that commit’s bug specifically mentions that it fixed a double-counting issue, even if there’s commentary in the code review that explains that they expect to see this exact change you just saw… you might be wrong.
Do not be brusque in your reporting. Do not cast blame. And for goodness’ sake be kind. Even if you are correct, being the person who caused a change that resulted in this investigation can be a not-fun experience. Ask Me How I Know.
Firefox Desktop is a complex system, and complex systems fail. It’s in their nature.
And that’s it! If you have any comments, questions, or (better yet) improvements, please find me on the #glean:mozilla.org channel on Matrix and I’d love to chat.
Today in User Space
📈We enable MORE #telemetry
🐧Get enticed by #immutable distros
📖Look back into our own personal #history
🗒️Fall down the #notes rabbit hole
🎨And add a splash of color to our #CLI
Looking for a top-notch #browser for your #Mac? Look no further than #ArcBrowser. With its advanced Split Views and other impressive features, Arc is the perfect tool to help you improve your #macOS #web browsing experience. Plus, it's built on the #Chromium platform, so you can rest assured that it's fast, stable, and secure.
@dboehmer #ArcBrowser patiently informs you of its optional #telemetry on install, it is opted out by default, and it can be easily switched off any time. Just like many #FOSS projects.
Re-establish koala populations on fragmented habitats
"Where you've had a population grow from a small number of individuals, there's always potential for inbreeding."
"Drones with thermal cameras have been used to count the koalas and sound recorders are being installed in national parks, and on Crown land and farms along a 100-kilometre stretch of the river. We put the data through a computer and it takes hours and hours and hours of recording and just chops out the little snippets where it thinks there's a koala — then those little snippets get manually verified."
Best practice would be to refrain from logging koala habitat in the first place.
#Flathub deserves a set of more performant and native applications than being attributed to a page in #Discord's playbook. Stay with me: are we really just going to blindly accept #privacy flaws of this messenger and promote it at the same time?
The fact that it has only now occurred to Discord that verifying this popular #Flatpak distribution was long overdue is, I think, worth a comment in itself, but I digress. It is nice that #OpenSource enthusiasts made arrangements for this verification and I have zero disagreements with the result. I'm just stupefied that, in all this effort, Discord is treated like some spoon-fed royal baby - at least, according to reactions I see.
So, what was it... Flathub already had a library of nice actively developed #FOSS #applications before this news. I don't see the point in exaggerating the scales on some centralized chat thingabob with well-known #ToS and #telemetry problems, that's all. Thank you for visiting my #dRBBoard talk! ❤
#Hardware #Intel #GPU #Telemetry #Surveillance: "Though that sounds innocuous, Intel provides a long list of the types of data it collects, many unrelated to your computer's performance. Those include the types of websites you visit, which Intel says are dumped into 30 categories and logged without URLs or information that identifies you, including how long and how often you visit certain types of sites. It also collects information on "how you use your computer" but offers no details. It will also identify "Other devices in your computing environment." Numerous performance-related data points are also captured, such as your CPU model, display resolution, how much memory you have, and, oddly, your laptop's average battery life.
Though this sounds like an egregious overreach regarding the type of data captured, to be fair to Intel, it allows you to opt out of this program. That is apparently not the case with Nvidia, which doesn't even ask for permission at any point during driver installation, according to TechPowerUp. AMD, on the other hand, does give you a choice to opt out like Intel does, regardless of what other options you choose during installation, and even provides an explainer about what it's collecting."
On #fedora #telemetry, I'm a proponent of privacy-respecting telemetry, yet I can't help but feel that the current proposal raises some questions about how #gnome-software in general curates things.
"Metrics can also be used to inform user interface design decisions. For example, we want to collect the clickthrough rate of the recommended software banners in GNOME Software to assess which banners are actually useful to users."
Enlightening, isn't it? There are other empty blocks, but they are either fairly standard or are described elsewhere in the document.
If you are familiar with #helm, you won't despair because you have the power of analytics.enabled: false. That works on the rest of this chart and is the standard way to en/disable things.
It doesn't work that way.
Let me save you some time with the terrible new #github code search. Here is the actual syntax:
"analytics.reporting_enabled: false"
Got asked several times this week on how I feel about the recent #redhat news.
Long story short: I'm still proud to play a small part in #RHEL as well as the @fedora communities. I'm surrounded by talented people who work hard to deliver meaningful improvements for customers around the clock.
There's a ton of mud-slinging going on by people with agendas to push and views to count. That's the roughest part.
I see tons of people getting so incredibly upset about the suggestion of #telemetry in @fedora. It's a discussion thread. It's for discussion.
I'm on the Fedora Engineering Steering Committee (FESCo) and discussions like these show me that the community is strong! I'd be horribly worried if someone suggested a feature and nobody said anything.
Nothing is set in stone. Speak up but please don't drag the whole community through the mud because a person proposed a change. Join the discussion!
Ah, sod it. I'm thinking about switching back to #LinuxMint from #Fedora.
Initially I switched because I needed the more up-to-date software, but it came with some annoying caveats (multimedia codec issues, NVIDIA driver updates, having to reboot after sleep). That, combined with the #RHEL and #Telemetry drama, and at this point I'm feeling kinda home-sick for Mint :blobcatnotlikethis:
It's been a while, so hopefully the software is up-to-date enough for me now!
@itsfoss Collecting telemetry data on a private device is never privacy friendly and especially not when a company like IBM is behind it. The possibilities simply arouse too many desires.
Recommended distros for privacy?
Hey folks,...