@KathyReid The high volume of text from high-rank users makes total sense from a training bias perspective, though I think anonymizing authors might not change this.
Unless the RAG provider uses specially designed extractors for user rank info in their corpus, I'm doubtful ML could pick up on a numerical rank like SO karma and figure out that it should weight by this number. That's too much System 2 thinking for ML, IMO!
Still good to give big firms as little free data as possible, of course! ☺