@dsalo I'm not sure licensing can even help here (though you might know more than I do, I'm naught but a humble programmer). At least as I understand it, in the EU, data mining and ML are explicitly not required to respect copyright when done for research purposes, whereas commercial data miners must respect a "machine-readable opt-out".
So what the commercial players do is fund a "non-profit research lab" that does all the data hoovering, and then have a for-profit company turn that data set into a product.