While we lament AI getting trained on Reddit ("You should eat a few stones every day!"), training material for Chinese has its own challenges. The sane stuff is censored away, so the available material is mostly propaganda and spam: https://languagelog.ldc.upenn.edu/nll/?p=64222