dredmorbius,

gagejustins's HN analysis has inspired me to take a crack at classifying Hacker News front-page stories by type.

Whilst he'd manually assessed each front-page story, I'm classifying by site, so that an NY Times article on, say, quantum computing would still be described as "general news".

I've classified 10,200 of 52,642 domains, the first 300 or so manually, much of the rest using regexes and imputation (e.g., ".edu", ".gov", and sites on Blogspot, Substack, Medium, etc.).
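As a rough illustration of the per-site approach described above (not the actual code behind these numbers), here is a minimal Python sketch: a hand-labelled lookup is tried first, then regex rules on the domain. The `MANUAL` and `PATTERNS` tables, the function names, and the specific category assigned to each rule are all hypothetical stand-ins; only the NY Times "general news" example and the ".edu" / ".gov" / Blogspot / Substack / Medium patterns come from the post itself.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical hand-labelled domains (the post labels the first 300 or so this way).
MANUAL = {
    "nytimes.com": "general news",  # example given in the post
    "github.com": "software",       # assumed label, for illustration only
}

# Hypothetical regex rules; the category attached to each pattern is a guess.
PATTERNS = [
    (re.compile(r"\.edu$"), "academic / science"),
    (re.compile(r"\.gov$"), "government"),
    (re.compile(r"(^|\.)(blogspot|substack|medium)\.com$"), "blog"),
]


def site_of(url):
    """Return the host of a story URL without a leading 'www.', or '' if none."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def classify(domain):
    """Classify by site only: manual label first, then regex rules."""
    if not domain:
        return "n/a"  # no site, e.g. an Ask, Tell, or Show HN post
    if domain in MANUAL:
        return MANUAL[domain]
    for pattern, label in PATTERNS:
        if pattern.search(domain):
            return label
    return "unclassified"


def tally(urls):
    """Count stories per category for a list of story URLs."""
    return Counter(classify(site_of(u)) for u in urls)
```

Because the lookup is keyed on the site alone, `classify(site_of("https://www.nytimes.com/quantum-computing"))` still comes back as "general news" whatever the article is about, which is the trade-off noted above.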

Results by story count:

     1  13782  general news
     2  13398  software
     3  10473  tech news
     4   8677  blog
     5   7651  academic / science
     6   7294  n/a
     7   4750  ???
     8   4600  business news
     9   3546  corporate comm.
    10   1504  general magazine
    11   1291  general information
    12   1162  general interest
    13   1132  technology
    14   1099  videos
    15   1073  social media
    16    975  government
    17    568  corporate comm
    18    559  tech discussion
    19    505  tech law
    20    251  tech publications
    21    171  tech blog
    22    170  science news
    23    136  business education
    24    104  corporate comm. 
    25    103  video
    26     99  corporate commm.
    27     96  general discussion
    28     80  misc
    29     71  technology / security
    30     61  law 
    31     59  webcomic
    32     49  translation
    33     48  health news
    34     47  images
    35     46  podcast
    36     32  law
    37      7  legal news

  Unclassified: 93213

"n/a" indicates no site, e.g., an Ask, Tell, or Show HN post.

'???' indicates I couldn't (quickly) assess a domain. Examples: 37signals.com, readwriteweb.com, thenextweb.com, archive.org, anandtech.com, avc.com, docs.google.com, righto.com, slideshare.net, infoq.com, hackaday.com, gamasutra.com, marco.org, smashingmagazine.com, highscalability.com, catonmat.net, centernetworks.com, jvns.ca, scribd.com, about.gitlab.com, cloud.google.com, alleyinsider.com, msn.com, firstround.com, axios.com, openculture.com, onstartups.com, ejohn.org, dadgum.com, shkspr.mobi, mixergy.com, geek.com, gmane.org, foundread.com.

"corporate commm." is an obvious typo. This is very rough code & classification.

#HackerNewsAnalytics #MediaAnalysis #HackerNews