I have a 72-page, 36-year-old typewritten document that was scanned to a non-OCRed PDF in 2010. I’m trying to cleanly extract the text so I can convert it to markdown. All attempts at OCR have yielded extremely messy results. Is there a new generation of ML-based OCR I could try, or should I MTurk it? #dontSayAI
@noleli @brucelawson @aardrian it looks like the primary problem is with section expecting a document outline algo that doesn’t exist, not with <section> as a container in combo with properly-nested h1/2/3, except insofar as section may not have a purpose without said algo?
ISTM that your proposed use is closer to the container w/ proper header tags when applicable. Another option could be “<h3 class=untitle>Untitled Section</h3>”.
@noleli @brucelawson @aardrian I was also curious because I use <section> all the time with proper header tags, just as a nicer alternative to wrapping the section in a div, based on what I learned from MDN and not reading the background discussions. It looks like this probably isn’t creating any of the problems discussed in these pieces? (This is also what Pandoc does if you specify HTML5 output and section-divs.)
Reminds me of the paper that cited my gender bias audit of book recommenders to justify some bullshit computer vision technique to recognize gender from skeletal structure because gender recognition is “important for recommender systems”.
@scheidegger oof. no. and it isn't like your paper is that difficult to understand, esp. in its CACM version.
I've definitely seen it cited in some pretty superficial ways. Like just cite it, claim to make a WAE assumption, move forward without engaging in the deep interrogation needed to figure out what a WAE assumption would really, plausibly look like in this case, etc.
but chronic failure to engage with the substance of an argument is endemic to academia. very sadly.
@scheidegger versions of it are something I see in students a lot too 😔. Some of them, with coaching, grow past it.
But lots of what seems like keyword matching — latching onto the words and the superficial sequence & structure, without grokking and translating the fundamental substance. Which drives me batty, because my brain is all connections, all the time, and thrives on seeing where different authors are using wildly different language to make the same basic argument.
@scheidegger It can! And honestly one of the reasons I didn't seek pharmaceutical treatment until I was 38… was worried that meds would interfere with the mental connection circuitry.
(they didn't. they help me act on connections more effectively.)
which is more cursed... the warning or the code being warned about?
/home/regehr/llvm-project/llvm/include/llvm/Support/AutoConvert.h:1:1: warning: C++ style comments are not allowed in ISO C90
1 | //===- AutoConvert.h - Auto conversion between ASCII/EBCDIC ------ C++ --===//
@bkeegan They had both of those, but not a fan of Laphroaig — way too smokey for me. I had an Ardbeg once a while ago, and remember it being ok. They also had Macallan and Oban, but priced higher than I was feeling this week. Highland Park was the only thing that looked to be in the spice-forward department.
All this work in AI for “customer service” but what about AI for navigating customer service? If it’s good enough for them to try to handle my problem, shouldn’t it be good enough for me to try to navigate customer service to get it solved?
Oh god, cursed idea; storing your SVG UI elements as strings in local storage, that you stuff into the DOM via JS, so that you get the benefit of "caching" like they're an img src, but also the ability to target and style in CSS because they're loaded on the page as actual SVG.
Essentially loading SVG assets you often want to theme with CSS as a browser-stored package. Instead of needing to dump raw SVG in HTML, or do icon-green.svg, icon-red.svg, etc.
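A rough sketch of what that could look like, in case it helps picture it. Everything named here is hypothetical (`ICON_PREFIX`, `cacheIcon`, `getIcon`, `mountIcon`); the only real API assumed is the `getItem`/`setItem` pair that `window.localStorage` provides. The store and the target are typed structurally so the same functions work with `localStorage` and a real DOM element in the browser.

```typescript
// Minimal subset of the Web Storage API; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Anything with a writable innerHTML, e.g. a DOM Element in the browser.
interface MarkupTarget {
  innerHTML: string;
}

// Hypothetical key prefix so icon entries don't collide with other stored data.
const ICON_PREFIX = "svg-icon:";

// Store the raw SVG markup as a string under a prefixed key.
function cacheIcon(store: KeyValueStore, name: string, svgMarkup: string): void {
  store.setItem(ICON_PREFIX + name, svgMarkup);
}

// Read the markup back; null means the icon was never cached (or was evicted).
function getIcon(store: KeyValueStore, name: string): string | null {
  return store.getItem(ICON_PREFIX + name);
}

// Inject the cached markup as inline SVG so page CSS can target it.
// Returns false on a cache miss so the caller can fetch and cache instead.
// Assumption: only your own trusted SVG strings are ever stored, since
// innerHTML will parse whatever markup it is given.
function mountIcon(store: KeyValueStore, name: string, target: MarkupTarget): boolean {
  const svg = getIcon(store, name);
  if (svg === null) {
    return false;
  }
  target.innerHTML = svg;
  return true;
}
```

In a browser you would call something like `mountIcon(localStorage, "check", document.querySelector("#save-icon")!)` (the selector is made up) and then style it with ordinary CSS, for example by setting `fill: currentColor` on the inline `svg`.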
@noleli so, i could replace my web site's secret content architecture that currently stashes ciphertext in a <pre> and includes the decrypt form in HTML with
This by @kissane is really good, and towards the end covers a solid non-religious argument grounded in demonstrable harms for second-degree defederation from Threads. https://erinkissane.com/untangling-threads
To revive (jk it's never gone just buried under administrivia and the weight of healing from general occupational hazards like researcher harassment) my love for science, I've been rereading the early and famous papers that helped us understand that smoking causes lung cancer. Famous beautiful examples of causal reasoning and difficult to remember how HARD IT WAS to tell this story back then. Profoundly poignant to see people use the weapons of evidence reasoning to create real health change.