SILICON Digitally Disadvantaged Languages Fellowship Program: Call for Proposals.
SILICON invites applications from keyboard designers, app designers, type designers, Large Language Model designers, language ethnographers, OCR/ML experts, and digital typographic experts for the inaugural SILICON Fellowship Program. Awards of up to $7,000 to be granted.
everybody's hyped for the new 1FAE9 FACE WITH BAGS UNDER EYES, which I mean, :big_mood:, but for you appreciators of text-presentation glyphs, might I draw your attention to the new range "Symbols for Legacy Computing Supplement" (1CC00–1CEBF) that includes several gaming sprites from retrocomputer codesets including Pac-Man, a full set of Space Invaders, 1CC96 FLAPPING BIRD and so on—as well as a full set of box characters for #teletext emulators?
I just stumbled onto something horrifying, neo-Nazi symbolism seemingly hidden away in #Unicode. The first Unicode #Hebrew letter, corresponding to א, is U+05D0. The integer corresponding to the hex? 1488. You can't convince me that was a mere coincidence.
Why? WHYYYYYY? Do we really need this shit in Unicode? Or did we fucking run out of things to do as far as writing systems are concerned? Somebody over at Unicode Consortium has too much free time, g-sus.
@Edent The second dimension to this is to find a font for your ebook that supports these natively. My reader does show the correct glyph but if it isn’t present in the actual font it uses some kind of default and that is jarring.
By and large, the English language doesn't use diacritical marks. Even our loanwords are stripped of them; we drink in a cafe rather than the more pretentious café. This has a consequence for HTML and, by extension, eBooks.
As a quick primer, modern computing gives us two main ways of displaying a letter with an accent. The first is simple - encode every single accented letter as a separate "pre-composed" character. So è (U+00E8), é (U+00E9), ê (U+00EA), and ë (U+00EB) are all stored as different codepoints.
But this seems a little inefficient and can make it hard to search through text for an exact lexical match.
So there is a second way to add accents. You take the base character - e (U+0065) - and then apply a separate "combining" accent character to it. For example the combining accent ◌́ (U+0301). That means you can add an accent to áńý ĺét́t́éŕ!́
Note, the accent ◌́ (U+0301) is separate from the character ´ (U+00B4). In fact, most accents have a pre-composed, combining, and separate form. This, understandably, causes much confusion!
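To make the difference concrete, here is a minimal Python sketch using the standard `unicodedata` module. It shows that the pre-composed and combining spellings of é compare as different strings until they are normalised, which is why exact-match searches can miss. The helper `same_text` is a hypothetical name for illustration, not something from the original post.

```python
import unicodedata

precomposed = "\u00e9"   # é as one pre-composed code point
combining = "e\u0301"    # e followed by COMBINING ACUTE ACCENT

# The two strings render alike but are not equal code point for code point.
print(precomposed == combining)

# NFC composes base + combining mark; NFD decomposes a pre-composed letter.
nfc = unicodedata.normalize("NFC", combining)
nfd = unicodedata.normalize("NFD", precomposed)

def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalising both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# The spacing acute accent is a third, unrelated character.
print(unicodedata.name("\u0301"))
print(unicodedata.name("\u00b4"))
```

Normalising both sides before comparing is one common way to make a search treat the two spellings alike; whether NFC or NFD suits better depends on the application.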
Here's a good example. I was reading the excellent Fallen Idols, when I noticed this typesetting bug.
It's always hard to transliterate languages. The Victory Arch in Iraq is known as قوس النصر, and usually written in English as the "Swords of Qādisīyah".
Examining the HTML code in the eBook, it was obvious that the publishers had used a macron ¯ (U+00AF) rather than the combining version ◌̄ (U+0304).
I've reported it to the publisher. I've no idea if they'll fix it in a subsequent re-issue.
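The macron mix-up described above is the kind of thing that can be checked for mechanically. The following is only a sketch: the function names and the small lookup table are my own, and the repair step assumes the stray spacing accent sits directly after the letter it was meant to modify, which may not hold for any particular eBook.

```python
import unicodedata

# Spacing characters that are easily mistaken for combining marks.
# This table is illustrative, not exhaustive.
SPACING_TO_COMBINING = {
    "\u00af": "\u0304",  # MACRON -> COMBINING MACRON
    "\u00b4": "\u0301",  # ACUTE ACCENT -> COMBINING ACUTE ACCENT
}

def find_spacing_accents(text: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for each spacing accent found in text."""
    return [
        (i, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in SPACING_TO_COMBINING
    ]

def repair(text: str) -> str:
    """Swap spacing accents for combining ones, then compose with NFC.

    Assumes each spacing accent directly follows its base letter.
    """
    swapped = "".join(SPACING_TO_COMBINING.get(ch, ch) for ch in text)
    return unicodedata.normalize("NFC", swapped)

broken = "Qa\u00afdisi\u00afyah"
print(find_spacing_accents(broken))
print(repair(broken))
```

A blanket replacement like `repair` would also rewrite any spacing macron or acute accent that was used deliberately, so reporting the positions and fixing them by hand is probably the safer default.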
Sign text will now finally be shown on the sign. Over 60,000 glyphs are supported. A new pole sign can stand on the ground, hang from the ceiling or a wall.
There's also a spyglass item, and moon phases have been added.
my recent interest in #mahjong has collided with my on-going interest in #Unicode as I remember that the block U+1F000 through U+1F02B are allocated for encoding tiles
🀀🀁🀂🀃🀄🀅🀆🀇🀈🀉🀊🀋🀌🀍🀎🀏🀐🀑🀒🀓🀔🀕🀖🀗🀘🀙🀚🀛🀜🀝🀞🀟🀠🀡🀢🀣🀤🀥🀦🀧🀨🀩🀪🀫
this information has no practical use to me, but it's nice that the UCS represents them
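For anyone who wants to poke at the range mentioned above, a small Python sketch that walks U+1F000 through U+1F02B and prints each tile with its official character name. It relies only on the standard `unicodedata` module; the variable names are mine.

```python
import unicodedata

FIRST, LAST = 0x1F000, 0x1F02B  # range quoted in the post above

# Build the list of tile characters from their code points.
tiles = [chr(cp) for cp in range(FIRST, LAST + 1)]

print("".join(tiles))
for tile in tiles:
    # Each line looks like "U+1F000 MAHJONG TILE EAST WIND".
    print(f"U+{ord(tile):04X} {unicodedata.name(tile)}")
```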
Turns out rendering #Arabic is hard. No, you can't just implement the #Unicode #Bidirectional Algorithm and call it a day. It turns out the Arabic letters/symbols/? have different forms depending on where they are in the word, and there are probably other non-trivial features. Yeah, I guess I'll just postpone this.
So my 1000 IQ workaround for now is to just render all Arabic characters as U+FFFD REPLACEMENT CHARACTER.
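A rough Python sketch of what such a stopgap might look like. The post does not say how "Arabic characters" are detected, so the block ranges below are an assumption on my part (a handful of the Arabic blocks, not all of them), and `mask_arabic` is a made-up name.

```python
# Assumed definition of "Arabic characters": code points in these blocks.
ARABIC_RANGES = [
    (0x0600, 0x06FF),  # Arabic
    (0x0750, 0x077F),  # Arabic Supplement
    (0x08A0, 0x08FF),  # Arabic Extended-A
    (0xFB50, 0xFDFF),  # Arabic Presentation Forms-A
    (0xFE70, 0xFEFC),  # Arabic Presentation Forms-B (stops before U+FEFF, the BOM)
]

REPLACEMENT = "\uFFFD"  # REPLACEMENT CHARACTER

def is_arabic(ch: str) -> bool:
    """True if the character falls inside one of the assumed Arabic ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ARABIC_RANGES)

def mask_arabic(text: str) -> str:
    """Replace every Arabic character with U+FFFD, leaving the rest untouched."""
    return "".join(REPLACEMENT if is_arabic(ch) else ch for ch in text)

print(mask_arabic("Victory Arch: \u0642\u0648\u0633"))
```

This replaces one code point at a time, so a word of three Arabic letters becomes three replacement characters and the text length stays the same, which keeps layout code simple until proper shaping is in place.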