TIL about something called grapheme clusters.
Must have been a good read, since I feel more incompetent now and don't trust #Unicode as I did before. https://tonsky.me/blog/unicode/
#Unicode is working on a scheme to have emoji be reversible so you can make sure all the police-cars are chasing the right surfer instead of the other way 'round or whatever
So if you had a sitting dog, that might take care of the issue
But I don't know what other dog: 🐕 Dog2; 🐕🦺 Service Dog; 🦮 Guide Dog; or 🌭 Hot Dog; would fit as well as 🐶 for ol' Nipper
Getting around to reading the 'new' "Absolute minimum" blog post about dev knowledge about #Unicode, and I assume parts of it are going to rub me the wrong way
To me, one of the most important "absolute minimum" bits of #Unicode knowledge for devs is an understanding that "Unicode" is not a monolith, and that the specifications which fall under this name allow for a lot of nuance. Understanding that Unicode is [UCD + a bunch of rules + …] goes a long way.
There is rarely a single Right Way™ to do things, and this is entirely on purpose because it turns out that text is complicated.
The AI-generated code absolutely does not care about #unicode at all, so it panics when you give it a Unicode character whose char boundary happens not to fall at byte index 1.
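A minimal sketch of the char-boundary problem, in Python. The post above does not name the language; a panic on a char boundary suggests Rust string slicing, so that is an assumption, and the helper names `is_char_boundary` and `safe_prefix` are made up for illustration. In UTF-8, continuation bytes look like `10xxxxxx`, so a byte index is only a safe cut point if it does not land on one.

```python
def is_char_boundary(data: bytes, index: int) -> bool:
    """Return True if `index` is a valid place to cut UTF-8 encoded `data`."""
    if index == 0 or index == len(data):
        return True
    if index < 0 or index > len(data):
        return False
    # Continuation bytes have the bit pattern 10xxxxxx.
    return (data[index] & 0xC0) != 0x80


def safe_prefix(data: bytes, index: int) -> str:
    """Decode at most `index` bytes, backing up to the nearest char boundary."""
    index = max(0, min(index, len(data)))
    while not is_char_boundary(data, index):
        index -= 1
    return data[:index].decode("utf-8")


encoded = "éa".encode("utf-8")  # b'\xc3\xa9a': 'é' takes two bytes

try:
    encoded[:1].decode("utf-8")  # naive cut in the middle of 'é'
except UnicodeDecodeError as err:
    print("naive slice failed:", err.reason)

print(repr(safe_prefix(encoded, 1)))  # backs up to index 0
print(repr(safe_prefix(encoded, 2)))  # index 2 is a boundary
```

Code that hard-codes byte index 1 only works while the first character is ASCII; checking the boundary first avoids the crash.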
I'm just discovering that #Unicode has glyphs for writing the month as a single character when you write the date in Mandarin or Japanese. There is no equivalent for the days, though. So today is ㋈30日, 3 characters 😔
I just added the glyph U+1F16D (Circled CC, i.e. the Creative Commons symbol), which has been in #Unicode since version 13.0, to my blog's "Symboles" font (which already provides the cute little Mastodon logo and the syndication feed icon)
Today I had to once again tell someone that the Unicode GREEK QUESTION MARK gets NFC normalized into SEMICOLON and is thus not distinguishable after normalization.
Now I can expect them to argue that we shouldn't be doing NFC normalization, at which point I will point them to the design doc wherein we discussed why we made the decisions we did (yay past-me for writing those up).
And I will never get those brain cells back for something less esoteric. Ever.
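A short sketch of the GREEK QUESTION MARK behaviour described above, using Python's standard `unicodedata` module. U+037E has a canonical singleton decomposition to U+003B, so normalization replaces it; the constant names are just for illustration.

```python
import unicodedata

GREEK_QUESTION_MARK = "\u037e"
SEMICOLON = "\u003b"

print(unicodedata.name(GREEK_QUESTION_MARK))
print(unicodedata.name(SEMICOLON))

# Before normalization the two are distinct code points.
print(GREEK_QUESTION_MARK == SEMICOLON)

# After NFC they are the same code point, so the distinction is lost.
normalized = unicodedata.normalize("NFC", GREEK_QUESTION_MARK)
print(normalized == SEMICOLON)
print(f"U+{ord(normalized):04X}")
```

Anything that must tell the two apart has to do so before the text is normalized.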
Please do not use the #ASCII grave accent (0x60) as a left quotation mark together with the ASCII apostrophe (0x27) as the corresponding right quotation mark (as in `quote'). Your text will otherwise appear rather strange with most modern fonts (e.g., on #Windows and Mac systems). Only old X Window System fonts and some old video terminals show ASCII 0x60/0x27 as left and right quotation marks, while most modern systems follow the ISO and Unicode standards instead. If you can use only ASCII’s typewriter characters, then use the apostrophe character (0x27) as both the left and right quotation mark (as in 'quote'). If you can use #Unicode characters, nice directional quotation marks are available in the form of characters U+2018, U+2019, U+201C, and U+201D (as in ‘quote’ or “quote”).
If you work in an environment where the UTF-8 encoding is already used everywhere (e.g., Plan9 and most modern GNU/Linux installations), you could even decide to use proper directional quotation marks, as in ‘quote’ or “quote”.
Check your source code directories with
grep \` *
to find out, where modifications are necessary. Then use (with proper care!) something like
perl -pi.bak -e "s/\`/'/g;" file1 file2 ...
to make the necessary substitutions automatically, or make the edits manually instead.
The use of 0x60 (grave accent) as a special control character in the Unix shell (to denote command substitution as in `command` or, better, $(command)), in #Perl, in #Lisp, or in #TeX/troff (to denote a proper left single quotation mark) does not have to be changed and remains unaffected. https://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html
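The substitution described above could also be sketched in Python. This is not the perl one-liner from the post: it is a narrower variant that only rewrites a grave accent paired with a closing apostrophe on the same line, so shell-style `command` substitutions are left alone. The function name `fix_ascii_quotes` and its `directional` flag are made up for illustration, and the same "with proper care" warning applies, since text that mixes backticks and apostrophes on one line can still be mismatched.

```python
import re

# A grave accent used as an opening quote, paired with an apostrophe used as
# the closing quote. The quoted text may not contain either character or a
# newline.
_PAIR = re.compile(r"`([^`'\n]*)'")


def fix_ascii_quotes(text: str, directional: bool = False) -> str:
    """Rewrite `quote' as 'quote', or as ‘quote’ when directional=True."""
    if directional:
        left, right = "\u2018", "\u2019"  # proper left/right single quotes
    else:
        left, right = "'", "'"  # ASCII apostrophe on both sides
    return _PAIR.sub(lambda m: f"{left}{m.group(1)}{right}", text)


print(fix_ascii_quotes("as in `quote'"))
print(fix_ascii_quotes("as in `quote'", directional=True))
print(fix_ascii_quotes("echo `date`"))  # no closing apostrophe, so unchanged
```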
@Annalee Given the #Unicode preference for multi-function #emoji and combinatory possibilities of what's already there, I wonder if one might get a cicada faster by proposing a 🦗😱 combo with a zero-width joiner. 🤔
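For anyone wondering what such a combo looks like under the hood, here is a small Python sketch. A ZWJ sequence is just the component emoji joined with U+200D ZERO WIDTH JOINER. Service Dog (🐕 + ZWJ + 🦺) is an existing sequence; the cricket-plus-scream pairing is the hypothetical one from the post, and fonts will simply show the two emoji side by side unless the sequence is ever adopted. The helper names are made up for illustration.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER


def zwj_sequence(*emoji: str) -> str:
    """Join emoji with ZWJ to form a (possibly unsupported) ZWJ sequence."""
    return ZWJ.join(emoji)


def code_points(text: str) -> list[str]:
    """List the code points of `text` in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Existing sequence: DOG + ZWJ + SAFETY VEST renders as Service Dog.
service_dog = zwj_sequence("\U0001F415", "\U0001F9BA")

# Hypothetical sequence from the post: CRICKET + ZWJ + FACE SCREAMING IN FEAR.
proposed_cicada = zwj_sequence("\U0001F997", "\U0001F631")

print(code_points(service_dog))
print(code_points(proposed_cicada))
print(len(service_dog))  # counts code points, not what the reader sees
```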