wmd, to random
@wmd@chaos.social avatar

How can unicode ever be considered complete without a :guillotine: emoji...

cstross, to random
@cstross@wandering.shop avatar

WHY IS THERE NO EMOJI FOR GIBBON?

I WANT TO BE ABLE TO SIGNAL "tangerine shitgibbon" UNAMBIGUOUSLY!

SAYING "🍊💩🐒" IS OPEN TO MISINTERPRETATION!!!

(Although I'm okay with 🍊💩🐒🚔⛓️‍💥)

JdeBP,

@cstross

I can get you to elsewhere within the apes. (-:

U+130FC is a baboon sitting on a basket, which does rather resemble taking a shit.

𓃼

And U+1313D is excrement.

𓄽

U+130E2 is lying canine, which I mention purely on the offchance that you might have some use for it.

The Egyptian Hieroglyphs section of #Unicode is sometimes quite useful in the modern world, and much underappreciated.

typographica, to languagelearning
@typographica@typo.social avatar

SILICON Digitally Disadvantaged Languages Fellowship Program: Call for Proposals.

SILICON invites applications from keyboard designers, app designers, type designers, Large Language Model designers, language ethnographers, OCR/ML experts, and digital typographic experts for the inaugural SILICON Fellowship Program. Awards of up to $7,000 will be granted.

https://docs.google.com/forms/d/e/1FAIpQLSdx4mUMwhHNu1Lgxr0bKlcYcTNY3WdQWQQyVKgCzXe9SAPYLA/viewform

exegete, to hebrew
@exegete@autonomous.zone avatar

I just stumbled onto something horrifying, neo-Nazi symbolism seemingly hidden away in Unicode. The first Unicode codepoint, corresponding to א, is U+05D0. The integer corresponding to the hex? 1488. You can't convince me that was a mere coincidence.

Who planned this???
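
The arithmetic in the post can be checked directly. A minimal Python sketch, using only the standard library; the variable names are illustrative and not from the original post:

```python
import unicodedata

# HEBREW LETTER ALEF, written as an escape so the source stays left-to-right.
alef = "\u05D0"

code_point = ord(alef)
print(f"U+{code_point:04X}")      # the hexadecimal form used in the Unicode charts
print(code_point)                 # the same value written in decimal
print(unicodedata.name(alef))     # the character's formal name
print(int("05D0", 16) == code_point)
```

This only confirms that hexadecimal 05D0 and decimal 1488 are the same number; it says nothing about how the value was assigned.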

krans,
@krans@mastodon.me.uk avatar

@exegete The first codepoint is U+05BE HEBREW PUNCTUATION MAQAF.

It should be possible to check the archive of WG4 minutes and papers to look for corroborating evidence for whether there is a conspiracy or a coincidence. Members of the Unicode standards body hang out on Mastodon and may be interested in investigating further.

elilla, to random
@elilla@transmom.love avatar

everybody's hyped for the new 1FAE9 FACE WITH BAGS UNDER EYES, which I mean, :big_mood:, but for you appreciators of text-presentation glyphs, might I draw your attention to the new range "Symbols for Legacy Computing Supplement" (1CC00–1CEBF) that includes several gaming sprites from retrocomputer codesets including Pac-Man, a full set of Space Invaders, 1CC96 FLAPPING BIRD and so on—as well as a full set of box characters for #teletext emulators?

#unicode

https://www.unicode.org/charts/PDF/Unicode-16.0/U160-1CC00.pdf

Another section showing some old terminal drawing elements ("white lower left pointer", "two rings aligned horizontally", "inverse black diamond" etc.) and more game sprites (such as tanks and racing cars and fish in various positions).

Edent, to webdev
@Edent@mastodon.social avatar

🆕 blog! “Accents and eBooks”

By and large, the English language doesn't use diacritical marks. Even our loanwords are stripped of them; we drink in a cafe rather than the more pretentious café. This has a consequence for HTML and, by extension, eBooks. As a quick primer, modern computing gives us two main ways of displaying a letter with an […]

👀 Read more: https://shkspr.mobi/blog/2024/05/accents-and-ebooks/

blog, to webdev
@blog@shkspr.mobi avatar

Accents and eBooks
https://shkspr.mobi/blog/2024/05/accents-and-ebooks/

By and large, the English language doesn't use diacritical marks. Even our loanwords are stripped of them; we drink in a cafe rather than the more pretentious café. This has a consequence for HTML and, by extension, eBooks.

As a quick primer, modern computing gives us two main ways of displaying a letter with an accent. The first is simple - encode every single accented letter as a separate "pre-composed" character. So è (U+00E8), é (U+00E9), ê (U+00EA), and ë (U+00EB) are all stored as different codepoints.

But this seems a little inefficient and can make it hard to search through text for an exact lexical match.

So there is a second way to add accents. You take the base character - e (U+0065) - and then apply a separate "combining" accent character to it. For example the combining accent ◌́ (U+0301). That means you can add an accent to áńý ĺét́t́éŕ!́

Note, the accent ◌́ (U+0301) is separate from the character ´ (U+00B4). In fact, most accents have a pre-composed, combining, and separate form. This, understandably, causes much confusion!
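
To make the two representations concrete, here is a small Python sketch using the standard `unicodedata` module. The sample word and the helper name `same_text` are illustrative assumptions, not part of the original post:

```python
import unicodedata

precomposed = "caf\u00E9"   # é stored as the single code point U+00E9
combining = "cafe\u0301"    # e (U+0065) followed by the combining acute U+0301

def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (composed form)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(precomposed == combining)            # False: the raw code points differ
print(same_text(precomposed, combining))   # True: both normalize to the same text
print(len(precomposed), len(combining))    # 4 5: one code point versus two for the é
```

Normalizing before comparing is one common way to make searches match regardless of which form the text was stored in.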

Here's a good example. I was reading the excellent Fallen Idols, when I noticed this typesetting bug.

The phrase "Swords of Qadisiyyah." But the combining macron over the letter "a" has been rendered as a separate dash.

It's always hard to transliterate languages. The Victory Arch in Iraq is known as قوس النصر, and usually written in English as the "Swords of Qādisīyah".

Examining the HTML code in the eBook, it was obvious that the publishers had used a macron ¯ (U+00AF) rather than the combining version ◌̄ (U+0304).
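
A fix for this kind of bug could be sketched in Python as below. This is only an illustration: it assumes the spacing accent was placed directly after the letter it was meant to modify, and the function names are hypothetical. It relies on the fact that spacing accents such as U+00AF carry a compatibility decomposition of "space plus combining mark" in the Unicode Character Database.

```python
import unicodedata

def combining_equivalent(ch: str):
    """Return the combining mark for a spacing accent such as U+00AF, else None."""
    parts = unicodedata.decomposition(ch).split()
    # Spacing accents decompose to "<compat> 0020 XXXX": a space plus a combining mark.
    if len(parts) == 3 and parts[0] == "<compat>" and parts[1] == "0020":
        mark = chr(int(parts[2], 16))
        if unicodedata.combining(mark):
            return mark
    return None

def repair_spacing_accents(text: str) -> str:
    """Swap a spacing accent that follows a letter for its combining form, then compose."""
    out = []
    for ch in text:
        mark = combining_equivalent(ch)
        if mark and out and unicodedata.category(out[-1]).startswith("L"):
            out.append(mark)
        else:
            out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))

broken = "Qa\u00AFdisi\u00AFyah"   # letters followed by the spacing macron U+00AF
print(repair_spacing_accents(broken))
```

A spacing accent that does not follow a letter is left untouched, so ordinary uses of characters like ´ or ¯ on their own are not rewritten.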

I've reported it to the publisher. I've no idea if they'll fix it in a subsequent re-issue.

https://shkspr.mobi/blog/2024/05/accents-and-ebooks/

Edent, to random
@Edent@mastodon.social avatar

The nice thing about finding a typographical mistake in an eBook is that the files are just HTML.
You can inspect element and see where the bug is.

In this case, they used &; when they clearly meant &;

ramsey, to random
@ramsey@phpc.social avatar

I found what I think is an edge-case bug in ICU. It’s unlikely to impact most folks, unless you’re trying to run the ECMA-402 test suite.

https://unicode-org.atlassian.net/browse/ICU-22765

lukiss, to random
@lukiss@sonomu.club avatar

( //
q=[
[[" ","│"," "],["╮","╰","┬"],["╰","┬","╯"]],
[["╭","╯","◦"],["┴","┬","─"],[" ","│"," "]],
[["o","│"," "],["─","╯","╭"],[" ","╭","╯"]],
[[" ","╰","╮"],["╮"," ","╰"],["╰","╮"," "]],
[["╭","┴","╮"],["┤","°","╰"],["╰","╮","◯"]],
[["╭","┴","╮"],["┤","O","├"],["╰","┬","╯"]],
];
8.do{
t=q.pyramid(3).scramble;
t[0].size.do{|l|
t.size.do{|n|t[0].size.do{|i|t[n][l][i].post}};
Post.nl
}
});

zirias, to FreeBSD
@zirias@bsd.cafe avatar

Hello bsd.cafe 🤩!

I finally did it and moved to a more appropriate "home realm" for a enthusiast. Thanks @stefano for offering this!

Moving followers worked flawlessly, restoring all my settings was pretty quick, but of course all my old toots are left on https://techhub.social/@zirias 🙈

So I guess I'll introduce myself here by writing a little thread, adding a few of my works that someone might find interesting. But first a bit of "who am I":

I'm a "professional" software architect/developer (mostly platform in the day job), FreeBSD hobby-admin and ports committer, fan (and occasionally coder and even musician), and apart from computers also interested in music (playing a few instruments myself), traveling, cooking, sometimes sports, sometimes politics ... but probably won't toot about any non-technical stuff (or, very very rarely).

zirias,
@zirias@bsd.cafe avatar

Also quite recent: . This is a very versatile converter for (and other "text") files to a format using and only standard escape sequences, so, suitable for today's terminals like . It includes an ansiart viewer which is "just" a shellscript, leveraging dos2ansi, xterm, less and some nice original fonts to do its job. So, maybe something for the fans.

https://github.com/Zirias/dos2ansi

Docs (manpages) are here:
https://zirias.github.io/dos2ansi/

As there was some interest, a port is available: https://www.freshports.org/converters/dos2ansi

Wuzzy, to Game
@Wuzzy@cyberplace.social avatar

Repixture 3.15.1 is here!

Sign text will now finally be shown on the sign. Over 60,000 glyphs are supported. A new pole sign can stand on the ground, hang from the ceiling or a wall.

There's also a spyglass item, and moon phases have been added.

▶️ Release notes: https://forum.minetest.net/viewtopic.php?p=435725#p435725
▶️ ContentDB page: https://content.minetest.net/packages/Wuzzy/repixture/

SnoopJ, to random
@SnoopJ@hachyderm.io avatar
SnoopJ,
@SnoopJ@hachyderm.io avatar

NICE, it looks like FACE WITH BAGS UNDER EYES may have been approved, it's listed in the Emoji 16 alpha repertoire

https://www.unicode.org/L2/L2024/24112-pri498-emoji-v16-alpha.pdf

(Presumably it would have been decided as part of UTC but the agenda/minutes for that meeting aren't yet posted)

zirias, to FreeBSD
@zirias@techhub.social avatar

🚦🚥 ... ok it works 🌋

A super-simple keyboard for .

Well, I did have to fiddle with the keymap.

And I had to add delays 🤯👹 (otherwise there are races between keymap changes and keyboard events).

And I had to misuse the extension, cause applications ignore "synthetic" events. 🫥😣

But hey, it works 🕺

Now needs some basic, uhm, "features" (like recently used, like search by name).

https://github.com/Zirias/qxmoji

zirias,
@zirias@techhub.social avatar

v0.7 released!

https://github.com/Zirias/qxmoji/releases/tag/v0.7

This brings several improvements, mainly in the build system, but the major change is support for localization, with translated Emoji names imported from . I added a German translation, see screenshot. Once again, I'd appreciate more translations, the process to translate is documented here:
https://github.com/Zirias/qxmoji/blob/master/TRANSLATE.md

Updated FreeBSD port:
https://people.freebsd.org/~zirias/patches/0001-x11-qxmoji-Add-new-port.patch

SnoopJ, to mahjong
@SnoopJ@hachyderm.io avatar

my recent interest in mahjong has collided with my ongoing interest in Unicode, as I remember that the code points U+1F000 through U+1F02B are allocated for encoding mahjong tiles

🀀🀁🀂🀃🀄🀅🀆🀇🀈🀉🀊🀋🀌🀍🀎🀏🀐🀑🀒🀓🀔🀕🀖🀗🀘🀙🀚🀛🀜🀝🀞🀟🀠🀡🀢🀣🀤🀥🀦🀧🀨🀩🀪🀫

this information has no practical use to me, but it's nice that the UCS represents them
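
For anyone who wants to look at the range, a short Python sketch can list it with the standard `unicodedata` module. The constant names are illustrative, and the names printed depend on the Unicode version bundled with the Python build:

```python
import unicodedata

FIRST, LAST = 0x1F000, 0x1F02B   # the range quoted in the post

tiles = [chr(cp) for cp in range(FIRST, LAST + 1)]

for tile in tiles:
    # Fall back to a placeholder if this Python's Unicode data lacks a name.
    print(f"U+{ord(tile):05X} {unicodedata.name(tile, '<unassigned>')}")

print(len(tiles), "tiles")
```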

Wuzzy, to gamedev
@Wuzzy@cyberplace.social avatar

Turns out rendering is hard. No, you can't just implement the Algorithm and call it a day. It turns out the Arabic letters/symbols/? have different forms depending on where they are in the word and probably there are other non-trivial features. Yeah, I guess I just postpone this.

So my 1000 IQ workaround for now is to just render all Arabic characters as U+FFFD REPLACEMENT CHARACTER.
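
The "different forms depending on where they are in the word" can be glimpsed through Unicode's legacy Arabic Presentation Forms-B block, which encodes isolated, final, initial and medial shapes as separate compatibility characters. The Python sketch below looks them up for one letter. It is only an illustration: the function name is hypothetical, and real text shaping is normally done by a shaping engine working from the font's own tables rather than from these code points.

```python
import unicodedata

def positional_forms(base: str) -> dict:
    """Map position tags (isolated, final, ...) to presentation-form characters."""
    forms = {}
    # Arabic Presentation Forms-B occupies U+FE70..U+FEFF.
    for cp in range(0xFE70, 0xFF00):
        parts = unicodedata.decomposition(chr(cp)).split()
        # Single-letter forms decompose to "<tag> XXXX", with XXXX the base letter.
        if len(parts) == 2 and parts[0].startswith("<") and int(parts[1], 16) == ord(base):
            forms[parts[0].strip("<>")] = chr(cp)
    return forms

beh = "\u0628"   # ARABIC LETTER BEH
for position, glyph in positional_forms(beh).items():
    print(position, f"U+{ord(glyph):04X}", unicodedata.name(glyph))
```

A renderer still has to decide which form applies from each letter's neighbours, and the font has to contain the corresponding glyphs, which is the part the post describes as non-trivial.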

Wuzzy, to gamedev
@Wuzzy@cyberplace.social avatar

How Arabic should render (left) vs how it would render on signs (right) if I re-enabled it.

RTL support works, but that's not good enough. If I understand correctly, the glyphs also need to connect.

But then, even (the font I use) doesn't seem to have the necessary glyph variants.

Wuzzy, to gamedev
@Wuzzy@cyberplace.social avatar

This is my sign test wall where I've been testing Unicode rendering on the signs in #Repixture, using various #Unicode strings.

🟩 green = OK
🟨 yellow = Unsupported, but renders as U+FFFD REPLACEMENT CHARACTER (also OK)
🟥 red = FAIL

I think I should maybe call it a day.

#GameDev #Minetest

blog, to cs
@blog@shkspr.mobi avatar

EBCDIC is incompatible with GDPR
https://shkspr.mobi/blog/2021/10/ebcdic-is-incompatible-with-gdpr/

Welcome to acronym city!

The Court of Appeal of Brussels has made an interesting ruling. A customer complained that their bank was spelling the customer's name incorrectly. The bank didn't have support for diacritical marks. Things like á, è, ô, ü, ç etc. Those accents are common in many languages. So it was a little surprising that the bank didn't support them.

The bank refused to spell their customer's name correctly, so the customer raised a GDPR complaint under Article 16.

The data subject shall have the right to obtain from the controller without undue delay the rectification of inaccurate personal data concerning him or her.

Cue much legal back and forth. The bank argued that they simply couldn't support diacritics due to their technology stack. Here's their argument (in Dutch - my translation follows)

Dutch text and a diagram.

Bank X also explained that the current customer data management application was launched in 1995 and is still running on a US manufactured mainframe system.
This system only supported EBCDIC ("extended binary-coded decimal interchange code"). This is an 8-bit standard for storing letters and punctuation marks, developed in 1963-1964 by IBM for their mainframes and AS/400 computers. The code comes from the use of punch cards and only contains the following characters…

(Emphasis added.)

EBCDIC is an ancient (and much hated) "standard" which should have been fired into the sun a long time ago. It baffles me that it was still being used in 1995 - let alone today.

Look, I'm not a lawyer (sorry mum!) so I've no idea whether this sort of ruling has any impact outside of this specific case. But, a decade after the seminal Falsehoods Programmers Believe About Names essay - we shouldn't tolerate these sorts of flaws.

Unicode - encoded as UTF-8 - just works. Yes, I'm sure there are some edge-cases. But if you can't properly store human names in their native language, you're opening yourself up to a lawsuit.
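
The difference is easy to demonstrate. The Python sketch below checks whether a name survives a round trip through an encoding. The ruling does not say which EBCDIC code page the bank used, so `cp037` (one EBCDIC code page that ships with Python, covering the Latin-1 repertoire) is an assumption chosen purely for illustration, as are the sample names and the helper `can_store`:

```python
def can_store(name: str, encoding: str) -> bool:
    """Return True if the name survives an encode/decode round trip unchanged."""
    try:
        return name.encode(encoding).decode(encoding) == name
    except UnicodeEncodeError:
        return False

# cp037 is an assumed stand-in for the bank's unnamed EBCDIC code page.
for name in ["Smith", "Łukasz", "Nguyễn"]:
    print(name, can_store(name, "cp037"), can_store(name, "utf-8"))
```

Any 8-bit code page has room for at most 256 characters, so some names will always fall outside it, whereas UTF-8 can encode every Unicode character.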

Source

GDPRhub - 2019/AR/1006

Dance

Reactions

Very interesting! https://t.co/bRxEem8Rem

— Marie ʕʘᴥʘʔ Julien (@mariejulien) October 20, 2021

Can't wait to take to court all the websites and other companies that decided that having an accent in my surname should be a source of bugs (with, of course, an error message that has nothing to do with it, just so you really can't understand why it isn't working) https://t.co/ReIodsI1dh

— Grumpy Nat 🇨🇭🇧🇷🇲🇫 (@Nat_Keely) October 20, 2021

https://twitter.com/joachimesque/status/1450746564100730882

France will leave the EU just so that its civil registry and other administrations can keep ruining someone's life because they have a tilde in their name https://t.co/i8FisgEEjD

— Lays Y. M. Farra (@LYMFHSR) October 20, 2021

Does this mean that Z̷̡̧̢̰͓̪͖̭͙̰̣̱̬̹̙̜̪̣̏̿̏̋͑́̒͑́̒̿̇̈̍̇̌͝͝a̵̡̧͍̘̮̤̙̹͙̦̙͙͖͓̥̟̦͔͒̇̊̊̔̓́͒́̌̈́̑͋̏̏̏̚͘͝͠͝l̶͉̯̱͇̭̭̉̉̈́̿͐̽̒̎̽͌̚͜ģ̸̧̛͙̩̹̰̤̱̖̘̻̪̻̮̫̟̙̲͍̰̻͕̗̫̿̆̃́͗̽̊̽̌̔̂͂̈͊̐̈́̈̈́̈̓̆͌̑́̕͜ǫ̶̢̹̥̮̟͍̔̑̔̽ can finally open a bank account? https://t.co/06cTjHxdgx

— KristoferA 🌏 (@KristoferA) October 20, 2021

Next up, I’m suing La Poste for still using ISO-8859-1 when printing labels. Poor “Frédéric” I recently sent a game to… https://t.co/Z7WuFY0QmK

— Bastien Nocera (@hadessuk) October 20, 2021

A disturbance in the Force, as if millions of bank IT workers cried out in terror and were suddenly silenced. https://t.co/H0WokiIZnu

— Michael Büker 🇺🇦 (@emtiu) October 21, 2021

https://shkspr.mobi/blog/2021/10/ebcdic-is-incompatible-with-gdpr/

codepoints, to random
@codepoints@typo.social avatar

Hello Fediverse! At long last I finally made it here, too.

This account is here to talk all things , script‍s, encodings and languages, and as shortcut for you, if you want to give feedback on https://codepoints.net.

That website is there to help you make sense of the Unicode standard, so if you have feature ideas, just drop me a toot!

Apart from that I love to learn strange and niche news about everything related to written (and sometimes even spoken) language.

Edent, to php
@Edent@mastodon.social avatar

🆕 blog! “Where you can (and can't) use Emoji in PHP”

I was noodling around in PHP the other day and discovered that this works: <?php $🍞 = "bread"; echo "Some delicious " . $🍞; I mean, there's no reason why it shouldn't work. An emoji is just a Unicode character (OK, not just a character - but we'll get on to that), so it should […]

👀 Read more: https://shkspr.mobi/blog/2024/04/where-you-can-and-cant-use-emoji-in-php/

codepoints, to random
@codepoints@typo.social avatar

Hey! A new blog post!

https://blog.codepoints.net/emojis-under-the-hood.html

Emojis under the Hood

in which I explain how are composed on the code point layer, and what funny effects that sometimes has.

With notable mentions of work by @CharlotteBuff, @Edent, @eevee, @mathias, and @emojipedia.

blog, to php
@blog@shkspr.mobi avatar

Where you can (and can't) use Emoji in PHP
https://shkspr.mobi/blog/2024/04/where-you-can-and-cant-use-emoji-in-php/

I was noodling around in PHP the other day and discovered that this works:

<?php
$🍞 = "bread";
echo "Some delicious " . $🍞;

I mean, there's no reason why it shouldn't work. An emoji is just a Unicode character (OK, not just a character - but we'll get on to that), so it should be fine to use anywhere.

Emoji work perfectly well as function names:

function 😺🐶() {
    echo "catdog!";
}
😺🐶();

Definitions:

define( "❓", "huh?" );
echo ❓;

And, well, pretty much everywhere:

class 🦜
{
    public int $🐦;
    public ?string $🦃;
    public function __construct(int $🐦, ?string $🦃)
    {
        $this->🐦 = $🐦;
        $this->🦃 = $🦃;
    }
}
$🐓 = new 🦜(1234, "birb");
echo $🐓->🐦;

How about namespaces? Yup!

namespace 😜;
class 😉 {
    public function 😘() {
        echo "Wink!";
    }
}
use 😜\😉;
$😊 = new 😉();
$😊->😘();

Even moderately complex Unicode sequences work:

echo <<<🏳️‍🌈
Unicode is magic!
🏳️‍🌈;

I've written before about the Quirks and Limitations of Emoji Flags. The humble 🏳️‍🌈 is actually the sequence U+1F3F3 (white flag), U+FE0F (Variation Selector 16), U+200D (Zero Width Joiner), U+1F308 (Rainbow).
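
The sequence can be inspected with a few lines of Python. This is a sketch for illustration only; the flag is built from escapes so that the invisible code points are explicit, and the variable name is an assumption:

```python
import unicodedata

# White flag, Variation Selector 16, Zero Width Joiner, Rainbow.
rainbow_flag = "\U0001F3F3\uFE0F\u200D\U0001F308"

for ch in rainbow_flag:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

print(len(rainbow_flag))                   # number of code points
print(len(rainbow_flag.encode("utf-8")))   # number of bytes when stored as UTF-8
```

What looks like one character on screen is four code points, which UTF-8 stores as fourteen bytes.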

Take a complex emoji like "Female Astronaut with Medium Dark Skin Tone" - 🧑🏾‍🚀 - that also works!

$🧑🏾‍🚀 = 1;
$👷🏻‍♂️ = 2;
echo $🧑🏾‍🚀 + $👷🏻‍♂️;

Probably the most complex emoji has 10 different codepoints! It looks like this - 🧑🏾‍❤️‍💋‍🧑🏻

And it works!

$🧑🏾‍❤️‍💋‍🧑🏻 = "Kiss Kiss. Bang Bang!";
echo $🧑🏾‍❤️‍💋‍🧑🏻[-1];

There are some emoji which don't work;

$5️⃣ = "five";

The 5️⃣ emoji is U+0035 (Digit Five), U+FE0F (Variation Selector 16), U+20E3 (Combining Enclosing Keycap). PHP doesn't allow variables to start with digits, so it craps out with PHP Parse error: syntax error, unexpected integer "5", expecting variable or "{" or "$" in php shell code on line 1
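
Why emoji are accepted at all, and why this one is not, follows from the rule the PHP manual gives for names: the pattern `[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*`, applied to bytes. The sketch below re-creates that check in Python rather than PHP, purely as an illustration. It assumes the source file is saved as UTF-8, and the helper name `is_php_label` is hypothetical.

```python
import re

# The label pattern from the PHP manual, matched against raw bytes.
PHP_LABEL = re.compile(rb"[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*")

def is_php_label(name: str) -> bool:
    """Check a candidate name against PHP's label pattern, assuming UTF-8 source."""
    return PHP_LABEL.fullmatch(name.encode("utf-8")) is not None

bread = "\U0001F35E"       # the bread emoji: every UTF-8 byte is 0x80 or above
keycap_five = "5\uFE0F\u20E3"   # the keycap emoji: its first byte is the ASCII digit 5
exclamation = "\u2757"     # heavy exclamation mark symbol

print(is_php_label(bread))
print(is_php_label(keycap_five))
print(is_php_label(exclamation))
```

Every byte of a non-ASCII character in UTF-8 is 0x80 or above, so PHP treats such characters as ordinary name characters. That is also consistent with ❗ and ≤ not working as operators: to the parser they look like names, not punctuation.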

You also can't use "punctuation" emoji as though they were normal characters:

echo 5 ❗= 6;

And, while not strictly emoji, you can't use mathematical symbols:

echo 5 ≤ 6;

So, there you have it. Is this useful? Well, probably. It is easy to get lost in a sea of text - so little pictograms can make it easier to see what you're doing. If the basic ASCII characters aren't part of your native language, perhaps it is useful to make use of the full range of Unicode.

Does your favourite programming language support Emoji?

https://shkspr.mobi/blog/2024/04/where-you-can-and-cant-use-emoji-in-php/
