mia, so, you've seen ™ and ™️ before. but like. why are there two. well, i have an explanation! the answer is:
FE0F
first, unicode. unicode is a standard definition of a bunch of codepoints, where a codepoint is just a number with meaning. for example, unicode codepoint U+263A refers to ☺︎, or "White Smiling Face", and U+1F431 refers to 🐱, or "Cat Face".

so, let's start by looking at the codepoints for ™. decoding it, it becomes the codepoint U+2122, referred to as "Trade Mark Sign". this was added in unicode 1.1 in 1993, a decent time ago!

next, the codepoints for ™️. decoding it, we get two codepoints!
U+2122 (™︎) and U+FE0F. wait. who is FE0F. why is he in my emoji

well, unicode isn't as simple as a series of codepoints that refer to single characters. take a look at
é̗ for example. this is three codepoints, U+0065 (Latin Small Letter E), U+0301 (Combining Acute Accent), and U+0317 (Combining Acute Accent Below). the first codepoint is simple enough, it's just e. the next two, however, are combining codepoints. this means that they combine with the codepoint before them to modify it. U+0301 adds an acute accent above the previous codepoint, and U+0317 adds an acute accent below the previous codepoint. this example specifically isn't very useful (i don't know any language with a é̗ character beyond conlangs), but it becomes very useful for languages that use a lot of diacritics. imagine if we had to make a new set of characters for each set of possible diacritics! big waste of space, we shouldn't have done that!

so, what is
U+FE0F? well, it's a special codepoint called "Variation Selector-16". variation selectors are a reserved block of 16 unicode codepoints. only some have been defined, but among those currently in use are U+FE0E (VS15) and U+FE0F (VS16). from wikipedia: "VS15 and VS16 are reserved to request that a character should be displayed as text or as an emoji respectively." so, what's happening with ™️ is that it's combining a U+2122 (™) and a U+FE0F (Variation Selector-16) to create an emoji version of ™. they're the same character, just that one has been instructed to become an emoji!

also, for the interested, here's the word "unicode" with a shit ton of combining characters: ù́̂̃̄̅̆̇̈̉n̖̗̘̙̐̑̒̓̔̕i̡̢̧̨̠̣̤̥̦̩c̴̵̶̷̸̰̱̲̳̹ò͇͈͉́͂̓̈́͆ͅd͓͔͕͖͙͐͑͒͗͘eͣͤͥͦͧͨͩ͢͠͡. what appears to be seven letters is actually 77 codepoints, taking up 147 bytes when encoded in utf-8. or 156 in utf-16. or 312 in utf-32 (those last two counts include a byte order mark; without it they're 154 and 308). why does anyone use utf-16 if it's longer? historical reasons :3
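
if you want to check those byte counts yourself, here's a minimal python sketch. note that it doesn't use the exact string above: `stack_marks` is a made-up helper that builds a stand-in with the same shape (each letter of "unicode" followed by ten combining marks, taken in order starting at U+0300), so treat it as an illustration and not as the original text.

```python
def stack_marks(word: str, per_letter: int = 10, start: int = 0x0300) -> str:
    """Hypothetical helper: follow each letter with `per_letter` combining marks.

    Marks are taken consecutively from `start`, so "unicode" with the
    defaults uses U+0300 through U+0345.
    """
    pieces = []
    mark = start
    for letter in word:
        pieces.append(letter)
        for _ in range(per_letter):
            pieces.append(chr(mark))
            mark += 1
    return "".join(pieces)


text = stack_marks("unicode")

codepoints = len(text)                        # python strings are sequences of codepoints
utf8_bytes = len(text.encode("utf-8"))        # 1 byte per ascii letter, 2 per mark here
utf16_bytes = len(text.encode("utf-16"))      # this codec prepends a 2-byte byte order mark
utf32_bytes = len(text.encode("utf-32"))      # this codec prepends a 4-byte byte order mark
utf16_no_bom = len(text.encode("utf-16-le"))  # explicit endianness, so no byte order mark
utf32_no_bom = len(text.encode("utf-32-le"))

print(codepoints, utf8_bytes, utf16_bytes, utf32_bytes)  # 77 147 156 312
print(utf16_no_bom, utf32_no_bom)                        # 154 308
```

the stand-in lands on the same numbers because every mark it uses sits in the two-byte range for both utf-8 and utf-16.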
TL;DR: ™️ is ™︎ but instructed to be an emoji
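
and if you'd like to do the decoding step yourself, here's a small python sketch using the standard `unicodedata` module. the `describe` helper and the variable names are just made up for this example; the strings are written with escapes so nothing depends on how your editor handles the invisible selector.

```python
import unicodedata


def describe(s: str) -> list[str]:
    """List each codepoint in `s` as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in s]


plain = "\u2122"             # just the trade mark sign
as_emoji = "\u2122\ufe0f"    # trade mark sign + VS16 (emoji presentation)
as_text = "\u2122\ufe0e"     # trade mark sign + VS15 (text presentation)
accented = "e\u0301\u0317"   # the e with accents above and below

for label, s in [("plain", plain), ("emoji", as_emoji),
                 ("text", as_text), ("accented e", accented)]:
    print(label, len(s), describe(s))

# stripping the selector gives back the plain character
assert as_emoji.replace("\ufe0f", "") == plain
```

whether the emoji version actually renders differently is up to your font and renderer; the code only shows that the underlying codepoints differ.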