Understanding Unicode with a Nyan translator

Hello, this is Dacer from Progate with the 21st day of the Progate Advent Calendar.

Nyа̅nNyа̄nNyа̃nNyа̎n NyӓnNyа̏nNyа̂nNyа̉n Nyа̃nNyanNyӑnNyа̏n Nyа̇nNyа̃nNyа̂nNyа̋n Nyа̃nNyanNyӑnNyа̇n Nyа̃nNyanNyа̄nNyа̂n Nyа̃nNyanNyӓnNyа̋n

You may be wondering what the words above mean. I'll reveal the answer at the end of this article.

In the beginning

I recently found an interesting text encryption and decryption tool*1 that can turn any sentence into a string of о̎古s and then restore the original sentence from the generated text. Let's make a Nyan translator that works in a similar way.

What is Unicode?

To make a Nyan translator, you first need to know what Unicode is.

Unicode is the universal character encoding, maintained by the Unicode Consortium. This encoding standard provides the basis for processing, storage and interchange of text data in any language in all modern software and information technology protocols.

In other words, Unicode is a standard that maps each character, in any language, to a unique numeric value called a code point (or Unicode number).*2

For example, if a program is told to display the character with the Unicode number U+611B, the character "愛" appears on the screen. (Unicode numbers are written in hexadecimal, which is why there is a "B" in U+611B.)
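To make the mapping concrete, here is a minimal sketch that goes from a Unicode number to a character and back. The helper names charFromCodePoint and codePointHex are made up for this illustration; String.fromCodePoint and codePointAt are standard JavaScript.

```typescript
// Hypothetical helper: the character for a hexadecimal Unicode number.
function charFromCodePoint(hex: string): string {
    return String.fromCodePoint(parseInt(hex, 16))
}

// Hypothetical helper: the hexadecimal Unicode number of the first character.
function codePointHex(char: string): string {
    const codePoint = char.codePointAt(0)
    if (codePoint === undefined) {
        throw new Error('expected a non-empty string')
    }
    return codePoint.toString(16)
}

console.log(charFromCodePoint('611B')) // 愛
console.log(codePointHex('愛'))        // 611b
```

Note that toString(16) produces lowercase hexadecimal digits, so U+611B comes back as "611b".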

Let's make a Nyan translator

The first step is to turn some words into a bunch of Nyans.

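export function translateToNyan(str = '') {
    // Array.from splits the string by code point, so read each one with codePointAt
    // (charCodeAt would return only half of a surrogate pair, e.g. for emoji).
    const unicodeNumArray = Array.from(str).map((i: string) => {
        return i.codePointAt(0)!.toString(16)
    })
    const texts = unicodeNumArray.map(unicodeNum => {
        // One "Nyan" per hexadecimal digit of the Unicode number.
        return Array.from(unicodeNum).map(s => {
            if (s !== '0') {
                // Cyrillic "а" (U+0430) followed by the combining mark U+030<digit>.
                return 'Ny' + String.fromCharCode(parseInt('0430', 16)) + String.fromCharCode(parseInt('030' + s, 16)) + 'n'
            } else {
                // Digit 0 gets no mark: a plain Latin "Nyan".
                return 'Nyan'
            }
        }).join('')
    })
    // Separate the characters of the input with spaces.
    return texts.join(' ')
}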

First, we turn the words to be converted into an array of Unicode numbers written in hexadecimal, e.g. 今日 becomes ["4eca", "65e5"]. Then we turn each hexadecimal digit of those Unicode numbers into a Nyan.
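The two steps can be looked at separately in a small sketch. The function names toHexCodePoints and markForDigit are hypothetical and exist only for this illustration. The sketch reads code points with codePointAt so that characters outside the Basic Multilingual Plane keep their full Unicode number.

```typescript
// Step 1: one lowercase hexadecimal Unicode number per character of the input.
function toHexCodePoints(str: string): string[] {
    return Array.from(str).map(ch => ch.codePointAt(0)!.toString(16))
}

// Step 2: the combining mark chosen for one hexadecimal digit,
// or null for '0', which is written as a plain "Nyan" without any mark.
function markForDigit(digit: string): string | null {
    return digit === '0' ? null : 'U+030' + digit.toUpperCase()
}

console.log(toHexCodePoints('今日'))
// ["4eca", "65e5"]
console.log(Array.from('4eca').map(d => markForDigit(d)))
// ["U+0304", "U+030E", "U+030C", "U+030A"]
```

So 今 (U+4ECA) ends up as four Nyans, carrying the marks U+0304, U+030E, U+030C and U+030A in that order.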

The key part of the transformation is:

String.fromCharCode(parseInt('0430', 16)) + String.fromCharCode(parseInt('030' + s, 16))

The result of String.fromCharCode(parseInt('0430', 16)) is the Cyrillic letter "а" (U+0430). Please note that this is not the common Latin letter "a" (U+0061), even though the two look the same.

String.fromCharCode(parseInt('030' + s, 16)) returns a character from U+0301 to U+030F, depending on the hexadecimal digit s. These characters belong to the Combining Diacritical Marks block (U+0300 to U+036F). A combining mark is a special character that attaches to the character before it, here the Cyrillic letter "а" (U+0430), to make symbols like а̅, а̄ or а̎. After this conversion, we get something like Nyа̅nNyа̄nNyа̃nNyа̎n.
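A short sketch shows what these pieces look like to a program. The variable names are invented for this illustration; the calls are standard JavaScript string methods.

```typescript
const cyrillicA = String.fromCharCode(0x0430) // "а", Cyrillic small letter a
const latinA = 'a'                            // U+0061, Latin small letter a
const overline = String.fromCharCode(0x0305)  // combining overline
const diaeresis = String.fromCharCode(0x0308) // combining diaeresis

// Base letter plus mark is displayed as one symbol but is stored as two characters.
const combined = cyrillicA + overline

console.log(cyrillicA === latinA) // false
console.log(combined.length)      // 2

// Some pairs also exist as a single precomposed character,
// and NFC normalization folds the pair into it.
console.log((cyrillicA + diaeresis).normalize('NFC') === '\u04d3') // true
```

The last line matters when the text passes through software that normalizes it: "а" followed by U+0308 may come back as the single character ӓ (U+04D3), so a decoder should be prepared for both forms.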

Converting back to human language is very simple: we just need to find the symbols like а̅, а̄ or а̎, turn each one back into a hexadecimal digit (a plain Nyan stands for 0), and rebuild the Unicode numbers of the words you typed.
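The reverse direction could look roughly like the sketch below. This is not the code from the repository: translateFromNyan is a name assumed for this illustration, and it relies on the output format described above (space-separated groups, one Nyan per hexadecimal digit). It normalizes to NFD first, as an extra precaution, so that precomposed letters such as ӓ are split back into "а" plus a mark.

```typescript
// Hypothetical decoder for the format produced by translateToNyan.
function translateFromNyan(nyan: string): string {
    return nyan
        .normalize('NFD') // e.g. "ӓ" (U+04D3) becomes "а" (U+0430) + U+0308
        .split(' ')
        .map(word => {
            // Each digit is either "Nyan" (digit 0) or "Ny" + U+0430 + mark + "n".
            const tokens: string[] = word.match(/Nyan|Ny\u0430[\u0301-\u030f]n/g) ?? []
            if (tokens.length === 0) {
                return '' // nothing recognizable in this group
            }
            const hex = tokens.map(token =>
                // The mark is the 4th character; its offset from U+0300 is the digit.
                token === 'Nyan' ? '0' : (token.charCodeAt(3) - 0x0300).toString(16)
            ).join('')
            return String.fromCodePoint(parseInt(hex, 16))
        })
        .join('')
}

// U+306F: digits 3, 0, 6, f (the 6 is given in its precomposed form U+04D1).
console.log(translateFromNyan('Ny\u0430\u0303nNyanNy\u04d1nNy\u0430\u030fn')) // は
```

Groups that contain no recognizable Nyan are silently skipped here; a stricter version might throw an error instead.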

Final

Finally, it's time to reveal the answer. The meaning of the Nyan words at the beginning is:

吾輩は猫である

You can try this Nyan translator at https://nyan.dacer.im/ , or check the code at https://github.com/dacer/NyanTranslator.