Wednesday, April 15, 2009

Ghost Character U+38F8

John D. Cook at The Endeavour wrote a piece about how to shorten URLs by using Unicode characters. In one particular example, the code point happened to be U+38F8, and this is what appeared:


http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=38F8

Although the Unihan database has cataloged this character, it offers no additional linguistic information about it.

Characters like these are known as "ghost characters": they exist only by virtue of the Unicode algorithm and are not used in any linguistic sense.

As a matter of fact, the Unihan Grid Index has many of these:

http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=3403


http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=351B


http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=360F

http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=208D5
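
For anyone who wants to poke at these code points locally, here is a minimal Python sketch that turns a hex code point into the actual character and rebuilds the lookup links above. The base URL is copied from the links in this post; the function names `unihan_url` and `describe` are my own, made up for illustration, and the name lookup relies on whatever Unicode data ships with your Python.

```python
import unicodedata

# Base URL copied from the lookup links in this post.
UNIHAN_LOOKUP = "http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint="


def unihan_url(codepoint_hex):
    """Return the Unihan lookup URL for a hex code point such as '38F8'.

    Hypothetical helper: it only concatenates strings and does not
    contact the server.
    """
    return UNIHAN_LOOKUP + codepoint_hex.upper()


def describe(codepoint_hex):
    """Return (character, Unicode name, lookup URL) for a hex code point."""
    # int(..., 16) parses the hex digits; chr() maps the number to a character.
    char = chr(int(codepoint_hex, 16))
    # Fall back to a placeholder if this Python's Unicode data has no name.
    name = unicodedata.name(char, "<no name in local Unicode data>")
    return char, name, unihan_url(codepoint_hex)
```

Calling `describe("38F8")` gives back the character itself, its formal name as far as the local Unicode data knows it, and the same link listed at the top of this post. Whether the character actually displays depends on your fonts.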
