How Do You Say “Dot Com” in Chinese?

Here you see two URLs, or internet addresses: <http://www.dhs.gov> and <http://www.dhs.gοv>. One of them is the homepage for the U.S. Department of Homeland Security and one of them is not. Can you tell which one? Squint hard. Confused? You should be. But should you be worried? The first web address is the correct URL for the DHS homepage. The second address goes nowhere, but in the near future it might lead an unsuspecting web user to dangerous, nasty places. While you could never tell with the naked eye, the “o” in the “.gov” of that second address is actually the lower-case Greek letter omicron. Try copying and pasting that into your web browser, and it will take you to <http://www.dhs.xn--gv-jbc/>, a location that cannot exist today because top level domain names (e.g. “.com”, “.org”, etc.) must be written in the letters of the Latin alphabet. But the impossible has a knack for becoming possible.

Since 1998, the design and control of the Internet’s Domain Name System has been under ICANN, the Internet Corporation for Assigned Names and Numbers. Prior to the establishment of ICANN, these functions were handled directly by the U.S. Government. ICANN’s authority over the Internet was granted by contract with the U.S. Department of Commerce, and since then the organization has held a somewhat ambiguous and controversial status in the eyes of net users and the international community. Despite constant criticism of ICANN’s U.S.-centric position, the organization has implemented programs to increase international access to the Internet and to “democratize” it. On November 16, 2009, ICANN began accepting applications for Country Code Top Level Domain names in characters other than the 26 letters of the Latin alphabet [1, 2]. For example, the country code TLD for China is <.cn>. The Chinese, who have been among the most vociferous in their demand for non-Latin domain names, can now petition ICANN for a Country Code Top Level Domain name such as <.中国>, which means “China.” China applied for this name on the very first day, and ICANN believes that the first of these Country Code TLDs will be up and running by spring of 2010 [3].

Internationalized Domain Names are nothing new. The Japanese have hosted web addresses in their written language as far back as 2000 [4]. The big news is that for the first time, the Top Level Domain name (e.g. “.com”) can be in a non-Latin script, so an entire web address can be written without a single Latin letter. Though ICANN is only approving one TLD for each national government which applies, plans are on the table to allow an unlimited number of private non-Latin TLDs in the near future [5, 6]. The mechanics are rather simple. The Domain Name System itself handles only ASCII, the 7-bit code that covers the characters used in English. A name containing any other character is converted by an encoding called Punycode into a chain of ASCII characters prefixed with “xn--”, which the user normally never sees, as demonstrated by the omicron example above.
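The conversion can be sketched with Python's standard library. This is a minimal illustration, not a full implementation of the domain name standards: the helper names `to_ascii_label` and `to_ascii_hostname` are hypothetical, and the sketch skips the normalization step that real software applies before encoding.

```python
def to_ascii_label(label: str) -> str:
    """Return the ASCII form of one domain label (simplified sketch)."""
    if label.isascii():
        return label  # plain ASCII labels pass through unchanged
    # Python ships a "punycode" codec; the "xn--" prefix marks an encoded label.
    return "xn--" + label.encode("punycode").decode("ascii")


def to_ascii_hostname(hostname: str) -> str:
    """Encode each dot-separated label of a hostname independently."""
    return ".".join(to_ascii_label(part) for part in hostname.split("."))


# "g\u03bfv" is "gov" spelled with a Greek omicron (U+03BF) in the middle.
print(to_ascii_hostname("www.dhs.g\u03bfv"))  # www.dhs.xn--gv-jbc
```

Only the label containing the omicron changes; the all-ASCII labels “www” and “dhs” are left alone.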

With any major change in Internet architecture come critics, often with quite valid points. The dhs.gov scenario above is an example of an IDN homograph attack. Homographs are characters from different scripts which appear identical or similar. The Greek letter omicron, “ο,” and the Cyrillic letter “а,” as used in Russian, are popular examples of glyphs which look identical to their Latin counterparts but are assigned different code points in Unicode, the standard that gives every character in every script its own number. A computer understands the lower-case English letter “a” simply as ASCII code 97. The lower-case Cyrillic letter “а,” however, is Unicode code point U+0430, and a domain label consisting of that one letter is converted to the ASCII string “xn--80a.” An IDN homograph attack occurs when a web address which appears to be written in English, such as “ebay.com,” is actually spelled with the Cyrillic letter “а,” leading the oblivious web surfer to a malicious website where malware infection and identity theft can occur [7].
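One rough way to spot such a spoof is to check whether a single label mixes letters from more than one script. The sketch below is a heuristic under stated assumptions, not how any particular browser works: the function names are hypothetical, and it approximates a character's script by taking the first word of its official Unicode name.

```python
import unicodedata


def scripts_in(label: str) -> set[str]:
    """Approximate the scripts used in a label from Unicode character names."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # e.g. "CYRILLIC SMALL LETTER A" -> "CYRILLIC"
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def looks_mixed(label: str) -> bool:
    """Flag labels that combine letters from more than one script."""
    return len(scripts_in(label)) > 1


print(looks_mixed("ebay"))       # False
print(looks_mixed("eb\u0430y"))  # True: U+0430 is the Cyrillic letter
```

A check like this catches the “ebay” case, but it would miss a name spelled entirely in look-alike letters from one non-Latin script, so it is a first line of defense at best.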

As Top Level Domain names must be approved by ICANN, it is unlikely that a homograph attack like the “.gov” example above could occur any time soon. However, there is such a thing as an Alternative Domain Name System, which ICANN has neither the authority nor the capability to police. One such system is quite popular in Thailand. A homograph attack on a TLD as trusted as “.gov” could have disastrous consequences, and victims might be slow to recognize it. Popular browsers like Internet Explorer are already addressing these kinds of problems by displaying web addresses in their ASCII conversion instead of in the original non-Latin characters. Unfortunately, Grandma is probably not very literate in strings like “xn--gv-jbc,” and might not be sensitive to her browser’s warnings about questionable URLs.

Another problem that comes with internationalized web addresses is the potential Balkanization of the Internet. With the move away from English as a standard, the Internet could possibly split into many “Internets.” The globally unifying power of the web could be shattered if people can only access websites originating in their language sphere. With non-Latin Country Code TLDs, administered by governments, comes the possibility that censors, such as those in China, would have an even tighter lock on content production and access within their borders.

But aren’t web URLs obsolete anyway? In the same way that people no longer need to remember their friends’ phone numbers, web surfing is increasingly done by an endless chain of linking, clicking and Googling, without ever knowing or typing any web address. URLs, like phone numbers, have retreated into the invisible depths of the machine. Problems such as those addressed in this article can be overcome with the right software. The linguistic diversity of the Internet itself will increase over the next decade, of course. But the only true barrier to connectedness between those language communities will be each person’s decision whether to learn a different language.

References
1. Singel, Ryan. “Web Addresses Now Can Be All Greek to You, ICANN Rules | Epicenter | Wired.com.” Wired News. http://www.wired.com/epicenter/2009/10/icann-international-scripts/ (accessed November 14, 2009).
2. Olson, Kelly. “Internet Set To Add Web Addresses In Non-English Characters.” Breaking News and Opinion on The Huffington Post. http://www.huffingtonpost.com/2009/10/26/international-domain-name_n_333449.html (accessed November 29, 2009).
3. China Daily. “China to apply for ‘Zhongguo’ domain name.” China Daily Website. http://www.chinadaily.com.cn/china/2009-11/16/content_8979647.htm (accessed November 21, 2009).
4. Matsutani, Minoru. “Japanese URLs no big deal.” The Japan Times Online. search.japantimes.co.jp/cgi-bin/nn20091107f2.html (accessed November 14, 2009).
5. Ramstad, Evan, and Jaeyeon Woo. “Web Addresses to Adopt New Alphabets – WSJ.com.” Business News & Financial News – The Wall Street Journal – WSJ.com. http://online.wsj.com/article/SB125664117322309953.html (accessed November 14, 2009).
6. Whitney, Lance. “ICANN approves non-Latin domain names | Digital Media – CNET News.” Technology News – CNET News. http://news.cnet.com/8301-1023_3-10387139-93.html?tag=mncol (accessed November 14, 2009).
7. Weber, Chris. “IDN Homograph Attacks :: IDN News.” IDN News. http://www.idnnews.com/?tag=idn-homograph-attacks (accessed November 14, 2009).
