>> How does ThaiURL work?
  
Background

ThaiURL is designed to complement the existing domain name system without modifying any DNS servers. Thai (non-ASCII) characters must therefore be converted to ASCII character codes before any DNS query is sent to a standard DNS server.

ThaiURL assessed several encoding techniques discussed in the IDN-WG (Internationalized Domain Names Working Group) at the IETF (Internet Engineering Task Force), looking for the technique that best suits Thai script and can also be extended to other scripts.

Consequently, ThaiURL decided to use the Row-based ASCII Compatible Encoding (RACE) format in its application and will continue working through the IDN-WG to support this transformation method, which represents non-ASCII characters in domain names in a way that is completely compatible with the current DNS.

To support this choice, two of the encoding techniques considered are summarized below:

Domain Names in ASCII-compatible encoding (ACE)*

ACE-1.1: Describes UTF-5, which is a fairly direct encoding of ISO 10646 characters using a system similar to UTF-8. Characters from Basic Latin and Latin-1 Supplement take 2 octets; Latin Extended-A through Tibetan take 3 octets; Myanmar through the end of BMP take 4 octets; non-BMP characters take 5 octets. This means that names using all characters in the Myanmar through the end of BMP are limited to 15 characters.
Pro: Extremely simple
Con: Poor compression, particularly for Asian scripts

ACE-1.2: Describes RACE, which is a two-step algorithm that first compresses the name part, then converts the compressed string into an ACE. Name parts in all scripts other than Han, Yi, Hangul syllables, Ethiopic, and non-BMP take up ceil(1.6*(n+1)) octets; name parts in those scripts and any name that mixes characters from different rows in ISO 10646 take up ceil(3.2*(n+1)) octets. This means that names using Han, Yi, Hangul syllables, or Ethiopic, are limited to 18 characters.
Pro: Best compression for most scripts, and similar compression for the scripts where it is not the best
Con: More complicated than UTF-5. Not well optimized for names that have mixed scripts, such as non-Latin names that use hyphen or ASCII digits

* Excerpted from IETF Internet Draft “Comparison of Internationalized Domain Name Proposals” - July 2000.
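
The character limits quoted in the excerpt appear to follow from simple arithmetic. The sketch below is only an illustration: it assumes the standard 63-octet limit on a single DNS label, takes the per-name-part octet counts from the excerpt, and uses hypothetical function names (utf5_length, race_length, max_chars) that are not part of ThaiURL or of the IETF draft.

```python
DNS_LABEL_MAX = 63  # assumed limit: maximum octets in one DNS label


def utf5_length(n, octets_per_char=4):
    """Octets for n characters in UTF-5, using the 4-octet case
    (Myanmar through the end of the BMP) from the excerpt."""
    return n * octets_per_char


def race_length(n, single_row=True):
    """Octets for an n-character name part in RACE, per the excerpt:
    ceil(1.6*(n+1)) for single-row names, ceil(3.2*(n+1)) otherwise.
    Integer arithmetic avoids floating-point rounding surprises."""
    numerator = (8 if single_row else 16) * (n + 1)
    return -(-numerator // 5)  # ceiling division by 5


def max_chars(length_fn, limit=DNS_LABEL_MAX):
    """Largest n whose encoded length still fits within the limit."""
    n = 0
    while length_fn(n + 1) <= limit:
        n += 1
    return n


print(max_chars(utf5_length))                                  # 15
print(max_chars(lambda n: race_length(n, single_row=False)))   # 18
print(max_chars(lambda n: race_length(n, single_row=True)))    # 38
```

Under these assumptions a single-row script such as Thai fits 38 characters in one label with RACE, compared with 15 characters with UTF-5.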

Transformation Process

To convert a Thai domain name into an ASCII-compatible domain name, the ThaiURL program performs the following steps:

Input: ชื่อไทย.คอม

Unicode (Pre-Compressed String): 0e0a 0e37 0e48 0e2d 0e44 0e17 0e22 . 0e04 0e2d 0e21

Compressed String: 0e0a37482d441722.0e042d21

Binary Conversion: 0000 1110 0000 1010 0011 0111 0100 1000 0010 1101 0100 0100 0001 0111 0010 0010 . 0000 1110 0000 0100 0010 1101 0010 0001

Base32 Conversion I: 00001 11000 00101 00011 01110 10010 00001 01101 01000 10000 01011 10010 00100 . 00001 11000 00010 00010 11010 01000 01000

Base32 Conversion II: byfdosbniqlse.bycc2ii

Append .net (Output): byfdosbniqlse.bycc2ii.net

This is the ASCII-compatible domain name that can be used in standard DNS resolution.
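
The steps above can be reproduced with a short Python sketch. It mirrors only the steps listed on this page and is not ThaiURL's actual program or a complete implementation of the RACE draft. The function names (compress_label, base32_encode, thai_to_ascii) are hypothetical, the sketch assumes every character of a label comes from the same Unicode row, and the Base32 alphabet (a-z followed by 2-7) is inferred from the example output.

```python
BASE32_ALPHABET = "abcdefghijklmnopqrstuvwxyz234567"  # 5-bit values 0..31


def compress_label(label):
    """Row compression: emit the shared high octet once, then each
    character's low octet. Assumes all characters share one row."""
    code_points = [ord(ch) for ch in label]
    rows = {cp >> 8 for cp in code_points}
    if len(rows) != 1 or any(cp > 0xFFFF for cp in code_points):
        raise ValueError("this sketch only handles single-row BMP labels")
    row = rows.pop()
    return bytes([row] + [cp & 0xFF for cp in code_points])


def base32_encode(data):
    """Split the bit string into 5-bit groups (zero-padding the last
    group) and map each group to one character of the alphabet."""
    bits = "".join(f"{byte:08b}" for byte in data)
    bits += "0" * (-len(bits) % 5)
    groups = (bits[i:i + 5] for i in range(0, len(bits), 5))
    return "".join(BASE32_ALPHABET[int(group, 2)] for group in groups)


def thai_to_ascii(name, suffix="net"):
    """Encode each dot-separated label, then append the suffix."""
    labels = [base32_encode(compress_label(part)) for part in name.split(".")]
    return ".".join(labels + [suffix])


# The example name from this page, written as escapes for clarity.
example = "\u0e0a\u0e37\u0e48\u0e2d\u0e44\u0e17\u0e22.\u0e04\u0e2d\u0e21"
print(compress_label(example.split(".")[0]).hex())  # 0e0a37482d441722
print(thai_to_ascii(example))                       # byfdosbniqlse.bycc2ii.net
```

Encoding each label separately keeps the dots of the original name in place, which is why the output contains one encoded label per Thai label before the appended .net.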
 
     
ThaiURL.com © All Rights Reserved. Copyright 1999-2019