
Unicode



1. Unicode

In computing, Unicode provides an international standard which has the goal of providing the means to encode the text of every document people want to store on computers. This includes all scripts in active use today, many scripts known only by scholars, and symbols which do not strictly represent scripts, like mathematical, linguistic and APL symbols.

Establishing Unicode involves an ambitious project to replace existing character sets, many of them limited in size and problematic in multilingual environments. Despite technical problems and limitations, Unicode has become the most complete character set and one of the largest, and seems set to serve as the dominant encoding scheme in the internationalization of software and in multilingual environments. Many recent technologies, such as XML, the Java programming language as well as several operating systems, have adopted Unicode as an underlying scheme to represent text.
'''from wikipedia.org'''

3. thread

�� ����는 �� .
�� 대부� 리눅�� ����리 ��UTF-8 ����� ����� �������. ��� 만들����는 모� ������는 모�� UTF-8 �����. �놈, KDE �������� 매��들� ���� EUC를 ������� ������만, ������뿐 내부��는 UTF-8��� �. ���� UTF-8��������� 대���.
MultiLinugual ��랫�� �����램 ����면 ����� ����. - eternalbleu

4. �����

UNICODE :

http://www.unicode.org/standard/translations/korean.html

What is Unicode?
No matter what the platform,
no matter what the program,
no matter what the language, Unicode provides a unique number for every character.


UCS-2 :

It covers most of the characters in common use.
The 2-byte range is UCS-2.
Expressed in bits, it is UTF-16.
UTF-16LE and UTF-16BE are the same encoding; they differ only in byte order (Little Endian versus Big Endian).
iconv --list shows a great many names,
but UTF-16LE and UCS-2LE can be treated as the same, and likewise the BE variants.
Plain UTF-16 is the same as UTF-16LE with a BOM header attached.
UCS-2 has no header attached.
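
The byte-order and BOM points above can be checked with a short sketch. It assumes Python's standard codecs; the sample character U+AC00 and the variable names are chosen here for illustration and are not from the original notes.

```python
import codecs

text = "\uac00"  # U+AC00, a sample Hangul syllable (illustrative choice)

le = text.encode("utf-16-le")  # explicit little endian, no BOM written
be = text.encode("utf-16-be")  # explicit big endian, no BOM written

# A BOM is just U+FEFF encoded in the stream's own byte order.
with_bom = codecs.BOM_UTF16_LE + le

print(le.hex())        # 00ac
print(be.hex())        # ac00
print(with_bom.hex())  # fffe00ac

# The generic "utf-16" decoder reads the BOM to decide the byte order.
assert with_bom.decode("utf-16") == text
assert (codecs.BOM_UTF16_BE + be).decode("utf-16") == text
```

Which byte order a plain "UTF-16" encoder writes after the BOM depends on the tool and platform, so the sketch only relies on the explicit LE and BE codecs.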

UCS-4 :

An extension of UCS-2.
The trailing 2 bytes are the same as in UCS-2.
That is, 0xFFFF in UCS-2 is 0x0000FFFF in UCS-4.
It is identical to UTF-32; only the name differs.
�� 브���� 내부�� �������며,
js 등�� indexOf() � ���면 UCS-4 ����� 10됩����.
10 �므� 65535 ����는 UCS-2 ��   됩����.
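
As a rough check of the relation between UCS-2 and UCS-4 described in this section, the sketch below zero-extends a 2-byte value and compares it with Python's UTF-32 codec. The helper name widen_ucs2_to_ucs4 is hypothetical, and "utf-16-be" stands in for UCS-2 here, which is only valid for characters inside the 2-byte range.

```python
def widen_ucs2_to_ucs4(ucs2_be: bytes) -> bytes:
    """Zero-extend one big-endian 2-byte code unit to a 4-byte value."""
    if len(ucs2_be) != 2:
        raise ValueError("expected exactly one 2-byte code unit")
    return b"\x00\x00" + ucs2_be


sample = "\uac00"                  # U+AC00, inside the 2-byte range
ucs2 = sample.encode("utf-16-be")  # same bytes as UCS-2 for this character
ucs4 = widen_ucs2_to_ucs4(ucs2)

assert ucs4 == sample.encode("utf-32-be")
print(ucs4.hex())    # 0000ac00
print(ord(sample))   # 44032, the same value in decimal

# Outside the 2-byte range the two diverge: UTF-16 needs a surrogate pair.
astral = "\U0001F600"
print(astral.encode("utf-32-be").hex())  # 0001f600
print(astral.encode("utf-16-be").hex())  # d83dde00
```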

UTF-8 :

UCS-2 and UCS-4 waste space when storing text.
Text that could be expressed in ASCII alone still takes 2 to 4 bytes per character, which is wasteful.
For this reason it is in far more general use than UTF-7.
It is a variable-length encoding.
It can be converted to and from UCS-2 and UCS-4 by simple calculation alone.
Hangul lies within the UCS-2 range, so it can be expressed in at most 3 bytes.
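
To make the variable-length point concrete, here is a small sketch that prints how many bytes UTF-8 needs for a few sample characters. The helper utf8_length and the sample list are illustrative assumptions, not part of the original notes.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 uses for a single character."""
    return len(ch.encode("utf-8"))


# ASCII letter, Latin-1 letter, Hangul syllable, character above U+FFFF
samples = ["A", "\u00e9", "\uac00", "\U0001F600"]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s), {encoded.hex()}")
```

Every code point up to U+FFFF, which includes all of Hangul, fits in three bytes or fewer; only code points above that need four.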

UTF-7 :

Created for email and other channels that can carry only ASCII.
Of the 8 bits in each byte, only 7 are used.
Unlike UTF-8,
not every ASCII character is represented as itself, so it is awkward to work with.
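
A quick way to see the 7-bit property is Python's built-in "utf-7" codec. This is only a sketch under that assumption; the sample string is made up for the example.

```python
text = "A+B \uac00"  # ASCII with a plus sign, then one Hangul syllable

encoded = text.encode("utf-7")
print(encoded)

# Every byte of the result fits in 7 bits, so it survives ASCII-only channels.
assert all(byte < 0x80 for byte in encoded)

# The round trip is lossless.
assert encoded.decode("utf-7") == text

# "+" starts an escaped run in UTF-7, so a literal plus sign is not written as itself.
assert "+".encode("utf-7") == b"+-"
```

The last assertion shows why UTF-7 is awkward to process: even an ASCII character such as "+" changes form, so plain byte-wise searching on the encoded text is unreliable.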


BOM (Byte Order Mark) :

Because there are so many kinds of Unicode encoding, a header like this is attached at the front to tell them apart.
Editors such as EmEditor, UltraEdit, and Vim support it.
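
A minimal sketch of how a program might recognise such a header, using the BOM constants from Python's codecs module. The function name sniff_bom and its return shape are assumptions made for this example; real editors may use additional heuristics.

```python
import codecs

# Longer BOMs are checked first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding name, BOM length), or (None, 0) if no BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


raw = codecs.BOM_UTF16_LE + "\uac00".encode("utf-16-le")
encoding, skip = sniff_bom(raw)
print(encoding, raw[skip:].decode(encoding))
```

One known ambiguity: UTF-16 LE text that starts with U+0000 right after the BOM has the same first four bytes as the UTF-32 LE BOM, so this sketch would misreport it.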


���

http://www.unicode.org/charts/

The Unicode code charts can be viewed here.
Because the values are zero-filled, codes of up to 4 digits are UCS-2,
and codes of 5 digits or more are UCS-4.

		
resy	It would be good to have this laid out in Korean..
There is not much material like this. That is why so many people think Unicode = UTF-16 (or some other encoding)...

����� 대를 매�� 는데... ����... ^^:	07/13 2:23:12 ����� ������
		
resy	A supplementary note...
Think of UCS as the Unicode table. UTF is an encoding method (that is, how a value is laid out as bytes), while UCS is a table of predefined code values. For example, '가' is U+AC00 in Unicode, and in UCS2 it sits at 0xAC00 in the table. Encoded as UTF-8, it becomes 0xEAB080.

Back when nobody explained this sort of thing, I spent a long time confused over whether UCS2 = UTF16??, so I am not sure I have explained it properly. If anything is wrong, please let me know... ^^;

문�� �(Character Set)�� ��딩(Encoding)�� 대 ���������� ��르����는 데�� ��더����. ���� ����보������ �� ���������만.. �� ���� ��료 빼면 ��내는 -_-;

�러보��� ������� ���� ����� ��딩����는 �� 대�� ���� �� ���데, locale �� 대�� ���는 �� �����... 	07/13 5:19:40 ����� ������
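
The U+AC00 example in the comment above can be worked through by hand. The sketch below is a minimal UTF-8 encoder written for illustration, not a replacement for the built-in codec; the name utf8_encode is hypothetical, and the function does not reject surrogate code points.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point as UTF-8 by bit manipulation (illustrative only)."""
    if cp < 0x80:       # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:      # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:    # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    if cp <= 0x10FFFF:  # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xF0 | (cp >> 18),
            0x80 | ((cp >> 12) & 0x3F),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    raise ValueError("code point out of range")


# 0xAC00 = 1010 1100 0000 0000 -> 1110|1010  10|110000  10|000000
print(utf8_encode(0xAC00).hex())  # eab080
assert utf8_encode(0xAC00) == "\uac00".encode("utf-8")
```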
		
utf	About UTF-8: text made up only of ASCII characters takes just one byte per character, so existing ASCII strings can be used as they are. UCS values, by contrast, consist of 2 or 4 bytes, so when they are placed in a string a null byte (0x00) can get in. For example, '가' is 0xac00, and its zero byte throws string handling into confusion. UTF-8 is an encoding scheme that never produces such bytes.	07/13 23:22:49
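
The null-byte problem raised in this comment can be seen directly. The sketch below assumes "utf-16-be" as a stand-in for UCS-2 and imitates a C-style string reader with a hypothetical helper, c_string_view, that stops at the first 0x00.

```python
def c_string_view(data: bytes) -> bytes:
    """What a null-terminated string routine would see: bytes before the first 0x00."""
    return data.split(b"\x00", 1)[0]


for ch in ["A", "\uac00"]:
    two_byte = ch.encode("utf-16-be")  # stand-in for UCS-2 (assumption)
    utf8 = ch.encode("utf-8")
    print(
        f"U+{ord(ch):04X}",
        "2-byte form:", two_byte.hex(), "->", c_string_view(two_byte).hex() or "(empty)",
        "| UTF-8:", utf8.hex(), "->", c_string_view(utf8).hex(),
    )
```

For both samples the 2-byte form is cut short at its zero byte, while the UTF-8 form passes through untouched, since UTF-8 emits 0x00 only for U+0000 itself.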
		
resy	Yes, as you say, UTF-8 was created for that purpose. No null character gets in. It is the usual way of sending text over HTTP, where a null character must not appear in the data.

The history of UTF-8 can be read here.
http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt 	07/13 23:58:19
		
�	utf ��� 부�� utf7 만�� ��������� :)

asc 문�� 만�� ���는 문���� ��람들���� utf16,32 를 ����봐�����  ��.. euc 등 ��딩�� unicode �����는 ��� �란 �� ��  �������딩�� 보는�� 더 ���� �� ����...

5. �� ����

last modified 2021-02-07 05:28:20