Byte Sizes of Chinese and English Characters in ASCII, UTF8, and Unicode Encodings
Code example:
using System;
using System.Text;

class Program {
    static void Main() {
        // Let the console display Chinese characters correctly
        Console.OutputEncoding = Encoding.UTF8;
        ShowCode();
    }

    private static void ShowCode() {
        string[] strArray = { "b", "abcd", "乙", "甲乙丙丁" };
        byte[] buffer;
        string mode, back;

        foreach (string str in strArray) {
            // Encode each string with ASCII, UTF-8 and Unicode in turn
            for (int i = 0; i <= 2; i++) {
                if (i == 0) {
                    buffer = Encoding.ASCII.GetBytes(str);
                    back = Encoding.ASCII.GetString(buffer, 0, buffer.Length);
                    mode = "ASCII";
                } else if (i == 1) {
                    buffer = Encoding.UTF8.GetBytes(str);
                    back = Encoding.UTF8.GetString(buffer, 0, buffer.Length);
                    mode = "UTF8";
                } else {
                    // Encoding.Unicode is UTF-16 in little-endian byte order
                    buffer = Encoding.Unicode.GetBytes(str);
                    back = Encoding.Unicode.GetString(buffer, 0, buffer.Length);
                    mode = "Unicode";
                }

                Console.WriteLine("Mode: {0}, String: {1}, Buffer.Length: {2}",
                    mode, str, buffer.Length);

                // Print the raw bytes, separated by spaces
                Console.WriteLine("Buffer: {0}", string.Join(" ", buffer));

                // Decode the bytes again to see whether the round trip was lossless
                Console.WriteLine("Retrieved: {0}\n", back);
            }
        }
    }
}

Output:

Mode: ASCII, String: b, Buffer.Length: 1
Buffer: 98
Retrieved: b

Mode: UTF8, String: b, Buffer.Length: 1
Buffer: 98
Retrieved: b

Mode: Unicode, String: b, Buffer.Length: 2
Buffer: 98 0
Retrieved: b

Mode: ASCII, String: abcd, Buffer.Length: 4
Buffer: 97 98 99 100
Retrieved: abcd

Mode: UTF8, String: abcd, Buffer.Length: 4
Buffer: 97 98 99 100
Retrieved: abcd

Mode: Unicode, String: abcd, Buffer.Length: 8
Buffer: 97 0 98 0 99 0 100 0
Retrieved: abcd

Mode: ASCII, String: 乙, Buffer.Length: 1
Buffer: 63
Retrieved: ?

Mode: UTF8, String: 乙, Buffer.Length: 3
Buffer: 228 185 153
Retrieved: 乙

Mode: Unicode, String: 乙, Buffer.Length: 2
Buffer: 89 78
Retrieved: 乙

Mode: ASCII, String: 甲乙丙丁, Buffer.Length: 4
Buffer: 63 63 63 63
Retrieved: ????

Mode: UTF8, String: 甲乙丙丁, Buffer.Length: 12
Buffer: 231 148 178 228 185 153 228 184 153 228 184 129
Retrieved: 甲乙丙丁

Mode: Unicode, String: 甲乙丙丁, Buffer.Length: 8
Buffer: 50 117 89 78 25 78 1 78
Retrieved: 甲乙丙丁
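In the ASCII rows of the output, every Chinese character turned into byte 63, which is '?'. Encoding.ASCII does not throw for characters outside the 7-bit range (0-127); its default replacement fallback silently substitutes '?', so the original text is lost without any error. The following is a minimal sketch, assuming you would rather detect this case than lose data: it builds a strict ASCII encoding with EncoderFallback.ExceptionFallback through Encoding.GetEncoding. The AsciiCheck class and its CanEncodeAsAscii method are hypothetical names used only for illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only
static class AsciiCheck {
    // Returns true if every character in s can be represented in ASCII
    public static bool CanEncodeAsAscii(string s) {
        // An ASCII encoding that throws instead of substituting '?'
        Encoding strictAscii = Encoding.GetEncoding(
            "us-ascii",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try {
            strictAscii.GetBytes(s);
            return true;
        } catch (EncoderFallbackException) {
            return false;
        }
    }
}

class Program {
    static void Main() {
        Console.OutputEncoding = Encoding.UTF8;
        foreach (string s in new[] { "abcd", "乙" }) {
            // The default Encoding.ASCII replaces unencodable characters with 63 ('?')
            byte[] lossy = Encoding.ASCII.GetBytes(s);
            Console.WriteLine("{0}: ASCII bytes [{1}], encodable: {2}",
                s, string.Join(" ", lossy), AsciiCheck.CanEncodeAsAscii(s));
        }
    }
}
```

For "abcd" the helper returns true; for "乙" the strict encoder throws an EncoderFallbackException, so it returns false. Catching an exception per call is acceptable for a demonstration; on a hot path, a plain loop checking that every char is below 128 would be cheaper.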
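Conclusions:
1. ASCII cannot store Chinese characters (apparently everyone already knows that =_-`).
2. UTF-8 is a variable-length encoding. For ASCII characters it is more space-efficient than Unicode: each one takes a single byte, identical in value and length to its ASCII encoding. Unicode (.NET's Encoding.Unicode, i.e. UTF-16 little-endian) uses 2 bytes for each ASCII character, and the second byte is always 0.
3. UTF-8 needs 3 bytes for each Chinese character tested here, whereas Unicode needs only 2. This holds for characters in the Basic Multilingual Plane, which includes common Chinese characters; rarer characters outside it (for example, CJK Unified Ideographs Extension B) take 4 bytes in both encodings.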
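Points 2 and 3 follow from how each encoding lays out bits. As a minimal sketch, assuming one valid, non-surrogate Unicode scalar value at a time, the C# below encodes a code point by hand using the standard UTF-8 bit patterns and splits a single UTF-16 code unit into little-endian bytes, then compares the results with Encoding.UTF8 and Encoding.Unicode. ManualEncoding, Utf8 and Utf16LE are hypothetical names for illustration, not part of .NET.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper class, for illustration only
static class ManualEncoding {
    // Encode one Unicode scalar value as UTF-8 (1 to 4 bytes)
    public static byte[] Utf8(int cp) {
        if (cp < 0x80)      // U+0000..U+007F: 0xxxxxxx, same as ASCII
            return new[] { (byte)cp };
        if (cp < 0x800)     // U+0080..U+07FF: 110xxxxx 10xxxxxx
            return new[] { (byte)(0xC0 | (cp >> 6)),
                           (byte)(0x80 | (cp & 0x3F)) };
        if (cp < 0x10000)   // U+0800..U+FFFF (common Chinese): 1110xxxx 10xxxxxx 10xxxxxx
            return new[] { (byte)(0xE0 | (cp >> 12)),
                           (byte)(0x80 | ((cp >> 6) & 0x3F)),
                           (byte)(0x80 | (cp & 0x3F)) };
        // U+10000..U+10FFFF: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return new[] { (byte)(0xF0 | (cp >> 18)),
                       (byte)(0x80 | ((cp >> 12) & 0x3F)),
                       (byte)(0x80 | ((cp >> 6) & 0x3F)),
                       (byte)(0x80 | (cp & 0x3F)) };
    }

    // Split one UTF-16 code unit into little-endian bytes: low byte first.
    // Characters outside the BMP need two code units (a surrogate pair), i.e. 4 bytes.
    public static byte[] Utf16LE(char c) {
        return new[] { (byte)(c & 0xFF), (byte)(c >> 8) };
    }
}

class Program {
    static void Main() {
        Console.OutputEncoding = Encoding.UTF8;
        foreach (string s in new[] { "b", "乙" }) {
            char c = s[0];
            byte[] utf8 = ManualEncoding.Utf8(c);
            byte[] utf16 = ManualEncoding.Utf16LE(c);
            // Check the hand-made bytes against the built-in encoders
            bool same = utf8.SequenceEqual(Encoding.UTF8.GetBytes(s))
                     && utf16.SequenceEqual(Encoding.Unicode.GetBytes(s));
            Console.WriteLine("{0}: UTF-8 [{1}], UTF-16LE [{2}], matches .NET: {3}",
                s, string.Join(" ", utf8), string.Join(" ", utf16), same);
        }
    }
}
```

For 乙 (U+4E59) the hand-made bytes are 228 185 153 in UTF-8 and 89 78 in UTF-16LE, the same values as in the output listing. Common Chinese characters fall into the 3-byte UTF-8 branch, which is why UTF-8 costs one byte more per character than UTF-16 for them, while ASCII characters take the 1-byte branch and need one byte less.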