<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
  <channel>
    <title>yayv</title>
    <description></description>
    <link>http://yayv.javaeye.com</link>
    <language>UTF-8</language>
    <copyright>Copyright 2003-2008, JavaEye.com</copyright>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <generator>JavaEye - Building the best software development community</generator>
          <item>
        <title>UTF Support in the D Language</title>
        <author>yayv</author>
        <description>
          <![CDATA[
          <br/>
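          Author: <a href="http://yayv.javaeye.com">yayv</a>&nbsp;
                    Link: <a href="http://yayv.javaeye.com/blog/73856" style="color:red;">http://yayv.javaeye.com/blog/73856</a>&nbsp;
          Published: April 24, 2007
          <br/><br/>
          Notice: This is an original blog post published on the JavaEye website. No website may reproduce it without the author's written permission; violators will be held legally liable!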
          <br/><br/>
          &nbsp;&nbsp;&nbsp; For the past few days I have been working on a feature that reads UTF-8 files containing both Chinese and English text.<br />
&nbsp; <br />
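&nbsp;&nbsp; In UTF-8, characters are encoded with varying widths: English letters and ASCII symbols take 1 byte each, while Chinese characters take 3 bytes (4 for the rarer ones outside the Basic Multilingual Plane). So before extracting a character from a UTF-8 string, you first have to determine how many bytes it occupies.<br />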
<br />
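&nbsp;&nbsp; In the UTF-8 encoding, the first byte of a character indicates how many bytes the whole character occupies. See the table below (the right-hand column shows the UTF-8 byte layout).<br />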
<br />
<div align="center"><center>
<table border="1">
    <tbody>
        <tr>
            <td>U-00000000 - U-0000007F: </td>
            <td>0<em>xxxxxxx</em> </td>
        </tr>
        <tr>
            <td>U-00000080 - U-000007FF: </td>
            <td>110<em>xxxxx</em> 10<em>xxxxxx</em> </td>
        </tr>
        <tr>
            <td>U-00000800 - U-0000FFFF: </td>
            <td>1110<em>xxxx</em> 10<em>xxxxxx</em> 10<em>xxxxxx</em> </td>
        </tr>
        <tr>
            <td>U-00010000 - U-001FFFFF: </td>
            <td>11110<em>xxx</em> 10<em>xxxxxx</em> 10<em>xxxxxx</em> 10<em>xxxxxx</em> </td>
        </tr>
        <tr>
            <td>U-00200000 - U-03FFFFFF: </td>
            <td>111110<em>xx</em> 10<em>xxxxxx</em> 10<em>xxxxxx</em> 10<em>xxxxxx</em> 10<em>xxxxxx</em> </td>
        </tr>
        <tr>
            <td>U-04000000 - U-7FFFFFFF: </td>
            <td>1111110<em>x</em> 10<em>xxxxxx</em> 10<em>xxxxxx</em> 10<em>xxxxxx</em> 10<em>xxxxxx</em> 10<em>xxxxxx</em> </td>
        </tr>
    </tbody>
</table>
</center></div>
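The x positions are filled with the bits of the character's code point in binary; the further right an x is, the less significant the bit. Note that RFC 3629 has since restricted UTF-8 to U+10FFFF, so the 5- and 6-byte forms in the last two rows are no longer valid UTF-8.<br />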
<br />
With this knowledge, you can write a small piece of code yourself to extract the number of characters you need.<br />
<div class="code_title">D code</div>
<div class="dp-highlighter">
<div class="bar">&nbsp;</div>
<ol class="dp-j" start="1">
    <li class="alt"><span><span class="keyword">int</span><span>&nbsp;u8CharWidth(</span><span class="keyword">char</span><span>[]&nbsp;u8string, </span><span class="keyword">uint</span><span>&nbsp;start)&nbsp;&nbsp;&nbsp;&nbsp;</span></span></li>
    <li class=""><span>{&nbsp;&nbsp;&nbsp;&nbsp;</span></li>
    <li class="alt"><span>&nbsp;&nbsp;&nbsp;&nbsp;</span></li>
    <li class=""><span>&nbsp;&nbsp;</span></li>
    <li class="alt"><span>&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">if</span><span>((u8string[start]&nbsp;&amp;&nbsp;</span><span class="number">0xFE</span><span>)&nbsp;==&nbsp;</span><span class="number">0xFC</span><span>)&nbsp;&nbsp;&nbsp;&nbsp;</span></span></li>
    <li class=""><span>&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp;&nbsp;&nbsp;</span></li>
    <li class="alt"><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">return</span><span>&nbsp;</span><span class="number">6</span><span>;&nbsp;&nbsp;&nbsp;&nbsp;</span></span></li>
    <li class=""><span>&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;</span></li>
    <li class="alt"><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span></li>
    <li class=""><span>&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">if</span><span>((u8string[start]&nbsp;&amp;&nbsp;</span><span class="number">0xFC</span><span>)&nbsp;==&nbsp;</span><span class="number">0xF8</span><span>)&nbsp;&nbsp;&nbsp;&nbsp;</span></span></li>
    <li class="alt"><span>&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp;&nbsp;&nbsp;</span></li>
    <li class=""><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">return</span><span>&nbsp;</span><span class="number">5</span><span>;&nbsp;&nbsp;&nbsp;&nbsp;</span></span></li>
    <li class="alt"><span>&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;</span></li>
    <li class=""><span>&nbsp;&nbsp;&nbsp;&nbsp;</span></li>
    <li class="alt"><span>&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">if</span><span>((u8string[start]&nbsp;&amp;&nbsp;</span><span class="number">0xF8</span><span>)&nbsp;==&nbsp;</span><span class="number">0xF0</span><span>)&nbsp;&nbsp;&nbsp;&nbsp;</span></span></li>
    <li class=""><span>&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp;&nbsp;&nbsp;</span></li>
    <li class="alt"><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">return</span><span>&nbsp;</span><span class="number">4</span><span>;&nbsp;&nbsp;&nbsp;&nbsp;</span></span></li>
    <li class=""><span>&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;</span></li>
    <li class="alt"><span>&nbsp;&nbsp;&nbsp;&nbsp;</span></li>
    <li class=""><span>&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">if</span><span>((u8string[start]&nbsp;&amp;&nbsp;</span><span class="number">0xF0</span><span>)&nbsp;==&nbsp;</span><span class="number">0xE0</span><span>)&nbsp;&nbsp;&nbsp;&nbsp;</span></span></li>
    <li class="alt"><span>&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp;&nbsp;&nbsp;</span></li>
    <li class=""><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">return</span><span>&nbsp;</span><span class="number">3</span><span>;&nbsp;&nbsp;&nbsp;&nbsp;</span></span></li>
    <li class="alt"><span>&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;</span></li>
    <li class=""><span>&nbsp;&nbsp;&nbsp;&nbsp;</span></li>
    <li class="alt"><span>&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">if</span><span>((u8string[start]&nbsp;&amp;&nbsp;</span><span class="number">0xE0</span><span>)&nbsp;==&nbsp;</span><span class="number">0xC0</span><span>)&nbsp;&nbsp;&nbsp;&nbsp;</span></span></li>
    <li class=""><span>&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp;&nbsp;&nbsp;</span></li>
    <li class="alt"><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">return</span><span>&nbsp;</span><span class="number">2</span><span>;&nbsp;&nbsp;&nbsp;&nbsp;</span></span></li>
    <li class=""><span>&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;</span></li>
    <li class="alt"><span>&nbsp;&nbsp;&nbsp;&nbsp;</span></li>
    <li class=""><span>&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">if</span><span>((u8string[start]&nbsp;&amp;&nbsp;</span><span class="number">0x80</span><span>)&nbsp;==&nbsp;</span><span class="number">0x00</span><span>)&nbsp;&nbsp;&nbsp;&nbsp;</span></span></li>
    <li class="alt"><span>&nbsp;&nbsp;&nbsp;&nbsp;{&nbsp;&nbsp;&nbsp;&nbsp;</span></li>
    <li class=""><span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">return</span><span>&nbsp;</span><span class="number">1</span><span>;&nbsp;&nbsp;&nbsp;&nbsp;</span></span></li>
    <li class="alt"><span>&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;&nbsp;&nbsp;&nbsp;</span></li>
    <li class=""><span>&nbsp;&nbsp;&nbsp;&nbsp;</span></li>
    <li class="alt"><span>&nbsp;&nbsp;&nbsp;&nbsp;<span class="keyword">return</span><span>&nbsp;-</span><span class="number">1</span><span>;&nbsp;&nbsp;&nbsp;&nbsp;</span></span></li>
    <li class=""><span>}&nbsp;&nbsp;&nbsp;&nbsp;</span></li>
</ol>
</div>
<br />
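To show how a width function like the one above can be used to take the first few characters of a string, here is a minimal sketch. The names leadByteWidth and takeChars are hypothetical and not part of the original post, and the width function is restricted to the 1- to 4-byte forms allowed by RFC 3629, which is an assumption that differs from the listing above.<br />

```d
import std.stdio;

// Width in bytes of the UTF-8 sequence that starts with lead byte c,
// or -1 if c is a continuation byte or not a valid lead byte.
// Assumption: only the 1- to 4-byte forms of RFC 3629 are accepted.
int leadByteWidth(char c)
{
    if ((c & 0x80) == 0x00) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return -1;
}

// Hypothetical helper: returns the slice of s that holds its first n
// characters (code points). Stops early at the end of the input, at a
// bad lead byte, or at a character that is cut off by the end of s.
string takeChars(string s, size_t n)
{
    size_t pos = 0;
    for (size_t taken = 0; taken < n && pos < s.length; ++taken)
    {
        int w = leadByteWidth(s[pos]);
        if (w < 0 || pos + w > s.length)
            break;
        pos += w;
    }
    return s[0 .. pos];
}

void main()
{
    // Takes 'D' (1 byte) plus two Chinese characters (3 bytes each).
    writeln(takeChars("D语言 UTF", 3));
}
```

Because the function returns a slice, no bytes are copied; it only works out where the n-th character ends.<br />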
This functionality already exists in Phobos, the D standard library; see the stride function in the std.utf module.<br />
<br />
In std.utf, this is implemented with a 256-byte lookup table, so the byte length is returned by a single table lookup.<br />
<br />
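However, neither the code above nor the std.utf function <font color="#ff0000">validates the continuation bytes</font>, so neither can <font color="#ff0000">guarantee that the bytes of the reported width form a valid UTF-8 character</font>. The caller has to check this.<br />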
<br />
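One way to add the missing check is to confirm that every byte after the lead byte has the form 10xxxxxx. The sketch below does only that. The name checkedCharWidth is hypothetical, and the function is a structural check under the assumption that 1- to 4-byte sequences are the only valid ones; it does not reject overlong encodings, surrogates, or code points above U+10FFFF.<br />

```d
import std.stdio;

// Hypothetical helper: returns the width (1 to 4) of the UTF-8 sequence
// starting at s[start] if the lead byte is valid and is followed by the
// right number of continuation bytes; returns -1 otherwise.
int checkedCharWidth(const(char)[] s, size_t start)
{
    if (start >= s.length)
        return -1;

    char lead = s[start];
    int width;
    if ((lead & 0x80) == 0x00)      width = 1;
    else if ((lead & 0xE0) == 0xC0) width = 2;
    else if ((lead & 0xF0) == 0xE0) width = 3;
    else if ((lead & 0xF8) == 0xF0) width = 4;
    else return -1; // continuation byte or obsolete 5/6-byte lead

    if (start + width > s.length)
        return -1; // sequence is cut off by the end of the input

    // Every following byte must look like 10xxxxxx.
    foreach (i; 1 .. width)
        if ((s[start + i] & 0xC0) != 0x80)
            return -1;

    return width;
}

void main()
{
    // A well-formed 3-byte character.
    writeln(checkedCharWidth("语", 0));

    // A 3-byte lead byte followed by ASCII instead of continuation bytes.
    const(char)[] bad = [cast(char) 0xE8, 'A', 'A'];
    writeln(checkedCharWidth(bad, 0));
}
```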
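One more note: std.utf already provides the commonly used UTF character-handling functions, so building what your own application needs is just a matter of combining them, which is very convenient.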
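<br /><br />
As a rough illustration of combining these functions, the sketch below uses stride, decode and validate from std.utf. It assumes a reasonably recent Phobos, where validate throws a UTFException on malformed input; the helper names charWidths and isWellFormed are hypothetical.<br />

```d
import std.stdio;
import std.utf : decode, stride, validate, UTFException;

// Hypothetical helper: byte width of each character in s, found with stride.
size_t[] charWidths(string s)
{
    size_t[] widths;
    size_t i = 0;
    while (i < s.length)
    {
        size_t w = stride(s, i);
        widths ~= w;
        i += w;
    }
    return widths;
}

// Hypothetical helper: true if validate accepts s as well-formed UTF-8.
bool isWellFormed(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

void main()
{
    string s = "D语言";
    writeln(charWidths(s));

    // decode returns the code point at pos and moves pos past it.
    size_t pos = 1;
    dchar c = decode(s, pos);
    writefln("U+%04X, next index %s", cast(uint) c, pos);

    writeln(isWellFormed(s));
}
```

Running validate once over untrusted input before walking it with stride is one way to cover the missing continuation-byte check described above.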
          ]]>
        </description>
        <pubDate>Tue, 24 Apr 2007 14:47:55 +0800</pubDate>
        <link>http://yayv.javaeye.com/blog/73856</link>
        <guid>http://yayv.javaeye.com/blog/73856</guid>
      </item>
      </channel>
</rss>