From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Emacs 23 character code space
Date: Sat, 01 Nov 2008 18:46:09 +0200
To: Kenichi Handa
Cc: emacs-devel@gnu.org
Xref: news.gmane.org gmane.emacs.devel:105244

Another fragment from etc/NEWS that seems not entirely accurate:

    In buffers and strings, characters are represented by UTF-8 byte
    sequences in a multibyte buffer/string.

But UTF-8 defines 1- to 4-byte sequences to represent each Unicode
codepoint, whereas this comment from character.h:

    /* character code    1st byte   byte sequence
       --------------    --------   -------------
            0-7F         00..7F     0xxxxxxx
           80-7FF        C2..DF     110xxxxx 10xxxxxx
          800-FFFF       E0..EF     1110xxxx 10xxxxxx 10xxxxxx
        10000-1FFFFF     F0..F7     11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
       200000-3FFF7F     F8         11111000 1000xxxx 10xxxxxx 10xxxxxx 10xxxxxx
       3FFF80-3FFFFF     C0..C1     1100000x 10xxxxxx (for eight-bit-char)
       400000-...        invalid

       invalid 1st byte  80..BF     10xxxxxx
                         F9..FF     11111xxx (xxx != 000)
    */

seems to say that we use up to 5 bytes.  What am I missing?
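
To make the table concrete, here is a minimal sketch in C of the sequence lengths it implies.  The function names are made up for illustration and are not taken from character.h; the ranges are copied from the comment above and nothing else is assumed about the actual implementation.

```c
#include <stddef.h>

/* Hypothetical helper, not an Emacs function: number of bytes the
   table quoted above assigns to character code C, or 0 if C is
   outside the code space (400000 and up).  */
static size_t
sketch_char_bytes (unsigned long c)
{
  if (c <= 0x7F)     return 1;
  if (c <= 0x7FF)    return 2;
  if (c <= 0xFFFF)   return 3;
  if (c <= 0x1FFFFF) return 4;
  if (c <= 0x3FFF7F) return 5;  /* lead byte F8 */
  if (c <= 0x3FFFFF) return 2;  /* eight-bit-char, lead byte C0..C1 */
  return 0;                     /* 400000-...: invalid */
}

/* Hypothetical helper: sequence length implied by lead byte B alone,
   or 0 if the table lists B as an invalid first byte.  It does not
   check the trailing bytes (e.g. the 1000xxxx byte after F8).  */
static size_t
sketch_lead_byte_len (unsigned char b)
{
  if (b <= 0x7F) return 1;
  if (b <= 0xBF) return 0;      /* 80..BF: trailing byte only */
  if (b <= 0xDF) return 2;      /* C0..C1 (eight-bit-char), C2..DF */
  if (b <= 0xEF) return 3;
  if (b <= 0xF7) return 4;
  if (b == 0xF8) return 5;
  return 0;                     /* F9..FF */
}
```

Read this way, the rows that go beyond standard UTF-8 are the 4-byte row above 10FFFF, the 5-byte F8 row, and the C0..C1 row; standard UTF-8 stops at 4 bytes and does not allow C0 or C1 as a lead byte.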