From: "Richard M. Stallman"
Reply-To: rms@gnu.org
Newsgroups: gmane.emacs.devel
Subject: Re: Unibyte characters
Date: Fri, 31 Oct 2008 15:30:57 -0400
To: Eli Zaretskii
Cc: emacs-devel@gnu.org, handa@m17n.org
In-reply-to: (message from Eli Zaretskii on Fri, 31 Oct 2008 13:05:54 +0200)

    In unibyte representation, each character occupies one byte and
    therefore the possible character codes range from 0 to 255. Codes 0
    through 127 are ASCII characters; the codes from 128 through 255 are
    used for one non-ASCII character set [...]

    But I think this is inaccurate and even misleading. For starters,
    unibyte buffers and strings can contain DBCS characters and UTF-8
    encoded text, where a character certainly does not ``occupy one
    byte''.

As far as Emacs is concerned, that UTF-8 sequence is multiple characters
and each of those characters is one byte. The fact that one might
interpret that byte sequence some other way in another context is not a
part of the Emacs text representation.

So the text is correct. But it could be useful to add something to
explain how this unibyte text relates to other interpretations of the
same byte sequence.
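
The distinction between a byte sequence and its interpretation can be sketched outside Emacs. The following is only an analogy, not Emacs code: it assumes that Python's `bytes` type can stand in for unibyte text and `str` for decoded text, and the variable names are made up for the illustration.

```python
# Analogy only: Python `bytes` stands in for Emacs unibyte text,
# and `str` stands in for the decoded interpretation of those bytes.

text = "\u00e9"                 # one character: LATIN SMALL LETTER E WITH ACUTE
raw = text.encode("utf-8")      # its UTF-8 encoding, a sequence of bytes

# Viewed as a plain byte sequence, each element is one code in 0..255.
unibyte_view = list(raw)
assert all(0 <= code <= 255 for code in unibyte_view)

# Viewed through a UTF-8 decoder, the same bytes form a single character.
decoded_view = raw.decode("utf-8")

print(len(unibyte_view), [hex(code) for code in unibyte_view])
print(len(decoded_view))
```

The byte-level view counts two elements for this input, while the decoded view counts one. That is the sense in which the same byte sequence can be "multiple characters" in one representation and a single character under another interpretation.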