From: Chip Coldwell
Newsgroups: gmane.emacs.devel
Subject: UNIBYTE_STR_AS_MULTIBYTE_P and UTF-8
Date: Thu, 26 Oct 2006 14:57:01 -0400 (EDT)
To: emacs-devel@gnu.org

I'm looking at the C macro UNIBYTE_STR_AS_MULTIBYTE_P defined in
src/charset.h, which starts out like this:

    #define UNIBYTE_STR_AS_MULTIBYTE_P(str, length, bytes)  \
      (((str)[0] < 0x80 || (str)[0] >= 0xA0)                \
       ? ((bytes) = 1)                                      \
       : /* lots, lots more */

So, if the value at str[0] is less than 0x80 or greater than or equal
to 0xA0, the macro sets "bytes" to one and evaluates to one.

That's fine for ASCII (always one byte wide) and ISO-8859-1 (with
special characters between 0xA0 and 0xFF), but what about UTF-8?
There are multibyte characters in UTF-8 whose first byte is >= 0xA0;
for example, U+00E9 is encoded as the two bytes 0xC3 0xA9. It doesn't
seem to me that this macro will work for general UTF-8 strings.

Chip

-- 
Charles M. "Chip" Coldwell
Senior Software Engineer
Red Hat, Inc
978-392-2426
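
To make the concern above concrete, here is a small standalone sketch in C. It is not Emacs code: both function names are hypothetical, and it reproduces only the first-byte condition quoted from UNIBYTE_STR_AS_MULTIBYTE_P, not the elided remainder of the macro. The second function gives the sequence length that a UTF-8 lead byte implies under RFC 3629, so the two results can be compared for the same byte.

```c
#include <stddef.h>

/* Hypothetical helper: mirrors only the first-byte test quoted from the
   macro. Returns nonzero when the byte is < 0x80 or >= 0xA0, i.e. when
   the macro would take the branch that sets "bytes" to 1. */
static int first_byte_takes_short_path(unsigned char c)
{
    return c < 0x80 || c >= 0xA0;
}

/* Hypothetical helper: length in bytes of a UTF-8 sequence as implied by
   its lead byte (RFC 3629). Returns 0 for bytes that cannot start a
   sequence (continuation bytes 0x80..0xBF, and 0xC0, 0xC1, 0xF5..0xFF). */
static size_t utf8_seq_len(unsigned char c)
{
    if (c < 0x80)
        return 1;
    if (c >= 0xC2 && c <= 0xDF)
        return 2;
    if (c >= 0xE0 && c <= 0xEF)
        return 3;
    if (c >= 0xF0 && c <= 0xF4)
        return 4;
    return 0;
}
```

For the lead byte 0xC3 (the first byte of U+00E9 in UTF-8), first_byte_takes_short_path() is nonzero while utf8_seq_len() is 2, which is the mismatch the message asks about. Whether that matters depends on whether the macro is ever meant to see UTF-8 input, which the quoted fragment does not say.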