From: Kenichi Handa
Newsgroups: gmane.emacs.devel
Subject: Re: Emacs 23 character code space
Date: Wed, 26 Nov 2008 10:31:19 +0900
To: Eli Zaretskii
Cc: emacs-devel@gnu.org
In-reply-to: (message from Eli Zaretskii on Sat, 22 Nov 2008 18:28:13 +0200)

In article , Eli Zaretskii writes:

> Emacs can convert unibyte text to multibyte; it can also convert
> multibyte text to unibyte provided that the multibyte text contains
> only @acronym{ASCII} and 8-bit characters.

> What exactly is meant here by ``8-bit characters''? Do you mean
> eight-bit raw bytes, or do you mean Unicode characters whose
> codepoints are below 256?

The former; more precisely, characters representing eight-bit raw
bytes. They have different character codes in multibyte text
(#x3FFF80..#x3FFFFF) and unibyte text (#x80..#xFF).

> Converting unibyte text to multibyte text leaves @acronym{ASCII} characters
> unchanged, and converts 8-bit characters (codes 128 through 159) to
> the corresponding representation for multibyte text.

> Again, by ``8-bit characters'' you mean raw 8-bit bytes here, right?

Yes.
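To make the two code ranges concrete, here is a small sketch that can be evaluated in *scratch*. It assumes Emacs 23 or later; the function name `my-raw-byte-codes` is made up for illustration and is not part of Emacs.

```elisp
;; Sketch only; assumes Emacs 23 or later.
;; `my-raw-byte-codes' is a hypothetical name used for illustration.
(defun my-raw-byte-codes ()
  "Return multibyte-ness and character code of raw byte #x80 in both forms."
  (let* ((uni "\200")                       ; unibyte string holding byte #x80
         (multi (string-to-multibyte uni))) ; same raw byte, multibyte form
    (list (multibyte-string-p uni)
          (aref uni 0)
          (multibyte-string-p multi)
          (aref multi 0))))

(my-raw-byte-codes)
;; => (nil 128 t 4194176)
;; i.e. #x80 in unibyte text and #x3FFF80 in multibyte text
```

The same raw byte thus appears as code #x80 in the unibyte string and as code #x3FFF80 in the multibyte one.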

> @defun string-to-multibyte string
> This function returns a multibyte string containing the same sequence
> of characters as @var{string}. If @var{string} is a multibyte string,
> it is returned unchanged.
> @end defun

> I'm not sure I understand the effect of this function. Does it decode
> its argument, converting each byte to the corresponding internal
> representation of the encoded single-byte character? I think this is
> not what it does, but then what does it do?

No, it does not decode. All 8-bit characters (#x80..#xFF) in the source
unibyte string are converted to the multibyte representation of those
8-bit characters (#x3FFF80..#x3FFFFF).

> @defun string-to-unibyte string
> This function returns a unibyte string containing the same sequence of
> characters as @var{string}. It signals an error if @var{string}
> contains a non-@acronym{ASCII} character. If @var{string} is a
> unibyte string, it is returned unchanged.
> @end defun

> Since this function handles any non-ASCII characters lossily, when
> would it be useful?

If you know that a string contains only ASCII or 8-bit characters,
you can use it to get a unibyte string without losing information.

> @defun multibyte-char-to-unibyte char
> This convert the multibyte character @var{char} to a unibyte
> character. If @var{char} is a non-@acronym{ASCII} character, the
> value is -1.
> @end defun

> @defun unibyte-char-to-multibyte char
> This convert the unibyte character @var{char} to a multibyte
> character.
> @end defun

> Again, when are these functions useful?

Perhaps we don't need them anymore. We can use get-byte.
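As a rough illustration of the round trip and of get-byte, something like the following could be tried. It assumes Emacs 23 or later; `my-try-unibyte` and `my-unibyte-demo` are hypothetical helpers written for this example only.

```elisp
;; Sketch only; assumes Emacs 23 or later.
;; `my-try-unibyte' and `my-unibyte-demo' are hypothetical helpers.
(defun my-try-unibyte (str)
  "Return STR as a unibyte string, or nil if `string-to-unibyte' signals an error."
  (condition-case nil
      (string-to-unibyte str)
    (error nil)))

(defun my-unibyte-demo ()
  "Exercise `string-to-unibyte' and `get-byte' on raw-byte characters."
  (let ((raw (string-to-multibyte "\200\377")))
    (list (equal (my-try-unibyte raw) "\200\377") ; raw bytes survive the round trip
          (my-try-unibyte (string #x3042))        ; HIRAGANA LETTER A cannot be converted
          (get-byte 0 raw)                        ; byte value of the first character
          (get-byte 1 raw))))                     ; byte value of the second character

(my-unibyte-demo)
;; => (t nil 128 255)
```

Here get-byte takes an index and a string, and returns the byte value directly whether the string is unibyte or multibyte.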

Anyway, the relationship between them and
string-to-unibyte/string-to-multibyte is this:

(defun string-to-unibyte (str)
  ;; Start from a string of NULs, then store one byte per character.
  (let ((new (make-string (length str) 0)))
    (dotimes (i (length str))
      (let ((byte (multibyte-char-to-unibyte (aref str i))))
        ;; A negative value means the character has no unibyte equivalent.
        (if (< byte 0)
            (error "Cannot convert character to unibyte"))
        (aset new i byte)))
    new))

(defun string-to-multibyte (str)
  (let ((new (make-string (length str) 0)))
    (dotimes (i (length str))
      (aset new i (unibyte-char-to-multibyte (aref str i))))
    new))

---
Kenichi Handa
handa@ni.aist.go.jp