From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Emacs 23 character code space
Date: Sat, 22 Nov 2008 18:28:13 +0200
To: Kenichi Handa
Cc: emacs-devel@gnu.org
> From: Kenichi Handa
> CC: eliz@gnu.org, emacs-devel@gnu.org
> Date: Mon, 03 Nov 2008 21:45:20 +0900
>
> I tried to rewrite nonascii.texi to clear the things.  I
> finished upto the "Character Code" section as attached.
> What do you think about it?

Thanks!  I have a few questions:

  Emacs can convert unibyte text to multibyte; it can also convert
  multibyte text to unibyte provided that the multibyte text contains
  only @acronym{ASCII} and 8-bit characters.

What exactly is meant here by ``8-bit characters''?  Do you mean
eight-bit raw bytes, or do you mean Unicode characters whose
codepoints are below 256?

  Converting unibyte text to multibyte text leaves @acronym{ASCII}
  characters unchanged, and converts 8-bit characters (codes 128
  through 159) to the corresponding representation for multibyte text.

Again, by ``8-bit characters'' you mean raw 8-bit bytes here, right?

  @defun string-to-multibyte string
  This function returns a multibyte string containing the same
  sequence of characters as @var{string}.  If @var{string} is a
  multibyte string, it is returned unchanged.
  @end defun

I'm not sure I understand the effect of this function.  Does it
decode its argument, converting each byte to the corresponding
internal representation of the encoded single-byte character?  I
think this is not what it does, but then what does it do?
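To make the distinction in that question concrete, here is a small Python model.  It is only a sketch, and it assumes that Emacs 23 represents a raw byte B in the range 128..255 inside multibyte text as the character code #x3FFF00 + B (that is, #x3FFF80..#x3FFFFF).  The names string_to_multibyte and decode_latin1 are hypothetical Python stand-ins, not Emacs APIs.  Under that assumption, converting each byte to a raw-byte character is not decoding: decoding (Latin-1 is used here for comparison) maps a byte such as #xE9 to a Unicode character instead.

```python
from __future__ import annotations

# Assumption: Emacs 23 stores raw byte B (128..255) in multibyte text
# as character code 0x3FFF00 + B, i.e. #x3FFF80..#x3FFFFF.
RAW_BYTE_OFFSET = 0x3FFF00


def string_to_multibyte(data: bytes) -> list[int]:
    """Hypothetical model: convert unibyte DATA to multibyte character codes.

    No decoding takes place: ASCII bytes keep their codes, and every byte
    in 128..255 becomes the corresponding raw-byte character.
    """
    return [b if b < 0x80 else RAW_BYTE_OFFSET + b for b in data]


def decode_latin1(data: bytes) -> list[int]:
    """For contrast: decoding maps each byte to a Unicode code point."""
    return [ord(ch) for ch in data.decode("latin-1")]


sample = b"caf\xe9"
print([hex(c) for c in string_to_multibyte(sample)])
# ['0x63', '0x61', '0x66', '0x3fffe9']
print([hex(c) for c in decode_latin1(sample)])
# ['0x63', '0x61', '0x66', '0xe9']
```

In this model both results have the same length and agree on ASCII; they differ only in what byte #xE9 becomes.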
  @defun string-to-unibyte string
  This function returns a unibyte string containing the same
  sequence of characters as @var{string}.  It signals an error if
  @var{string} contains a non-@acronym{ASCII} character.  If
  @var{string} is a unibyte string, it is returned unchanged.
  @end defun

Since this function handles any non-ASCII characters lossily, when
would it be useful?

  @defun multibyte-char-to-unibyte char
  This convert the multibyte character @var{char} to a unibyte
  character.  If @var{char} is a non-@acronym{ASCII} character, the
  value is -1.
  @end defun

  @defun unibyte-char-to-multibyte char
  This convert the unibyte character @var{char} to a multibyte
  character.
  @end defun

Again, when are these functions useful?
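For concreteness, here is a Python model of the two character-level functions and of string-to-unibyte built on top of them.  It is a sketch under one assumption: a raw byte B (128..255) appears in multibyte text as the character #x3FFF00 + B.  The Python function names mirror the Lisp ones but are hypothetical stand-ins, not Emacs code.

```python
from __future__ import annotations

# Assumption: raw byte B (128..255) is character 0x3FFF00 + B in multibyte text.
RAW_BYTE_OFFSET = 0x3FFF00


def unibyte_char_to_multibyte(byte: int) -> int:
    """Hypothetical model: map a byte 0..255 to a multibyte character code."""
    if not 0 <= byte <= 0xFF:
        raise ValueError(f"not a byte: {byte}")
    # ASCII is shared by both representations; 128..255 become raw-byte chars.
    return byte if byte < 0x80 else RAW_BYTE_OFFSET + byte


def multibyte_char_to_unibyte(char: int) -> int:
    """Hypothetical model: map a character back to a byte, or -1 if impossible."""
    if 0 <= char < 0x80:
        return char
    if RAW_BYTE_OFFSET + 0x80 <= char <= RAW_BYTE_OFFSET + 0xFF:
        return char - RAW_BYTE_OFFSET
    # Any other character (e.g. U+00E9) has no single-byte form.
    return -1


def string_to_unibyte(chars: list[int]) -> bytes:
    """Hypothetical model: fail on anything that is neither ASCII nor a raw byte."""
    out = bytearray()
    for c in chars:
        b = multibyte_char_to_unibyte(c)
        if b < 0:
            raise ValueError(f"cannot convert character {c:#x} to a byte")
        out.append(b)
    return bytes(out)


print(hex(unibyte_char_to_multibyte(0xE9)))         # 0x3fffe9
print(multibyte_char_to_unibyte(0x3FFFE9) == 0xE9)  # True
print(multibyte_char_to_unibyte(0xE9))              # -1
print(string_to_unibyte([0x63, 0x3FFFE9]))          # b'c\xe9'
```

Note that this model lets string_to_unibyte accept raw-byte characters, whereas the quoted draft says any non-ASCII character signals an error; which behavior is intended is part of the question about ``8-bit characters''.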