From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: Emacs 23 character code space
Date: Sat, 22 Nov 2008 23:16:49 -0500
To: Eli Zaretskii
Cc: emacs-devel@gnu.org, Kenichi Handa
In-Reply-To: (Eli Zaretskii's message of "Sat, 22 Nov 2008 18:28:13 +0200")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.60 (gnu/linux)
Xref: news.gmane.org gmane.emacs.devel:106000

> What exactly is meant here by ``8-bit characters''? Do you mean
> eight-bit raw bytes, or do you mean Unicode characters whose
> codepoints are below 256?

It should be eight-bit raw bytes. In some cases it's difficult to
tell the difference, so Emacs may occasionally accept Latin-1 chars
as stand-ins for eight-bit raw bytes.

> Converting unibyte text to multibyte text leaves @acronym{ASCII}
> characters unchanged, and converts 8-bit characters (codes 128
> through 159) to the corresponding representation for
> multibyte text.
>
> Again, by ``8-bit characters'' you mean raw 8-bit bytes here, right?

Yes. I think we should state somewhere that unibyte strings and
buffers contain only bytes, that multibyte strings and buffers
contain chars, and that bytes are a subset of chars.

> @defun string-to-multibyte string
> This function returns a multibyte string containing the same sequence
> of characters as @var{string}. If @var{string} is a multibyte string,
> it is returned unchanged.
> @end defun
>
> I'm not sure I understand the effect of this function.

It returns a multibyte string containing the same bytes (in the sense
of ASCII plus eight-bit raw bytes, not in the sense of the underlying
internal representation, which we should avoid mentioning anywhere as
much as possible). I.e., the output is a multibyte string of the same
length whose chars are bytes.

> @defun string-to-unibyte string
> This function returns a unibyte string containing the same sequence of
> characters as @var{string}. It signals an error if @var{string}
> contains a non-@acronym{ASCII} character. If @var{string} is a
> unibyte string, it is returned unchanged.
> @end defun
>
> Since this function handles any non-ASCII characters lossily, when
> would it be useful?

I think the "non-ASCII" part is incorrect: it should probably say
"non-byte char" instead. The function is useful when you have a
multibyte string that you (think you) know holds only bytes.

In 99% of cases (actually 99.99999% for the `as' variants) you
shouldn't use string-{as/make/to}-{uni/multi}byte at all. Instead,
you should use {en/de}code-coding-string.

> @defun multibyte-char-to-unibyte char
> This converts the multibyte character @var{char} to a unibyte
> character. If @var{char} is a non-@acronym{ASCII} character, the
> value is -1.
> @end defun
>
> @defun unibyte-char-to-multibyte char
> This converts the unibyte character @var{char} to a multibyte
> character.
> @end defun
>
> Again, when are these functions useful?

Rarely.

        Stefan
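A minimal Emacs Lisp sketch of the byte-versus-char distinction discussed in the message, assuming a recent Emacs release (these functions have changed behavior across versions, so check against the one you run). `unibyte-string` builds a unibyte string from raw bytes; `string-to-multibyte` then yields a multibyte string of the same length whose chars are still bytes, and `string-to-unibyte` converts back, signaling an error for a char that is neither ASCII nor a raw byte. The variable names `example-bytes` and `example-multi` are purely illustrative.

```elisp
;; A unibyte string holding two bytes: ASCII "a" and the raw byte #xFF.
(defvar example-bytes (unibyte-string ?a #xff))

;; Convert to multibyte: same length, and each char still denotes a byte.
(defvar example-multi (string-to-multibyte example-bytes))

(multibyte-string-p example-bytes)   ; => nil
(multibyte-string-p example-multi)   ; => t
(length example-multi)               ; => 2
;; ASCII stays ASCII; the raw byte #xFF becomes the eight-bit char #x3FFFFF.
(aref example-multi 0)               ; => 97
(aref example-multi 1)               ; => 4194303 (#x3FFFFF)

;; Going back works because every char in EXAMPLE-MULTI is a byte.
(equal (string-to-unibyte example-multi) example-bytes)   ; => t

;; U+00E9 is a char, not a byte, so string-to-unibyte signals an error here.
(condition-case nil
    (string-to-unibyte (string #xe9))
  (error 'not-a-byte))               ; => not-a-byte
```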
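The advice to use {en/de}code-coding-string rather than the string-{as/make/to}-{uni/multi}byte family can be sketched as follows, assuming UTF-8 as the coding system (any other coding system would illustrate the same point). Encoding turns chars into a unibyte string of bytes; decoding turns those bytes back into chars. The `example-*` variable names are illustrative only.

```elisp
;; A multibyte string with one non-ASCII char, U+00E9 (e with acute accent).
(defvar example-text (string #xe9))

;; Encoding yields a unibyte string of bytes; UTF-8 needs two bytes for U+00E9.
(defvar example-encoded (encode-coding-string example-text 'utf-8))
(multibyte-string-p example-encoded)   ; => nil
(append example-encoded nil)           ; => (195 169)

;; Decoding turns the bytes back into the original char.
(defvar example-decoded (decode-coding-string example-encoded 'utf-8))
(multibyte-string-p example-decoded)   ; => t
(equal example-decoded example-text)   ; => t
```

Unlike the representation-conversion functions, this pair makes the byte/char interpretation explicit through the coding system argument.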
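For the rarely needed char-level functions, a small sketch of how they map between bytes and chars, again assuming a recent Emacs release (older versions, including the Emacs 23 development code discussed here, may differ). The example deliberately avoids chars in the Latin-1 range, which some Emacs versions let through as stand-ins for bytes, and instead uses GREEK SMALL LETTER ALPHA (#x3B1) as a char that is neither ASCII nor a raw byte.

```elisp
;; The raw byte #xFF corresponds to the eight-bit char #x3FFFFF.
(unibyte-char-to-multibyte #xff)       ; => 4194303 (#x3FFFFF)
(multibyte-char-to-unibyte #x3fffff)   ; => 255

;; ASCII chars map to themselves in both directions.
(unibyte-char-to-multibyte ?a)         ; => 97
(multibyte-char-to-unibyte ?a)         ; => 97

;; A char that is neither ASCII nor a raw byte has no unibyte form.
(multibyte-char-to-unibyte #x3b1)      ; => -1
```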