From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Eli Zaretskii <eliz@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: Emacs 23 character code space
Date: Sun, 23 Nov 2008 06:22:45 -0500
Message-ID: <E1L4D37-0003Oc-7J@fencepost.gnu.org>
References: <u63n7wmri.fsf@gnu.org> <E1KwoKX-0002Tk-Lp@etlken.m17n.org>
	<E1Kwyo4-0007Vt-Ai@etlken.m17n.org> <uk5aviv36.fsf@gnu.org>
	<jwvbpw7dr2w.fsf-monnier+emacs@gnu.org>
Reply-To: Eli Zaretskii <eliz@gnu.org>
NNTP-Posting-Host: lo.gmane.org
X-Trace: ger.gmane.org 1227439394 1708 80.91.229.12 (23 Nov 2008 11:23:14 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sun, 23 Nov 2008 11:23:14 +0000 (UTC)
Cc: handa@m17n.org, emacs-devel@gnu.org
To: Stefan Monnier <monnier@iro.umontreal.ca>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sun Nov 23 12:24:16 2008
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by lo.gmane.org with esmtp (Exim 4.50)
	id 1L4D4a-0007Er-Dx
	for ged-emacs-devel@m.gmane.org; Sun, 23 Nov 2008 12:24:16 +0100
Original-Received: from localhost ([127.0.0.1]:41692 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1L4D3R-0006Uc-8s
	for ged-emacs-devel@m.gmane.org; Sun, 23 Nov 2008 06:23:05 -0500
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1L4D3L-0006Sx-Tq
	for emacs-devel@gnu.org; Sun, 23 Nov 2008 06:22:59 -0500
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1L4D3L-0006SW-Ii
	for emacs-devel@gnu.org; Sun, 23 Nov 2008 06:22:59 -0500
Original-Received: from [199.232.76.173] (port=44745 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1L4D3L-0006SM-D0
	for emacs-devel@gnu.org; Sun, 23 Nov 2008 06:22:59 -0500
Original-Received: from fencepost.gnu.org ([140.186.70.10]:44273)
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from <eliz@gnu.org>) id 1L4D3L-00065I-FE
	for emacs-devel@gnu.org; Sun, 23 Nov 2008 06:22:59 -0500
Original-Received: from eliz by fencepost.gnu.org with local (Exim 4.67)
	(envelope-from <eliz@gnu.org>)
	id 1L4D37-0003Oc-7J; Sun, 23 Nov 2008 06:22:45 -0500
In-reply-to: <jwvbpw7dr2w.fsf-monnier+emacs@gnu.org> (message from Stefan
	Monnier on Sat, 22 Nov 2008 23:16:49 -0500)
X-detected-operating-system: by monty-python.gnu.org: GNU/Linux 2.6,
	seldom 2.4 (older, 4)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:106023
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/106023>

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Sat, 22 Nov 2008 23:16:49 -0500
> Cc: emacs-devel@gnu.org, Kenichi Handa <handa@m17n.org>
> 
> I think we should state somewhere that unibyte strings and buffers
> contain bytes only.  And that multibyte strings and buffers contain
> chars.  And that bytes are a subset of chars.

Please take a look at the current version of nonascii.texi in CVS; I
already state this there.  Specific suggestions for improvement are
welcome, of course.

(The text I was quoting was the original one written by Handa-san, not
the one I put into the manual.)

> >     @defun string-to-multibyte string
> >     This function returns a multibyte string containing the same sequence
> >     of characters as @var{string}.  If @var{string} is a multibyte string,
> >     it is returned unchanged.
> >     @end defun
> 
> > I'm not sure I understand the effect of this function.
> 
> It returns a string containing the same bytes (in the sense of
> ASCII+eight-bit, not in the sense of the underlying internal
> representation, which we should as much as possible not mention
> anywhere) but in a multibyte string instead.  I.e. the output is
> a multibyte string of the same length whose chars are bytes.

So you are in effect saying that this function's result is well
defined only for a string that holds ASCII characters and raw 8-bit
bytes?
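
To make the question concrete, here is a minimal sketch of that
reading.  It assumes the Emacs 23 representation, in which each raw
byte in the range 0x80..0xFF maps to a distinct "eight-bit" character;
the sample string is made up for illustration.

```elisp
;; "abc\300" is a unibyte string: three ASCII characters followed by
;; the raw byte #xC0 (written as an octal escape).
(let* ((uni "abc\300")
       (multi (string-to-multibyte uni)))
  (list (multibyte-string-p uni)                  ; nil
        (multibyte-string-p multi)                ; t
        (= (length uni) (length multi))           ; t: same length
        ;; Converting back recovers the original bytes.
        (equal uni (string-to-unibyte multi))))   ; t
```

Under this reading the conversion is a lossless round trip as long as
the input holds only ASCII and raw bytes.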

> >     @defun string-to-unibyte string
> >     This function returns a unibyte string containing the same sequence of
> >     characters as @var{string}.  It signals an error if @var{string}
> >     contains a non-@acronym{ASCII} character.  If @var{string} is a
> >     unibyte string, it is returned unchanged.
> >     @end defun
> 
> > Since this function handles any non-ASCII characters lossily, when
> > would it be useful?
> 
> I think the "non-ASCII" part is incorrect.  It probably should say
> "non-byte char" instead.

"Non-ASCII characters" here does not mean "anything but ASCII
characters", it means "any character except ASCII and raw 8-bit
bytes" (assuming I understand the text correctly).  I will make sure
this tricky distinction is clear in the manual.
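
A short sketch of the distinction, assuming the behavior described in
the quoted @defun (the strings are illustrative only): a multibyte
string made of ASCII plus raw bytes converts cleanly, while one that
contains a genuine non-ASCII character such as U+00E9 signals an
error.

```elisp
;; ASCII plus a raw 8-bit byte: accepted, the bytes come back as-is.
(string-to-unibyte (string-to-multibyte "caf\351"))

;; ASCII plus the character U+00E9: not a byte, so an error is
;; signaled.  The handler returns the error symbol instead of
;; letting the error propagate.
(condition-case err
    (string-to-unibyte "caf\u00e9")
  (error (car err)))
```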

> In 99% (actually 99.99999% for the `as' case) of the cases you shouldn't
> use string-{as/make/to}-{uni/multi}byte.  Instead you should use
> {en/de}code-coding-string.

This specific section is not about en/decoding text, it's about
converting between unibyte and multibyte.  Unless we want to remove
any mention of these capabilities (and leave Lisp programmers without
any documentation on how to handle binary data and/or byte streams of
undecoded text), I don't think we can remove the description of these
functions from the manual.
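
For contrast, a minimal sketch of the en/decoding route Stefan
recommends, using utf-8 purely as an example coding system.  Here the
bytes are produced by applying a coding system to the text, rather
than by reinterpreting the string's contents:

```elisp
;; Encoding turns text into a unibyte string of bytes; decoding
;; turns those bytes back into text.
(let* ((text "caf\u00e9")
       (bytes (encode-coding-string text 'utf-8)))
  (list (multibyte-string-p bytes)                          ; nil
        (length bytes)                                      ; 5
        (equal text (decode-coding-string bytes 'utf-8))))  ; t
```

The unibyte/multibyte conversion functions only come into play once
you already hold such bytes and want to handle them without decoding.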