From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Emacs 23 character code space
Date: Sun, 23 Nov 2008 06:22:45 -0500
To: Stefan Monnier
Cc: handa@m17n.org, emacs-devel@gnu.org
In-reply-to: (message from Stefan Monnier on Sat, 22 Nov 2008 23:16:49 -0500)

> From: Stefan Monnier
> Date: Sat, 22 Nov 2008 23:16:49 -0500
> Cc: emacs-devel@gnu.org, Kenichi Handa
>
> I think we should state somewhere that unibyte strings and buffers
> contain bytes only.  And that multibyte strings and buffers contain
> chars.  And that bytes are a subset of chars.

Please take a look at the current version of nonascii.texi in CVS; I
already did state this.  Specific suggestions for improvement are
welcome, of course.

(The text I was quoting was the original one written by Handa-san, not
the one I put into the manual.)

> >    @defun string-to-multibyte string
> >    This function returns a multibyte string containing the same sequence
> >    of characters as @var{string}.  If @var{string} is a multibyte string,
> >    it is returned unchanged.
> >    @end defun
>
> > I'm not sure I understand the effect of this function.
>
> It returns a string containing the same bytes (in the sense of
> ASCII+eight-bit, not in the sense of the underlying internal
> representation, which we should as much as possible not mention
> anywhere) but in a multibyte string instead.  I.e. the output is
> a multibyte string of the same length whose chars are bytes.

So you are in effect saying that this function is only well defined
for a string that holds ASCII characters and raw 8-bit bytes?

> >    @defun string-to-unibyte string
> >    This function returns a unibyte string containing the same sequence of
> >    characters as @var{string}.  It signals an error if @var{string}
> >    contains a non-@acronym{ASCII} character.  If @var{string} is a
> >    unibyte string, it is returned unchanged.
> >    @end defun
>
> > Since this function handles any non-ASCII characters lossily, when
> > would it be useful?
>
> I think the "non-ASCII" part is incorrect.  It probably should say
> "non-byte char" instead.

"Non-ASCII characters" here does not mean "anything but ASCII
characters"; it means "any character except ASCII and raw 8-bit bytes"
(assuming I understand the text correctly).  I will make sure this
tricky distinction is clear in the manual.

> In 99% (actually 99.99999% for the `as' case) of the cases you shouldn't
> use string-{as/make/to}-{uni/multi}byte.  Instead you should use
> {en/de}code-coding-string.

This specific section is not about en/decoding text; it's about
converting between unibyte and multibyte.  Unless we want to remove
any mention of these capabilities (and leave Lisp programmers without
any documentation on how to handle binary data and/or byte streams of
undecoded text), I don't think we can remove the description of these
functions from the manual.
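
To make the distinction above concrete, here is a minimal Emacs Lisp
sketch.  It is an illustration only, not text from the manual: the
function name `demo-unibyte-multibyte' and the local variables are made
up for this example, and the behavior shown assumes the reading of
`string-to-multibyte' and `string-to-unibyte' discussed in this thread
(ASCII and raw 8-bit bytes convert; other characters do not).

```elisp
;; Illustrative sketch; `demo-unibyte-multibyte', `raw' and `multi'
;; are hypothetical names, not part of Emacs.
(defun demo-unibyte-multibyte ()
  "Return a list of observations about unibyte/multibyte conversion."
  ;; A literal with only ASCII and octal escapes is read as unibyte.
  (let* ((raw "abc\300")                     ; 3 ASCII chars + byte #o300
         (multi (string-to-multibyte raw))   ; same bytes, multibyte string
         (e-acute (string 233)))             ; U+00E9, a real non-ASCII char
    (list
     (multibyte-string-p raw)
     (multibyte-string-p multi)
     ;; Same length: each char of MULTI stands for one byte of RAW.
     (= (length raw) (length multi))
     ;; The round trip works because MULTI holds only ASCII and raw bytes.
     (string= raw (string-to-unibyte multi))
     ;; A character that is neither ASCII nor a raw byte is rejected.
     (condition-case nil
         (progn (string-to-unibyte e-acute) 'no-error)
       (error 'signaled))
     ;; Converting actual text goes through a coding system instead;
     ;; U+00E9 becomes two bytes in UTF-8.
     (length (encode-coding-string e-acute 'utf-8)))))
```

Evaluating `(demo-unibyte-multibyte)' should return
`(nil t t t signaled 2)'.  The last element is the case Stefan has in
mind when he recommends {en/de}code-coding-string for text; the first
four are the byte-oriented case this section of the manual describes.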