From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Alan Mackenzie <acm@muc.de>
Newsgroups: gmane.emacs.devel
Subject: Re: Fwd: Re: Inadequate documentation of silly characters on screen.
Date: Thu, 19 Nov 2009 08:20:40 +0000
Message-ID: <20091119082040.GA1720@muc.de>
References: <20091118191258.GA2676@muc.de>
	<jwvlji3fgzi.fsf-monnier+emacs@gnu.org>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-Trace: ger.gmane.org 1258618589 25656 80.91.229.12 (19 Nov 2009 08:16:29 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Thu, 19 Nov 2009 08:16:29 +0000 (UTC)
Cc: emacs-devel@gnu.org
To: Stefan Monnier <monnier@iro.umontreal.ca>
Content-Disposition: inline
In-Reply-To: <jwvlji3fgzi.fsf-monnier+emacs@gnu.org>
User-Agent: Mutt/1.5.9i
X-Delivery-Agent: TMDA/1.1.5 (Fettercairn)
X-Primary-Address: acm@muc.de
Xref: news.gmane.org gmane.emacs.devel:117236
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/117236>

Morning, Stefan!

On Wed, Nov 18, 2009 at 08:27:24PM -0500, Stefan Monnier wrote:

> The integer 241 is used to represent the char ?ñ, but it's also used for
> many other things, one of them being to represent the byte 241 (tho such
> a byte can also be represented as the integer 4194289).

> Now strings come in two flavors: multibyte (i.e. sequences of chars) and
> unibyte (i.e. sequences of bytes).  So when you do:

>    M-: (setq nl "\n")
>    M-: (aset nl 0 ?ñ)
>    M-: (insert nl)

> The `aset' part may do two different things depending on whether `nl' is
> unibyte or multibyte: it will either insert the char ?ñ or the byte 241.
> In the above code the "\n" is taken as a unibyte string, tho I'm not
> sure why we made this arbitrary choice.

The above sequence "works" in Emacs 22.3, in the sense that "ñ" gets
displayed - when I do M-: (aset nl 0 ?ñ), I get

   "2289 (#o4361, #x8f1)" (Emacs 22.3)
   "241 (#o361, #xf1)"    (Emacs 23.1)

displayed in the echo area.  So my `aset' invocation is trying to write a
multibyte ?ñ over the ?\n in a unibyte string, and gets truncated from
#x8f1 to #xf1 in the process.  Surely this behaviour in Emacs 23.1 is a
bug?  Shouldn't we
fix it before the pretest?  How about interpreting "\n" and friends as
multibyte or unibyte according to the prevailing flavour?
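
In the meantime, I suppose one can at least ask a string which flavour it
has and convert it before calling `aset'.  A rough sketch only:
`string-flavour' is a helper name made up for this mail, and I'm assuming
that `string-to-multibyte' hands back a fresh multibyte copy of an
ASCII-only unibyte string:

    ;; Hypothetical helper, named here only for illustration.
    (defun string-flavour (s)
      "Return the symbol `multibyte' or `unibyte' describing string S."
      (if (multibyte-string-p s) 'multibyte 'unibyte))

    (string-flavour "\n")                        ; the bare literal
    (string-flavour (string-to-multibyte "\n"))  ; explicitly converted copy

    ;; Convert first, then store the character into the multibyte copy.
    (let ((nl (string-to-multibyte "\n")))
      (aset nl 0 ?ñ)
      nl)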

> If you give us more context (i.e. more of the real code where the
> problem show up), maybe we can tell you how to avoid it.

OK.  I have my own routine to display regexps.  As a first step, I
translate \n -> ñ (and \t, \r, \f similarly).  This is how:

    (defun translate-rnt (regexp)
      "REGEXP is a string.  Translate any \\t \\n \\r and \\f characters
    to weird non-ASCII printable characters: \\t to Î (206, \\xCE), \\n
    to ñ (241, \\xF1), \\r to ® (174, \\xAE) and \\f to £ (163, \\xA3).
    The original string is modified."
      (let (ch pos)
        (while (setq pos (string-match "[\t\n\r\f]" regexp))
          (setq ch (aref regexp pos))
          (aset regexp pos                        ; <===================
                (cond ((eq ch ?\t) ?Î)
                      ((eq ch ?\n) ?ñ)
                      ((eq ch ?\r) ?®)
                      (t           ?£))))
        regexp))


> Usually, I recommend to stay away from `aset' on strings for various
> reasons, and it seems that it also helps avoid those tricky issues (tho
> it doesn't protect you from them completely).

Again, surely this is a bug?  These tricky issues should be dealt with in
the lisp interpreter in a way that lisp hackers don't have to worry
about.  Why do we have both unibyte and multibyte?  Is there any reason
not to remove unibyte altogether (though obviously not for 23.2)?

What was the change between 22.3 and 23.1 that broke my code?  Would it,
perhaps, be a good idea to reconsider that change?
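
For what it's worth, if I were to stay away from `aset' altogether, I
imagine the routine would end up something like the sketch below.  The
name `translate-rnt-copy' is made up for this mail, it builds a new string
rather than modifying the original, and I'm assuming that `string' returns
a multibyte string when handed a non-ASCII character:

    ;; Hypothetical non-destructive variant of translate-rnt.
    (defun translate-rnt-copy (regexp)
      "Return a copy of the string REGEXP with \\t \\n \\r and \\f made visible.
    The original string is left untouched."
      (mapconcat (lambda (ch)
                   ;; Build a one-character string for each element.
                   (string (cond ((eq ch ?\t) ?Î)
                                 ((eq ch ?\n) ?ñ)
                                 ((eq ch ?\r) ?®)
                                 ((eq ch ?\f) ?£)
                                 (t           ch))))
                 regexp ""))

    (translate-rnt-copy "foo\nbar")

Since nothing is written into the original string, it shouldn't matter
whether that string happened to be unibyte or multibyte.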

>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).