Re: Fwd: Re: Inadequate documentation of silly characters on screen.

all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed

From: Stefan Monnier <monnier@iro.umontreal.ca>
To: Alan Mackenzie <acm@muc.de>
Cc: emacs-devel@gnu.org
Subject: Re: Fwd: Re: Inadequate documentation of silly characters on screen.
Date: Thu, 19 Nov 2009 09:08:29 -0500	[thread overview]
Message-ID: <jwvmy2ieix7.fsf-monnier+emacs@gnu.org> (raw)
In-Reply-To: <20091119082040.GA1720@muc.de> (Alan Mackenzie's message of "Thu,  19 Nov 2009 08:20:40 +0000")

> The above sequence "works" in Emacs 22.3, in the sense that "ñ" gets
> displayed

There are many differences that cause it to work completely differently:

> - when I do M-: (aset nl 0 ?ñ), I get

>    "2289 (#o4361, #x8f1)" (Emacs 22.3)
>    "241 (#o361, #xf1)"    (Emacs 23.1)

?ñ = 2289 in Emacs-22
?ñ = 241  in Emacs-23

So in Emacs-22, there is no possible confusion for this char with
a byte.
So when you do the `aset', Emacs-22 converts the unibyte string nl to
multibyte, whereas Emacs-23 doesn't.  From then on, in Emacs-22 your
example is all multibyte, so there's no surprise.

Now if in Emacs-22 you do instead (aset nl 0 241), where 241 in Emacs-22
is not a valid char and can hence only be a byte, then aset leaves the
string as unibyte and we end up with the same nl as in Emacs-23.  But if
you then (insert nl), Emacs-22 will probably end up inserting a ñ in
your buffer, because Emacs-22 performs a decoding step using your
language environment when inserting a unibyte string into a unibyte
buffer (this used to be helpful for code that didn't know enough about
Mule to setup coding systems properly, which is why it was done, but
nowadays it was just hiding bugs and encouraging sloppiness in coding so
we removed it).

> fix it before the pretest?  How about interpreting "\n" and friends as
> multibyte or unibyte according to the prevailing flavour?

I'm not sure what that means.  But maybe "\n" should be multibyte, yes.

>> If you give us more context (i.e. more of the real code where the
>> problem show up), maybe we can tell you how to avoid it.

> OK.  I have my own routine to display regexps.  As a first step, I
> translate \n -> ñ, (and \t, \r, \f similarly).  This is how:

>     (defun translate-rnt (regexp)
>       "REGEXP is a string.  Translate any \t \n \r and \f characters
>     to wierd non-ASCII printable characters: \t to Î (206, \xCE), \n
>     to ñ (241, \xF1), \r to ® (174, \xAE) and \f to £ (163, \xA3).
>     The original string is modified."
>       (let (ch pos)
>         (while (setq pos (string-match "[\t\n\r\f]" regexp))
>           (setq ch (aref regexp pos))
>           (aset regexp pos                        ; <===================
>                 (cond ((eq ch ?\t) ?Î)
>                       ((eq ch ?\n) ?ñ)
>                       ((eq ch ?\r) ?®)
>                       (t           ?£))))
>         regexp))

Each one of those `aset' (when performed according to your wishes) would
change the byte-size of the string, so it would internally require
copying the whole string each time: aset on (multibyte) strings is very
inefficient (compared to what most people expect, not necessarily
compared to other operations).  I'd recommend you use higher-level
operations since they'll work just as well and are less susceptible to
such problems:

  (replace-regexp-in-string "[\t\n\r\f]"
                            (lambda (s)
                              (or (cdr (assoc s '(("\t" . "Î")
                                                  ("\n" . "ñ")
                                                  ("\r" . "®"))))
                                  "£"))
                            regexp)

> Why do we have both unibyte and multibyte?  Is there any reason
> not to remove unibyte altogether (though obviously not for 23.2).

Because bytes and chars are different, so we have strings of bytes and
strings of chars.  The problem with it is not their combined existence,
but the fact that they are not different enough.  Many people don't
understand the difference between chars and bytes, but even more people
can't figure out which Elisp operation returns a unibyte string and
which a multibyte strings, and that for a "good" reason: it's very
difficult to predict.

Emacs-23 tries to help in this in the following ways:
- `string' always builds a multibyte string now, so if you want
  a unibyte string, you need to use the new `unibyte-string' function.
- we don't automatically perform encoding/decoding conversions between
  the two forms, so we hide the difference a bit less.

We should probably moved towards making all string immediates multibyte
and add a new syntax to unibyte immediates.

> What was the change between 22.3 and 23.1 that broke my code?

Mostly: the change to unibyte internal representation which made 241
(and other byte values) ambiguous since it can also be interpreted now
as a character value.

> Would it, perhaps, be a good idea to reconsider that change?

I think you'll understand that reverting to the emacs-mule
(iso-2022-based) internal representation is not really on the table ;-)

        Stefan

next prev parent reply	other threads:[~2009-11-19 14:08 UTC|newest]

Thread overview: 96+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-11-18 19:12 [acm@muc.de: Re: Inadequate documentation of silly characters on screen.] Alan Mackenzie
2009-11-19  1:27 ` Fwd: Re: Inadequate documentation of silly characters on screen Stefan Monnier
2009-11-19  8:20   ` Alan Mackenzie
2009-11-19  8:50     ` Miles Bader
2009-11-19 10:16     ` Fwd: " Andreas Schwab
2009-11-19 12:21       ` Alan Mackenzie
2009-11-19 13:21       ` Jason Rumney
2009-11-19 13:35         ` Stefan Monnier
2009-11-19 14:18         ` Alan Mackenzie
2009-11-19 14:58           ` Jason Rumney
2009-11-19 15:42             ` Alan Mackenzie
2009-11-19 19:39               ` Eli Zaretskii
2009-11-19 15:30           ` Stefan Monnier
2009-11-19 15:58             ` Alan Mackenzie
2009-11-19 16:06               ` Andreas Schwab
2009-11-19 16:47               ` Aidan Kehoe
2009-11-19 17:29                 ` Alan Mackenzie
2009-11-19 18:21                   ` Aidan Kehoe
2009-11-20  2:43                   ` Stephen J. Turnbull
2009-11-19 19:45                 ` Eli Zaretskii
2009-11-19 20:07                   ` Eli Zaretskii
2009-11-19 19:55                 ` Stefan Monnier
2009-11-20  3:13                   ` Stephen J. Turnbull
2009-11-19 16:55               ` David Kastrup
2009-11-19 18:08                 ` Alan Mackenzie
2009-11-19 19:25                   ` Davis Herring
2009-11-19 21:25                     ` Alan Mackenzie
2009-11-19 22:31                       ` David Kastrup
2009-11-21 22:52                         ` Richard Stallman
2009-11-23  2:08                           ` Displaying bytes (was: Inadequate documentation of silly characters on screen.) Stefan Monnier
2009-11-23 20:38                             ` Richard Stallman
2009-11-23 21:34                               ` Per Starbäck
2009-11-24 22:47                                 ` Richard Stallman
2009-11-25  1:33                                   ` Kenichi Handa
2009-11-25  2:29                                     ` Displaying bytes (was: Inadequate documentation of silly Stefan Monnier
2009-11-25  2:50                                       ` Lennart Borgman
2009-11-25  6:25                                       ` Stephen J. Turnbull
2009-11-25  5:40                                     ` Displaying bytes (was: Inadequate documentation of silly characters on screen.) Ulrich Mueller
2009-11-26 22:59                                       ` Displaying bytes Reiner Steib
2009-11-27  0:16                                         ` Ulrich Mueller
2009-11-27  1:41                                         ` Stefan Monnier
2009-11-27  4:14                                         ` Stephen J. Turnbull
2009-11-25  5:59                                     ` Displaying bytes (was: Inadequate documentation of silly characters on screen.) Stephen J. Turnbull
2009-11-25  8:16                                       ` Kenichi Handa
2009-11-29 16:01                                     ` Richard Stallman
2009-11-29 16:31                                       ` Displaying bytes (was: Inadequate documentation of silly Stefan Monnier
2009-11-29 22:01                                         ` Juri Linkov
2009-11-30  6:05                                           ` tomas
2009-11-30 12:09                                             ` Andreas Schwab
2009-11-30 12:39                                               ` tomas
2009-11-29 22:19                                       ` Displaying bytes (was: Inadequate documentation of silly characters on screen.) Kim F. Storm
2009-11-30  1:42                                         ` Stephen J. Turnbull
2009-11-24  1:28                               ` Displaying bytes Stefan Monnier
2009-11-24 22:47                                 ` Richard Stallman
2009-11-25  2:18                                   ` Stefan Monnier
2009-11-26  6:24                                     ` Richard Stallman
2009-11-26  8:59                                       ` David Kastrup
2009-11-26 14:57                                       ` Stefan Monnier
2009-11-26 16:28                                         ` Lennart Borgman
2009-11-27  6:36                                         ` Richard Stallman
2009-11-24 22:47                                 ` Richard Stallman
2009-11-20  8:48                       ` Fwd: Re: Inadequate documentation of silly characters on screen Eli Zaretskii
2009-11-19 19:52                   ` Eli Zaretskii
2009-11-19 20:53                     ` Alan Mackenzie
2009-11-19 22:16                       ` David Kastrup
2009-11-20  8:55                         ` Eli Zaretskii
2009-11-19 20:05                   ` Stefan Monnier
2009-11-19 21:27                     ` Alan Mackenzie
2009-11-19 19:43               ` Eli Zaretskii
2009-11-19 21:57                 ` Alan Mackenzie
2009-11-19 23:10                   ` Stefan Monnier
2009-11-19 20:02               ` Stefan Monnier
2009-11-19 14:08     ` Stefan Monnier [this message]
2009-11-19 14:50       ` Jason Rumney
2009-11-19 15:27         ` Stefan Monnier
2009-11-19 23:12           ` Miles Bader
2009-11-20  2:16             ` Stefan Monnier
2009-11-20  3:37             ` Stephen J. Turnbull
2009-11-20  4:30               ` Stefan Monnier
2009-11-20  7:18                 ` Stephen J. Turnbull
2009-11-20 14:16                   ` Stefan Monnier
2009-11-21  4:13                     ` Stephen J. Turnbull
2009-11-21  5:24                       ` Stefan Monnier
2009-11-21  6:42                         ` Stephen J. Turnbull
2009-11-21  6:49                           ` Stefan Monnier
2009-11-21  7:27                             ` Stephen J. Turnbull
2009-11-23  1:58                               ` Stefan Monnier
2009-11-21 12:33                           ` David Kastrup
2009-11-21 13:55                             ` Stephen J. Turnbull
2009-11-21 14:36                               ` David Kastrup
2009-11-21 17:53                                 ` Stephen J. Turnbull
2009-11-21 23:30                                   ` David Kastrup
2009-11-22  1:27                                     ` Sebastian Rose
2009-11-22  8:06                                       ` David Kastrup
2009-11-22 23:52                                         ` Sebastian Rose
2009-11-19 17:08       ` Fwd: " Alan Mackenzie

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=jwvmy2ieix7.fsf-monnier+emacs@gnu.org \
    --to=monnier@iro.umontreal.ca \
    --cc=acm@muc.de \
    --cc=emacs-devel@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.