all messages for Emacs-related lists mirrored at yhetil.org
From: David Kastrup <dak@gnu.org>
To: emacs-devel@gnu.org
Subject: Re: Inadequate documentation of silly characters on screen.
Date: Sat, 21 Nov 2009 13:33:23 +0100
Message-ID: <87lji0awh8.fsf@lola.goethe.zz>
In-Reply-To: 877htk2xbk.fsf@uwakimon.sk.tsukuba.ac.jp

"Stephen J. Turnbull" <stephen@xemacs.org> writes:

> Stefan Monnier writes:
>
>  > I don't know what you mean.  The eight-bit "chars" were introduced
>  > to make sure that decoding+reencoding will always return the exact
>  > same byte-sequence, no matter what coding-system was used
>  > (i.e. even if the byte-sequence is invalid for that coding-system).
>  > Dunno how XEmacs handles it.
>
> Honestly, it currently doesn't, or doesn't very well, despite some
> work by Aidan.

But we don't need to make this a problem for _Emacs_.

> However, I think a well-behaved platform should by default error
> (something derived from invalid-state, in XEmacs's error hierarchy) in
> such a case; normally this means corruption in the file.

We take care that it does not mean corruption.  And more often it means
that you might have been loading with the wrong encoding (people do that
all the time).  If you edit some innocent ASCII part and save again, you
won't appreciate changes all across the file elsewhere in parts you did
not touch or see on-screen.
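Here is a rough illustration of that round-trip property. It uses Python's `surrogateescape` error handler (PEP 383) as a stand-in for Emacs's eight-bit characters, so it is an analogy and not Emacs code. The sketch assumes a made-up Latin-1 file that is mistakenly visited as UTF-8. Only an ASCII word is edited before saving, and every other byte comes back untouched.

```python
# Sketch: lossless decode/re-encode even when the "wrong" coding system is used.
# Python's surrogateescape handler stands in for Emacs's eight-bit chars here.

original = "café = naïve\nTODO: fix\n".encode("latin-1")  # file is really Latin-1

# Visit the file as if it were UTF-8; undecodable bytes become placeholders.
text = original.decode("utf-8", errors="surrogateescape")

# Edit only an innocent ASCII part.
edited = text.replace("TODO", "DONE")

# Save again with the same (wrong) coding system.
saved = edited.encode("utf-8", errors="surrogateescape")

# Only the edited bytes changed; the Latin-1 bytes for é and ï survive as-is.
assert saved == original.replace(b"TODO", b"DONE")

# A lossy policy turns each bad byte into U+FFFD, which re-encodes as three
# different bytes, so parts of the file the user never touched would change.
lossy = original.decode("utf-8", errors="replace").encode("utf-8")
assert lossy != original

print(saved)
```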

Sometimes there is no "right encoding".  If I load an executable or an
image file with tag strings and change one string in overwrite mode, I
want to be able to save again.  Compiled Elisp files contain binary
strings as well.  There may be source files with binary blobs in them,
there may be files with parts in different encodings and so on.

> There are special cases like utf8latex whose error messages give you a
> certain number of octets without respecting character boundaries; I
> agree there is need to handle this case.

Forget about the TeX problem: that is a red herring.  It is just one
case where irreversible corruption is not the right answer.  In fact, I
know of no case where irreversible corruption is the right answer.
"Don't touch what you don't understand" is a good rationale.  For
XEmacs, following this rationale would currently require erroring out.
And I actually recommend that you do so: you will learn the hard way
that users prefer the Emacs implementation of "don't touch what you
don't understand", namely artificial code points that losslessly
represent the parts Emacs does not understand in a particular encoding.

> What Python 3 (PEP 383) does is provide a family of coding system
> variants which use invalid Unicode surrogates to encode "raw bytes"
> for situations where the user asks you to proceed despite invalid
> octet sequences for the coding system; since Emacs's internal code is
> UTF-8, any Unicode surrogate is invalid and could be used for this
> purpose.  This would make non-Emacs apps barf errors on such Emacs
> autosaves, but they'll probably barf on the source file, too.

We currently _have_ such a scheme in place.  We just use different
Unicode-invalid code points.
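The sketch below compares the two placeholder schemes.

- **PEP 383 side.** This uses Python's real `surrogateescape` handler, which maps an undecodable byte b in 0x80..0xFF to the lone surrogate U+DC00+b.
- **Emacs side.** This is an assumption based on the Emacs 23 internal representation described in the Elisp manual, where raw bytes 0x80..0xFF become the character codes #x3FFF80..#x3FFFFF. The helper `emacs_raw_byte_char` is hypothetical.

Both ranges lie outside the Unicode scalar values, and that is the property that matters. A strict decode, the erroring behaviour proposed above, is included for contrast.

```python
# Sketch comparing two lossless "raw byte" placeholder schemes.

def pep383_placeholder(byte: int) -> int:
    """Code point Python's surrogateescape uses for an undecodable byte."""
    return 0xDC00 + byte          # 0x80..0xFF -> U+DC80..U+DCFF (lone surrogates)

def emacs_raw_byte_char(byte: int) -> int:
    """Hypothetical helper: assumed Emacs 23+ eight-bit char code for a raw byte."""
    return 0x3FFF00 + byte        # 0x80..0xFF -> #x3FFF80..#x3FFFFF

def is_unicode_scalar(cp: int) -> bool:
    """True for code points allowed in well-formed Unicode text."""
    return 0 <= cp <= 0x10FFFF and not 0xD800 <= cp <= 0xDFFF

raw = b"\xff"                     # never valid in UTF-8

# Strict decoding errors out instead of preserving the byte.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# surrogateescape keeps the byte as a lone surrogate...
escaped = raw.decode("utf-8", errors="surrogateescape")
assert ord(escaped) == pep383_placeholder(0xFF)
# ...and restores it exactly on encoding.
assert escaped.encode("utf-8", errors="surrogateescape") == raw

# Neither scheme collides with any real Unicode character.
for b in range(0x80, 0x100):
    assert not is_unicode_scalar(pep383_placeholder(b))
    assert not is_unicode_scalar(emacs_raw_byte_char(b))

print(strict_failed, hex(ord(escaped)), hex(emacs_raw_byte_char(0xFF)))
```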

> There's a typo in the expr above, should be "multibyte-string".  The
> proposed treatment of 241 is due to the fact that it is currently
> illegal in multibyte strings AIUI.

It is a perfectly valid character ñ in multibyte strings, but it is
not represented there by its single-byte Latin-1 equivalent (the byte
241).
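A Python sketch makes the character/byte distinction concrete. Python strings are sequences of code points, roughly analogous to Emacs multibyte strings. Emacs's own UTF-8-based internal encoding is not modeled here; plain UTF-8 is assumed as a proxy. ñ is character 241 either way, but its stored bytes depend on the encoding, and a lone raw byte 0xF1 is something different from the character.

```python
# Character 241 (ñ) versus byte 241 (0xF1).

n_tilde = "\u00f1"                      # the character ñ
assert ord(n_tilde) == 241              # same number as the Latin-1 byte value

# Its byte representation depends on the encoding.
assert n_tilde.encode("latin-1") == b"\xf1"      # one byte
assert n_tilde.encode("utf-8") == b"\xc3\xb1"    # two bytes in a UTF-8-based store

# A lone byte 0xF1 is not valid UTF-8; kept losslessly, it becomes a
# placeholder code point that is distinct from the character ñ.
raw = b"\xf1".decode("utf-8", errors="surrogateescape")
assert raw != n_tilde
assert ord(raw) == 0xDCF1

print(hex(ord(n_tilde)), hex(ord(raw)))
```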

> Re widechar buffers: the codes for Latin-1 characters in UTF-16 and
> UTF-32 are just zero-padded extensions of the unibyte codes.

I think you may be muddling characters and their byte sequence
representations.  At least I can't read much sense into this statement
otherwise.
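The difference between a character and its byte representation shows up clearly when one code point is encoded several ways. In this Python sketch the code point of ñ is 241 everywhere. The zero-padding holds for the UTF-16/UTF-32 code units, while the byte sequences themselves differ. Little-endian without a BOM is an arbitrary choice for the comparison.

```python
# The same character, different byte sequences.

ch = "\u00f1"   # ñ, code point 241

encodings = {
    "latin-1":   ch.encode("latin-1"),
    "utf-8":     ch.encode("utf-8"),
    "utf-16-le": ch.encode("utf-16-le"),
    "utf-32-le": ch.encode("utf-32-le"),
}

for name, data in encodings.items():
    print(f"{name:10} {data.hex(' ')}")   # bytes.hex(sep) needs Python 3.8+

# The UTF-16/UTF-32 code units are the zero-padded value 241...
assert int.from_bytes(encodings["utf-16-le"], "little") == 241
assert int.from_bytes(encodings["utf-32-le"], "little") == 241
# ...but UTF-8, the basis of Emacs's internal code, uses a different sequence.
assert encodings["utf-8"] != b"\xf1"
```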

-- 
David Kastrup






Thread overview: 101+ messages
2009-11-18 19:12 [acm@muc.de: Re: Inadequate documentation of silly characters on screen.] Alan Mackenzie
2009-11-19  1:27 ` Fwd: Re: Inadequate documentation of silly characters on screen Stefan Monnier
2009-11-19  8:20   ` Alan Mackenzie
2009-11-19  8:50     ` Miles Bader
2009-11-19 10:16     ` Fwd: " Andreas Schwab
2009-11-19 12:21       ` Alan Mackenzie
2009-11-19 13:21       ` Jason Rumney
2009-11-19 13:35         ` Stefan Monnier
2009-11-19 14:18         ` Alan Mackenzie
2009-11-19 14:58           ` Jason Rumney
2009-11-19 15:42             ` Alan Mackenzie
2009-11-19 19:39               ` Eli Zaretskii
2009-11-19 15:30           ` Stefan Monnier
2009-11-19 15:58             ` Alan Mackenzie
2009-11-19 16:06               ` Andreas Schwab
2009-11-19 16:47               ` Aidan Kehoe
2009-11-19 17:29                 ` Alan Mackenzie
2009-11-19 18:21                   ` Aidan Kehoe
2009-11-20  2:43                   ` Stephen J. Turnbull
2009-11-19 19:45                 ` Eli Zaretskii
2009-11-19 20:07                   ` Eli Zaretskii
2009-11-19 19:55                 ` Stefan Monnier
2009-11-20  3:13                   ` Stephen J. Turnbull
2009-11-19 16:55               ` David Kastrup
2009-11-19 18:08                 ` Alan Mackenzie
2009-11-19 19:25                   ` Davis Herring
2009-11-19 21:25                     ` Alan Mackenzie
2009-11-19 22:31                       ` David Kastrup
2009-11-21 22:52                         ` Richard Stallman
2009-11-23  2:08                           ` Displaying bytes (was: Inadequate documentation of silly characters on screen.) Stefan Monnier
2009-11-23 20:38                             ` Richard Stallman
2009-11-23 21:34                               ` Per Starbäck
2009-11-24 22:47                                 ` Richard Stallman
2009-11-25  1:33                                   ` Kenichi Handa
2009-11-25  2:29                                     ` Displaying bytes (was: Inadequate documentation of silly Stefan Monnier
2009-11-25  2:50                                       ` Lennart Borgman
2009-11-25  6:25                                       ` Stephen J. Turnbull
2009-11-25  5:40                                     ` Displaying bytes (was: Inadequate documentation of silly characters on screen.) Ulrich Mueller
2009-11-26 22:59                                       ` Displaying bytes Reiner Steib
2009-11-27  0:16                                         ` Ulrich Mueller
2009-11-27  1:41                                         ` Stefan Monnier
2009-11-27  4:14                                         ` Stephen J. Turnbull
2009-11-25  5:59                                     ` Displaying bytes (was: Inadequate documentation of silly characters on screen.) Stephen J. Turnbull
2009-11-25  8:16                                       ` Kenichi Handa
2009-11-29 16:01                                     ` Richard Stallman
2009-11-29 16:31                                       ` Displaying bytes (was: Inadequate documentation of silly Stefan Monnier
2009-11-29 22:01                                         ` Juri Linkov
2009-11-30  6:05                                           ` tomas
2009-11-30 12:09                                             ` Andreas Schwab
2009-11-30 12:39                                               ` tomas
2009-11-29 22:19                                       ` Displaying bytes (was: Inadequate documentation of silly characters on screen.) Kim F. Storm
2009-11-30  1:42                                         ` Stephen J. Turnbull
2009-11-24  1:28                               ` Displaying bytes Stefan Monnier
2009-11-24 22:47                                 ` Richard Stallman
2009-11-25  2:18                                   ` Stefan Monnier
2009-11-26  6:24                                     ` Richard Stallman
2009-11-26  8:59                                       ` David Kastrup
2009-11-26 14:57                                       ` Stefan Monnier
2009-11-26 16:28                                         ` Lennart Borgman
2009-11-27  6:36                                         ` Richard Stallman
2009-11-24 22:47                                 ` Richard Stallman
2009-11-20  8:48                       ` Fwd: Re: Inadequate documentation of silly characters on screen Eli Zaretskii
2009-11-19 19:52                   ` Eli Zaretskii
2009-11-19 20:53                     ` Alan Mackenzie
2009-11-19 22:16                       ` David Kastrup
2009-11-20  8:55                         ` Eli Zaretskii
2009-11-19 20:05                   ` Stefan Monnier
2009-11-19 21:27                     ` Alan Mackenzie
2009-11-19 19:43               ` Eli Zaretskii
2009-11-19 21:57                 ` Alan Mackenzie
2009-11-19 23:10                   ` Stefan Monnier
2009-11-19 20:02               ` Stefan Monnier
2009-11-19 14:08     ` Stefan Monnier
2009-11-19 14:50       ` Jason Rumney
2009-11-19 15:27         ` Stefan Monnier
2009-11-19 23:12           ` Miles Bader
2009-11-20  2:16             ` Stefan Monnier
2009-11-20  3:37             ` Stephen J. Turnbull
2009-11-20  4:30               ` Stefan Monnier
2009-11-20  7:18                 ` Stephen J. Turnbull
2009-11-20 14:16                   ` Stefan Monnier
2009-11-21  4:13                     ` Stephen J. Turnbull
2009-11-21  5:24                       ` Stefan Monnier
2009-11-21  6:42                         ` Stephen J. Turnbull
2009-11-21  6:49                           ` Stefan Monnier
2009-11-21  7:27                             ` Stephen J. Turnbull
2009-11-23  1:58                               ` Stefan Monnier
2009-11-21 12:33                           ` David Kastrup [this message]
2009-11-21 13:55                             ` Stephen J. Turnbull
2009-11-21 14:36                               ` David Kastrup
2009-11-21 17:53                                 ` Stephen J. Turnbull
2009-11-21 23:30                                   ` David Kastrup
2009-11-22  1:27                                     ` Sebastian Rose
2009-11-22  8:06                                       ` David Kastrup
2009-11-22 23:52                                         ` Sebastian Rose
2009-11-19 17:08       ` Fwd: " Alan Mackenzie
  -- strict thread matches above, loose matches on Subject: below --
2009-11-18  9:37 Alan Mackenzie
2009-11-18  9:40 ` Miles Bader
2009-11-18 10:15   ` Alan Mackenzie
2009-11-18 12:03     ` Jason Rumney
2009-11-18 15:02     ` Stefan Monnier
