From: David Kastrup <dak@gnu.org>
To: emacs-devel@gnu.org
Subject: Re: Inadequate documentation of silly characters on screen.
Date: Sat, 21 Nov 2009 15:36:49 +0100
Message-ID: <877htkaqri.fsf@lola.goethe.zz>
In-Reply-To: 876394dlsr.fsf@uwakimon.sk.tsukuba.ac.jp

"Stephen J. Turnbull" <stephen@xemacs.org> writes:

> David Kastrup writes:
>
>  > > However, I think a well-behaved platform should by default error
>  > > (something derived from invalid-state, in XEmacs's error
>  > > hierarchy) in such a case; normally this means corruption in the
>  > > file.
>  > 
>  > We take care that it does not mean corruption.
>
> I meant pre-existing corruption [...]

That interpretation is not the business of the editor.  It may decide to
give a warning, but refusing to work at all does not increase its
usefulness.

>  > And more often it means that you might have been loading with the
>  > wrong encoding (people do that all the time).  If you edit some
>  > innocent ASCII part
>
> You can't do that if the file is not in a buffer because the encoding
> error aborted the conversion.

Not being able to do what I want is not a particularly enticing feature.

> Aborting the conversion is what the Unicode Consortium requires, too,
> IIRC:

An editor is not the same as a validator.  It's not its business to
decide what files I should be allowed to work with.
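
A minimal sketch of the validator/editor distinction drawn above, in Python
since the thread already refers to it.  The function names are made up for
illustration, and "surrogateescape" stands in for whatever byte-preserving
scheme the editor uses internally; this is not how Emacs implements it.

```python
import warnings


def validate(data: bytes) -> None:
    """Validator behaviour: reject anything that is not well-formed UTF-8."""
    data.decode("utf-8")  # raises UnicodeDecodeError on the first bad byte


def load_for_editing(data: bytes) -> str:
    """Editor behaviour (hypothetical): warn, but still hand back a buffer."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as err:
        warnings.warn(f"invalid UTF-8 at byte {err.start}; keeping raw bytes")
        # surrogateescape maps each undecodable byte to a lone surrogate,
        # so the byte can be written back unchanged later.
        return data.decode("utf-8", errors="surrogateescape")
```

With this split, a file containing a stray 0xFF byte fails validate() but
still loads through load_for_editing(), with a warning.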

> errors in UTF-8 (or any other UTF for that matter) are considered
> *fatal* by the standard.  Exactly what that means is up to the
> application to decide.  One plausible approach would be to do what you
> do now, but make the buffer read-only.

Making the buffer read-only is a reasonable thing to do if it can't
possibly be written back unchanged.  For example, if I load a file in
latin-1 and insert a few non-latin-1 characters.  In this case Emacs
should not just silently write the file in utf-8 because that changes
the encoding of some preexisting characters.  The situation is different
if I load a pure ASCII file: in that case, choosing utf-8 is fine as
long as it is compatible with the environment.
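
The latin-1 case can be made concrete with a small Python sketch.  Both
helper names are hypothetical; the point is only that re-encoding a latin-1
file as utf-8 rewrites bytes that were already there, while pure ASCII
comes out identical.

```python
def can_save_in(text: str, encoding: str) -> bool:
    """True if every character of the buffer fits the original encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def preexisting_bytes_change(original: bytes, old_enc: str, new_enc: str) -> bool:
    """True if silently switching encodings would alter the untouched content."""
    return original.decode(old_enc).encode(new_enc) != original


latin1_file = b"caf\xe9"                     # e-acute as a single latin-1 byte
buffer_text = latin1_file.decode("latin-1") + " \u20ac"   # user inserts a euro sign

fits_latin1 = can_save_in(buffer_text, "latin-1")                         # False
latin1_damaged = preexisting_bytes_change(latin1_file, "latin-1", "utf-8")  # True
ascii_damaged = preexisting_bytes_change(b"plain", "ascii", "utf-8")        # False
```

When fits_latin1 is False and latin1_damaged is True, the buffer cannot be
written back without either losing the new character or touching the old
ones, which is the situation where read-only (or at least asking) makes
sense.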

>  > Sometimes there is no "right encoding".
>
> So what?  The point is that there certainly are *wrong* encodings,
> namely ones that will result in corruption if you try to save the file
> in that encoding.

But we have a fair number of encodings (those without escape characters
IIRC) which don't imply corruption when saving.  And that is a good
feature for an editor.  For example, when working with version control
systems, you want minimal diffs.  Encoding systems with escape
characters are not good for that.  I would strongly advise against Emacs
picking any escape-character based encoding (or otherwise
non-byte-stream-preserving) automatically.

Less breakage is always a good thing.
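
As a rough illustration of "byte-stream-preserving", here is a Python
sketch that checks whether decoding and re-encoding hands back the same
bytes.  The function name is an assumption, and Python's codecs are used
only as stand-ins for Emacs coding systems.

```python
def roundtrip_preserves(data: bytes, encoding: str) -> bool:
    """True if decode followed by encode reproduces the input exactly."""
    try:
        return data.decode(encoding).encode(encoding) == data
    except UnicodeError:
        # Input the codec rejects cannot be round-tripped either.
        return False


# latin-1 maps every byte to exactly one character, so any input survives.
all_bytes_ok = roundtrip_preserves(bytes(range(256)), "latin-1")

# ISO-2022-JP is escape-based: the same text can be spelled with or without
# a redundant "switch to ASCII" escape (ESC ( B), so the input bytes are
# not recoverable from the decoded text.
plain_ok = roundtrip_preserves(b"hello", "iso2022_jp")
redundant_escape_ok = roundtrip_preserves(b"\x1b(Bhello", "iso2022_jp")
```

Here all_bytes_ok and plain_ok are True while redundant_escape_ok is False:
a file saved through the escape-based codec can differ from what was
loaded even if nobody edited it, which is what produces noisy diffs.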

> But when faced with ambiguity, it is best to refuse to guess.

You don't need to guess if you just preserve the byte sequence.  That
makes it somebody else's problem.  The GNU utilities have always made it
a point to work with arbitrary input without insisting on it being
"sensible".  Historically, most Unix utilities just crashed when you fed
them arbitrary garbage.  They have taken a lesson from GNU nowadays.

And I consider it a good lesson.
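
A sketch of what "just preserve the byte sequence" buys in practice,
again in Python and again only an analogy: the helper name is hypothetical,
and Python's "surrogateescape" error handler plays the role of the
editor's internal representation of raw bytes.

```python
def edit_preserving_bytes(data: bytes, old: str, new: str) -> bytes:
    """Replace some text while leaving undecodable bytes untouched."""
    text = data.decode("utf-8", errors="surrogateescape")
    text = text.replace(old, new)
    return text.encode("utf-8", errors="surrogateescape")


# 0xFC is u-umlaut in latin-1 but is not valid UTF-8.
raw = b"name=J\xfcrgen\ncolour=red\n"
edited = edit_preserving_bytes(raw, "colour=red", "colour=blue")
```

The result is b"name=J\xfcrgen\ncolour=blue\n": the innocent ASCII part
changed, the odd byte did not, and no guess about the "right" encoding of
the first line was ever needed.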

>  > We currently _have_ [a scheme for encoding invalid sequences of
>  > code units] in place.  We just use different Unicode-invalid code
>  > points [from Python].
>
> Conceded.  I realized that later; the important difference is that
> Python only uses that scheme when explicitly requested.

All in all, it is nobody else's business what encoding Emacs uses for
internal purposes.  Making Emacs preserve byte streams means that the
user has to worry less, not more, about what Emacs might be able to work
with.  The Emacs 23 internal encoding does a better job than Emacs 22
did of staying out of users' hair on encoding issues, because it
corresponds better to external encodings.  But ideally, the user
should not have to worry about the difference.

-- 
David Kastrup




