unofficial mirror of emacs-devel@gnu.org 
From: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: Simon Krahnke <overlord@gmx.li>, emacs-devel@gnu.org
Subject: Re: Improvements to `(emacs)File Variables'
Date: Mon, 15 Nov 2004 00:15:05 -0500
Message-ID: <87actj1zba.fsf-monnier+emacs@gnu.org>
In-Reply-To: <buo3bzbemrp.fsf@mctpc71.ucom.lsi.nec.co.jp> (Miles Bader's message of "Mon, 15 Nov 2004 13:53:14 +0900")

> I'm not sure.  "Unibyte" as used in emacs seems (to me) to imply several
> things:  (1) of course, a single byte per character, (2) the concept of
> strings/buffers whose encoding is "unknown".

> If you were to consistently treat (2) as in fact meaning an explicit
> "binary" encoding, maybe it would be useful, but my impression is that
> at least historically, people/code have _not_ always done this, leading
> to lots and lots of confusion.  I suppose much of the reason is that
> people want the efficiency gain of (1), and either don't realize the
> problems caused by (2) or think they can kludge around it.

> As I've posted before, I think "unibyte" strings/buffers should be only
> an optimization, and should have an explicit (8-bit) encoding associated
> with them, so that any conversions to/from multibyte can automatically
> do the correct thing; one of these encodings could of course be "binary",
> which maybe would allow the historical usage of unibyte to be preserved.

I'd tend to disagree on the idea of associating an encoding with
unibyte buffers.  I think a large part of the problem is that people with
a unibyte background (i.e. latin-1 mostly) typically confuse the notion of
character and byte and mix things up hopelessly.

In Emacs-20, automatic conversion between unibyte and multibyte was provided
mostly as a way to work "correctly" even with confused code which didn't
understand that there are more than 256 characters in this world.

It made sense at the time to avoid alienating too many Emacs coders.
But to get things right, the first thing we need to do is to make it very
clear that there is no way to automatically convert between unibyte
and multibyte.  Such a conversion should only be doable via the
(en|de)code-coding-foo functions, thus forcing anyone who wants to go down
that path to provide a coding system explicitly and thus to think about
which coding system should be used.
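To make the idea concrete outside of Emacs, here is a minimal sketch in
Python, whose separate bytes and str types behave roughly the way I'd like
unibyte and multibyte to behave.  It is only an analogy: the helper names
(decode_with, encode_with, mixing_is_rejected) are made up for this example
and are not Emacs functions.

```python
def decode_with(raw: bytes, coding_system: str) -> str:
    """Bytes -> characters; the caller has to name the coding system."""
    return raw.decode(coding_system)


def encode_with(text: str, coding_system: str) -> bytes:
    """Characters -> bytes; again the coding system is explicit."""
    return text.encode(coding_system)


def mixing_is_rejected() -> bool:
    """Return True if combining bytes and characters without a codec fails."""
    try:
        b"caf\xe9" + "!"  # no automatic conversion between the two types
    except TypeError:
        return True
    return False
```

The point of the sketch is that the only way across the byte/character
boundary is a call that takes a coding system argument, so the programmer
cannot avoid deciding which one applies.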

After all, autoconversion can only work for 8-bit encodings, so any code
which uses autoconversion falls into one of two situations:
1 - the code somehow knows that all the possible encodings it might need to
    use there are 8-bit.  Most likely, it's the case where there's only ever
    one encoding used.
2 - the code *doesn't* know, but just assumes (probably without even being
    aware of it) that all encodings are 8-bit.  Thus it will break if used
    in China, Japan, ...
Situation 2 is a bug.  Situation 1 seems rather unusual.  My conclusion is
that autoconversion is harmful.
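
Situation 2 can be sketched as follows, again in Python purely as an
illustration.  ASSUMED_CODING, naive_decode and explicit_decode are
hypothetical names, and EUC-JP is just one example of a non-8-bit encoding.

```python
ASSUMED_CODING = "latin-1"  # hypothetical hard-coded 8-bit assumption


def naive_decode(raw: bytes) -> str:
    """Mimics autoconversion: silently treats every byte as one character."""
    return raw.decode(ASSUMED_CODING)


def explicit_decode(raw: bytes, coding_system: str) -> str:
    """The caller states which coding system the bytes are in."""
    return raw.decode(coding_system)


japanese = "\u65e5\u672c\u8a9e"       # three characters (Japanese text)
raw = japanese.encode("euc_jp")       # two bytes per character in EUC-JP

garbled = naive_decode(raw)           # no error, but six wrong characters
correct = explicit_decode(raw, "euc_jp")
```

Note that the naive version never signals an error: latin-1 accepts every
byte value, so the wrong result only shows up later as garbled text.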

I've hacked my own local Emacs to "disallow" autoconversion
(i.e. auto-conversion from unibyte->multibyte is allowed and generates
eight-bit-control and eight-bit-graphic chars; auto-conversion from
multibyte to unibyte is allowed but only for ASCII, eight-bit-graphic, and
eight-bit-control chars, any other char causes an error).  It actually works
fairly well.  The main problems I encounter have to do with regexp matching
where the regexp is multibyte and the text is unibyte.
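
For readers who want to play with rules of this kind, Python's
"surrogateescape" error handler gives a rough analogue: non-ASCII bytes
become placeholder characters, and only ASCII and those placeholders can be
turned back into bytes.  This is a sketch of the idea, not of my Emacs
patch; the two function names are made up, and the placeholders are
Python's lone surrogates rather than Emacs's eight-bit-* characters.

```python
def unibyte_to_multibyte(raw: bytes) -> str:
    """Bytes >= 0x80 become raw-byte placeholders (U+DC80..U+DCFF)."""
    return raw.decode("ascii", errors="surrogateescape")


def multibyte_to_unibyte(text: str) -> bytes:
    """Only ASCII and raw-byte placeholders convert; anything else raises."""
    return text.encode("ascii", errors="surrogateescape")


# Arbitrary bytes survive a round trip unchanged.
round_trip = multibyte_to_unibyte(unibyte_to_multibyte(b"caf\xe9\x80"))

# A real non-ASCII character has no byte form without a coding system.
try:
    multibyte_to_unibyte("caf\u00e9")
    real_char_rejected = False
except UnicodeEncodeError:
    real_char_rejected = True
```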


        Stefan
