From: "Stephen J. Turnbull" <turnbull@sk.tsukuba.ac.jp>
To: Eli Zaretskii <eliz@gnu.org>
Cc: emacs-devel@gnu.org
Subject: Re: Multibyte and unibyte file names
Date: Sat, 26 Jan 2013 22:03:28 +0900
Message-ID: <87sj5o2j1b.fsf@uwakimon.sk.tsukuba.ac.jp>
In-Reply-To: <83mwvwjib3.fsf@gnu.org>

Eli Zaretskii writes:

 > > "Unibyte" as implemented in Emacs is a premature optimization, and a
 > > disaster in search of places to happen.  Remove it, and you'll never
 > > notice it's gone.  The consequence of that removal would be to fix
 > > this problem, permanently.
 > 
 > I don't think you are entirely correct.

My preferred flavor of Emacs never had unibyte.  It's got its problems
in this area, but they're just lazy or over-ambitious programmer bugs,
not a design flaw.

 > We still need to send encoded (unibyte) strings to the outside
 > world.

Of course.  In fact, pretty much all interaction with the outside
world involves byte streams.  The problem Emacs is experiencing here
is that Lisp can see bytes when it is designed only to work with
characters.

 > [Determining file name encoding is] a non-issue: we treat unibyte file
 > names as encoded in file-name-coding-system.  Nothing else is
 > supported, or needed.

It is needed in Japan, where it's still common to have a host whose
hard drive uses UTF-8, mounting EUC-JP-encoded volumes over NFS, and
USB drives with Shift JIS file names.  I've even seen file names
containing segments encoded variously in KOI8, Shift JIS, *and* EUC-JP
(in Macintosh notation, no less).  Admittedly, not in a very long
time, but it's still *possible* to do that on POSIX systems.

You just can't win in this environment; you will see mojibake, and
sometimes undecodable names, unless you get help from the user.  Such
names can be round-tripped using a special "undecodable bytes"
representation (UTF-8B or non-Unicode code points).  But if you try to
manipulate those names in Lisp, you will sometimes get incorrect
results.
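
A small Python sketch of both halves of that claim, under the
assumption that Python's "surrogateescape" error handler (PEP 383) is
an acceptable stand-in for the UTF-8B idea; it is not how Emacs
represents raw bytes.  The helper names `decode_name` and
`encode_name` are hypothetical.

```python
def decode_name(raw: bytes) -> str:
    """Decode as UTF-8, smuggling undecodable bytes through as lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def encode_name(name: str) -> bytes:
    """Inverse of decode_name: escaped surrogates become the original bytes."""
    return name.encode("utf-8", errors="surrogateescape")


# A Latin-1 file name ("caf" + e-acute + ".txt") seen in a UTF-8 environment.
raw = "caf\u00e9.txt".encode("latin-1")

name = decode_name(raw)
round_tripped = encode_name(name)        # byte-for-byte identical to raw

# Manipulating the decoded name goes wrong: the escaped byte has no case
# mapping, so upper-casing leaves the accented letter in lower case.
upper_bytes = encode_name(name.upper())
intended = "caf\u00e9.txt".upper().encode("latin-1")

print(round_tripped == raw)
print(upper_bytes, intended)
```

The round trip preserves the bytes exactly, so the file can still be
opened.  The upper-cased name, however, differs from what a program
that knew the real encoding would have produced.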

 > Exactly.  Moreover, what you suggest is a large project that won't
 > happen without a motivated individual.  Given the overall "cannot
 > happen on POSIX, so it's SEP"

It can easily happen on POSIX systems, especially with removable media
or dual-booting hosts.  The problem is that most people don't care
about Japanese or Chinese, and of those that do, I'm sure most think
that Shift JIS and Big5 are abominations (except for a few Windows
users).

 > reaction I got to this thread, what do you think are the chances of
 > such a project to materialize any time soon?

Not my problem, either.  My preferred flavor of Emacs hasn't had
unibyte-related issues since 1998.

But I don't see why it should be so difficult.  You already have all
the functions needed to decode byte streams to Lisp strings or
buffers, and that's the normal mode of operation, no?  In fact AFAIK
the set of programs that use the unibyte feature at all is pretty
small, and most of those (like Tramp) do so only in self-defense.



