From: Eli Zaretskii <eliz@gnu.org>
To: "Stephen J. Turnbull" <stephen@xemacs.org>
Cc: emacs-devel@gnu.org
Subject: Re: Multibyte and unibyte file names
Date: Sat, 26 Jan 2013 19:33:55 +0200
Message-ID: <83ehh7kfwc.fsf@gnu.org>
In-Reply-To: <87mwvv3m5d.fsf@uwakimon.sk.tsukuba.ac.jp>

> From: "Stephen J. Turnbull" <stephen@xemacs.org>
> Cc: emacs-devel@gnu.org
> Date: Sun, 27 Jan 2013 02:10:54 +0900
> 
>  > > My preferred flavor of Emacs never had unibyte.  It's got its problems
>  > > in this area, but they're just lazy or over-ambitious programmer bugs,
>  > > not a design flaw.
>  > 
>  > I can't reason about something I know nothing about.  So this is not a
>  > useful argument.
> 
> Sure it is.  XEmacs is a pretty good facsimile of Emacs-compatibility;
> the regular howls from people who want to support XEmacs when Emacs
> does something to break compatibility are proof of that.  Nevertheless,
> we've never needed unibyte, and our *-as-unibyte functions are no-ops,
> and nobody has ever complained about that (a fact that remains
> somewhat surprising to me).

Every solution to a problem has its upsides and its downsides.  I'm
saying that I cannot weigh them in this case, and therefore cannot
tell you whether, on balance, it is better than what Emacs does now.

>  > > Of course.  In fact, pretty much all interaction with the outside
>  > > world involves byte streams.  The problem Emacs is experiencing here
>  > > is that Lisp can see bytes when it is designed only to work with
>  > > characters.
>  > 
>  > In GNU Emacs, Lisp can work with bytes as well.
> 
> Not very well, historically (\207 bug, the expand-file-name bug Stefan
> mentioned).  Nothing to be ashamed of at the counting bugs level:
> dealing with the bytes/unicode split has cost Python a huge amount of
> effort, and many bugs.  But it was unnecessary in the first place in
> Emacs.

It _is_ necessary, because file names passed to system APIs _must_ be
encoded.  That's where the bugs mentioned here (already fixed, btw)
happened: in Emacs's own implementation of 'stat', which does a
better job than the MS runtime's, and in other similar cases.

> 
>  > That's OK.  Emacs cannot solve these situations, and I didn't try to
>  > target them.  I will be happy enough to correctly support file names
>  > consistently encoded in a single encoding that is the value of
>  > Decoding is not a problem, but it hampers efficiency.
> 
> I'm sorry, but that's, uh, "premature optimization".

It's not premature.  directory-files-and-attributes, used on Windows
to emulate 'ls', must be fast enough even in large directories,
because otherwise Dired will be painfully slow to start.  As things
stand, it is already too slow, especially with remote filesystems;
there were bug reports about this last year.  IOW, the current
implementation is already borderline performance-wise.

>  > There's also an associated problem that decoding a file name can
>  > trigger GC, which is not good for functions that get 'char *'
>  > pointers as arguments.
> 
> So never give them a char* into a Lisp_String, or inhibit GC when you
> do.  But strncpy is plenty fast for this application[2], one hell of a
> lot faster than the system calls you make to access a filesystem.
> Even strndup is fast enough in our experience.

It's not rocket science, true.  I'm just saying that if it isn't
required, it's best avoided.

>  > > In fact AFAIK the set of programs that use the unibyte feature at
>  > > all is pretty small, and most of those (like Tramp) do so only in
>  > > self-defense.
>  > 
>  > You are thinking on the wrong level.  The problem rears its ugly head
>  > on the C level, not on the Lisp level.  Functions in dired.c and
>  > fileio.c manipulate file names, assuming it is safe to address
>  > individual bytes even if the file name is in some DBCS encoding.
> 
> And that's not mediated by Lisp?  I would be surprised if you find any
> code paths involving dired that grab a filename from the system, pass
> it to a manipulation function, and then try to access the file without
> ever storing it in a Lisp object.[3][4]

I gave examples earlier in this thread that should surprise you.

In any case, as long as file-name primitives support unibyte (encoded)
file names, there's nothing to prevent such examples from popping up.
Programmers are not disciplined enough to be trusted on this.

> [4]  Indeed those were among the earliest files to be fully Mule-ized
> in XEmacs, which in XEmacs means that textual data received from
> outside of XEmacs is immediately converted to internal representation,
> and only converted back to external representation immediately before
> the system library call or kernel call that consumes it.

There are no such coding standards in Emacs, and the C code does
manipulate unibyte strings as long as they don't need to be passed to
Lisp.  In this thread I suggested converting to internal representation
at entry to all primitives, but it looks like Stefan disagrees, or at
least does not completely agree.


