unofficial mirror of bug-gnu-emacs@gnu.org 
From: Stefan Monnier via "Bug reports for GNU Emacs, the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 56469@debbugs.gnu.org
Subject: bug#56469: 29.0.50; Unibyte dir in directory_files_internal
Date: Sun, 10 Jul 2022 10:23:28 -0400	[thread overview]
Message-ID: <jwvfsj99hsi.fsf-monnier+emacs@gnu.org> (raw)
In-Reply-To: <83y1x2177x.fsf@gnu.org> (Eli Zaretskii's message of "Sat, 09 Jul 2022 21:17:22 +0300")

> Please bootstrap Emacs in a directory with such a name, and if that
> works, I'm okay with installing this change.

Pushed, thanks.

W.r.t. the comment, it's indeed unrelated to the patch (other than
the fact that it touches the same code).  The question is when we do:

	  finalname = (nchars == nbytes)
	              ? make_uninit_string (nbytes)
	              : make_uninit_multibyte_string (nchars, nbytes);

the actual bytes are already "decoded" (i.e. in our internal UTF-8-based
encoding, where every non-ASCII char takes at least 2 bytes), so
(nchars == nbytes) checks whether it's "pure ASCII" or not, and if it's
pure ASCII we return a unibyte string.
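To make the (nchars == nbytes) reasoning concrete, here is a minimal C sketch. The helpers count_chars and pure_ascii_p are hypothetical, not Emacs functions. The sketch assumes the buffer is validly encoded in a UTF-8-style encoding, where every character starts with exactly one byte that is not a continuation byte (10xxxxxx):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not Emacs's own code): count the characters in a
   buffer holding text in a UTF-8-style encoding.  Each character starts
   with exactly one non-continuation byte, so counting those bytes counts
   the characters.  Assumes the buffer is validly encoded.  */
static size_t
count_chars (const unsigned char *p, size_t nbytes)
{
  size_t nchars = 0;
  for (size_t i = 0; i < nbytes; i++)
    if ((p[i] & 0xC0) != 0x80)   /* not a 10xxxxxx continuation byte */
      nchars++;
  assert (nchars <= nbytes);
  return nchars;
}

/* Every non-ASCII character needs at least two bytes, so in a validly
   encoded buffer nchars == nbytes holds exactly when all bytes are ASCII.  */
static bool
pure_ascii_p (const unsigned char *p, size_t nbytes)
{
  return count_chars (p, nbytes) == nbytes;
}
```

For example, "café" in UTF-8 is 5 bytes but only 4 chars, so the test correctly reports it as not pure ASCII.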

Our file-name manipulation routines always consider unibyte-ASCII and
multibyte-ASCII as "equivalent", and indeed DECODE_FILE and ENCODE_FILE
take advantage of that: when their argument is all-ASCII they return it
as-is, to avoid allocating a string unnecessarily.
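A rough sketch of the DECODE_FILE shortcut described above. The struct lstr type and decode_file_sketch are hypothetical stand-ins for Lisp strings and DECODE_FILE. Purely so the example can perform a real conversion without a coding-system library, it assumes the file-name coding system is Latin-1; Emacs itself decodes with whatever file-name coding system is in effect:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a Lisp string.  */
struct lstr
{
  unsigned char *data;
  size_t nchars;
  size_t nbytes;
  bool multibyte;
};

static bool
bytes_all_ascii (const unsigned char *p, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (p[i] >= 0x80)
      return false;
  return true;
}

/* Sketch of the shortcut, ASSUMING a Latin-1 file-name coding system.
   An all-ASCII name reads the same encoded or decoded, so it is returned
   as-is (still unibyte) with no allocation.  Otherwise a new multibyte
   string is allocated.  Returns NULL on allocation failure.  */
static struct lstr *
decode_file_sketch (struct lstr *encoded)
{
  if (bytes_all_ascii (encoded->data, encoded->nbytes))
    return encoded;              /* the "decoded unibyte" shortcut */

  struct lstr *s = malloc (sizeof *s);
  unsigned char *out = malloc (2 * encoded->nbytes); /* <= 2 bytes/char */
  if (!s || !out)
    {
      free (s);
      free (out);
      return NULL;
    }
  size_t j = 0;
  for (size_t i = 0; i < encoded->nbytes; i++)
    {
      unsigned char c = encoded->data[i];
      if (c < 0x80)
        out[j++] = c;
      else
        {                        /* Latin-1 byte -> 2-byte UTF-8 sequence */
          out[j++] = 0xC0 | (c >> 6);
          out[j++] = 0x80 | (c & 0x3F);
        }
    }
  s->data = out;
  s->nchars = encoded->nbytes;   /* one char per Latin-1 byte */
  s->nbytes = j;
  s->multibyte = true;
  assert (s->nchars < s->nbytes); /* at least one non-ASCII char */
  return s;
}
```

The caller frees the result only when it differs from the argument. That pointer comparison is exactly the allocation the shortcut avoids.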

So in the above code snippet, when the string is all-ASCII, we actually
have a choice: both a unibyte string and a multibyte string should
work.  Currently we return a unibyte string in that case, but I think
we're better off returning a multibyte string, because the subsequent
"all-ASCII" test (which DE/ENCODE_FILE will perform when we pass that
filename to some further operation) will then be more efficient: for a
multibyte string it's a constant-time (nchars == nbytes) test, whereas
for a unibyte string it requires looking at each and every byte.
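The cost difference can be sketched as follows. struct lstr and string_all_ascii_p are hypothetical simplifications, not Emacs's actual string layout or its ASCII check:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a Lisp string.  */
struct lstr
{
  const unsigned char *data;
  size_t nchars;
  size_t nbytes;
  bool multibyte;
};

/* Hypothetical all-ASCII test in the spirit of the one DE/ENCODE_FILE
   perform (not Emacs's actual code).  */
static bool
string_all_ascii_p (const struct lstr *s)
{
  if (s->multibyte)
    /* O(1): any non-ASCII char occupies >= 2 bytes, so
       nchars == nbytes iff every char is ASCII.  */
    return s->nchars == s->nbytes;

  /* O(n): a unibyte string always has nchars == nbytes, so the
     counts say nothing and every byte has to be inspected.  */
  assert (s->nchars == s->nbytes);
  for (size_t i = 0; i < s->nbytes; i++)
    if (s->data[i] >= 0x80)
      return false;
  return true;
}
```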

IOW, while it makes sense to return a "decoded unibyte" string from
DECODE_FILE in order to avoid an allocation, I don't think it makes
sense to return such a "decoded unibyte" string when we have to allocate
a new string anyway.
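A minimal sketch of the proposed policy, again with hypothetical names (make_file_name_string, struct lstr). The input bytes are assumed to already be in the internal encoding, and the prefer_multibyte flag exists only to contrast the current behavior (false) with the proposed one (true):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a Lisp string.  */
struct lstr
{
  unsigned char *data;
  size_t nchars;
  size_t nbytes;
  bool multibyte;
};

/* Hypothetical sketch, not the code in directory_files_internal.
   A fresh string is allocated either way, so marking an all-ASCII
   result multibyte costs nothing extra here and keeps later all-ASCII
   checks O(1).  Returns NULL on allocation failure.  */
static struct lstr *
make_file_name_string (const unsigned char *bytes, size_t nbytes,
                       bool prefer_multibyte)
{
  size_t nchars = 0;
  for (size_t i = 0; i < nbytes; i++)
    if ((bytes[i] & 0xC0) != 0x80)   /* count lead bytes only */
      nchars++;
  assert (nchars <= nbytes);

  struct lstr *s = malloc (sizeof *s);
  unsigned char *data = malloc (nbytes + 1);
  if (!s || !data)
    {
      free (s);
      free (data);
      return NULL;
    }
  memcpy (data, bytes, nbytes);
  data[nbytes] = '\0';
  s->data = data;
  s->nchars = nchars;
  s->nbytes = nbytes;
  /* Non-ASCII text must be multibyte; all-ASCII text may be either.  */
  s->multibyte = prefer_multibyte || nchars != nbytes;
  return s;
}
```

Since the allocation happens in both cases, the only difference is the multibyte flag.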


        Stefan






Thread overview: 14+ messages
2022-07-09 17:44 bug#56469: 29.0.50; Unibyte dir in directory_files_internal Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-07-09 18:17 ` Eli Zaretskii
2022-07-09 18:20   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-07-09 18:53     ` Eli Zaretskii
2022-07-10 14:23   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors [this message]
2022-07-10 14:32     ` Eli Zaretskii
2022-07-10 14:58       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-07-10 15:07         ` Eli Zaretskii
2022-07-10 15:19           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-07-10 15:41             ` Eli Zaretskii
2022-07-10 22:13               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-07-11  2:27                 ` Eli Zaretskii
2022-09-05 19:21               ` Lars Ingebrigtsen
2022-09-07 13:32                 ` Eli Zaretskii
