From: Eli Zaretskii <eliz@gnu.org>
To: "Mattias Engdegård" <mattiase@acm.org>
Cc: 40407@debbugs.gnu.org
Subject: bug#40407: [PATCH] slow ENCODE_FILE and DECODE_FILE
Date: Sat, 04 Apr 2020 12:26:11 +0300
Message-ID: <83mu7rvbyk.fsf@gnu.org>
In-Reply-To: <AD8D9685-9148-4BEA-893C-73943E5579DF@acm.org> (message from Mattias Engdegård on Sat, 4 Apr 2020 00:32:21 +0200)
> From: Mattias Engdegård <mattiase@acm.org>
> Date: Sat, 4 Apr 2020 00:32:21 +0200
> Cc: 40407@debbugs.gnu.org
>
> - file-relative-name 141,551 15%
>   - file-name-case-insensitive-p 100,613 11%
>     - ucs-normalize-hfs-nfd-pre-write-conversion 100,613 11%
>       - ucs-normalize-HFS-NFD-region 100,613 11%
>           ucs-normalize-region 100,613 11%
>   - expand-file-name 40,828 4%
>     - ucs-normalize-hfs-nfd-post-read-conversion 40,828 4%
>       - ucs-normalize-HFS-NFC-region 40,828 4%
>           ucs-normalize-region 40,828 4%
>
> where file_name_case_insensitive_p calls ENCODE_FILE and expand_file_name calls DECODE_FILE.
Is DECODE_FILE called because the file name in question starts with
"~"? Otherwise, I don't understand why expand-file-name would need to
decode a file name.
> I'm not sure how much each part of ucs-normalize-region actually consumes, but I think we can agree that we don't want it called on any platform unless strictly necessary.
Any expensive code should be avoided if it isn't necessary, so yes, I
agree. And yes, Unicode normalization is expensive. If we consider
the macOS filesystem idiosyncrasies important to support efficiently,
perhaps we should rewrite the normalization code in C.
> > I don't think every encoding is ASCII compatible, so I don't see how
> > we can assume that in general. But the check whether an encoding is
> > ASCII-compatible takes a negligible amount of time, so why bother with
> > such an assumption?
>
> Quite, I just thought I'd ask in case there were some unwritten invariant that you knew about.
Whether a coding-system is ASCII-compatible is determined by the
definition of that coding-system. Look in mule-conf.el, and you will
see several there that aren't ASCII-compatible. UTF-16 is one
example, but there are others.
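To make the UTF-16 point concrete, here is a minimal sketch in plain C. It is not Emacs code: encode_utf16le_ascii and encoding_is_identity are hypothetical helpers written only for this illustration. It shows that encoding an ASCII-only file name as UTF-16LE changes the bytes, so an "ASCII-only input, skip the encoder" shortcut would be wrong for such a coding-system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not an Emacs function: encode an ASCII-only
   string S as UTF-16LE without a BOM.  Each character becomes two
   bytes, low byte first.  OUT must have room for 2 * strlen (S)
   bytes.  Return the number of bytes written.  */
static size_t
encode_utf16le_ascii (const char *s, unsigned char *out)
{
  assert (s != NULL && out != NULL);
  size_t n = 0;
  for (; *s; s++)
    {
      assert ((unsigned char) *s < 0x80);  /* ASCII input only */
      out[n++] = (unsigned char) *s;       /* low byte */
      out[n++] = 0;                        /* high byte is zero for ASCII */
    }
  return n;
}

/* Return true if the ENC_LEN encoded bytes in ENC are identical to
   the bytes of S, i.e. if the encoding step could have been skipped
   for this string.  */
static bool
encoding_is_identity (const char *s, const unsigned char *enc,
                      size_t enc_len)
{
  return enc_len == strlen (s) && memcmp (s, enc, enc_len) == 0;
}
```

For the three-character name "a.c", encode_utf16le_ascii writes six bytes (61 00 2E 00 63 00), and encoding_is_identity returns false for them. With an ASCII-compatible encoding such as UTF-8, the same name encodes to the same three bytes and the check returns true.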
> > I don't think we can return the same string if NOCOPY is non-zero.
> > The callers might not expect that, and you might inadvertently cause
> > the original string to be modified behind the caller's back.
>
> You are no doubt correct, but doesn't it look like the sense of NOCOPY has been inverted here?
That ship sailed long ago (I could explain how this "inverted"
meaning could make sense, but I don't think it's relevant to the issue
at hand), and there are several other internal functions that use a
similar argument in the same "inverted" sense. This is a separate
issue, anyway.
> Since string mutation is so rare, I doubt it has caused any real trouble.
You are wrong here: it can happen very easily, especially when you
manipulate the encoded string in C. The simplest use case is that you
encode a file name, and then make some change to the encoded string,
such as changing the letter-case or removing the trailing slash. Suddenly
the original string is changed as well, and the Lisp caller of the
high-level function might be mightily surprised by the result.
IME, the cases where we can safely assume it's OK to return the same
string are actually very rare. It is no accident that you saw so few
calls of these functions where we use that optional behavior.
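The hazard can be sketched in plain C, outside of Emacs. Everything below is an assumption made for illustration: encode_name stands in for an encoder whose output happens to be byte-identical to its input, and its nocopy flag has the straightforward sense (true means "return the argument itself"), which is not meant to mirror the argument convention of the real Emacs functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an encoder whose result is byte-identical
   to its input.  If NOCOPY is true, return NAME itself; otherwise
   return a fresh malloc'ed copy (NULL on allocation failure) that the
   caller must free.  */
static char *
encode_name (char *name, bool nocopy)
{
  assert (name != NULL);
  if (nocopy)
    return name;                /* the result aliases the caller's string */
  size_t len = strlen (name) + 1;
  char *copy = malloc (len);
  if (copy)
    memcpy (copy, name, len);
  return copy;
}

/* Remove one trailing slash in place, as C code working on an
   encoded file name might do.  A lone "/" is left alone.  */
static void
strip_trailing_slash (char *s)
{
  size_t len = strlen (s);
  if (len > 1 && s[len - 1] == '/')
    s[len - 1] = '\0';
}
```

If a caller holds "/tmp/dir/" in a writable buffer, encodes it with nocopy true and then strips the slash from the result, the caller's own buffer now reads "/tmp/dir". With nocopy false only the copy changes, and the original keeps its trailing slash.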
> Now, do we fix it by inverting the sense of the argument, or by renaming it to COPY?
Neither, IMO. Again, it's a separate problem, and let's keep our
sights squarely on the original issue you wanted to fix. Let's tackle
the NOCOPY issue in a separate discussion, OK?
Thanks.