unofficial mirror of bug-gnu-emacs@gnu.org 
From: "Mattias Engdegård" <mattiase@acm.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 40407@debbugs.gnu.org
Subject: bug#40407: [PATCH] slow ENCODE_FILE and DECODE_FILE
Date: Sat, 4 Apr 2020 00:32:21 +0200
Message-ID: <AD8D9685-9148-4BEA-893C-73943E5579DF@acm.org>
In-Reply-To: <835zegwn9y.fsf@gnu.org>

On 3 Apr 2020, at 18:24, Eli Zaretskii <eliz@gnu.org> wrote:

> AFAIR, on macOS the situation is worse than elsewhere, because of the
> normalization thing.

Very likely. It's just what I had in my lap.

> Can you tell more about the conversion steps and the memory each one
> allocates?

Courtesy of the memory profiler:

         - file-relative-name                                 141,551  15%
          - file-name-case-insensitive-p                      100,613  11%
           - ucs-normalize-hfs-nfd-pre-write-conversion       100,613  11%
            - ucs-normalize-HFS-NFD-region                    100,613  11%
               ucs-normalize-region                           100,613  11%
          - expand-file-name                                   40,828   4%
           - ucs-normalize-hfs-nfd-post-read-conversion        40,828   4%
            - ucs-normalize-HFS-NFC-region                     40,828   4%
               ucs-normalize-region                            40,828   4%

where file_name_case_insensitive_p calls ENCODE_FILE and expand_file_name calls DECODE_FILE. I'm not sure how much each part of ucs-normalize-region actually consumes, but I think we can agree that we don't want it called on any platform unless strictly necessary.

> I don't think every encoding is ASCII compatible, so I don't see how
> we can assume that in general.  But the check whether an encoding is
> ASCII-compatible takes a negligible amount of time, so why bother with
> such an assumption?

Quite, I just thought I'd ask in case there were some unwritten invariant that you knew about.
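
For illustration only, here is a minimal sketch of the kind of guard discussed above: skip the conversion only when the coding system is known to be ASCII-compatible and the name is pure ASCII. This is not the patch itself; `bytes_are_ascii` and `can_skip_conversion` are hypothetical names, and the `coding_is_ascii_compatible` flag stands in for whatever check the real coding-system code would perform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of Emacs: true if every byte in
   S[0..LEN) is a 7-bit ASCII character.  */
bool
bytes_are_ascii (const unsigned char *s, size_t len)
{
  assert (s != NULL || len == 0);
  for (size_t i = 0; i < len; i++)
    if (s[i] >= 0x80)
      return false;
  return true;
}

/* Hypothetical decision function: the conversion may be skipped only
   when the coding system is ASCII-compatible AND the name is pure
   ASCII.  Neither condition alone is sufficient.  */
bool
can_skip_conversion (bool coding_is_ascii_compatible,
                     const unsigned char *name, size_t len)
{
  return coding_is_ascii_compatible && bytes_are_ascii (name, len);
}
```

The scan is linear in the length of the name and allocates nothing, which is the point: it is cheap next to a full normalization pass.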

> I'm not sure I understand what you mean by extending the shortcut to
> decode_file_name.  Please elaborate.

Never mind, it was an under-thought idea. The existing bootstrap hack that makes encode_file_name the identity for any unibyte string does not seem to need or allow any symmetry in decode_file_name.

> I don't think we can return the same string if NOCOPY is non-zero.
> The callers might not expect that, and you might inadvertently cause
> the original string be modified behind the caller's back.

You are no doubt correct, but doesn't it look like the sense of NOCOPY has been inverted here? It runs contrary to the intuitive meaning and to the doc string of {encode,decode}-coding-string. In fact:

(let* ((nocopy nil)
       (x "abc")
       (y (decode-coding-string x nil nocopy nil)))
  (eq x y))
=> t

Looks like we suddenly got more work on our hands. What a surprise.

Since string mutation is so rare, I doubt it has caused any real trouble. Now, do we fix it by inverting the sense of the argument, or by renaming it to COPY? I'm fairly neutral, but there are arguments either way, both in terms of performance and correctness. And what about internal calls to code_convert_string?

There are 193 calls to {encode,decode}-coding-string in the Emacs tree, and only 14 of them pass a non-nil value to NOCOPY. I'd be inclined to keep the semantics but rename the argument to COPY, on the grounds that no-copy is a better default; then change those 14 calls to pass nil instead, since that obviously was the intent.
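
To make the proposed COPY semantics concrete, here is a toy model in C. It is not Emacs's code_convert_string: `convert_identity` is a hypothetical function whose "conversion" is the identity, as with a nil coding system, so the only question is whether the caller gets the argument back or a fresh buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy model, not Emacs code.  With COPY false the argument itself is
   returned (cheap, but the result aliases the input); with COPY true
   the caller receives a freshly allocated duplicate to free later.  */
char *
convert_identity (char *s, bool copy)
{
  assert (s != NULL);
  if (!copy)
    return s;
  size_t n = strlen (s) + 1;
  char *fresh = malloc (n);
  if (fresh != NULL)
    memcpy (fresh, s, n);
  return fresh;
}
```

With COPY false as the default, only a caller that intends to mutate the result pays for the allocation, which is the trade-off argued for above.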





