From: Eli Zaretskii <eliz@gnu.org>
To: "Mattias Engdegård" <mattiase@acm.org>
Cc: 40407@debbugs.gnu.org
Subject: bug#40407: [PATCH] slow ENCODE_FILE and DECODE_FILE
Date: Fri, 03 Apr 2020 19:24:09 +0300
Message-ID: <835zegwn9y.fsf@gnu.org>
In-Reply-To: <805F9723-8298-4FD7-A47B-1E683721A5B0@acm.org> (message from Mattias Engdegård on Fri, 3 Apr 2020 16:18:43 +0200)
> From: Mattias Engdegård <mattiase@acm.org>
> Date: Fri, 3 Apr 2020 16:18:43 +0200
>
> ENCODE_FILE and DECODE_FILE turn out to be surprisingly slow, and allocate copious amounts of memory, to the point that they often turn up in both memory and cpu profiles. (This is on macOS; I haven't checked the situation elsewhere.)
AFAIR, on macOS the situation is worse than elsewhere, because of the
Unicode normalization of file names there.
> For instance, a single call to file-relative-name, with ASCII-only arguments, manages to allocate 140 KiB. There are several conversion steps each involving creating temporary buffers as well as the compilation and execution of very large "quick-check" regexps. Example:
>
> (progn
>   (require 'profiler)
>   (profiler-reset)
>   (garbage-collect)
>   (profiler-start 'mem)
>   (file-relative-name "abc")
>   (profiler-stop)
>   (profiler-report))
Can you tell more about the conversion steps and the memory each one
allocates?
> Perhaps we can assume that file name codings are always ASCII-compatible
I don't think every encoding is ASCII compatible, so I don't see how
we can assume that in general. But the check whether an encoding is
ASCII-compatible takes a negligible amount of time, so why bother with
such an assumption?
> There is already a hack in encode_file_name that assumes that no unibyte string ever needs encoding; if so, the shortcut could perhaps be extended to decode_file_name and simplified.
I'm not sure I understand what you mean by extending the shortcut to
decode_file_name. Please elaborate.
> -  if (BUFFERP (dst_object))
> +  if (EQ (dst_object, Qt))
> +    {
> +      /* Fast path for ASCII-only input and an ASCII-compatible coding:
> +         act as identity.  */
> +      Lisp_Object attrs = CODING_ID_ATTRS (coding.id);
> +      if (! NILP (CODING_ATTR_ASCII_COMPAT (attrs))
> +          && (STRING_MULTIBYTE (string)
> +              ? (chars == bytes) : string_ascii_p (string)))
> +        return string;
I don't think we can return the same string if NOCOPY is false.
The callers might not expect that, and you might inadvertently cause
the original string to be modified behind the caller's back.
But if NOCOPY is true, I think this change is OK. Just make sure
the test suite doesn't start failing; maybe there's something else we
are missing.
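To make the aliasing concern concrete, here is a minimal standalone sketch in plain C. It is not Emacs code: it uses ordinary C strings rather than Lisp_Object, and the names ascii_only_p, convert_ascii_fast_path and may_share are hypothetical, chosen only for illustration. It shows that an identity fast path which hands back its input leaves the result and the original sharing storage, whereas a copying path does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Return true if every byte of S is in the ASCII range.  */
static bool
ascii_only_p (const char *s)
{
  for (; *s; s++)
    if ((unsigned char) *s >= 0x80)
      return false;
  return true;
}

/* Hypothetical identity "conversion" for ASCII-only input.
   If MAY_SHARE is true, return S itself, so the result aliases the
   input.  Otherwise return a fresh malloc'ed copy owned by the caller.
   Return NULL for non-ASCII input (a real converter would do the
   actual conversion there) or if allocation fails.  */
static char *
convert_ascii_fast_path (char *s, bool may_share)
{
  assert (s != NULL);
  if (!ascii_only_p (s))
    return NULL;
  if (may_share)
    return s;
  size_t len = strlen (s) + 1;
  char *copy = malloc (len);
  if (copy)
    memcpy (copy, s, len);
  return copy;
}
```

With may_share true, writing through the returned pointer also changes the caller's original buffer; with may_share false, the original stays untouched. That difference is what a caller relying on getting a fresh string would trip over.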
Thanks.