From: "Mattias Engdegård" <mattiase@acm.org>
To: 40407@debbugs.gnu.org
Subject: bug#40407: [PATCH] slow ENCODE_FILE and DECODE_FILE
Date: Fri, 3 Apr 2020 16:18:43 +0200
Message-ID: <805F9723-8298-4FD7-A47B-1E683721A5B0@acm.org>
ENCODE_FILE and DECODE_FILE turn out to be surprisingly slow and allocate copious amounts of memory, to the point that they often show up in both memory and CPU profiles. (This is on macOS; I haven't checked the situation elsewhere.)
For instance, a single call to file-relative-name with ASCII-only arguments manages to allocate 140 KiB. There are several conversion steps, each of which creates temporary buffers and compiles and executes very large "quick-check" regexps. Example:
(progn
  (require 'profiler)
  (profiler-reset)
  (garbage-collect)
  (profiler-start 'mem)
  (file-relative-name "abc")
  (profiler-stop)
  (profiler-report))
This applies to just about every function dealing with files or file names.
The attached patch is somewhat conservatively written but is at least a starting point. It reduces the memory consumption of file-relative-name in the example above to zero. Perhaps we can assume that file-name coding systems are always ASCII-compatible; if so, the shortcut can be taken directly in encode_file_name and decode_file_name.
There is already a hack in encode_file_name that assumes that no unibyte string ever needs encoding; if that assumption holds, the shortcut could perhaps be extended to decode_file_name and simplified.
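To make the idea of the shortcut concrete, here is a minimal standalone sketch in plain C, outside of Emacs. The names bytes_ascii_p, utf8_char_count and convert_fast_path are hypothetical and do not exist in coding.c; the sketch also assumes that the multibyte representation is UTF-8-like, so that a string is ASCII-only exactly when its character count equals its byte count. It only illustrates the decision, not the real conversion machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the unibyte check: true if every byte
   is in the 0..127 range.  */
static bool
bytes_ascii_p (const unsigned char *s, size_t nbytes)
{
  assert (s != NULL || nbytes == 0);
  for (size_t i = 0; i < nbytes; i++)
    if (s[i] > 127)
      return false;
  return true;
}

/* Count characters in a UTF-8-like byte sequence by counting the
   bytes that are not continuation bytes (10xxxxxx).  */
static size_t
utf8_char_count (const unsigned char *s, size_t nbytes)
{
  size_t chars = 0;
  for (size_t i = 0; i < nbytes; i++)
    if ((s[i] & 0xC0) != 0x80)
      chars++;
  return chars;
}

/* Hypothetical fast path: return the input unchanged when it is
   ASCII-only and the coding is ASCII-compatible; return NULL to
   signal that the full (expensive) conversion is required.  */
static const unsigned char *
convert_fast_path (const unsigned char *s, size_t nbytes,
                   bool multibyte, bool coding_ascii_compatible)
{
  if (!coding_ascii_compatible)
    return NULL;
  /* For multibyte input, comparing the two counts avoids a second
     notion of "ASCII": equal counts mean one byte per character.  */
  bool ascii_only = (multibyte
                     ? utf8_char_count (s, nbytes) == nbytes
                     : bytes_ascii_p (s, nbytes));
  return ascii_only ? s : NULL;
}
```

With this sketch, the three bytes of "abc" come back as the same pointer for an ASCII-compatible coding, while the two bytes "\xc3\xa5" (å in UTF-8) yield NULL and would go through the normal conversion.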
[-- Attachment #2: 0001-Avoid-expensive-recoding-for-ASCII-identity-cases.patch --]
[-- Type: application/octet-stream, Size: 2106 bytes --]
From dca8b997d3e7c36667e12f1c77fc6ffed7d8f555 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Fri, 3 Apr 2020 16:01:01 +0200
Subject: [PATCH] Avoid expensive recoding for ASCII identity cases
Optimise for the common case of encoding or decoding an ASCII-only
string using an ASCII-compatible coding, for file names in particular.
* src/coding.c (string_ascii_p): New function.
(code_convert_string): Return the input string for ASCII-only inputs
and ASCII-compatible codings.
---
src/coding.c | 23 ++++++++++++++++++++++-
1 file changed, 22 insertions(+), 1 deletion(-)
diff --git a/src/coding.c b/src/coding.c
index 0bea2a0c2b..9a17fafb05 100644
--- a/src/coding.c
+++ b/src/coding.c
@@ -9471,6 +9471,17 @@ used (which may be different from CODING-SYSTEM if CODING-SYSTEM is
   return code_convert_region (start, end, coding_system, destination, 1, 0);
 }
 
+/* Whether a (unibyte) string only contains chars in the 0..127 range. */
+static bool
+string_ascii_p (Lisp_Object str)
+{
+  ptrdiff_t nbytes = SBYTES (str);
+  for (ptrdiff_t i = 0; i < nbytes; i++)
+    if (SREF (str, i) > 127)
+      return false;
+  return true;
+}
+
 Lisp_Object
 code_convert_string (Lisp_Object string, Lisp_Object coding_system,
                      Lisp_Object dst_object, bool encodep, bool nocopy,
@@ -9502,7 +9513,17 @@ code_convert_string (Lisp_Object string, Lisp_Object coding_system,
   chars = SCHARS (string);
   bytes = SBYTES (string);
 
-  if (BUFFERP (dst_object))
+  if (EQ (dst_object, Qt))
+    {
+      /* Fast path for ASCII-only input and an ASCII-compatible coding:
+         act as identity. */
+      Lisp_Object attrs = CODING_ID_ATTRS (coding.id);
+      if (! NILP (CODING_ATTR_ASCII_COMPAT (attrs))
+          && (STRING_MULTIBYTE (string)
+              ? (chars == bytes) : string_ascii_p (string)))
+        return string;
+    }
+  else if (BUFFERP (dst_object))
     {
       struct buffer *buf = XBUFFER (dst_object);
       ptrdiff_t buf_pt = BUF_PT (buf);
--
2.21.1 (Apple Git-122.3)