unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#40407: [PATCH] slow ENCODE_FILE and DECODE_FILE
@ 2020-04-03 14:18 Mattias Engdegård
  2020-04-03 16:24 ` Eli Zaretskii
  0 siblings, 1 reply; 42+ messages in thread
From: Mattias Engdegård @ 2020-04-03 14:18 UTC (permalink / raw)
  To: 40407

[-- Attachment #1: Type: text/plain, Size: 1280 bytes --]

ENCODE_FILE and DECODE_FILE turn out to be surprisingly slow, and allocate copious amounts of memory, to the point that they often turn up in both memory and cpu profiles. (This is on macOS; I haven't checked the situation elsewhere.)

For instance, a single call to file-relative-name, with ASCII-only arguments, manages to allocate 140 KiB. There are several conversion steps each involving creating temporary buffers as well as the compilation and execution of very large "quick-check" regexps. Example:

(progn
  (require 'profiler)
  (profiler-reset)
  (garbage-collect)
  (profiler-start 'mem)
  (file-relative-name "abc")
  (profiler-stop)
  (profiler-report))

This applies to just about every function dealing with files or file names.

The attached patch is somewhat conservatively written but at least a starting point. It reduces the memory consumption by file-relative-name in the example above to zero. Perhaps we can assume that file names codings are always ASCII-compatible; if so, the shortcut can be taken in encode_file_name and decode_file_name directly.

There is already a hack in encode_file_name that assumes that no unibyte string ever needs encoding; if so, the shortcut could perhaps be extended to decode_file_name and simplified.


[-- Attachment #2: 0001-Avoid-expensive-recoding-for-ASCII-identity-cases.patch --]
[-- Type: application/octet-stream, Size: 2106 bytes --]

From dca8b997d3e7c36667e12f1c77fc6ffed7d8f555 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Fri, 3 Apr 2020 16:01:01 +0200
Subject: [PATCH] Avoid expensive recoding for ASCII identity cases

Optimise for the common case of encoding or decoding an ASCII-only
string using an ASCII-compatible coding, for file names in particular.

* src/coding.c (string_ascii_p): New function.
(code_convert_string): Return the input string for ASCII-only inputs
and ASCII-compatible codings.
---
 src/coding.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/src/coding.c b/src/coding.c
index 0bea2a0c2b..9a17fafb05 100644
--- a/src/coding.c
+++ b/src/coding.c
@@ -9471,6 +9471,17 @@ used (which may be different from CODING-SYSTEM if CODING-SYSTEM is
   return code_convert_region (start, end, coding_system, destination, 1, 0);
 }
 
+/* Whether a (unibyte) string only contains chars in the 0..127 range.  */
+static bool
+string_ascii_p (Lisp_Object str)
+{
+  ptrdiff_t nbytes = SBYTES (str);
+  for (ptrdiff_t i = 0; i < nbytes; i++)
+    if (SREF (str, i) > 127)
+      return false;
+  return true;
+}
+
 Lisp_Object
 code_convert_string (Lisp_Object string, Lisp_Object coding_system,
 		     Lisp_Object dst_object, bool encodep, bool nocopy,
@@ -9502,7 +9513,17 @@ code_convert_string (Lisp_Object string, Lisp_Object coding_system,
   chars = SCHARS (string);
   bytes = SBYTES (string);
 
-  if (BUFFERP (dst_object))
+  if (EQ (dst_object, Qt))
+    {
+      /* Fast path for ASCII-only input and an ASCII-compatible coding:
+         act as identity.  */
+      Lisp_Object attrs = CODING_ID_ATTRS (coding.id);
+      if (! NILP (CODING_ATTR_ASCII_COMPAT (attrs))
+          && (STRING_MULTIBYTE (string)
+              ? (chars == bytes) : string_ascii_p (string)))
+        return string;
+    }
+  else if (BUFFERP (dst_object))
     {
       struct buffer *buf = XBUFFER (dst_object);
       ptrdiff_t buf_pt = BUF_PT (buf);
-- 
2.21.1 (Apple Git-122.3)


^ permalink raw reply related	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2020-04-16 13:59 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-04-03 14:18 bug#40407: [PATCH] slow ENCODE_FILE and DECODE_FILE Mattias Engdegård
2020-04-03 16:24 ` Eli Zaretskii
2020-04-03 22:32   ` Mattias Engdegård
2020-04-04  9:26     ` Eli Zaretskii
2020-04-04 16:41       ` Mattias Engdegård
2020-04-04 17:22         ` Eli Zaretskii
2020-04-04 17:37           ` Eli Zaretskii
2020-04-04 18:06             ` Mattias Engdegård
2020-04-05  2:37               ` Eli Zaretskii
2020-04-05  3:42                 ` Eli Zaretskii
2020-04-05 10:14           ` Mattias Engdegård
2020-04-05 13:28             ` Eli Zaretskii
2020-04-05 13:40               ` Mattias Engdegård
2020-04-04 10:26     ` Eli Zaretskii
2020-04-04 16:55       ` Mattias Engdegård
2020-04-04 17:04         ` Eli Zaretskii
2020-04-04 18:01           ` Mattias Engdegård
2020-04-04 18:25             ` Eli Zaretskii
2020-04-05 10:48               ` Mattias Engdegård
2020-04-05 13:39                 ` Eli Zaretskii
2020-04-05 15:03                   ` Mattias Engdegård
2020-04-05 15:35                     ` Mattias Engdegård
2020-04-05 15:56                       ` Eli Zaretskii
2020-04-06 18:13                         ` Mattias Engdegård
2020-04-05 16:00                     ` Eli Zaretskii
2020-04-06 10:10   ` OGAWA Hirofumi
2020-04-06 14:21     ` Eli Zaretskii
2020-04-06 15:56       ` Mattias Engdegård
2020-04-06 16:33         ` Eli Zaretskii
2020-04-06 16:55           ` Mattias Engdegård
2020-04-06 17:18             ` Eli Zaretskii
2020-04-06 17:49               ` Mattias Engdegård
2020-04-06 18:20                 ` Eli Zaretskii
2020-04-06 18:34                   ` OGAWA Hirofumi
2020-04-06 21:57                     ` Mattias Engdegård
2020-04-09 11:03                     ` Mattias Engdegård
2020-04-09 14:09                       ` Kazuhiro Ito
2020-04-09 14:22                         ` Mattias Engdegård
2020-04-11 15:09                       ` Mattias Engdegård
2020-04-16 13:11       ` handa
2020-04-16 13:44         ` Eli Zaretskii
2020-04-16 13:59           ` Mattias Engdegård

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).