From: "Mattias Engdegård" <mattiase@acm.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 40407@debbugs.gnu.org
Subject: bug#40407: [PATCH] slow ENCODE_FILE and DECODE_FILE
Date: Sun, 5 Apr 2020 12:14:59 +0200
Message-ID: <038251F3-AAA0-4528-ADB3-6E29F5A51B82@acm.org>
In-Reply-To: <83pncntbc2.fsf@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 1014 bytes --]

4 apr. 2020 kl. 19.22 skrev Eli Zaretskii <eliz@gnu.org>:

>> This does not mean that the remaining 179 calls require a copy; they just use the default value of the parameter.
> 
> And IMO the default must stay that a copy is returned, except when the
> caller says otherwise.

Yes, those can be dealt with piecemeal, and we are in no hurry to do so.

> I think in the use case where we return a copy, we should make sure
> the return value is unibyte when encoding and multibyte when decoding.

I'm not necessarily opposed to the suggestion, but why not return a unibyte string in both cases, which would simplify the code? In addition, some operations (aref) are faster on unibyte strings. Either way, it's nothing that a caller could rely on, is it? (In particular when taking NOCOPY into account.)

> Otherwise, I think this is OK (for the master branch, obviously).

That is indeed the intention, thanks.

Here is what I would commit, unless you think the string copy should really be multibyte when decoding.


[-- Attachment #2: 0001-Avoid-expensive-recoding-for-ASCII-identity-cases-bu.patch --]
[-- Type: application/octet-stream, Size: 3246 bytes --]

From 63400cc62506d2c3d9d5f2f27e7bb3bfe7f8f877 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Fri, 3 Apr 2020 16:01:01 +0200
Subject: [PATCH] Avoid expensive recoding for ASCII identity cases (bug#40407)

Optimise for the common case of encoding or decoding an ASCII-only
string using an ASCII-compatible coding, for file names in particular.

* src/coding.c (string_ascii_p): New function.
(code_convert_string): Return the input string for ASCII-only inputs
and ASCII-compatible codings.
* test/src/coding-tests.el (coding-nocopy-ascii): New test.
---
 src/coding.c             | 23 ++++++++++++++++++++++-
 test/src/coding-tests.el | 11 +++++++++++
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/src/coding.c b/src/coding.c
index 1049f1b755..2425f5952f 100644
--- a/src/coding.c
+++ b/src/coding.c
@@ -9471,6 +9471,17 @@ used (which may be different from CODING-SYSTEM if CODING-SYSTEM is
   return code_convert_region (start, end, coding_system, destination, 1, 0);
 }
 
+/* Whether a (unibyte) string only contains chars in the 0..127 range.  */
+static bool
+string_ascii_p (Lisp_Object str)
+{
+  ptrdiff_t nbytes = SBYTES (str);
+  for (ptrdiff_t i = 0; i < nbytes; i++)
+    if (SREF (str, i) > 127)
+      return false;
+  return true;
+}
+
 Lisp_Object
 code_convert_string (Lisp_Object string, Lisp_Object coding_system,
 		     Lisp_Object dst_object, bool encodep, bool nocopy,
@@ -9502,7 +9513,17 @@ code_convert_string (Lisp_Object string, Lisp_Object coding_system,
   chars = SCHARS (string);
   bytes = SBYTES (string);
 
-  if (BUFFERP (dst_object))
+  if (EQ (dst_object, Qt))
+    {
+      /* Fast path for ASCII-only input and an ASCII-compatible coding:
+         act as identity.  */
+      Lisp_Object attrs = CODING_ID_ATTRS (coding.id);
+      if (! NILP (CODING_ATTR_ASCII_COMPAT (attrs))
+          && (STRING_MULTIBYTE (string)
+              ? (chars == bytes) : string_ascii_p (string)))
+	return nocopy ? string : make_unibyte_string (SDATA (string), bytes);
+    }
+  else if (BUFFERP (dst_object))
     {
       struct buffer *buf = XBUFFER (dst_object);
       ptrdiff_t buf_pt = BUF_PT (buf);
diff --git a/test/src/coding-tests.el b/test/src/coding-tests.el
index 110ff12696..93e6709d44 100644
--- a/test/src/coding-tests.el
+++ b/test/src/coding-tests.el
@@ -383,6 +383,17 @@ coding-nocopy-trivial
     (should-not (eq (encode-coding-string s nil nil) s))
     (should (eq (encode-coding-string s nil t) s))))
 
+(ert-deftest coding-nocopy-ascii ()
+  "Check that the NOCOPY parameter works for ASCII-only strings."
+  (let* ((uni (apply #'string (number-sequence 0 127)))
+         (multi (string-to-multibyte uni)))
+    (dolist (s (list uni multi))
+      (dolist (coding '(us-ascii iso-latin-1 utf-8))
+        (should-not (eq (decode-coding-string s coding nil) s))
+        (should-not (eq (encode-coding-string s coding nil) s))
+        (should (eq (decode-coding-string s coding t) s))
+        (should (eq (encode-coding-string s coding t) s))))))
+
 ;; Local Variables:
 ;; byte-compile-warnings: (not obsolete)
 ;; End:
-- 
2.21.1 (Apple Git-122.3)
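
For readers without an Emacs source tree at hand, the fast-path condition in the patch can be approximated in standalone C. This is only a minimal sketch under stated assumptions: the names bytes_ascii_p, utf8_char_count and identity_fast_path_p are hypothetical, plain byte buffers stand in for Lisp strings, and the character count is recomputed by assuming a UTF-8-style multibyte representation, whereas the patch simply compares the values already available from SCHARS and SBYTES.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for string_ascii_p in the patch:
   true if every byte is in the 0..127 range.  */
static bool
bytes_ascii_p (const unsigned char *data, size_t nbytes)
{
  for (size_t i = 0; i < nbytes; i++)
    if (data[i] > 127)
      return false;
  return true;
}

/* Count characters in a buffer assumed to hold valid UTF-8-style
   multibyte text: every byte that is not a continuation byte
   (10xxxxxx) starts a new character.  */
static size_t
utf8_char_count (const unsigned char *data, size_t nbytes)
{
  size_t chars = 0;
  for (size_t i = 0; i < nbytes; i++)
    if ((data[i] & 0xC0) != 0x80)
      chars++;
  return chars;
}

/* Hypothetical mirror of the condition guarding the fast path:
   the coding must be ASCII-compatible and the text ASCII-only.
   For multibyte text, "chars == bytes" implies ASCII-only because
   every non-ASCII character occupies at least two bytes.  */
static bool
identity_fast_path_p (const unsigned char *data, size_t nbytes,
                      bool multibyte, bool ascii_compatible)
{
  if (!ascii_compatible)
    return false;
  return (multibyte
          ? utf8_char_count (data, nbytes) == nbytes
          : bytes_ascii_p (data, nbytes));
}
```

When the predicate holds, encoding and decoding are both the identity on the bytes, which is why the patch can return the input string itself (NOCOPY) or a plain byte copy of it.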


