From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.io!.POSTED.ciao.gmane.io!not-for-mail
From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#40407: [PATCH] slow ENCODE_FILE and DECODE_FILE
Date: Fri, 03 Apr 2020 19:24:09 +0300
Message-ID: <835zegwn9y.fsf@gnu.org>
References: <805F9723-8298-4FD7-A47B-1E683721A5B0@acm.org>
In-Reply-To: <805F9723-8298-4FD7-A47B-1E683721A5B0@acm.org> (message from Mattias =?UTF-8?Q?Engdeg=C3=A5rd?= on Fri, 3 Apr 2020 16:18:43 +0200)
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Cc: 40407@debbugs.gnu.org
To: Mattias =?UTF-8?Q?Engdeg=C3=A5rd?=
X-GNU-PR-Message: followup 40407
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: patch
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org
Original-Sender: "bug-gnu-emacs"
Xref: news.gmane.io gmane.emacs.bugs:177995
Archived-At:

> From: Mattias Engdegård
> Date: Fri, 3 Apr 2020 16:18:43 +0200
>
> ENCODE_FILE and DECODE_FILE turn out to be surprisingly slow, and allocate copious amounts of memory, to the point that they often turn up in both memory and cpu profiles. (This is on macOS; I haven't checked the situation elsewhere.)

AFAIR, on macOS the situation is worse than elsewhere, because of the
normalization thing.

> For instance, a single call to file-relative-name, with ASCII-only arguments, manages to allocate 140 KiB. There are several conversion steps each involving creating temporary buffers as well as the compilation and execution of very large "quick-check" regexps. Example:
>
> (progn
>   (require 'profiler)
>   (profiler-reset)
>   (garbage-collect)
>   (profiler-start 'mem)
>   (file-relative-name "abc")
>   (profiler-stop)
>   (profiler-report))

Can you tell us more about the conversion steps and the memory each
one allocates?

> Perhaps we can assume that file names codings are always ASCII-compatible

I don't think every encoding is ASCII-compatible, so I don't see how
we can assume that in general.  But the check whether an encoding is
ASCII-compatible takes a negligible amount of time, so why bother with
such an assumption?

> There is already a hack in encode_file_name that assumes that no unibyte string ever needs encoding; if so, the shortcut could perhaps be extended to decode_file_name and simplified.

I'm not sure I understand what you mean by extending the shortcut to
decode_file_name.  Please elaborate.

> -  if (BUFFERP (dst_object))
> +  if (EQ (dst_object, Qt))
> +    {
> +      /* Fast path for ASCII-only input and an ASCII-compatible coding:
> +         act as identity.  */
> +      Lisp_Object attrs = CODING_ID_ATTRS (coding.id);
> +      if (! NILP (CODING_ATTR_ASCII_COMPAT (attrs))
> +          && (STRING_MULTIBYTE (string)
> +              ? (chars == bytes) : string_ascii_p (string)))
> +        return string;

I don't think we can return the same string if NOCOPY is non-zero.
The callers might not expect that, and you might inadvertently cause
the original string to be modified behind the caller's back.

But if NOCOPY is 'false', I think this change is OK.  Just make sure
the test suite doesn't start failing; maybe there's something else we
are missing.

Thanks.
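
P.S. To make the aliasing concern above concrete, here is a minimal
standalone sketch in plain C.  It is not Emacs code: is_ascii,
encode_fast_path, the may_share flag and demo are all hypothetical
names invented for illustration, and the flag is deliberately not
called NOCOPY so that it says nothing about how the real argument is
interpreted.  It only shows that an identity fast path hands back the
caller's own object, so a later write through the result changes the
original unless a copy was made.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: true if every byte of S is 7-bit ASCII.  */
bool
is_ascii (const char *s)
{
  for (; *s; s++)
    if ((unsigned char) *s >= 0x80)
      return false;
  return true;
}

/* Hypothetical stand-in for an encoder with an identity fast path.
   When the coding is ASCII-compatible and INPUT is pure ASCII, the
   encoded form equals the input.  If MAY_SHARE is true, INPUT itself
   is returned; otherwise a freshly allocated copy is returned and the
   caller must free it.  Returns NULL when the fast path does not
   apply (the real conversion is omitted here) or on allocation
   failure.  */
char *
encode_fast_path (char *input, bool ascii_compatible, bool may_share)
{
  if (!ascii_compatible || !is_ascii (input))
    return NULL;
  if (may_share)
    return input;

  size_t size = strlen (input) + 1;
  char *copy = malloc (size);
  if (copy)
    memcpy (copy, input, size);
  return copy;
}

/* Demonstrates the hazard; returns 0 when both cases behave as
   described in the comments.  */
int
demo (void)
{
  char name[] = "abc";

  /* Shared result: writing through it modifies the original.  */
  char *shared = encode_fast_path (name, true, true);
  shared[0] = 'X';
  assert (strcmp (name, "Xbc") == 0);

  /* Copied result: the original is left alone.  */
  char *fresh = encode_fast_path (name, true, false);
  if (!fresh)
    return 1;
  fresh[0] = 'Y';
  assert (strcmp (name, "Xbc") == 0);
  free (fresh);
  return 0;
}
```

Calling demo () from a main function exercises both cases; the first
assertion is exactly the "modified behind the caller's back" situation,
which is harmless only when the caller has agreed to share the string.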