From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Subject: bug#40407: [PATCH] slow ENCODE_FILE and DECODE_FILE
Date: Sat, 04 Apr 2020 12:26:11 +0300
Message-ID: <83mu7rvbyk.fsf@gnu.org>
References: <805F9723-8298-4FD7-A47B-1E683721A5B0@acm.org> <835zegwn9y.fsf@gnu.org>
In-Reply-To: (message from Mattias Engdegård on Sat, 4 Apr 2020 00:32:21 +0200)
To: Mattias Engdegård
Cc: 40407@debbugs.gnu.org

> From: Mattias Engdegård
> Date: Sat, 4 Apr 2020 00:32:21 +0200
> Cc: 40407@debbugs.gnu.org
>
> - file-relative-name                                  141,551  15%
>  - file-name-case-insensitive-p                       100,613  11%
>   - ucs-normalize-hfs-nfd-pre-write-conversion        100,613  11%
>    - ucs-normalize-HFS-NFD-region                     100,613  11%
>       ucs-normalize-region                            100,613  11%
>  - expand-file-name                                    40,828   4%
>   - ucs-normalize-hfs-nfd-post-read-conversion         40,828   4%
>    - ucs-normalize-HFS-NFC-region                      40,828   4%
>       ucs-normalize-region                             40,828   4%
>
> where file_name_case_insensitive_p calls ENCODE_FILE and expand_file_name calls DECODE_FILE.

DECODE_FILE is called because the file name in question starts with a
"~"?  Otherwise, I don't think I understand why would expand-file-name
need to decode a file name.

> I'm not sure how much each part of ucs-normalize-region actually consumes, but I think we can agree that we don't want it called on any platform unless strictly necessary.

Any expensive code should be avoided if it isn't necessary, so yes, I
agree.  And yes, Unicode normalization is expensive.  If we consider
the macOS filesystem idiosyncrasies important to support efficiently,
perhaps we should rewrite the normalization code in C.

> > I don't think every encoding is ASCII compatible, so I don't see how
> > we can assume that in general.  But the check whether an encoding is
> > ASCII-compatible takes a negligible amount of time, so why bother with
> > such an assumption?
>
> Quite, I just thought I'd ask in case there were some unwritten invariant that you knew about.

Whether a coding-system is ASCII-compatible is determined by the
definition of that coding-system.  Look in mule-conf.el, and you will
see there several that aren't ASCII-compatible.  UTF-16 is one
example, but there are others.

> > I don't think we can return the same string if NOCOPY is non-zero.
> > The callers might not expect that, and you might inadvertently cause
> > the original string to be modified behind the caller's back.
>
> You are no doubt correct, but doesn't it look like the sense of NOCOPY has been inverted here?

That ship has sailed long ago (I could explain how this "inverted"
meaning could make sense, but I don't think it's relevant to the issue
at hand), and there are several other internal functions that use a
similar argument in the same "inverted" sense.  This is a separate
issue, anyway.

> Since string mutation is so rare, I doubt it has caused any real trouble.

You are wrong here: it can happen very easily, especially when you
manipulate the encoded string in C.  The simplest use case is that you
encode a file name, and then make some change to the encoded string,
such as changing the letter-case or removing the trailing slash.
Suddenly the original string is changed as well, and the Lisp caller
of the high-level function might be mightily surprised by the result.

IME, the cases where we can safely assume it's OK to return the same
string are actually very rare.  It is no accident that you saw so few
calls of these functions where we use that optional behavior.

> Now, do we fix it by inverting the sense of the argument, or by renaming it to COPY?

Neither, IMO.  Again, it's a separate problem, and let's keep our
sights squarely on the original issue you wanted to fix.  Let's tackle
the NOCOPY issue in a separate discussion, OK?

Thanks.
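
To make the aliasing hazard described above concrete, here is a
minimal sketch in plain C.  It is not the Emacs implementation:
toy_encode, its nocopy parameter, and strip_trailing_slash are
hypothetical stand-ins, and the "encoding" of non-ASCII input is
reduced to a plain copy.  It only shows what can happen when an
encoder is allowed to hand back its argument unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an encoder (hypothetical, NOT Emacs's real code).
   If every byte of SRC is ASCII and NOCOPY is true, return SRC
   itself; otherwise return a freshly malloc'ed copy that the caller
   must free.  Real encoding of non-ASCII bytes is omitted.  */
char *
toy_encode (char *src, bool nocopy)
{
  assert (src != NULL);

  bool all_ascii = true;
  for (const unsigned char *p = (const unsigned char *) src; *p; p++)
    if (*p >= 0x80)
      all_ascii = false;

  if (all_ascii && nocopy)
    return src;		/* shares storage with the caller's string */

  size_t len = strlen (src);
  char *copy = malloc (len + 1);
  if (!copy)
    abort ();
  memcpy (copy, src, len + 1);
  return copy;
}

/* Remove one trailing slash in place, as C code working on an
   encoded file name might do.  */
void
strip_trailing_slash (char *s)
{
  size_t len = strlen (s);
  if (len > 1 && s[len - 1] == '/')
    s[len - 1] = '\0';
}
```

With a writable buffer holding "/tmp/dir/", calling
toy_encode (buf, true) and then strip_trailing_slash on the result
also shortens buf, because both names refer to the same storage.
With nocopy false the caller's buffer is left alone, at the cost of
one allocation.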