From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mattias Engdegård
Newsgroups: gmane.emacs.bugs
Subject: bug#40407: [PATCH] slow ENCODE_FILE and DECODE_FILE
Date: Fri, 3 Apr 2020 16:18:43 +0200
Message-ID: <805F9723-8298-4FD7-A47B-1E683721A5B0@acm.org>
Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.14\))
Content-Type: multipart/mixed;
 boundary="Apple-Mail=_DA8A3D29-F208-4082-ACC1-88BC9E8B1B54"
To: 40407@debbugs.gnu.org
X-GNU-PR-Message: report 40407
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: patch
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"

--Apple-Mail=_DA8A3D29-F208-4082-ACC1-88BC9E8B1B54
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset=us-ascii

ENCODE_FILE and DECODE_FILE turn out to be surprisingly slow, and
allocate copious amounts of memory, to the point that they often turn up
in both memory and CPU profiles. (This is on macOS; I haven't checked
the situation elsewhere.)

For instance, a single call to file-relative-name, with ASCII-only
arguments, manages to allocate 140 KiB. There are several conversion
steps, each involving the creation of temporary buffers as well as the
compilation and execution of very large "quick-check" regexps. Example:

(progn
  (require 'profiler)
  (profiler-reset)
  (garbage-collect)
  (profiler-start 'mem)
  (file-relative-name "abc")
  (profiler-stop)
  (profiler-report))

This applies to just about every function dealing with files or file
names.

The attached patch is somewhat conservatively written but is at least a
starting point. It reduces the memory consumption of file-relative-name
in the example above to zero. Perhaps we can assume that file name
codings are always ASCII-compatible; if so, the shortcut can be taken in
encode_file_name and decode_file_name directly.

There is already a hack in encode_file_name that assumes that no unibyte
string ever needs encoding; if so, the shortcut could perhaps be
extended to decode_file_name and simplified.

--Apple-Mail=_DA8A3D29-F208-4082-ACC1-88BC9E8B1B54
Content-Disposition: attachment;
 filename=0001-Avoid-expensive-recoding-for-ASCII-identity-cases.patch
Content-Type: application/octet-stream; x-unix-mode=0644;
 name="0001-Avoid-expensive-recoding-for-ASCII-identity-cases.patch"
Content-Transfer-Encoding: 7bit

From dca8b997d3e7c36667e12f1c77fc6ffed7d8f555 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?=
Date: Fri, 3 Apr 2020 16:01:01 +0200
Subject: [PATCH] Avoid expensive recoding for ASCII identity cases

Optimise for the common case of encoding or decoding an ASCII-only
string using an ASCII-compatible coding, for file names in particular.

* src/coding.c (string_ascii_p): New function.
(code_convert_string): Return the input string for ASCII-only inputs
and ASCII-compatible codings.
---
 src/coding.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/src/coding.c b/src/coding.c
index 0bea2a0c2b..9a17fafb05 100644
--- a/src/coding.c
+++ b/src/coding.c
@@ -9471,6 +9471,17 @@ used (which may be different from CODING-SYSTEM if CODING-SYSTEM is
   return code_convert_region (start, end, coding_system, destination, 1, 0);
 }
 
+/* Whether a (unibyte) string only contains chars in the 0..127 range.  */
+static bool
+string_ascii_p (Lisp_Object str)
+{
+  ptrdiff_t nbytes = SBYTES (str);
+  for (ptrdiff_t i = 0; i < nbytes; i++)
+    if (SREF (str, i) > 127)
+      return false;
+  return true;
+}
+
 Lisp_Object
 code_convert_string (Lisp_Object string, Lisp_Object coding_system,
 		     Lisp_Object dst_object, bool encodep, bool nocopy,
@@ -9502,7 +9513,17 @@ code_convert_string (Lisp_Object string, Lisp_Object coding_system,
   chars = SCHARS (string);
   bytes = SBYTES (string);
 
-  if (BUFFERP (dst_object))
+  if (EQ (dst_object, Qt))
+    {
+      /* Fast path for ASCII-only input and an ASCII-compatible coding:
+         act as identity.  */
+      Lisp_Object attrs = CODING_ID_ATTRS (coding.id);
+      if (! NILP (CODING_ATTR_ASCII_COMPAT (attrs))
+          && (STRING_MULTIBYTE (string)
+              ? (chars == bytes) : string_ascii_p (string)))
+        return string;
+    }
+  else if (BUFFERP (dst_object))
     {
       struct buffer *buf = XBUFFER (dst_object);
       ptrdiff_t buf_pt = BUF_PT (buf);
-- 
2.21.1 (Apple Git-122.3)

--Apple-Mail=_DA8A3D29-F208-4082-ACC1-88BC9E8B1B54--
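
The idea behind the patch's string_ascii_p check can be illustrated
outside Emacs. The following is only a sketch on plain byte buffers,
not the Emacs code: bytes_ascii_p and convert_fast_path are
hypothetical names, and a plain boolean stands in for the coding
system's ASCII-compatibility attribute.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for string_ascii_p on a raw buffer:
   true if every byte of BUF is in the 0..127 range.  */
bool
bytes_ascii_p (const unsigned char *buf, size_t nbytes)
{
  for (size_t i = 0; i < nbytes; i++)
    if (buf[i] > 127)
      return false;
  return true;
}

/* Hypothetical model of the fast path.  When the coding is
   ASCII-compatible and the input is pure ASCII, conversion is the
   identity, so the input pointer is returned and nothing is
   allocated.  NULL means "no shortcut, do the full conversion".  */
const unsigned char *
convert_fast_path (const unsigned char *buf, size_t nbytes,
                   bool coding_ascii_compatible)
{
  if (coding_ascii_compatible && bytes_ascii_p (buf, nbytes))
    return buf;
  return NULL;
}
```

A caller would try convert_fast_path first and fall back to the full
conversion only when it returns NULL, so the cost for ASCII-only file
names is a single linear scan.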