From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#40407: [PATCH] slow ENCODE_FILE and DECODE_FILE
Date: Sun, 05 Apr 2020 16:28:13 +0300
Message-ID: <835zeet636.fsf@gnu.org>
References: <805F9723-8298-4FD7-A47B-1E683721A5B0@acm.org> <835zegwn9y.fsf@gnu.org> <83mu7rvbyk.fsf@gnu.org> <729DE2D1-EA0F-46F9-8B4B-2ED146CE6892@acm.org> <83pncntbc2.fsf@gnu.org> <038251F3-AAA0-4528-ADB3-6E29F5A51B82@acm.org>
In-Reply-To: <038251F3-AAA0-4528-ADB3-6E29F5A51B82@acm.org> (message from Mattias Engdegård on Sun, 5 Apr 2020 12:14:59 +0200)
To: Mattias Engdegård
Cc: 40407@debbugs.gnu.org

> From: Mattias Engdegård
> Date: Sun, 5 Apr 2020 12:14:59 +0200
> Cc: 40407@debbugs.gnu.org
>
> > I think in the use case where we return a copy, we should make sure
> > the return value is unibyte when encoding and multibyte when decoding.
>
> I'm not necessarily opposed to the suggestion, but why not return a
> unibyte string in both cases, simplifying the code?

For compatibility with what happens now:

  (multibyte-string-p (decode-coding-string "abc" 'utf-8)) => t

> In addition, some operations (aref) are faster on unibyte. Either
> way, it's nothing that a caller could rely on, is there? (In
> particular when taking NOCOPY into account.)

That is true, of course, but many/most of our strings are multibyte
nowadays, even if they are ASCII.  Suddenly getting a unibyte string
instead would be surprising, I think, even if no one should depend on
it not happening.  (NOCOPY case is different: then it's the caller's
responsibility to deal with the issue.)

So I'd rather we produced a multibyte string when "decoding" by
copying.

> +/* Whether a (unibyte) string only contains chars in the 0..127 range.  */

One subtle point regarding this comment: I'd remove the "unibyte"
part, because (1) you apply this test to multibyte strings as well,
and (2) strings encoded in iso-2022 will look "pure-ASCII", but they
aren't.  The latter subtlety doesn't interfere with the caller,
because iso-2022 is not ASCII-compatible, but it's something I'd
mention in the comment, lest someone uses this function for some
other use case.

The patch is OK otherwise.

Thanks.
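
To illustrate the iso-2022 point, here is a minimal standalone sketch in C.  It is not the function from the patch: the name bytes_all_7bit_p and the sample arrays are hypothetical, and it works on a plain byte buffer rather than a Lisp string.  It only shows why "all bytes in 0..127" is not the same thing as "ASCII text".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (an assumption for illustration, not the code
   under review): return true if every byte in BUF[0..LEN-1] is in
   the 0..127 range, i.e. has its high bit clear.  */
static bool
bytes_all_7bit_p (const unsigned char *buf, size_t len)
{
  for (size_t i = 0; i < len; i++)
    if (buf[i] & 0x80)
      return false;
  return true;
}

/* Plain ASCII text "abc".  */
static const unsigned char sample_ascii[] = { 'a', 'b', 'c' };

/* U+00E9 (e with acute accent) encoded in UTF-8: one byte with the
   high bit set is enough to fail the test.  */
static const unsigned char sample_utf8[] = { 0xC3, 0xA9 };

/* Hiragana "a" (U+3042) encoded in ISO-2022-JP: ESC $ B switches to
   JIS X 0208, 0x24 0x22 is the character, ESC ( B switches back to
   ASCII.  Every byte is below 128, yet this is not ASCII text.  */
static const unsigned char sample_iso2022jp[] =
  { 0x1B, '$', 'B', 0x24, 0x22, 0x1B, '(', 'B' };
```

The helper accepts sample_ascii and rejects sample_utf8, as one would expect.  It also accepts sample_iso2022jp, which is the case the comment should warn about: passing the test means the bytes are 7-bit, and that implies ASCII content only when the coding system is ASCII-compatible.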