From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#31138: Native json slower than json.el
Date: Tue, 23 Apr 2019 17:58:35 +0300
Message-ID: <83h8ao4vl0.fsf@gnu.org>
To: Dmitry Gutov
Cc: sebastien@chapu.is, yyoncho@gmail.com, 31138@debbugs.gnu.org
In-reply-to: <6308ceff-479b-2ce7-2072-41e683978c7c@yandex.ru> (message from Dmitry Gutov on Tue, 23 Apr 2019 17:22:34 +0300)

> Cc: sebastien@chapu.is, yyoncho@gmail.com, 31138@debbugs.gnu.org
> From: Dmitry Gutov
> Date: Tue, 23 Apr 2019 17:22:34 +0300
>
> On 23.04.2019 15:15, Eli Zaretskii wrote:
> > I thought about this. It could make sense to have a UTF-8 specific
> > function to encode and decode strings. With encodings other than
> > UTF-8 it becomes trickier, and probably likewise with buffer text,
> > where we need to take the gap into account.
>
> Doing that for buffer text as well might be helpful.

In what use cases would this be helpful? Most cases of decoding text
in a buffer happen when we read text from files, where we already have
an internal optimization for plain ASCII files. We could perhaps try
a similar optimization for UTF-8 instead of just ASCII. Use cases
where we read without decoding and then decode buffer contents "by
hand" are relatively rare, certainly when the stuff to decode is so
large that the performance gains will be tangible.

> So that's why I mentioned decode-coding-string (though
> code_convert_string would be a better choice; or decode_coding_object?),
> as opposed to creating a new specialized function.

code_convert_string also handles encoding, though.

> What I can understand from our testing, this kind of change improves
> performance for all kinds of strings when the source encoding is
> utf_8_unix.
> Even for large ones (despite you expecting otherwise).

I tested 10K strings, and the advantage there already becomes
relatively small. 10K characters may be a lot for strings, but it
isn't for buffers. The optimization we use as part of decoding
insert-file-contents avoids the problem by inserting the ASCII part
directly and starting to decode from the first non-ASCII character.
With strings and with text already in the buffer this is not currently
possible, or at least not easily.

> Again, the patch, or several, shouldn't be particularly hard to write,
> and we can try them out with different scenarios.

If someone wants to work on such patches, I'm sure they will be
welcome. But we should have clear use cases and good test cases to
time them, IMO.
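
To make the insert-file-contents optimization mentioned above concrete, here is a minimal sketch of the idea in Python. It is only an illustration and not the Emacs C code: the function names `ascii_prefix_len` and `decode_utf8_with_ascii_fast_path` are hypothetical, and the sketch assumes the input is valid UTF-8 held in a single contiguous byte string (so no buffer gap to worry about).

```python
def ascii_prefix_len(data: bytes) -> int:
    """Return how many leading bytes of DATA are plain ASCII (< 0x80)."""
    for i, byte in enumerate(data):
        if byte >= 0x80:
            return i
    return len(data)


def decode_utf8_with_ascii_fast_path(data: bytes) -> str:
    """Decode UTF-8, treating the leading ASCII run separately.

    Hypothetical helper: the ASCII prefix maps one byte to one
    character, so it can be taken over as-is; the general UTF-8
    decoder is only started at the first non-ASCII byte.
    """
    n = ascii_prefix_len(data)
    head = data[:n].decode("ascii")   # cheap part: pure ASCII
    tail = data[n:].decode("utf-8")   # general decoder for the rest
    return head + tail
```

The split point is safe for valid UTF-8 because the first byte at or above 0x80 always starts a multibyte sequence. For input that is ASCII throughout, the general decoder only ever sees an empty tail, which is roughly why the benefit depends on where the first non-ASCII character sits.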