From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!.POSTED.blaine.gmane.org!not-for-mail
From: Eli Zaretskii <eliz@gnu.org>
Newsgroups: gmane.emacs.bugs
Subject: bug#31138: Native json slower than json.el
Date: Tue, 23 Apr 2019 17:58:35 +0300
Message-ID: <83h8ao4vl0.fsf@gnu.org>
References: <87sh806xwa.fsf@chapu.is> <83r2a2hdxn.fsf@gnu.org>
	<21f68973-a684-2a65-82eb-c8f3df90127f@yandex.ru>
	<83d0lmgez2.fsf@gnu.org>
	<7d503be9-4d85-3d0b-6829-631ad376ba3d@yandex.ru>
	<831s22gcci.fsf@gnu.org>
	<f6951b39-6458-1bf6-7b4d-b1adf8aada25@yandex.ru>
	<83y349gasn.fsf@gnu.org> <83d0lfag4x.fsf@gnu.org>
	<5cf45a21-65c3-67ee-f123-be83a6ee7c99@yandex.ru>
	<83a7gjaen6.fsf@gnu.org>
	<b72b98a1-ab88-b254-eea6-892a0386c887@yandex.ru>
	<83ftqa8qsg.fsf@gnu.org>
	<cd5ecad5-8465-73cc-0228-e0674774ca67@yandex.ru>
	<83muki6y6r.fsf@gnu.org>
	<4b8c6799-e845-768b-749c-f2a883ab89f8@yandex.ru>
	<83h8aq6v6a.fsf@gnu.org>
	<b09b4c06-1558-4ff3-7fc0-5310d52d4c5a@yandex.ru>
	<834l6q6ozn.fsf@gnu.org>
	<e7a1c0f7-e7af-9dc1-9bb1-ec75bc081fba@yandex.ru>
	<83zhoi59ao.fsf@gnu.org>
	<ecead7ef-412d-0a11-c20d-942db9e479b1@yandex.ru>
	<83wojm57sl.fsf@gnu.org>
	<b3b1dd84-cc3b-e3dc-b034-b95608a8f3a5@yandex.ru>
	<83ftq96azk.fsf@gnu.org>
	<e58af8f3-16ef-41da-b48e-e07ead2dd6f8@yandex.ru>
	<83wojk534g.fsf@gnu.org>
	<6308ceff-479b-2ce7-2072-41e683978c7c@yandex.ru>
Injection-Info: blaine.gmane.org; posting-host="blaine.gmane.org:195.159.176.226";
	logging-data="201405"; mail-complaints-to="usenet@blaine.gmane.org"
Cc: sebastien@chapu.is, yyoncho@gmail.com, 31138@debbugs.gnu.org
To: Dmitry Gutov <dgutov@yandex.ru>
Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Tue Apr 23 17:01:18 2019
Return-path: <bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org>
Envelope-to: geb-bug-gnu-emacs@m.gmane.org
Original-Received: from lists.gnu.org ([209.51.188.17])
	by blaine.gmane.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:256)
	(Exim 4.89)
	(envelope-from <bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org>)
	id 1hIwv8-000qCq-KA
	for geb-bug-gnu-emacs@m.gmane.org; Tue, 23 Apr 2019 17:01:18 +0200
Original-Received: from localhost ([127.0.0.1]:55067 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org>)
	id 1hIwv7-0006Le-Me
	for geb-bug-gnu-emacs@m.gmane.org; Tue, 23 Apr 2019 11:01:17 -0400
Original-Received: from eggs.gnu.org ([209.51.188.92]:53735)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <Debian-debbugs@debbugs.gnu.org>) id 1hIwt1-0004vx-Ek
	for bug-gnu-emacs@gnu.org; Tue, 23 Apr 2019 10:59:08 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <Debian-debbugs@debbugs.gnu.org>) id 1hIwsz-0001Zx-Ba
	for bug-gnu-emacs@gnu.org; Tue, 23 Apr 2019 10:59:07 -0400
Original-Received: from debbugs.gnu.org ([209.51.188.43]:40493)
	by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16)
	(Exim 4.71) (envelope-from <Debian-debbugs@debbugs.gnu.org>)
	id 1hIwsw-0001Xm-Lh
	for bug-gnu-emacs@gnu.org; Tue, 23 Apr 2019 10:59:03 -0400
Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2)
	(envelope-from <Debian-debbugs@debbugs.gnu.org>) id 1hIwsw-0001Ju-Gf
	for bug-gnu-emacs@gnu.org; Tue, 23 Apr 2019 10:59:02 -0400
X-Loop: help-debbugs@gnu.org
Resent-From: Eli Zaretskii <eliz@gnu.org>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
Resent-CC: bug-gnu-emacs@gnu.org
Resent-Date: Tue, 23 Apr 2019 14:59:02 +0000
Resent-Message-ID: <handler.31138.B31138.15560315335059@debbugs.gnu.org>
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: followup 31138
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: moreinfo
Original-Received: via spool by 31138-submit@debbugs.gnu.org id=B31138.15560315335059
	(code B ref 31138); Tue, 23 Apr 2019 14:59:02 +0000
Original-Received: (at 31138) by debbugs.gnu.org; 23 Apr 2019 14:58:53 +0000
Original-Received: from localhost ([127.0.0.1]:54037 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1hIwsk-0001JU-HO
	for submit@debbugs.gnu.org; Tue, 23 Apr 2019 10:58:50 -0400
Original-Received: from eggs.gnu.org ([209.51.188.92]:39171)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <eliz@gnu.org>) id 1hIwsi-0001JI-TT
	for 31138@debbugs.gnu.org; Tue, 23 Apr 2019 10:58:49 -0400
Original-Received: from fencepost.gnu.org ([2001:470:142:3::e]:35153)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <eliz@gnu.org>)
	id 1hIwsc-0000sH-Kk; Tue, 23 Apr 2019 10:58:42 -0400
Original-Received: from [176.228.60.248] (port=2917 helo=home-c4e4a596f7)
	by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256)
	(Exim 4.82) (envelope-from <eliz@gnu.org>)
	id 1hIwsb-0006dX-HC; Tue, 23 Apr 2019 10:58:42 -0400
In-reply-to: <6308ceff-479b-2ce7-2072-41e683978c7c@yandex.ru> (message from
	Dmitry Gutov on Tue, 23 Apr 2019 17:22:34 +0300)
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Received-From: 209.51.188.43
X-BeenThere: bug-gnu-emacs@gnu.org
List-Id: "Bug reports for GNU Emacs,
	the Swiss army knife of text editors" <bug-gnu-emacs.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/bug-gnu-emacs>,
	<mailto:bug-gnu-emacs-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/bug-gnu-emacs/>
List-Post: <mailto:bug-gnu-emacs@gnu.org>
List-Help: <mailto:bug-gnu-emacs-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/bug-gnu-emacs>,
	<mailto:bug-gnu-emacs-request@gnu.org?subject=subscribe>
Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org
Original-Sender: "bug-gnu-emacs"
	<bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org>
Xref: news.gmane.org gmane.emacs.bugs:158140
Archived-At: <http://permalink.gmane.org/gmane.emacs.bugs/158140>

> Cc: sebastien@chapu.is, yyoncho@gmail.com, 31138@debbugs.gnu.org
> From: Dmitry Gutov <dgutov@yandex.ru>
> Date: Tue, 23 Apr 2019 17:22:34 +0300
> 
> On 23.04.2019 15:15, Eli Zaretskii wrote:
> > I thought about this.  It could make sense to have a UTF-8 specific
> > function to encode and decode strings.  With encodings other than
> > UTF-8 it becomes trickier, and probably likewise with buffer text,
> > where we need to take the gap into account.
> 
> Doing that for buffer text as well might be helpful.

In what use cases would this be helpful?  Most cases of decoding text
in a buffer happen when we read text from files, where we already have
an internal optimization for plain ASCII files.  We could perhaps try
a similar optimization for UTF-8 instead of just ASCII.
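
A UTF-8 analogue of the ASCII shortcut would first need a cheap check for well-formed UTF-8. Emacs's internal multibyte representation is based on UTF-8, so input already known to be valid UTF-8 could in principle be stored with little or no conversion. Anything else would still go through the general decoder. The following is a minimal sketch of such a check. The function name utf8_valid_prefix is hypothetical and is not existing Emacs code; it simply applies the RFC 3629 rules (no overlong forms, no surrogates, nothing above U+10FFFF).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Emacs.  Return the length in bytes of
   the longest prefix of S (N bytes) that is well-formed UTF-8 per
   RFC 3629.  If the result equals N, the whole input is valid UTF-8.
   A truncated multibyte sequence at the end is not counted.  */
static size_t
utf8_valid_prefix (const unsigned char *s, size_t n)
{
  assert (s != NULL || n == 0);
  size_t i = 0;
  while (i < n)
    {
      unsigned char c = s[i];
      size_t len;
      unsigned char lo = 0x80, hi = 0xBF;  /* allowed range of 2nd byte */

      if (c < 0x80)
        { i++; continue; }               /* ASCII: the common fast path */
      else if (c >= 0xC2 && c <= 0xDF)
        len = 2;
      else if (c >= 0xE0 && c <= 0xEF)
        {
          len = 3;
          if (c == 0xE0) lo = 0xA0;      /* reject overlong 3-byte forms */
          else if (c == 0xED) hi = 0x9F; /* reject U+D800..U+DFFF */
        }
      else if (c >= 0xF0 && c <= 0xF4)
        {
          len = 4;
          if (c == 0xF0) lo = 0x90;      /* reject overlong 4-byte forms */
          else if (c == 0xF4) hi = 0x8F; /* reject code points > U+10FFFF */
        }
      else
        break;                           /* 0x80..0xC1 or 0xF5..0xFF */

      if (n - i < len)
        break;                           /* sequence truncated by end */
      if (s[i + 1] < lo || s[i + 1] > hi)
        break;
      size_t k;
      for (k = 2; k < len; k++)
        if ((s[i + k] & 0xC0) != 0x80)   /* must be a continuation byte */
          break;
      if (k < len)
        break;
      i += len;
    }
  return i;
}
```

A decoder could call this once and take the fast path only when the result equals the input length. Everything from the first invalid byte onward would be handed to the full decoder.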

Use cases where we read text without decoding and then decode the buffer
contents "by hand" are relatively rare, especially when the text to
decode is large enough for the performance gains to be tangible.

> So that's why I mentioned decode-coding-string (though 
> code_convert_string would be a better choice; or decode_coding_object?), 
> as opposed to creating a new specialized function.

code_convert_string also handles encoding, though.

> From what I can tell from our testing, this kind of change improves 
> performance for all kinds of strings when the source encoding is 
> utf_8_unix, even for large ones (contrary to your expectations).

I tested 10K-character strings, and there the advantage already
becomes relatively small.  10K characters may be a lot for a string,
but it isn't for a buffer.  The optimization we use when decoding in
insert-file-contents avoids the problem by inserting the ASCII part
directly and starting to decode only from the first non-ASCII
character.  With strings, and with text already in a buffer, this is
not currently possible, or at least not easily.
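
Applying the same "skip the ASCII prefix" idea to text already in a buffer means the scan has to step over the gap, as noted in the quoted text above. The following is a minimal sketch under stated assumptions. The struct gap_buffer below is a hypothetical, simplified gap buffer and not Emacs's actual struct buffer_text. The function first_non_ascii is likewise hypothetical. It maps logical byte positions to physical offsets so that bytes inside the gap are never examined.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified gap buffer (NOT Emacs's struct buffer_text).
   Text occupies BEG[0, gap_start) and BEG[gap_end, size); the bytes in
   between are the gap and hold no text.  */
struct gap_buffer
{
  unsigned char *beg;
  size_t gap_start;  /* physical offset where the gap begins */
  size_t gap_end;    /* physical offset just past the gap */
  size_t size;       /* total allocated bytes, gap included */
};

/* Return the logical byte position of the first non-ASCII byte in the
   logical range [FROM, TO), or TO if the whole range is ASCII.  Bytes
   before the returned position need no decoding; only the remainder
   would be passed to a real decoder.  */
static size_t
first_non_ascii (const struct gap_buffer *b, size_t from, size_t to)
{
  assert (b->gap_start <= b->gap_end && b->gap_end <= b->size);
  size_t gap_len = b->gap_end - b->gap_start;
  assert (from <= to && to <= b->size - gap_len);

  for (size_t pos = from; pos < to; pos++)
    {
      /* Logical positions at or after gap_start live past the gap.  */
      size_t phys = pos < b->gap_start ? pos : pos + gap_len;
      if (b->beg[phys] >= 0x80)
        return pos;
    }
  return to;
}
```

For example, with logical text "abc" followed by "é" and the gap sitting after "ab", the function returns 3, even if the gap itself happens to contain bytes >= 0x80.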

> Again, the patch, or several, shouldn't be particularly hard to write, 
> and we can try them out with different scenarios.

If someone wants to work on such patches, I'm sure they will be
welcome.  But we should have clear use cases and good test cases to
time them, IMO.
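
One way to make such timings reproducible is a small harness that separates the two factors discussed above: the total size of the input and how early the first non-ASCII character appears. The following is a minimal sketch under stated assumptions. The names scan_fn, ascii_prefix_len and time_scan are hypothetical. The routine being timed is a stand-in, and a real patch would substitute the decoder under test. It uses only standard C clock(), which measures CPU time rather than wall-clock time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature of the routine under test: process N bytes at S
   and return some size (for example, bytes consumed).  */
typedef size_t (*scan_fn) (const unsigned char *s, size_t n);

/* Stand-in routine to time: length of the leading ASCII run.  */
static size_t
ascii_prefix_len (const unsigned char *s, size_t n)
{
  size_t i = 0;
  while (i < n && s[i] < 0x80)
    i++;
  return i;
}

/* Return the average CPU seconds per call of FN over REPS calls on an
   N-byte input that is all ASCII except for one 2-byte UTF-8 character
   (U+00E9) at byte offset POS.  Returns -1.0 if allocation fails.  */
static double
time_scan (scan_fn fn, size_t n, size_t pos, int reps)
{
  assert (n >= 2 && pos + 2 <= n && reps > 0);
  unsigned char *buf = malloc (n);
  if (!buf)
    return -1.0;
  memset (buf, 'a', n);
  buf[pos] = 0xC3;              /* UTF-8 encoding of U+00E9 */
  buf[pos + 1] = 0xA9;

  volatile size_t sink = 0;     /* keeps the calls from being optimized out */
  clock_t start = clock ();
  for (int r = 0; r < reps; r++)
    sink += fn (buf, n);
  clock_t end = clock ();
  (void) sink;

  free (buf);
  return (double) (end - start) / CLOCKS_PER_SEC / reps;
}
```

Comparing, say, time_scan (fn, 10000, 0, 1000) against time_scan (fn, 10000, 9998, 1000), and then repeating with larger N, would show whether a proposed fast path still pays off when non-ASCII text appears early or when the input is buffer-sized.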