From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Dmitry Gutov <dgutov@yandex.ru>
Newsgroups: gmane.emacs.bugs
Subject: bug#20154: 25.0.50; json-encode-string is too slow for large strings
Date: Sat, 21 Mar 2015 22:00:46 +0200
Message-ID: <550DCDEE.4090900@yandex.ru>
References: <86twxf68zk.fsf@yandex.ru> <83384zwxdx.fsf@gnu.org>
	<550C3218.4000903@yandex.ru> <831tkjww0y.fsf@gnu.org>
	<550C3AB9.7020403@yandex.ru> <83wq2bveq6.fsf@gnu.org>
	<550C491A.6000909@yandex.ru> <83siczvcss.fsf@gnu.org>
	<550C504A.10708@yandex.ru> <83r3sjva0q.fsf@gnu.org>
	<550C6A06.6040203@yandex.ru> <83fv8zv0b1.fsf@gnu.org>
	<550C990B.8080505@yandex.ru> <838ueqvl1o.fsf@gnu.org>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Cc: 20154@debbugs.gnu.org
To: Eli Zaretskii <eliz@gnu.org>
In-Reply-To: <838ueqvl1o.fsf@gnu.org>
Archived-At: <http://permalink.gmane.org/gmane.emacs.bugs/100754>

On 03/21/2015 09:58 AM, Eli Zaretskii wrote:

> It depends on your requirements.  How fast would it need to run to
> satisfy your needs?

In this case, the buffer contents are encoded to JSON at most once per 
keypress. So 50ms or below should be fast enough, especially since most 
files are smaller than that.

Of course, I'm sure there are use cases for fast JSON encoding/decoding 
of even bigger volumes of data, but they can probably wait until we have 
FFI.

> You don't really need regexp replacement functions with all its
> features here, do you?  What you need is a way to skip characters that
> are "okay", then replace the character that is "not okay" with its
> encoded form, then repeat.

It doesn't seem like regexp searching is the slow part: save for the GC 
pauses, looking for the non-matching regexp in the same string -

(replace-regexp-in-string "x" "z" s1 t t)

- only takes ~3ms.
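
One way to separate the GC pauses from the raw running time is
`benchmark-run', which returns the elapsed time, the number of garbage
collections, and the time spent in them. A minimal sketch, assuming a
stand-in string of digits rather than the real `s1' (the helper name
`my-time-without-gc' is made up for illustration):

```elisp
(require 'benchmark)

(defun my-time-without-gc (thunk)
  "Call THUNK once; return (TOTAL-SECONDS . SECONDS-EXCLUDING-GC)."
  (let* ((result (benchmark-run 1 (funcall thunk)))
         (total (nth 0 result))     ; wall-clock time, GC included
         (gc-time (nth 2 result)))  ; time spent in garbage collection
    (cons total (- total gc-time))))

;; Stand-in for the large test string; the real contents of `s1'
;; are not shown in this message, so this is only an assumption.
(let ((s1 (make-string 300000 ?1)))
  (my-time-without-gc
   (lambda () (replace-regexp-in-string "x" "z" s1 t t))))
```

The cdr of the result is the figure to compare against the budget when
GC is being discounted; the car is what the user actually waits for.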

And likewise, after changing them to use `concat' instead of `format', 
both alternative json-encode-string implementations that I have "encode" 
a numbers-only (without newlines) string of the same length in a few 
milliseconds. Again, save for the GC pauses, which can add 30-40ms.

> For starters, how fast
> can you iterate through the string with 'skip-chars-forward', stopping
> at characters that need encoding, without actually encoding them, but
> just consing the output string by appending the parts delimited by
> places where 'skip-chars-forward' stopped?  That's the lower bound on
> performance using this method.

70-90ms if we simply skip 0-9, even without nreverse-ing and 
concatenating. But the change in runtime after adding an (apply #'concat 
(nreverse res)) step doesn't look statistically insignificant. Here's 
the implementation I tried:

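(defun foofoo (string)
  "Split STRING at non-digit characters; return the digit runs, last one first."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (let (res)
      (while (not (eobp))
        ;; `skip-chars-forward' returns the number of characters skipped.
        (let ((skipped (skip-chars-forward "0-9")))
          (push (buffer-substring (- (point) skipped) (point))
                res))
        ;; Step over the non-digit; guard against a string ending in a digit.
        (unless (eobp)
          (forward-char 1)))
      res)))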

But that actually goes down to 30ms if we don't accumulate the result.
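
For comparison, the same loop extended to do the actual escaping could
look roughly like the following. This is only a sketch of the
skip-then-encode approach, not the code in json.el: the function name is
made up, and the set of escaped characters (double quote, backslash,
and control characters 0-31) is an assumption based on what JSON
requires.

```elisp
(defun my-json-encode-string-sketch (string)
  "Return STRING as a JSON string literal (sketch, not `json-encode-string')."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (let ((chunks (list "\"")))
      (while (not (eobp))
        (let ((start (point)))
          ;; Skip the run of characters that can be copied verbatim:
          ;; anything except `\"', `\\' and control characters 0-31.
          (skip-chars-forward "^\"\\\\\0-\37")
          (push (buffer-substring start (point)) chunks))
        (unless (eobp)
          ;; Point is now on a character that needs escaping.
          (let ((ch (char-after)))
            (push (cond ((eq ch ?\") "\\\"")
                        ((eq ch ?\\) "\\\\")
                        ((eq ch ?\n) "\\n")
                        ((eq ch ?\t) "\\t")
                        (t (format "\\u%04x" ch)))
                  chunks))
          (forward-char 1)))
      (push "\"" chunks)
      ;; Chunks were pushed in reverse order, so flip them once at the end.
      (apply #'concat (nreverse chunks)))))
```

The accumulation into a list followed by a single `concat' is the part
that allocates, so this is where the GC cost mentioned above would show
up.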

> I think the latest tendency is the opposite: move to Lisp everything
> that doesn't need to be in C.

Yes, and often that's great, if we're dealing with some piece of UI 
infrastructure that only gets called at most a few times per command, 
with inputs of size we can anticipate in advance.

> If some specific application needs more
> speed than we can provide, the first thing I'd try is think of a new
> primitive by abstracting your use case enough to be more useful than
> just for JSON.

That's why I suggested doing that with `replace-regexp-in-string' first. 
That's a very common feature, and in Python and Ruby it's written in C. 
Ruby's calling convention is even pretty close (the replacement can be a 
string, or it can take a block, which is a kind of function).
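
To illustrate the calling convention: `replace-regexp-in-string' already
accepts a function as the replacement, which it calls with the matched
text. A minimal sketch of JSON escaping written that way (both function
names are hypothetical, and the escaped character set is again an
assumption rather than what json.el does):

```elisp
(defun my-json-escape-char (match)
  "Return the JSON escape sequence for the one-character string MATCH."
  (let ((ch (aref match 0)))
    (cond ((eq ch ?\") "\\\"")
          ((eq ch ?\\) "\\\\")
          ((eq ch ?\n) "\\n")
          ((eq ch ?\t) "\\t")
          (t (format "\\u%04x" ch)))))

(defun my-json-encode-string-regexp (string)
  "Return STRING as a JSON string literal, using a function replacement."
  (concat "\""
          ;; FIXEDCASE and LITERAL are both t, so the function's return
          ;; value is inserted exactly as is.
          (replace-regexp-in-string "[\"\\\\\0-\37]"
                                    #'my-json-escape-char
                                    string t t)
          "\""))
```

With this shape, a faster `replace-regexp-in-string' would speed up the
encoder without changing any of the Lisp above.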

> Of course, implementing the precise use case in C first is probably a
> prerequisite, since it could turn out that the problem is somewhere
> else, or that even in C you won't get the speed you want.

A fast `replace-regexp-in-string' may not get us where I want, but it 
should get us close. It will still be generally useful, and it'll save 
us from having two `json-encode-string' implementations - for long and 
short strings.

>> Replacing "z" with #'identity (so now we include a function call
>> overhead) increases the averages to 0.15s and 0.10s respectively.
>
> Sounds like the overhead of the Lisp interpreter is a significant
> factor here, no?

Yes and no. Given the 50ms budget, I think we can live with it for now, 
when it's the only problem.