From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dmitry Gutov
Newsgroups: gmane.emacs.bugs
Subject: bug#31138: Native json slower than json.el
Date: Tue, 23 Apr 2019 14:39:50 +0300
Message-ID: <0296121e-42b9-de92-e922-adf77bca0ed9@yandex.ru>
In-Reply-To: <834l6p58d1.fsf@gnu.org>
To: Eli Zaretskii
Cc: p.stephani2@gmail.com, sebastien@chapu.is, yyoncho@gmail.com, 31138@debbugs.gnu.org

On 23.04.2019 13:22, Eli Zaretskii wrote:

>>> Yes, but I'm slightly surprised why you loop from the end of the
>>> string and not from the beginning.
>>
>> To avoid creating an additional pointer variable.
>
> I don't think it matters, and looping forward is more natural and may
> even be slightly faster.

OK. It was mostly a matter of taste for me anyway.

(I would be interested in any examples of "slightly faster", though.)

>>> I guess that's expected when the strings in JSON are short enough.
>>
>> Longer strings take a proportional amount of time to encode, though
>> (only 2x as fast per character, IIRC).
>
> I was talking about decoding.  Assuming that decode_coding_utf_8 has
> some setup overhead before it starts the loop of processing the bytes,
> that overhead will become less significant with longer strings.  And
> indeed, if I make the strings in large.json be 10K characters (can
> this happen in real-life JSONs?),

Everything can happen, but I'm not aware of a particular application.

> the speedup from using
> make_specified_string for valid UTF-8 input goes down to just 40% for
> unoptimized builds and 20% for optimized (see the timing data below).
> But it's still faster even for such large strings, so I installed a
> variant of what we were discussing.

Thank you. And for small strings, your numbers seem even more
encouraging than mine.

> Comparing with json.el shows that we've got 8-fold to ten-fold speedup
> in optimized builds.
>
> Here are my timings for the various variants ("large" means with JSON
> input where all strings were enlarged to 10K characters):
>
>   variant                       | unoptimized | optimized
>   ------------------------------+-------------+----------
>   current master                | 3.563       | 0.664
>   current master, large         | 174.0       | 43.34
>   no validation                 | 0.980       | 0.326
>   no validation, large          | 105.1       | 33.13
>   coding_system directly        | 2.962       | 0.660
>   coding_system directly, large | 173.4       | 43.19
>   UTF-8 validation              | 0.980       | 0.334
>   UTF-8 validation, large       | 105.9       | 34.36

0.334 vs 0.664, I like that. :-)

> In all cases, the times are from 10 benchmark loops, after subtracting
> the time used by GC.

I figured we might be saving a bit on GC pauses as well (doing less
work and allocating less), but those are harder to time, of course.
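
As a quick cross-check of the table, here is a small Python sketch that recomputes the ratios between "current master" and "UTF-8 validation". The numbers are copied from the table as reported; the helper names (`reduction`, `speedup`) are only for illustration. It assumes that "speedup" in the quoted text means the fraction of time saved relative to current master, which is an interpretation and not something the thread states.

```python
# Timings copied from the quoted table: (unoptimized, optimized),
# each from 10 benchmark loops with GC time subtracted.
timings = {
    "current master": (3.563, 0.664),
    "current master, large": (174.0, 43.34),
    "UTF-8 validation": (0.980, 0.334),
    "UTF-8 validation, large": (105.9, 34.36),
}


def reduction(baseline, variant):
    """Fraction of the baseline time that the variant saves."""
    return 1.0 - variant / baseline


def speedup(baseline, variant):
    """How many times faster the variant is than the baseline."""
    return baseline / variant


for suffix in ("", ", large"):
    base = timings["current master" + suffix]
    new = timings["UTF-8 validation" + suffix]
    for build, b, n in zip(("unoptimized", "optimized"), base, new):
        name = "UTF-8 validation" + suffix
        print(f"{name:<24} {build:<12} "
              f"{speedup(b, n):.2f}x faster, {reduction(b, n):.0%} less time")
```

Under that reading, the large-string rows come out close to the quoted "40%" and "20%" figures, and the optimized small-string case is very nearly a factor of two.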