From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!.POSTED!not-for-mail
From: Paul Eggert <eggert@cs.ucla.edu>
Newsgroups: gmane.emacs.devel
Subject: Re: JSON/YAML/TOML/etc. parsing performance
Date: Wed, 4 Oct 2017 18:48:04 -0700
Organization: UCLA Computer Science Department
Message-ID: <8c922c27-9de0-7d99-6c26-a94a0387c45e@cs.ucla.edu>
References: <87poaqhc63.fsf@lifelogs.com>
	<CAArVCkQTLp=Cmh-FM1R-WK=WYFX_hP=6XiUUinKRT17bciL+CQ@mail.gmail.com>
	<CAArVCkTj_1P+fTDCzEY5xG8bBB7B6ctNkQCv+bAxt=N_cuD05Q@mail.gmail.com>
	<CAArVCkS52m8SGeOQt19k+XsfZnxy+bh9LJMyX=h+e67_adP6Mg@mail.gmail.com>
	<8360ceh5f1.fsf@gnu.org>
	<CAArVCkRvSaS-orqHcVPtZ2etUnRiY39okHh+6sYV-mtQQRYs-g@mail.gmail.com>
	<83h8vl5lf9.fsf@gnu.org>
	<CAArVCkRSY+bUiX2n1sysxy4dRTLE5M8R3OZmt+1W0-OfiYGDuA@mail.gmail.com>
	<83r2um3fqi.fsf@gnu.org>
	<CAArVCkSR5w5B4_E0UU8uJtZJtr1oP51md4-i-J+NSRCGpGP_9A@mail.gmail.com>
	<43520b71-9e25-926c-d744-78098dad6441@cs.ucla.edu>
	<83o9pnzddc.fsf@gnu.org>
	<472176ce-846b-1f24-716b-98eb95ceaa47@cs.ucla.edu>
	<83d163z6dy.fsf@gnu.org>
	<73477c99-1600-a53d-d84f-737837d0f91f@cs.ucla.edu>
	<83poa2ya8j.fsf@gnu.org>
	<21b0ba97-ed49-43ae-e86f-63fba762353a@cs.ucla.edu>
NNTP-Posting-Host: blaine.gmane.org
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="------------B86F4B054CFDBC620A1980E2"
X-Trace: blaine.gmane.org 1507168133 10555 195.159.176.226 (5 Oct 2017 01:48:53 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Thu, 5 Oct 2017 01:48:53 +0000 (UTC)
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
	Thunderbird/52.3.0
Cc: p.stephani2@gmail.com, emacs-devel@gnu.org
To: Eli Zaretskii <eliz@gnu.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Thu Oct 05 03:48:47 2017
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by blaine.gmane.org with esmtp (Exim 4.84_2)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1dzvHK-0001VW-0A
	for ged-emacs-devel@m.gmane.org; Thu, 05 Oct 2017 03:48:46 +0200
Original-Received: from localhost ([::1]:37430 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1dzvHP-0006yL-S1
	for ged-emacs-devel@m.gmane.org; Wed, 04 Oct 2017 21:48:51 -0400
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:50174)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <eggert@cs.ucla.edu>) id 1dzvGo-0006yE-0Y
	for emacs-devel@gnu.org; Wed, 04 Oct 2017 21:48:16 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <eggert@cs.ucla.edu>) id 1dzvGm-0004eP-5h
	for emacs-devel@gnu.org; Wed, 04 Oct 2017 21:48:14 -0400
Original-Received: from zimbra.cs.ucla.edu ([131.179.128.68]:37358)
	by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
	(Exim 4.71) (envelope-from <eggert@cs.ucla.edu>)
	id 1dzvGi-0004a3-6e; Wed, 04 Oct 2017 21:48:08 -0400
Original-Received: from localhost (localhost [127.0.0.1])
	by zimbra.cs.ucla.edu (Postfix) with ESMTP id 72E44160E2F;
	Wed,  4 Oct 2017 18:48:06 -0700 (PDT)
Original-Received: from zimbra.cs.ucla.edu ([127.0.0.1])
	by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032)
	with ESMTP id APwvJGIenpq2; Wed,  4 Oct 2017 18:48:05 -0700 (PDT)
Original-Received: from localhost (localhost [127.0.0.1])
	by zimbra.cs.ucla.edu (Postfix) with ESMTP id EADA7160E30;
	Wed,  4 Oct 2017 18:48:04 -0700 (PDT)
X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu
Original-Received: from zimbra.cs.ucla.edu ([127.0.0.1])
	by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026)
	with ESMTP id XynBvvk8TYLs; Wed,  4 Oct 2017 18:48:04 -0700 (PDT)
Original-Received: from Penguin.CS.UCLA.EDU (Penguin.CS.UCLA.EDU [131.179.64.200])
	by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id C6F48160E2F;
	Wed,  4 Oct 2017 18:48:04 -0700 (PDT)
In-Reply-To: <21b0ba97-ed49-43ae-e86f-63fba762353a@cs.ucla.edu>
Content-Language: en-US
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [fuzzy]
X-Received-From: 131.179.128.68
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel/>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: "Emacs-devel" <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Xref: news.gmane.org gmane.emacs.devel:219095
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/219095>

This is a multi-part message in MIME format.
--------------B86F4B054CFDBC620A1980E2
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

On 10/04/2017 02:24 PM, Paul Eggert wrote:
> The code snippets I've seen so far in this thread are not enough 
> context to judge whether an exception would be helpful in this case.

Sorry, I looked only at this month's part of the thead. When I went back 
to last month's I found this:

http://lists.gnu.org/archive/html/emacs-devel/2017-09/msg01080.html

for which I suggest the attached as a simplification. One idea here is 
that there is no need for eassert (E) unless there's a genuine doubt 
that E will be true (in some cases the removed eassert (E) calls were 
ineffective anyway, due to preceding eassume (E) calls). The patch cuts 
down the number of integer overflow checks to six in json.c, if I'm 
counting correctly, and that should be good enough.


--------------B86F4B054CFDBC620A1980E2
Content-Type: text/x-patch;
 name="0001-Minor-JSON-cleanups-mostly-for-overflow.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="0001-Minor-JSON-cleanups-mostly-for-overflow.patch"

>From 9a5c6a9eab6cb5ddbbb1f1c0940a561f6895b478 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Wed, 4 Oct 2017 18:38:07 -0700
Subject: [PATCH] Minor JSON cleanups, mostly for overflow

* src/json.c (json_has_prefix): Simplify via strncmp.
(json_has_suffix): Indent a la GNU.
(json_make_string): Take size_t, not ptrdiff_t, and check its
range.  Simplify callers accordingly.
(json_out_of_memory, json_to_lisp): Just call memory_full.
(check_string_without_embedded_nulls): Use strlen, not memchr;
it is typically faster.
(lisp_to_json_toplevel_1, json_to_lisp): Do not bother with
_GL_ARG_NONNULL on static functions; it is not worth the trouble.
(lisp_to_json_toplevel_1, lisp_to_json, json_read_buffer_callback)
(define_error):
Remove useless checks.
(json_insert): Just call buffer_overflow.
(json_to_lisp): Just signal overflow error, to be consistent with
other signalers.  Use allocate_vector instead of Fmake_vector,
to avoid need for initializing vector twice.  Use make_hash_table
instead of Fmake_hash_table, as it is a bit simpler.
---
 src/json.c | 93 +++++++++++++++++++++-----------------------------------------
 1 file changed, 32 insertions(+), 61 deletions(-)

diff --git a/src/json.c b/src/json.c
index 79be55bd54..5138315da1 100644
--- a/src/json.c
+++ b/src/json.c
@@ -31,9 +31,7 @@ along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.  */
 static bool
 json_has_prefix (const char *string, const char *prefix)
 {
-  size_t string_len = strlen (string);
-  size_t prefix_len = strlen (prefix);
-  return string_len >= prefix_len && memcmp (string, prefix, prefix_len) == 0;
+  return strncmp (string, prefix, strlen (prefix)) == 0;
 }
 
 static bool
@@ -41,22 +39,23 @@ json_has_suffix (const char *string, const char *suffix)
 {
   size_t string_len = strlen (string);
   size_t suffix_len = strlen (suffix);
-  return string_len >= suffix_len
-    && memcmp (string + string_len - suffix_len, suffix, suffix_len) == 0;
+  return (string_len >= suffix_len
+          && (memcmp (string + string_len - suffix_len, suffix, suffix_len)
+              == 0));
 }
 
 static Lisp_Object
-json_make_string (const char *data, ptrdiff_t size)
+json_make_string (const char *data, size_t size)
 {
+  if (PTRDIFF_MAX < size)
+    string_overflow ();
   return make_specified_string (data, -1, size, true);
 }
 
 static Lisp_Object
 json_build_string (const char *data)
 {
-  size_t size = strlen (data);
-  eassert (size < PTRDIFF_MAX);
-  return json_make_string (data, size);
+  return json_make_string (data, strlen (data));
 }
 
 static Lisp_Object
@@ -68,7 +67,7 @@ json_encode (Lisp_Object string)
 static _Noreturn void
 json_out_of_memory (void)
 {
-  xsignal0 (Qjson_out_of_memory);
+  memory_full (SIZE_MAX);
 }
 
 static _Noreturn void
@@ -97,7 +96,7 @@ static void
 check_string_without_embedded_nulls (Lisp_Object object)
 {
   CHECK_STRING (object);
-  CHECK_TYPE (memchr (SDATA (object), '\0', SBYTES (object)) == NULL,
+  CHECK_TYPE (strlen (SSDATA (object)) == SBYTES (object),
               Qstring_without_embedded_nulls_p, object);
 }
 
@@ -114,15 +113,12 @@ static json_t *lisp_to_json (Lisp_Object) ATTRIBUTE_WARN_UNUSED_RESULT;
 /* This returns Lisp_Object so we can use unbind_to.  The return value
    is always nil.  */
 
-static _GL_ARG_NONNULL ((2)) Lisp_Object
+static Lisp_Object
 lisp_to_json_toplevel_1 (Lisp_Object lisp, json_t **json)
 {
   if (VECTORP (lisp))
     {
       ptrdiff_t size = ASIZE (lisp);
-      eassert (size >= 0);
-      if (size > SIZE_MAX)
-        xsignal1 (Qoverflow_error, build_string ("vector is too long"));
       *json = json_check (json_array ());
       ptrdiff_t count = SPECPDL_INDEX ();
       record_unwind_protect_ptr (json_release_object, json);
@@ -194,9 +190,6 @@ lisp_to_json (Lisp_Object lisp)
     {
       Lisp_Object encoded = json_encode (lisp);
       ptrdiff_t size = SBYTES (encoded);
-      eassert (size >= 0);
-      if (size > SIZE_MAX)
-        xsignal1 (Qoverflow_error, build_string ("string is too long"));
       return json_check (json_stringn (SSDATA (encoded), size));
     }
 
@@ -239,7 +232,7 @@ json_insert (void *data)
 {
   const struct json_buffer_and_size *buffer_and_size = data;
   if (buffer_and_size->size > PTRDIFF_MAX)
-    xsignal1 (Qoverflow_error, build_string ("buffer too large"));
+    buffer_overflow ();
   insert (buffer_and_size->buffer, buffer_and_size->size);
   return Qnil;
 }
@@ -289,7 +282,7 @@ OBJECT.  */)
   return unbind_to (count, Qnil);
 }
 
-static _GL_ARG_NONNULL ((1)) Lisp_Object
+static Lisp_Object
 json_to_lisp (json_t *json)
 {
   switch (json_typeof (json))
@@ -304,32 +297,26 @@ json_to_lisp (json_t *json)
       {
         json_int_t value = json_integer_value (json);
         if (FIXNUM_OVERFLOW_P (value))
-          xsignal1 (Qoverflow_error,
-                    build_string ("JSON integer is too large"));
+          xsignal0 (Qoverflow_error);
         return make_number (value);
       }
     case JSON_REAL:
       return make_float (json_real_value (json));
     case JSON_STRING:
-      {
-        size_t size = json_string_length (json);
-        if (FIXNUM_OVERFLOW_P (size))
-          xsignal1 (Qoverflow_error, build_string ("JSON string is too long"));
-        return json_make_string (json_string_value (json), size);
-      }
+      return json_make_string (json_string_value (json),
+                               json_string_length (json));
     case JSON_ARRAY:
       {
         if (++lisp_eval_depth > max_lisp_eval_depth)
           xsignal0 (Qjson_object_too_deep);
         size_t size = json_array_size (json);
         if (FIXNUM_OVERFLOW_P (size))
-          xsignal1 (Qoverflow_error, build_string ("JSON array is too long"));
-        Lisp_Object result = Fmake_vector (make_natnum (size), Qunbound);
+          memory_full (size);
+        struct Lisp_Vector *v = allocate_vector (size);
         for (ptrdiff_t i = 0; i < size; ++i)
-          ASET (result, i,
-                json_to_lisp (json_array_get (json, i)));
+	  v->contents[i] = json_to_lisp (json_array_get (json, i));
         --lisp_eval_depth;
-        return result;
+        return make_lisp_ptr (v, Lisp_Vectorlike);
       }
     case JSON_OBJECT:
       {
@@ -337,10 +324,10 @@ json_to_lisp (json_t *json)
           xsignal0 (Qjson_object_too_deep);
         size_t size = json_object_size (json);
         if (FIXNUM_OVERFLOW_P (size))
-          xsignal1 (Qoverflow_error,
-                    build_string ("JSON object has too many elements"));
-        Lisp_Object result = CALLN (Fmake_hash_table, QCtest, Qequal,
-                                    QCsize, make_natnum (size));
+          memory_full (size);
+        Lisp_Object result
+          = make_hash_table (hashtest_equal, size, DEFAULT_REHASH_SIZE,
+                             DEFAULT_REHASH_THRESHOLD, Qnil, false);
         struct Lisp_Hash_Table *h = XHASH_TABLE (result);
         const char *key_str;
         json_t *value;
@@ -399,23 +386,12 @@ json_read_buffer_callback (void *buffer, size_t buflen, void *data)
   /* First, parse from point to the gap or the end of the accessible
      portion, whatever is closer.  */
   ptrdiff_t point = d->point;
-  ptrdiff_t end;
-  {
-    bool overflow = INT_ADD_WRAPV (BUFFER_CEILING_OF (point), 1, &end);
-    eassert (!overflow);
-  }
-  size_t count;
-  {
-    bool overflow = INT_SUBTRACT_WRAPV (end, point, &count);
-    eassert (!overflow);
-  }
+  ptrdiff_t end = BUFFER_CEILING_OF (point) + 1;
+  ptrdiff_t count = end - point;
   if (buflen < count)
     count = buflen;
   memcpy (buffer, BYTE_POS_ADDR (point), count);
-  {
-    bool overflow = INT_ADD_WRAPV (d->point, count, &d->point);
-    eassert (!overflow);
-  }
+  d->point += count;
   return count;
 }
 
@@ -444,14 +420,11 @@ not moved.  */)
   /* Convert and then move point only if everything succeeded.  */
   Lisp_Object lisp = json_to_lisp (object);
 
-  {
-    /* Adjust point by how much we just read.  Do this here because
-       tokener->char_offset becomes incorrect below.  */
-    bool overflow = INT_ADD_WRAPV (point, error.position, &point);
-    eassert (!overflow);
-    eassert (point <= ZV_BYTE);
-    SET_PT_BOTH (BYTE_TO_CHAR (point), point);
-  }
+  /* Adjust point by how much we just read.  Do this here because
+     tokener->char_offset becomes incorrect below.  */
+  eassert (0 <= error.position && error.position <= ZV_BYTE - point);
+  point += error.position;
+  SET_PT_BOTH (BYTE_TO_CHAR (point), point);
 
   return unbind_to (count, lisp);
 }
@@ -462,8 +435,6 @@ not moved.  */)
 static void
 define_error (Lisp_Object name, const char *message, Lisp_Object parent)
 {
-  eassert (SYMBOLP (name));
-  eassert (SYMBOLP (parent));
   Lisp_Object parent_conditions = Fget (parent, Qerror_conditions);
   eassert (CONSP (parent_conditions));
   eassert (!NILP (Fmemq (parent, parent_conditions)));
-- 
2.13.6


--------------B86F4B054CFDBC620A1980E2--