From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: Paul Eggert Newsgroups: gmane.emacs.devel Subject: Re: JSON/YAML/TOML/etc. parsing performance Date: Wed, 4 Oct 2017 18:48:04 -0700 Organization: UCLA Computer Science Department Message-ID: <8c922c27-9de0-7d99-6c26-a94a0387c45e@cs.ucla.edu> References: <87poaqhc63.fsf@lifelogs.com> <8360ceh5f1.fsf@gnu.org> <83h8vl5lf9.fsf@gnu.org> <83r2um3fqi.fsf@gnu.org> <43520b71-9e25-926c-d744-78098dad6441@cs.ucla.edu> <83o9pnzddc.fsf@gnu.org> <472176ce-846b-1f24-716b-98eb95ceaa47@cs.ucla.edu> <83d163z6dy.fsf@gnu.org> <73477c99-1600-a53d-d84f-737837d0f91f@cs.ucla.edu> <83poa2ya8j.fsf@gnu.org> <21b0ba97-ed49-43ae-e86f-63fba762353a@cs.ucla.edu> NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="------------B86F4B054CFDBC620A1980E2" X-Trace: blaine.gmane.org 1507168133 10555 195.159.176.226 (5 Oct 2017 01:48:53 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Thu, 5 Oct 2017 01:48:53 +0000 (UTC) User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.3.0 Cc: p.stephani2@gmail.com, emacs-devel@gnu.org To: Eli Zaretskii Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Thu Oct 05 03:48:47 2017 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dzvHK-0001VW-0A for ged-emacs-devel@m.gmane.org; Thu, 05 Oct 2017 03:48:46 +0200 Original-Received: from localhost ([::1]:37430 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dzvHP-0006yL-S1 for ged-emacs-devel@m.gmane.org; Wed, 04 Oct 2017 21:48:51 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:50174) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dzvGo-0006yE-0Y for emacs-devel@gnu.org; Wed, 04 Oct 2017 21:48:16 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1dzvGm-0004eP-5h for emacs-devel@gnu.org; Wed, 04 Oct 2017 21:48:14 -0400 Original-Received: from zimbra.cs.ucla.edu ([131.179.128.68]:37358) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1dzvGi-0004a3-6e; Wed, 04 Oct 2017 21:48:08 -0400 Original-Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 72E44160E2F; Wed, 4 Oct 2017 18:48:06 -0700 (PDT) Original-Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id APwvJGIenpq2; Wed, 4 Oct 2017 18:48:05 -0700 (PDT) Original-Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id EADA7160E30; Wed, 4 Oct 2017 18:48:04 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Original-Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id XynBvvk8TYLs; Wed, 4 Oct 2017 18:48:04 -0700 (PDT) Original-Received: from Penguin.CS.UCLA.EDU (Penguin.CS.UCLA.EDU [131.179.64.200]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id C6F48160E2F; Wed, 4 Oct 2017 18:48:04 -0700 (PDT) In-Reply-To: <21b0ba97-ed49-43ae-e86f-63fba762353a@cs.ucla.edu> Content-Language: en-US X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [fuzzy] X-Received-From: 131.179.128.68 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.org gmane.emacs.devel:219095 Archived-At: This is a multi-part message in MIME format. --------------B86F4B054CFDBC620A1980E2 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit On 10/04/2017 02:24 PM, Paul Eggert wrote: > The code snippets I've seen so far in this thread are not enough > context to judge whether an exception would be helpful in this case. Sorry, I looked only at this month's part of the thead. When I went back to last month's I found this: http://lists.gnu.org/archive/html/emacs-devel/2017-09/msg01080.html for which I suggest the attached as a simplification. One idea here is that there is no need for eassert (E) unless there's a genuine doubt that E will be true (in some cases the removed eassert (E) calls were ineffective anyway, due to preceding eassume (E) calls). The patch cuts down the number of integer overflow checks to six in json.c, if I'm counting correctly, and that should be good enough. --------------B86F4B054CFDBC620A1980E2 Content-Type: text/x-patch; name="0001-Minor-JSON-cleanups-mostly-for-overflow.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0001-Minor-JSON-cleanups-mostly-for-overflow.patch" >From 9a5c6a9eab6cb5ddbbb1f1c0940a561f6895b478 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Wed, 4 Oct 2017 18:38:07 -0700 Subject: [PATCH] Minor JSON cleanups, mostly for overflow * src/json.c (json_has_prefix): Simplify via strncmp. (json_has_suffix): Indent a la GNU. (json_make_string): Take size_t, not ptrdiff_t, and check its range. Simplify callers accordingly. (json_out_of_memory, json_to_lisp): Just call memory_full. (check_string_without_embedded_nulls): Use strlen, not memchr; it is typically faster. (lisp_to_json_toplevel_1, json_to_lisp): Do not bother with _GL_ARG_NONNULL on static functions; it is not worth the trouble. (lisp_to_json_toplevel_1, lisp_to_json, json_read_buffer_callback) (define_error): Remove useless checks. (json_insert): Just call buffer_overflow. (json_to_lisp): Just signal overflow error, to be consistent with other signalers. Use allocate_vector instead of Fmake_vector, to avoid need for initializing vector twice. Use make_hash_table instead of Fmake_hash_table, as it is a bit simpler. --- src/json.c | 93 +++++++++++++++++++++----------------------------------------- 1 file changed, 32 insertions(+), 61 deletions(-) diff --git a/src/json.c b/src/json.c index 79be55bd54..5138315da1 100644 --- a/src/json.c +++ b/src/json.c @@ -31,9 +31,7 @@ along with GNU Emacs. If not, see . */ static bool json_has_prefix (const char *string, const char *prefix) { - size_t string_len = strlen (string); - size_t prefix_len = strlen (prefix); - return string_len >= prefix_len && memcmp (string, prefix, prefix_len) == 0; + return strncmp (string, prefix, strlen (prefix)) == 0; } static bool @@ -41,22 +39,23 @@ json_has_suffix (const char *string, const char *suffix) { size_t string_len = strlen (string); size_t suffix_len = strlen (suffix); - return string_len >= suffix_len - && memcmp (string + string_len - suffix_len, suffix, suffix_len) == 0; + return (string_len >= suffix_len + && (memcmp (string + string_len - suffix_len, suffix, suffix_len) + == 0)); } static Lisp_Object -json_make_string (const char *data, ptrdiff_t size) +json_make_string (const char *data, size_t size) { + if (PTRDIFF_MAX < size) + string_overflow (); return make_specified_string (data, -1, size, true); } static Lisp_Object json_build_string (const char *data) { - size_t size = strlen (data); - eassert (size < PTRDIFF_MAX); - return json_make_string (data, size); + return json_make_string (data, strlen (data)); } static Lisp_Object @@ -68,7 +67,7 @@ json_encode (Lisp_Object string) static _Noreturn void json_out_of_memory (void) { - xsignal0 (Qjson_out_of_memory); + memory_full (SIZE_MAX); } static _Noreturn void @@ -97,7 +96,7 @@ static void check_string_without_embedded_nulls (Lisp_Object object) { CHECK_STRING (object); - CHECK_TYPE (memchr (SDATA (object), '\0', SBYTES (object)) == NULL, + CHECK_TYPE (strlen (SSDATA (object)) == SBYTES (object), Qstring_without_embedded_nulls_p, object); } @@ -114,15 +113,12 @@ static json_t *lisp_to_json (Lisp_Object) ATTRIBUTE_WARN_UNUSED_RESULT; /* This returns Lisp_Object so we can use unbind_to. The return value is always nil. */ -static _GL_ARG_NONNULL ((2)) Lisp_Object +static Lisp_Object lisp_to_json_toplevel_1 (Lisp_Object lisp, json_t **json) { if (VECTORP (lisp)) { ptrdiff_t size = ASIZE (lisp); - eassert (size >= 0); - if (size > SIZE_MAX) - xsignal1 (Qoverflow_error, build_string ("vector is too long")); *json = json_check (json_array ()); ptrdiff_t count = SPECPDL_INDEX (); record_unwind_protect_ptr (json_release_object, json); @@ -194,9 +190,6 @@ lisp_to_json (Lisp_Object lisp) { Lisp_Object encoded = json_encode (lisp); ptrdiff_t size = SBYTES (encoded); - eassert (size >= 0); - if (size > SIZE_MAX) - xsignal1 (Qoverflow_error, build_string ("string is too long")); return json_check (json_stringn (SSDATA (encoded), size)); } @@ -239,7 +232,7 @@ json_insert (void *data) { const struct json_buffer_and_size *buffer_and_size = data; if (buffer_and_size->size > PTRDIFF_MAX) - xsignal1 (Qoverflow_error, build_string ("buffer too large")); + buffer_overflow (); insert (buffer_and_size->buffer, buffer_and_size->size); return Qnil; } @@ -289,7 +282,7 @@ OBJECT. */) return unbind_to (count, Qnil); } -static _GL_ARG_NONNULL ((1)) Lisp_Object +static Lisp_Object json_to_lisp (json_t *json) { switch (json_typeof (json)) @@ -304,32 +297,26 @@ json_to_lisp (json_t *json) { json_int_t value = json_integer_value (json); if (FIXNUM_OVERFLOW_P (value)) - xsignal1 (Qoverflow_error, - build_string ("JSON integer is too large")); + xsignal0 (Qoverflow_error); return make_number (value); } case JSON_REAL: return make_float (json_real_value (json)); case JSON_STRING: - { - size_t size = json_string_length (json); - if (FIXNUM_OVERFLOW_P (size)) - xsignal1 (Qoverflow_error, build_string ("JSON string is too long")); - return json_make_string (json_string_value (json), size); - } + return json_make_string (json_string_value (json), + json_string_length (json)); case JSON_ARRAY: { if (++lisp_eval_depth > max_lisp_eval_depth) xsignal0 (Qjson_object_too_deep); size_t size = json_array_size (json); if (FIXNUM_OVERFLOW_P (size)) - xsignal1 (Qoverflow_error, build_string ("JSON array is too long")); - Lisp_Object result = Fmake_vector (make_natnum (size), Qunbound); + memory_full (size); + struct Lisp_Vector *v = allocate_vector (size); for (ptrdiff_t i = 0; i < size; ++i) - ASET (result, i, - json_to_lisp (json_array_get (json, i))); + v->contents[i] = json_to_lisp (json_array_get (json, i)); --lisp_eval_depth; - return result; + return make_lisp_ptr (v, Lisp_Vectorlike); } case JSON_OBJECT: { @@ -337,10 +324,10 @@ json_to_lisp (json_t *json) xsignal0 (Qjson_object_too_deep); size_t size = json_object_size (json); if (FIXNUM_OVERFLOW_P (size)) - xsignal1 (Qoverflow_error, - build_string ("JSON object has too many elements")); - Lisp_Object result = CALLN (Fmake_hash_table, QCtest, Qequal, - QCsize, make_natnum (size)); + memory_full (size); + Lisp_Object result + = make_hash_table (hashtest_equal, size, DEFAULT_REHASH_SIZE, + DEFAULT_REHASH_THRESHOLD, Qnil, false); struct Lisp_Hash_Table *h = XHASH_TABLE (result); const char *key_str; json_t *value; @@ -399,23 +386,12 @@ json_read_buffer_callback (void *buffer, size_t buflen, void *data) /* First, parse from point to the gap or the end of the accessible portion, whatever is closer. */ ptrdiff_t point = d->point; - ptrdiff_t end; - { - bool overflow = INT_ADD_WRAPV (BUFFER_CEILING_OF (point), 1, &end); - eassert (!overflow); - } - size_t count; - { - bool overflow = INT_SUBTRACT_WRAPV (end, point, &count); - eassert (!overflow); - } + ptrdiff_t end = BUFFER_CEILING_OF (point) + 1; + ptrdiff_t count = end - point; if (buflen < count) count = buflen; memcpy (buffer, BYTE_POS_ADDR (point), count); - { - bool overflow = INT_ADD_WRAPV (d->point, count, &d->point); - eassert (!overflow); - } + d->point += count; return count; } @@ -444,14 +420,11 @@ not moved. */) /* Convert and then move point only if everything succeeded. */ Lisp_Object lisp = json_to_lisp (object); - { - /* Adjust point by how much we just read. Do this here because - tokener->char_offset becomes incorrect below. */ - bool overflow = INT_ADD_WRAPV (point, error.position, &point); - eassert (!overflow); - eassert (point <= ZV_BYTE); - SET_PT_BOTH (BYTE_TO_CHAR (point), point); - } + /* Adjust point by how much we just read. Do this here because + tokener->char_offset becomes incorrect below. */ + eassert (0 <= error.position && error.position <= ZV_BYTE - point); + point += error.position; + SET_PT_BOTH (BYTE_TO_CHAR (point), point); return unbind_to (count, lisp); } @@ -462,8 +435,6 @@ not moved. */) static void define_error (Lisp_Object name, const char *message, Lisp_Object parent) { - eassert (SYMBOLP (name)); - eassert (SYMBOLP (parent)); Lisp_Object parent_conditions = Fget (parent, Qerror_conditions); eassert (CONSP (parent_conditions)); eassert (!NILP (Fmemq (parent, parent_conditions))); -- 2.13.6 --------------B86F4B054CFDBC620A1980E2--