From mboxrd@z Thu Jan 1 00:00:00 1970
From: Philipp Stephani
Newsgroups: gmane.emacs.devel
Subject: Re: JSON/YAML/TOML/etc. parsing performance
Date: Tue, 03 Oct 2017 12:26:32 +0000
To: Eli Zaretskii
Cc: emacs-devel@gnu.org
References: <87poaqhc63.fsf@lifelogs.com> <8360ceh5f1.fsf@gnu.org> <83h8vl5lf9.fsf@gnu.org> <83r2um3fqi.fsf@gnu.org>
In-Reply-To: <83r2um3fqi.fsf@gnu.org>

Eli Zaretskii wrote on Sun, 1 Oct 2017 at 20:06:

> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Sat, 30 Sep 2017 22:02:55 +0000
> > Cc: emacs-devel@gnu.org
> >
> > Subject: [PATCH] Implement native JSON support using Jansson
>
> Thanks, a few more comments/questions.
>
> > +#if __has_attribute (warn_unused_result)
> > +# define ATTRIBUTE_WARN_UNUSED_RESULT __attribute__ ((__warn_unused_result__))
> > +#else
> > +# define ATTRIBUTE_WARN_UNUSED_RESULT
> > +#endif
>
> Hmm... why do we need this attribute?  You use it with 2 static
> functions, so this sounds like a left-over from the development stage?

It's not strictly needed (and if you don't like it, I can remove it), but
it helps catch memory leaks.

> > +static Lisp_Object
> > +internal_catch_all_1 (Lisp_Object (*function) (void *), void *argument)
>
> Can you tell why you needed this (and the similar internal_catch_all)?
> Is that only because the callbacks could signal an error, or is there
> another reason?  If the former, I'd prefer to simplify the code and
> its maintenance by treating the error condition in a less drastic
> manner, and avoiding the call to xsignal.

The callbacks (especially insert and before-/after-change-hook) can exit
nonlocally, but these nonlocal exits must not escape the Jansson callback.
Therefore all nonlocal exits must be caught here.

> > +static _GL_ARG_NONNULL ((2)) Lisp_Object
> > +lisp_to_json_toplevel_1 (Lisp_Object lisp, json_t **json)
> > +{
> > +  if (VECTORP (lisp))
> > +    {
> > +      ptrdiff_t size = ASIZE (lisp);
> > +      eassert (size >= 0);
> > +      if (size > SIZE_MAX)
> > +        xsignal1 (Qoverflow_error, build_string ("vector is too long"));
>
> I think this error text is too vague.  Can we come up with something
> that describes the problem more accurately?

Maybe, but it's probably not worth it because I don't think we have many
architectures where PTRDIFF_MAX > SIZE_MAX.

> And btw, how can size be greater than SIZE_MAX in this case?  This is
> a valid Lisp object, isn't it?  (There are more such tests in the
> patch, e.g. in lisp_to_json, and I think they, too, are redundant.)

Depends on the range of ptrdiff_t and size_t.  IIUC nothing in the C
standard guarantees PTRDIFF_MAX <= SIZE_MAX.  If we want to guarantee
that, we should probably add at least a static assertion.
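
Concretely, the assertion I have in mind would be something along these
lines (only a sketch; in Emacs itself it would presumably be spelled with
the gnulib `verify' macro rather than raw C11 _Static_assert):

#include <stdint.h>

/* Compile-time guarantee that every ptrdiff_t value is representable
   as size_t, so a check like `size > SIZE_MAX' above can never fire.  */
_Static_assert ((uintmax_t) PTRDIFF_MAX <= (uintmax_t) SIZE_MAX,
                "ptrdiff_t values must fit in size_t");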

> > +      *json = json_check (json_array ());
> > +      ptrdiff_t count = SPECPDL_INDEX ();
> > +      record_unwind_protect_ptr (json_release_object, json);
> > +      for (ptrdiff_t i = 0; i < size; ++i)
> > +        {
> > +          int status
> > +            = json_array_append_new (*json, lisp_to_json (AREF (lisp, i)));
> > +          if (status == -1)
> > +            json_out_of_memory ();
> > +          eassert (status == 0);
> > +        }
> > +      eassert (json_array_size (*json) == size);
> > +      clear_unwind_protect (count);
> > +      return unbind_to (count, Qnil);
>
> This, too, sounds more complex than it should: you record
> unwind-protect just so lisp_to_json's subroutines could signal an
> error due to insufficient memory, right?  Why can't we have the
> out-of-memory check only inside this loop, which you already do, and
> avoid the checks on lower levels (which undoubtedly cost us extra
> cycles)?  What do those extra checks in json_check buy us? the errors
> they signal are no more informative than the one in the loop, AFAICT.

I don't understand what you mean.  We need to check the return values of
all functions if we want to use them later.

> > +static Lisp_Object
> > +json_insert (void *data)
> > +{
> > +  const struct json_buffer_and_size *buffer_and_size = data;
> > +  if (buffer_and_size->size > PTRDIFF_MAX)
> > +    xsignal1 (Qoverflow_error, build_string ("buffer too large"));
> > +  insert (buffer_and_size->buffer, buffer_and_size->size);
>
> I don't think we need this test here, as 'insert' already has the
> equivalent test in one of its subroutines.

It can't, because it takes the byte length as ptrdiff_t.  We need to check
beforehand whether the size is actually in the valid range of ptrdiff_t.

> > +    case JSON_INTEGER:
> > +      {
> > +        json_int_t value = json_integer_value (json);
> > +        if (FIXNUM_OVERFLOW_P (value))
> > +          xsignal1 (Qoverflow_error,
> > +                    build_string ("JSON integer is too large"));
> > +        return make_number (value);
>
> This overflow test is also redundant, as make_number already does it.

It can't, because json_int_t can be larger than EMACS_INT.  Also,
make_number doesn't contain any checks.

> > +    case JSON_STRING:
> > +      {
> > +        size_t size = json_string_length (json);
> > +        if (FIXNUM_OVERFLOW_P (size))
> > +          xsignal1 (Qoverflow_error, build_string ("JSON string is too long"));
> > +        return json_make_string (json_string_value (json), size);
>
> Once again, the overflow test is redundant, as make_specified_string
> (called by json_make_string) already includes an equivalent test.

And once again, we need to check at least whether the size fits into
ptrdiff_t.

> > +    case JSON_ARRAY:
> > +      {
> > +        if (++lisp_eval_depth > max_lisp_eval_depth)
> > +          xsignal0 (Qjson_object_too_deep);
> > +        size_t size = json_array_size (json);
> > +        if (FIXNUM_OVERFLOW_P (size))
> > +          xsignal1 (Qoverflow_error, build_string ("JSON array is too long"));
> > +        Lisp_Object result = Fmake_vector (make_natnum (size), Qunbound);
>
> Likewise here: Fmake_vector makes sure the size is not larger than
> allowed.

Same as above: it can't.

> > +    case JSON_OBJECT:
> > +      {
> > +        if (++lisp_eval_depth > max_lisp_eval_depth)
> > +          xsignal0 (Qjson_object_too_deep);
> > +        size_t size = json_object_size (json);
> > +        if (FIXNUM_OVERFLOW_P (size))
> > +          xsignal1 (Qoverflow_error,
> > +                    build_string ("JSON object has too many elements"));
> > +        Lisp_Object result = CALLN (Fmake_hash_table, QCtest, Qequal,
> > +                                    QCsize, make_natnum (size));
>
> Likewise here: make_natnum does the equivalent test.

It doesn't and can't.

> > +    /* Adjust point by how much we just read.  Do this here because
> > +       tokener->char_offset becomes incorrect below.  */
> > +    bool overflow = INT_ADD_WRAPV (point, error.position, &point);
> > +    eassert (!overflow);
> > +    eassert (point <= ZV_BYTE);
> > +    SET_PT_BOTH (BYTE_TO_CHAR (point), point);
>
> It's better to use SET_PT here, I think.

That's not possible because we don't have the character offset.  (And I
think using SET_PT (BYTE_TO_CHAR (point)) would just require needlessly
recalculating point.)

> > +  define_error (Qjson_out_of_memory, "no free memory for creating JSON object",
>
> I'd prefer "not enough memory for creating JSON object".

Done.
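
To make the JSON_INTEGER point above concrete, here is a minimal
standalone sketch of the range check (the emacs_int/EMACS_INT_* names
below are illustrative stand-ins, not the real Emacs definitions; only
the Jansson calls are the real API):

#include <jansson.h>
#include <limits.h>
#include <stdbool.h>

typedef long emacs_int;             /* stand-in for EMACS_INT */
#define EMACS_INT_MAX LONG_MAX      /* stand-in bounds */
#define EMACS_INT_MIN LONG_MIN

/* json_int_t is usually `long long', which may be wider than the type
   Lisp integers are built from, so the value has to be range-checked
   before conversion instead of being silently truncated.  */
static bool
json_integer_to_emacs_int (const json_t *json, emacs_int *result)
{
  json_int_t value = json_integer_value (json);
  if (value < EMACS_INT_MIN || value > EMACS_INT_MAX)
    return false;                   /* caller would signal overflow-error */
  *result = (emacs_int) value;
  return true;
}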