From mboxrd@z Thu Jan  1 00:00:00 1970
From: Philipp Stephani
Newsgroups: gmane.emacs.devel
Subject: Re: JSON/YAML/TOML/etc. parsing performance
Date: Tue, 03 Oct 2017 15:52:03 +0000
To: Eli Zaretskii
Cc: emacs-devel@gnu.org
References: <87poaqhc63.fsf@lifelogs.com> <8360ceh5f1.fsf@gnu.org> <83h8vl5lf9.fsf@gnu.org> <83r2um3fqi.fsf@gnu.org> <83fub01c4m.fsf@gnu.org>
In-Reply-To: <83fub01c4m.fsf@gnu.org>
List-Id: "Emacs development discussions."
Eli Zaretskii wrote on Tue, Oct 3, 2017 at 17:32:

> > > > +static Lisp_Object
> > > > +internal_catch_all_1 (Lisp_Object (*function) (void *), void *argument)
> > >
> > > Can you tell why you needed this (and the similar internal_catch_all)?
> > > Is that only because the callbacks could signal an error, or is there
> > > another reason?  If the former, I'd prefer to simplify the code and
> > > its maintenance by treating the error condition in a less drastic
> > > manner, and avoiding the call to xsignal.
> >
> > The callbacks (especially insert and before-/after-change-hook) can exit
> > nonlocally, but these nonlocal exits may not escape the Jansson callback.
> > Therefore all nonlocal exits must be caught here.
>
> Why can't you use record_unwind_protect, as we normally do in these
> situations?

How would that help? record_unwind_protect can't stop nonlocal exits.

> > > > +static _GL_ARG_NONNULL ((2)) Lisp_Object
> > > > +lisp_to_json_toplevel_1 (Lisp_Object lisp, json_t **json)
> > > > +{
> > > > +  if (VECTORP (lisp))
> > > > +    {
> > > > +      ptrdiff_t size = ASIZE (lisp);
> > > > +      eassert (size >= 0);
> > > > +      if (size > SIZE_MAX)
> > > > +        xsignal1 (Qoverflow_error, build_string ("vector is too long"));
> > >
> > > I think this error text is too vague.  Can we come up with something
> > > that describes the problem more accurately?
> >
> > Maybe, but it's probably not worth it because I don't think we have many
> > architectures where PTRDIFF_MAX > SIZE_MAX.
>
> Then why do we punish all the platforms with this runtime check?

If you think this cannot happen, we can turn it into a runtime or compile-time assertion.
>
> > > And btw, how can size be greater than SIZE_MAX in this case?  This is
> > > a valid Lisp object, isn't it?  (There are more such tests in the
> > > patch, e.g. in lisp_to_json, and I think they, too, are redundant.)
> >
> > Depends on the range of ptrdiff_t and size_t. IIUC nothing in the C
> > standard guarantees PTRDIFF_MAX <= SIZE_MAX.
>
> I wasn't talking about PTRDIFF_MAX, I was talking about 'size', which
> is the number of bytes in a Lisp string.  Since that Lisp string is a
> valid Lisp object, how can its size be greater than SIZE_MAX?  I don't
> think there's a way of creating such a Lisp string in Emacs, because
> functions that allocate memory for strings will prevent that.

Then I think we should at least add an assertion to document this.

> > > > +      *json = json_check (json_array ());
> > > > +      ptrdiff_t count = SPECPDL_INDEX ();
> > > > +      record_unwind_protect_ptr (json_release_object, json);
> > > > +      for (ptrdiff_t i = 0; i < size; ++i)
> > > > +        {
> > > > +          int status
> > > > +            = json_array_append_new (*json, lisp_to_json (AREF (lisp, i)));
> > > > +          if (status == -1)
> > > > +            json_out_of_memory ();
> > > > +          eassert (status == 0);
> > > > +        }
> > > > +      eassert (json_array_size (*json) == size);
> > > > +      clear_unwind_protect (count);
> > > > +      return unbind_to (count, Qnil);
> > >
> > > This, too, sounds more complex than it should: you record
> > > unwind-protect just so lisp_to_json's subroutines could signal an
> > > error due to insufficient memory, right?  Why can't we have the
> > > out-of-memory check only inside this loop, which you already do, and
> > > avoid the checks on lower levels (which undoubtedly cost us extra
> > > cycles)?  What do those extra checks in json_check buy us? the errors
> > > they signal are no more informative than the one in the loop, AFAICT.
> >
> > I don't understand what you mean. We need to check the return values of all
> > functions if we want to use them later.
>
> Yes, but what problems can cause these return values to be invalid?
> AFAICT, only out-of-memory conditions, and that can be checked only
> once, there's no need to check every single allocation, because once
> an allocation fails, all the rest will too.

But if the first succeeds, the second can still fail, so we do need to check all of them.

> > > > +static Lisp_Object
> > > > +json_insert (void *data)
> > > > +{
> > > > +  const struct json_buffer_and_size *buffer_and_size = data;
> > > > +  if (buffer_and_size->size > PTRDIFF_MAX)
> > > > +    xsignal1 (Qoverflow_error, build_string ("buffer too large"));
> > > > +  insert (buffer_and_size->buffer, buffer_and_size->size);
> > >
> > > I don't think we need this test here, as 'insert' already has the
> > > equivalent test in one of its subroutines.
> >
> > It can't, because it takes the byte length as ptrdiff_t. We need to check
> > before whether the size is actually in the valid range of ptrdiff_t.
>
> I'm sorry, but I don't see why we should support such exotic
> situations only for this one feature.  In all other cases we use
> either ptrdiff_t type or EMACS_INT type, and these issues disappear
> then.  Trying to support the SIZE_MAX > PTRDIFF_MAX situation causes
> the code to be much more complicated, harder to maintain, and more
> expensive at run time than it should be.

We can't avoid these checks. The API returns size_t, so we can only assume that the numbers are in the valid range of size_t, which is larger than the range of nonnegative ptrdiff_t values. There's no way around this.

> I'm not even sure there are
> such platforms out there that Emacs supports,

All platforms that I know of have SIZE_MAX > PTRDIFF_MAX.

> but if there are, we
> already have a gazillion problems like that all over our code.

Just because other parts of the codebase are buggy doesn't mean we need to introduce more bugs in new code.

> I
> object to having such code just for this reason, sorry.

We can't avoid it.
>
> > > > +    case JSON_INTEGER:
> > > > +      {
> > > > +        json_int_t value = json_integer_value (json);
> > > > +        if (FIXNUM_OVERFLOW_P (value))
> > > > +          xsignal1 (Qoverflow_error,
> > > > +                    build_string ("JSON integer is too large"));
> > > > +        return make_number (value);
> > >
> > > This overflow test is also redundant, as make_number already does it.
> >
> > It can't, because json_int_t can be larger than EMACS_INT.
>
> OK, but then I think we should consider returning a float value, or a
> cons of 2 integers.  If these situations are frequent enough, users
> will thank us, and if they are very infrequent, they will never see
> such values, and we gain code simplicity and fewer non-local exits.

Returning a float (using make_natnum_or_float) might work, but in the end I've decided against it because it could silently drop precision. I think that's worse than signaling an error.

> > > > +    case JSON_STRING:
> > > > +      {
> > > > +        size_t size = json_string_length (json);
> > > > +        if (FIXNUM_OVERFLOW_P (size))
> > > > +          xsignal1 (Qoverflow_error, build_string ("JSON string is too long"));
> > > > +        return json_make_string (json_string_value (json), size);
> > >
> > > Once again, the overflow test is redundant, as make_specified_string
> > > (called by json_make_string) already includes an equivalent test.
> >
> > And once again, we need to check at least whether the size fits into
> > ptrdiff_t.
>
> No, we don't, as we don't in other similar cases.

I don't understand why you think these checks aren't necessary. Converting between integral types when the number is out of range for the destination type results in an implementation-defined result, i.e. it's unportable. Even assuming the GCC convention, performing such conversions results in dangerously incorrect values.
>
> > > > +    case JSON_ARRAY:
> > > > +      {
> > > > +        if (++lisp_eval_depth > max_lisp_eval_depth)
> > > > +          xsignal0 (Qjson_object_too_deep);
> > > > +        size_t size = json_array_size (json);
> > > > +        if (FIXNUM_OVERFLOW_P (size))
> > > > +          xsignal1 (Qoverflow_error, build_string ("JSON array is too long"));
> > > > +        Lisp_Object result = Fmake_vector (make_natnum (size), Qunbound);
> > >
> > > Likewise here: Fmake_vector makes sure the size is not larger than
> > > allowed.
> >
> > Same as above: It can't.
>
> It can and it does.

No, it can't. make_natnum takes a ptrdiff_t argument, and when passing a value that's out of range for ptrdiff_t, it will receive an incorrect, implementation-defined value.

> > > > +    case JSON_OBJECT:
> > > > +      {
> > > > +        if (++lisp_eval_depth > max_lisp_eval_depth)
> > > > +          xsignal0 (Qjson_object_too_deep);
> > > > +        size_t size = json_object_size (json);
> > > > +        if (FIXNUM_OVERFLOW_P (size))
> > > > +          xsignal1 (Qoverflow_error,
> > > > +                    build_string ("JSON object has too many elements"));
> > > > +        Lisp_Object result = CALLN (Fmake_hash_table, QCtest, Qequal,
> > > > +                                    QCsize, make_natnum (size));
> > >
> > > Likewise here: make_natnum does the equivalent test.
> >
> > It doesn't and can't.
>
> Yes, it does:
>
>   INLINE Lisp_Object
>   make_natnum (EMACS_INT n)
>   {
>     eassert (0 <= n && n <= MOST_POSITIVE_FIXNUM);  <<<<<<<<<<<<<<<
>     EMACS_INT int0 = Lisp_Int0;

We're not talking about the same thing. What if make_natnum is called with a value that doesn't fit in EMACS_INT? Also, an assertion is incorrect here because the overflowing value comes from user data.