From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!.POSTED!not-for-mail
From: Eli Zaretskii <eliz@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: JSON/YAML/TOML/etc. parsing performance
Date: Sun, 01 Oct 2017 21:06:13 +0300
Message-ID: <83r2um3fqi.fsf@gnu.org>
References: <87poaqhc63.fsf@lifelogs.com>
	<CAArVCkQTLp=Cmh-FM1R-WK=WYFX_hP=6XiUUinKRT17bciL+CQ@mail.gmail.com>
	<CAArVCkTj_1P+fTDCzEY5xG8bBB7B6ctNkQCv+bAxt=N_cuD05Q@mail.gmail.com>
	<CAArVCkS52m8SGeOQt19k+XsfZnxy+bh9LJMyX=h+e67_adP6Mg@mail.gmail.com>
	<8360ceh5f1.fsf@gnu.org>
	<CAArVCkRvSaS-orqHcVPtZ2etUnRiY39okHh+6sYV-mtQQRYs-g@mail.gmail.com>
	<83h8vl5lf9.fsf@gnu.org>
	<CAArVCkRSY+bUiX2n1sysxy4dRTLE5M8R3OZmt+1W0-OfiYGDuA@mail.gmail.com>
Reply-To: Eli Zaretskii <eliz@gnu.org>
NNTP-Posting-Host: blaine.gmane.org
X-Trace: blaine.gmane.org 1506881200 26369 195.159.176.226 (1 Oct 2017 18:06:40 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Sun, 1 Oct 2017 18:06:40 +0000 (UTC)
Cc: emacs-devel@gnu.org
To: Philipp Stephani <p.stephani2@gmail.com>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sun Oct 01 20:06:35 2017
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by blaine.gmane.org with esmtp (Exim 4.84_2)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1dyidN-0005j0-CB
	for ged-emacs-devel@m.gmane.org; Sun, 01 Oct 2017 20:06:33 +0200
Original-Received: from localhost ([::1]:49393 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1dyidO-0004Yx-7z
	for ged-emacs-devel@m.gmane.org; Sun, 01 Oct 2017 14:06:34 -0400
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:56385)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <eliz@gnu.org>) id 1dyidH-0004YN-KR
	for emacs-devel@gnu.org; Sun, 01 Oct 2017 14:06:28 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <eliz@gnu.org>) id 1dyidD-0003Wv-9n
	for emacs-devel@gnu.org; Sun, 01 Oct 2017 14:06:27 -0400
Original-Received: from fencepost.gnu.org ([2001:4830:134:3::e]:42773)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <eliz@gnu.org>)
	id 1dyidD-0003Wq-7O; Sun, 01 Oct 2017 14:06:23 -0400
Original-Received: from 84.94.185.246.cable.012.net.il ([84.94.185.246]:1086
	helo=home-c4e4a596f7)
	by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256)
	(Exim 4.82) (envelope-from <eliz@gnu.org>)
	id 1dyidB-0003Nj-Mh; Sun, 01 Oct 2017 14:06:23 -0400
In-reply-to: <CAArVCkRSY+bUiX2n1sysxy4dRTLE5M8R3OZmt+1W0-OfiYGDuA@mail.gmail.com>
	(message from Philipp Stephani on Sat, 30 Sep 2017 22:02:55 +0000)
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Received-From: 2001:4830:134:3::e
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel/>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: "Emacs-devel" <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Xref: news.gmane.org gmane.emacs.devel:218990
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/218990>

> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Sat, 30 Sep 2017 22:02:55 +0000
> Cc: emacs-devel@gnu.org
> 
> Subject: [PATCH] Implement native JSON support using Jansson

Thanks, a few more comments/questions.

> +#if __has_attribute (warn_unused_result)
> +# define ATTRIBUTE_WARN_UNUSED_RESULT __attribute__ ((__warn_unused_result__))
> +#else
> +# define ATTRIBUTE_WARN_UNUSED_RESULT
> +#endif

Hmm... why do we need this attribute?  You use it with 2 static
functions, so this sounds like a left-over from the development stage?

> +static Lisp_Object
> +internal_catch_all_1 (Lisp_Object (*function) (void *), void *argument)

Can you tell why you needed this (and the similar internal_catch_all)?
Is that only because the callbacks could signal an error, or is there
another reason?  If the former, I'd prefer to simplify the code and
its maintenance by treating the error condition in a less drastic
manner, and avoiding the call to xsignal.

> +static _GL_ARG_NONNULL ((2)) Lisp_Object
> +lisp_to_json_toplevel_1 (Lisp_Object lisp, json_t **json)
> +{
> +  if (VECTORP (lisp))
> +    {
> +      ptrdiff_t size = ASIZE (lisp);
> +      eassert (size >= 0);
> +      if (size > SIZE_MAX)
> +        xsignal1 (Qoverflow_error, build_string ("vector is too long"));

I think this error text is too vague.  Can we come up with something
that describes the problem more accurately?

And btw, how can size be greater than SIZE_MAX in this case?  This is
a valid Lisp object, isn't it?  (There are more such tests in the
patch, e.g. in lisp_to_json, and I think they, too, are redundant.)

> +      *json = json_check (json_array ());
> +      ptrdiff_t count = SPECPDL_INDEX ();
> +      record_unwind_protect_ptr (json_release_object, json);
> +      for (ptrdiff_t i = 0; i < size; ++i)
> +        {
> +          int status
> +            = json_array_append_new (*json, lisp_to_json (AREF (lisp, i)));
> +          if (status == -1)
> +            json_out_of_memory ();
> +          eassert (status == 0);
> +        }
> +      eassert (json_array_size (*json) == size);
> +      clear_unwind_protect (count);
> +      return unbind_to (count, Qnil);

This, too, sounds more complex than it should: you record
unwind-protect just so lisp_to_json's subroutines could signal an
error due to insufficient memory, right?  Why can't we have the
out-of-memory check only inside this loop, which you already do, and
avoid the checks on lower levels (which undoubtedly cost us extra
cycles)?  What do those extra checks in json_check buy us? the errors
they signal are no more informative than the one in the loop, AFAICT.

> +static Lisp_Object
> +json_insert (void *data)
> +{
> +  const struct json_buffer_and_size *buffer_and_size = data;
> +  if (buffer_and_size->size > PTRDIFF_MAX)
> +    xsignal1 (Qoverflow_error, build_string ("buffer too large"));
> +  insert (buffer_and_size->buffer, buffer_and_size->size);

I don't think we need this test here, as 'insert' already has the
equivalent test in one of its subroutines.

> +    case JSON_INTEGER:
> +      {
> +        json_int_t value = json_integer_value (json);
> +        if (FIXNUM_OVERFLOW_P (value))
> +          xsignal1 (Qoverflow_error,
> +                    build_string ("JSON integer is too large"));
> +        return make_number (value);

This overflow test is also redundant, as make_number already does it.

> +    case JSON_STRING:
> +      {
> +        size_t size = json_string_length (json);
> +        if (FIXNUM_OVERFLOW_P (size))
> +          xsignal1 (Qoverflow_error, build_string ("JSON string is too long"));
> +        return json_make_string (json_string_value (json), size);

Once again, the overflow test is redundant, as make_specified_string
(called by json_make_string) already includes an equivalent test.

> +    case JSON_ARRAY:
> +      {
> +        if (++lisp_eval_depth > max_lisp_eval_depth)
> +          xsignal0 (Qjson_object_too_deep);
> +        size_t size = json_array_size (json);
> +        if (FIXNUM_OVERFLOW_P (size))
> +          xsignal1 (Qoverflow_error, build_string ("JSON array is too long"));
> +        Lisp_Object result = Fmake_vector (make_natnum (size), Qunbound);

Likewise here: Fmake_vector makes sure the size is not larger than
allowed.

> +    case JSON_OBJECT:
> +      {
> +        if (++lisp_eval_depth > max_lisp_eval_depth)
> +          xsignal0 (Qjson_object_too_deep);
> +        size_t size = json_object_size (json);
> +        if (FIXNUM_OVERFLOW_P (size))
> +          xsignal1 (Qoverflow_error,
> +                    build_string ("JSON object has too many elements"));
> +        Lisp_Object result = CALLN (Fmake_hash_table, QCtest, Qequal,
> +                                    QCsize, make_natnum (size));

Likewise here: make_natnum does the equivalent test.

> +    /* Adjust point by how much we just read.  Do this here because
> +       tokener->char_offset becomes incorrect below.  */
> +    bool overflow = INT_ADD_WRAPV (point, error.position, &point);
> +    eassert (!overflow);
> +    eassert (point <= ZV_BYTE);
> +    SET_PT_BOTH (BYTE_TO_CHAR (point), point);

It's better to use SET_PT here, I think.

> +  define_error (Qjson_out_of_memory, "no free memory for creating JSON object",

I'd prefer "not enough memory for creating JSON object".

Thanks again for working on this.