Re: JSON/YAML/TOML/etc. parsing performance

unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed

From: Eli Zaretskii <eliz@gnu.org>
To: Philipp Stephani <p.stephani2@gmail.com>
Cc: emacs-devel@gnu.org
Subject: Re: JSON/YAML/TOML/etc. parsing performance
Date: Sun, 01 Oct 2017 21:06:13 +0300	[thread overview]
Message-ID: <83r2um3fqi.fsf@gnu.org> (raw)
In-Reply-To: <CAArVCkRSY+bUiX2n1sysxy4dRTLE5M8R3OZmt+1W0-OfiYGDuA@mail.gmail.com> (message from Philipp Stephani on Sat, 30 Sep 2017 22:02:55 +0000)

> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Sat, 30 Sep 2017 22:02:55 +0000
> Cc: emacs-devel@gnu.org
> 
> Subject: [PATCH] Implement native JSON support using Jansson

Thanks, a few more comments/questions.

> +#if __has_attribute (warn_unused_result)
> +# define ATTRIBUTE_WARN_UNUSED_RESULT __attribute__ ((__warn_unused_result__))
> +#else
> +# define ATTRIBUTE_WARN_UNUSED_RESULT
> +#endif

Hmm... why do we need this attribute?  You use it with 2 static
functions, so this sounds like a left-over from the development stage?

> +static Lisp_Object
> +internal_catch_all_1 (Lisp_Object (*function) (void *), void *argument)

Can you tell why you needed this (and the similar internal_catch_all)?
Is that only because the callbacks could signal an error, or is there
another reason?  If the former, I'd prefer to simplify the code and
its maintenance by treating the error condition in a less drastic
manner, and avoiding the call to xsignal.

> +static _GL_ARG_NONNULL ((2)) Lisp_Object
> +lisp_to_json_toplevel_1 (Lisp_Object lisp, json_t **json)
> +{
> +  if (VECTORP (lisp))
> +    {
> +      ptrdiff_t size = ASIZE (lisp);
> +      eassert (size >= 0);
> +      if (size > SIZE_MAX)
> +        xsignal1 (Qoverflow_error, build_string ("vector is too long"));

I think this error text is too vague.  Can we come up with something
that describes the problem more accurately?

And btw, how can size be greater than SIZE_MAX in this case?  This is
a valid Lisp object, isn't it?  (There are more such tests in the
patch, e.g. in lisp_to_json, and I think they, too, are redundant.)

> +      *json = json_check (json_array ());
> +      ptrdiff_t count = SPECPDL_INDEX ();
> +      record_unwind_protect_ptr (json_release_object, json);
> +      for (ptrdiff_t i = 0; i < size; ++i)
> +        {
> +          int status
> +            = json_array_append_new (*json, lisp_to_json (AREF (lisp, i)));
> +          if (status == -1)
> +            json_out_of_memory ();
> +          eassert (status == 0);
> +        }
> +      eassert (json_array_size (*json) == size);
> +      clear_unwind_protect (count);
> +      return unbind_to (count, Qnil);

This, too, sounds more complex than it should: you record
unwind-protect just so lisp_to_json's subroutines could signal an
error due to insufficient memory, right?  Why can't we have the
out-of-memory check only inside this loop, which you already do, and
avoid the checks on lower levels (which undoubtedly cost us extra
cycles)?  What do those extra checks in json_check buy us? the errors
they signal are no more informative than the one in the loop, AFAICT.

> +static Lisp_Object
> +json_insert (void *data)
> +{
> +  const struct json_buffer_and_size *buffer_and_size = data;
> +  if (buffer_and_size->size > PTRDIFF_MAX)
> +    xsignal1 (Qoverflow_error, build_string ("buffer too large"));
> +  insert (buffer_and_size->buffer, buffer_and_size->size);

I don't think we need this test here, as 'insert' already has the
equivalent test in one of its subroutines.

> +    case JSON_INTEGER:
> +      {
> +        json_int_t value = json_integer_value (json);
> +        if (FIXNUM_OVERFLOW_P (value))
> +          xsignal1 (Qoverflow_error,
> +                    build_string ("JSON integer is too large"));
> +        return make_number (value);

This overflow test is also redundant, as make_number already does it.

> +    case JSON_STRING:
> +      {
> +        size_t size = json_string_length (json);
> +        if (FIXNUM_OVERFLOW_P (size))
> +          xsignal1 (Qoverflow_error, build_string ("JSON string is too long"));
> +        return json_make_string (json_string_value (json), size);

Once again, the overflow test is redundant, as make_specified_string
(called by json_make_string) already includes an equivalent test.

> +    case JSON_ARRAY:
> +      {
> +        if (++lisp_eval_depth > max_lisp_eval_depth)
> +          xsignal0 (Qjson_object_too_deep);
> +        size_t size = json_array_size (json);
> +        if (FIXNUM_OVERFLOW_P (size))
> +          xsignal1 (Qoverflow_error, build_string ("JSON array is too long"));
> +        Lisp_Object result = Fmake_vector (make_natnum (size), Qunbound);

Likewise here: Fmake_vector makes sure the size is not larger than
allowed.

> +    case JSON_OBJECT:
> +      {
> +        if (++lisp_eval_depth > max_lisp_eval_depth)
> +          xsignal0 (Qjson_object_too_deep);
> +        size_t size = json_object_size (json);
> +        if (FIXNUM_OVERFLOW_P (size))
> +          xsignal1 (Qoverflow_error,
> +                    build_string ("JSON object has too many elements"));
> +        Lisp_Object result = CALLN (Fmake_hash_table, QCtest, Qequal,
> +                                    QCsize, make_natnum (size));

Likewise here: make_natnum does the equivalent test.

> +    /* Adjust point by how much we just read.  Do this here because
> +       tokener->char_offset becomes incorrect below.  */
> +    bool overflow = INT_ADD_WRAPV (point, error.position, &point);
> +    eassert (!overflow);
> +    eassert (point <= ZV_BYTE);
> +    SET_PT_BOTH (BYTE_TO_CHAR (point), point);

It's better to use SET_PT here, I think.

> +  define_error (Qjson_out_of_memory, "no free memory for creating JSON object",

I'd prefer "not enough memory for creating JSON object".

Thanks again for working on this.

next prev parent reply	other threads:[~2017-10-01 18:06 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-16 15:54 JSON/YAML/TOML/etc. parsing performance Ted Zlatanov
2017-09-16 16:02 ` Mark Oteiza
2017-09-17  0:02   ` Richard Stallman
2017-09-17  3:13     ` Mark Oteiza
2017-09-18  0:00       ` Richard Stallman
2017-09-17  0:02 ` Richard Stallman
2017-09-18 13:46   ` Ted Zlatanov
2017-09-17 18:46 ` Philipp Stephani
2017-09-17 19:05   ` Eli Zaretskii
2017-09-17 20:27     ` Philipp Stephani
2017-09-17 22:41       ` Mark Oteiza
2017-09-18 13:53       ` Ted Zlatanov
2017-09-17 21:17   ` Speed of Elisp (was: JSON/YAML/TOML/etc. parsing performance) Stefan Monnier
2017-09-18 13:26   ` JSON/YAML/TOML/etc. parsing performance Philipp Stephani
2017-09-18 13:58     ` Mark Oteiza
2017-09-18 14:14       ` Philipp Stephani
2017-09-18 14:28         ` Mark Oteiza
2017-09-18 14:36           ` Philipp Stephani
2017-09-18 15:02             ` Eli Zaretskii
2017-09-18 16:14               ` Philipp Stephani
2017-09-18 17:33                 ` Eli Zaretskii
2017-09-18 19:57                 ` Thien-Thi Nguyen
2017-09-18 14:57     ` Eli Zaretskii
2017-09-18 15:07       ` Mark Oteiza
2017-09-18 15:51         ` Eli Zaretskii
2017-09-18 16:22           ` Philipp Stephani
2017-09-18 18:08             ` Eli Zaretskii
2017-09-19 19:32               ` Richard Stallman
2017-09-18 17:26           ` Glenn Morris
2017-09-18 18:16             ` Eli Zaretskii
2017-09-18 16:08       ` Philipp Stephani
2017-09-19  8:18     ` Philipp Stephani
2017-09-19 19:09       ` Eli Zaretskii
2017-09-28 21:19         ` Philipp Stephani
2017-09-28 21:27           ` Stefan Monnier
2017-09-29 19:55           ` Eli Zaretskii
2017-09-30 22:02             ` Philipp Stephani
2017-10-01 18:06               ` Eli Zaretskii [this message]
2017-10-03 12:26                 ` Philipp Stephani
2017-10-03 15:31                   ` Eli Zaretskii
2017-10-03 15:52                     ` Philipp Stephani
2017-10-03 16:26                       ` Eli Zaretskii
2017-10-03 17:10                         ` Eli Zaretskii
2017-10-03 18:37                           ` Philipp Stephani
2017-10-03 20:52                   ` Paul Eggert
2017-10-04  5:33                     ` Eli Zaretskii
2017-10-04  6:41                       ` Paul Eggert
2017-10-04  8:03                         ` Eli Zaretskii
2017-10-04 17:51                           ` Paul Eggert
2017-10-04 19:38                             ` Eli Zaretskii
2017-10-04 21:24                               ` Paul Eggert
2017-10-05  1:48                                 ` Paul Eggert
2017-10-05  7:14                                   ` Eli Zaretskii
2017-10-08 22:52                                   ` Philipp Stephani
2017-10-09  5:54                                     ` Paul Eggert
2017-10-29 20:48                                       ` Philipp Stephani
2017-10-09  6:38                                     ` Eli Zaretskii
2017-10-05  7:12                                 ` Eli Zaretskii
2017-10-06  1:58                                   ` Paul Eggert
2017-10-06  7:40                                     ` Eli Zaretskii
2017-10-06 19:36                                       ` Paul Eggert
2017-10-06 21:03                                         ` Eli Zaretskii
2017-10-08 23:09                                     ` Philipp Stephani
2017-10-09  6:19                                       ` Paul Eggert
2017-10-29 20:48                                         ` Philipp Stephani
2017-10-29 22:49                                           ` Philipp Stephani
2017-12-09 23:05                                             ` Philipp Stephani
2017-12-10  7:08                                               ` Eli Zaretskii
2017-12-10 13:26                                                 ` Philipp Stephani
2017-12-10 13:32                                                   ` Ted Zlatanov
2017-10-08 23:04                                   ` Philipp Stephani
2017-10-09  6:47                                     ` Eli Zaretskii
2017-10-08 17:58                     ` Philipp Stephani
2017-10-08 18:42                       ` Eli Zaretskii
2017-10-08 23:14                         ` Philipp Stephani
2017-10-09  6:53                           ` Eli Zaretskii
2017-10-29 20:41                             ` Philipp Stephani
2017-10-09  6:22                       ` Paul Eggert
2017-10-01 18:38               ` Eli Zaretskii
2017-10-03 12:12                 ` Philipp Stephani
2017-10-03 14:54                   ` Eli Zaretskii

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=83r2um3fqi.fsf@gnu.org \
    --to=eliz@gnu.org \
    --cc=emacs-devel@gnu.org \
    --cc=p.stephani2@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).