From: Philipp Stephani <p.stephani2@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: emacs-devel@gnu.org
Subject: Re: JSON/YAML/TOML/etc. parsing performance
Date: Tue, 03 Oct 2017 12:26:32 +0000 [thread overview]
Message-ID: <CAArVCkSR5w5B4_E0UU8uJtZJtr1oP51md4-i-J+NSRCGpGP_9A@mail.gmail.com> (raw)
In-Reply-To: <83r2um3fqi.fsf@gnu.org>
[-- Attachment #1: Type: text/plain, Size: 6984 bytes --]
Eli Zaretskii <eliz@gnu.org> schrieb am So., 1. Okt. 2017 um 20:06 Uhr:
> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Sat, 30 Sep 2017 22:02:55 +0000
> > Cc: emacs-devel@gnu.org
> >
> > Subject: [PATCH] Implement native JSON support using Jansson
>
> Thanks, a few more comments/questions.
>
> > +#if __has_attribute (warn_unused_result)
> > +# define ATTRIBUTE_WARN_UNUSED_RESULT __attribute__
> ((__warn_unused_result__))
> > +#else
> > +# define ATTRIBUTE_WARN_UNUSED_RESULT
> > +#endif
>
> Hmm... why do we need this attribute? You use it with 2 static
> functions, so this sounds like a left-over from the development stage?
>
It's not strictly needed (and if you don't like it, I can remove it), but
it helps catching memory leaks.
>
> > +static Lisp_Object
> > +internal_catch_all_1 (Lisp_Object (*function) (void *), void *argument)
>
> Can you tell why you needed this (and the similar internal_catch_all)?
> Is that only because the callbacks could signal an error, or is there
> another reason? If the former, I'd prefer to simplify the code and
> its maintenance by treating the error condition in a less drastic
> manner, and avoiding the call to xsignal.
>
The callbacks (especially insert and before-/after-change-hook) can exit
nonlocally, but these nonlocal exits may not escape the Jansson callback.
Therefore all nonlocal exits must be caught here.
>
> > +static _GL_ARG_NONNULL ((2)) Lisp_Object
> > +lisp_to_json_toplevel_1 (Lisp_Object lisp, json_t **json)
> > +{
> > + if (VECTORP (lisp))
> > + {
> > + ptrdiff_t size = ASIZE (lisp);
> > + eassert (size >= 0);
> > + if (size > SIZE_MAX)
> > + xsignal1 (Qoverflow_error, build_string ("vector is too long"));
>
> I think this error text is too vague. Can we come up with something
> that describes the problem more accurately?
Maybe, but it's probably not worth it because I don't think we have many
architectures where PTRDIFF_MAX > SIZE_MAX.
>
> And btw, how can size be greater than SIZE_MAX in this case? This is
> a valid Lisp object, isn't it? (There are more such tests in the
> patch, e.g. in lisp_to_json, and I think they, too, are redundant.)
>
Depends on the range of ptrdiff_t and size_t. IIUC nothing in the C
standard guarantees PTRDIFF_MAX <= SIZE_MAX. If we want to guarantee that,
we should probably add at least a static assertion.
>
> > + *json = json_check (json_array ());
> > + ptrdiff_t count = SPECPDL_INDEX ();
> > + record_unwind_protect_ptr (json_release_object, json);
> > + for (ptrdiff_t i = 0; i < size; ++i)
> > + {
> > + int status
> > + = json_array_append_new (*json, lisp_to_json (AREF (lisp,
> i)));
> > + if (status == -1)
> > + json_out_of_memory ();
> > + eassert (status == 0);
> > + }
> > + eassert (json_array_size (*json) == size);
> > + clear_unwind_protect (count);
> > + return unbind_to (count, Qnil);
>
> This, too, sounds more complex than it should: you record
> unwind-protect just so lisp_to_json's subroutines could signal an
> error due to insufficient memory, right? Why can't we have the
> out-of-memory check only inside this loop, which you already do, and
> avoid the checks on lower levels (which undoubtedly cost us extra
> cycles)? What do those extra checks in json_check buy us? the errors
> they signal are no more informative than the one in the loop, AFAICT.
>
I don't understand what you mean. We need to check the return values of all
functions if we want to to use them later.
>
> > +static Lisp_Object
> > +json_insert (void *data)
> > +{
> > + const struct json_buffer_and_size *buffer_and_size = data;
> > + if (buffer_and_size->size > PTRDIFF_MAX)
> > + xsignal1 (Qoverflow_error, build_string ("buffer too large"));
> > + insert (buffer_and_size->buffer, buffer_and_size->size);
>
> I don't think we need this test here, as 'insert' already has the
> equivalent test in one of its subroutines.
>
It can't, because it takes the byte length as ptrdiff_t. We need to check
before whether the size is actually in the valid range of ptrdiff_t.
>
> > + case JSON_INTEGER:
> > + {
> > + json_int_t value = json_integer_value (json);
> > + if (FIXNUM_OVERFLOW_P (value))
> > + xsignal1 (Qoverflow_error,
> > + build_string ("JSON integer is too large"));
> > + return make_number (value);
>
> This overflow test is also redundant, as make_number already does it.
>
It can't, because json_int_t can be larger than EMACS_INT. Also,
make_number doesn't contain any checks.
>
> > + case JSON_STRING:
> > + {
> > + size_t size = json_string_length (json);
> > + if (FIXNUM_OVERFLOW_P (size))
> > + xsignal1 (Qoverflow_error, build_string ("JSON string is too
> long"));
> > + return json_make_string (json_string_value (json), size);
>
> Once again, the overflow test is redundant, as make_specified_string
> (called by json_make_string) already includes an equivalent test.
>
And once again, we need to check at least whether the size fits into
ptrdiff_t.
>
> > + case JSON_ARRAY:
> > + {
> > + if (++lisp_eval_depth > max_lisp_eval_depth)
> > + xsignal0 (Qjson_object_too_deep);
> > + size_t size = json_array_size (json);
> > + if (FIXNUM_OVERFLOW_P (size))
> > + xsignal1 (Qoverflow_error, build_string ("JSON array is too
> long"));
> > + Lisp_Object result = Fmake_vector (make_natnum (size),
> Qunbound);
>
> Likewise here: Fmake_vector makes sure the size is not larger than
> allowed.
>
Same as above: It can't.
>
> > + case JSON_OBJECT:
> > + {
> > + if (++lisp_eval_depth > max_lisp_eval_depth)
> > + xsignal0 (Qjson_object_too_deep);
> > + size_t size = json_object_size (json);
> > + if (FIXNUM_OVERFLOW_P (size))
> > + xsignal1 (Qoverflow_error,
> > + build_string ("JSON object has too many elements"));
> > + Lisp_Object result = CALLN (Fmake_hash_table, QCtest, Qequal,
> > + QCsize, make_natnum (size));
>
> Likewise here: make_natnum does the equivalent test.
>
It doesn't and can't.
>
> > + /* Adjust point by how much we just read. Do this here because
> > + tokener->char_offset becomes incorrect below. */
> > + bool overflow = INT_ADD_WRAPV (point, error.position, &point);
> > + eassert (!overflow);
> > + eassert (point <= ZV_BYTE);
> > + SET_PT_BOTH (BYTE_TO_CHAR (point), point);
>
> It's better to use SET_PT here, I think.
>
That's not possible because we don't have the character offset. (And I
think using SET_PT (BYTE_TO_CHAR (point)) would just require needlessly
recalculating point.)
>
> > + define_error (Qjson_out_of_memory, "no free memory for creating JSON
> object",
>
> I'd prefer "not enough memory for creating JSON object".
>
>
Done.
[-- Attachment #2: Type: text/html, Size: 10111 bytes --]
next prev parent reply other threads:[~2017-10-03 12:26 UTC|newest]
Thread overview: 81+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-09-16 15:54 JSON/YAML/TOML/etc. parsing performance Ted Zlatanov
2017-09-16 16:02 ` Mark Oteiza
2017-09-17 0:02 ` Richard Stallman
2017-09-17 3:13 ` Mark Oteiza
2017-09-18 0:00 ` Richard Stallman
2017-09-17 0:02 ` Richard Stallman
2017-09-18 13:46 ` Ted Zlatanov
2017-09-17 18:46 ` Philipp Stephani
2017-09-17 19:05 ` Eli Zaretskii
2017-09-17 20:27 ` Philipp Stephani
2017-09-17 22:41 ` Mark Oteiza
2017-09-18 13:53 ` Ted Zlatanov
2017-09-17 21:17 ` Speed of Elisp (was: JSON/YAML/TOML/etc. parsing performance) Stefan Monnier
2017-09-18 13:26 ` JSON/YAML/TOML/etc. parsing performance Philipp Stephani
2017-09-18 13:58 ` Mark Oteiza
2017-09-18 14:14 ` Philipp Stephani
2017-09-18 14:28 ` Mark Oteiza
2017-09-18 14:36 ` Philipp Stephani
2017-09-18 15:02 ` Eli Zaretskii
2017-09-18 16:14 ` Philipp Stephani
2017-09-18 17:33 ` Eli Zaretskii
2017-09-18 19:57 ` Thien-Thi Nguyen
2017-09-18 14:57 ` Eli Zaretskii
2017-09-18 15:07 ` Mark Oteiza
2017-09-18 15:51 ` Eli Zaretskii
2017-09-18 16:22 ` Philipp Stephani
2017-09-18 18:08 ` Eli Zaretskii
2017-09-19 19:32 ` Richard Stallman
2017-09-18 17:26 ` Glenn Morris
2017-09-18 18:16 ` Eli Zaretskii
2017-09-18 16:08 ` Philipp Stephani
2017-09-19 8:18 ` Philipp Stephani
2017-09-19 19:09 ` Eli Zaretskii
2017-09-28 21:19 ` Philipp Stephani
2017-09-28 21:27 ` Stefan Monnier
2017-09-29 19:55 ` Eli Zaretskii
2017-09-30 22:02 ` Philipp Stephani
2017-10-01 18:06 ` Eli Zaretskii
2017-10-03 12:26 ` Philipp Stephani [this message]
2017-10-03 15:31 ` Eli Zaretskii
2017-10-03 15:52 ` Philipp Stephani
2017-10-03 16:26 ` Eli Zaretskii
2017-10-03 17:10 ` Eli Zaretskii
2017-10-03 18:37 ` Philipp Stephani
2017-10-03 20:52 ` Paul Eggert
2017-10-04 5:33 ` Eli Zaretskii
2017-10-04 6:41 ` Paul Eggert
2017-10-04 8:03 ` Eli Zaretskii
2017-10-04 17:51 ` Paul Eggert
2017-10-04 19:38 ` Eli Zaretskii
2017-10-04 21:24 ` Paul Eggert
2017-10-05 1:48 ` Paul Eggert
2017-10-05 7:14 ` Eli Zaretskii
2017-10-08 22:52 ` Philipp Stephani
2017-10-09 5:54 ` Paul Eggert
2017-10-29 20:48 ` Philipp Stephani
2017-10-09 6:38 ` Eli Zaretskii
2017-10-05 7:12 ` Eli Zaretskii
2017-10-06 1:58 ` Paul Eggert
2017-10-06 7:40 ` Eli Zaretskii
2017-10-06 19:36 ` Paul Eggert
2017-10-06 21:03 ` Eli Zaretskii
2017-10-08 23:09 ` Philipp Stephani
2017-10-09 6:19 ` Paul Eggert
2017-10-29 20:48 ` Philipp Stephani
2017-10-29 22:49 ` Philipp Stephani
2017-12-09 23:05 ` Philipp Stephani
2017-12-10 7:08 ` Eli Zaretskii
2017-12-10 13:26 ` Philipp Stephani
2017-12-10 13:32 ` Ted Zlatanov
2017-10-08 23:04 ` Philipp Stephani
2017-10-09 6:47 ` Eli Zaretskii
2017-10-08 17:58 ` Philipp Stephani
2017-10-08 18:42 ` Eli Zaretskii
2017-10-08 23:14 ` Philipp Stephani
2017-10-09 6:53 ` Eli Zaretskii
2017-10-29 20:41 ` Philipp Stephani
2017-10-09 6:22 ` Paul Eggert
2017-10-01 18:38 ` Eli Zaretskii
2017-10-03 12:12 ` Philipp Stephani
2017-10-03 14:54 ` Eli Zaretskii
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.gnu.org/software/emacs/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAArVCkSR5w5B4_E0UU8uJtZJtr1oP51md4-i-J+NSRCGpGP_9A@mail.gmail.com \
--to=p.stephani2@gmail.com \
--cc=eliz@gnu.org \
--cc=emacs-devel@gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).