all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Eli Zaretskii <eliz@gnu.org>
To: Philipp Stephani <p.stephani2@gmail.com>
Cc: emacs-devel@gnu.org
Subject: Re: JSON/YAML/TOML/etc. parsing performance
Date: Tue, 19 Sep 2017 22:09:06 +0300	[thread overview]
Message-ID: <8360ceh5f1.fsf@gnu.org> (raw)
In-Reply-To: <CAArVCkS52m8SGeOQt19k+XsfZnxy+bh9LJMyX=h+e67_adP6Mg@mail.gmail.com> (message from Philipp Stephani on Tue, 19 Sep 2017 08:18:14 +0000)

> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Tue, 19 Sep 2017 08:18:14 +0000
> 
> Here's a newer version of the patch. The only significant difference is that now the Lisp values for JSON null
> and false are :null and :false, respectively. Using a dedicated symbol for :null reduces the mental overhead of
> the triple meaning of nil (null, false, empty list), and is more future-proof, should we ever want to support lists. 

Thanks, a few comments below.

> +static _Noreturn void
> +json_parse_error (const json_error_t *error)
> +{
> +  xsignal (Qjson_parse_error,
> +           list5 (build_string (error->text), build_string (error->source),
> +                  make_natnum (error->line), make_natnum (error->column),
> +                  make_natnum (error->position)));
> +}

I think error->source could include non-ASCII characters, in which
case you need to use make_specified_string with its last argument
non-zero, not build_string, which has its own ideas about when to
produce a multibyte string.

> +static  _GL_ARG_NONNULL ((2)) Lisp_Object
> +lisp_to_json_1 (Lisp_Object lisp, json_t **json)
> +{
> +  if (VECTORP (lisp))
> +    {
> +      ptrdiff_t size = ASIZE (lisp);
> +      eassert (size >= 0);
> +      if (size > SIZE_MAX)
> +        xsignal1 (Qoverflow_error, build_pure_c_string ("vector is too long"));

I don't think you can allocate pure storage at run time, only at dump
time.  (There are more of this elsewhere in the patch.)

> +  /* LISP now must be a vector or hashtable.  */
> +  if (++lisp_eval_depth > max_lisp_eval_depth)
> +    xsignal0 (Qjson_object_too_deep);

This error could mislead: the problem could be in the nesting of
surrounding Lisp being too deep, and the JSON part could be just fine.

> +  Lisp_Object string
> +    = make_string (buffer_and_size->buffer, buffer_and_size->size);

This is arbitrary text, so I'm not sure make_string is appropriate.
Could the text be a byte stream, i.e. not human-readable text?  If so,
do we want to create a unibyte string or a multibyte string here?

> +  insert_from_string (string, 0, 0, SCHARS (string), SBYTES (string), false);

Hmmm... if you want to insert the text into the buffer, you need to
make sure it has the right representation.  What kind of text is this?
It probably should be decoded.

In any case, going through a string sounds gross.  You should insert
the text directly into the gap, like we do in a couple of places
already.  See insert_from_gap and its users, and maybe also
decode_coding_gap.

> +DEFUN ("json-parse-string", Fjson_parse_string, Sjson_parse_string, 1, 1, NULL,
> +       doc: /* Parse the JSON STRING into a Lisp object.
> +This is essentially the reverse operation of `json-serialize', which
> +see.  The returned object will be a vector or hashtable.  Its elements
> +will be `:null', `:false', t, numbers, strings, or further vectors and
> +hashtables.  If there are duplicate keys in an object, all but the
> +last one are ignored.  If STRING doesn't contain a valid JSON object,
> +an error of type `json-parse-error' is signaled.  */)
> +  (Lisp_Object string)
> +{
> +  ptrdiff_t count = SPECPDL_INDEX ();
> +  check_string_without_embedded_nulls (string);
> +
> +  json_error_t error;
> +  json_t *object = json_loads (SSDATA (string), 0, &error);

Doesn't json_loads require the string to be encoded in some particular
encoding?  If so, passing it our internal representation might not be
TRT.

> +  /* First, parse from point to the gap or the end of the accessible
> +     portion, whatever is closer.  */
> +  ptrdiff_t point = d->point;
> +  ptrdiff_t end;
> +  {
> +    bool overflow = INT_ADD_WRAPV (BUFFER_CEILING_OF (point), 1, &end);
> +    eassert (!overflow);
> +  }
> +  size_t count;
> +  {
> +    bool overflow = INT_SUBTRACT_WRAPV (end, point, &count);
> +    eassert (!overflow);
> +  }

Why did you need these blocks in braces?

> +(provide 'json-tests)
> +;;; json-tests.el ends here

IMO, it would be good to test also non-ASCII text in JSON objects.

Finally, this needs documentation: NEWS and the ELisp manual.

Thanks again for working on this.



  reply	other threads:[~2017-09-19 19:09 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-16 15:54 JSON/YAML/TOML/etc. parsing performance Ted Zlatanov
2017-09-16 16:02 ` Mark Oteiza
2017-09-17  0:02   ` Richard Stallman
2017-09-17  3:13     ` Mark Oteiza
2017-09-18  0:00       ` Richard Stallman
2017-09-17  0:02 ` Richard Stallman
2017-09-18 13:46   ` Ted Zlatanov
2017-09-17 18:46 ` Philipp Stephani
2017-09-17 19:05   ` Eli Zaretskii
2017-09-17 20:27     ` Philipp Stephani
2017-09-17 22:41       ` Mark Oteiza
2017-09-18 13:53       ` Ted Zlatanov
2017-09-17 21:17   ` Speed of Elisp (was: JSON/YAML/TOML/etc. parsing performance) Stefan Monnier
2017-09-18 13:26   ` JSON/YAML/TOML/etc. parsing performance Philipp Stephani
2017-09-18 13:58     ` Mark Oteiza
2017-09-18 14:14       ` Philipp Stephani
2017-09-18 14:28         ` Mark Oteiza
2017-09-18 14:36           ` Philipp Stephani
2017-09-18 15:02             ` Eli Zaretskii
2017-09-18 16:14               ` Philipp Stephani
2017-09-18 17:33                 ` Eli Zaretskii
2017-09-18 19:57                 ` Thien-Thi Nguyen
2017-09-18 14:57     ` Eli Zaretskii
2017-09-18 15:07       ` Mark Oteiza
2017-09-18 15:51         ` Eli Zaretskii
2017-09-18 16:22           ` Philipp Stephani
2017-09-18 18:08             ` Eli Zaretskii
2017-09-19 19:32               ` Richard Stallman
2017-09-18 17:26           ` Glenn Morris
2017-09-18 18:16             ` Eli Zaretskii
2017-09-18 16:08       ` Philipp Stephani
2017-09-19  8:18     ` Philipp Stephani
2017-09-19 19:09       ` Eli Zaretskii [this message]
2017-09-28 21:19         ` Philipp Stephani
2017-09-28 21:27           ` Stefan Monnier
2017-09-29 19:55           ` Eli Zaretskii
2017-09-30 22:02             ` Philipp Stephani
2017-10-01 18:06               ` Eli Zaretskii
2017-10-03 12:26                 ` Philipp Stephani
2017-10-03 15:31                   ` Eli Zaretskii
2017-10-03 15:52                     ` Philipp Stephani
2017-10-03 16:26                       ` Eli Zaretskii
2017-10-03 17:10                         ` Eli Zaretskii
2017-10-03 18:37                           ` Philipp Stephani
2017-10-03 20:52                   ` Paul Eggert
2017-10-04  5:33                     ` Eli Zaretskii
2017-10-04  6:41                       ` Paul Eggert
2017-10-04  8:03                         ` Eli Zaretskii
2017-10-04 17:51                           ` Paul Eggert
2017-10-04 19:38                             ` Eli Zaretskii
2017-10-04 21:24                               ` Paul Eggert
2017-10-05  1:48                                 ` Paul Eggert
2017-10-05  7:14                                   ` Eli Zaretskii
2017-10-08 22:52                                   ` Philipp Stephani
2017-10-09  5:54                                     ` Paul Eggert
2017-10-29 20:48                                       ` Philipp Stephani
2017-10-09  6:38                                     ` Eli Zaretskii
2017-10-05  7:12                                 ` Eli Zaretskii
2017-10-06  1:58                                   ` Paul Eggert
2017-10-06  7:40                                     ` Eli Zaretskii
2017-10-06 19:36                                       ` Paul Eggert
2017-10-06 21:03                                         ` Eli Zaretskii
2017-10-08 23:09                                     ` Philipp Stephani
2017-10-09  6:19                                       ` Paul Eggert
2017-10-29 20:48                                         ` Philipp Stephani
2017-10-29 22:49                                           ` Philipp Stephani
2017-12-09 23:05                                             ` Philipp Stephani
2017-12-10  7:08                                               ` Eli Zaretskii
2017-12-10 13:26                                                 ` Philipp Stephani
2017-12-10 13:32                                                   ` Ted Zlatanov
2017-10-08 23:04                                   ` Philipp Stephani
2017-10-09  6:47                                     ` Eli Zaretskii
2017-10-08 17:58                     ` Philipp Stephani
2017-10-08 18:42                       ` Eli Zaretskii
2017-10-08 23:14                         ` Philipp Stephani
2017-10-09  6:53                           ` Eli Zaretskii
2017-10-29 20:41                             ` Philipp Stephani
2017-10-09  6:22                       ` Paul Eggert
2017-10-01 18:38               ` Eli Zaretskii
2017-10-03 12:12                 ` Philipp Stephani
2017-10-03 14:54                   ` Eli Zaretskii

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8360ceh5f1.fsf@gnu.org \
    --to=eliz@gnu.org \
    --cc=emacs-devel@gnu.org \
    --cc=p.stephani2@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.