unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: Philipp Stephani <p.stephani2@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: emacs-devel@gnu.org
Subject: Re: JSON/YAML/TOML/etc. parsing performance
Date: Thu, 28 Sep 2017 21:19:00 +0000	[thread overview]
Message-ID: <CAArVCkRvSaS-orqHcVPtZ2etUnRiY39okHh+6sYV-mtQQRYs-g@mail.gmail.com> (raw)
In-Reply-To: <8360ceh5f1.fsf@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 5350 bytes --]

Eli Zaretskii <eliz@gnu.org> schrieb am Di., 19. Sep. 2017 um 21:10 Uhr:

> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Tue, 19 Sep 2017 08:18:14 +0000
> >
> > Here's a newer version of the patch. The only significant difference is
> that now the Lisp values for JSON null
> > and false are :null and :false, respectively. Using a dedicated symbol
> for :null reduces the mental overhead of
> > the triple meaning of nil (null, false, empty list), and is more
> future-proof, should we ever want to support lists.
>
> Thanks, a few comments below.
>

Thanks for the review. Most of the comments are about converting between C
and Lisp strings, so let me summarize my questions here.
IIUC Jansson only accepts UTF-8 strings (i.e. it will generate an error
some input is not an UTF-8 string), and will only return UTF-8 strings as
well. Therefore I think that direct conversion between Lisp strings and C
strings (using SDATA etc.) is always correct because the internal Emacs
encoding is a superset of UTF-8. Also build_string should always be correct
because it will generate a correct multibyte string for an UTF-8 string
with non-ASCII characters, and a correct unibyte string for an ASCII
string, right?


>
> > +static _Noreturn void
> > +json_parse_error (const json_error_t *error)
> > +{
> > +  xsignal (Qjson_parse_error,
> > +           list5 (build_string (error->text), build_string
> (error->source),
> > +                  make_natnum (error->line), make_natnum
> (error->column),
> > +                  make_natnum (error->position)));
> > +}
>
> I think error->source could include non-ASCII characters, in which
> case you need to use make_specified_string with its last argument
> non-zero, not build_string, which has its own ideas about when to
> produce a multibyte string.
>
> > +static  _GL_ARG_NONNULL ((2)) Lisp_Object
> > +lisp_to_json_1 (Lisp_Object lisp, json_t **json)
> > +{
> > +  if (VECTORP (lisp))
> > +    {
> > +      ptrdiff_t size = ASIZE (lisp);
> > +      eassert (size >= 0);
> > +      if (size > SIZE_MAX)
> > +        xsignal1 (Qoverflow_error, build_pure_c_string ("vector is too
> long"));
>
> I don't think you can allocate pure storage at run time, only at dump
> time.  (There are more of this elsewhere in the patch.)
>

OK, will be fixed in the next version.


>
> > +  /* LISP now must be a vector or hashtable.  */
> > +  if (++lisp_eval_depth > max_lisp_eval_depth)
> > +    xsignal0 (Qjson_object_too_deep);
>
> This error could mislead: the problem could be in the nesting of
> surrounding Lisp being too deep, and the JSON part could be just fine.
>

Agreed, but I think it's better to use lisp_eval_depth here because it's
the total nesting depth that could cause stack overflows.


>
> > +  Lisp_Object string
> > +    = make_string (buffer_and_size->buffer, buffer_and_size->size);
>
> This is arbitrary text, so I'm not sure make_string is appropriate.
> Could the text be a byte stream, i.e. not human-readable text?  If so,
> do we want to create a unibyte string or a multibyte string here?
>

It should always be UTF-8.


>
> > +  insert_from_string (string, 0, 0, SCHARS (string), SBYTES (string),
> false);
>
> Hmmm... if you want to insert the text into the buffer, you need to
> make sure it has the right representation.  What kind of text is this?
> It probably should be decoded.
>
> In any case, going through a string sounds gross.  You should insert
> the text directly into the gap, like we do in a couple of places
> already.  See insert_from_gap and its users, and maybe also
> decode_coding_gap.
>

OK, I'll have to check that, but it sounds doable.


>
> > +DEFUN ("json-parse-string", Fjson_parse_string, Sjson_parse_string, 1,
> 1, NULL,
> > +       doc: /* Parse the JSON STRING into a Lisp object.
> > +This is essentially the reverse operation of `json-serialize', which
> > +see.  The returned object will be a vector or hashtable.  Its elements
> > +will be `:null', `:false', t, numbers, strings, or further vectors and
> > +hashtables.  If there are duplicate keys in an object, all but the
> > +last one are ignored.  If STRING doesn't contain a valid JSON object,
> > +an error of type `json-parse-error' is signaled.  */)
> > +  (Lisp_Object string)
> > +{
> > +  ptrdiff_t count = SPECPDL_INDEX ();
> > +  check_string_without_embedded_nulls (string);
> > +
> > +  json_error_t error;
> > +  json_t *object = json_loads (SSDATA (string), 0, &error);
>
> Doesn't json_loads require the string to be encoded in some particular
> encoding?  If so, passing it our internal representation might not be
> TRT.
>
> > +  /* First, parse from point to the gap or the end of the accessible
> > +     portion, whatever is closer.  */
> > +  ptrdiff_t point = d->point;
> > +  ptrdiff_t end;
> > +  {
> > +    bool overflow = INT_ADD_WRAPV (BUFFER_CEILING_OF (point), 1, &end);
> > +    eassert (!overflow);
> > +  }
> > +  size_t count;
> > +  {
> > +    bool overflow = INT_SUBTRACT_WRAPV (end, point, &count);
> > +    eassert (!overflow);
> > +  }
>
> Why did you need these blocks in braces?
>

To be able to reuse the "overflow" name/


>
> > +(provide 'json-tests)
> > +;;; json-tests.el ends here
>
> IMO, it would be good to test also non-ASCII text in JSON objects.
>
>
Yes, once the patch is in acceptable shape, I plan to add many more tests.

[-- Attachment #2: Type: text/html, Size: 7284 bytes --]

  reply	other threads:[~2017-09-28 21:19 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-16 15:54 JSON/YAML/TOML/etc. parsing performance Ted Zlatanov
2017-09-16 16:02 ` Mark Oteiza
2017-09-17  0:02   ` Richard Stallman
2017-09-17  3:13     ` Mark Oteiza
2017-09-18  0:00       ` Richard Stallman
2017-09-17  0:02 ` Richard Stallman
2017-09-18 13:46   ` Ted Zlatanov
2017-09-17 18:46 ` Philipp Stephani
2017-09-17 19:05   ` Eli Zaretskii
2017-09-17 20:27     ` Philipp Stephani
2017-09-17 22:41       ` Mark Oteiza
2017-09-18 13:53       ` Ted Zlatanov
2017-09-17 21:17   ` Speed of Elisp (was: JSON/YAML/TOML/etc. parsing performance) Stefan Monnier
2017-09-18 13:26   ` JSON/YAML/TOML/etc. parsing performance Philipp Stephani
2017-09-18 13:58     ` Mark Oteiza
2017-09-18 14:14       ` Philipp Stephani
2017-09-18 14:28         ` Mark Oteiza
2017-09-18 14:36           ` Philipp Stephani
2017-09-18 15:02             ` Eli Zaretskii
2017-09-18 16:14               ` Philipp Stephani
2017-09-18 17:33                 ` Eli Zaretskii
2017-09-18 19:57                 ` Thien-Thi Nguyen
2017-09-18 14:57     ` Eli Zaretskii
2017-09-18 15:07       ` Mark Oteiza
2017-09-18 15:51         ` Eli Zaretskii
2017-09-18 16:22           ` Philipp Stephani
2017-09-18 18:08             ` Eli Zaretskii
2017-09-19 19:32               ` Richard Stallman
2017-09-18 17:26           ` Glenn Morris
2017-09-18 18:16             ` Eli Zaretskii
2017-09-18 16:08       ` Philipp Stephani
2017-09-19  8:18     ` Philipp Stephani
2017-09-19 19:09       ` Eli Zaretskii
2017-09-28 21:19         ` Philipp Stephani [this message]
2017-09-28 21:27           ` Stefan Monnier
2017-09-29 19:55           ` Eli Zaretskii
2017-09-30 22:02             ` Philipp Stephani
2017-10-01 18:06               ` Eli Zaretskii
2017-10-03 12:26                 ` Philipp Stephani
2017-10-03 15:31                   ` Eli Zaretskii
2017-10-03 15:52                     ` Philipp Stephani
2017-10-03 16:26                       ` Eli Zaretskii
2017-10-03 17:10                         ` Eli Zaretskii
2017-10-03 18:37                           ` Philipp Stephani
2017-10-03 20:52                   ` Paul Eggert
2017-10-04  5:33                     ` Eli Zaretskii
2017-10-04  6:41                       ` Paul Eggert
2017-10-04  8:03                         ` Eli Zaretskii
2017-10-04 17:51                           ` Paul Eggert
2017-10-04 19:38                             ` Eli Zaretskii
2017-10-04 21:24                               ` Paul Eggert
2017-10-05  1:48                                 ` Paul Eggert
2017-10-05  7:14                                   ` Eli Zaretskii
2017-10-08 22:52                                   ` Philipp Stephani
2017-10-09  5:54                                     ` Paul Eggert
2017-10-29 20:48                                       ` Philipp Stephani
2017-10-09  6:38                                     ` Eli Zaretskii
2017-10-05  7:12                                 ` Eli Zaretskii
2017-10-06  1:58                                   ` Paul Eggert
2017-10-06  7:40                                     ` Eli Zaretskii
2017-10-06 19:36                                       ` Paul Eggert
2017-10-06 21:03                                         ` Eli Zaretskii
2017-10-08 23:09                                     ` Philipp Stephani
2017-10-09  6:19                                       ` Paul Eggert
2017-10-29 20:48                                         ` Philipp Stephani
2017-10-29 22:49                                           ` Philipp Stephani
2017-12-09 23:05                                             ` Philipp Stephani
2017-12-10  7:08                                               ` Eli Zaretskii
2017-12-10 13:26                                                 ` Philipp Stephani
2017-12-10 13:32                                                   ` Ted Zlatanov
2017-10-08 23:04                                   ` Philipp Stephani
2017-10-09  6:47                                     ` Eli Zaretskii
2017-10-08 17:58                     ` Philipp Stephani
2017-10-08 18:42                       ` Eli Zaretskii
2017-10-08 23:14                         ` Philipp Stephani
2017-10-09  6:53                           ` Eli Zaretskii
2017-10-29 20:41                             ` Philipp Stephani
2017-10-09  6:22                       ` Paul Eggert
2017-10-01 18:38               ` Eli Zaretskii
2017-10-03 12:12                 ` Philipp Stephani
2017-10-03 14:54                   ` Eli Zaretskii

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAArVCkRvSaS-orqHcVPtZ2etUnRiY39okHh+6sYV-mtQQRYs-g@mail.gmail.com \
    --to=p.stephani2@gmail.com \
    --cc=eliz@gnu.org \
    --cc=emacs-devel@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).