From: Eli Zaretskii <eliz@gnu.org>
To: Philipp Stephani <p.stephani2@gmail.com>
Cc: emacs-devel@gnu.org
Subject: Re: JSON/YAML/TOML/etc. parsing performance
Date: Fri, 29 Sep 2017 22:55:54 +0300
Message-ID: <83h8vl5lf9.fsf@gnu.org>
In-Reply-To: <CAArVCkRvSaS-orqHcVPtZ2etUnRiY39okHh+6sYV-mtQQRYs-g@mail.gmail.com> (message from Philipp Stephani on Thu, 28 Sep 2017 21:19:00 +0000)

> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Thu, 28 Sep 2017 21:19:00 +0000
> Cc: emacs-devel@gnu.org
> 
> IIUC Jansson only accepts UTF-8 strings (i.e. it will generate an error if some input is not a UTF-8 string), and
> will only return UTF-8 strings as well. Therefore I think that direct conversion between Lisp strings and C
> strings (using SDATA etc.) is always correct because the internal Emacs encoding is a superset of UTF-8.
> Also build_string should always be correct because it will generate a correct multibyte string for a UTF-8
> string with non-ASCII characters, and a correct unibyte string for an ASCII string, right?

I don't think it's a good idea to write code that has such
assumptions embedded in it.  We don't do that in other cases, even
though UTF-8 based systems are widespread nowadays.  Instead, we make
sure that encoding and decoding of UTF-8 byte streams is implemented
efficiently and, when possible, simply reuses the same string data.

Besides, these assumptions are not always true.  For example:

  . The Emacs internal representation could include raw bytes, whose
    representations (both of them) are not valid UTF-8;
  . Strings we receive from the library could be invalid UTF-8, in
    which case putting them into a buffer or string without decoding
    will mean trouble for programs that will try to process them.

So I think decoding and encoding any string passed to/from Jansson is
better for stability and future maintenance.  If you worry about
performance, you shouldn't: we convert UTF-8 into our internal
representation as efficiently as possible.

>  > +  /* LISP now must be a vector or hashtable.  */
>  > +  if (++lisp_eval_depth > max_lisp_eval_depth)
>  > +    xsignal0 (Qjson_object_too_deep);
> 
>  This error could mislead: the problem could be in the nesting of
>  surrounding Lisp being too deep, and the JSON part could be just fine.
> 
> Agreed, but I think it's better to use lisp_eval_depth here because it's the total nesting depth that could cause
> stack overflows.

Well, at least the error message should not point exclusively to a
JSON problem; it should mention the possibility of a Lisp eval depth
overflow as well.
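The point can be shown with a small standalone sketch, which is not the patch's code.  Here depth and max_depth are hypothetical stand-ins for lisp_eval_depth and max_lisp_eval_depth, and struct node, walk, and walk_status_message are invented for the illustration.  Because the counter is shared, a shallow object can still hit the limit when the surrounding nesting is already deep, so the message names both causes:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for lisp_eval_depth and max_lisp_eval_depth.  */
static int depth;
static int max_depth = 5;

/* A singly nested object: each node contains at most one child.  */
struct node
{
  const struct node *child;
};

enum walk_status { WALK_OK, WALK_TOO_DEEP };

/* Describe the failure without blaming the JSON object alone.  */
static const char *
walk_status_message (enum walk_status status)
{
  return (status == WALK_TOO_DEEP
          ? "nesting too deep: object depth plus surrounding Lisp "
            "evaluation depth exceeds the limit"
          : "ok");
}

/* Walk N recursively, charging each level to the shared counter.  */
static enum walk_status
walk (const struct node *n)
{
  assert (depth >= 0 && max_depth > 0);
  if (n == NULL)
    return WALK_OK;
  if (++depth > max_depth)
    {
      depth--;
      return WALK_TOO_DEEP;
    }
  enum walk_status status = walk (n->child);
  depth--;                /* restore the counter on the way out */
  return status;
}
```

With max_depth set to 5, a three-level object walks fine when depth starts at 0, but the same object fails when depth is already 4 on entry.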

>  > +  Lisp_Object string
>  > +    = make_string (buffer_and_size->buffer, buffer_and_size->size);
> 
>  This is arbitrary text, so I'm not sure make_string is appropriate.
>  Could the text be a byte stream, i.e. not human-readable text? If so,
>  do we want to create a unibyte string or a multibyte string here?
> 
> It should always be UTF-8.

How does JSON express byte streams, then?  Doesn't it support data (as
opposed to text)?

>  > +  {
>  > +    bool overflow = INT_ADD_WRAPV (BUFFER_CEILING_OF (point), 1, &end);
>  > +    eassert (!overflow);
>  > +  }
>  > +  size_t count;
>  > +  {
>  > +    bool overflow = INT_SUBTRACT_WRAPV (end, point, &count);
>  > +    eassert (!overflow);
>  > +  }
> 
>  Why did you need these blocks in braces?
> 
> To be able to reuse the "overflow" name.

Why can't you reuse it without the braces?
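A standalone sketch of what reuse without the braces could look like: declare the flag once and assign to it twice.  This is not the patch itself.  It assumes GCC or Clang, uses __builtin_add_overflow and __builtin_sub_overflow in place of INT_ADD_WRAPV and INT_SUBTRACT_WRAPV, and uses assert in place of eassert; the function name bytes_to_ceiling and its parameters are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return the number of bytes from POINT up to and including CEILING.
   Hypothetical helper mirroring the shape of the quoted hunk.  */
static size_t
bytes_to_ceiling (ptrdiff_t point, ptrdiff_t ceiling)
{
  ptrdiff_t end;
  size_t count;
  bool overflow;          /* declared once, assigned twice */

  overflow = __builtin_add_overflow (ceiling, 1, &end);
  assert (!overflow);
  overflow = __builtin_sub_overflow (end, point, &count);
  assert (!overflow);
  return count;
}
```

For example, bytes_to_ceiling (10, 19) yields 10.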

Thanks.



