From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: Eli Zaretskii Newsgroups: gmane.emacs.devel Subject: Re: JSON/YAML/TOML/etc. parsing performance Date: Tue, 19 Sep 2017 22:09:06 +0300 Message-ID: <8360ceh5f1.fsf@gnu.org> References: <87poaqhc63.fsf@lifelogs.com> Reply-To: Eli Zaretskii NNTP-Posting-Host: blaine.gmane.org X-Trace: blaine.gmane.org 1505848268 28770 195.159.176.226 (19 Sep 2017 19:11:08 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Tue, 19 Sep 2017 19:11:08 +0000 (UTC) Cc: emacs-devel@gnu.org To: Philipp Stephani Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Tue Sep 19 21:11:02 2017 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1duNvA-0007CO-Uf for ged-emacs-devel@m.gmane.org; Tue, 19 Sep 2017 21:11:01 +0200 Original-Received: from localhost ([::1]:44938 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1duNvI-0000Cz-5r for ged-emacs-devel@m.gmane.org; Tue, 19 Sep 2017 15:11:08 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:51226) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1duNuD-0000Aj-48 for emacs-devel@gnu.org; Tue, 19 Sep 2017 15:10:02 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1duNu9-0003ag-T9 for emacs-devel@gnu.org; Tue, 19 Sep 2017 15:10:01 -0400 Original-Received: from fencepost.gnu.org ([2001:4830:134:3::e]:49949) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1duNu9-0003ac-P0; Tue, 19 Sep 2017 15:09:57 -0400 Original-Received: from 84.94.185.246.cable.012.net.il ([84.94.185.246]:2598 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1duNu4-0005CJ-Qb; Tue, 19 Sep 2017 15:09:57 -0400 In-reply-to: (message from Philipp Stephani on Tue, 19 Sep 2017 08:18:14 +0000) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.org gmane.emacs.devel:218532 Archived-At: > From: Philipp Stephani > Date: Tue, 19 Sep 2017 08:18:14 +0000 > > Here's a newer version of the patch. The only significant difference is that now the Lisp values for JSON null > and false are :null and :false, respectively. Using a dedicated symbol for :null reduces the mental overhead of > the triple meaning of nil (null, false, empty list), and is more future-proof, should we ever want to support lists. Thanks, a few comments below. > +static _Noreturn void > +json_parse_error (const json_error_t *error) > +{ > + xsignal (Qjson_parse_error, > + list5 (build_string (error->text), build_string (error->source), > + make_natnum (error->line), make_natnum (error->column), > + make_natnum (error->position))); > +} I think error->source could include non-ASCII characters, in which case you need to use make_specified_string with its last argument non-zero, not build_string, which has its own ideas about when to produce a multibyte string. > +static _GL_ARG_NONNULL ((2)) Lisp_Object > +lisp_to_json_1 (Lisp_Object lisp, json_t **json) > +{ > + if (VECTORP (lisp)) > + { > + ptrdiff_t size = ASIZE (lisp); > + eassert (size >= 0); > + if (size > SIZE_MAX) > + xsignal1 (Qoverflow_error, build_pure_c_string ("vector is too long")); I don't think you can allocate pure storage at run time, only at dump time. (There are more of this elsewhere in the patch.) > + /* LISP now must be a vector or hashtable. */ > + if (++lisp_eval_depth > max_lisp_eval_depth) > + xsignal0 (Qjson_object_too_deep); This error could mislead: the problem could be in the nesting of surrounding Lisp being too deep, and the JSON part could be just fine. > + Lisp_Object string > + = make_string (buffer_and_size->buffer, buffer_and_size->size); This is arbitrary text, so I'm not sure make_string is appropriate. Could the text be a byte stream, i.e. not human-readable text? If so, do we want to create a unibyte string or a multibyte string here? > + insert_from_string (string, 0, 0, SCHARS (string), SBYTES (string), false); Hmmm... if you want to insert the text into the buffer, you need to make sure it has the right representation. What kind of text is this? It probably should be decoded. In any case, going through a string sounds gross. You should insert the text directly into the gap, like we do in a couple of places already. See insert_from_gap and its users, and maybe also decode_coding_gap. > +DEFUN ("json-parse-string", Fjson_parse_string, Sjson_parse_string, 1, 1, NULL, > + doc: /* Parse the JSON STRING into a Lisp object. > +This is essentially the reverse operation of `json-serialize', which > +see. The returned object will be a vector or hashtable. Its elements > +will be `:null', `:false', t, numbers, strings, or further vectors and > +hashtables. If there are duplicate keys in an object, all but the > +last one are ignored. If STRING doesn't contain a valid JSON object, > +an error of type `json-parse-error' is signaled. */) > + (Lisp_Object string) > +{ > + ptrdiff_t count = SPECPDL_INDEX (); > + check_string_without_embedded_nulls (string); > + > + json_error_t error; > + json_t *object = json_loads (SSDATA (string), 0, &error); Doesn't json_loads require the string to be encoded in some particular encoding? If so, passing it our internal representation might not be TRT. > + /* First, parse from point to the gap or the end of the accessible > + portion, whatever is closer. */ > + ptrdiff_t point = d->point; > + ptrdiff_t end; > + { > + bool overflow = INT_ADD_WRAPV (BUFFER_CEILING_OF (point), 1, &end); > + eassert (!overflow); > + } > + size_t count; > + { > + bool overflow = INT_SUBTRACT_WRAPV (end, point, &count); > + eassert (!overflow); > + } Why did you need these blocks in braces? > +(provide 'json-tests) > +;;; json-tests.el ends here IMO, it would be good to test also non-ASCII text in JSON objects. Finally, this needs documentation: NEWS and the ELisp manual. Thanks again for working on this.