* JSON/YAML/TOML/etc. parsing performance
@ 2017-09-16 15:54 Ted Zlatanov
2017-09-16 16:02 ` Mark Oteiza
` (2 more replies)
0 siblings, 3 replies; 81+ messages in thread
From: Ted Zlatanov @ 2017-09-16 15:54 UTC (permalink / raw)
To: emacs-devel
I wanted to ask if there's any chance of improving the parsing
performance of JSON, YAML, TOML, and similar data formats. It's pretty
poor today.
That could be done in the core with C code, improved Lisp code,
integration with an external library, or a mix of those.
Ted
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-16 15:54 JSON/YAML/TOML/etc. parsing performance Ted Zlatanov
@ 2017-09-16 16:02 ` Mark Oteiza
2017-09-17 0:02 ` Richard Stallman
2017-09-17 0:02 ` Richard Stallman
2017-09-17 18:46 ` Philipp Stephani
2 siblings, 1 reply; 81+ messages in thread
From: Mark Oteiza @ 2017-09-16 16:02 UTC (permalink / raw)
To: emacs-devel
Ted Zlatanov <tzz@lifelogs.com> writes:
> I wanted to ask if there's any chance of improving the parsing
> performance of JSON, YAML, TOML, and similar data formats. It's pretty
> poor today.
>
> That could be done in the core with C code, improved Lisp code,
> integration with an external library, or a mix of those.
There are a ton of external libraries for parsing JSON, many of which
have lots of high level functions for dealing with it--at the cost of
being forced to use their object system. I had a go at integrating
jansson but had issues.
I'm fond of this JSON tokenizer, alas AIUI we cannot use it without
copyright assignment:
https://github.com/zserge/jsmn
I'm interested in JSON parsing in core, and to that end I'm learning
Ragel to generate a parser. This could take a while, though :)
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-16 15:54 JSON/YAML/TOML/etc. parsing performance Ted Zlatanov
2017-09-16 16:02 ` Mark Oteiza
@ 2017-09-17 0:02 ` Richard Stallman
2017-09-18 13:46 ` Ted Zlatanov
2017-09-17 18:46 ` Philipp Stephani
2 siblings, 1 reply; 81+ messages in thread
From: Richard Stallman @ 2017-09-17 0:02 UTC (permalink / raw)
To: Ted Zlatanov; +Cc: emacs-devel
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> I wanted to ask if there's any chance of improving the parsing
> performance of JSON, YAML, TOML, and similar data formats. It's pretty
> poor today.
Can you design a primitive general enough to speed up parsing of all those
formats? That would make it feasible to handle them all without
too much work.
--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-16 16:02 ` Mark Oteiza
@ 2017-09-17 0:02 ` Richard Stallman
2017-09-17 3:13 ` Mark Oteiza
0 siblings, 1 reply; 81+ messages in thread
From: Richard Stallman @ 2017-09-17 0:02 UTC (permalink / raw)
To: Mark Oteiza; +Cc: emacs-devel
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> I'm fond of this JSON tokenizer, alas AIUI we cannot use it without
> copyright assignment:
> https://github.com/zserge/jsmn
If it is an external package, not specifically for Emacs,
we can include it along with Emacs without copyright papers
if its license is compatible with GPLv3+.
--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-17 0:02 ` Richard Stallman
@ 2017-09-17 3:13 ` Mark Oteiza
2017-09-18 0:00 ` Richard Stallman
0 siblings, 1 reply; 81+ messages in thread
From: Mark Oteiza @ 2017-09-17 3:13 UTC (permalink / raw)
To: Richard Stallman; +Cc: emacs-devel
Richard Stallman <rms@gnu.org> writes:
> > I'm fond of this JSON tokenizer, alas AIUI we cannot use it without
> > copyright assignment:
> > https://github.com/zserge/jsmn
>
> If it is an external package, not specifically for Emacs,
> we can include it along with Emacs without copyright papers
> if its license is compatible with GPLv3+.
I see. It's MIT licensed, so I guess it can be integrated. Thanks for
the correction.
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-16 15:54 JSON/YAML/TOML/etc. parsing performance Ted Zlatanov
2017-09-16 16:02 ` Mark Oteiza
2017-09-17 0:02 ` Richard Stallman
@ 2017-09-17 18:46 ` Philipp Stephani
2017-09-17 19:05 ` Eli Zaretskii
` (2 more replies)
2 siblings, 3 replies; 81+ messages in thread
From: Philipp Stephani @ 2017-09-17 18:46 UTC (permalink / raw)
To: emacs-devel
Ted Zlatanov <tzz@lifelogs.com> wrote on Sat., Sep. 16, 2017 at 17:55:
> I wanted to ask if there's any chance of improving the parsing
> performance of JSON, YAML, TOML, and similar data formats. It's pretty
> poor today.
>
> That could be done in the core with C code, improved Lisp code,
> integration with an external library, or a mix of those.
>
I don't know much about the others, but given the importance of JSON as
data exchange and serialization format, I think it's worthwhile to invest
some time here. I've implemented a wrapper around the json-c library
(license: Expat/X11/MIT), resulting in significant speedups using the test
data from https://github.com/miloyip/nativejson-benchmark: a factor of 3.9
to 6.4 for parsing, and a factor of 27 to 67 for serializing. If people
agree that this is useful I can send a patch.
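For reference, a minimal sketch of the json-c entry points such a wrapper
builds on (illustrative only, not the patch itself; error handling and the
conversion to Lisp objects are omitted):

  #include <stdio.h>
  #include <json-c/json.h>

  int
  main (void)
  {
    const char *input = "{\"abc\": [1, 2, true], \"def\": null}";

    /* Parse: returns a reference-counted json_object, or NULL on error.  */
    json_object *obj = json_tokener_parse (input);
    if (obj == NULL)
      return 1;

    /* Serialize: the returned string stays valid until the object is
       released.  */
    printf ("%s\n", json_object_to_json_string (obj));

    /* Drop our reference.  */
    json_object_put (obj);
    return 0;
  }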
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-17 18:46 ` Philipp Stephani
@ 2017-09-17 19:05 ` Eli Zaretskii
2017-09-17 20:27 ` Philipp Stephani
2017-09-17 21:17 ` Speed of Elisp (was: JSON/YAML/TOML/etc. parsing performance) Stefan Monnier
2017-09-18 13:26 ` JSON/YAML/TOML/etc. parsing performance Philipp Stephani
2 siblings, 1 reply; 81+ messages in thread
From: Eli Zaretskii @ 2017-09-17 19:05 UTC (permalink / raw)
To: Philipp Stephani; +Cc: emacs-devel
> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Sun, 17 Sep 2017 18:46:45 +0000
>
> I don't know much about the others, but given the importance of JSON as data exchange and serialization
> format, I think it's worthwhile to invest some time here. I've implemented a wrapper around the json-c library
> (license: Expat/X11/MIT), resulting in significant speedups using the test data from
> https://github.com/miloyip/nativejson-benchmark: a factor of 3.9 to 6.4 for parsing, and a factor of 27 to 67 for
> serializing. If people agree that this is useful I can send a patch.
Before we make a decision on which library to use, I'd prefer some
kind of survey of available free software libraries, including their
popularity and development activity. The survey doesn't have to be
exhaustive, but I think we should compare at least a few candidates.
We already have 2, so maybe we should start by comparing them.
Thanks.
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-17 19:05 ` Eli Zaretskii
@ 2017-09-17 20:27 ` Philipp Stephani
2017-09-17 22:41 ` Mark Oteiza
2017-09-18 13:53 ` Ted Zlatanov
0 siblings, 2 replies; 81+ messages in thread
From: Philipp Stephani @ 2017-09-17 20:27 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
Eli Zaretskii <eliz@gnu.org> wrote on Sun., Sep. 17, 2017 at 21:05:
> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Sun, 17 Sep 2017 18:46:45 +0000
> >
> > I don't know much about the others, but given the importance of JSON as
> data exchange and serialization
> > format, I think it's worthwhile to invest some time here. I've
> implemented a wrapper around the json-c library
> > (license: Expat/X11/MIT), resulting in significant speedups using the
> test data from
> > https://github.com/miloyip/nativejson-benchmark: a factor of 3.9 to 6.4
> for parsing, and a factor of 27 to 67 for
> > serializing. If people agree that this is useful I can send a patch.
>
> Before we make a decision on which library to use, I'd prefer some
> kind of survey of available free software libraries, including their
> popularity and development activity. The survey doesn't have to be
> exhaustive, but I think we should compare at least a few candidates.
> We already have 2, so maybe we should start by comparing them.
>
>
Sure, I've made a quick overview based on
https://github.com/miloyip/nativejson-benchmark. I've only used the
libraries that are written in C and have been tested in that benchmark;
it's still quite a few. I've checked the conformance and speed metrics from
the benchmark, as well as the number of GitHub stars (as a proxy for popularity)
and the number of commits in the last month (as a proxy for development activity).
Here are the results:
https://docs.google.com/spreadsheets/d/e/2PACX-1vTqKxqo47s67L3EJ9AWvZclNuT2xbd9rgoRuJ_UYbXgnV171owr8h2mksHjrjNGADDR3DVTWQvUMBpe/pubhtml?gid=0&single=true
Note that some of the libraries (jsmn, ujson4c) don't appear to support
serialization at all; I'd suggest avoiding them, because we'd then need to
wrap another library for serialization. Also, even though JSMN advertises
itself as "world's fastest JSON parser", it's actually the slowest of the
libraries in the survey. json-c appears to be reasonably conformant and
fast for both parsing and serialization, and has by far the largest
development activity.
* Speed of Elisp (was: JSON/YAML/TOML/etc. parsing performance)
2017-09-17 18:46 ` Philipp Stephani
2017-09-17 19:05 ` Eli Zaretskii
@ 2017-09-17 21:17 ` Stefan Monnier
2017-09-18 13:26 ` JSON/YAML/TOML/etc. parsing performance Philipp Stephani
2 siblings, 0 replies; 81+ messages in thread
From: Stefan Monnier @ 2017-09-17 21:17 UTC (permalink / raw)
To: emacs-devel
[ To clarify up front: I'm in favor of using those libraries.
The questions below don't mean that I think it's better to speed up
Elisp than to use a C implementation of those json primitives: In any
case, it makes sense to use existing C libraries for that, both for
speed reasons and for maintenance reasons; like we do for XML.
The choice between C and Elisp would only make sense if we had to
write&maintain the C code. ]
> (license: Expat/X11/MIT), resulting in significant speedups using the test
> data from https://github.com/miloyip/nativejson-benchmark: a factor of 3.9
> to 6.4 for parsing,
Very interesting. The way I read it, it means either that Elisp is not
nearly as slow as we tend to assume, or that the overhead introduced
when turning json-c's output into an Elisp-usable form dwarfs the json-c
parsing itself.
> and a factor of 27 to 67 for serializing.
I'm curious why there is such a wide discrepancy between the speedup for
parsing and that for serializing (sounds like a factor 10 difference).
Is it because parsing with json-c is slowed down by the conversion to
(especially allocation of) Elisp data structures, or is it because the
Elisp implementation of json serialization suffers more from poor
performance (in which case, maybe it could point to a performance issue
in Elisp which we could try to tackle)?
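One quick way to probe that would be to time the two directions separately
with the current json.el, e.g. (rough sketch, assuming a local test.json):

  (require 'json)
  (let ((data (json-read-file "test.json")))
    (list :parse (benchmark-run 10 (json-read-file "test.json"))
          :serialize (benchmark-run 10 (json-encode data))))

If json-encode's share is disproportionately large, that would point at the
Elisp serializer rather than at the cost of building Lisp data.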
Stefan
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-17 20:27 ` Philipp Stephani
@ 2017-09-17 22:41 ` Mark Oteiza
2017-09-18 13:53 ` Ted Zlatanov
1 sibling, 0 replies; 81+ messages in thread
From: Mark Oteiza @ 2017-09-17 22:41 UTC (permalink / raw)
To: Philipp Stephani; +Cc: Eli Zaretskii, emacs-devel
Philipp Stephani <p.stephani2@gmail.com> writes:
> Eli Zaretskii <eliz@gnu.org> wrote on Sun., Sep. 17, 2017 at 21:05:
>
> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Sun, 17 Sep 2017 18:46:45 +0000
> >
> > I don't know much about the others, but given the importance of JSON as data exchange and serialization
> > format, I think it's worthwhile to invest some time here. I've implemented a wrapper around the json-c library
> > (license: Expat/X11/MIT), resulting in significant speedups using the test data from
> > https://github.com/miloyip/nativejson-benchmark: a factor of 3.9 to 6.4 for parsing, and a factor of 27 to 67 for
> > serializing. If people agree that this is useful I can send a patch.
>
> Before we make a decision on which library to use, I'd prefer some
> kind of survey of available free software libraries, including their
> popularity and development activity. The survey doesn't have to be
> exhaustive, but I think we should compare at least a few candidates.
> We already have 2, so maybe we should start by comparing them.
>
> Sure, I've made a quick overview based on
> https://github.com/miloyip/nativejson-benchmark. I've only used the
> libraries that are written in C and have been tested in that
> benchmark; it's still quite a few. I've checked the conformance and
> speed metrics from the benchmark, as well as the number of GitHub stars (as a
> proxy for popularity) and the number of commits in the last month (as a
> proxy for development activity). Here are the results:
> https://docs.google.com/spreadsheets/d/e/2PACX-1vTqKxqo47s67L3EJ9AWvZclNuT2xbd9rgoRuJ_UYbXgnV171owr8h2mksHjrjNGADDR3DVTWQvUMBpe/pubhtml?gid=0&single=true
Thanks for coming up with these comparisons.
> Note that some of the libraries (jsmn, ujson4c) don't appear to
> support serialization at all; I'd suggest avoiding them, because we'd
> then need to wrap another library for serialization. Also, even though
> JSMN advertises itself as "world's fastest JSON parser", it's actually
> the slowest of the libraries in the survey. json-c appears to be
> reasonably conformant and fast for both parsing and serialization, and
> has by far the largest development activity.
I was a little confused about how they got a "parsing" benchmark out of jsmn,
since all it does is tokenize. No matter, it makes sense to pursue
something that does both.
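For reference, jsmn's entire interface is roughly the following (sketch;
assumes jsmn.c is compiled in): it only fills an array of token descriptors,
and building actual values is left to the caller:

  #include <stdio.h>
  #include <string.h>
  #include "jsmn.h"

  int
  main (void)
  {
    const char *js = "{\"abc\": [1, 2, true]}";
    jsmn_parser parser;
    jsmntok_t tokens[16];

    jsmn_init (&parser);
    int n = jsmn_parse (&parser, js, strlen (js), tokens,
                        sizeof tokens / sizeof tokens[0]);
    /* Each token is just a type plus byte offsets into JS.  */
    for (int i = 0; i < n; i++)
      printf ("token %d: type %d, bytes [%d, %d)\n",
              i, tokens[i].type, tokens[i].start, tokens[i].end);
    return 0;
  }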
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-17 3:13 ` Mark Oteiza
@ 2017-09-18 0:00 ` Richard Stallman
0 siblings, 0 replies; 81+ messages in thread
From: Richard Stallman @ 2017-09-18 0:00 UTC (permalink / raw)
To: Mark Oteiza; +Cc: emacs-devel
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> I see. It's MIT licensed, so I guess it can be integrated.
Yes it can be, but please avoid the term "MIT license". There are two
different licenses people sometimes call by that term, the X11 license
and the Expat license. See https://gnu.org/licenses/license-list.html.
Both of them are weak licenses -- we also call them "pushover"
licenses -- since they permit inclusion in nonfree software.
Please don't associate them with the name of MIT, as the association
tends to promote them.
--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-17 18:46 ` Philipp Stephani
2017-09-17 19:05 ` Eli Zaretskii
2017-09-17 21:17 ` Speed of Elisp (was: JSON/YAML/TOML/etc. parsing performance) Stefan Monnier
@ 2017-09-18 13:26 ` Philipp Stephani
2017-09-18 13:58 ` Mark Oteiza
` (2 more replies)
2 siblings, 3 replies; 81+ messages in thread
From: Philipp Stephani @ 2017-09-18 13:26 UTC (permalink / raw)
To: emacs-devel
Philipp Stephani <p.stephani2@gmail.com> wrote on Sun., Sep. 17, 2017 at
20:46:
> Ted Zlatanov <tzz@lifelogs.com> wrote on Sat., Sep. 16, 2017 at
> 17:55:
>
>> I wanted to ask if there's any chance of improving the parsing
>> performance of JSON, YAML, TOML, and similar data formats. It's pretty
>> poor today.
>>
>> That could be done in the core with C code, improved Lisp code,
>> integration with an external library, or a mix of those.
>>
>
> I don't know much about the others, but given the importance of JSON as
> data exchange and serialization format, I think it's worthwhile to invest
> some time here. I've implemented a wrapper around the json-c library
> (license: Expat/X11/MIT), resulting in significant speedups using the test
> data from https://github.com/miloyip/nativejson-benchmark: a factor of
> 3.9 to 6.4 for parsing, and a factor of 27 to 67 for serializing. If people
> agree that this is useful I can send a patch.
>
I've discovered that the interface and documentation of Jansson are much
better than the ones of json-c, so I switched to Jansson. I've attached a
patch.
[-- Attachment #2: 0001-Implement-native-JSON-support-using-Jansson.txt --]
[-- Type: text/plain, Size: 22836 bytes --]
From 67ad4e22c5a0b5dcc0dea2abdef32ee3c636fade Mon Sep 17 00:00:00 2001
From: Philipp Stephani <phst@google.com>
Date: Mon, 18 Sep 2017 10:51:39 +0200
Subject: [PATCH] Implement native JSON support using Jansson
* configure.ac: New option --with-json.
* src/json.c (Fjson_serialize, Fjson_insert, Fjson_parse_string)
(Fjson_parse_buffer): New defuns.
(json_out_of_memory, json_parse_error, json_release_object)
(check_string_without_embedded_nulls, json_check, lisp_to_json)
(json_insert, json_insert_callback, json_to_lisp)
(json_read_buffer_callback, define_error): New helper functions.
(syms_of_json): New function.
* src/lisp.h: Declaration for syms_of_json.
* src/emacs.c (main): Enable JSON functions.
* src/Makefile.in (JSON_LIBS, JSON_CFLAGS, JSON_OBJ, EMACS_CFLAGS)
(base_obj, LIBES): Compile json.c if --with-json is enabled.
* test/src/json-tests.el (json-serialize/roundtrip)
(json-serialize/object, json-parse-string/object): New unit tests.
---
configure.ac | 20 ++-
src/Makefile.in | 11 +-
src/emacs.c | 4 +
src/json.c | 442 +++++++++++++++++++++++++++++++++++++++++++++++++
src/lisp.h | 5 +
test/src/json-tests.el | 61 +++++++
6 files changed, 539 insertions(+), 4 deletions(-)
create mode 100644 src/json.c
create mode 100644 test/src/json-tests.el
diff --git a/configure.ac b/configure.ac
index 35b7e69daf..c9ce5ee120 100644
--- a/configure.ac
+++ b/configure.ac
@@ -348,6 +348,7 @@ AC_DEFUN
OPTION_DEFAULT_OFF([cairo],[compile with Cairo drawing (experimental)])
OPTION_DEFAULT_ON([xml2],[don't compile with XML parsing support])
OPTION_DEFAULT_ON([imagemagick],[don't compile with ImageMagick image support])
+OPTION_DEFAULT_ON([json], [don't compile with native JSON support])
OPTION_DEFAULT_ON([xft],[don't use XFT for anti aliased fonts])
OPTION_DEFAULT_ON([libotf],[don't use libotf for OpenType font support])
@@ -2856,6 +2857,22 @@ AC_DEFUN
AC_SUBST(LIBSYSTEMD_LIBS)
AC_SUBST(LIBSYSTEMD_CFLAGS)
+HAVE_JSON=no
+JSON_OBJ=
+
+if test "${with_json}" = yes; then
+ EMACS_CHECK_MODULES([JSON], [jansson >= 2.5],
+ [HAVE_JSON=yes], [HAVE_JSON=no])
+ if test "${HAVE_JSON}" = yes; then
+ AC_DEFINE(HAVE_JSON, 1, [Define if using Jansson.])
+ JSON_OBJ=json.o
+ fi
+fi
+
+AC_SUBST(JSON_LIBS)
+AC_SUBST(JSON_CFLAGS)
+AC_SUBST(JSON_OBJ)
+
NOTIFY_OBJ=
NOTIFY_SUMMARY=no
@@ -5368,7 +5385,7 @@ AC_DEFUN
for opt in XAW3D XPM JPEG TIFF GIF PNG RSVG CAIRO IMAGEMAGICK SOUND GPM DBUS \
GCONF GSETTINGS NOTIFY ACL LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT \
LIBOTF XFT ZLIB TOOLKIT_SCROLL_BARS X_TOOLKIT OLDXMENU X11 NS MODULES \
- XWIDGETS LIBSYSTEMD CANNOT_DUMP LCMS2; do
+ XWIDGETS LIBSYSTEMD JSON CANNOT_DUMP LCMS2; do
case $opt in
CANNOT_DUMP) eval val=\${$opt} ;;
@@ -5418,6 +5435,7 @@ AC_DEFUN
Does Emacs use -lotf? ${HAVE_LIBOTF}
Does Emacs use -lxft? ${HAVE_XFT}
Does Emacs use -lsystemd? ${HAVE_LIBSYSTEMD}
+ Does Emacs use -ljansson? ${HAVE_JSON}
Does Emacs directly use zlib? ${HAVE_ZLIB}
Does Emacs have dynamic modules support? ${HAVE_MODULES}
Does Emacs use toolkit scroll bars? ${USE_TOOLKIT_SCROLL_BARS}
diff --git a/src/Makefile.in b/src/Makefile.in
index 0e55ad4bb2..4d33682629 100644
--- a/src/Makefile.in
+++ b/src/Makefile.in
@@ -312,6 +312,10 @@ LIBGNUTLS_CFLAGS =
LIBSYSTEMD_LIBS = @LIBSYSTEMD_LIBS@
LIBSYSTEMD_CFLAGS = @LIBSYSTEMD_CFLAGS@
+JSON_LIBS = @JSON_LIBS@
+JSON_CFLAGS = @JSON_CFLAGS@
+JSON_OBJ = @JSON_OBJ@
+
INTERVALS_H = dispextern.h intervals.h composite.h
GETLOADAVG_LIBS = @GETLOADAVG_LIBS@
@@ -363,7 +367,7 @@ EMACS_CFLAGS=
$(WEBKIT_CFLAGS) \
$(SETTINGS_CFLAGS) $(FREETYPE_CFLAGS) $(FONTCONFIG_CFLAGS) \
$(LIBOTF_CFLAGS) $(M17N_FLT_CFLAGS) $(DEPFLAGS) \
- $(LIBSYSTEMD_CFLAGS) \
+ $(LIBSYSTEMD_CFLAGS) $(JSON_CFLAGS) \
$(LIBGNUTLS_CFLAGS) $(NOTIFY_CFLAGS) $(CAIRO_CFLAGS) \
$(WERROR_CFLAGS)
ALL_CFLAGS = $(EMACS_CFLAGS) $(WARN_CFLAGS) $(CFLAGS)
@@ -397,7 +401,7 @@ base_obj =
thread.o systhread.o \
$(if $(HYBRID_MALLOC),sheap.o) \
$(MSDOS_OBJ) $(MSDOS_X_OBJ) $(NS_OBJ) $(CYGWIN_OBJ) $(FONT_OBJ) \
- $(W32_OBJ) $(WINDOW_SYSTEM_OBJ) $(XGSELOBJ)
+ $(W32_OBJ) $(WINDOW_SYSTEM_OBJ) $(XGSELOBJ) $(JSON_OBJ)
obj = $(base_obj) $(NS_OBJC_OBJ)
## Object files used on some machine or other.
@@ -493,7 +497,8 @@ LIBES =
$(LIBS_TERMCAP) $(GETLOADAVG_LIBS) $(SETTINGS_LIBS) $(LIBSELINUX_LIBS) \
$(FREETYPE_LIBS) $(FONTCONFIG_LIBS) $(LIBOTF_LIBS) $(M17N_FLT_LIBS) \
$(LIBGNUTLS_LIBS) $(LIB_PTHREAD) $(GETADDRINFO_A_LIBS) $(LIBLCMS2) \
- $(NOTIFY_LIBS) $(LIB_MATH) $(LIBZ) $(LIBMODULES) $(LIBSYSTEMD_LIBS)
+ $(NOTIFY_LIBS) $(LIB_MATH) $(LIBZ) $(LIBMODULES) $(LIBSYSTEMD_LIBS) \
+ $(JSON_LIBS)
## FORCE it so that admin/unidata can decide whether these files
## are up-to-date. Although since charprop depends on bootstrap-emacs,
diff --git a/src/emacs.c b/src/emacs.c
index 1ad8af70a7..eb5f1128f6 100644
--- a/src/emacs.c
+++ b/src/emacs.c
@@ -1610,6 +1610,10 @@ Using an Emacs configured with --with-x-toolkit=lucid does not have this problem
syms_of_threads ();
syms_of_profiler ();
+#ifdef HAVE_JSON
+ syms_of_json ();
+#endif
+
keys_of_casefiddle ();
keys_of_cmds ();
keys_of_buffer ();
diff --git a/src/json.c b/src/json.c
new file mode 100644
index 0000000000..628de82921
--- /dev/null
+++ b/src/json.c
@@ -0,0 +1,442 @@
+/* JSON parsing and serialization.
+
+Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GNU Emacs.
+
+GNU Emacs is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, either version 3 of the License, or (at
+your option) any later version.
+
+GNU Emacs is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. */
+
+#include <config.h>
+
+#include <stddef.h>
+#include <stdint.h>
+
+#include <jansson.h>
+
+#include "lisp.h"
+#include "buffer.h"
+
+static _Noreturn void
+json_out_of_memory (void)
+{
+ xsignal0 (Qjson_out_of_memory);
+}
+
+static _Noreturn void
+json_parse_error (const json_error_t *error)
+{
+ xsignal (Qjson_parse_error,
+ list5 (build_string (error->text), build_string (error->source),
+ make_natnum (error->line), make_natnum (error->column),
+ make_natnum (error->position)));
+}
+
+static void
+json_release_object (void *object)
+{
+ json_decref (object);
+}
+
+static void
+check_string_without_embedded_nulls (Lisp_Object object)
+{
+ CHECK_STRING (object);
+ CHECK_TYPE (memchr (SDATA (object), '\0', SBYTES (object)) == NULL,
+ Qstring_without_embedded_nulls_p, object);
+}
+
+static json_t *
+json_check (json_t *object)
+{
+ if (object == NULL)
+ json_out_of_memory ();
+ return object;
+}
+
+static json_t *
+lisp_to_json (Lisp_Object lisp)
+{
+ if (NILP (lisp))
+ return json_check (json_null ());
+ else if (EQ (lisp, QCjson_false))
+ return json_check (json_false ());
+ else if (EQ (lisp, Qt))
+ return json_check (json_true ());
+ else if (INTEGERP (lisp))
+ {
+ CHECK_TYPE_RANGED_INTEGER (json_int_t, lisp);
+ return json_check (json_integer (XINT (lisp)));
+ }
+ else if (FLOATP (lisp))
+ return json_check (json_real (XFLOAT_DATA (lisp)));
+ else if (STRINGP (lisp))
+ {
+ ptrdiff_t size = SBYTES (lisp);
+ eassert (size >= 0);
+ if (size > SIZE_MAX)
+ xsignal1 (Qoverflow_error, build_pure_c_string ("string is too long"));
+ return json_check (json_stringn (SSDATA (lisp), size));
+ }
+ else if (VECTORP (lisp))
+ {
+ if (++lisp_eval_depth > max_lisp_eval_depth)
+ xsignal0 (Qjson_object_too_deep);
+ ptrdiff_t size = ASIZE (lisp);
+ eassert (size >= 0);
+ if (size > SIZE_MAX)
+ xsignal1 (Qoverflow_error, build_pure_c_string ("vector is too long"));
+ json_t *json = json_check (json_array ());
+ for (ptrdiff_t i = 0; i < size; ++i)
+ {
+ int status
+ = json_array_append_new (json, lisp_to_json (AREF (lisp, i)));
+ if (status == -1)
+ json_out_of_memory ();
+ eassert (status == 0);
+ }
+ eassert (json_array_size (json) == size);
+ --lisp_eval_depth;
+ return json;
+ }
+ else if (HASH_TABLE_P (lisp))
+ {
+ if (++lisp_eval_depth > max_lisp_eval_depth)
+ xsignal0 (Qjson_object_too_deep);
+ json_t* json = json_check (json_object ());
+ struct Lisp_Hash_Table *h = XHASH_TABLE (lisp);
+ for (ptrdiff_t i = 0; i < HASH_TABLE_SIZE (h); ++i)
+ if (!NILP (HASH_HASH (h, i)))
+ {
+ Lisp_Object key = HASH_KEY (h, i);
+ /* We can’t specify the length, so the string must be
+ null-terminated. */
+ check_string_without_embedded_nulls (key);
+ int status = json_object_set_new (json, SSDATA (key),
+ lisp_to_json (HASH_VALUE (h, i)));
+ if (status == -1)
+ json_out_of_memory ();
+ eassert (status == 0);
+ }
+ --lisp_eval_depth;
+ return json;
+ }
+ wrong_type_argument (Qjson_value_p, lisp);
+}
+
+DEFUN ("json-serialize", Fjson_serialize, Sjson_serialize, 1, 1, NULL,
+ doc: /* Return the JSON representation of OBJECT as a string.
+OBJECT must be a vector or hashtable, and its elements can recursively
+contain nil, t, `:json-false', numbers, strings, or other vectors and
+hashtables. nil, t, and `:json-false' will be converted to JSON null,
+true, and false values, respectively. Vectors will be converted to
+JSON arrays, and hashtables to JSON objects. Hashtable keys must be
+strings without embedded null characters and must be unique within
+each object. */)
+ (Lisp_Object object)
+{
+ ptrdiff_t count = SPECPDL_INDEX ();
+
+ json_t *json = lisp_to_json (object);
+ record_unwind_protect_ptr (json_release_object, json);
+
+ char *string = json_dumps (json, JSON_COMPACT);
+ if (string == NULL)
+ json_out_of_memory ();
+ record_unwind_protect_ptr (free, string);
+
+ return unbind_to (count, build_string (string));
+}
+
+struct json_buffer_and_size
+{
+ const char *buffer;
+ size_t size;
+};
+
+static Lisp_Object
+json_insert (Lisp_Object data)
+{
+ const struct json_buffer_and_size *buffer_and_size = XSAVE_POINTER (data, 0);
+ if (FIXNUM_OVERFLOW_P (buffer_and_size->size))
+ xsignal1 (Qoverflow_error, build_pure_c_string ("buffer too large"));
+ Lisp_Object string
+ = make_string (buffer_and_size->buffer, buffer_and_size->size);
+ insert_from_string (string, 0, 0, SCHARS (string), SBYTES (string), false);
+ return Qnil;
+}
+
+struct json_insert_data
+{
+ /* nil if json_insert succeeded, otherwise a cons
+ (ERROR-SYMBOL . ERROR-DATA). */
+ Lisp_Object error;
+};
+
+static int
+json_insert_callback (const char *buffer, size_t size, void *data)
+{
+ /* This function may not exit nonlocally. */
+ struct json_insert_data *d = data;
+ struct json_buffer_and_size buffer_and_size
+ = {.buffer = buffer, .size = size};
+ d->error
+ = internal_condition_case_1 (json_insert, make_save_ptr (&buffer_and_size),
+ Qt, Fidentity);
+ return 0;
+}
+
+DEFUN ("json-insert", Fjson_insert, Sjson_insert, 1, 1, NULL,
+ doc: /* Insert the JSON representation of OBJECT before point.
+This is the same as (insert (json-serialize OBJECT)), but potentially
+faster. See the function `json-serialize' for allowed values of
+OBJECT. */)
+ (Lisp_Object object)
+{
+ ptrdiff_t count = SPECPDL_INDEX ();
+
+ json_t *json = lisp_to_json (object);
+ record_unwind_protect_ptr (json_release_object, json);
+
+ struct json_insert_data data;
+ int status
+ = json_dump_callback (json, json_insert_callback, &data, JSON_COMPACT);
+ if (status == -1)
+ json_out_of_memory ();
+ eassert (status == 0);
+
+ if (!NILP (data.error))
+ xsignal (XCAR (data.error), XCDR (data.error));
+
+ return unbind_to (count, Qnil);
+}
+
+static Lisp_Object
+json_to_lisp (json_t *json)
+{
+ switch (json_typeof (json))
+ {
+ case JSON_NULL:
+ return Qnil;
+ case JSON_FALSE:
+ return QCjson_false;
+ case JSON_TRUE:
+ return Qt;
+ case JSON_INTEGER:
+ {
+ json_int_t value = json_integer_value (json);
+ if (FIXNUM_OVERFLOW_P (value))
+ xsignal1 (Qoverflow_error,
+ build_pure_c_string ("JSON integer is too large"));
+ return make_number (value);
+ }
+ case JSON_REAL:
+ return make_float (json_real_value (json));
+ case JSON_STRING:
+ {
+ size_t size = json_string_length (json);
+ if (FIXNUM_OVERFLOW_P (size))
+ xsignal1 (Qoverflow_error,
+ build_pure_c_string ("JSON string is too long"));
+ return make_string (json_string_value (json), size);
+ }
+ case JSON_ARRAY:
+ {
+ if (++lisp_eval_depth > max_lisp_eval_depth)
+ xsignal0 (Qjson_object_too_deep);
+ size_t size = json_array_size (json);
+ if (FIXNUM_OVERFLOW_P (size))
+ xsignal1 (Qoverflow_error,
+ build_pure_c_string ("JSON array is too long"));
+ Lisp_Object result = Fmake_vector (make_natnum (size), Qunbound);
+ for (ptrdiff_t i = 0; i < size; ++i)
+ ASET (result, i,
+ json_to_lisp (json_array_get (json, i)));
+ --lisp_eval_depth;
+ return result;
+ }
+ case JSON_OBJECT:
+ {
+ if (++lisp_eval_depth > max_lisp_eval_depth)
+ xsignal0 (Qjson_object_too_deep);
+ size_t size = json_object_size (json);
+ if (FIXNUM_OVERFLOW_P (size))
+ xsignal1 (Qoverflow_error,
+ build_pure_c_string ("JSON object has too many elements"));
+ Lisp_Object result = CALLN (Fmake_hash_table, QCtest, Qequal,
+ QCsize, make_natnum (size));
+ struct Lisp_Hash_Table *h = XHASH_TABLE (result);
+ const char *key_str;
+ json_t *value;
+ json_object_foreach (json, key_str, value)
+ {
+ Lisp_Object key = build_string (key_str);
+ EMACS_UINT hash;
+ ptrdiff_t i = hash_lookup (h, key, &hash);
+ eassert (i < 0);
+ hash_put (h, key, json_to_lisp (value), hash);
+ }
+ --lisp_eval_depth;
+ return result;
+ }
+ }
+ /* Can’t get here. */
+ emacs_abort ();
+}
+
+DEFUN ("json-parse-string", Fjson_parse_string, Sjson_parse_string, 1, 1, NULL,
+ doc: /* Parse the JSON STRING into a Lisp object.
+This is essentially the reverse operation of `json-serialize', which
+see. The returned object will be a vector or hashtable. Its elements
+will be nil, t, `:json-false', numbers, strings, or further vectors
+and hashtables. If there are duplicate keys in an object, all but the
+last one are ignored. If STRING doesn't contain a valid JSON object,
+an error of type `json-parse-error' is signaled. */)
+ (Lisp_Object string)
+{
+ ptrdiff_t count = SPECPDL_INDEX ();
+ check_string_without_embedded_nulls (string);
+
+ json_error_t error;
+ json_t *object = json_loads (SSDATA (string), 0, &error);
+ if (object == NULL)
+ json_parse_error (&error);
+
+ /* Avoid leaking the object in case of further errors. */
+ if (object != NULL)
+ record_unwind_protect_ptr (json_release_object, object);
+
+ return unbind_to (count, json_to_lisp (object));
+}
+
+struct json_read_buffer_data
+{
+ ptrdiff_t point;
+};
+
+static size_t
+json_read_buffer_callback (void *buffer, size_t buflen, void *data)
+{
+ struct json_read_buffer_data *d = data;
+
+ /* First, parse from point to the gap or the end of the accessible
+ portion, whatever is closer. */
+ ptrdiff_t point = d->point;
+ ptrdiff_t end;
+ {
+ bool overflow = INT_ADD_WRAPV (BUFFER_CEILING_OF (point), 1, &end);
+ eassert (!overflow);
+ }
+ size_t count;
+ {
+ bool overflow = INT_SUBTRACT_WRAPV (end, point, &count);
+ eassert (!overflow);
+ }
+ if (buflen < count)
+ count = buflen;
+ memcpy (buffer, BYTE_POS_ADDR (point), count);
+ {
+ bool overflow = INT_ADD_WRAPV (d->point, count, &d->point);
+ eassert (!overflow);
+ }
+ return count;
+}
+
+DEFUN ("json-parse-buffer", Fjson_parse_buffer, Sjson_parse_buffer,
+ 0, 0, NULL,
+ doc: /* Read JSON object from current buffer starting at point.
+This is similar to `json-parse-string', which see. Move point after
+the end of the object if parsing was successful. On error, point is
+not moved. */)
+ (void)
+{
+ ptrdiff_t count = SPECPDL_INDEX ();
+
+ ptrdiff_t point = PT_BYTE;
+ struct json_read_buffer_data data = {.point = point};
+ json_error_t error;
+ json_t *object = json_load_callback (json_read_buffer_callback, &data,
+ JSON_DISABLE_EOF_CHECK, &error);
+
+ if (object == NULL)
+ json_parse_error (&error);
+
+ /* Avoid leaking the object in case of further errors. */
+ record_unwind_protect_ptr (json_release_object, object);
+
+ /* Convert and then move point only if everything succeeded. */
+ Lisp_Object lisp = json_to_lisp (object);
+
+ {
+ /* Adjust point by how much we just read. Do this here because
+ tokener->char_offset becomes incorrect below. */
+ bool overflow = INT_ADD_WRAPV (point, error.position, &point);
+ eassert (!overflow);
+ eassert (point <= ZV_BYTE);
+ SET_PT_BOTH (BYTE_TO_CHAR (point), point);
+ }
+
+ return unbind_to (count, lisp);
+}
+
+/* Simplified version of ‘define-error’ that works with pure
+ objects. */
+
+static void
+define_error (Lisp_Object name, const char *message, Lisp_Object parent)
+{
+ eassert (SYMBOLP (name));
+ eassert (SYMBOLP (parent));
+ Lisp_Object parent_conditions = Fget (parent, Qerror_conditions);
+ eassert (CONSP (parent_conditions));
+ eassert (!NILP (Fmemq (parent, parent_conditions)));
+ eassert (NILP (Fmemq (name, parent_conditions)));
+ Fput (name, Qerror_conditions, pure_cons (name, parent_conditions));
+ Fput (name, Qerror_message, build_pure_c_string (message));
+}
+
+void
+syms_of_json (void)
+{
+ DEFSYM (QCjson_false, ":json-false");
+
+ DEFSYM (Qstring_without_embedded_nulls_p, "string-without-embedded-nulls-p");
+ DEFSYM (Qjson_value_p, "json-value-p");
+
+ DEFSYM (Qjson_error, "json-error");
+ DEFSYM (Qjson_out_of_memory, "json-out-of-memory");
+ DEFSYM (Qjson_parse_error, "json-parse-error");
+ DEFSYM (Qjson_object_too_deep, "json-object-too-deep");
+ define_error (Qjson_error, "generic JSON error", Qerror);
+ define_error (Qjson_out_of_memory, "no free memory for creating JSON object",
+ Qjson_error);
+ define_error (Qjson_parse_error, "could not parse JSON stream",
+ Qjson_error);
+ define_error (Qjson_object_too_deep, "object cyclic or too deep",
+ Qjson_error);
+
+ DEFSYM (Qpure, "pure");
+ DEFSYM (Qside_effect_free, "side-effect-free");
+
+ DEFSYM (Qjson_serialize, "json-serialize");
+ DEFSYM (Qjson_parse_string, "json-parse-string");
+ Fput (Qjson_serialize, Qpure, Qt);
+ Fput (Qjson_serialize, Qside_effect_free, Qt);
+ Fput (Qjson_parse_string, Qpure, Qt);
+ Fput (Qjson_parse_string, Qside_effect_free, Qt);
+
+ defsubr (&Sjson_serialize);
+ defsubr (&Sjson_insert);
+ defsubr (&Sjson_parse_string);
+ defsubr (&Sjson_parse_buffer);
+}
diff --git a/src/lisp.h b/src/lisp.h
index c503082442..8d485098ac 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -3440,6 +3440,11 @@ extern int x_bitmap_mask (struct frame *, ptrdiff_t);
extern void reset_image_types (void);
extern void syms_of_image (void);
+#ifdef HAVE_JSON
+/* Defined in json.c. */
+extern void syms_of_json (void);
+#endif
+
/* Defined in insdel.c. */
extern void move_gap_both (ptrdiff_t, ptrdiff_t);
extern _Noreturn void buffer_overflow (void);
diff --git a/test/src/json-tests.el b/test/src/json-tests.el
new file mode 100644
index 0000000000..1d8f9a490b
--- /dev/null
+++ b/test/src/json-tests.el
@@ -0,0 +1,61 @@
+;;; json-tests.el --- unit tests for json.c -*- lexical-binding: t; -*-
+
+;; Copyright (C) 2017 Free Software Foundation, Inc.
+
+;; This file is part of GNU Emacs.
+
+;; GNU Emacs is free software: you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation, either version 3 of the License, or
+;; (at your option) any later version.
+
+;; GNU Emacs is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>.
+
+;;; Commentary:
+
+;; Unit tests for src/json.c.
+
+;;; Code:
+
+(require 'cl-lib)
+(require 'map)
+
+(ert-deftest json-serialize/roundtrip ()
+ (let ((lisp [nil :json-false t 0 123 -456 3.75 "foo"])
+ (json "[null,false,true,0,123,-456,3.75,\"foo\"]"))
+ (should (equal (json-serialize lisp) json))
+ (with-temp-buffer
+ (json-insert lisp)
+ (should (equal (buffer-string) json))
+ (should (eobp)))
+ (should (equal (json-parse-string json) lisp))
+ (with-temp-buffer
+ (insert json)
+ (goto-char 1)
+ (should (equal (json-parse-buffer) lisp))
+ (should (eobp)))))
+
+(ert-deftest json-serialize/object ()
+ (let ((table (make-hash-table :test #'equal)))
+ (puthash "abc" [1 2 t] table)
+ (puthash "def" nil table)
+ (should (equal (json-serialize table)
+ "{\"abc\":[1,2,true],\"def\":null}"))))
+
+(ert-deftest json-parse-string/object ()
+ (let ((actual
+ (json-parse-string
+ "{ \"abc\" : [1, 2, true], \"def\" : null, \"abc\" : [9, false] }\n")))
+ (should (hash-table-p actual))
+ (should (equal (hash-table-count actual) 2))
+ (should (equal (cl-sort (map-pairs actual) #'string< :key #'car)
+ '(("abc" . [9 :json-false]) ("def"))))))
+
+(provide 'json-tests)
+;;; json-tests.el ends here
--
2.14.1
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-17 0:02 ` Richard Stallman
@ 2017-09-18 13:46 ` Ted Zlatanov
0 siblings, 0 replies; 81+ messages in thread
From: Ted Zlatanov @ 2017-09-18 13:46 UTC (permalink / raw)
To: emacs-devel
On Sat, 16 Sep 2017 20:02:17 -0400 Richard Stallman <rms@gnu.org> wrote:
> Ted wrote:
>> I wanted to ask if there's any chance of improving the parsing
>> performance of JSON, YAML, TOML, and similar data formats. It's pretty
>> poor today.
RS> Can you design a primitive general enough to speed up parsing of all those
RS> formats? That would make it feasible to handle them all without
RS> too much work.
I don't think that's easy. They are pretty different. Maybe TOML and INI
parsing can be unified, but that's about it.
I think the best approach is to choose one of the libraries Philipp
suggested but I don't have a favorite.
Thanks
Ted
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-17 20:27 ` Philipp Stephani
2017-09-17 22:41 ` Mark Oteiza
@ 2017-09-18 13:53 ` Ted Zlatanov
1 sibling, 0 replies; 81+ messages in thread
From: Ted Zlatanov @ 2017-09-18 13:53 UTC (permalink / raw)
To: emacs-devel
On Sun, 17 Sep 2017 20:27:02 +0000 Philipp Stephani <p.stephani2@gmail.com> wrote:
PS> Sure, I've made a quick overview based on
PS> https://github.com/miloyip/nativejson-benchmark. I've only used the
PS> libraries that are written in C and have been tested in that benchmark;
PS> it's still quite a few. I've checked the conformance and speed metrics from
PS> the benchmark, as well as the number of GitHub stars (as a proxy for popularity)
PS> and the number of commits in the last month (as a proxy for development activity).
PS> Here are the results:
PS> https://docs.google.com/spreadsheets/d/e/2PACX-1vTqKxqo47s67L3EJ9AWvZclNuT2xbd9rgoRuJ_UYbXgnV171owr8h2mksHjrjNGADDR3DVTWQvUMBpe/pubhtml?gid=0&single=true
PS> Note that some of the libraries (jsmn, ujson4c) don't appear to support
PS> serialization at all; I'd suggest avoiding them, because we'd then need to
PS> wrap another library for serialization. Also, even though JSMN advertises
PS> itself as "world's fastest JSON parser", it's actually the slowest of the
PS> libraries in the survey. json-c appears to be reasonably conformant and
PS> fast for both parsing and serialization, and has by far the largest
PS> development activity.
Hi Philipp,
thanks for doing all that work!
I'd suggest posting the survey results here directly.
Also maybe consider the jq built-in JSON parser, which could be a good
fit (it's usable as a library IIRC). The rest of the libraries look good
and I hope we make a choice soon. I don't have a favorite.
Thanks
Ted
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-18 13:26 ` JSON/YAML/TOML/etc. parsing performance Philipp Stephani
@ 2017-09-18 13:58 ` Mark Oteiza
2017-09-18 14:14 ` Philipp Stephani
2017-09-18 14:57 ` Eli Zaretskii
2017-09-19 8:18 ` Philipp Stephani
2 siblings, 1 reply; 81+ messages in thread
From: Mark Oteiza @ 2017-09-18 13:58 UTC (permalink / raw)
To: Philipp Stephani; +Cc: emacs-devel
> Philipp Stephani <p.stephani2@gmail.com> wrote on Sun., Sep. 17, 2017 at 20:46:
>
> I've discovered that the interface and documentation of Jansson are much better than
> the ones of json-c, so I switched to Jansson. I've attached a patch.
Doing the following on a 276K file:
(with-temp-buffer
(insert-file-contents-literally "test.json")
(benchmark-run 10
(goto-char (point-min))
(json-parse-buffer)))
These are my rough average benchmarks:
          Time (s)  GCs  GC time (s)
Jansson       0.33   10         0.15
json.el       1.21   38         0.48
Nice. Was there a particular reason (aside from access time) you chose
hash tables instead of a sexp form?
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-18 13:58 ` Mark Oteiza
@ 2017-09-18 14:14 ` Philipp Stephani
2017-09-18 14:28 ` Mark Oteiza
0 siblings, 1 reply; 81+ messages in thread
From: Philipp Stephani @ 2017-09-18 14:14 UTC (permalink / raw)
To: Mark Oteiza; +Cc: emacs-devel
Mark Oteiza <mvoteiza@udel.edu> wrote on Mon., Sep. 18, 2017 at 15:58:
> Was there a particular reason (aside from access time) you chose
> hash tables instead of a sexp form?
>
- Hashtables have similar constraints to the underlying JSON objects (no
duplicate keys, no ordering), so they are a better match.
- Hashtables have non-nil empty values. If I had used alists, I would have
had to introduce a separate keyword :json-null for null.
- Hashtables always represent maps, but alists are also normal sequences,
so users could expect that they get translated into arrays instead of
objects.
- Using only one data structure per JSON object type makes the interface
and implementation simpler.
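To illustrate the second point with the functions from the patch (the alist
behavior described in the comments is hypothetical):

  ;; The hash-table representation keeps {} and null distinct:
  (json-parse-string "{}")                          ; => empty hash table (non-nil)
  (gethash "a" (json-parse-string "{\"a\": null}")) ; => nil
  ;; With alists, {} itself would already be nil, so null would have
  ;; needed its own marker (e.g. :json-null) to stay distinguishable.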
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-18 14:14 ` Philipp Stephani
@ 2017-09-18 14:28 ` Mark Oteiza
2017-09-18 14:36 ` Philipp Stephani
0 siblings, 1 reply; 81+ messages in thread
From: Mark Oteiza @ 2017-09-18 14:28 UTC (permalink / raw)
To: Philipp Stephani; +Cc: emacs-devel
On 18/09/17 at 02:14pm, Philipp Stephani wrote:
> Mark Oteiza <mvoteiza@udel.edu> wrote on Mon., Sep. 18, 2017 at 15:58:
>
> > Was there a particular reason (aside from access time) you chose
> > hash tables instead of a sexp form?
>
> - Hashtables have similar constraints to the underlying JSON objects (no
> duplicate keys, no ordering), so they are a better match.
> - Hashtables have non-nil empty values. If I had used alists, I would have
> had to introduce a separate keyword :json-null for null.
> - Hashtables always represent maps, but alists are also normal sequences,
> so users could expect that they get translated into arrays instead of
> objects.
> - Using only one data structure per JSON object type makes the interface
> and implementation simpler.
Makes sense and I agree, thank you. Thanks for the patch.
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-18 14:28 ` Mark Oteiza
@ 2017-09-18 14:36 ` Philipp Stephani
2017-09-18 15:02 ` Eli Zaretskii
0 siblings, 1 reply; 81+ messages in thread
From: Philipp Stephani @ 2017-09-18 14:36 UTC (permalink / raw)
To: Mark Oteiza; +Cc: emacs-devel
Mark Oteiza <mvoteiza@udel.edu> wrote on Mon., Sep. 18, 2017 at 16:28:
> On 18/09/17 at 02:14pm, Philipp Stephani wrote:
> > Mark Oteiza <mvoteiza@udel.edu> wrote on Mon., Sep. 18, 2017 at 15:58:
> >
> > > Was there a particular reason (aside from access time) you chose
> > > hash tables instead of a sexp form?
> >
> > - Hashtables have similar constraints to the underlying JSON objects (no
> > duplicate keys, no ordering), so they are a better match.
> > - Hashtables have non-nil empty values. If I had used alists, I would
> have
> > had to introduce a separate keyword :json-null for null.
> > - Hashtables always represent maps, but alists are also normal sequences,
> > so users could expect that they get translated into arrays instead of
> > objects.
> > - Using only one data structure per JSON object type makes the interface
> > and implementation simpler.
>
> Makes sense and I agree, thank you. Thanks for the patch.
>
Thanks for the review; pushed to master as cb99cf5a99.
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-18 13:26 ` JSON/YAML/TOML/etc. parsing performance Philipp Stephani
2017-09-18 13:58 ` Mark Oteiza
@ 2017-09-18 14:57 ` Eli Zaretskii
2017-09-18 15:07 ` Mark Oteiza
2017-09-18 16:08 ` Philipp Stephani
2017-09-19 8:18 ` Philipp Stephani
2 siblings, 2 replies; 81+ messages in thread
From: Eli Zaretskii @ 2017-09-18 14:57 UTC (permalink / raw)
To: Philipp Stephani; +Cc: emacs-devel
> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Mon, 18 Sep 2017 13:26:34 +0000
>
> I've discovered that the interface and documentation of Jansson are much better than the ones of json-c, so I
> switched to Jansson.
Thanks, but isn't Jansson less actively developed, judging by your
survey?
> I've attached a patch.
I thought we wanted to import the library into Emacs proper, didn't
we? What is the purpose of providing such a core functionality as an
optional feature?
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-18 14:36 ` Philipp Stephani
@ 2017-09-18 15:02 ` Eli Zaretskii
2017-09-18 16:14 ` Philipp Stephani
0 siblings, 1 reply; 81+ messages in thread
From: Eli Zaretskii @ 2017-09-18 15:02 UTC (permalink / raw)
To: Philipp Stephani; +Cc: mvoteiza, emacs-devel
> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Mon, 18 Sep 2017 14:36:43 +0000
> Cc: emacs-devel@gnu.org
>
> Thanks for the review; pushed to master as cb99cf5a99.
Boom! We've just started talking about this, and AFAIU didn't even
agree on this library. And I'm not sure we want a JSON library as an
optional feature. So I must ask you to please revert, and let's wait
until the discussion comes to its completion.
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-18 14:57 ` Eli Zaretskii
@ 2017-09-18 15:07 ` Mark Oteiza
2017-09-18 15:51 ` Eli Zaretskii
2017-09-18 16:08 ` Philipp Stephani
1 sibling, 1 reply; 81+ messages in thread
From: Mark Oteiza @ 2017-09-18 15:07 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Philipp Stephani, emacs-devel
Eli Zaretskii <eliz@gnu.org> writes:
>> From: Philipp Stephani <p.stephani2@gmail.com>
>> Date: Mon, 18 Sep 2017 13:26:34 +0000
>>
>> I've discovered that the interface and documentation of Jansson are
>> much better than the ones of json-c, so I
>> switched to Jansson.
>
> Thanks, but isn't Jansson less actively developed, judging by your
> survey?
>
>> I've attached a patch.
>
> I thought we wanted to import the library into Emacs proper, didn't
> we? What is the purpose of providing such a core functionality as an
> optional feature?
I referred to embeddable code; however, JSON libraries range from small
embeddable ones to big, featureful libraries. Jansson is one of those
featureful libraries, and we should link to it rather than assimilate it.
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-18 15:07 ` Mark Oteiza
@ 2017-09-18 15:51 ` Eli Zaretskii
2017-09-18 16:22 ` Philipp Stephani
2017-09-18 17:26 ` Glenn Morris
0 siblings, 2 replies; 81+ messages in thread
From: Eli Zaretskii @ 2017-09-18 15:51 UTC (permalink / raw)
To: Mark Oteiza; +Cc: p.stephani2, emacs-devel
> > I thought we wanted to import the library into Emacs proper, didn't
> > we? What is the purpose of providing such a core functionality as an
> > optional feature?
>
> I referred to embeddable code; however, JSON libraries range from small
> embeddable ones to big, featureful libraries. Jansson is one of those
> featureful libraries, and we should link to it rather than assimilate it.
I'm not sure I follow: are you agreeing with me or are you
disagreeing? If the latter, can you elaborate why you think we
shouldn't add a JSON library to the Emacs sources, like with do with
lwlib?
Thanks.
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-18 14:57 ` Eli Zaretskii
2017-09-18 15:07 ` Mark Oteiza
@ 2017-09-18 16:08 ` Philipp Stephani
1 sibling, 0 replies; 81+ messages in thread
From: Philipp Stephani @ 2017-09-18 16:08 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
Eli Zaretskii <eliz@gnu.org> wrote on Mon., Sep. 18, 2017 at 16:58:
> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Mon, 18 Sep 2017 13:26:34 +0000
> >
> > I've discovered that the interface and documentation of Jansson are much
> better than the ones of json-c, so I
> > switched to Jansson.
>
> Thanks, but isn't Jansson less actively developed, judging by your
> survey?
>
A bit, but it did have a nonzero number of commits in the last month, and
over a longer term seems quite active. (Some of the other libraries had
their last commit in 2014.)
>
> > I've attached a patch.
>
> I thought we wanted to import the library into Emacs proper, didn't
> we? What is the purpose of providing such a core functionality as an
> optional feature?
>
If we want to make it mandatory, we can always make the library mandatory
and/or include its source code. However, the latter has significant
downsides (we need to maintain the source code), so I don't think we should
do it, at least not in the first iteration. For now, we can define the new
functions in json.el if native support isn't available.
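Such a fallback could be a thin shim on top of json.el, roughly (a sketch
only; the let-bindings are one way to approximate the native conventions):

  (require 'json)
  (unless (fboundp 'json-parse-string)
    (defun json-parse-string (string)
      "Fallback for the native `json-parse-string' using json.el."
      (let ((json-object-type 'hash-table)
            (json-array-type 'vector)
            (json-false :json-false)
            (json-null nil))
        (json-read-from-string string))))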
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-18 15:02 ` Eli Zaretskii
@ 2017-09-18 16:14 ` Philipp Stephani
2017-09-18 17:33 ` Eli Zaretskii
2017-09-18 19:57 ` Thien-Thi Nguyen
0 siblings, 2 replies; 81+ messages in thread
From: Philipp Stephani @ 2017-09-18 16:14 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: mvoteiza, emacs-devel
Eli Zaretskii <eliz@gnu.org> wrote on Mon., Sep. 18, 2017 at 17:02:
> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Mon, 18 Sep 2017 14:36:43 +0000
> > Cc: emacs-devel@gnu.org
> >
> > Thanks for the review; pushed to master as cb99cf5a99.
>
> Boom! We've just started talking about this, and AFAIU didn't even
> agree on this library.
The library is an implementation detail that can be easily changed later.
> And I'm not sure we want a JSON library as an
> optional feature.
I'm also not sure, but going from optional to required is always possible
later.
> So I must ask you to please revert, and let's wait
> until the discussion comes to its completion.
>
Done; I probably should have waited a few days.
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-18 15:51 ` Eli Zaretskii
@ 2017-09-18 16:22 ` Philipp Stephani
2017-09-18 18:08 ` Eli Zaretskii
2017-09-18 17:26 ` Glenn Morris
1 sibling, 1 reply; 81+ messages in thread
From: Philipp Stephani @ 2017-09-18 16:22 UTC (permalink / raw)
To: Eli Zaretskii, Mark Oteiza; +Cc: emacs-devel
Eli Zaretskii <eliz@gnu.org> wrote on Mon., Sep. 18, 2017 at 17:51:
> > > I thought we wanted to import the library into Emacs proper, didn't
> > > we? What is the purpose of providing such a core functionality as an
> > > optional feature?
> >
> > I referred to embeddable code; however, there is a wide variety from
> > embedded to big featureful libraries for JSON. Jansson is one of those
> > featureful libraries, to which we should link instead of assimilating.
>
> I'm not sure I follow: are you agreeing with me or are you
> disagreeing? If the latter, can you elaborate why you think we
> shouldn't add a JSON library to the Emacs sources, like we do with
> lwlib?
>
>
I can't find the sources now, but I dimly remember that adding the code
directly to the Emacs tree would require copyright transfer to the FSF, and
we only make an exception to this for lwlib because it's not otherwise
maintained.
A possible middle ground would be to link it statically, then only the
machines that run Emacs builds would require it.
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-18 15:51 ` Eli Zaretskii
2017-09-18 16:22 ` Philipp Stephani
@ 2017-09-18 17:26 ` Glenn Morris
2017-09-18 18:16 ` Eli Zaretskii
1 sibling, 1 reply; 81+ messages in thread
From: Glenn Morris @ 2017-09-18 17:26 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Mark Oteiza, p.stephani2, emacs-devel
Eli Zaretskii wrote:
> If the latter, can you elaborate why you think we shouldn't add a JSON
> library to the Emacs sources, like we do with lwlib?
The reasons are nicely summarized here
https://wiki.gentoo.org/wiki/Why_not_bundle_dependencies
lwlib is a historical artefact that has not existed outside of Emacs for
some time. Please don't use it as an example for future practice.
Bundling dependencies is just wrong.
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-18 16:14 ` Philipp Stephani
@ 2017-09-18 17:33 ` Eli Zaretskii
2017-09-18 19:57 ` Thien-Thi Nguyen
1 sibling, 0 replies; 81+ messages in thread
From: Eli Zaretskii @ 2017-09-18 17:33 UTC (permalink / raw)
To: Philipp Stephani; +Cc: mvoteiza, emacs-devel
> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Mon, 18 Sep 2017 16:14:32 +0000
> Cc: mvoteiza@udel.edu, emacs-devel@gnu.org
>
> So I must ask you to please revert, and let's wait
> until the discussion comes to its completion.
>
> Done; I probably should have waited a few days.
Thanks. In a week or so, if no objections or better ideas come up,
please revert the revert.
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-18 16:22 ` Philipp Stephani
@ 2017-09-18 18:08 ` Eli Zaretskii
2017-09-19 19:32 ` Richard Stallman
0 siblings, 1 reply; 81+ messages in thread
From: Eli Zaretskii @ 2017-09-18 18:08 UTC (permalink / raw)
To: Philipp Stephani; +Cc: mvoteiza, emacs-devel
> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Mon, 18 Sep 2017 16:22:12 +0000
> Cc: emacs-devel@gnu.org
>
> I'm not sure I follow: are you agreeing with me or are you
> disagreeing? If the latter, can you elaborate why you think we
> shouldn't add a JSON library to the Emacs sources, like with do with
> lwlib?
>
> I can't find the sources now, but I dimly remember that adding the code directly to the Emacs tree would
> require copyright transfer to the FSF, and we only make an exception to this for lwlib because it's not
> otherwise maintained.
I thought Richard just said we didn't need an assignment in such
cases, but maybe I misunderstood.
> A possible middle ground would be to link it statically, then only the machines that run Emacs builds would
> require it.
This could be problematic because AFAIK linking a single library
statically requires naming its file explicitly, rather than just
adding -lLIBRARY to the link command line.
And if I understand Glenn's objections correctly, linking statically
is not different from bundling.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-18 17:26 ` Glenn Morris
@ 2017-09-18 18:16 ` Eli Zaretskii
0 siblings, 0 replies; 81+ messages in thread
From: Eli Zaretskii @ 2017-09-18 18:16 UTC (permalink / raw)
To: Glenn Morris; +Cc: mvoteiza, p.stephani2, emacs-devel
> From: Glenn Morris <rgm@gnu.org>
> Cc: Mark Oteiza <mvoteiza@udel.edu>, p.stephani2@gmail.com, emacs-devel@gnu.org
> Date: Mon, 18 Sep 2017 13:26:44 -0400
>
> The reasons are nicely summarized here
> https://wiki.gentoo.org/wiki/Why_not_bundle_dependencies
Thanks for the pointer.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-18 16:14 ` Philipp Stephani
2017-09-18 17:33 ` Eli Zaretskii
@ 2017-09-18 19:57 ` Thien-Thi Nguyen
1 sibling, 0 replies; 81+ messages in thread
From: Thien-Thi Nguyen @ 2017-09-18 19:57 UTC (permalink / raw)
To: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1099 bytes --]
() Philipp Stephani <p.stephani2@gmail.com>
() Mon, 18 Sep 2017 16:14:32 +0000
> So I must ask you to please revert, and let's wait
> until the discussion comes to its completion.
Done; I probably should have waited a few days.
Just a thought (also applicable to YAML, etc), while waiting:
How about we use this opportunity to explore the (C language)
modules support? Parsing is in practice a bug-ridden endeavor,
so if we're going to do it in C (i.e., commit to handling all
those latent CVEs), why not prioritize isolation more highly?
Even more extremely, why not call out to an external program?
For instance, modify jq(1) to emit sexps for Emacs to ‘read’ (w/
proper exception handling). This would benefit other Lisp and
(maybe) Scheme systems as well.
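A rough Emacs Lisp sketch of that external-program approach (purely
illustrative; it assumes a hypothetical "jq-to-sexp" converter that
reads JSON on stdin and prints one readable sexp, which does not exist
today):

  (defun my-json-parse-via-subprocess (json-string)
    "Parse JSON-STRING via a hypothetical external jq-to-sexp converter."
    (with-temp-buffer
      (insert json-string)
      ;; Replace the buffer contents with the converter's output.
      (let ((status (call-process-region (point-min) (point-max)
                                         "jq-to-sexp" t t nil)))
        (unless (eql status 0)
          (error "jq-to-sexp exited with status %s" status))
        (goto-char (point-min))
        ;; The converter is assumed to print exactly one readable sexp.
        (read (current-buffer)))))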
--
Thien-Thi Nguyen -----------------------------------------------
(defun responsep (query)
(pcase (context query)
(`(technical ,ml) (correctp ml))
...)) 748E A0E8 1CB8 A748 9BFA
--------------------------------------- 6CE4 6703 2224 4C80 7502
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-18 13:26 ` JSON/YAML/TOML/etc. parsing performance Philipp Stephani
2017-09-18 13:58 ` Mark Oteiza
2017-09-18 14:57 ` Eli Zaretskii
@ 2017-09-19 8:18 ` Philipp Stephani
2017-09-19 19:09 ` Eli Zaretskii
2 siblings, 1 reply; 81+ messages in thread
From: Philipp Stephani @ 2017-09-19 8:18 UTC (permalink / raw)
To: emacs-devel
[-- Attachment #1.1: Type: text/plain, Size: 1572 bytes --]
Philipp Stephani <p.stephani2@gmail.com> wrote on Mon, 18 Sep 2017 at 15:26:
> Philipp Stephani <p.stephani2@gmail.com> wrote on Sun, 17 Sep 2017 at 20:46:
>
>> Ted Zlatanov <tzz@lifelogs.com> wrote on Sat, 16 Sep 2017 at 17:55:
>>
>>> I wanted to ask if there's any chance of improving the parsing
>>> performance of JSON, YAML, TOML, and similar data formats. It's pretty
>>> poor today.
>>>
>>> That could be done in the core with C code, improved Lisp code,
>>> integration with an external library, or a mix of those.
>>>
>>
>> I don't know much about the others, but given the importance of JSON as
>> a data exchange and serialization format, I think it's worthwhile to invest
>> some time here. I've implemented a wrapper around the json-c library
>> (license: Expat/X11/MIT), resulting in significant speedups using the test
>> data from https://github.com/miloyip/nativejson-benchmark: a factor of
>> 3.9 to 6.4 for parsing, and a factor of 27 to 67 for serializing. If people
>> agree that this is useful I can send a patch.
>>
>
> I've discovered that the interface and documentation of Jansson are much
> better than the ones of json-c, so I switched to Jansson. I've attached a
> patch.
>
Here's a newer version of the patch. The only significant difference is
that now the Lisp values for JSON null and false are :null and :false,
respectively. Using a dedicated symbol for :null reduces the mental
overhead of the triple meaning of nil (null, false, empty list), and is
more future-proof, should we ever want to support lists.
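For illustration, the intended behavior (as described by the docstrings
and tests in the attached patch; not a transcript of a real session)
looks like this:

  (json-parse-string "{\"a\": null, \"b\": false, \"c\": []}")
  ;; => a hash table mapping "a" to :null, "b" to :false, and "c" to []

  (json-serialize [:null :false t])
  ;; => "[null,false,true]"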
[-- Attachment #1.2: Type: text/html, Size: 2572 bytes --]
[-- Attachment #2: 0001-Implement-native-JSON-support-using-Jansson.txt --]
[-- Type: text/plain, Size: 24203 bytes --]
From e45f18cdaec4b9e07aa53159083d760178833bf0 Mon Sep 17 00:00:00 2001
From: Philipp Stephani <phst@google.com>
Date: Mon, 18 Sep 2017 10:51:39 +0200
Subject: [PATCH] Implement native JSON support using Jansson
* configure.ac: New option --with-json.
* src/json.c (Fjson_serialize, Fjson_insert, Fjson_parse_string)
(Fjson_parse_buffer): New defuns.
(json_out_of_memory, json_parse_error, json_release_object)
(check_string_without_embedded_nulls, json_check, lisp_to_json)
(json_insert, json_insert_callback, json_to_lisp)
(json_read_buffer_callback, Fjson_parse_buffer, define_error): New
helper functions.
(syms_of_json): New file.
* src/lisp.h: Declaration for syms_of_json.
* src/conf_post.h (ATTRIBUTE_WARN_UNUSED_RESULT): New attribute macro.
* src/emacs.c (main): Enable JSON functions.
* src/Makefile.in (JSON_LIBS, JSON_CFLAGS, JSON_OBJ, EMACS_CFLAGS)
(base_obj, LIBES): Compile json.c if --with-json is enabled.
* test/src/json-tests.el (json-serialize/roundtrip)
(json-serialize/object, json-parse-string/object): New unit tests.
---
configure.ac | 20 ++-
src/Makefile.in | 11 +-
src/conf_post.h | 6 +
src/emacs.c | 4 +
src/json.c | 461 +++++++++++++++++++++++++++++++++++++++++++++++++
src/lisp.h | 5 +
test/src/json-tests.el | 61 +++++++
7 files changed, 564 insertions(+), 4 deletions(-)
create mode 100644 src/json.c
create mode 100644 test/src/json-tests.el
diff --git a/configure.ac b/configure.ac
index 35b7e69daf..03ad244d37 100644
--- a/configure.ac
+++ b/configure.ac
@@ -348,6 +348,7 @@ AC_DEFUN
OPTION_DEFAULT_OFF([cairo],[compile with Cairo drawing (experimental)])
OPTION_DEFAULT_ON([xml2],[don't compile with XML parsing support])
OPTION_DEFAULT_ON([imagemagick],[don't compile with ImageMagick image support])
+OPTION_DEFAULT_ON([json], [don't compile with native JSON support])
OPTION_DEFAULT_ON([xft],[don't use XFT for anti aliased fonts])
OPTION_DEFAULT_ON([libotf],[don't use libotf for OpenType font support])
@@ -2856,6 +2857,22 @@ AC_DEFUN
AC_SUBST(LIBSYSTEMD_LIBS)
AC_SUBST(LIBSYSTEMD_CFLAGS)
+HAVE_JSON=no
+JSON_OBJ=
+
+if test "${with_json}" = yes; then
+ EMACS_CHECK_MODULES([JSON], [jansson >= 2.5],
+ [HAVE_JSON=yes], [HAVE_JSON=no])
+ if test "${HAVE_JSON}" = yes; then
+ AC_DEFINE(HAVE_JSON, 1, [Define if using Jansson.])
+ JSON_OBJ=json.o
+ fi
+fi
+
+AC_SUBST(JSON_LIBS)
+AC_SUBST(JSON_CFLAGS)
+AC_SUBST(JSON_OBJ)
+
NOTIFY_OBJ=
NOTIFY_SUMMARY=no
@@ -5368,7 +5385,7 @@ AC_DEFUN
for opt in XAW3D XPM JPEG TIFF GIF PNG RSVG CAIRO IMAGEMAGICK SOUND GPM DBUS \
GCONF GSETTINGS NOTIFY ACL LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT \
LIBOTF XFT ZLIB TOOLKIT_SCROLL_BARS X_TOOLKIT OLDXMENU X11 NS MODULES \
- XWIDGETS LIBSYSTEMD CANNOT_DUMP LCMS2; do
+ XWIDGETS LIBSYSTEMD JSON CANNOT_DUMP LCMS2; do
case $opt in
CANNOT_DUMP) eval val=\${$opt} ;;
@@ -5418,6 +5435,7 @@ AC_DEFUN
Does Emacs use -lotf? ${HAVE_LIBOTF}
Does Emacs use -lxft? ${HAVE_XFT}
Does Emacs use -lsystemd? ${HAVE_LIBSYSTEMD}
+ Does Emacs use -ljansson? ${HAVE_JSON}
Does Emacs directly use zlib? ${HAVE_ZLIB}
Does Emacs have dynamic modules support? ${HAVE_MODULES}
Does Emacs use toolkit scroll bars? ${USE_TOOLKIT_SCROLL_BARS}
diff --git a/src/Makefile.in b/src/Makefile.in
index 0e55ad4bb2..4d33682629 100644
--- a/src/Makefile.in
+++ b/src/Makefile.in
@@ -312,6 +312,10 @@ LIBGNUTLS_CFLAGS =
LIBSYSTEMD_LIBS = @LIBSYSTEMD_LIBS@
LIBSYSTEMD_CFLAGS = @LIBSYSTEMD_CFLAGS@
+JSON_LIBS = @JSON_LIBS@
+JSON_CFLAGS = @JSON_CFLAGS@
+JSON_OBJ = @JSON_OBJ@
+
INTERVALS_H = dispextern.h intervals.h composite.h
GETLOADAVG_LIBS = @GETLOADAVG_LIBS@
@@ -363,7 +367,7 @@ EMACS_CFLAGS=
$(WEBKIT_CFLAGS) \
$(SETTINGS_CFLAGS) $(FREETYPE_CFLAGS) $(FONTCONFIG_CFLAGS) \
$(LIBOTF_CFLAGS) $(M17N_FLT_CFLAGS) $(DEPFLAGS) \
- $(LIBSYSTEMD_CFLAGS) \
+ $(LIBSYSTEMD_CFLAGS) $(JSON_CFLAGS) \
$(LIBGNUTLS_CFLAGS) $(NOTIFY_CFLAGS) $(CAIRO_CFLAGS) \
$(WERROR_CFLAGS)
ALL_CFLAGS = $(EMACS_CFLAGS) $(WARN_CFLAGS) $(CFLAGS)
@@ -397,7 +401,7 @@ base_obj =
thread.o systhread.o \
$(if $(HYBRID_MALLOC),sheap.o) \
$(MSDOS_OBJ) $(MSDOS_X_OBJ) $(NS_OBJ) $(CYGWIN_OBJ) $(FONT_OBJ) \
- $(W32_OBJ) $(WINDOW_SYSTEM_OBJ) $(XGSELOBJ)
+ $(W32_OBJ) $(WINDOW_SYSTEM_OBJ) $(XGSELOBJ) $(JSON_OBJ)
obj = $(base_obj) $(NS_OBJC_OBJ)
## Object files used on some machine or other.
@@ -493,7 +497,8 @@ LIBES =
$(LIBS_TERMCAP) $(GETLOADAVG_LIBS) $(SETTINGS_LIBS) $(LIBSELINUX_LIBS) \
$(FREETYPE_LIBS) $(FONTCONFIG_LIBS) $(LIBOTF_LIBS) $(M17N_FLT_LIBS) \
$(LIBGNUTLS_LIBS) $(LIB_PTHREAD) $(GETADDRINFO_A_LIBS) $(LIBLCMS2) \
- $(NOTIFY_LIBS) $(LIB_MATH) $(LIBZ) $(LIBMODULES) $(LIBSYSTEMD_LIBS)
+ $(NOTIFY_LIBS) $(LIB_MATH) $(LIBZ) $(LIBMODULES) $(LIBSYSTEMD_LIBS) \
+ $(JSON_LIBS)
## FORCE it so that admin/unidata can decide whether these files
## are up-to-date. Although since charprop depends on bootstrap-emacs,
diff --git a/src/conf_post.h b/src/conf_post.h
index febdb8b8bf..1a7f51fa51 100644
--- a/src/conf_post.h
+++ b/src/conf_post.h
@@ -338,6 +338,12 @@ extern int emacs_setenv_TZ (char const *);
# define ATTRIBUTE_NO_SANITIZE_ADDRESS
#endif
+#if __has_attribute (warn_unused_result)
+# define ATTRIBUTE_WARN_UNUSED_RESULT __attribute__ ((__warn_unused_result__))
+#else
+# define ATTRIBUTE_WARN_UNUSED_RESULT
+#endif
+
/* gcc -fsanitize=address does not work with vfork in Fedora 25 x86-64.
For now, assume that this problem occurs on all platforms. */
#if ADDRESS_SANITIZER && !defined vfork
diff --git a/src/emacs.c b/src/emacs.c
index 1ad8af70a7..eb5f1128f6 100644
--- a/src/emacs.c
+++ b/src/emacs.c
@@ -1610,6 +1610,10 @@ Using an Emacs configured with --with-x-toolkit=lucid does not have this problem
syms_of_threads ();
syms_of_profiler ();
+#ifdef HAVE_JSON
+ syms_of_json ();
+#endif
+
keys_of_casefiddle ();
keys_of_cmds ();
keys_of_buffer ();
diff --git a/src/json.c b/src/json.c
new file mode 100644
index 0000000000..7362b45165
--- /dev/null
+++ b/src/json.c
@@ -0,0 +1,461 @@
+/* JSON parsing and serialization.
+
+Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GNU Emacs.
+
+GNU Emacs is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, either version 3 of the License, or (at
+your option) any later version.
+
+GNU Emacs is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. */
+
+#include <config.h>
+
+#include <stddef.h>
+#include <stdint.h>
+
+#include <jansson.h>
+
+#include "lisp.h"
+#include "buffer.h"
+
+static _Noreturn void
+json_out_of_memory (void)
+{
+ xsignal0 (Qjson_out_of_memory);
+}
+
+static _Noreturn void
+json_parse_error (const json_error_t *error)
+{
+ xsignal (Qjson_parse_error,
+ list5 (build_string (error->text), build_string (error->source),
+ make_natnum (error->line), make_natnum (error->column),
+ make_natnum (error->position)));
+}
+
+static void
+json_release_object (void *object)
+{
+ json_decref (object);
+}
+
+static void
+check_string_without_embedded_nulls (Lisp_Object object)
+{
+ CHECK_STRING (object);
+ CHECK_TYPE (memchr (SDATA (object), '\0', SBYTES (object)) == NULL,
+ Qstring_without_embedded_nulls_p, object);
+}
+
+static ATTRIBUTE_WARN_UNUSED_RESULT json_t *
+json_check (json_t *object)
+{
+ if (object == NULL)
+ json_out_of_memory ();
+ return object;
+}
+
+static json_t *lisp_to_json (Lisp_Object) ATTRIBUTE_WARN_UNUSED_RESULT;
+
+/* This returns Lisp_Object so we can use unbind_to. The return value
+ is always nil. */
+
+static _GL_ARG_NONNULL ((2)) Lisp_Object
+lisp_to_json_1 (Lisp_Object lisp, json_t **json)
+{
+ if (VECTORP (lisp))
+ {
+ ptrdiff_t size = ASIZE (lisp);
+ eassert (size >= 0);
+ if (size > SIZE_MAX)
+ xsignal1 (Qoverflow_error, build_pure_c_string ("vector is too long"));
+ *json = json_check (json_array ());
+ ptrdiff_t count = SPECPDL_INDEX ();
+ record_unwind_protect_ptr (json_release_object, json);
+ for (ptrdiff_t i = 0; i < size; ++i)
+ {
+ int status
+ = json_array_append_new (*json, lisp_to_json (AREF (lisp, i)));
+ if (status == -1)
+ json_out_of_memory ();
+ eassert (status == 0);
+ }
+ eassert (json_array_size (*json) == size);
+ clear_unwind_protect (count);
+ return unbind_to (count, Qnil);
+ }
+ else if (HASH_TABLE_P (lisp))
+ {
+ struct Lisp_Hash_Table *h = XHASH_TABLE (lisp);
+ *json = json_check (json_object ());
+ ptrdiff_t count = SPECPDL_INDEX ();
+ record_unwind_protect_ptr (json_release_object, *json);
+ for (ptrdiff_t i = 0; i < HASH_TABLE_SIZE (h); ++i)
+ if (!NILP (HASH_HASH (h, i)))
+ {
+ Lisp_Object key = HASH_KEY (h, i);
+ /* We can’t specify the length, so the string must be
+ null-terminated. */
+ check_string_without_embedded_nulls (key);
+ int status = json_object_set_new (*json, SSDATA (key),
+ lisp_to_json (HASH_VALUE (h, i)));
+ if (status == -1)
+ json_out_of_memory ();
+ eassert (status == 0);
+ }
+ clear_unwind_protect (count);
+ return unbind_to (count, Qnil);
+ }
+ wrong_type_argument (Qjson_value_p, lisp);
+}
+
+static ATTRIBUTE_WARN_UNUSED_RESULT json_t *
+lisp_to_json (Lisp_Object lisp)
+{
+ if (EQ (lisp, QCnull))
+ return json_check (json_null ());
+ else if (EQ (lisp, QCfalse))
+ return json_check (json_false ());
+ else if (EQ (lisp, Qt))
+ return json_check (json_true ());
+ else if (INTEGERP (lisp))
+ {
+ CHECK_TYPE_RANGED_INTEGER (json_int_t, lisp);
+ return json_check (json_integer (XINT (lisp)));
+ }
+ else if (FLOATP (lisp))
+ return json_check (json_real (XFLOAT_DATA (lisp)));
+ else if (STRINGP (lisp))
+ {
+ ptrdiff_t size = SBYTES (lisp);
+ eassert (size >= 0);
+ if (size > SIZE_MAX)
+ xsignal1 (Qoverflow_error, build_pure_c_string ("string is too long"));
+ return json_check (json_stringn (SSDATA (lisp), size));
+ }
+
+ /* LISP now must be a vector or hashtable. */
+ if (++lisp_eval_depth > max_lisp_eval_depth)
+ xsignal0 (Qjson_object_too_deep);
+ json_t *json;
+ lisp_to_json_1 (lisp, &json);
+ --lisp_eval_depth;
+ return json;
+}
+
+DEFUN ("json-serialize", Fjson_serialize, Sjson_serialize, 1, 1, NULL,
+ doc: /* Return the JSON representation of OBJECT as a string.
+OBJECT must be a vector or hashtable, and its elements can recursively
+contain `:null', `:false', t, numbers, strings, or other vectors and
+hashtables. `:null', `:false', and t will be converted to JSON null,
+false, and true values, respectively. Vectors will be converted to
+JSON arrays, and hashtables to JSON objects. Hashtable keys must be
+strings without embedded null characters and must be unique within
+each object. */)
+ (Lisp_Object object)
+{
+ ptrdiff_t count = SPECPDL_INDEX ();
+
+ json_t *json = lisp_to_json (object);
+ record_unwind_protect_ptr (json_release_object, json);
+
+ char *string = json_dumps (json, JSON_COMPACT);
+ if (string == NULL)
+ json_out_of_memory ();
+ record_unwind_protect_ptr (free, string);
+
+ return unbind_to (count, build_string (string));
+}
+
+struct json_buffer_and_size
+{
+ const char *buffer;
+ size_t size;
+};
+
+static Lisp_Object
+json_insert (Lisp_Object data)
+{
+ const struct json_buffer_and_size *buffer_and_size = XSAVE_POINTER (data, 0);
+ if (FIXNUM_OVERFLOW_P (buffer_and_size->size))
+ xsignal1 (Qoverflow_error, build_pure_c_string ("buffer too large"));
+ Lisp_Object string
+ = make_string (buffer_and_size->buffer, buffer_and_size->size);
+ insert_from_string (string, 0, 0, SCHARS (string), SBYTES (string), false);
+ return Qnil;
+}
+
+struct json_insert_data
+{
+ /* nil if json_insert succeeded, otherwise a cons
+ (ERROR-SYMBOL . ERROR-DATA). */
+ Lisp_Object error;
+};
+
+static int
+json_insert_callback (const char *buffer, size_t size, void *data)
+{
+ /* This function may not exit nonlocally. */
+ struct json_insert_data *d = data;
+ struct json_buffer_and_size buffer_and_size
+ = {.buffer = buffer, .size = size};
+ d->error
+ = internal_condition_case_1 (json_insert, make_save_ptr (&buffer_and_size),
+ Qt, Fidentity);
+ return 0;
+}
+
+DEFUN ("json-insert", Fjson_insert, Sjson_insert, 1, 1, NULL,
+ doc: /* Insert the JSON representation of OBJECT before point.
+This is the same as (insert (json-serialize OBJECT)), but potentially
+faster. See the function `json-serialize' for allowed values of
+OBJECT. */)
+ (Lisp_Object object)
+{
+ ptrdiff_t count = SPECPDL_INDEX ();
+
+ json_t *json = lisp_to_json (object);
+ record_unwind_protect_ptr (json_release_object, json);
+
+ struct json_insert_data data;
+ int status
+ = json_dump_callback (json, json_insert_callback, &data, JSON_COMPACT);
+ if (status == -1)
+ json_out_of_memory ();
+ eassert (status == 0);
+
+ if (!NILP (data.error))
+ xsignal (XCAR (data.error), XCDR (data.error));
+
+ return unbind_to (count, Qnil);
+}
+
+static _GL_ARG_NONNULL ((1)) Lisp_Object
+json_to_lisp (json_t *json)
+{
+ switch (json_typeof (json))
+ {
+ case JSON_NULL:
+ return QCnull;
+ case JSON_FALSE:
+ return QCfalse;
+ case JSON_TRUE:
+ return Qt;
+ case JSON_INTEGER:
+ {
+ json_int_t value = json_integer_value (json);
+ if (FIXNUM_OVERFLOW_P (value))
+ xsignal1 (Qoverflow_error,
+ build_pure_c_string ("JSON integer is too large"));
+ return make_number (value);
+ }
+ case JSON_REAL:
+ return make_float (json_real_value (json));
+ case JSON_STRING:
+ {
+ size_t size = json_string_length (json);
+ if (FIXNUM_OVERFLOW_P (size))
+ xsignal1 (Qoverflow_error,
+ build_pure_c_string ("JSON string is too long"));
+ return make_string (json_string_value (json), size);
+ }
+ case JSON_ARRAY:
+ {
+ if (++lisp_eval_depth > max_lisp_eval_depth)
+ xsignal0 (Qjson_object_too_deep);
+ size_t size = json_array_size (json);
+ if (FIXNUM_OVERFLOW_P (size))
+ xsignal1 (Qoverflow_error,
+ build_pure_c_string ("JSON array is too long"));
+ Lisp_Object result = Fmake_vector (make_natnum (size), Qunbound);
+ for (ptrdiff_t i = 0; i < size; ++i)
+ ASET (result, i,
+ json_to_lisp (json_array_get (json, i)));
+ --lisp_eval_depth;
+ return result;
+ }
+ case JSON_OBJECT:
+ {
+ if (++lisp_eval_depth > max_lisp_eval_depth)
+ xsignal0 (Qjson_object_too_deep);
+ size_t size = json_object_size (json);
+ if (FIXNUM_OVERFLOW_P (size))
+ xsignal1 (Qoverflow_error,
+ build_pure_c_string ("JSON object has too many elements"));
+ Lisp_Object result = CALLN (Fmake_hash_table, QCtest, Qequal,
+ QCsize, make_natnum (size));
+ struct Lisp_Hash_Table *h = XHASH_TABLE (result);
+ const char *key_str;
+ json_t *value;
+ json_object_foreach (json, key_str, value)
+ {
+ Lisp_Object key = build_string (key_str);
+ EMACS_UINT hash;
+ ptrdiff_t i = hash_lookup (h, key, &hash);
+ eassert (i < 0);
+ hash_put (h, key, json_to_lisp (value), hash);
+ }
+ --lisp_eval_depth;
+ return result;
+ }
+ }
+ /* Can’t get here. */
+ emacs_abort ();
+}
+
+DEFUN ("json-parse-string", Fjson_parse_string, Sjson_parse_string, 1, 1, NULL,
+ doc: /* Parse the JSON STRING into a Lisp object.
+This is essentially the reverse operation of `json-serialize', which
+see. The returned object will be a vector or hashtable. Its elements
+will be `:null', `:false', t, numbers, strings, or further vectors and
+hashtables. If there are duplicate keys in an object, all but the
+last one are ignored. If STRING doesn't contain a valid JSON object,
+an error of type `json-parse-error' is signaled. */)
+ (Lisp_Object string)
+{
+ ptrdiff_t count = SPECPDL_INDEX ();
+ check_string_without_embedded_nulls (string);
+
+ json_error_t error;
+ json_t *object = json_loads (SSDATA (string), 0, &error);
+ if (object == NULL)
+ json_parse_error (&error);
+
+ /* Avoid leaking the object in case of further errors. */
+ if (object != NULL)
+ record_unwind_protect_ptr (json_release_object, object);
+
+ return unbind_to (count, json_to_lisp (object));
+}
+
+struct json_read_buffer_data
+{
+ ptrdiff_t point;
+};
+
+static size_t
+json_read_buffer_callback (void *buffer, size_t buflen, void *data)
+{
+ struct json_read_buffer_data *d = data;
+
+ /* First, parse from point to the gap or the end of the accessible
+ portion, whatever is closer. */
+ ptrdiff_t point = d->point;
+ ptrdiff_t end;
+ {
+ bool overflow = INT_ADD_WRAPV (BUFFER_CEILING_OF (point), 1, &end);
+ eassert (!overflow);
+ }
+ size_t count;
+ {
+ bool overflow = INT_SUBTRACT_WRAPV (end, point, &count);
+ eassert (!overflow);
+ }
+ if (buflen < count)
+ count = buflen;
+ memcpy (buffer, BYTE_POS_ADDR (point), count);
+ {
+ bool overflow = INT_ADD_WRAPV (d->point, count, &d->point);
+ eassert (!overflow);
+ }
+ return count;
+}
+
+DEFUN ("json-parse-buffer", Fjson_parse_buffer, Sjson_parse_buffer,
+ 0, 0, NULL,
+ doc: /* Read JSON object from current buffer starting at point.
+This is similar to `json-parse-string', which see. Move point after
+the end of the object if parsing was successful. On error, point is
+not moved. */)
+ (void)
+{
+ ptrdiff_t count = SPECPDL_INDEX ();
+
+ ptrdiff_t point = PT_BYTE;
+ struct json_read_buffer_data data = {.point = point};
+ json_error_t error;
+ json_t *object = json_load_callback (json_read_buffer_callback, &data,
+ JSON_DISABLE_EOF_CHECK, &error);
+
+ if (object == NULL)
+ json_parse_error (&error);
+
+ /* Avoid leaking the object in case of further errors. */
+ record_unwind_protect_ptr (json_release_object, object);
+
+ /* Convert and then move point only if everything succeeded. */
+ Lisp_Object lisp = json_to_lisp (object);
+
+ {
+ /* Adjust point by how much we just read. Do this here because
+ tokener->char_offset becomes incorrect below. */
+ bool overflow = INT_ADD_WRAPV (point, error.position, &point);
+ eassert (!overflow);
+ eassert (point <= ZV_BYTE);
+ SET_PT_BOTH (BYTE_TO_CHAR (point), point);
+ }
+
+ return unbind_to (count, lisp);
+}
+
+/* Simplified version of ‘define-error’ that works with pure
+ objects. */
+
+static void
+define_error (Lisp_Object name, const char *message, Lisp_Object parent)
+{
+ eassert (SYMBOLP (name));
+ eassert (SYMBOLP (parent));
+ Lisp_Object parent_conditions = Fget (parent, Qerror_conditions);
+ eassert (CONSP (parent_conditions));
+ eassert (!NILP (Fmemq (parent, parent_conditions)));
+ eassert (NILP (Fmemq (name, parent_conditions)));
+ Fput (name, Qerror_conditions, pure_cons (name, parent_conditions));
+ Fput (name, Qerror_message, build_pure_c_string (message));
+}
+
+void
+syms_of_json (void)
+{
+ DEFSYM (QCnull, ":null");
+ DEFSYM (QCfalse, ":false");
+
+ DEFSYM (Qstring_without_embedded_nulls_p, "string-without-embedded-nulls-p");
+ DEFSYM (Qjson_value_p, "json-value-p");
+
+ DEFSYM (Qjson_error, "json-error");
+ DEFSYM (Qjson_out_of_memory, "json-out-of-memory");
+ DEFSYM (Qjson_parse_error, "json-parse-error");
+ DEFSYM (Qjson_object_too_deep, "json-object-too-deep");
+ define_error (Qjson_error, "generic JSON error", Qerror);
+ define_error (Qjson_out_of_memory, "no free memory for creating JSON object",
+ Qjson_error);
+ define_error (Qjson_parse_error, "could not parse JSON stream",
+ Qjson_error);
+ define_error (Qjson_object_too_deep, "object cyclic or too deep",
+ Qjson_error);
+
+ DEFSYM (Qpure, "pure");
+ DEFSYM (Qside_effect_free, "side-effect-free");
+
+ DEFSYM (Qjson_serialize, "json-serialize");
+ DEFSYM (Qjson_parse_string, "json-parse-string");
+ Fput (Qjson_serialize, Qpure, Qt);
+ Fput (Qjson_serialize, Qside_effect_free, Qt);
+ Fput (Qjson_parse_string, Qpure, Qt);
+ Fput (Qjson_parse_string, Qside_effect_free, Qt);
+
+ defsubr (&Sjson_serialize);
+ defsubr (&Sjson_insert);
+ defsubr (&Sjson_parse_string);
+ defsubr (&Sjson_parse_buffer);
+}
diff --git a/src/lisp.h b/src/lisp.h
index c503082442..8d485098ac 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -3440,6 +3440,11 @@ extern int x_bitmap_mask (struct frame *, ptrdiff_t);
extern void reset_image_types (void);
extern void syms_of_image (void);
+#ifdef HAVE_JSON
+/* Defined in json.c. */
+extern void syms_of_json (void);
+#endif
+
/* Defined in insdel.c. */
extern void move_gap_both (ptrdiff_t, ptrdiff_t);
extern _Noreturn void buffer_overflow (void);
diff --git a/test/src/json-tests.el b/test/src/json-tests.el
new file mode 100644
index 0000000000..42c9f0e9e1
--- /dev/null
+++ b/test/src/json-tests.el
@@ -0,0 +1,61 @@
+;;; json-tests.el --- unit tests for json.c -*- lexical-binding: t; -*-
+
+;; Copyright (C) 2017 Free Software Foundation, Inc.
+
+;; This file is part of GNU Emacs.
+
+;; GNU Emacs is free software: you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation, either version 3 of the License, or
+;; (at your option) any later version.
+
+;; GNU Emacs is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>.
+
+;;; Commentary:
+
+;; Unit tests for src/json.c.
+
+;;; Code:
+
+(require 'cl-lib)
+(require 'map)
+
+(ert-deftest json-serialize/roundtrip ()
+ (let ((lisp [:null :false t 0 123 -456 3.75 "foo"])
+ (json "[null,false,true,0,123,-456,3.75,\"foo\"]"))
+ (should (equal (json-serialize lisp) json))
+ (with-temp-buffer
+ (json-insert lisp)
+ (should (equal (buffer-string) json))
+ (should (eobp)))
+ (should (equal (json-parse-string json) lisp))
+ (with-temp-buffer
+ (insert json)
+ (goto-char 1)
+ (should (equal (json-parse-buffer) lisp))
+ (should (eobp)))))
+
+(ert-deftest json-serialize/object ()
+ (let ((table (make-hash-table :test #'equal)))
+ (puthash "abc" [1 2 t] table)
+ (puthash "def" :null table)
+ (should (equal (json-serialize table)
+ "{\"abc\":[1,2,true],\"def\":null}"))))
+
+(ert-deftest json-parse-string/object ()
+ (let ((actual
+ (json-parse-string
+ "{ \"abc\" : [1, 2, true], \"def\" : null, \"abc\" : [9, false] }\n")))
+ (should (hash-table-p actual))
+ (should (equal (hash-table-count actual) 2))
+ (should (equal (cl-sort (map-pairs actual) #'string< :key #'car)
+ '(("abc" . [9 :false]) ("def" . :null))))))
+
+(provide 'json-tests)
+;;; json-tests.el ends here
--
2.14.1
^ permalink raw reply related [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-19 8:18 ` Philipp Stephani
@ 2017-09-19 19:09 ` Eli Zaretskii
2017-09-28 21:19 ` Philipp Stephani
0 siblings, 1 reply; 81+ messages in thread
From: Eli Zaretskii @ 2017-09-19 19:09 UTC (permalink / raw)
To: Philipp Stephani; +Cc: emacs-devel
> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Tue, 19 Sep 2017 08:18:14 +0000
>
> Here's a newer version of the patch. The only significant difference is that now the Lisp values for JSON null
> and false are :null and :false, respectively. Using a dedicated symbol for :null reduces the mental overhead of
> the triple meaning of nil (null, false, empty list), and is more future-proof, should we ever want to support lists.
Thanks, a few comments below.
> +static _Noreturn void
> +json_parse_error (const json_error_t *error)
> +{
> + xsignal (Qjson_parse_error,
> + list5 (build_string (error->text), build_string (error->source),
> + make_natnum (error->line), make_natnum (error->column),
> + make_natnum (error->position)));
> +}
I think error->source could include non-ASCII characters, in which
case you need to use make_specified_string with its last argument
non-zero, not build_string, which has its own ideas about when to
produce a multibyte string.
> +static _GL_ARG_NONNULL ((2)) Lisp_Object
> +lisp_to_json_1 (Lisp_Object lisp, json_t **json)
> +{
> + if (VECTORP (lisp))
> + {
> + ptrdiff_t size = ASIZE (lisp);
> + eassert (size >= 0);
> + if (size > SIZE_MAX)
> + xsignal1 (Qoverflow_error, build_pure_c_string ("vector is too long"));
I don't think you can allocate pure storage at run time, only at dump
time.  (There are more instances of this elsewhere in the patch.)
> + /* LISP now must be a vector or hashtable. */
> + if (++lisp_eval_depth > max_lisp_eval_depth)
> + xsignal0 (Qjson_object_too_deep);
This error could mislead: the problem could be in the nesting of
surrounding Lisp being too deep, and the JSON part could be just fine.
> + Lisp_Object string
> + = make_string (buffer_and_size->buffer, buffer_and_size->size);
This is arbitrary text, so I'm not sure make_string is appropriate.
Could the text be a byte stream, i.e. not human-readable text? If so,
do we want to create a unibyte string or a multibyte string here?
> + insert_from_string (string, 0, 0, SCHARS (string), SBYTES (string), false);
Hmmm... if you want to insert the text into the buffer, you need to
make sure it has the right representation. What kind of text is this?
It probably should be decoded.
In any case, going through a string sounds gross. You should insert
the text directly into the gap, like we do in a couple of places
already. See insert_from_gap and its users, and maybe also
decode_coding_gap.
> +DEFUN ("json-parse-string", Fjson_parse_string, Sjson_parse_string, 1, 1, NULL,
> + doc: /* Parse the JSON STRING into a Lisp object.
> +This is essentially the reverse operation of `json-serialize', which
> +see. The returned object will be a vector or hashtable. Its elements
> +will be `:null', `:false', t, numbers, strings, or further vectors and
> +hashtables. If there are duplicate keys in an object, all but the
> +last one are ignored. If STRING doesn't contain a valid JSON object,
> +an error of type `json-parse-error' is signaled. */)
> + (Lisp_Object string)
> +{
> + ptrdiff_t count = SPECPDL_INDEX ();
> + check_string_without_embedded_nulls (string);
> +
> + json_error_t error;
> + json_t *object = json_loads (SSDATA (string), 0, &error);
Doesn't json_loads require the string to be encoded in some particular
encoding? If so, passing it our internal representation might not be
TRT.
> + /* First, parse from point to the gap or the end of the accessible
> + portion, whatever is closer. */
> + ptrdiff_t point = d->point;
> + ptrdiff_t end;
> + {
> + bool overflow = INT_ADD_WRAPV (BUFFER_CEILING_OF (point), 1, &end);
> + eassert (!overflow);
> + }
> + size_t count;
> + {
> + bool overflow = INT_SUBTRACT_WRAPV (end, point, &count);
> + eassert (!overflow);
> + }
Why did you need these blocks in braces?
> +(provide 'json-tests)
> +;;; json-tests.el ends here
IMO, it would be good to test also non-ASCII text in JSON objects.
Finally, this needs documentation: NEWS and the ELisp manual.
Thanks again for working on this.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-18 18:08 ` Eli Zaretskii
@ 2017-09-19 19:32 ` Richard Stallman
0 siblings, 0 replies; 81+ messages in thread
From: Richard Stallman @ 2017-09-19 19:32 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: mvoteiza, p.stephani2, emacs-devel
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> I thought Richard just said we didn't need an assignment in such
> cases
That's right.
This is explained in the section External Libraries
in the GNU maintainer's manual.
Whether linking is static or dynamic has no effect on
this issue.
--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-19 19:09 ` Eli Zaretskii
@ 2017-09-28 21:19 ` Philipp Stephani
2017-09-28 21:27 ` Stefan Monnier
2017-09-29 19:55 ` Eli Zaretskii
0 siblings, 2 replies; 81+ messages in thread
From: Philipp Stephani @ 2017-09-28 21:19 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 5350 bytes --]
Eli Zaretskii <eliz@gnu.org> wrote on Tue, 19 Sep 2017 at 21:10:
> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Tue, 19 Sep 2017 08:18:14 +0000
> >
> > Here's a newer version of the patch. The only significant difference is
> that now the Lisp values for JSON null
> > and false are :null and :false, respectively. Using a dedicated symbol
> for :null reduces the mental overhead of
> > the triple meaning of nil (null, false, empty list), and is more
> future-proof, should we ever want to support lists.
>
> Thanks, a few comments below.
>
Thanks for the review. Most of the comments are about converting between C
and Lisp strings, so let me summarize my questions here.
IIUC Jansson only accepts UTF-8 strings (i.e. it will generate an error
if some input is not an UTF-8 string), and will only return UTF-8 strings as
well. Therefore I think that direct conversion between Lisp strings and C
strings (using SDATA etc.) is always correct because the internal Emacs
encoding is a superset of UTF-8. Also build_string should always be correct
because it will generate a correct multibyte string for an UTF-8 string
with non-ASCII characters, and a correct unibyte string for an ASCII
string, right?
>
> > +static _Noreturn void
> > +json_parse_error (const json_error_t *error)
> > +{
> > + xsignal (Qjson_parse_error,
> > + list5 (build_string (error->text), build_string
> (error->source),
> > + make_natnum (error->line), make_natnum
> (error->column),
> > + make_natnum (error->position)));
> > +}
>
> I think error->source could include non-ASCII characters, in which
> case you need to use make_specified_string with its last argument
> non-zero, not build_string, which has its own ideas about when to
> produce a multibyte string.
>
> > +static _GL_ARG_NONNULL ((2)) Lisp_Object
> > +lisp_to_json_1 (Lisp_Object lisp, json_t **json)
> > +{
> > + if (VECTORP (lisp))
> > + {
> > + ptrdiff_t size = ASIZE (lisp);
> > + eassert (size >= 0);
> > + if (size > SIZE_MAX)
> > + xsignal1 (Qoverflow_error, build_pure_c_string ("vector is too
> long"));
>
> I don't think you can allocate pure storage at run time, only at dump
> time.  (There are more instances of this elsewhere in the patch.)
>
OK, will be fixed in the next version.
>
> > + /* LISP now must be a vector or hashtable. */
> > + if (++lisp_eval_depth > max_lisp_eval_depth)
> > + xsignal0 (Qjson_object_too_deep);
>
> This error could mislead: the problem could be in the nesting of
> surrounding Lisp being too deep, and the JSON part could be just fine.
>
Agreed, but I think it's better to use lisp_eval_depth here because it's
the total nesting depth that could cause stack overflows.
>
> > + Lisp_Object string
> > + = make_string (buffer_and_size->buffer, buffer_and_size->size);
>
> This is arbitrary text, so I'm not sure make_string is appropriate.
> Could the text be a byte stream, i.e. not human-readable text? If so,
> do we want to create a unibyte string or a multibyte string here?
>
It should always be UTF-8.
>
> > + insert_from_string (string, 0, 0, SCHARS (string), SBYTES (string),
> false);
>
> Hmmm... if you want to insert the text into the buffer, you need to
> make sure it has the right representation. What kind of text is this?
> It probably should be decoded.
>
> In any case, going through a string sounds gross. You should insert
> the text directly into the gap, like we do in a couple of places
> already. See insert_from_gap and its users, and maybe also
> decode_coding_gap.
>
OK, I'll have to check that, but it sounds doable.
>
> > +DEFUN ("json-parse-string", Fjson_parse_string, Sjson_parse_string, 1,
> 1, NULL,
> > + doc: /* Parse the JSON STRING into a Lisp object.
> > +This is essentially the reverse operation of `json-serialize', which
> > +see. The returned object will be a vector or hashtable. Its elements
> > +will be `:null', `:false', t, numbers, strings, or further vectors and
> > +hashtables. If there are duplicate keys in an object, all but the
> > +last one are ignored. If STRING doesn't contain a valid JSON object,
> > +an error of type `json-parse-error' is signaled. */)
> > + (Lisp_Object string)
> > +{
> > + ptrdiff_t count = SPECPDL_INDEX ();
> > + check_string_without_embedded_nulls (string);
> > +
> > + json_error_t error;
> > + json_t *object = json_loads (SSDATA (string), 0, &error);
>
> Doesn't json_loads require the string to be encoded in some particular
> encoding? If so, passing it our internal representation might not be
> TRT.
>
> > + /* First, parse from point to the gap or the end of the accessible
> > + portion, whatever is closer. */
> > + ptrdiff_t point = d->point;
> > + ptrdiff_t end;
> > + {
> > + bool overflow = INT_ADD_WRAPV (BUFFER_CEILING_OF (point), 1, &end);
> > + eassert (!overflow);
> > + }
> > + size_t count;
> > + {
> > + bool overflow = INT_SUBTRACT_WRAPV (end, point, &count);
> > + eassert (!overflow);
> > + }
>
> Why did you need these blocks in braces?
>
To be able to reuse the "overflow" name.
>
> > +(provide 'json-tests)
> > +;;; json-tests.el ends here
>
> IMO, it would be good to test also non-ASCII text in JSON objects.
>
>
Yes, once the patch is in acceptable shape, I plan to add many more tests.
[-- Attachment #2: Type: text/html, Size: 7284 bytes --]
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-28 21:19 ` Philipp Stephani
@ 2017-09-28 21:27 ` Stefan Monnier
2017-09-29 19:55 ` Eli Zaretskii
1 sibling, 0 replies; 81+ messages in thread
From: Stefan Monnier @ 2017-09-28 21:27 UTC (permalink / raw)
To: emacs-devel
> encoding is a superset of UTF-8. Also build_string should always be correct
> because it will generate a correct multibyte string for an UTF-8 string
> with non-ASCII characters, and a correct unibyte string for an ASCII
> string, right?
FWIW, I think that returning a unibyte string just because it doesn't
contain non-ASCII chars would be wrong. We should only build unibyte
strings if they represent a sequence of *bytes*, not when they represent
a sequence of *chars* which happens to only be ASCII.
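A quick Lisp-level sketch of that distinction (the results are what
standard Emacs string handling should give; they are not taken from the
patch):

  ;; An ASCII-only string literal is unibyte, i.e. a sequence of bytes:
  (multibyte-string-p "abc")
  ;; => nil

  ;; Decoding the same bytes as UTF-8 should yield a multibyte string
  ;; of characters, even though they all happen to be ASCII:
  (multibyte-string-p (decode-coding-string "abc" 'utf-8))
  ;; => t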
Stefan
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-28 21:19 ` Philipp Stephani
2017-09-28 21:27 ` Stefan Monnier
@ 2017-09-29 19:55 ` Eli Zaretskii
2017-09-30 22:02 ` Philipp Stephani
1 sibling, 1 reply; 81+ messages in thread
From: Eli Zaretskii @ 2017-09-29 19:55 UTC (permalink / raw)
To: Philipp Stephani; +Cc: emacs-devel
> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Thu, 28 Sep 2017 21:19:00 +0000
> Cc: emacs-devel@gnu.org
>
> IIUC Jansson only accepts UTF-8 strings (i.e. it will generate an error if some input is not an UTF-8 string), and
> will only return UTF-8 strings as well. Therefore I think that direct conversion between Lisp strings and C
> strings (using SDATA etc.) is always correct because the internal Emacs encoding is a superset of UTF-8.
> Also build_string should always be correct because it will generate a correct multibyte string for an UTF-8
> string with non-ASCII characters, and a correct unibyte string for an ASCII string, right?
I don't think it's a good idea to write code which has such
assumptions embedded in it. We don't do that in other cases, although
UTF-8 based systems are widespread nowadays. Instead, we make sure
that encoding and decoding a UTF-8 byte stream is implemented
efficiently, and when possible simply reuses the same string data.
Besides, these assumptions are not always true, for example:
. The Emacs internal representation could include raw bytes, whose
representations (both of them) are not valid UTF-8;
. Strings we receive from the library could be invalid UTF-8, in
which case putting them into a buffer or string without decoding
will mean trouble for programs that will try to process them;
So I think decoding and encoding any string passed to/from Jansson is
better for stability and future maintenance. If you worry about
performance, you shouldn't: we convert UTF-8 into our internal
representation as efficiently as possible.
> > + /* LISP now must be a vector or hashtable. */
> > + if (++lisp_eval_depth > max_lisp_eval_depth)
> > + xsignal0 (Qjson_object_too_deep);
>
> This error could mislead: the problem could be in the nesting of
> surrounding Lisp being too deep, and the JSON part could be just fine.
>
> Agreed, but I think it's better to use lisp_eval_depth here because it's the total nesting depth that could cause
> stack overflows.
Well, at least the error message should not point exclusively to a
JSON problem, it should mention the possibility of a Lisp eval depth
overflow as well.
> > + Lisp_Object string
> > + = make_string (buffer_and_size->buffer, buffer_and_size->size);
>
> This is arbitrary text, so I'm not sure make_string is appropriate.
> Could the text be a byte stream, i.e. not human-readable text? If so,
> do we want to create a unibyte string or a multibyte string here?
>
> It should always be UTF-8.
How does JSON express byte streams, then? Doesn't it support data (as
opposed to text)?
> > + {
> > + bool overflow = INT_ADD_WRAPV (BUFFER_CEILING_OF (point), 1, &end);
> > + eassert (!overflow);
> > + }
> > + size_t count;
> > + {
> > + bool overflow = INT_SUBTRACT_WRAPV (end, point, &count);
> > + eassert (!overflow);
> > + }
>
> Why did you need these blocks in braces?
>
> To be able to reuse the "overflow" name.
Why can't you reuse it without the braces?
Thanks.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-29 19:55 ` Eli Zaretskii
@ 2017-09-30 22:02 ` Philipp Stephani
2017-10-01 18:06 ` Eli Zaretskii
2017-10-01 18:38 ` Eli Zaretskii
0 siblings, 2 replies; 81+ messages in thread
From: Philipp Stephani @ 2017-09-30 22:02 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
[-- Attachment #1.1: Type: text/plain, Size: 3411 bytes --]
Eli Zaretskii <eliz@gnu.org> wrote on Fri, 29 Sep 2017 at 21:56:
> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Thu, 28 Sep 2017 21:19:00 +0000
> > Cc: emacs-devel@gnu.org
> >
> > IIUC Jansson only accepts UTF-8 strings (i.e. it will generate an error
> if some input is not an UTF-8 string), and
> > will only return UTF-8 strings as well. Therefore I think that direct
> conversion between Lisp strings and C
> > strings (using SDATA etc.) is always correct because the internal Emacs
> encoding is a superset of UTF-8.
> > Also build_string should always be correct because it will generate a
> correct multibyte string for an UTF-8
> > string with non-ASCII characters, and a correct unibyte string for an
> ASCII string, right?
>
> I don't think it's a good idea to write code which has such
> assumptions embedded in it. We don't do that in other cases, although
> UTF-8 based systems are widespread nowadays. Instead, we make sure
> that encoding and decoding a UTF-8 byte stream is implemented
> efficiently, and when possible simply reuses the same string data.
>
> Besides, these assumptions are not always true, for example:
>
> . The Emacs internal representation could include raw bytes, whose
> representations (both of them) are not valid UTF-8;
> . Strings we receive from the library could be invalid UTF-8, in
> which case putting them into a buffer or string without decoding
> will mean trouble for programs that will try to process them;
>
> So I think decoding and encoding any string passed to/from Jansson is
> better for stability and future maintenance. If you worry about
> performance, you shouldn't: we convert UTF-8 into our internal
> representation as efficiently as possible.
>
> > > + /* LISP now must be a vector or hashtable. */
> > > + if (++lisp_eval_depth > max_lisp_eval_depth)
> > > + xsignal0 (Qjson_object_too_deep);
> >
> > This error could mislead: the problem could be in the nesting of
> > surrounding Lisp being too deep, and the JSON part could be just fine.
> >
> > Agreed, but I think it's better to use lisp_eval_depth here because it's
> the total nesting depth that could cause
> > stack overflows.
>
> Well, at least the error message should not point exclusively to a
> JSON problem, it should mention the possibility of a Lisp eval depth
> overflow as well.
>
OK, I've attached a new patch that incorporates most of these changes.
>
> > > + Lisp_Object string
> > > + = make_string (buffer_and_size->buffer, buffer_and_size->size);
> >
> > This is arbitrary text, so I'm not sure make_string is appropriate.
> > Could the text be a byte stream, i.e. not human-readable text? If so,
> > do we want to create a unibyte string or a multibyte string here?
> >
> > It should always be UTF-8.
>
> How does JSON express byte streams, then? Doesn't it support data (as
> opposed to text)?
>
Usually using base64.
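For example (a sketch, not part of the patch; base64-encode-string is
the stock Emacs function), binary data would be encoded on the Lisp side
before serialization:

  ;; Raw bytes are not valid JSON text, so encode them first:
  (let ((bytes (unibyte-string #xff #xfe #x00)))  ; arbitrary binary payload
    (base64-encode-string bytes))
  ;; => a plain ASCII string, usable as an ordinary JSON string value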
>
> > > + {
> > > + bool overflow = INT_ADD_WRAPV (BUFFER_CEILING_OF (point), 1, &end);
> > > + eassert (!overflow);
> > > + }
> > > + size_t count;
> > > + {
> > > + bool overflow = INT_SUBTRACT_WRAPV (end, point, &count);
> > > + eassert (!overflow);
> > > + }
> >
> > Why did you need these blocks in braces?
> >
> > To be able to reuse the "overflow" name/
>
> Why can't you reuse it without the braces?
>
>
Then I'd need to reuse the variable. Not a big deal, just personal style.
[-- Attachment #1.2: Type: text/html, Size: 4567 bytes --]
[-- Attachment #2: 0001-Implement-native-JSON-support-using-Jansson.txt --]
[-- Type: text/plain, Size: 31173 bytes --]
From 4f7c7453b63fa00c3dc8e0eff19295668d9ca459 Mon Sep 17 00:00:00 2001
From: Philipp Stephani <phst@google.com>
Date: Mon, 18 Sep 2017 10:51:39 +0200
Subject: [PATCH] Implement native JSON support using Jansson
* configure.ac: New option --with-json.
* src/json.c (Fjson_serialize, Fjson_insert, Fjson_parse_string)
(Fjson_parse_buffer): New defuns.
(json_has_prefix, json_has_suffix, json_make_string)
(json_build_string, json_encode, json_out_of_memory, json_parse_error)
(json_release_object, check_string_without_embedded_nulls, json_check)
(lisp_to_json, lisp_to_json_toplevel, lisp_to_json_toplevel_1)
(json_insert, json_insert_callback, json_to_lisp)
(json_read_buffer_callback, Fjson_parse_buffer, define_error): New
helper functions.
(syms_of_json): New file.
* src/lisp.h: Declaration for syms_of_json.
* src/conf_post.h (ATTRIBUTE_WARN_UNUSED_RESULT): New attribute macro.
* src/emacs.c (main): Enable JSON functions.
* src/eval.c (internal_catch_all, internal_catch_all_1): New helper
functions to catch all signals.
(syms_of_eval): Add uninterned symbol to signify out of memory.
* src/Makefile.in (JSON_LIBS, JSON_CFLAGS, JSON_OBJ, EMACS_CFLAGS)
(base_obj, LIBES): Compile json.c if --with-json is enabled.
* test/src/json-tests.el (json-serialize/roundtrip)
(json-serialize/object, json-parse-string/object)
(json-parse-string/string, json-serialize/string)
(json-parse-string/incomplete, json-parse-string/trailing)
(json-parse-buffer/incomplete, json-parse-buffer/trailing): New unit
tests.
---
configure.ac | 20 +-
etc/NEWS | 7 +
src/Makefile.in | 11 +-
src/conf_post.h | 6 +
src/emacs.c | 4 +
src/eval.c | 54 ++++++
src/json.c | 517 +++++++++++++++++++++++++++++++++++++++++++++++++
src/lisp.h | 6 +
test/src/json-tests.el | 97 ++++++++++
9 files changed, 718 insertions(+), 4 deletions(-)
create mode 100644 src/json.c
create mode 100644 test/src/json-tests.el
diff --git a/configure.ac b/configure.ac
index 75bda7b164..4297e4d67c 100644
--- a/configure.ac
+++ b/configure.ac
@@ -348,6 +348,7 @@ AC_DEFUN
OPTION_DEFAULT_OFF([cairo],[compile with Cairo drawing (experimental)])
OPTION_DEFAULT_ON([xml2],[don't compile with XML parsing support])
OPTION_DEFAULT_ON([imagemagick],[don't compile with ImageMagick image support])
+OPTION_DEFAULT_ON([json], [don't compile with native JSON support])
OPTION_DEFAULT_ON([xft],[don't use XFT for anti aliased fonts])
OPTION_DEFAULT_ON([libotf],[don't use libotf for OpenType font support])
@@ -2856,6 +2857,22 @@ AC_DEFUN
AC_SUBST(LIBSYSTEMD_LIBS)
AC_SUBST(LIBSYSTEMD_CFLAGS)
+HAVE_JSON=no
+JSON_OBJ=
+
+if test "${with_json}" = yes; then
+ EMACS_CHECK_MODULES([JSON], [jansson >= 2.5],
+ [HAVE_JSON=yes], [HAVE_JSON=no])
+ if test "${HAVE_JSON}" = yes; then
+ AC_DEFINE(HAVE_JSON, 1, [Define if using Jansson.])
+ JSON_OBJ=json.o
+ fi
+fi
+
+AC_SUBST(JSON_LIBS)
+AC_SUBST(JSON_CFLAGS)
+AC_SUBST(JSON_OBJ)
+
NOTIFY_OBJ=
NOTIFY_SUMMARY=no
@@ -5368,7 +5385,7 @@ AC_DEFUN
for opt in XAW3D XPM JPEG TIFF GIF PNG RSVG CAIRO IMAGEMAGICK SOUND GPM DBUS \
GCONF GSETTINGS NOTIFY ACL LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT \
LIBOTF XFT ZLIB TOOLKIT_SCROLL_BARS X_TOOLKIT OLDXMENU X11 NS MODULES \
- XWIDGETS LIBSYSTEMD CANNOT_DUMP LCMS2; do
+ XWIDGETS LIBSYSTEMD JSON CANNOT_DUMP LCMS2; do
case $opt in
CANNOT_DUMP) eval val=\${$opt} ;;
@@ -5418,6 +5435,7 @@ AC_DEFUN
Does Emacs use -lotf? ${HAVE_LIBOTF}
Does Emacs use -lxft? ${HAVE_XFT}
Does Emacs use -lsystemd? ${HAVE_LIBSYSTEMD}
+ Does Emacs use -ljansson? ${HAVE_JSON}
Does Emacs directly use zlib? ${HAVE_ZLIB}
Does Emacs have dynamic modules support? ${HAVE_MODULES}
Does Emacs use toolkit scroll bars? ${USE_TOOLKIT_SCROLL_BARS}
diff --git a/etc/NEWS b/etc/NEWS
index 8fbc354fc0..5a0a164937 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -25,6 +25,13 @@ When you add a new item, use the appropriate mark if you are sure it applies,
\f
* Installation Changes in Emacs 27.1
+** The new configure option '--with-json' adds support for JSON using
+the Jansson library. It is on by default; use 'configure
+--with-json=no' to build without Jansson support. The new JSON
+functions 'json-serialize', 'json-insert', 'json-parse-string', and
+'json-parse-buffer' are typically much faster than their Lisp
+counterparts from json.el.
+
\f
* Startup Changes in Emacs 27.1
diff --git a/src/Makefile.in b/src/Makefile.in
index 9a8c9c85f0..b395627893 100644
--- a/src/Makefile.in
+++ b/src/Makefile.in
@@ -312,6 +312,10 @@ LIBGNUTLS_CFLAGS =
LIBSYSTEMD_LIBS = @LIBSYSTEMD_LIBS@
LIBSYSTEMD_CFLAGS = @LIBSYSTEMD_CFLAGS@
+JSON_LIBS = @JSON_LIBS@
+JSON_CFLAGS = @JSON_CFLAGS@
+JSON_OBJ = @JSON_OBJ@
+
INTERVALS_H = dispextern.h intervals.h composite.h
GETLOADAVG_LIBS = @GETLOADAVG_LIBS@
@@ -363,7 +367,7 @@ EMACS_CFLAGS=
$(WEBKIT_CFLAGS) \
$(SETTINGS_CFLAGS) $(FREETYPE_CFLAGS) $(FONTCONFIG_CFLAGS) \
$(LIBOTF_CFLAGS) $(M17N_FLT_CFLAGS) $(DEPFLAGS) \
- $(LIBSYSTEMD_CFLAGS) \
+ $(LIBSYSTEMD_CFLAGS) $(JSON_CFLAGS) \
$(LIBGNUTLS_CFLAGS) $(NOTIFY_CFLAGS) $(CAIRO_CFLAGS) \
$(WERROR_CFLAGS)
ALL_CFLAGS = $(EMACS_CFLAGS) $(WARN_CFLAGS) $(CFLAGS)
@@ -397,7 +401,7 @@ base_obj =
thread.o systhread.o \
$(if $(HYBRID_MALLOC),sheap.o) \
$(MSDOS_OBJ) $(MSDOS_X_OBJ) $(NS_OBJ) $(CYGWIN_OBJ) $(FONT_OBJ) \
- $(W32_OBJ) $(WINDOW_SYSTEM_OBJ) $(XGSELOBJ)
+ $(W32_OBJ) $(WINDOW_SYSTEM_OBJ) $(XGSELOBJ) $(JSON_OBJ)
obj = $(base_obj) $(NS_OBJC_OBJ)
## Object files used on some machine or other.
@@ -493,7 +497,8 @@ LIBES =
$(LIBS_TERMCAP) $(GETLOADAVG_LIBS) $(SETTINGS_LIBS) $(LIBSELINUX_LIBS) \
$(FREETYPE_LIBS) $(FONTCONFIG_LIBS) $(LIBOTF_LIBS) $(M17N_FLT_LIBS) \
$(LIBGNUTLS_LIBS) $(LIB_PTHREAD) $(GETADDRINFO_A_LIBS) $(LIBLCMS2) \
- $(NOTIFY_LIBS) $(LIB_MATH) $(LIBZ) $(LIBMODULES) $(LIBSYSTEMD_LIBS)
+ $(NOTIFY_LIBS) $(LIB_MATH) $(LIBZ) $(LIBMODULES) $(LIBSYSTEMD_LIBS) \
+ $(JSON_LIBS)
## FORCE it so that admin/unidata can decide whether these files
## are up-to-date. Although since charprop depends on bootstrap-emacs,
diff --git a/src/conf_post.h b/src/conf_post.h
index febdb8b8bf..1a7f51fa51 100644
--- a/src/conf_post.h
+++ b/src/conf_post.h
@@ -338,6 +338,12 @@ extern int emacs_setenv_TZ (char const *);
# define ATTRIBUTE_NO_SANITIZE_ADDRESS
#endif
+#if __has_attribute (warn_unused_result)
+# define ATTRIBUTE_WARN_UNUSED_RESULT __attribute__ ((__warn_unused_result__))
+#else
+# define ATTRIBUTE_WARN_UNUSED_RESULT
+#endif
+
/* gcc -fsanitize=address does not work with vfork in Fedora 25 x86-64.
For now, assume that this problem occurs on all platforms. */
#if ADDRESS_SANITIZER && !defined vfork
diff --git a/src/emacs.c b/src/emacs.c
index 0fe7d9113b..39761016ef 100644
--- a/src/emacs.c
+++ b/src/emacs.c
@@ -1610,6 +1610,10 @@ Using an Emacs configured with --with-x-toolkit=lucid does not have this problem
syms_of_threads ();
syms_of_profiler ();
+#ifdef HAVE_JSON
+ syms_of_json ();
+#endif
+
keys_of_casefiddle ();
keys_of_cmds ();
keys_of_buffer ();
diff --git a/src/eval.c b/src/eval.c
index 39d78364d5..e60154845c 100644
--- a/src/eval.c
+++ b/src/eval.c
@@ -1416,6 +1416,57 @@ internal_condition_case_n (Lisp_Object (*bfun) (ptrdiff_t, Lisp_Object *),
}
}
+static Lisp_Object
+internal_catch_all_1 (Lisp_Object (*function) (void *), void *argument)
+{
+ struct handler *c = push_handler_nosignal (Qt, CATCHER_ALL);
+ if (c == NULL)
+ return Qcatch_all_memory_full;
+
+ if (sys_setjmp (c->jmp) == 0)
+ {
+ Lisp_Object val = function (argument);
+ eassert (handlerlist == c);
+ handlerlist = c->next;
+ return val;
+ }
+ else
+ {
+ eassert (handlerlist == c);
+ Lisp_Object val = c->val;
+ handlerlist = c->next;
+ Fsignal (Qno_catch, val);
+ }
+}
+
+/* Like a combination of internal_condition_case_1 and internal_catch.
+ Catches all signals and throws. Never exits nonlocally; returns
+ Qcatch_all_memory_full if no handler could be allocated. */
+
+Lisp_Object
+internal_catch_all (Lisp_Object (*function) (void *), void *argument,
+ Lisp_Object (*handler) (Lisp_Object))
+{
+ struct handler *c = push_handler_nosignal (Qt, CONDITION_CASE);
+ if (c == NULL)
+ return Qcatch_all_memory_full;
+
+ if (sys_setjmp (c->jmp) == 0)
+ {
+ Lisp_Object val = internal_catch_all_1 (function, argument);
+ eassert (handlerlist == c);
+ handlerlist = c->next;
+ return val;
+ }
+ else
+ {
+ eassert (handlerlist == c);
+ Lisp_Object val = c->val;
+ handlerlist = c->next;
+ return handler (val);
+ }
+}
+
struct handler *
push_handler (Lisp_Object tag_ch_val, enum handlertype handlertype)
{
@@ -4065,6 +4116,9 @@ alist of active lexical bindings. */);
inhibit_lisp_code = Qnil;
+ DEFSYM (Qcatch_all_memory_full, "catch-all-memory-full");
+ Funintern (Qcatch_all_memory_full, Qnil);
+
defsubr (&Sor);
defsubr (&Sand);
defsubr (&Sif);
diff --git a/src/json.c b/src/json.c
new file mode 100644
index 0000000000..79be55bd54
--- /dev/null
+++ b/src/json.c
@@ -0,0 +1,517 @@
+/* JSON parsing and serialization.
+
+Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GNU Emacs.
+
+GNU Emacs is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, either version 3 of the License, or (at
+your option) any later version.
+
+GNU Emacs is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. */
+
+#include <config.h>
+
+#include <stddef.h>
+#include <stdint.h>
+
+#include <jansson.h>
+
+#include "lisp.h"
+#include "buffer.h"
+#include "coding.h"
+
+static bool
+json_has_prefix (const char *string, const char *prefix)
+{
+ size_t string_len = strlen (string);
+ size_t prefix_len = strlen (prefix);
+ return string_len >= prefix_len && memcmp (string, prefix, prefix_len) == 0;
+}
+
+static bool
+json_has_suffix (const char *string, const char *suffix)
+{
+ size_t string_len = strlen (string);
+ size_t suffix_len = strlen (suffix);
+ return string_len >= suffix_len
+ && memcmp (string + string_len - suffix_len, suffix, suffix_len) == 0;
+}
+
+static Lisp_Object
+json_make_string (const char *data, ptrdiff_t size)
+{
+ return make_specified_string (data, -1, size, true);
+}
+
+static Lisp_Object
+json_build_string (const char *data)
+{
+ size_t size = strlen (data);
+ eassert (size < PTRDIFF_MAX);
+ return json_make_string (data, size);
+}
+
+static Lisp_Object
+json_encode (Lisp_Object string)
+{
+ return code_convert_string (string, Qutf_8_unix, Qt, true, true, true);
+}
+
+static _Noreturn void
+json_out_of_memory (void)
+{
+ xsignal0 (Qjson_out_of_memory);
+}
+
+static _Noreturn void
+json_parse_error (const json_error_t *error)
+{
+ Lisp_Object symbol;
+ if (json_has_suffix (error->text, "expected near end of file"))
+ symbol = Qjson_end_of_file;
+ else if (json_has_prefix (error->text, "end of file expected"))
+ symbol = Qjson_trailing_content;
+ else
+ symbol = Qjson_parse_error;
+ xsignal (symbol,
+ list5 (json_build_string (error->text),
+ json_build_string (error->source), make_natnum (error->line),
+ make_natnum (error->column), make_natnum (error->position)));
+}
+
+static void
+json_release_object (void *object)
+{
+ json_decref (object);
+}
+
+static void
+check_string_without_embedded_nulls (Lisp_Object object)
+{
+ CHECK_STRING (object);
+ CHECK_TYPE (memchr (SDATA (object), '\0', SBYTES (object)) == NULL,
+ Qstring_without_embedded_nulls_p, object);
+}
+
+static ATTRIBUTE_WARN_UNUSED_RESULT json_t *
+json_check (json_t *object)
+{
+ if (object == NULL)
+ json_out_of_memory ();
+ return object;
+}
+
+static json_t *lisp_to_json (Lisp_Object) ATTRIBUTE_WARN_UNUSED_RESULT;
+
+/* This returns Lisp_Object so we can use unbind_to. The return value
+ is always nil. */
+
+static _GL_ARG_NONNULL ((2)) Lisp_Object
+lisp_to_json_toplevel_1 (Lisp_Object lisp, json_t **json)
+{
+ if (VECTORP (lisp))
+ {
+ ptrdiff_t size = ASIZE (lisp);
+ eassert (size >= 0);
+ if (size > SIZE_MAX)
+ xsignal1 (Qoverflow_error, build_string ("vector is too long"));
+ *json = json_check (json_array ());
+ ptrdiff_t count = SPECPDL_INDEX ();
+ record_unwind_protect_ptr (json_release_object, *json);
+ for (ptrdiff_t i = 0; i < size; ++i)
+ {
+ int status
+ = json_array_append_new (*json, lisp_to_json (AREF (lisp, i)));
+ if (status == -1)
+ json_out_of_memory ();
+ eassert (status == 0);
+ }
+ eassert (json_array_size (*json) == size);
+ clear_unwind_protect (count);
+ return unbind_to (count, Qnil);
+ }
+ else if (HASH_TABLE_P (lisp))
+ {
+ struct Lisp_Hash_Table *h = XHASH_TABLE (lisp);
+ *json = json_check (json_object ());
+ ptrdiff_t count = SPECPDL_INDEX ();
+ record_unwind_protect_ptr (json_release_object, *json);
+ for (ptrdiff_t i = 0; i < HASH_TABLE_SIZE (h); ++i)
+ if (!NILP (HASH_HASH (h, i)))
+ {
+ Lisp_Object key = json_encode (HASH_KEY (h, i));
+ /* We can’t specify the length, so the string must be
+ null-terminated. */
+ check_string_without_embedded_nulls (key);
+ int status = json_object_set_new (*json, SSDATA (key),
+ lisp_to_json (HASH_VALUE (h, i)));
+ if (status == -1)
+ json_out_of_memory ();
+ eassert (status == 0);
+ }
+ clear_unwind_protect (count);
+ return unbind_to (count, Qnil);
+ }
+ wrong_type_argument (Qjson_value_p, lisp);
+}
+
+static ATTRIBUTE_WARN_UNUSED_RESULT json_t *
+lisp_to_json_toplevel (Lisp_Object lisp)
+{
+ if (++lisp_eval_depth > max_lisp_eval_depth)
+ xsignal0 (Qjson_object_too_deep);
+ json_t *json;
+ lisp_to_json_toplevel_1 (lisp, &json);
+ --lisp_eval_depth;
+ return json;
+}
+
+static ATTRIBUTE_WARN_UNUSED_RESULT json_t *
+lisp_to_json (Lisp_Object lisp)
+{
+ if (EQ (lisp, QCnull))
+ return json_check (json_null ());
+ else if (EQ (lisp, QCfalse))
+ return json_check (json_false ());
+ else if (EQ (lisp, Qt))
+ return json_check (json_true ());
+ else if (INTEGERP (lisp))
+ {
+ CHECK_TYPE_RANGED_INTEGER (json_int_t, lisp);
+ return json_check (json_integer (XINT (lisp)));
+ }
+ else if (FLOATP (lisp))
+ return json_check (json_real (XFLOAT_DATA (lisp)));
+ else if (STRINGP (lisp))
+ {
+ Lisp_Object encoded = json_encode (lisp);
+ ptrdiff_t size = SBYTES (encoded);
+ eassert (size >= 0);
+ if (size > SIZE_MAX)
+ xsignal1 (Qoverflow_error, build_string ("string is too long"));
+ return json_check (json_stringn (SSDATA (encoded), size));
+ }
+
+ /* LISP now must be a vector or hashtable. */
+ return lisp_to_json_toplevel (lisp);
+}
+
+DEFUN ("json-serialize", Fjson_serialize, Sjson_serialize, 1, 1, NULL,
+ doc: /* Return the JSON representation of OBJECT as a string.
+OBJECT must be a vector or hashtable, and its elements can recursively
+contain `:null', `:false', t, numbers, strings, or other vectors and
+hashtables. `:null', `:false', and t will be converted to JSON null,
+false, and true values, respectively. Vectors will be converted to
+JSON arrays, and hashtables to JSON objects. Hashtable keys must be
+strings without embedded null characters and must be unique within
+each object. */)
+ (Lisp_Object object)
+{
+ ptrdiff_t count = SPECPDL_INDEX ();
+
+ json_t *json = lisp_to_json_toplevel (object);
+ record_unwind_protect_ptr (json_release_object, json);
+
+ char *string = json_dumps (json, JSON_COMPACT);
+ if (string == NULL)
+ json_out_of_memory ();
+ record_unwind_protect_ptr (free, string);
+
+ return unbind_to (count, json_build_string (string));
+}
+
+struct json_buffer_and_size
+{
+ const char *buffer;
+ size_t size;
+};
+
+static Lisp_Object
+json_insert (void *data)
+{
+ const struct json_buffer_and_size *buffer_and_size = data;
+ if (buffer_and_size->size > PTRDIFF_MAX)
+ xsignal1 (Qoverflow_error, build_string ("buffer too large"));
+ insert (buffer_and_size->buffer, buffer_and_size->size);
+ return Qnil;
+}
+
+struct json_insert_data
+{
+ /* nil if json_insert succeeded, otherwise the symbol
+ Qcatch_all_memory_full or a cons (ERROR-SYMBOL . ERROR-DATA). */
+ Lisp_Object error;
+};
+
+static int
+json_insert_callback (const char *buffer, size_t size, void *data)
+{
+ /* This function may not exit nonlocally. */
+ struct json_insert_data *d = data;
+ struct json_buffer_and_size buffer_and_size
+ = {.buffer = buffer, .size = size};
+ d->error = internal_catch_all (json_insert, &buffer_and_size, Fidentity);
+ return NILP (d->error) ? 0 : -1;
+}
+
+DEFUN ("json-insert", Fjson_insert, Sjson_insert, 1, 1, NULL,
+ doc: /* Insert the JSON representation of OBJECT before point.
+This is the same as (insert (json-serialize OBJECT)), but potentially
+faster. See the function `json-serialize' for allowed values of
+OBJECT. */)
+ (Lisp_Object object)
+{
+ ptrdiff_t count = SPECPDL_INDEX ();
+
+ json_t *json = lisp_to_json (object);
+ record_unwind_protect_ptr (json_release_object, json);
+
+ struct json_insert_data data;
+ int status
+ = json_dump_callback (json, json_insert_callback, &data, JSON_COMPACT);
+ if (status == -1)
+ {
+ if (CONSP (data.error))
+ xsignal (XCAR (data.error), XCDR (data.error));
+ else
+ json_out_of_memory ();
+ }
+ eassert (status == 0);
+ eassert (NILP (data.error));
+ return unbind_to (count, Qnil);
+}
+
+static _GL_ARG_NONNULL ((1)) Lisp_Object
+json_to_lisp (json_t *json)
+{
+ switch (json_typeof (json))
+ {
+ case JSON_NULL:
+ return QCnull;
+ case JSON_FALSE:
+ return QCfalse;
+ case JSON_TRUE:
+ return Qt;
+ case JSON_INTEGER:
+ {
+ json_int_t value = json_integer_value (json);
+ if (FIXNUM_OVERFLOW_P (value))
+ xsignal1 (Qoverflow_error,
+ build_string ("JSON integer is too large"));
+ return make_number (value);
+ }
+ case JSON_REAL:
+ return make_float (json_real_value (json));
+ case JSON_STRING:
+ {
+ size_t size = json_string_length (json);
+ if (FIXNUM_OVERFLOW_P (size))
+ xsignal1 (Qoverflow_error, build_string ("JSON string is too long"));
+ return json_make_string (json_string_value (json), size);
+ }
+ case JSON_ARRAY:
+ {
+ if (++lisp_eval_depth > max_lisp_eval_depth)
+ xsignal0 (Qjson_object_too_deep);
+ size_t size = json_array_size (json);
+ if (FIXNUM_OVERFLOW_P (size))
+ xsignal1 (Qoverflow_error, build_string ("JSON array is too long"));
+ Lisp_Object result = Fmake_vector (make_natnum (size), Qunbound);
+ for (ptrdiff_t i = 0; i < size; ++i)
+ ASET (result, i,
+ json_to_lisp (json_array_get (json, i)));
+ --lisp_eval_depth;
+ return result;
+ }
+ case JSON_OBJECT:
+ {
+ if (++lisp_eval_depth > max_lisp_eval_depth)
+ xsignal0 (Qjson_object_too_deep);
+ size_t size = json_object_size (json);
+ if (FIXNUM_OVERFLOW_P (size))
+ xsignal1 (Qoverflow_error,
+ build_string ("JSON object has too many elements"));
+ Lisp_Object result = CALLN (Fmake_hash_table, QCtest, Qequal,
+ QCsize, make_natnum (size));
+ struct Lisp_Hash_Table *h = XHASH_TABLE (result);
+ const char *key_str;
+ json_t *value;
+ json_object_foreach (json, key_str, value)
+ {
+ Lisp_Object key = json_build_string (key_str);
+ EMACS_UINT hash;
+ ptrdiff_t i = hash_lookup (h, key, &hash);
+ eassert (i < 0);
+ hash_put (h, key, json_to_lisp (value), hash);
+ }
+ --lisp_eval_depth;
+ return result;
+ }
+ }
+ /* Can’t get here. */
+ emacs_abort ();
+}
+
+DEFUN ("json-parse-string", Fjson_parse_string, Sjson_parse_string, 1, 1, NULL,
+ doc: /* Parse the JSON STRING into a Lisp object.
+This is essentially the reverse operation of `json-serialize', which
+see. The returned object will be a vector or hashtable. Its elements
+will be `:null', `:false', t, numbers, strings, or further vectors and
+hashtables. If there are duplicate keys in an object, all but the
+last one are ignored. If STRING doesn't contain a valid JSON object,
+an error of type `json-parse-error' is signaled. */)
+ (Lisp_Object string)
+{
+ ptrdiff_t count = SPECPDL_INDEX ();
+ Lisp_Object encoded = json_encode (string);
+ check_string_without_embedded_nulls (encoded);
+
+ json_error_t error;
+ json_t *object = json_loads (SSDATA (encoded), 0, &error);
+ if (object == NULL)
+ json_parse_error (&error);
+
+ /* Avoid leaking the object in case of further errors. */
+ if (object != NULL)
+ record_unwind_protect_ptr (json_release_object, object);
+
+ return unbind_to (count, json_to_lisp (object));
+}
+
+struct json_read_buffer_data
+{
+ ptrdiff_t point;
+};
+
+static size_t
+json_read_buffer_callback (void *buffer, size_t buflen, void *data)
+{
+ struct json_read_buffer_data *d = data;
+
+ /* First, parse from point to the gap or the end of the accessible
+ portion, whatever is closer. */
+ ptrdiff_t point = d->point;
+ ptrdiff_t end;
+ {
+ bool overflow = INT_ADD_WRAPV (BUFFER_CEILING_OF (point), 1, &end);
+ eassert (!overflow);
+ }
+ size_t count;
+ {
+ bool overflow = INT_SUBTRACT_WRAPV (end, point, &count);
+ eassert (!overflow);
+ }
+ if (buflen < count)
+ count = buflen;
+ memcpy (buffer, BYTE_POS_ADDR (point), count);
+ {
+ bool overflow = INT_ADD_WRAPV (d->point, count, &d->point);
+ eassert (!overflow);
+ }
+ return count;
+}
+
+DEFUN ("json-parse-buffer", Fjson_parse_buffer, Sjson_parse_buffer,
+ 0, 0, NULL,
+ doc: /* Read JSON object from current buffer starting at point.
+This is similar to `json-parse-string', which see. Move point after
+the end of the object if parsing was successful. On error, point is
+not moved. */)
+ (void)
+{
+ ptrdiff_t count = SPECPDL_INDEX ();
+
+ ptrdiff_t point = PT_BYTE;
+ struct json_read_buffer_data data = {.point = point};
+ json_error_t error;
+ json_t *object = json_load_callback (json_read_buffer_callback, &data,
+ JSON_DISABLE_EOF_CHECK, &error);
+
+ if (object == NULL)
+ json_parse_error (&error);
+
+ /* Avoid leaking the object in case of further errors. */
+ record_unwind_protect_ptr (json_release_object, object);
+
+ /* Convert and then move point only if everything succeeded. */
+ Lisp_Object lisp = json_to_lisp (object);
+
+ {
+ /* Adjust point by how much we just read. Do this here because
+ tokener->char_offset becomes incorrect below. */
+ bool overflow = INT_ADD_WRAPV (point, error.position, &point);
+ eassert (!overflow);
+ eassert (point <= ZV_BYTE);
+ SET_PT_BOTH (BYTE_TO_CHAR (point), point);
+ }
+
+ return unbind_to (count, lisp);
+}
+
+/* Simplified version of ‘define-error’ that works with pure
+ objects. */
+
+static void
+define_error (Lisp_Object name, const char *message, Lisp_Object parent)
+{
+ eassert (SYMBOLP (name));
+ eassert (SYMBOLP (parent));
+ Lisp_Object parent_conditions = Fget (parent, Qerror_conditions);
+ eassert (CONSP (parent_conditions));
+ eassert (!NILP (Fmemq (parent, parent_conditions)));
+ eassert (NILP (Fmemq (name, parent_conditions)));
+ Fput (name, Qerror_conditions, pure_cons (name, parent_conditions));
+ Fput (name, Qerror_message, build_pure_c_string (message));
+}
+
+void
+syms_of_json (void)
+{
+ DEFSYM (QCnull, ":null");
+ DEFSYM (QCfalse, ":false");
+
+ DEFSYM (Qstring_without_embedded_nulls_p, "string-without-embedded-nulls-p");
+ DEFSYM (Qjson_value_p, "json-value-p");
+
+ DEFSYM (Qutf_8_unix, "utf-8-unix");
+
+ DEFSYM (Qjson_error, "json-error");
+ DEFSYM (Qjson_out_of_memory, "json-out-of-memory");
+ DEFSYM (Qjson_parse_error, "json-parse-error");
+ DEFSYM (Qjson_end_of_file, "json-end-of-file");
+ DEFSYM (Qjson_trailing_content, "json-trailing-content");
+ DEFSYM (Qjson_object_too_deep, "json-object-too-deep");
+ define_error (Qjson_error, "generic JSON error", Qerror);
+ define_error (Qjson_out_of_memory, "no free memory for creating JSON object",
+ Qjson_error);
+ define_error (Qjson_parse_error, "could not parse JSON stream",
+ Qjson_error);
+ define_error (Qjson_end_of_file, "end of JSON stream", Qjson_parse_error);
+ define_error (Qjson_trailing_content, "trailing content after JSON stream",
+ Qjson_parse_error);
+ define_error (Qjson_object_too_deep,
+ "object cyclic or Lisp evaluation too deep", Qjson_error);
+
+ DEFSYM (Qpure, "pure");
+ DEFSYM (Qside_effect_free, "side-effect-free");
+
+ DEFSYM (Qjson_serialize, "json-serialize");
+ DEFSYM (Qjson_parse_string, "json-parse-string");
+ Fput (Qjson_serialize, Qpure, Qt);
+ Fput (Qjson_serialize, Qside_effect_free, Qt);
+ Fput (Qjson_parse_string, Qpure, Qt);
+ Fput (Qjson_parse_string, Qside_effect_free, Qt);
+
+ defsubr (&Sjson_serialize);
+ defsubr (&Sjson_insert);
+ defsubr (&Sjson_parse_string);
+ defsubr (&Sjson_parse_buffer);
+}
diff --git a/src/lisp.h b/src/lisp.h
index 0c3ca3ae06..7ecad40f30 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -3440,6 +3440,11 @@ extern int x_bitmap_mask (struct frame *, ptrdiff_t);
extern void reset_image_types (void);
extern void syms_of_image (void);
+#ifdef HAVE_JSON
+/* Defined in json.c. */
+extern void syms_of_json (void);
+#endif
+
/* Defined in insdel.c. */
extern void move_gap_both (ptrdiff_t, ptrdiff_t);
extern _Noreturn void buffer_overflow (void);
@@ -3863,6 +3868,7 @@ extern Lisp_Object internal_condition_case_2 (Lisp_Object (*) (Lisp_Object, Lisp
extern Lisp_Object internal_condition_case_n
(Lisp_Object (*) (ptrdiff_t, Lisp_Object *), ptrdiff_t, Lisp_Object *,
Lisp_Object, Lisp_Object (*) (Lisp_Object, ptrdiff_t, Lisp_Object *));
+extern Lisp_Object internal_catch_all (Lisp_Object (*) (void *), void *, Lisp_Object (*) (Lisp_Object));
extern struct handler *push_handler (Lisp_Object, enum handlertype);
extern struct handler *push_handler_nosignal (Lisp_Object, enum handlertype);
extern void specbind (Lisp_Object, Lisp_Object);
diff --git a/test/src/json-tests.el b/test/src/json-tests.el
new file mode 100644
index 0000000000..8820c682ba
--- /dev/null
+++ b/test/src/json-tests.el
@@ -0,0 +1,97 @@
+;;; json-tests.el --- unit tests for json.c -*- lexical-binding: t; -*-
+
+;; Copyright (C) 2017 Free Software Foundation, Inc.
+
+;; This file is part of GNU Emacs.
+
+;; GNU Emacs is free software: you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation, either version 3 of the License, or
+;; (at your option) any later version.
+
+;; GNU Emacs is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>.
+
+;;; Commentary:
+
+;; Unit tests for src/json.c.
+
+;;; Code:
+
+(require 'cl-lib)
+(require 'map)
+
+(ert-deftest json-serialize/roundtrip ()
+ (let ((lisp [:null :false t 0 123 -456 3.75 "foo"])
+ (json "[null,false,true,0,123,-456,3.75,\"foo\"]"))
+ (should (equal (json-serialize lisp) json))
+ (with-temp-buffer
+ (json-insert lisp)
+ (should (equal (buffer-string) json))
+ (should (eobp)))
+ (should (equal (json-parse-string json) lisp))
+ (with-temp-buffer
+ (insert json)
+ (goto-char 1)
+ (should (equal (json-parse-buffer) lisp))
+ (should (eobp)))))
+
+(ert-deftest json-serialize/object ()
+ (let ((table (make-hash-table :test #'equal)))
+ (puthash "abc" [1 2 t] table)
+ (puthash "def" :null table)
+ (should (equal (json-serialize table)
+ "{\"abc\":[1,2,true],\"def\":null}"))))
+
+(ert-deftest json-parse-string/object ()
+ (let ((actual
+ (json-parse-string
+ "{ \"abc\" : [1, 2, true], \"def\" : null, \"abc\" : [9, false] }\n")))
+ (should (hash-table-p actual))
+ (should (equal (hash-table-count actual) 2))
+ (should (equal (cl-sort (map-pairs actual) #'string< :key #'car)
+ '(("abc" . [9 :false]) ("def" . :null))))))
+
+(ert-deftest json-parse-string/string ()
+ (should-error (json-parse-string "[\"formfeed\f\"]") :type 'json-parse-error)
+ (should (equal (json-parse-string "[\"foo \\\"bar\\\"\"]") ["foo \"bar\""]))
+ (should (equal (json-parse-string "[\"abcαβγ\"]") ["abcαβγ"]))
+ (should (equal (json-parse-string "[\"\\nasd\\u0444\\u044b\\u0432fgh\\t\"]")
+ ["\nasdфывfgh\t"]))
+ (should (equal (json-parse-string "[\"\\uD834\\uDD1E\"]") ["\U0001D11E"]))
+ (should-error (json-parse-string "foo") :type 'json-parse-error))
+
+(ert-deftest json-serialize/string ()
+ (should (equal (json-serialize ["foo"]) "[\"foo\"]"))
+ (should (equal (json-serialize ["a\n\fb"]) "[\"a\\n\\fb\"]"))
+ (should (equal (json-serialize ["\nasdфыв\u001f\u007ffgh\t"])
+ "[\"\\nasdфыв\\u001F\u007ffgh\\t\"]")))
+
+(ert-deftest json-parse-string/incomplete ()
+ (should-error (json-parse-string "[123") :type 'json-end-of-file))
+
+(ert-deftest json-parse-string/trailing ()
+ (should-error (json-parse-string "[123] [456]") :type 'json-trailing-content))
+
+(ert-deftest json-parse-buffer/incomplete ()
+ (with-temp-buffer
+ (insert "[123")
+ (goto-char 1)
+ (should-error (json-parse-buffer) :type 'json-end-of-file)
+ (should (bobp))))
+
+(ert-deftest json-parse-buffer/trailing ()
+ (with-temp-buffer
+ (insert "[123] [456]")
+ (goto-char 1)
+ (should (equal (json-parse-buffer) [123]))
+ (should-not (bobp))
+ (should (looking-at-p (rx " [456]" eos)))))
+
+(provide 'json-tests)
+;;; json-tests.el ends here
--
2.14.1
^ permalink raw reply related [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-30 22:02 ` Philipp Stephani
@ 2017-10-01 18:06 ` Eli Zaretskii
2017-10-03 12:26 ` Philipp Stephani
2017-10-01 18:38 ` Eli Zaretskii
1 sibling, 1 reply; 81+ messages in thread
From: Eli Zaretskii @ 2017-10-01 18:06 UTC (permalink / raw)
To: Philipp Stephani; +Cc: emacs-devel
> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Sat, 30 Sep 2017 22:02:55 +0000
> Cc: emacs-devel@gnu.org
>
> Subject: [PATCH] Implement native JSON support using Jansson
Thanks, a few more comments/questions.
> +#if __has_attribute (warn_unused_result)
> +# define ATTRIBUTE_WARN_UNUSED_RESULT __attribute__ ((__warn_unused_result__))
> +#else
> +# define ATTRIBUTE_WARN_UNUSED_RESULT
> +#endif
Hmm... why do we need this attribute? You use it with 2 static
functions, so this sounds like a left-over from the development stage?
> +static Lisp_Object
> +internal_catch_all_1 (Lisp_Object (*function) (void *), void *argument)
Can you tell why you needed this (and the similar internal_catch_all)?
Is that only because the callbacks could signal an error, or is there
another reason? If the former, I'd prefer to simplify the code and
its maintenance by treating the error condition in a less drastic
manner, and avoiding the call to xsignal.
> +static _GL_ARG_NONNULL ((2)) Lisp_Object
> +lisp_to_json_toplevel_1 (Lisp_Object lisp, json_t **json)
> +{
> + if (VECTORP (lisp))
> + {
> + ptrdiff_t size = ASIZE (lisp);
> + eassert (size >= 0);
> + if (size > SIZE_MAX)
> + xsignal1 (Qoverflow_error, build_string ("vector is too long"));
I think this error text is too vague. Can we come up with something
that describes the problem more accurately?
And btw, how can size be greater than SIZE_MAX in this case? This is
a valid Lisp object, isn't it? (There are more such tests in the
patch, e.g. in lisp_to_json, and I think they, too, are redundant.)
> + *json = json_check (json_array ());
> + ptrdiff_t count = SPECPDL_INDEX ();
> + record_unwind_protect_ptr (json_release_object, json);
> + for (ptrdiff_t i = 0; i < size; ++i)
> + {
> + int status
> + = json_array_append_new (*json, lisp_to_json (AREF (lisp, i)));
> + if (status == -1)
> + json_out_of_memory ();
> + eassert (status == 0);
> + }
> + eassert (json_array_size (*json) == size);
> + clear_unwind_protect (count);
> + return unbind_to (count, Qnil);
This, too, sounds more complex than it should: you record
unwind-protect just so lisp_to_json's subroutines could signal an
error due to insufficient memory, right? Why can't we have the
out-of-memory check only inside this loop, which you already do, and
avoid the checks on lower levels (which undoubtedly cost us extra
cycles)? What do those extra checks in json_check buy us? the errors
they signal are no more informative than the one in the loop, AFAICT.
> +static Lisp_Object
> +json_insert (void *data)
> +{
> + const struct json_buffer_and_size *buffer_and_size = data;
> + if (buffer_and_size->size > PTRDIFF_MAX)
> + xsignal1 (Qoverflow_error, build_string ("buffer too large"));
> + insert (buffer_and_size->buffer, buffer_and_size->size);
I don't think we need this test here, as 'insert' already has the
equivalent test in one of its subroutines.
> + case JSON_INTEGER:
> + {
> + json_int_t value = json_integer_value (json);
> + if (FIXNUM_OVERFLOW_P (value))
> + xsignal1 (Qoverflow_error,
> + build_string ("JSON integer is too large"));
> + return make_number (value);
This overflow test is also redundant, as make_number already does it.
> + case JSON_STRING:
> + {
> + size_t size = json_string_length (json);
> + if (FIXNUM_OVERFLOW_P (size))
> + xsignal1 (Qoverflow_error, build_string ("JSON string is too long"));
> + return json_make_string (json_string_value (json), size);
Once again, the overflow test is redundant, as make_specified_string
(called by json_make_string) already includes an equivalent test.
> + case JSON_ARRAY:
> + {
> + if (++lisp_eval_depth > max_lisp_eval_depth)
> + xsignal0 (Qjson_object_too_deep);
> + size_t size = json_array_size (json);
> + if (FIXNUM_OVERFLOW_P (size))
> + xsignal1 (Qoverflow_error, build_string ("JSON array is too long"));
> + Lisp_Object result = Fmake_vector (make_natnum (size), Qunbound);
Likewise here: Fmake_vector makes sure the size is not larger than
allowed.
> + case JSON_OBJECT:
> + {
> + if (++lisp_eval_depth > max_lisp_eval_depth)
> + xsignal0 (Qjson_object_too_deep);
> + size_t size = json_object_size (json);
> + if (FIXNUM_OVERFLOW_P (size))
> + xsignal1 (Qoverflow_error,
> + build_string ("JSON object has too many elements"));
> + Lisp_Object result = CALLN (Fmake_hash_table, QCtest, Qequal,
> + QCsize, make_natnum (size));
Likewise here: make_natnum does the equivalent test.
> + /* Adjust point by how much we just read. Do this here because
> + tokener->char_offset becomes incorrect below. */
> + bool overflow = INT_ADD_WRAPV (point, error.position, &point);
> + eassert (!overflow);
> + eassert (point <= ZV_BYTE);
> + SET_PT_BOTH (BYTE_TO_CHAR (point), point);
It's better to use SET_PT here, I think.
> + define_error (Qjson_out_of_memory, "no free memory for creating JSON object",
I'd prefer "not enough memory for creating JSON object".
Thanks again for working on this.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-09-30 22:02 ` Philipp Stephani
2017-10-01 18:06 ` Eli Zaretskii
@ 2017-10-01 18:38 ` Eli Zaretskii
2017-10-03 12:12 ` Philipp Stephani
1 sibling, 1 reply; 81+ messages in thread
From: Eli Zaretskii @ 2017-10-01 18:38 UTC (permalink / raw)
To: Philipp Stephani; +Cc: emacs-devel
> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Sat, 30 Sep 2017 22:02:55 +0000
> Cc: emacs-devel@gnu.org
One more comment:
> +static Lisp_Object
> +json_insert (void *data)
> +{
> + const struct json_buffer_and_size *buffer_and_size = data;
> + if (buffer_and_size->size > PTRDIFF_MAX)
> + xsignal1 (Qoverflow_error, build_string ("buffer too large"));
> + insert (buffer_and_size->buffer, buffer_and_size->size);
> + return Qnil;
Shouldn't we follow the call to 'insert' with decode_from_gap, or do
it before json_insert_callback returns? Or do we trust Jansson to
produce 100% valid UTF-8?
Thanks.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-01 18:38 ` Eli Zaretskii
@ 2017-10-03 12:12 ` Philipp Stephani
2017-10-03 14:54 ` Eli Zaretskii
0 siblings, 1 reply; 81+ messages in thread
From: Philipp Stephani @ 2017-10-03 12:12 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
Eli Zaretskii <eliz@gnu.org> wrote on Sun, Oct 1, 2017 at 20:38:
> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Sat, 30 Sep 2017 22:02:55 +0000
> > Cc: emacs-devel@gnu.org
>
> One more comment:
>
> > +static Lisp_Object
> > +json_insert (void *data)
> > +{
> > + const struct json_buffer_and_size *buffer_and_size = data;
> > + if (buffer_and_size->size > PTRDIFF_MAX)
> > + xsignal1 (Qoverflow_error, build_string ("buffer too large"));
> > + insert (buffer_and_size->buffer, buffer_and_size->size);
> > + return Qnil;
>
> Shouldn't we follow the call to 'insert' with decode_from_gap, or do
> it before json_insert_callback returns? Or do we trust Jansson to
> produce 100% valid UTF-8?
>
>
What's decode_from_gap? It doesn't seem to be defined in the Emacs code
base.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-01 18:06 ` Eli Zaretskii
@ 2017-10-03 12:26 ` Philipp Stephani
2017-10-03 15:31 ` Eli Zaretskii
2017-10-03 20:52 ` Paul Eggert
0 siblings, 2 replies; 81+ messages in thread
From: Philipp Stephani @ 2017-10-03 12:26 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
Eli Zaretskii <eliz@gnu.org> wrote on Sun, Oct 1, 2017 at 20:06:
> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Sat, 30 Sep 2017 22:02:55 +0000
> > Cc: emacs-devel@gnu.org
> >
> > Subject: [PATCH] Implement native JSON support using Jansson
>
> Thanks, a few more comments/questions.
>
> > +#if __has_attribute (warn_unused_result)
> > +# define ATTRIBUTE_WARN_UNUSED_RESULT __attribute__
> ((__warn_unused_result__))
> > +#else
> > +# define ATTRIBUTE_WARN_UNUSED_RESULT
> > +#endif
>
> Hmm... why do we need this attribute? You use it with 2 static
> functions, so this sounds like a left-over from the development stage?
>
It's not strictly needed (and if you don't like it, I can remove it), but
it helps catching memory leaks.
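(A minimal standalone sketch of what the attribute buys, assuming GCC
or Clang; the names below are invented for illustration and are not
part of the patch.)

  /* Compile with: cc -Wall -c demo.c  */
  #include <stdlib.h>

  static void *make_node (void) __attribute__ ((__warn_unused_result__));

  static void *
  make_node (void)
  {
    return malloc (16);      /* caller owns the result */
  }

  void
  demo (void)
  {
    make_node ();            /* compiler warns: result ignored -> leak */
    void *n = make_node ();  /* OK: result is used ... */
    free (n);                /* ... and released */
  }

The json_t allocations in the patch follow the same ownership
discipline, so a dropped return value typically means a refcount or
memory leak.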
>
> > +static Lisp_Object
> > +internal_catch_all_1 (Lisp_Object (*function) (void *), void *argument)
>
> Can you tell why you needed this (and the similar internal_catch_all)?
> Is that only because the callbacks could signal an error, or is there
> another reason? If the former, I'd prefer to simplify the code and
> its maintenance by treating the error condition in a less drastic
> manner, and avoiding the call to xsignal.
>
The callbacks (especially insert and before-/after-change-hook) can exit
nonlocally, but these nonlocal exits may not escape the Jansson callback.
Therefore all nonlocal exits must be caught here.
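(A self-contained sketch of that pattern, using plain setjmp/longjmp
in place of Emacs' handler stack; every name here is invented for
illustration. The point is that the nonlocal exit is caught inside the
callback, converted into an error return so the C library can unwind
its own frames normally, and only re-raised once the library has
returned.)

  #include <setjmp.h>
  #include <stdio.h>

  /* A single "current handler", standing in for Emacs' handler stack:
     a nonlocal exit always jumps to the innermost handler.  */
  static jmp_buf *current_handler;
  static int pending_error;

  /* Body that may exit nonlocally (think: `insert' running hooks).  */
  static void
  risky_work (void)
  {
    longjmp (*current_handler, 1);
  }

  /* Callback driven by the C library (think: json_dump_callback).
     Jumping over the library's frames is not allowed, so the exit is
     caught right here and turned into an error return value.  */
  static int
  library_callback (void)
  {
    jmp_buf catch_all;
    jmp_buf *outer = current_handler;
    current_handler = &catch_all;
    if (setjmp (catch_all) == 0)
      {
        risky_work ();
        current_handler = outer;
        return 0;                 /* success */
      }
    current_handler = outer;
    pending_error = 1;            /* remember the exit ... */
    return -1;                    /* ... and tell the library to stop */
  }

  /* Stand-in for the C library: it just drives the callback.  */
  static int
  fake_library (int (*callback) (void))
  {
    return callback ();
  }

  int
  main (void)
  {
    jmp_buf toplevel;
    current_handler = &toplevel;
    if (setjmp (toplevel) == 0)
      {
        if (fake_library (library_callback) != 0 && pending_error)
          longjmp (*current_handler, 1);  /* re-raise outside the library */
        puts ("no error");
      }
    else
      puts ("nonlocal exit re-raised safely outside the library");
    return 0;
  }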
>
> > +static _GL_ARG_NONNULL ((2)) Lisp_Object
> > +lisp_to_json_toplevel_1 (Lisp_Object lisp, json_t **json)
> > +{
> > + if (VECTORP (lisp))
> > + {
> > + ptrdiff_t size = ASIZE (lisp);
> > + eassert (size >= 0);
> > + if (size > SIZE_MAX)
> > + xsignal1 (Qoverflow_error, build_string ("vector is too long"));
>
> I think this error text is too vague. Can we come up with something
> that describes the problem more accurately?
Maybe, but it's probably not worth it because I don't think we have many
architectures where PTRDIFF_MAX > SIZE_MAX.
>
> And btw, how can size be greater than SIZE_MAX in this case? This is
> a valid Lisp object, isn't it? (There are more such tests in the
> patch, e.g. in lisp_to_json, and I think they, too, are redundant.)
>
Depends on the range of ptrdiff_t and size_t. IIUC nothing in the C
standard guarantees PTRDIFF_MAX <= SIZE_MAX. If we want to guarantee that,
we should probably add at least a static assertion.
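(Such a guarantee is a one-line compile-time check; a standalone C11
sketch, not part of the patch. Inside Emacs it could equally be
spelled with the `verify' macro.)

  #include <assert.h>   /* static_assert */
  #include <stdint.h>   /* PTRDIFF_MAX, SIZE_MAX */

  static_assert ((uintmax_t) PTRDIFF_MAX <= (uintmax_t) SIZE_MAX,
                 "every nonnegative ptrdiff_t must fit in size_t");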
>
> > + *json = json_check (json_array ());
> > + ptrdiff_t count = SPECPDL_INDEX ();
> > + record_unwind_protect_ptr (json_release_object, json);
> > + for (ptrdiff_t i = 0; i < size; ++i)
> > + {
> > + int status
> > + = json_array_append_new (*json, lisp_to_json (AREF (lisp,
> i)));
> > + if (status == -1)
> > + json_out_of_memory ();
> > + eassert (status == 0);
> > + }
> > + eassert (json_array_size (*json) == size);
> > + clear_unwind_protect (count);
> > + return unbind_to (count, Qnil);
>
> This, too, sounds more complex than it should: you record
> unwind-protect just so lisp_to_json's subroutines could signal an
> error due to insufficient memory, right? Why can't we have the
> out-of-memory check only inside this loop, which you already do, and
> avoid the checks on lower levels (which undoubtedly cost us extra
> cycles)? What do those extra checks in json_check buy us? the errors
> they signal are no more informative than the one in the loop, AFAICT.
>
I don't understand what you mean. We need to check the return values of all
functions if we want to use them later.
>
> > +static Lisp_Object
> > +json_insert (void *data)
> > +{
> > + const struct json_buffer_and_size *buffer_and_size = data;
> > + if (buffer_and_size->size > PTRDIFF_MAX)
> > + xsignal1 (Qoverflow_error, build_string ("buffer too large"));
> > + insert (buffer_and_size->buffer, buffer_and_size->size);
>
> I don't think we need this test here, as 'insert' already has the
> equivalent test in one of its subroutines.
>
It can't, because it takes the byte length as ptrdiff_t. We need to check
before whether the size is actually in the valid range of ptrdiff_t.
>
> > + case JSON_INTEGER:
> > + {
> > + json_int_t value = json_integer_value (json);
> > + if (FIXNUM_OVERFLOW_P (value))
> > + xsignal1 (Qoverflow_error,
> > + build_string ("JSON integer is too large"));
> > + return make_number (value);
>
> This overflow test is also redundant, as make_number already does it.
>
It can't, because json_int_t can be larger than EMACS_INT. Also,
make_number doesn't contain any checks.
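(The shape of the check, as a standalone sketch. DEMO_FIXNUM_MAX/MIN
are invented stand-ins for MOST_POSITIVE_FIXNUM/MOST_NEGATIVE_FIXNUM
on a build whose fixnums are narrower than json_int_t, which is
typically a 64-bit type.)

  #include <stdbool.h>
  #include <stdint.h>

  typedef int64_t demo_json_int_t;   /* like Jansson's json_int_t */

  /* Invented fixnum bounds, narrower than 64 bits.  */
  #define DEMO_FIXNUM_MAX (((int64_t) 1 << 61) - 1)
  #define DEMO_FIXNUM_MIN (-DEMO_FIXNUM_MAX - 1)

  bool
  demo_fixnum_overflow_p (demo_json_int_t value)
  {
    return value < DEMO_FIXNUM_MIN || value > DEMO_FIXNUM_MAX;
  }

  /* The patch signals `overflow-error' where this returns false; only
     after the test is the narrowing conversion known to be safe.  */
  bool
  demo_convert (demo_json_int_t value, int64_t *out)
  {
    if (demo_fixnum_overflow_p (value))
      return false;
    *out = value;   /* fixnum payload; would be an EMACS_INT in Emacs */
    return true;
  }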
>
> > + case JSON_STRING:
> > + {
> > + size_t size = json_string_length (json);
> > + if (FIXNUM_OVERFLOW_P (size))
> > + xsignal1 (Qoverflow_error, build_string ("JSON string is too
> long"));
> > + return json_make_string (json_string_value (json), size);
>
> Once again, the overflow test is redundant, as make_specified_string
> (called by json_make_string) already includes an equivalent test.
>
And once again, we need to check at least whether the size fits into
ptrdiff_t.
>
> > + case JSON_ARRAY:
> > + {
> > + if (++lisp_eval_depth > max_lisp_eval_depth)
> > + xsignal0 (Qjson_object_too_deep);
> > + size_t size = json_array_size (json);
> > + if (FIXNUM_OVERFLOW_P (size))
> > + xsignal1 (Qoverflow_error, build_string ("JSON array is too
> long"));
> > + Lisp_Object result = Fmake_vector (make_natnum (size),
> Qunbound);
>
> Likewise here: Fmake_vector makes sure the size is not larger than
> allowed.
>
Same as above: It can't.
>
> > + case JSON_OBJECT:
> > + {
> > + if (++lisp_eval_depth > max_lisp_eval_depth)
> > + xsignal0 (Qjson_object_too_deep);
> > + size_t size = json_object_size (json);
> > + if (FIXNUM_OVERFLOW_P (size))
> > + xsignal1 (Qoverflow_error,
> > + build_string ("JSON object has too many elements"));
> > + Lisp_Object result = CALLN (Fmake_hash_table, QCtest, Qequal,
> > + QCsize, make_natnum (size));
>
> Likewise here: make_natnum does the equivalent test.
>
It doesn't and can't.
>
> > + /* Adjust point by how much we just read. Do this here because
> > + tokener->char_offset becomes incorrect below. */
> > + bool overflow = INT_ADD_WRAPV (point, error.position, &point);
> > + eassert (!overflow);
> > + eassert (point <= ZV_BYTE);
> > + SET_PT_BOTH (BYTE_TO_CHAR (point), point);
>
> It's better to use SET_PT here, I think.
>
That's not possible because we don't have the character offset. (And I
think using SET_PT (BYTE_TO_CHAR (point)) would just require needlessly
recalculating point.)
>
> > + define_error (Qjson_out_of_memory, "no free memory for creating JSON
> object",
>
> I'd prefer "not enough memory for creating JSON object".
>
>
Done.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-03 12:12 ` Philipp Stephani
@ 2017-10-03 14:54 ` Eli Zaretskii
0 siblings, 0 replies; 81+ messages in thread
From: Eli Zaretskii @ 2017-10-03 14:54 UTC (permalink / raw)
To: Philipp Stephani; +Cc: emacs-devel
> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Tue, 03 Oct 2017 12:12:05 +0000
> Cc: emacs-devel@gnu.org
>
> Shouldn't we follow the call to 'insert' with decode_from_gap, or do
> it before json_insert_callback returns? Or do we trust Jansson to
> produce 100% valid UTF-8?
>
> What's decode_from_gap? It doesn't seem to be defined in the Emacs code base.
Sorry, faulty memory: I meant decode_coding_gap.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-03 12:26 ` Philipp Stephani
@ 2017-10-03 15:31 ` Eli Zaretskii
2017-10-03 15:52 ` Philipp Stephani
2017-10-03 20:52 ` Paul Eggert
1 sibling, 1 reply; 81+ messages in thread
From: Eli Zaretskii @ 2017-10-03 15:31 UTC (permalink / raw)
To: Philipp Stephani; +Cc: emacs-devel
> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Tue, 03 Oct 2017 12:26:32 +0000
> Cc: emacs-devel@gnu.org
>
> > > +#if __has_attribute (warn_unused_result)
> > > +# define ATTRIBUTE_WARN_UNUSED_RESULT __attribute__
> > ((__warn_unused_result__))
> > > +#else
> > > +# define ATTRIBUTE_WARN_UNUSED_RESULT
> > > +#endif
> >
> > Hmm... why do we need this attribute? You use it with 2 static
> > functions, so this sounds like a left-over from the development stage?
> >
>
> It's not strictly needed (and if you don't like it, I can remove it), but
> it helps catching memory leaks.
No strong feeling here, but I'd be interested in hearing Paul's
opinion on this.
> > > +static Lisp_Object
> > > +internal_catch_all_1 (Lisp_Object (*function) (void *), void *argument)
> >
> > Can you tell why you needed this (and the similar internal_catch_all)?
> > Is that only because the callbacks could signal an error, or is there
> > another reason? If the former, I'd prefer to simplify the code and
> > its maintenance by treating the error condition in a less drastic
> > manner, and avoiding the call to xsignal.
>
> The callbacks (especially insert and before-/after-change-hook) can exit
> nonlocally, but these nonlocal exits may not escape the Jansson callback.
> Therefore all nonlocal exits must be caught here.
Why can't you use record_unwind_protect, as we normally do in these
situations?
> > > +static _GL_ARG_NONNULL ((2)) Lisp_Object
> > > +lisp_to_json_toplevel_1 (Lisp_Object lisp, json_t **json)
> > > +{
> > > + if (VECTORP (lisp))
> > > + {
> > > + ptrdiff_t size = ASIZE (lisp);
> > > + eassert (size >= 0);
> > > + if (size > SIZE_MAX)
> > > + xsignal1 (Qoverflow_error, build_string ("vector is too long"));
> >
> > I think this error text is too vague. Can we come up with something
> > that describes the problem more accurately?
>
> Maybe, but it's probably not worth it because I don't think we have many
> architectures where PTRDIFF_MAX > SIZE_MAX.
Then why do we punish all the platforms with this runtime check?
> > And btw, how can size be greater than SIZE_MAX in this case? This is
> > a valid Lisp object, isn't it? (There are more such tests in the
> > patch, e.g. in lisp_to_json, and I think they, too, are redundant.)
>
> Depends on the range of ptrdiff_t and size_t. IIUC nothing in the C
> standard guarantees PTRDIFF_MAX <= SIZE_MAX.
I wasn't talking about PTRDIFF_MAX, I was talking about 'size', which
is the number of bytes in a Lisp string. Since that Lisp string is a
valid Lisp object, how can its size be greater than SIZE_MAX? I don't
think there's a way of creating such a Lisp string in Emacs, because
functions that allocate memory for strings will prevent that.
> > > + *json = json_check (json_array ());
> > > + ptrdiff_t count = SPECPDL_INDEX ();
> > > + record_unwind_protect_ptr (json_release_object, json);
> > > + for (ptrdiff_t i = 0; i < size; ++i)
> > > + {
> > > + int status
> > > + = json_array_append_new (*json, lisp_to_json (AREF (lisp,
> > i)));
> > > + if (status == -1)
> > > + json_out_of_memory ();
> > > + eassert (status == 0);
> > > + }
> > > + eassert (json_array_size (*json) == size);
> > > + clear_unwind_protect (count);
> > > + return unbind_to (count, Qnil);
> >
> > This, too, sounds more complex than it should: you record
> > unwind-protect just so lisp_to_json's subroutines could signal an
> > error due to insufficient memory, right? Why can't we have the
> > out-of-memory check only inside this loop, which you already do, and
> > avoid the checks on lower levels (which undoubtedly cost us extra
> > cycles)? What do those extra checks in json_check buy us? the errors
> > they signal are no more informative than the one in the loop, AFAICT.
>
> I don't understand what you mean. We need to check the return values of all
> functions if we want to use them later.
Yes, but what problems can cause these return value to be invalid?
AFAICT, only out-of-memory conditions, and that can be checked only
once, there's no need to check every single allocation, because once
an allocation fails, all the rest will too.
> > > +static Lisp_Object
> > > +json_insert (void *data)
> > > +{
> > > + const struct json_buffer_and_size *buffer_and_size = data;
> > > + if (buffer_and_size->size > PTRDIFF_MAX)
> > > + xsignal1 (Qoverflow_error, build_string ("buffer too large"));
> > > + insert (buffer_and_size->buffer, buffer_and_size->size);
> >
> > I don't think we need this test here, as 'insert' already has the
> > equivalent test in one of its subroutines.
>
> It can't, because it takes the byte length as ptrdiff_t. We need to check
> before whether the size is actually in the valid range of ptrdiff_t.
I'm sorry, but I don't see why we should support such exotic
situations only for this one feature. In all other cases we use
either ptrdiff_t type or EMACS_INT type, and these issues disappear
then. Trying to support the SIZE_MAX > PTRDIFF_MAX situation causes
the code to be much more complicated, harder to maintain, and more
expensive at run time than it should be. I'm not even sure there are
such platforms out there that Emacs supports, but if there are, we
already have a gazillion problems like that all over our code. I
object to having such code just for this reason, sorry.
> > > + case JSON_INTEGER:
> > > + {
> > > + json_int_t value = json_integer_value (json);
> > > + if (FIXNUM_OVERFLOW_P (value))
> > > + xsignal1 (Qoverflow_error,
> > > + build_string ("JSON integer is too large"));
> > > + return make_number (value);
> >
> > This overflow test is also redundant, as make_number already does it.
>
> It can't, because json_int_t can be larger than EMACS_INT.
OK, but then I think we should consider returning a float value, or a
cons of 2 integers. If these situations are frequent enough, users
will thank us, and if they are very infrequent, they will never see
such values, and we gain code simplicity and less non-local exits.
> > > + case JSON_STRING:
> > > + {
> > > + size_t size = json_string_length (json);
> > > + if (FIXNUM_OVERFLOW_P (size))
> > > + xsignal1 (Qoverflow_error, build_string ("JSON string is too
> > long"));
> > > + return json_make_string (json_string_value (json), size);
> >
> > Once again, the overflow test is redundant, as make_specified_string
> > (called by json_make_string) already includes an equivalent test.
>
> And once again, we need to check at least whether the size fits into
> ptrdiff_t.
No, we don't, as we don't in other similar cases.
> > > + case JSON_ARRAY:
> > > + {
> > > + if (++lisp_eval_depth > max_lisp_eval_depth)
> > > + xsignal0 (Qjson_object_too_deep);
> > > + size_t size = json_array_size (json);
> > > + if (FIXNUM_OVERFLOW_P (size))
> > > + xsignal1 (Qoverflow_error, build_string ("JSON array is too
> > long"));
> > > + Lisp_Object result = Fmake_vector (make_natnum (size),
> > Qunbound);
> >
> > Likewise here: Fmake_vector makes sure the size is not larger than
> > allowed.
>
> Same as above: It can't.
It can and it does.
> > > + case JSON_OBJECT:
> > > + {
> > > + if (++lisp_eval_depth > max_lisp_eval_depth)
> > > + xsignal0 (Qjson_object_too_deep);
> > > + size_t size = json_object_size (json);
> > > + if (FIXNUM_OVERFLOW_P (size))
> > > + xsignal1 (Qoverflow_error,
> > > + build_string ("JSON object has too many elements"));
> > > + Lisp_Object result = CALLN (Fmake_hash_table, QCtest, Qequal,
> > > + QCsize, make_natnum (size));
> >
> > Likewise here: make_natnum does the equivalent test.
>
> It doesn't and can't.
Yes, it does:
INLINE Lisp_Object
make_natnum (EMACS_INT n)
{
eassert (0 <= n && n <= MOST_POSITIVE_FIXNUM); <<<<<<<<<<<<<<<
EMACS_INT int0 = Lisp_Int0;
> > > + /* Adjust point by how much we just read. Do this here because
> > > + tokener->char_offset becomes incorrect below. */
> > > + bool overflow = INT_ADD_WRAPV (point, error.position, &point);
> > > + eassert (!overflow);
> > > + eassert (point <= ZV_BYTE);
> > > + SET_PT_BOTH (BYTE_TO_CHAR (point), point);
> >
> > It's better to use SET_PT here, I think.
>
> That's not possible because we don't have the character offset.
Right, sorry, I was confused.
Thanks.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-03 15:31 ` Eli Zaretskii
@ 2017-10-03 15:52 ` Philipp Stephani
2017-10-03 16:26 ` Eli Zaretskii
0 siblings, 1 reply; 81+ messages in thread
From: Philipp Stephani @ 2017-10-03 15:52 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
Eli Zaretskii <eliz@gnu.org> wrote on Tue, Oct 3, 2017 at 17:32:
> >
> > > > +static Lisp_Object
> > > > +internal_catch_all_1 (Lisp_Object (*function) (void *), void
> *argument)
> > >
> > > Can you tell why you needed this (and the similar internal_catch_all)?
> > > Is that only because the callbacks could signal an error, or is there
> > > another reason? If the former, I'd prefer to simplify the code and
> > > its maintenance by treating the error condition in a less drastic
> > > manner, and avoiding the call to xsignal.
> >
> > The callbacks (especially insert and before-/after-change-hook) can exit
> > nonlocally, but these nonlocal exits may not escape the Jansson callback.
> > Therefore all nonlocal exits must be caught here.
>
> Why can't you use record_unwind_protect, as we normally do in these
> situations?
>
How would that help? record_unwind_protect can't stop nonlocal exits.
>
> > > > +static _GL_ARG_NONNULL ((2)) Lisp_Object
> > > > +lisp_to_json_toplevel_1 (Lisp_Object lisp, json_t **json)
> > > > +{
> > > > + if (VECTORP (lisp))
> > > > + {
> > > > + ptrdiff_t size = ASIZE (lisp);
> > > > + eassert (size >= 0);
> > > > + if (size > SIZE_MAX)
> > > > + xsignal1 (Qoverflow_error, build_string ("vector is too
> long"));
> > >
> > > I think this error text is too vague. Can we come up with something
> > > that describes the problem more accurately?
> >
> > Maybe, but it's probably not worth it because I don't think we have many
> > architectures where PTRDIFF_MAX > SIZE_MAX.
>
> Then why do we punish all the platforms with this runtime check?
>
If you think this cannot happen we can turn it into a runtime or
compile-time assertion.
>
> > > And btw, how can size be greater than SIZE_MAX in this case? This is
> > > a valid Lisp object, isn't it? (There are more such tests in the
> > > patch, e.g. in lisp_to_json, and I think they, too, are redundant.)
> >
> > Depends on the range of ptrdiff_t and size_t. IIUC nothing in the C
> > standard guarantees PTRDIFF_MAX <= SIZE_MAX.
>
> I wasn't talking about PTRDIFF_MAX, I was talking about 'size', which
> is the number of bytes in a Lisp string. Since that Lisp string is a
> valid Lisp object, how can its size be greater than SIZE_MAX? I don't
> think there's a way of creating such a Lisp string in Emacs, because
> functions that allocate memory for strings will prevent that.
>
Then I think we should at least add an assertion to document this.
>
> > > > + *json = json_check (json_array ());
> > > > + ptrdiff_t count = SPECPDL_INDEX ();
> > > > + record_unwind_protect_ptr (json_release_object, json);
> > > > + for (ptrdiff_t i = 0; i < size; ++i)
> > > > + {
> > > > + int status
> > > > + = json_array_append_new (*json, lisp_to_json (AREF
> (lisp,
> > > i)));
> > > > + if (status == -1)
> > > > + json_out_of_memory ();
> > > > + eassert (status == 0);
> > > > + }
> > > > + eassert (json_array_size (*json) == size);
> > > > + clear_unwind_protect (count);
> > > > + return unbind_to (count, Qnil);
> > >
> > > This, too, sounds more complex than it should: you record
> > > unwind-protect just so lisp_to_json's subroutines could signal an
> > > error due to insufficient memory, right? Why can't we have the
> > > out-of-memory check only inside this loop, which you already do, and
> > > avoid the checks on lower levels (which undoubtedly cost us extra
> > > cycles)? What do those extra checks in json_check buy us? the errors
> > > they signal are no more informative than the one in the loop, AFAICT.
> >
> > I don't understand what you mean. We need to check the return values of
> all
> > functions if we want to use them later.
>
> Yes, but what problems can cause these return value to be invalid?
> AFAICT, only out-of-memory conditions, and that can be checked only
> once, there's no need to check every single allocation, because once
> an allocation fails, all the rest will too.
>
But if the first succeeds, the second can still fail, so we do need to
check all of them.
>
> > > > +static Lisp_Object
> > > > +json_insert (void *data)
> > > > +{
> > > > + const struct json_buffer_and_size *buffer_and_size = data;
> > > > + if (buffer_and_size->size > PTRDIFF_MAX)
> > > > + xsignal1 (Qoverflow_error, build_string ("buffer too large"));
> > > > + insert (buffer_and_size->buffer, buffer_and_size->size);
> > >
> > > I don't think we need this test here, as 'insert' already has the
> > > equivalent test in one of its subroutines.
> >
> > It can't, because it takes the byte length as ptrdiff_t. We need to check
> > before whether the size is actually in the valid range of ptrdiff_t.
>
> I'm sorry, but I don't see why we should support such exotic
> situations only for this one feature. In all other cases we use
> either ptrdiff_t type or EMACS_INT type, and these issues disappear
> then. Trying to support the SIZE_MAX > PTRDIFF_MAX situation causes
> the code to be much more complicated, harder to maintain, and more
> expensive at run time than it should be.
We can't avoid these checks. The API returns size_t, so we can only
assume that the values lie somewhere in the range of size_t, which is
wider than the non-negative range of ptrdiff_t. There's no way around
this.
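(Concretely, the guard has this shape; a standalone sketch with an
invented helper name, where the patch signals `overflow-error' instead
of returning a sentinel.)

  #include <stddef.h>
  #include <stdint.h>

  /* Validate a size_t coming from the library before treating it as a
     ptrdiff_t byte count.  Returns -1 if SIZE does not fit.  */
  ptrdiff_t
  checked_size (size_t size)
  {
    if (size > (size_t) PTRDIFF_MAX)
      return -1;
    return (ptrdiff_t) size;   /* value is now known to be representable */
  }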
> I'm not even sure there are
> such platforms out there that Emacs supports,
All platforms that I know of have SIZE_MAX > PTRDIFF_MAX.
> but if there are, we
> already have a gazillion problems like that all over our code.
Just because other parts of the codebase are buggy doesn't mean we need to
introduce more bugs in new code.
> I
> object to having such code just for this reason, sorry.
>
We can't avoid it.
>
> > > > + case JSON_INTEGER:
> > > > + {
> > > > + json_int_t value = json_integer_value (json);
> > > > + if (FIXNUM_OVERFLOW_P (value))
> > > > + xsignal1 (Qoverflow_error,
> > > > + build_string ("JSON integer is too large"));
> > > > + return make_number (value);
> > >
> > > This overflow test is also redundant, as make_number already does it.
> >
> > It can't, because json_int_t can be larger than EMACS_INT.
>
> OK, but then I think we should consider returning a float value, or a
> cons of 2 integers. If these situations are frequent enough, users
> will thank us, and if they are very infrequent, they will never see
> such values, and we gain code simplicity and less non-local exits.
>
Returning a float (using make_natnum_or_float) might work, but in the end
I've decided against it because it could silently drop precision. I think
that's worse than signaling an error.
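(The precision loss is easy to demonstrate: an IEEE double has a
53-bit significand, so 64-bit integers above 2^53 are no longer all
exactly representable. A standalone example:)

  #include <inttypes.h>
  #include <stdint.h>
  #include <stdio.h>

  int
  main (void)
  {
    int64_t big = ((int64_t) 1 << 53) + 1;  /* 9007199254740993 */
    double d = big;    /* rounds to 9007199254740992 under IEEE rules */
    printf ("%" PRId64 " -> %.0f (round trip %s)\n", big, d,
            (int64_t) d == big ? "exact" : "lossy");
    return 0;
  }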
>
> > > > + case JSON_STRING:
> > > > + {
> > > > + size_t size = json_string_length (json);
> > > > + if (FIXNUM_OVERFLOW_P (size))
> > > > + xsignal1 (Qoverflow_error, build_string ("JSON string is
> too
> > > long"));
> > > > + return json_make_string (json_string_value (json), size);
> > >
> > > Once again, the overflow test is redundant, as make_specified_string
> > > (called by json_make_string) already includes an equivalent test.
> >
> > And once again, we need to check at least whether the size fits into
> > ptrdiff_t.
>
> No, we don't, as we don't in other similar cases.
>
I don't understand why you think these checks aren't necessary.
Converting between integral types when the value is out of range for
the destination type yields an implementation-defined result, i.e.
it's unportable. Even assuming the GCC convention, such conversions
produce dangerously incorrect values.
>
> > > > + case JSON_ARRAY:
> > > > + {
> > > > + if (++lisp_eval_depth > max_lisp_eval_depth)
> > > > + xsignal0 (Qjson_object_too_deep);
> > > > + size_t size = json_array_size (json);
> > > > + if (FIXNUM_OVERFLOW_P (size))
> > > > + xsignal1 (Qoverflow_error, build_string ("JSON array is
> too
> > > long"));
> > > > + Lisp_Object result = Fmake_vector (make_natnum (size),
> > > Qunbound);
> > >
> > > Likewise here: Fmake_vector makes sure the size is not larger than
> > > allowed.
> >
> > Same as above: It can't.
>
> It can and it does.
>
No, it can't. make_natnum takes a ptrdiff_t argument, and when passing a
value that's out of range for ptrdiff_t, it will receive an incorrect,
implementation-defined value.
>
> > > > + case JSON_OBJECT:
> > > > + {
> > > > + if (++lisp_eval_depth > max_lisp_eval_depth)
> > > > + xsignal0 (Qjson_object_too_deep);
> > > > + size_t size = json_object_size (json);
> > > > + if (FIXNUM_OVERFLOW_P (size))
> > > > + xsignal1 (Qoverflow_error,
> > > > + build_string ("JSON object has too many
> elements"));
> > > > + Lisp_Object result = CALLN (Fmake_hash_table, QCtest,
> Qequal,
> > > > + QCsize, make_natnum (size));
> > >
> > > Likewise here: make_natnum does the equivalent test.
> >
> > It doesn't and can't.
>
> Yes, it does:
>
> INLINE Lisp_Object
> make_natnum (EMACS_INT n)
> {
> eassert (0 <= n && n <= MOST_POSITIVE_FIXNUM); <<<<<<<<<<<<<<<
> EMACS_INT int0 = Lisp_Int0;
>
We're not talking about the same thing. What if make_natnum is called with
a value that doesn't fit in EMACS_INT?
Also an assertion is incorrect here because the overflowing value comes
from user data.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-03 15:52 ` Philipp Stephani
@ 2017-10-03 16:26 ` Eli Zaretskii
2017-10-03 17:10 ` Eli Zaretskii
0 siblings, 1 reply; 81+ messages in thread
From: Eli Zaretskii @ 2017-10-03 16:26 UTC (permalink / raw)
To: Philipp Stephani; +Cc: emacs-devel
> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Tue, 03 Oct 2017 15:52:03 +0000
> Cc: emacs-devel@gnu.org
>
> > > The callbacks (especially insert and before-/after-change-hook) can exit
> > > nonlocally, but these nonlocal exits may not escape the Jansson callback.
> > > Therefore all nonlocal exits must be caught here.
> >
> > Why can't you use record_unwind_protect, as we normally do in these
> > situations?
> >
>
> How would that help? record_unwind_protect can't stop nonlocal exits.
Why do you need to stop them? Maybe that's the part I didn't
understand: I thought you needed that to avoid leaking resources.
This is what record_unwind_protect is for.
> > > Maybe, but it's probably not worth it because I don't think we have many
> > > architectures where PTRDIFF_MAX > SIZE_MAX.
> >
> > Then why do we punish all the platforms with this runtime check?
> >
>
> If you think this cannot happen we can turn it into a runtime or
> compile-time assertion.
That's fine with me. (You meant eassert, right? Because it is not a
compile-time assertion.)
> > > Depends on the range of ptrdiff_t and size_t. IIUC nothing in the C
> > > standard guarantees PTRDIFF_MAX <= SIZE_MAX.
> >
> > I wasn't talking about PTRDIFF_MAX, I was talking about 'size', which
> > is the number of bytes in a Lisp string. Since that Lisp string is a
> > valid Lisp object, how can its size be greater than SIZE_MAX? I don't
> > think there's a way of creating such a Lisp string in Emacs, because
> > functions that allocate memory for strings will prevent that.
>
> Then I think we should at least add an assertion to document this.
OK.
> > Yes, but what problems can cause these return value to be invalid?
> > AFAICT, only out-of-memory conditions, and that can be checked only
> > once, there's no need to check every single allocation, because once
> > an allocation fails, all the rest will too.
>
> But if the first succeeds, the second can still fail, so we do need to
> check all of them.
Only the last of them.
> > > > I don't think we need this test here, as 'insert' already has the
> > > > equivalent test in one of its subroutines.
> > >
> > > It can't, because it takes the byte length as ptrdiff_t. We need to check
> > > before whether the size is actually in the valid range of ptrdiff_t.
> >
> > I'm sorry, but I don't see why we should support such exotic
> > situations only for this one feature. In all other cases we use
> > either ptrdiff_t type or EMACS_INT type, and these issues disappear
> > then. Trying to support the SIZE_MAX > PTRDIFF_MAX situation causes
> > the code to be much more complicated, harder to maintain, and more
> > expensive at run time than it should be.
>
> We can't avoid these checks. The API returns size_t, so we can only assume
> that the numbers are in the valid range of size_t, which is larger than the
> ones for positive ptrdiff_t's. There's no way around this.
We do this everywhere else.
We can discuss whether the low-level subroutines which allocate memory
for these objects should be modified to accept a size_t arguments
instead of ptrdiff_t or EMACS_INT, and then make these tests in those
subroutines. But doing these tests on the level of your code, and
only in those few functions, makes absolutely no sense to me. Either
these problems are important for Emacs's stability, or they aren't.
Considering them important only in a small number of places means we
are confused about our own code.
> > I'm not even sure there are
> > such platforms out there that Emacs supports,
>
> All platforms that I know of have SIZE_MAX > PTRDIFF_MAX.
Ha-ha, very funny. I meant size_t is wider than ptrdiff_t.
> > but if there are, we
> > already have a gazillion problems like that all over our code.
>
> Just because other parts of the codebase are buggy doesn't mean we need to
> introduce more bugs in new code.
If we consider them bugs, we should fix them, instead of adding
wrappers around the most basic functions in our arsenal.
> > I object to having such code just for this reason, sorry.
>
> We can't avoid it.
On the level of this functionality, we certainly can and should.
> > > It can't, because json_int_t can be larger than EMACS_INT.
> >
> > OK, but then I think we should consider returning a float value, or a
> > cons of 2 integers. If these situations are frequent enough, users
> > will thank us, and if they are very infrequent, they will never see
> > such values, and we gain code simplicity and less non-local exits.
>
> Returning a float (using make_natnum_or_float) might work, but in the end
> I've decided against it because it could silently drop precision. I think
> that's worse than signaling an error.
It cannot be worse, because we already do this elsewhere. And this
situation is infrequent enough to be unimportant. Signaling an error,
OTOH, disrupts what could be a valid program, so I cannot agree that
it's better. The corresponding Lisp implementation certainly doesn't
signal an error in this case, right?
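For concreteness, a minimal sketch of that float-fallback alternative, reusing
the names already quoted in this thread (json_integer_value, FIXNUM_OVERFLOW_P,
make_float, make_number); precision can be silently lost for very large values:

    case JSON_INTEGER:
      {
        json_int_t value = json_integer_value (json);
        /* Out-of-range integers degrade to floats instead of signaling;
           precision may be lost beyond the float's mantissa.  */
        return FIXNUM_OVERFLOW_P (value)
               ? make_float (value)
               : make_number (value);
      }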
> > > And once again, we need to check at least whether the size fits into
> > > ptrdiff_t.
> >
> > No, we don't, as we don't in other similar cases.
>
> I don't understand why you think these checks aren't necessary. Converting
> between integral types when the number is out of range for the destination
> type results in an implementation-defined result, i.e. it's unportable.
I'm saying that this code is the wrong place for doing these checks.
We can discuss whether these checks are needed in general, and if we
agree they are, we should change all the related allocation
subroutines to do that there.
> > > > Likewise here: Fmake_vector makes sure the size is not larger than
> > > > allowed.
> > >
> > > Same as above: It can't.
> >
> > It can and it does.
>
> No, it can't. make_natnum takes a ptrdiff_t argument, and when passing a
> value that's out of range for ptrdiff_t, it will receive an incorrect,
> implementation-defined value.
Even if that can happen (and I'm not sure it can), that is already
happening in all the other cases where we call make_natnum and
Fmake_vector.
> > > > Likewise here: make_natnum does the equivalent test.
> > >
> > > It doesn't and can't.
> >
> > Yes, it does:
> >
> > INLINE Lisp_Object
> > make_natnum (EMACS_INT n)
> > {
> > eassert (0 <= n && n <= MOST_POSITIVE_FIXNUM); <<<<<<<<<<<<<<<
> > EMACS_INT int0 = Lisp_Int0;
> >
>
> We're not talking about the same thing. What if make_natnum is called with
> a value that doesn't fit in EMACS_INT?
How can that happen? If that's because you pass it a wrongly typed
value, we should fix that type.
> Also an assertion is incorrect here because the overflowing value comes
> from user data.
Once again, we do that all the time, and this code is not the right
place for making such checks, IMO.
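To spell out the distinction (a sketch only, assuming the usual Emacs
convention that eassert is compiled out unless --enable-checking is used):
an assertion documents an invariant of our own code, while a value coming
from the JSON library needs a check that is always active:

    json_int_t value = json_integer_value (json);  /* user-controlled */

    eassert (!FIXNUM_OVERFLOW_P (value)); /* invariant only: no-op in
                                             non-checking builds */

    if (FIXNUM_OVERFLOW_P (value))        /* explicit check: always enforced */
      xsignal0 (Qoverflow_error);
    return make_number (value);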
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-03 16:26 ` Eli Zaretskii
@ 2017-10-03 17:10 ` Eli Zaretskii
2017-10-03 18:37 ` Philipp Stephani
0 siblings, 1 reply; 81+ messages in thread
From: Eli Zaretskii @ 2017-10-03 17:10 UTC (permalink / raw)
To: p.stephani2, Paul Eggert; +Cc: emacs-devel
> Date: Tue, 03 Oct 2017 19:26:53 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: emacs-devel@gnu.org
>
> > I don't understand why you think these checks aren't necessary. Converting
> > between integral types when the number is out of range for the destination
> > type results in an implementation-defined result, i.e. it's unportable.
>
> I'm saying that this code is the wrong place for doing these checks.
> We can discuss whether these checks are needed in general, and if we
> agree they are, we should change all the related allocation
> subroutines to do that there.
Let me say this another way: Paul Eggert and others have spent the
last several years hardening Emacs primitives for all kinds of
infrequent situations where we could have undefined behavior. We now
have in many places dozens of tests and tricky macros we never had
before with checks and defenses against such calamities. If, after
all that, we still need application-level C code to do its own checks
for such situations, then I don't understand what we were doing all
these years, and why all the safety nets we added are not good enough
for taking care of this code as well.
Paul, can you please comment on this?
Thanks.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-03 17:10 ` Eli Zaretskii
@ 2017-10-03 18:37 ` Philipp Stephani
0 siblings, 0 replies; 81+ messages in thread
From: Philipp Stephani @ 2017-10-03 18:37 UTC (permalink / raw)
To: Eli Zaretskii, Paul Eggert; +Cc: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1466 bytes --]
Eli Zaretskii <eliz@gnu.org> schrieb am Di., 3. Okt. 2017 um 19:10 Uhr:
> > Date: Tue, 03 Oct 2017 19:26:53 +0300
> > From: Eli Zaretskii <eliz@gnu.org>
> > Cc: emacs-devel@gnu.org
> >
> > > I don't understand why you think these checks aren't necessary.
> Converting
> > > between integral types when the number is out of range for the
> destination
> > > type results in an implementation-defined result, i.e. it's unportable.
> >
> > I'm saying that this code is the wrong place for doing these checks.
> > We can discuss whether these checks are needed in general, and if we
> > agree they are, we should change all the related allocation
> > subroutines to do that there.
>
> Let me say this another way: Paul Eggert and others have spent the
> last several years hardening Emacs primitives for all kinds of
> infrequent situations where we could have undefined behavior. We now
> have in many places dozens of tests and tricky macros we never had
> before with checks and defenses against such calamities. If, after
> all that, we still need application-level C code to do its own checks
> for such situations, then I don't understand what we were doing all
> these years, and why all the safety nets we added are not good enough
> for taking care of this code as well.
>
>
I'm certain those changes are useful! But they can't protect from data loss
due to lossy type conversions; that is just a general behavior of C, and
all C code needs to deal with it.
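A small standalone illustration of that kind of lossy conversion, with made-up
variable names and assuming 64-bit size_t and ptrdiff_t:

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    int
    main (void)
    {
      size_t from_library = (size_t) PTRDIFF_MAX + 2;  /* a valid size_t */
      ptrdiff_t narrowed = from_library;  /* implementation-defined result */
      /* Once only NARROWED is passed down, no callee can recover or even
         detect the original value, so the glue code must check first.  */
      printf ("%zu -> %td\n", from_library, narrowed);
      return 0;
    }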
[-- Attachment #2: Type: text/html, Size: 1985 bytes --]
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-03 12:26 ` Philipp Stephani
2017-10-03 15:31 ` Eli Zaretskii
@ 2017-10-03 20:52 ` Paul Eggert
2017-10-04 5:33 ` Eli Zaretskii
2017-10-08 17:58 ` Philipp Stephani
1 sibling, 2 replies; 81+ messages in thread
From: Paul Eggert @ 2017-10-03 20:52 UTC (permalink / raw)
To: Philipp Stephani, Eli Zaretskii; +Cc: emacs-devel
On 10/03/2017 05:26 AM, Philipp Stephani wrote:
> > +#if __has_attribute (warn_unused_result)
> > +# define ATTRIBUTE_WARN_UNUSED_RESULT __attribute__
> ((__warn_unused_result__))
> > +#else
> > +# define ATTRIBUTE_WARN_UNUSED_RESULT
> > +#endif
>
> Hmm... why do we need this attribute? You use it with 2 static
> functions, so this sounds like a left-over from the development stage?
>
> It's not strictly needed (and if you don't like it, I can remove it),
> but it helps catch memory leaks.
I've found __warn_unused_result__ to be more trouble than it's worth in
library functions. Emacs has lib/ignore-value.h in order to work around
__warn_unused_result__ brain damage, for example. For static functions
the problem is less, but still, I mildly favor leaving it out.
>
> And btw, how can size be greater than SIZE_MAX in this case? This is
> a valid Lisp object, isn't it? (There are more such tests in the
> patch, e.g. in lisp_to_json, and I think they, too, are redundant.)
>
>
> Depends on the range of ptrdiff_t and size_t. IIUC nothing in the C
> standard guarantees PTRDIFF_MAX <= SIZE_MAX. If we want to guarantee
> that, we should probably add at least a static assertion.
There should be no need for that. No Lisp object can exceed min
(PTRDIFF_MAX, SIZE_MAX) bytes; alloc.c guarantees this, so that Emacs
should work even on oddball platforms where SIZE_MAX < PTRDIFF_MAX, and
there is no need for a runtime check here.
The main practical problem here, by the way, is objects whose sizes
exceed PTRDIFF_MAX on mainstream 32-bit platforms. This Does Not Work
because you cannot subtract pointers within such objects reliably,
sometimes even when the true difference is representable as ptrdiff_t!
This is the main reason alloc.c prohibits creating such large objects,
and that Emacs should reject any attempt to support such objects (even
non-Lisp objects).
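A sketch of the pointer-subtraction hazard described above (hypothetical
helper; alloc.c's size limit is what keeps Emacs from ever reaching this
case):

    #include <stddef.h>

    /* BASE points to an object of SIZE bytes.  If SIZE exceeds
       PTRDIFF_MAX, the subtraction below is undefined behavior even
       though both pointers are valid, which is why such objects must
       never be created in the first place.  */
    static ptrdiff_t
    object_length (const char *base, size_t size)
    {
      const char *end = base + size;
      return end - base;
    }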
> > + if (buffer_and_size->size > PTRDIFF_MAX)
> > + xsignal1 (Qoverflow_error, build_string ("buffer too large"));
> > + insert (buffer_and_size->buffer, buffer_and_size->size);
>
> I don't think we need this test here, as 'insert' already has the
> equivalent test in one of its subroutines.
>
> It can't, because it takes the byte length as ptrdiff_t. We need to
> check before whether the size is actually in the valid range of ptrdiff_t.
A PTRDIFF_MAX test is needed if the JSON library can return strings
longer than PTRDIFF_MAX. Please just use buffer_overflow () to signal
the error, though.
> > + case JSON_INTEGER:
> > + {
> > + json_int_t value = json_integer_value (json);
> > + if (FIXNUM_OVERFLOW_P (value))
> > + xsignal1 (Qoverflow_error,
> > + build_string ("JSON integer is too large"));
> > + return make_number (value);
>
> This overflow test is also redundant, as make_number already does it.
>
> It can't, because json_int_t can be larger than EMACS_INT. Also,
> make_number doesn't contain any checks.
You're right that a test is needed. However, elsewhere we just use
xsignal0 (Qoverflow_error) for this sort of thing, and I suggest being
consistent and doing that here as well. Similarly for other calls to
xsignal1 with Qoverflow_error.
> > + case JSON_STRING:
> > + {
> > + size_t size = json_string_length (json);
> > + if (FIXNUM_OVERFLOW_P (size))
> > + xsignal1 (Qoverflow_error, build_string ("JSON string is
> too long"));
> > + return json_make_string (json_string_value (json), size);
>
> Once again, the overflow test is redundant, as make_specified_string
> (called by json_make_string) already includes an equivalent test.
>
> And once again, we need to check at least whether the size fits into
> ptrdiff_t.
You're right, a test is needed. However, I suggest using string_overflow
() to signal string overflows.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-03 20:52 ` Paul Eggert
@ 2017-10-04 5:33 ` Eli Zaretskii
2017-10-04 6:41 ` Paul Eggert
2017-10-08 17:58 ` Philipp Stephani
1 sibling, 1 reply; 81+ messages in thread
From: Eli Zaretskii @ 2017-10-04 5:33 UTC (permalink / raw)
To: Paul Eggert; +Cc: p.stephani2, emacs-devel
> Cc: emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Tue, 3 Oct 2017 13:52:54 -0700
>
> > > + if (buffer_and_size->size > PTRDIFF_MAX)
> > > + xsignal1 (Qoverflow_error, build_string ("buffer too large"));
> > > + insert (buffer_and_size->buffer, buffer_and_size->size);
> >
> > I don't think we need this test here, as 'insert' already has the
> > equivalent test in one of its subroutines.
> >
> > It can't, because it takes the byte length as ptrdiff_t. We need to
> > check before whether the size is actually in the valid range of ptrdiff_t.
>
> A PTRDIFF_MAX test is needed if the JSON library can return strings
> longer than PTRDIFF_MAX. Please just use buffer_overflow () to signal
> the error, though.
I'd prefer to have such tests inside 'insert' and its subroutines, or
to tweak the argument passed to 'insert' to have the existing checks
catch the problem. Can we do that, please? It doesn't feel right to
me to have this kind of checks in application C code.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-04 5:33 ` Eli Zaretskii
@ 2017-10-04 6:41 ` Paul Eggert
2017-10-04 8:03 ` Eli Zaretskii
0 siblings, 1 reply; 81+ messages in thread
From: Paul Eggert @ 2017-10-04 6:41 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: p.stephani2, emacs-devel
Eli Zaretskii wrote:
> I'd prefer to have such tests inside 'insert' and its subroutines, or
> to tweak the argument passed to 'insert' to have the existing checks
> catch the problem. Can we do that, please? It doesn't feel right to
> me to have this kind of checks in application C code.
There is no universal C integer type, so this doesn't sound doable in general.
Even intmax_t is not wide enough, as it doesn't subsume uintmax_t. And I
wouldn't want APIs to switch to intmax_t or uintmax_t merely because some
library's types don't match ours; the responsibility for passing arguments
correctly and for checking for any out-of-range values belongs to the glue code,
not to Emacs internals.
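A standalone illustration of that point, assuming the usual 64-bit intmax_t
and uintmax_t: the largest uintmax_t value has no intmax_t representation, so
even the widest signed type cannot serve as a universal parameter type for a
foreign library's sizes:

    #include <stdint.h>
    #include <stdio.h>

    int
    main (void)
    {
      uintmax_t u = UINTMAX_MAX;
      intmax_t i = (intmax_t) u;  /* out of range: implementation-defined */
      printf ("%ju becomes %jd\n", u, i);
      return 0;
    }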
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-04 6:41 ` Paul Eggert
@ 2017-10-04 8:03 ` Eli Zaretskii
2017-10-04 17:51 ` Paul Eggert
0 siblings, 1 reply; 81+ messages in thread
From: Eli Zaretskii @ 2017-10-04 8:03 UTC (permalink / raw)
To: Paul Eggert; +Cc: p.stephani2, emacs-devel
> Cc: p.stephani2@gmail.com, emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Tue, 3 Oct 2017 23:41:41 -0700
>
> Eli Zaretskii wrote:
> > I'd prefer to have such tests inside 'insert' and its subroutines, or
> > to tweak the argument passed to 'insert' to have the existing checks
> > catch the problem. Can we do that, please? It doesn't feel right to
> > me to have this kind of checks in application C code.
>
> There is no universal C integer type, so this doesn't sound doable in general.
> Even intmax_t is not wide enough, as it doesn't subsume uintmax_t. And I
> wouldn't want APIs to switch to intmax_t or uintmax_t merely because some
> library's types don't match ours; the responsibility for passing arguments
> correctly and for checking for any out-of-range values belongs to the glue code,
> not to Emacs internals.
That sounds very uncool, since many libraries use size_t, which means
we will have to spread these checks for extremely rare cases all over
our code, thus making it harder to read and maintain, and making it
slower for no good reason.
E.g., I see several uses of size_t in gnutls.c and in image.c, and at
least some of them seem to be of the same nature as what was discussed
here. Why didn't we add similar checks there?
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-04 8:03 ` Eli Zaretskii
@ 2017-10-04 17:51 ` Paul Eggert
2017-10-04 19:38 ` Eli Zaretskii
0 siblings, 1 reply; 81+ messages in thread
From: Paul Eggert @ 2017-10-04 17:51 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: p.stephani2, emacs-devel
On 10/04/2017 01:03 AM, Eli Zaretskii wrote:
> Why didn't we add similar checks there?
If Emacs allocates the objects there is no problem, since it never
allocates objects larger than PTRDIFF_MAX. Generally this is what
gnutls.c and image.c do. There are places where they do need overflow
checks (notably, image size calculation); in some cases the checks are
there, in some they're old code that nobody has ever cleaned up and
which undoubtedly has problems, and in either case the PTRDIFF_MAX
versus SIZE_MAX issue is a bit of a sideshow as each image library has
its own idea about what integer type to use.
We should not encourage more use of size_t within Emacs, as it's an
unsigned type and unsigned types have the problems I mentioned.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-04 17:51 ` Paul Eggert
@ 2017-10-04 19:38 ` Eli Zaretskii
2017-10-04 21:24 ` Paul Eggert
0 siblings, 1 reply; 81+ messages in thread
From: Eli Zaretskii @ 2017-10-04 19:38 UTC (permalink / raw)
To: Paul Eggert; +Cc: p.stephani2, emacs-devel
> Cc: p.stephani2@gmail.com, emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Wed, 4 Oct 2017 10:51:33 -0700
>
> We should not encourage more use of size_t within Emacs, as it's an
> unsigned type and unsigned types have the problems I mentioned.
But if we did use size_t for the arguments which can clearly only be
non-negative, the problems which we are discussing would not have
happened, and we could simply rely on our primitives to do those
checks. So it sounds like by avoiding use of size_t in those cases,
we are shooting ourselves in the foot, by making our code more
complex, less readable, and less efficient. IOW, a net loss. Look
how much more complex the proposed code in json.c is than it could
have been: we could remove all the redundant checks and quite some
code that supports them.
So I'm questioning the total avoidance of size_t in our low-level
code. Why not allow it in a few places where negative values are
clearly mistakes/bugs, so that if they appear as large positive values
and are rejected as such, that's OK?
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-04 19:38 ` Eli Zaretskii
@ 2017-10-04 21:24 ` Paul Eggert
2017-10-05 1:48 ` Paul Eggert
2017-10-05 7:12 ` Eli Zaretskii
0 siblings, 2 replies; 81+ messages in thread
From: Paul Eggert @ 2017-10-04 21:24 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: p.stephani2, emacs-devel
On 10/04/2017 12:38 PM, Eli Zaretskii wrote:
> if we did use size_t for the arguments which can clearly only be
> non-negative, the problems which we are discussing would not have
> happened
Sure, but we would also have worse problems, as size_t is inherently
more error-prone. ptrdiff_t overflows are reliably diagnosed when Emacs
is compiled with suitable GCC compiler options. size_t overflows cannot
be diagnosed, are all too common, and can cause serious trouble.
The Emacs internals occasionally use size_t because underlying
primitives like 'malloc' do, so we do make some exceptions. Perhaps
there should be an exception here, for convenience with the JSON
library. The code snippets I've seen so far in this thread are not
enough context to judge whether an exception would be helpful in this
case. Generally speaking, though, unsigned types should be avoided
because they are more error-prone. This has long been the style in Emacs
internals, and it's served us well.
(Ironically, just last week I was telling beginning students to beware
unsigned types, with (0u < -1) as an example....)
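For reference, that classroom example spelled out as a standalone program:

    #include <stdio.h>

    int
    main (void)
    {
      /* The usual arithmetic conversions turn -1 into UINT_MAX, so the
         comparison is true even though 0 < -1 is mathematically false.  */
      printf ("%d\n", 0u < -1);   /* prints 1 */
      return 0;
    }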
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-04 21:24 ` Paul Eggert
@ 2017-10-05 1:48 ` Paul Eggert
2017-10-05 7:14 ` Eli Zaretskii
2017-10-08 22:52 ` Philipp Stephani
2017-10-05 7:12 ` Eli Zaretskii
1 sibling, 2 replies; 81+ messages in thread
From: Paul Eggert @ 2017-10-05 1:48 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: p.stephani2, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 765 bytes --]
On 10/04/2017 02:24 PM, Paul Eggert wrote:
> The code snippets I've seen so far in this thread are not enough
> context to judge whether an exception would be helpful in this case.
Sorry, I looked only at this month's part of the thread. When I went back
to last month's I found this:
http://lists.gnu.org/archive/html/emacs-devel/2017-09/msg01080.html
for which I suggest the attached as a simplification. One idea here is
that there is no need for eassert (E) unless there's a genuine doubt
that E will be true (in some cases the removed eassert (E) calls were
ineffective anyway, due to preceding eassume (E) calls). The patch cuts
down the number of integer overflow checks to six in json.c, if I'm
counting correctly, and that should be good enough.
[-- Attachment #2: 0001-Minor-JSON-cleanups-mostly-for-overflow.patch --]
[-- Type: text/x-patch, Size: 8932 bytes --]
From 9a5c6a9eab6cb5ddbbb1f1c0940a561f6895b478 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Wed, 4 Oct 2017 18:38:07 -0700
Subject: [PATCH] Minor JSON cleanups, mostly for overflow
* src/json.c (json_has_prefix): Simplify via strncmp.
(json_has_suffix): Indent a la GNU.
(json_make_string): Take size_t, not ptrdiff_t, and check its
range. Simplify callers accordingly.
(json_out_of_memory, json_to_lisp): Just call memory_full.
(check_string_without_embedded_nulls): Use strlen, not memchr;
it is typically faster.
(lisp_to_json_toplevel_1, json_to_lisp): Do not bother with
_GL_ARG_NONNULL on static functions; it is not worth the trouble.
(lisp_to_json_toplevel_1, lisp_to_json, json_read_buffer_callback)
(define_error):
Remove useless checks.
(json_insert): Just call buffer_overflow.
(json_to_lisp): Just signal overflow error, to be consistent with
other signalers. Use allocate_vector instead of Fmake_vector,
to avoid need for initializing vector twice. Use make_hash_table
instead of Fmake_hash_table, as it is a bit simpler.
---
src/json.c | 93 +++++++++++++++++++++-----------------------------------------
1 file changed, 32 insertions(+), 61 deletions(-)
diff --git a/src/json.c b/src/json.c
index 79be55bd54..5138315da1 100644
--- a/src/json.c
+++ b/src/json.c
@@ -31,9 +31,7 @@ along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. */
static bool
json_has_prefix (const char *string, const char *prefix)
{
- size_t string_len = strlen (string);
- size_t prefix_len = strlen (prefix);
- return string_len >= prefix_len && memcmp (string, prefix, prefix_len) == 0;
+ return strncmp (string, prefix, strlen (prefix)) == 0;
}
static bool
@@ -41,22 +39,23 @@ json_has_suffix (const char *string, const char *suffix)
{
size_t string_len = strlen (string);
size_t suffix_len = strlen (suffix);
- return string_len >= suffix_len
- && memcmp (string + string_len - suffix_len, suffix, suffix_len) == 0;
+ return (string_len >= suffix_len
+ && (memcmp (string + string_len - suffix_len, suffix, suffix_len)
+ == 0));
}
static Lisp_Object
-json_make_string (const char *data, ptrdiff_t size)
+json_make_string (const char *data, size_t size)
{
+ if (PTRDIFF_MAX < size)
+ string_overflow ();
return make_specified_string (data, -1, size, true);
}
static Lisp_Object
json_build_string (const char *data)
{
- size_t size = strlen (data);
- eassert (size < PTRDIFF_MAX);
- return json_make_string (data, size);
+ return json_make_string (data, strlen (data));
}
static Lisp_Object
@@ -68,7 +67,7 @@ json_encode (Lisp_Object string)
static _Noreturn void
json_out_of_memory (void)
{
- xsignal0 (Qjson_out_of_memory);
+ memory_full (SIZE_MAX);
}
static _Noreturn void
@@ -97,7 +96,7 @@ static void
check_string_without_embedded_nulls (Lisp_Object object)
{
CHECK_STRING (object);
- CHECK_TYPE (memchr (SDATA (object), '\0', SBYTES (object)) == NULL,
+ CHECK_TYPE (strlen (SSDATA (object)) == SBYTES (object),
Qstring_without_embedded_nulls_p, object);
}
@@ -114,15 +113,12 @@ static json_t *lisp_to_json (Lisp_Object) ATTRIBUTE_WARN_UNUSED_RESULT;
/* This returns Lisp_Object so we can use unbind_to. The return value
is always nil. */
-static _GL_ARG_NONNULL ((2)) Lisp_Object
+static Lisp_Object
lisp_to_json_toplevel_1 (Lisp_Object lisp, json_t **json)
{
if (VECTORP (lisp))
{
ptrdiff_t size = ASIZE (lisp);
- eassert (size >= 0);
- if (size > SIZE_MAX)
- xsignal1 (Qoverflow_error, build_string ("vector is too long"));
*json = json_check (json_array ());
ptrdiff_t count = SPECPDL_INDEX ();
record_unwind_protect_ptr (json_release_object, json);
@@ -194,9 +190,6 @@ lisp_to_json (Lisp_Object lisp)
{
Lisp_Object encoded = json_encode (lisp);
ptrdiff_t size = SBYTES (encoded);
- eassert (size >= 0);
- if (size > SIZE_MAX)
- xsignal1 (Qoverflow_error, build_string ("string is too long"));
return json_check (json_stringn (SSDATA (encoded), size));
}
@@ -239,7 +232,7 @@ json_insert (void *data)
{
const struct json_buffer_and_size *buffer_and_size = data;
if (buffer_and_size->size > PTRDIFF_MAX)
- xsignal1 (Qoverflow_error, build_string ("buffer too large"));
+ buffer_overflow ();
insert (buffer_and_size->buffer, buffer_and_size->size);
return Qnil;
}
@@ -289,7 +282,7 @@ OBJECT. */)
return unbind_to (count, Qnil);
}
-static _GL_ARG_NONNULL ((1)) Lisp_Object
+static Lisp_Object
json_to_lisp (json_t *json)
{
switch (json_typeof (json))
@@ -304,32 +297,26 @@ json_to_lisp (json_t *json)
{
json_int_t value = json_integer_value (json);
if (FIXNUM_OVERFLOW_P (value))
- xsignal1 (Qoverflow_error,
- build_string ("JSON integer is too large"));
+ xsignal0 (Qoverflow_error);
return make_number (value);
}
case JSON_REAL:
return make_float (json_real_value (json));
case JSON_STRING:
- {
- size_t size = json_string_length (json);
- if (FIXNUM_OVERFLOW_P (size))
- xsignal1 (Qoverflow_error, build_string ("JSON string is too long"));
- return json_make_string (json_string_value (json), size);
- }
+ return json_make_string (json_string_value (json),
+ json_string_length (json));
case JSON_ARRAY:
{
if (++lisp_eval_depth > max_lisp_eval_depth)
xsignal0 (Qjson_object_too_deep);
size_t size = json_array_size (json);
if (FIXNUM_OVERFLOW_P (size))
- xsignal1 (Qoverflow_error, build_string ("JSON array is too long"));
- Lisp_Object result = Fmake_vector (make_natnum (size), Qunbound);
+ memory_full (size);
+ struct Lisp_Vector *v = allocate_vector (size);
for (ptrdiff_t i = 0; i < size; ++i)
- ASET (result, i,
- json_to_lisp (json_array_get (json, i)));
+ v->contents[i] = json_to_lisp (json_array_get (json, i));
--lisp_eval_depth;
- return result;
+ return make_lisp_ptr (v, Lisp_Vectorlike);
}
case JSON_OBJECT:
{
@@ -337,10 +324,10 @@ json_to_lisp (json_t *json)
xsignal0 (Qjson_object_too_deep);
size_t size = json_object_size (json);
if (FIXNUM_OVERFLOW_P (size))
- xsignal1 (Qoverflow_error,
- build_string ("JSON object has too many elements"));
- Lisp_Object result = CALLN (Fmake_hash_table, QCtest, Qequal,
- QCsize, make_natnum (size));
+ memory_full (size);
+ Lisp_Object result
+ = make_hash_table (hashtest_equal, size, DEFAULT_REHASH_SIZE,
+ DEFAULT_REHASH_THRESHOLD, Qnil, false);
struct Lisp_Hash_Table *h = XHASH_TABLE (result);
const char *key_str;
json_t *value;
@@ -399,23 +386,12 @@ json_read_buffer_callback (void *buffer, size_t buflen, void *data)
/* First, parse from point to the gap or the end of the accessible
portion, whatever is closer. */
ptrdiff_t point = d->point;
- ptrdiff_t end;
- {
- bool overflow = INT_ADD_WRAPV (BUFFER_CEILING_OF (point), 1, &end);
- eassert (!overflow);
- }
- size_t count;
- {
- bool overflow = INT_SUBTRACT_WRAPV (end, point, &count);
- eassert (!overflow);
- }
+ ptrdiff_t end = BUFFER_CEILING_OF (point) + 1;
+ ptrdiff_t count = end - point;
if (buflen < count)
count = buflen;
memcpy (buffer, BYTE_POS_ADDR (point), count);
- {
- bool overflow = INT_ADD_WRAPV (d->point, count, &d->point);
- eassert (!overflow);
- }
+ d->point += count;
return count;
}
@@ -444,14 +420,11 @@ not moved. */)
/* Convert and then move point only if everything succeeded. */
Lisp_Object lisp = json_to_lisp (object);
- {
- /* Adjust point by how much we just read. Do this here because
- tokener->char_offset becomes incorrect below. */
- bool overflow = INT_ADD_WRAPV (point, error.position, &point);
- eassert (!overflow);
- eassert (point <= ZV_BYTE);
- SET_PT_BOTH (BYTE_TO_CHAR (point), point);
- }
+ /* Adjust point by how much we just read. Do this here because
+ tokener->char_offset becomes incorrect below. */
+ eassert (0 <= error.position && error.position <= ZV_BYTE - point);
+ point += error.position;
+ SET_PT_BOTH (BYTE_TO_CHAR (point), point);
return unbind_to (count, lisp);
}
@@ -462,8 +435,6 @@ not moved. */)
static void
define_error (Lisp_Object name, const char *message, Lisp_Object parent)
{
- eassert (SYMBOLP (name));
- eassert (SYMBOLP (parent));
Lisp_Object parent_conditions = Fget (parent, Qerror_conditions);
eassert (CONSP (parent_conditions));
eassert (!NILP (Fmemq (parent, parent_conditions)));
--
2.13.6
^ permalink raw reply related [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-04 21:24 ` Paul Eggert
2017-10-05 1:48 ` Paul Eggert
@ 2017-10-05 7:12 ` Eli Zaretskii
2017-10-06 1:58 ` Paul Eggert
2017-10-08 23:04 ` Philipp Stephani
1 sibling, 2 replies; 81+ messages in thread
From: Eli Zaretskii @ 2017-10-05 7:12 UTC (permalink / raw)
To: Paul Eggert; +Cc: p.stephani2, emacs-devel
> Cc: p.stephani2@gmail.com, emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Wed, 4 Oct 2017 14:24:59 -0700
>
> On 10/04/2017 12:38 PM, Eli Zaretskii wrote:
> > if we did use size_t for the arguments which can clearly only be
> > non-negative, the problems which we are discussing would not have
> > happened
> Sure, but we would also have worse problems, as size_t is inherently
> more error-prone. ptrdiff_t overflows are reliably diagnosed when Emacs
> is compiled with suitable GCC compiler options. size_t overflows cannot
> be diagnosed, are all too common, and can cause serious trouble.
If ptrdiff_t overflows are reliably diagnosed, then why do we have to
test for them explicitly in our code, as in the proposed json.c?
AFAIU, ptrdiff_t overflows are the _only_ reason for json.c checks
whether a size_t value is too large, because similar checks for
ptrdiff_t values are already in the low-level subroutines involved in
creating Lisp objects. So why couldn't those checks be avoided by
simply assigning to a ptrdiff_t variable?
> The Emacs internals occasionally use size_t because underlying
> primitives like 'malloc' do, so we do make some exceptions. Perhaps
> there should be an exception here, for convenience with the JSON
> library. The code snippets I've seen so far in this thread are not
> enough context to judge whether an exception would be helpful in this
> case. Generally speaking, though, unsigned types should be avoided
> because they are more error-prone. This has long been the style in Emacs
> internals, and it's served us well.
I'm not arguing for general replacement of ptrdiff_t with size_t, only
for doing that in those primitives where negative values are a clear
mistake/bug.
For example, let's take this case from your proposed changes:
static Lisp_Object
-json_make_string (const char *data, ptrdiff_t size)
+json_make_string (const char *data, size_t size)
{
+ if (PTRDIFF_MAX < size)
+ string_overflow ();
return make_specified_string (data, -1, size, true);
}
If we were to change make_specified_string (and its subroutines, like
make_uninit_multibyte_string etc.) to accept a size_t value in its 3rd
argument, the need for the above check against PTRDIFF_MAX would
disappear.
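Under that hypothetical change, the json.c helper would collapse back to a
one-liner (a sketch of the proposal, not what either patch in this thread
actually does):

    static Lisp_Object
    json_make_string (const char *data, size_t size)
    {
      /* make_specified_string, now taking size_t, would perform the
         PTRDIFF_MAX check itself.  */
      return make_specified_string (data, -1, size, true);
    }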
Another such case is 'insert', which is also used in json.c, and
requires a similar check:
void
insert (const char *string, ptrdiff_t nbytes)
{
if (nbytes > 0)
{
ptrdiff_t len = chars_in_text ((unsigned char *) string, nbytes), opoint;
insert_1_both (string, len, nbytes, 0, 1, 0);
opoint = PT - len;
signal_after_change (opoint, 0, len);
update_compositions (opoint, PT, CHECK_BORDER);
}
}
It clearly ignores negative values of nbytes, as expected. So why not
make nbytes a size_t argument? (We will probably need some low-level
changes inside the subroutines of insert_1_both, like move_gap, to
reject too large size_t values before we convert them to signed
values, but that's hardly rocket science.)
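A sketch of what that signature change might look like (hypothetical; as
noted, the subroutines below insert_1_both would still need adjusting):

    void
    insert (const char *string, size_t nbytes)
    {
      if (nbytes > PTRDIFF_MAX)
        buffer_overflow ();
      ptrdiff_t n = nbytes;   /* now known to be in range */
      if (n > 0)
        {
          ptrdiff_t len = chars_in_text ((unsigned char *) string, n), opoint;
          insert_1_both (string, len, n, 0, 1, 0);
          opoint = PT - len;
          signal_after_change (opoint, 0, len);
          update_compositions (opoint, PT, CHECK_BORDER);
        }
    }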
I envision that all the Fmake_SOMETHING primitives could use similar
changes to have the size specified as size_t, because it can never be
negative. E.g., Fmake_vector is used by json.c and currently requires
a similar check because its size argument is a signed type.
IOW, I'm saying that using size_t judiciously, in a small number of
places, would make a lot of sense and allow us to simplify
higher-level code, and make it faster by avoiding duplicate checks of
the same values. It would also make the higher-level code more
reliable, because application-level programmers will not need to
understand all the non-trivial intricacies of this stuff. As Emacs
starts using more and more external libraries, whether built-in or via
modules, the issue of size_t vs ptrdiff_t will become more and more
important, and a source for more and more error-prone code. Why not
fix that in advance in our primitives?
> (Ironically, just last week I was telling beginning students to beware
> unsigned types, with (0u < -1) as an example....)
Well, "kids, don't do that at home -- we are trained professionals"
seems to apply here ;-)
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-05 1:48 ` Paul Eggert
@ 2017-10-05 7:14 ` Eli Zaretskii
2017-10-08 22:52 ` Philipp Stephani
1 sibling, 0 replies; 81+ messages in thread
From: Eli Zaretskii @ 2017-10-05 7:14 UTC (permalink / raw)
To: Paul Eggert; +Cc: p.stephani2, emacs-devel
> From: Paul Eggert <eggert@cs.ucla.edu>
> Cc: p.stephani2@gmail.com, emacs-devel@gnu.org
> Date: Wed, 4 Oct 2017 18:48:04 -0700
>
> Sorry, I looked only at this month's part of the thread. When I went back
> to last month's I found this:
>
> http://lists.gnu.org/archive/html/emacs-devel/2017-09/msg01080.html
>
> for which I suggest the attached as a simplification. One idea here is
> that there is no need for eassert (E) unless there's a genuine doubt
> that E will be true (in some cases the removed eassert (E) calls were
> ineffective anyway, due to preceding eassume (E) calls). The patch cuts
> down the number of integer overflow checks to six in json.c, if I'm
> counting correctly, and that should be good enough.
Thanks. This is indeed good progress. However, I think we could make
even more progress by changing some of our primitives to accept size_t
arguments. See my other message for the details.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-05 7:12 ` Eli Zaretskii
@ 2017-10-06 1:58 ` Paul Eggert
2017-10-06 7:40 ` Eli Zaretskii
2017-10-08 23:09 ` Philipp Stephani
2017-10-08 23:04 ` Philipp Stephani
1 sibling, 2 replies; 81+ messages in thread
From: Paul Eggert @ 2017-10-06 1:58 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: p.stephani2, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 2798 bytes --]
On 10/05/2017 12:12 AM, Eli Zaretskii wrote:
> If ptrdiff_t overflows are reliably diagnosed, then why do we have to
> test for them explicitly in our code, as in the proposed json.c?
They're diagnosed only if one compiles with debugging flags like
-fsanitize=undefined. And even then the checks are "reliable" only in
some sense: some overflows at the source level are not caught at the
machine level even if the code is executed, because the overflows are
optimized away. So testing a program with -fsanitize=undefined does
not guarantee that the same test will avoid ptrdiff_t overflow on some
other platform.
> AFAIU, ptrdiff_t overflows are the _only_ reason for json.c checks
> whether a size_t value is too large
In the most recent patch I proposed, there were only two such checks;
there were two other checks for too-large size_t that were fixnum
checks, not ptrdiff_t checks.
However, I can see that you really don't like those checks. So I tweaked
the proposed patch to remove them all from json.c. Please see the
attached 3 patches (the first is just Philipp's patch rebased to
master). The basic idea here is that json.c should be using xmalloc for
allocation anyway, for reasons other than size overflow checking. And
once it uses the Emacs malloc we can make sure that it never allocates
objects that are too large for ptrdiff_t.
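One possible shape of that idea, sketched with Jansson's json_set_alloc_funcs
hook; the wrapper names here are made up, and a production version would
presumably integrate with the Emacs allocator as described:

    #include <stdlib.h>
    #include <stdint.h>
    #include <jansson.h>

    /* Refuse any allocation Lisp could never represent; returning NULL
       lets Jansson report the failure through its normal error paths
       instead of unwinding through library frames.  */
    static void *
    json_malloc_for_emacs (size_t size)
    {
      if (size > PTRDIFF_MAX)
        return NULL;
      return malloc (size);
    }

    static void
    json_free_for_emacs (void *ptr)
    {
      free (ptr);
    }

    void
    json_init_alloc (void)
    {
      json_set_alloc_funcs (json_malloc_for_emacs, json_free_for_emacs);
    }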
> I'm not arguing for general replacement of ptrdiff_t with size_t, only
> for doing that in those primitives where negative values are a clear
> mistake/bug.
This is exactly where we should be cautious about using unsigned types.
The longstanding Emacs style is to prefer signed integers, to avoid the
common typos that occur with unsigned. If we start changing the style,
these primitives (or people debugging these primitives) often won't be
able to distinguish buggy from valid-but-enormous cases. And when we
compile such primitives (or their callers) with -fsanitize=undefined, we
won't be able to catch integer-overflow bugs automatically at runtime,
since unsigned integer arithmetic silently wraps around even when
-fsanitize=undefined is used.
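A standalone illustration of that diagnosability difference, assuming GCC or
Clang with -fsanitize=undefined:

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    int
    main (void)
    {
      size_t u = SIZE_MAX;
      ptrdiff_t s = PTRDIFF_MAX;
      printf ("%zu\n", u + 1);  /* wraps to 0 by definition; not an error,
                                   so -fsanitize=undefined stays silent */
      printf ("%td\n", s + 1);  /* signed overflow; UBSan reports it at
                                   run time */
      return 0;
    }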
> "kids, don't do that at home -- we are trained professionals"
I help maintain several GNU programs that use unsigned types for sizes,
and find that style to be trickier than the style Emacs uses, with
respect to integer-overflow bugs. I've been gradually changing some of
the non-Emacs GNU code to use signed types, and the results have been
encouraging: the code is more readable and more obviously correct. I
would rather not go back to the unsigned style: although you're right
that it can be done, it is too error-prone and the errors can lead to
serious bugs. Even for trained professionals, this particular set of
acrobatics is best done with a net.
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Implement-native-JSON-support-using-Jansson.patch --]
[-- Type: text/x-patch; name="0001-Implement-native-JSON-support-using-Jansson.patch", Size: 32098 bytes --]
From 8a1689de7c9391a84f26cba97ce2b43d2eec893b Mon Sep 17 00:00:00 2001
From: Philipp Stephani <phst@google.com>
Date: Mon, 18 Sep 2017 10:51:39 +0200
Subject: [PATCH 1/3] Implement native JSON support using Jansson
* configure.ac: New option --with-json.
* src/json.c (Fjson_serialize, Fjson_insert, Fjson_parse_string)
(Fjson_parse_buffer): New defuns.
(json_has_prefix, json_has_suffix, json_make_string)
(json_build_string, json_encode, json_out_of_memory, json_parse_error)
(json_release_object, check_string_without_embedded_nulls, json_check)
(lisp_to_json, lisp_to_json_toplevel, lisp_to_json_toplevel_1)
(json_insert, json_insert_callback, json_to_lisp)
(json_read_buffer_callback, Fjson_parse_buffer, define_error): New
helper functions.
(syms_of_json): New file.
* src/lisp.h: Declaration for syms_of_json.
* src/conf_post.h (ATTRIBUTE_WARN_UNUSED_RESULT): New attribute macro.
* src/emacs.c (main): Enable JSON functions.
* src/eval.c (internal_catch_all, internal_catch_all_1): New helper
functions to catch all signals.
(syms_of_eval): Add uninterned symbol to signify out of memory.
* src/Makefile.in (JSON_LIBS, JSON_CFLAGS, JSON_OBJ, EMACS_CFLAGS)
(base_obj, LIBES): Compile json.c if --with-json is enabled.
* test/src/json-tests.el (json-serialize/roundtrip)
(json-serialize/object, json-parse-string/object)
(json-parse-string/string, json-serialize/string)
(json-parse-string/incomplete, json-parse-string/trailing)
(json-parse-buffer/incomplete, json-parse-buffer/trailing): New unit
tests.
---
configure.ac | 20 +-
etc/NEWS | 7 +
src/Makefile.in | 11 +-
src/conf_post.h | 6 +
src/emacs.c | 4 +
src/eval.c | 54 ++++++
src/json.c | 517 +++++++++++++++++++++++++++++++++++++++++++++++++
src/lisp.h | 6 +
test/src/json-tests.el | 97 ++++++++++
9 files changed, 718 insertions(+), 4 deletions(-)
create mode 100644 src/json.c
create mode 100644 test/src/json-tests.el
diff --git a/configure.ac b/configure.ac
index eba95e2fb8..a0e25cf631 100644
--- a/configure.ac
+++ b/configure.ac
@@ -348,6 +348,7 @@ AC_DEFUN
OPTION_DEFAULT_OFF([cairo],[compile with Cairo drawing (experimental)])
OPTION_DEFAULT_ON([xml2],[don't compile with XML parsing support])
OPTION_DEFAULT_ON([imagemagick],[don't compile with ImageMagick image support])
+OPTION_DEFAULT_ON([json], [don't compile with native JSON support])
OPTION_DEFAULT_ON([xft],[don't use XFT for anti aliased fonts])
OPTION_DEFAULT_ON([libotf],[don't use libotf for OpenType font support])
@@ -2856,6 +2857,22 @@ AC_DEFUN
AC_SUBST(LIBSYSTEMD_LIBS)
AC_SUBST(LIBSYSTEMD_CFLAGS)
+HAVE_JSON=no
+JSON_OBJ=
+
+if test "${with_json}" = yes; then
+ EMACS_CHECK_MODULES([JSON], [jansson >= 2.5],
+ [HAVE_JSON=yes], [HAVE_JSON=no])
+ if test "${HAVE_JSON}" = yes; then
+ AC_DEFINE(HAVE_JSON, 1, [Define if using Jansson.])
+ JSON_OBJ=json.o
+ fi
+fi
+
+AC_SUBST(JSON_LIBS)
+AC_SUBST(JSON_CFLAGS)
+AC_SUBST(JSON_OBJ)
+
NOTIFY_OBJ=
NOTIFY_SUMMARY=no
@@ -5368,7 +5385,7 @@ AC_DEFUN
for opt in XAW3D XPM JPEG TIFF GIF PNG RSVG CAIRO IMAGEMAGICK SOUND GPM DBUS \
GCONF GSETTINGS NOTIFY ACL LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT \
LIBOTF XFT ZLIB TOOLKIT_SCROLL_BARS X_TOOLKIT OLDXMENU X11 NS MODULES \
- XWIDGETS LIBSYSTEMD CANNOT_DUMP LCMS2; do
+ XWIDGETS LIBSYSTEMD JSON CANNOT_DUMP LCMS2; do
case $opt in
CANNOT_DUMP) eval val=\${$opt} ;;
@@ -5418,6 +5435,7 @@ AC_DEFUN
Does Emacs use -lotf? ${HAVE_LIBOTF}
Does Emacs use -lxft? ${HAVE_XFT}
Does Emacs use -lsystemd? ${HAVE_LIBSYSTEMD}
+ Does Emacs use -ljansson? ${HAVE_JSON}
Does Emacs directly use zlib? ${HAVE_ZLIB}
Does Emacs have dynamic modules support? ${HAVE_MODULES}
Does Emacs use toolkit scroll bars? ${USE_TOOLKIT_SCROLL_BARS}
diff --git a/etc/NEWS b/etc/NEWS
index 15661808c7..81586ceb97 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -25,6 +25,13 @@ When you add a new item, use the appropriate mark if you are sure it applies,
\f
* Installation Changes in Emacs 27.1
+** The new configure option '--with-json' adds support for JSON using
+the Jansson library. It is on by default; use 'configure
+--with-json=no' to build without Jansson support. The new JSON
+functions 'json-serialize', 'json-insert', 'json-parse-string', and
+'json-parse-buffer' are typically much faster than their Lisp
+counterparts from json.el.
+
\f
* Startup Changes in Emacs 27.1
diff --git a/src/Makefile.in b/src/Makefile.in
index 9a8c9c85f0..b395627893 100644
--- a/src/Makefile.in
+++ b/src/Makefile.in
@@ -312,6 +312,10 @@ LIBGNUTLS_CFLAGS =
LIBSYSTEMD_LIBS = @LIBSYSTEMD_LIBS@
LIBSYSTEMD_CFLAGS = @LIBSYSTEMD_CFLAGS@
+JSON_LIBS = @JSON_LIBS@
+JSON_CFLAGS = @JSON_CFLAGS@
+JSON_OBJ = @JSON_OBJ@
+
INTERVALS_H = dispextern.h intervals.h composite.h
GETLOADAVG_LIBS = @GETLOADAVG_LIBS@
@@ -363,7 +367,7 @@ EMACS_CFLAGS=
$(WEBKIT_CFLAGS) \
$(SETTINGS_CFLAGS) $(FREETYPE_CFLAGS) $(FONTCONFIG_CFLAGS) \
$(LIBOTF_CFLAGS) $(M17N_FLT_CFLAGS) $(DEPFLAGS) \
- $(LIBSYSTEMD_CFLAGS) \
+ $(LIBSYSTEMD_CFLAGS) $(JSON_CFLAGS) \
$(LIBGNUTLS_CFLAGS) $(NOTIFY_CFLAGS) $(CAIRO_CFLAGS) \
$(WERROR_CFLAGS)
ALL_CFLAGS = $(EMACS_CFLAGS) $(WARN_CFLAGS) $(CFLAGS)
@@ -397,7 +401,7 @@ base_obj =
thread.o systhread.o \
$(if $(HYBRID_MALLOC),sheap.o) \
$(MSDOS_OBJ) $(MSDOS_X_OBJ) $(NS_OBJ) $(CYGWIN_OBJ) $(FONT_OBJ) \
- $(W32_OBJ) $(WINDOW_SYSTEM_OBJ) $(XGSELOBJ)
+ $(W32_OBJ) $(WINDOW_SYSTEM_OBJ) $(XGSELOBJ) $(JSON_OBJ)
obj = $(base_obj) $(NS_OBJC_OBJ)
## Object files used on some machine or other.
@@ -493,7 +497,8 @@ LIBES =
$(LIBS_TERMCAP) $(GETLOADAVG_LIBS) $(SETTINGS_LIBS) $(LIBSELINUX_LIBS) \
$(FREETYPE_LIBS) $(FONTCONFIG_LIBS) $(LIBOTF_LIBS) $(M17N_FLT_LIBS) \
$(LIBGNUTLS_LIBS) $(LIB_PTHREAD) $(GETADDRINFO_A_LIBS) $(LIBLCMS2) \
- $(NOTIFY_LIBS) $(LIB_MATH) $(LIBZ) $(LIBMODULES) $(LIBSYSTEMD_LIBS)
+ $(NOTIFY_LIBS) $(LIB_MATH) $(LIBZ) $(LIBMODULES) $(LIBSYSTEMD_LIBS) \
+ $(JSON_LIBS)
## FORCE it so that admin/unidata can decide whether these files
## are up-to-date. Although since charprop depends on bootstrap-emacs,
diff --git a/src/conf_post.h b/src/conf_post.h
index febdb8b8bf..1a7f51fa51 100644
--- a/src/conf_post.h
+++ b/src/conf_post.h
@@ -338,6 +338,12 @@ extern int emacs_setenv_TZ (char const *);
# define ATTRIBUTE_NO_SANITIZE_ADDRESS
#endif
+#if __has_attribute (warn_unused_result)
+# define ATTRIBUTE_WARN_UNUSED_RESULT __attribute__ ((__warn_unused_result__))
+#else
+# define ATTRIBUTE_WARN_UNUSED_RESULT
+#endif
+
/* gcc -fsanitize=address does not work with vfork in Fedora 25 x86-64.
For now, assume that this problem occurs on all platforms. */
#if ADDRESS_SANITIZER && !defined vfork
diff --git a/src/emacs.c b/src/emacs.c
index 0fe7d9113b..39761016ef 100644
--- a/src/emacs.c
+++ b/src/emacs.c
@@ -1610,6 +1610,10 @@ Using an Emacs configured with --with-x-toolkit=lucid does not have this problem
syms_of_threads ();
syms_of_profiler ();
+#ifdef HAVE_JSON
+ syms_of_json ();
+#endif
+
keys_of_casefiddle ();
keys_of_cmds ();
keys_of_buffer ();
diff --git a/src/eval.c b/src/eval.c
index acda64e7f0..11804d1819 100644
--- a/src/eval.c
+++ b/src/eval.c
@@ -1416,6 +1416,57 @@ internal_condition_case_n (Lisp_Object (*bfun) (ptrdiff_t, Lisp_Object *),
}
}
+static Lisp_Object
+internal_catch_all_1 (Lisp_Object (*function) (void *), void *argument)
+{
+ struct handler *c = push_handler_nosignal (Qt, CATCHER_ALL);
+ if (c == NULL)
+ return Qcatch_all_memory_full;
+
+ if (sys_setjmp (c->jmp) == 0)
+ {
+ Lisp_Object val = function (argument);
+ eassert (handlerlist == c);
+ handlerlist = c->next;
+ return val;
+ }
+ else
+ {
+ eassert (handlerlist == c);
+ Lisp_Object val = c->val;
+ handlerlist = c->next;
+ Fsignal (Qno_catch, val);
+ }
+}
+
+/* Like a combination of internal_condition_case_1 and internal_catch.
+ Catches all signals and throws. Never exits nonlocally; returns
+ Qcatch_all_memory_full if no handler could be allocated. */
+
+Lisp_Object
+internal_catch_all (Lisp_Object (*function) (void *), void *argument,
+ Lisp_Object (*handler) (Lisp_Object))
+{
+ struct handler *c = push_handler_nosignal (Qt, CONDITION_CASE);
+ if (c == NULL)
+ return Qcatch_all_memory_full;
+
+ if (sys_setjmp (c->jmp) == 0)
+ {
+ Lisp_Object val = internal_catch_all_1 (function, argument);
+ eassert (handlerlist == c);
+ handlerlist = c->next;
+ return val;
+ }
+ else
+ {
+ eassert (handlerlist == c);
+ Lisp_Object val = c->val;
+ handlerlist = c->next;
+ return handler (val);
+ }
+}
+
struct handler *
push_handler (Lisp_Object tag_ch_val, enum handlertype handlertype)
{
@@ -4065,6 +4116,9 @@ alist of active lexical bindings. */);
inhibit_lisp_code = Qnil;
+ DEFSYM (Qcatch_all_memory_full, "catch-all-memory-full");
+ Funintern (Qcatch_all_memory_full, Qnil);
+
defsubr (&Sor);
defsubr (&Sand);
defsubr (&Sif);
diff --git a/src/json.c b/src/json.c
new file mode 100644
index 0000000000..79be55bd54
--- /dev/null
+++ b/src/json.c
@@ -0,0 +1,517 @@
+/* JSON parsing and serialization.
+
+Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GNU Emacs.
+
+GNU Emacs is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, either version 3 of the License, or (at
+your option) any later version.
+
+GNU Emacs is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. */
+
+#include <config.h>
+
+#include <stddef.h>
+#include <stdint.h>
+
+#include <jansson.h>
+
+#include "lisp.h"
+#include "buffer.h"
+#include "coding.h"
+
+static bool
+json_has_prefix (const char *string, const char *prefix)
+{
+ size_t string_len = strlen (string);
+ size_t prefix_len = strlen (prefix);
+ return string_len >= prefix_len && memcmp (string, prefix, prefix_len) == 0;
+}
+
+static bool
+json_has_suffix (const char *string, const char *suffix)
+{
+ size_t string_len = strlen (string);
+ size_t suffix_len = strlen (suffix);
+ return string_len >= suffix_len
+ && memcmp (string + string_len - suffix_len, suffix, suffix_len) == 0;
+}
+
+static Lisp_Object
+json_make_string (const char *data, ptrdiff_t size)
+{
+ return make_specified_string (data, -1, size, true);
+}
+
+static Lisp_Object
+json_build_string (const char *data)
+{
+ size_t size = strlen (data);
+ eassert (size < PTRDIFF_MAX);
+ return json_make_string (data, size);
+}
+
+static Lisp_Object
+json_encode (Lisp_Object string)
+{
+ return code_convert_string (string, Qutf_8_unix, Qt, true, true, true);
+}
+
+static _Noreturn void
+json_out_of_memory (void)
+{
+ xsignal0 (Qjson_out_of_memory);
+}
+
+static _Noreturn void
+json_parse_error (const json_error_t *error)
+{
+ Lisp_Object symbol;
+ if (json_has_suffix (error->text, "expected near end of file"))
+ symbol = Qjson_end_of_file;
+ else if (json_has_prefix (error->text, "end of file expected"))
+ symbol = Qjson_trailing_content;
+ else
+ symbol = Qjson_parse_error;
+ xsignal (symbol,
+ list5 (json_build_string (error->text),
+ json_build_string (error->source), make_natnum (error->line),
+ make_natnum (error->column), make_natnum (error->position)));
+}
+
+static void
+json_release_object (void *object)
+{
+ json_decref (object);
+}
+
+static void
+check_string_without_embedded_nulls (Lisp_Object object)
+{
+ CHECK_STRING (object);
+ CHECK_TYPE (memchr (SDATA (object), '\0', SBYTES (object)) == NULL,
+ Qstring_without_embedded_nulls_p, object);
+}
+
+static ATTRIBUTE_WARN_UNUSED_RESULT json_t *
+json_check (json_t *object)
+{
+ if (object == NULL)
+ json_out_of_memory ();
+ return object;
+}
+
+static json_t *lisp_to_json (Lisp_Object) ATTRIBUTE_WARN_UNUSED_RESULT;
+
+/* This returns Lisp_Object so we can use unbind_to. The return value
+ is always nil. */
+
+static _GL_ARG_NONNULL ((2)) Lisp_Object
+lisp_to_json_toplevel_1 (Lisp_Object lisp, json_t **json)
+{
+ if (VECTORP (lisp))
+ {
+ ptrdiff_t size = ASIZE (lisp);
+ eassert (size >= 0);
+ if (size > SIZE_MAX)
+ xsignal1 (Qoverflow_error, build_string ("vector is too long"));
+ *json = json_check (json_array ());
+ ptrdiff_t count = SPECPDL_INDEX ();
+ record_unwind_protect_ptr (json_release_object, json);
+ for (ptrdiff_t i = 0; i < size; ++i)
+ {
+ int status
+ = json_array_append_new (*json, lisp_to_json (AREF (lisp, i)));
+ if (status == -1)
+ json_out_of_memory ();
+ eassert (status == 0);
+ }
+ eassert (json_array_size (*json) == size);
+ clear_unwind_protect (count);
+ return unbind_to (count, Qnil);
+ }
+ else if (HASH_TABLE_P (lisp))
+ {
+ struct Lisp_Hash_Table *h = XHASH_TABLE (lisp);
+ *json = json_check (json_object ());
+ ptrdiff_t count = SPECPDL_INDEX ();
+ record_unwind_protect_ptr (json_release_object, *json);
+ for (ptrdiff_t i = 0; i < HASH_TABLE_SIZE (h); ++i)
+ if (!NILP (HASH_HASH (h, i)))
+ {
+ Lisp_Object key = json_encode (HASH_KEY (h, i));
+ /* We can’t specify the length, so the string must be
+ null-terminated. */
+ check_string_without_embedded_nulls (key);
+ int status = json_object_set_new (*json, SSDATA (key),
+ lisp_to_json (HASH_VALUE (h, i)));
+ if (status == -1)
+ json_out_of_memory ();
+ eassert (status == 0);
+ }
+ clear_unwind_protect (count);
+ return unbind_to (count, Qnil);
+ }
+ wrong_type_argument (Qjson_value_p, lisp);
+}
+
+static ATTRIBUTE_WARN_UNUSED_RESULT json_t *
+lisp_to_json_toplevel (Lisp_Object lisp)
+{
+ if (++lisp_eval_depth > max_lisp_eval_depth)
+ xsignal0 (Qjson_object_too_deep);
+ json_t *json;
+ lisp_to_json_toplevel_1 (lisp, &json);
+ --lisp_eval_depth;
+ return json;
+}
+
+static ATTRIBUTE_WARN_UNUSED_RESULT json_t *
+lisp_to_json (Lisp_Object lisp)
+{
+ if (EQ (lisp, QCnull))
+ return json_check (json_null ());
+ else if (EQ (lisp, QCfalse))
+ return json_check (json_false ());
+ else if (EQ (lisp, Qt))
+ return json_check (json_true ());
+ else if (INTEGERP (lisp))
+ {
+ CHECK_TYPE_RANGED_INTEGER (json_int_t, lisp);
+ return json_check (json_integer (XINT (lisp)));
+ }
+ else if (FLOATP (lisp))
+ return json_check (json_real (XFLOAT_DATA (lisp)));
+ else if (STRINGP (lisp))
+ {
+ Lisp_Object encoded = json_encode (lisp);
+ ptrdiff_t size = SBYTES (encoded);
+ eassert (size >= 0);
+ if (size > SIZE_MAX)
+ xsignal1 (Qoverflow_error, build_string ("string is too long"));
+ return json_check (json_stringn (SSDATA (encoded), size));
+ }
+
+ /* LISP now must be a vector or hashtable. */
+ return lisp_to_json_toplevel (lisp);
+}
+
+DEFUN ("json-serialize", Fjson_serialize, Sjson_serialize, 1, 1, NULL,
+ doc: /* Return the JSON representation of OBJECT as a string.
+OBJECT must be a vector or hashtable, and its elements can recursively
+contain `:null', `:false', t, numbers, strings, or other vectors and
+hashtables. `:null', `:false', and t will be converted to JSON null,
+false, and true values, respectively. Vectors will be converted to
+JSON arrays, and hashtables to JSON objects. Hashtable keys must be
+strings without embedded null characters and must be unique within
+each object. */)
+ (Lisp_Object object)
+{
+ ptrdiff_t count = SPECPDL_INDEX ();
+
+ json_t *json = lisp_to_json_toplevel (object);
+ record_unwind_protect_ptr (json_release_object, json);
+
+ char *string = json_dumps (json, JSON_COMPACT);
+ if (string == NULL)
+ json_out_of_memory ();
+ record_unwind_protect_ptr (free, string);
+
+ return unbind_to (count, json_build_string (string));
+}
+
+struct json_buffer_and_size
+{
+ const char *buffer;
+ size_t size;
+};
+
+static Lisp_Object
+json_insert (void *data)
+{
+ const struct json_buffer_and_size *buffer_and_size = data;
+ if (buffer_and_size->size > PTRDIFF_MAX)
+ xsignal1 (Qoverflow_error, build_string ("buffer too large"));
+ insert (buffer_and_size->buffer, buffer_and_size->size);
+ return Qnil;
+}
+
+struct json_insert_data
+{
+ /* nil if json_insert succeeded, otherwise the symbol
+ Qcatch_all_memory_full or a cons (ERROR-SYMBOL . ERROR-DATA). */
+ Lisp_Object error;
+};
+
+static int
+json_insert_callback (const char *buffer, size_t size, void *data)
+{
+ /* This function may not exit nonlocally. */
+ struct json_insert_data *d = data;
+ struct json_buffer_and_size buffer_and_size
+ = {.buffer = buffer, .size = size};
+ d->error = internal_catch_all (json_insert, &buffer_and_size, Fidentity);
+ return NILP (d->error) ? 0 : -1;
+}
+
+DEFUN ("json-insert", Fjson_insert, Sjson_insert, 1, 1, NULL,
+ doc: /* Insert the JSON representation of OBJECT before point.
+This is the same as (insert (json-serialize OBJECT)), but potentially
+faster. See the function `json-serialize' for allowed values of
+OBJECT. */)
+ (Lisp_Object object)
+{
+ ptrdiff_t count = SPECPDL_INDEX ();
+
+ json_t *json = lisp_to_json (object);
+ record_unwind_protect_ptr (json_release_object, json);
+
+ struct json_insert_data data;
+ int status
+ = json_dump_callback (json, json_insert_callback, &data, JSON_COMPACT);
+ if (status == -1)
+ {
+ if (CONSP (data.error))
+ xsignal (XCAR (data.error), XCDR (data.error));
+ else
+ json_out_of_memory ();
+ }
+ eassert (status == 0);
+ eassert (NILP (data.error));
+ return unbind_to (count, Qnil);
+}
+
+static _GL_ARG_NONNULL ((1)) Lisp_Object
+json_to_lisp (json_t *json)
+{
+ switch (json_typeof (json))
+ {
+ case JSON_NULL:
+ return QCnull;
+ case JSON_FALSE:
+ return QCfalse;
+ case JSON_TRUE:
+ return Qt;
+ case JSON_INTEGER:
+ {
+ json_int_t value = json_integer_value (json);
+ if (FIXNUM_OVERFLOW_P (value))
+ xsignal1 (Qoverflow_error,
+ build_string ("JSON integer is too large"));
+ return make_number (value);
+ }
+ case JSON_REAL:
+ return make_float (json_real_value (json));
+ case JSON_STRING:
+ {
+ size_t size = json_string_length (json);
+ if (FIXNUM_OVERFLOW_P (size))
+ xsignal1 (Qoverflow_error, build_string ("JSON string is too long"));
+ return json_make_string (json_string_value (json), size);
+ }
+ case JSON_ARRAY:
+ {
+ if (++lisp_eval_depth > max_lisp_eval_depth)
+ xsignal0 (Qjson_object_too_deep);
+ size_t size = json_array_size (json);
+ if (FIXNUM_OVERFLOW_P (size))
+ xsignal1 (Qoverflow_error, build_string ("JSON array is too long"));
+ Lisp_Object result = Fmake_vector (make_natnum (size), Qunbound);
+ for (ptrdiff_t i = 0; i < size; ++i)
+ ASET (result, i,
+ json_to_lisp (json_array_get (json, i)));
+ --lisp_eval_depth;
+ return result;
+ }
+ case JSON_OBJECT:
+ {
+ if (++lisp_eval_depth > max_lisp_eval_depth)
+ xsignal0 (Qjson_object_too_deep);
+ size_t size = json_object_size (json);
+ if (FIXNUM_OVERFLOW_P (size))
+ xsignal1 (Qoverflow_error,
+ build_string ("JSON object has too many elements"));
+ Lisp_Object result = CALLN (Fmake_hash_table, QCtest, Qequal,
+ QCsize, make_natnum (size));
+ struct Lisp_Hash_Table *h = XHASH_TABLE (result);
+ const char *key_str;
+ json_t *value;
+ json_object_foreach (json, key_str, value)
+ {
+ Lisp_Object key = json_build_string (key_str);
+ EMACS_UINT hash;
+ ptrdiff_t i = hash_lookup (h, key, &hash);
+ eassert (i < 0);
+ hash_put (h, key, json_to_lisp (value), hash);
+ }
+ --lisp_eval_depth;
+ return result;
+ }
+ }
+ /* Can’t get here. */
+ emacs_abort ();
+}
+
+DEFUN ("json-parse-string", Fjson_parse_string, Sjson_parse_string, 1, 1, NULL,
+ doc: /* Parse the JSON STRING into a Lisp object.
+This is essentially the reverse operation of `json-serialize', which
+see. The returned object will be a vector or hashtable. Its elements
+will be `:null', `:false', t, numbers, strings, or further vectors and
+hashtables. If there are duplicate keys in an object, all but the
+last one are ignored. If STRING doesn't contain a valid JSON object,
+an error of type `json-parse-error' is signaled. */)
+ (Lisp_Object string)
+{
+ ptrdiff_t count = SPECPDL_INDEX ();
+ Lisp_Object encoded = json_encode (string);
+ check_string_without_embedded_nulls (encoded);
+
+ json_error_t error;
+ json_t *object = json_loads (SSDATA (encoded), 0, &error);
+ if (object == NULL)
+ json_parse_error (&error);
+
+ /* Avoid leaking the object in case of further errors. */
+ if (object != NULL)
+ record_unwind_protect_ptr (json_release_object, object);
+
+ return unbind_to (count, json_to_lisp (object));
+}
+
+struct json_read_buffer_data
+{
+ ptrdiff_t point;
+};
+
+static size_t
+json_read_buffer_callback (void *buffer, size_t buflen, void *data)
+{
+ struct json_read_buffer_data *d = data;
+
+ /* First, parse from point to the gap or the end of the accessible
+ portion, whatever is closer. */
+ ptrdiff_t point = d->point;
+ ptrdiff_t end;
+ {
+ bool overflow = INT_ADD_WRAPV (BUFFER_CEILING_OF (point), 1, &end);
+ eassert (!overflow);
+ }
+ size_t count;
+ {
+ bool overflow = INT_SUBTRACT_WRAPV (end, point, &count);
+ eassert (!overflow);
+ }
+ if (buflen < count)
+ count = buflen;
+ memcpy (buffer, BYTE_POS_ADDR (point), count);
+ {
+ bool overflow = INT_ADD_WRAPV (d->point, count, &d->point);
+ eassert (!overflow);
+ }
+ return count;
+}
+
+DEFUN ("json-parse-buffer", Fjson_parse_buffer, Sjson_parse_buffer,
+ 0, 0, NULL,
+ doc: /* Read JSON object from current buffer starting at point.
+This is similar to `json-parse-string', which see. Move point after
+the end of the object if parsing was successful. On error, point is
+not moved. */)
+ (void)
+{
+ ptrdiff_t count = SPECPDL_INDEX ();
+
+ ptrdiff_t point = PT_BYTE;
+ struct json_read_buffer_data data = {.point = point};
+ json_error_t error;
+ json_t *object = json_load_callback (json_read_buffer_callback, &data,
+ JSON_DISABLE_EOF_CHECK, &error);
+
+ if (object == NULL)
+ json_parse_error (&error);
+
+ /* Avoid leaking the object in case of further errors. */
+ record_unwind_protect_ptr (json_release_object, object);
+
+ /* Convert and then move point only if everything succeeded. */
+ Lisp_Object lisp = json_to_lisp (object);
+
+ {
+ /* Adjust point by how much we just read. Do this here because
+ tokener->char_offset becomes incorrect below. */
+ bool overflow = INT_ADD_WRAPV (point, error.position, &point);
+ eassert (!overflow);
+ eassert (point <= ZV_BYTE);
+ SET_PT_BOTH (BYTE_TO_CHAR (point), point);
+ }
+
+ return unbind_to (count, lisp);
+}
+
+/* Simplified version of ‘define-error’ that works with pure
+ objects. */
+
+static void
+define_error (Lisp_Object name, const char *message, Lisp_Object parent)
+{
+ eassert (SYMBOLP (name));
+ eassert (SYMBOLP (parent));
+ Lisp_Object parent_conditions = Fget (parent, Qerror_conditions);
+ eassert (CONSP (parent_conditions));
+ eassert (!NILP (Fmemq (parent, parent_conditions)));
+ eassert (NILP (Fmemq (name, parent_conditions)));
+ Fput (name, Qerror_conditions, pure_cons (name, parent_conditions));
+ Fput (name, Qerror_message, build_pure_c_string (message));
+}
+
+void
+syms_of_json (void)
+{
+ DEFSYM (QCnull, ":null");
+ DEFSYM (QCfalse, ":false");
+
+ DEFSYM (Qstring_without_embedded_nulls_p, "string-without-embedded-nulls-p");
+ DEFSYM (Qjson_value_p, "json-value-p");
+
+ DEFSYM (Qutf_8_unix, "utf-8-unix");
+
+ DEFSYM (Qjson_error, "json-error");
+ DEFSYM (Qjson_out_of_memory, "json-out-of-memory");
+ DEFSYM (Qjson_parse_error, "json-parse-error");
+ DEFSYM (Qjson_end_of_file, "json-end-of-file");
+ DEFSYM (Qjson_trailing_content, "json-trailing-content");
+ DEFSYM (Qjson_object_too_deep, "json-object-too-deep");
+ define_error (Qjson_error, "generic JSON error", Qerror);
+ define_error (Qjson_out_of_memory, "no free memory for creating JSON object",
+ Qjson_error);
+ define_error (Qjson_parse_error, "could not parse JSON stream",
+ Qjson_error);
+ define_error (Qjson_end_of_file, "end of JSON stream", Qjson_parse_error);
+ define_error (Qjson_trailing_content, "trailing content after JSON stream",
+ Qjson_parse_error);
+ define_error (Qjson_object_too_deep,
+ "object cyclic or Lisp evaluation too deep", Qjson_error);
+
+ DEFSYM (Qpure, "pure");
+ DEFSYM (Qside_effect_free, "side-effect-free");
+
+ DEFSYM (Qjson_serialize, "json-serialize");
+ DEFSYM (Qjson_parse_string, "json-parse-string");
+ Fput (Qjson_serialize, Qpure, Qt);
+ Fput (Qjson_serialize, Qside_effect_free, Qt);
+ Fput (Qjson_parse_string, Qpure, Qt);
+ Fput (Qjson_parse_string, Qside_effect_free, Qt);
+
+ defsubr (&Sjson_serialize);
+ defsubr (&Sjson_insert);
+ defsubr (&Sjson_parse_string);
+ defsubr (&Sjson_parse_buffer);
+}
diff --git a/src/lisp.h b/src/lisp.h
index 680c25d4c4..43d8846619 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -3440,6 +3440,11 @@ extern int x_bitmap_mask (struct frame *, ptrdiff_t);
extern void reset_image_types (void);
extern void syms_of_image (void);
+#ifdef HAVE_JSON
+/* Defined in json.c. */
+extern void syms_of_json (void);
+#endif
+
/* Defined in insdel.c. */
extern void move_gap_both (ptrdiff_t, ptrdiff_t);
extern _Noreturn void buffer_overflow (void);
@@ -3863,6 +3868,7 @@ extern Lisp_Object internal_condition_case_2 (Lisp_Object (*) (Lisp_Object, Lisp
extern Lisp_Object internal_condition_case_n
(Lisp_Object (*) (ptrdiff_t, Lisp_Object *), ptrdiff_t, Lisp_Object *,
Lisp_Object, Lisp_Object (*) (Lisp_Object, ptrdiff_t, Lisp_Object *));
+extern Lisp_Object internal_catch_all (Lisp_Object (*) (void *), void *, Lisp_Object (*) (Lisp_Object));
extern struct handler *push_handler (Lisp_Object, enum handlertype);
extern struct handler *push_handler_nosignal (Lisp_Object, enum handlertype);
extern void specbind (Lisp_Object, Lisp_Object);
diff --git a/test/src/json-tests.el b/test/src/json-tests.el
new file mode 100644
index 0000000000..8820c682ba
--- /dev/null
+++ b/test/src/json-tests.el
@@ -0,0 +1,97 @@
+;;; json-tests.el --- unit tests for json.c -*- lexical-binding: t; -*-
+
+;; Copyright (C) 2017 Free Software Foundation, Inc.
+
+;; This file is part of GNU Emacs.
+
+;; GNU Emacs is free software: you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation, either version 3 of the License, or
+;; (at your option) any later version.
+
+;; GNU Emacs is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>.
+
+;;; Commentary:
+
+;; Unit tests for src/json.c.
+
+;;; Code:
+
+(require 'cl-lib)
+(require 'map)
+
+(ert-deftest json-serialize/roundtrip ()
+ (let ((lisp [:null :false t 0 123 -456 3.75 "foo"])
+ (json "[null,false,true,0,123,-456,3.75,\"foo\"]"))
+ (should (equal (json-serialize lisp) json))
+ (with-temp-buffer
+ (json-insert lisp)
+ (should (equal (buffer-string) json))
+ (should (eobp)))
+ (should (equal (json-parse-string json) lisp))
+ (with-temp-buffer
+ (insert json)
+ (goto-char 1)
+ (should (equal (json-parse-buffer) lisp))
+ (should (eobp)))))
+
+(ert-deftest json-serialize/object ()
+ (let ((table (make-hash-table :test #'equal)))
+ (puthash "abc" [1 2 t] table)
+ (puthash "def" :null table)
+ (should (equal (json-serialize table)
+ "{\"abc\":[1,2,true],\"def\":null}"))))
+
+(ert-deftest json-parse-string/object ()
+ (let ((actual
+ (json-parse-string
+ "{ \"abc\" : [1, 2, true], \"def\" : null, \"abc\" : [9, false] }\n")))
+ (should (hash-table-p actual))
+ (should (equal (hash-table-count actual) 2))
+ (should (equal (cl-sort (map-pairs actual) #'string< :key #'car)
+ '(("abc" . [9 :false]) ("def" . :null))))))
+
+(ert-deftest json-parse-string/string ()
+ (should-error (json-parse-string "[\"formfeed\f\"]") :type 'json-parse-error)
+ (should (equal (json-parse-string "[\"foo \\\"bar\\\"\"]") ["foo \"bar\""]))
+ (should (equal (json-parse-string "[\"abcαβγ\"]") ["abcαβγ"]))
+ (should (equal (json-parse-string "[\"\\nasd\\u0444\\u044b\\u0432fgh\\t\"]")
+ ["\nasdфывfgh\t"]))
+ (should (equal (json-parse-string "[\"\\uD834\\uDD1E\"]") ["\U0001D11E"]))
+ (should-error (json-parse-string "foo") :type 'json-parse-error))
+
+(ert-deftest json-serialize/string ()
+ (should (equal (json-serialize ["foo"]) "[\"foo\"]"))
+ (should (equal (json-serialize ["a\n\fb"]) "[\"a\\n\\fb\"]"))
+ (should (equal (json-serialize ["\nasdфыв\u001f\u007ffgh\t"])
+ "[\"\\nasdфыв\\u001F\u007ffgh\\t\"]")))
+
+(ert-deftest json-parse-string/incomplete ()
+ (should-error (json-parse-string "[123") :type 'json-end-of-file))
+
+(ert-deftest json-parse-string/trailing ()
+ (should-error (json-parse-string "[123] [456]") :type 'json-trailing-content))
+
+(ert-deftest json-parse-buffer/incomplete ()
+ (with-temp-buffer
+ (insert "[123")
+ (goto-char 1)
+ (should-error (json-parse-buffer) :type 'json-end-of-file)
+ (should (bobp))))
+
+(ert-deftest json-parse-buffer/trailing ()
+ (with-temp-buffer
+ (insert "[123] [456]")
+ (goto-char 1)
+ (should (equal (json-parse-buffer) [123]))
+ (should-not (bobp))
+ (should (looking-at-p (rx " [456]" eos)))))
+
+(provide 'json-tests)
+;;; json-tests.el ends here
--
2.13.6
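For quick reference, a minimal usage sketch of the primitives added by the
patch above (illustrative only; the expected results follow the unit tests
in the patch and assume Emacs was configured --with-json):

  (json-parse-string "[null, false, true, 0, 3.75, \"foo\"]")
  ;; => [:null :false t 0 3.75 "foo"]

  (json-serialize [:null :false t 0 3.75 "foo"])
  ;; => "[null,false,true,0,3.75,\"foo\"]"

  (let ((table (make-hash-table :test #'equal)))
    (puthash "abc" [1 2 t] table)
    (json-serialize table))
  ;; => "{\"abc\":[1,2,true]}"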
[-- Attachment #3: 0002-Do-not-malloc-more-than-PTRDIFF_MAX.patch --]
[-- Type: text/x-patch, Size: 1181 bytes --]
From c27ea3054d37a38c64f339e30044b1f463b9affe Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Thu, 5 Oct 2017 17:59:07 -0700
Subject: [PATCH 2/3] Do not malloc more than PTRDIFF_MAX
* src/alloc.c (lmalloc, lrealloc): Do not allocate objects
containing more than PTRDIFF_MAX bytes, as they would
cause pointer subtraction to stop working. Much of
Emacs already checks for this; this change closes a
couple of loopholes.
---
src/alloc.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/src/alloc.c b/src/alloc.c
index 2e6399e7f8..ed482c88f2 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -1440,6 +1440,12 @@ laligned (void *p, size_t size)
static void *
lmalloc (size_t size)
{
+ if (PTRDIFF_MAX < size)
+ {
+ errno = ENOMEM;
+ return NULL;
+ }
+
#if USE_ALIGNED_ALLOC
if (! MALLOC_IS_GC_ALIGNED && size % GCALIGNMENT == 0)
return aligned_alloc (GCALIGNMENT, size);
@@ -1460,6 +1466,12 @@ lmalloc (size_t size)
static void *
lrealloc (void *p, size_t size)
{
+ if (PTRDIFF_MAX < size)
+ {
+ errno = ENOMEM;
+ return NULL;
+ }
+
while (true)
{
p = realloc (p, size);
--
2.13.6
[-- Attachment #4: 0003-Minor-JSON-cleanups-mostly-for-overflow.patch --]
[-- Type: text/x-patch, Size: 10377 bytes --]
From 991b22ef18152fb0f2238f25998bb141951ab2cd Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Wed, 4 Oct 2017 18:38:07 -0700
Subject: [PATCH 3/3] Minor JSON cleanups, mostly for overflow
Tell Jansson to use the Emacs allocators, to avoid problems
with objects containing more than PTRDIFF_MAX bytes,
along with other problems. Also, fix some other
minor integer-overflow problems.
* src/emacs.c (main) [HAVE_JSON]: Call init_json.
* src/json.c (json_has_prefix): Simplify via strncmp.
(json_has_suffix): Indent a la GNU.
(json_has_suffix, struct json_buffer_and_size):
Use ptrdiff_t instead of size_t where either will do.
(json_build_string, lisp_to_json_toplevel_1, lisp_to_json)
(json_insert, json_to_lisp, json_read_buffer_callback)
(define_error): Remove useless or no-longer-needed checks.
(json_out_of_memory, json_to_lisp): Just call memory_full.
(check_string_without_embedded_nulls): Use strlen, not memchr;
it is typically faster.
(lisp_to_json_toplevel_1, json_to_lisp): Do not bother with
_GL_ARG_NONNULL on static functions; it is not worth the trouble.
(json_to_lisp): Just signal overflow error, to be consistent with
other signalers. Use allocate_vector instead of Fmake_vector,
to avoid need for initializing vector twice. Use make_hash_table
instead of Fmake_hash_table, as it is a bit simpler.
(init_json): New function.
---
src/emacs.c | 4 +++
src/json.c | 105 +++++++++++++++++++++---------------------------------------
src/lisp.h | 1 +
3 files changed, 42 insertions(+), 68 deletions(-)
diff --git a/src/emacs.c b/src/emacs.c
index 39761016ef..9f635acdbc 100644
--- a/src/emacs.c
+++ b/src/emacs.c
@@ -1271,6 +1271,10 @@ Using an Emacs configured with --with-x-toolkit=lucid does not have this problem
build_details = ! argmatch (argv, argc, "-no-build-details",
"--no-build-details", 7, NULL, &skip_args);
+#ifdef HAVE_JSON
+ init_json ();
+#endif
+
#ifdef HAVE_MODULES
bool module_assertions
= argmatch (argv, argc, "-module-assertions", "--module-assertions", 15,
diff --git a/src/json.c b/src/json.c
index 79be55bd54..0f3b227a78 100644
--- a/src/json.c
+++ b/src/json.c
@@ -31,18 +31,17 @@ along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. */
static bool
json_has_prefix (const char *string, const char *prefix)
{
- size_t string_len = strlen (string);
- size_t prefix_len = strlen (prefix);
- return string_len >= prefix_len && memcmp (string, prefix, prefix_len) == 0;
+ return strncmp (string, prefix, strlen (prefix)) == 0;
}
static bool
json_has_suffix (const char *string, const char *suffix)
{
- size_t string_len = strlen (string);
- size_t suffix_len = strlen (suffix);
- return string_len >= suffix_len
- && memcmp (string + string_len - suffix_len, suffix, suffix_len) == 0;
+ ptrdiff_t string_len = strlen (string);
+ ptrdiff_t suffix_len = strlen (suffix);
+ return (string_len >= suffix_len
+ && (memcmp (string + string_len - suffix_len, suffix, suffix_len)
+ == 0));
}
static Lisp_Object
@@ -54,9 +53,7 @@ json_make_string (const char *data, ptrdiff_t size)
static Lisp_Object
json_build_string (const char *data)
{
- size_t size = strlen (data);
- eassert (size < PTRDIFF_MAX);
- return json_make_string (data, size);
+ return json_make_string (data, strlen (data));
}
static Lisp_Object
@@ -68,7 +65,7 @@ json_encode (Lisp_Object string)
static _Noreturn void
json_out_of_memory (void)
{
- xsignal0 (Qjson_out_of_memory);
+ memory_full (SIZE_MAX);
}
static _Noreturn void
@@ -97,7 +94,7 @@ static void
check_string_without_embedded_nulls (Lisp_Object object)
{
CHECK_STRING (object);
- CHECK_TYPE (memchr (SDATA (object), '\0', SBYTES (object)) == NULL,
+ CHECK_TYPE (strlen (SSDATA (object)) == SBYTES (object),
Qstring_without_embedded_nulls_p, object);
}
@@ -114,15 +111,12 @@ static json_t *lisp_to_json (Lisp_Object) ATTRIBUTE_WARN_UNUSED_RESULT;
/* This returns Lisp_Object so we can use unbind_to. The return value
is always nil. */
-static _GL_ARG_NONNULL ((2)) Lisp_Object
+static Lisp_Object
lisp_to_json_toplevel_1 (Lisp_Object lisp, json_t **json)
{
if (VECTORP (lisp))
{
ptrdiff_t size = ASIZE (lisp);
- eassert (size >= 0);
- if (size > SIZE_MAX)
- xsignal1 (Qoverflow_error, build_string ("vector is too long"));
*json = json_check (json_array ());
ptrdiff_t count = SPECPDL_INDEX ();
record_unwind_protect_ptr (json_release_object, json);
@@ -194,9 +188,6 @@ lisp_to_json (Lisp_Object lisp)
{
Lisp_Object encoded = json_encode (lisp);
ptrdiff_t size = SBYTES (encoded);
- eassert (size >= 0);
- if (size > SIZE_MAX)
- xsignal1 (Qoverflow_error, build_string ("string is too long"));
return json_check (json_stringn (SSDATA (encoded), size));
}
@@ -231,15 +222,13 @@ each object. */)
struct json_buffer_and_size
{
const char *buffer;
- size_t size;
+ ptrdiff_t size;
};
static Lisp_Object
json_insert (void *data)
{
const struct json_buffer_and_size *buffer_and_size = data;
- if (buffer_and_size->size > PTRDIFF_MAX)
- xsignal1 (Qoverflow_error, build_string ("buffer too large"));
insert (buffer_and_size->buffer, buffer_and_size->size);
return Qnil;
}
@@ -289,7 +278,7 @@ OBJECT. */)
return unbind_to (count, Qnil);
}
-static _GL_ARG_NONNULL ((1)) Lisp_Object
+static Lisp_Object
json_to_lisp (json_t *json)
{
switch (json_typeof (json))
@@ -304,43 +293,33 @@ json_to_lisp (json_t *json)
{
json_int_t value = json_integer_value (json);
if (FIXNUM_OVERFLOW_P (value))
- xsignal1 (Qoverflow_error,
- build_string ("JSON integer is too large"));
+ xsignal0 (Qoverflow_error);
return make_number (value);
}
case JSON_REAL:
return make_float (json_real_value (json));
case JSON_STRING:
- {
- size_t size = json_string_length (json);
- if (FIXNUM_OVERFLOW_P (size))
- xsignal1 (Qoverflow_error, build_string ("JSON string is too long"));
- return json_make_string (json_string_value (json), size);
- }
+ return json_make_string (json_string_value (json),
+ json_string_length (json));
case JSON_ARRAY:
{
if (++lisp_eval_depth > max_lisp_eval_depth)
xsignal0 (Qjson_object_too_deep);
- size_t size = json_array_size (json);
- if (FIXNUM_OVERFLOW_P (size))
- xsignal1 (Qoverflow_error, build_string ("JSON array is too long"));
- Lisp_Object result = Fmake_vector (make_natnum (size), Qunbound);
+ ptrdiff_t size = json_array_size (json);
+ struct Lisp_Vector *v = allocate_vector (size);
for (ptrdiff_t i = 0; i < size; ++i)
- ASET (result, i,
- json_to_lisp (json_array_get (json, i)));
+ v->contents[i] = json_to_lisp (json_array_get (json, i));
--lisp_eval_depth;
- return result;
+ return make_lisp_ptr (v, Lisp_Vectorlike);
}
case JSON_OBJECT:
{
if (++lisp_eval_depth > max_lisp_eval_depth)
xsignal0 (Qjson_object_too_deep);
- size_t size = json_object_size (json);
- if (FIXNUM_OVERFLOW_P (size))
- xsignal1 (Qoverflow_error,
- build_string ("JSON object has too many elements"));
- Lisp_Object result = CALLN (Fmake_hash_table, QCtest, Qequal,
- QCsize, make_natnum (size));
+ ptrdiff_t size = json_object_size (json);
+ Lisp_Object result
+ = make_hash_table (hashtest_equal, size, DEFAULT_REHASH_SIZE,
+ DEFAULT_REHASH_THRESHOLD, Qnil, false);
struct Lisp_Hash_Table *h = XHASH_TABLE (result);
const char *key_str;
json_t *value;
@@ -399,23 +378,12 @@ json_read_buffer_callback (void *buffer, size_t buflen, void *data)
/* First, parse from point to the gap or the end of the accessible
portion, whatever is closer. */
ptrdiff_t point = d->point;
- ptrdiff_t end;
- {
- bool overflow = INT_ADD_WRAPV (BUFFER_CEILING_OF (point), 1, &end);
- eassert (!overflow);
- }
- size_t count;
- {
- bool overflow = INT_SUBTRACT_WRAPV (end, point, &count);
- eassert (!overflow);
- }
+ ptrdiff_t end = BUFFER_CEILING_OF (point) + 1;
+ ptrdiff_t count = end - point;
if (buflen < count)
count = buflen;
memcpy (buffer, BYTE_POS_ADDR (point), count);
- {
- bool overflow = INT_ADD_WRAPV (d->point, count, &d->point);
- eassert (!overflow);
- }
+ d->point += count;
return count;
}
@@ -444,14 +412,11 @@ not moved. */)
/* Convert and then move point only if everything succeeded. */
Lisp_Object lisp = json_to_lisp (object);
- {
- /* Adjust point by how much we just read. Do this here because
- tokener->char_offset becomes incorrect below. */
- bool overflow = INT_ADD_WRAPV (point, error.position, &point);
- eassert (!overflow);
- eassert (point <= ZV_BYTE);
- SET_PT_BOTH (BYTE_TO_CHAR (point), point);
- }
+ /* Adjust point by how much we just read. Do this here because
+ tokener->char_offset becomes incorrect below. */
+ eassert (0 <= error.position && error.position <= ZV_BYTE - point);
+ point += error.position;
+ SET_PT_BOTH (BYTE_TO_CHAR (point), point);
return unbind_to (count, lisp);
}
@@ -462,8 +427,6 @@ not moved. */)
static void
define_error (Lisp_Object name, const char *message, Lisp_Object parent)
{
- eassert (SYMBOLP (name));
- eassert (SYMBOLP (parent));
Lisp_Object parent_conditions = Fget (parent, Qerror_conditions);
eassert (CONSP (parent_conditions));
eassert (!NILP (Fmemq (parent, parent_conditions)));
@@ -473,6 +436,12 @@ define_error (Lisp_Object name, const char *message, Lisp_Object parent)
}
void
+init_json (void)
+{
+ json_set_alloc_funcs (xmalloc, xfree);
+}
+
+void
syms_of_json (void)
{
DEFSYM (QCnull, ":null");
diff --git a/src/lisp.h b/src/lisp.h
index 43d8846619..8e530619b8 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -3442,6 +3442,7 @@ extern void syms_of_image (void);
#ifdef HAVE_JSON
/* Defined in json.c. */
+extern void init_json (void);
extern void syms_of_json (void);
#endif
--
2.13.6
^ permalink raw reply related [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-06 1:58 ` Paul Eggert
@ 2017-10-06 7:40 ` Eli Zaretskii
2017-10-06 19:36 ` Paul Eggert
2017-10-08 23:09 ` Philipp Stephani
1 sibling, 1 reply; 81+ messages in thread
From: Eli Zaretskii @ 2017-10-06 7:40 UTC (permalink / raw)
To: Paul Eggert; +Cc: p.stephani2, emacs-devel
> Cc: p.stephani2@gmail.com, emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Thu, 5 Oct 2017 18:58:34 -0700
>
> > AFAIU, ptrdiff_t overflows are the _only_ reason for json.c checks
> > whether a size_t value is too large
>
> In the most recent patch I proposed, there were only two such checks;
> there were two other checks for too-large size_t that were fixnum
> checks, not ptrdiff_t checks.
>
> However, I can see that you really don't like those checks. So I tweaked
> the proposed patch to remove them all from json.c. Please see the
> attached 3 patches (the first is just Philipp's patch rebased to
> master). The basic idea here is that json.c should be using xmalloc for
> allocation anyway, for reasons other than size overflow checking. And
> once it uses the Emacs malloc we can make sure that it never allocates
> objects that are too large for ptrdiff_t.
Thanks, that's better, although it just pushes the extra checks down
to our memory allocation routines. It is better because now these
checks are applied to all the places where Lisp objects are created,
instead of requiring higher-level code to include these checks.
> > I'm not arguing for general replacement of ptrdiff_t with size_t, only
> > for doing that in those primitives where negative values are a clear
> > mistake/bug.
>
> This is exactly where we should be cautious about using unsigned types.
> The longstanding Emacs style is to prefer signed integers, to avoid the
> common typos that occur with unsigned. If we start changing the style,
> these primitives (or people debugging these primitives) often won't be
> able to distinguish buggy from valid-but-enormous cases.
Those valid-but-enormous values are either invalid (if they are larger
than PTRDIFF_MAX), or easily uncovered by looking at higher call-stack
frames in the debugger, where the actual size of the object being
created is visible. Yes, it requires some diligence, but then
debugging Emacs is already more complicated than that of C programs
which use only C objects, so I don't see that as a significant
disadvantage.
> And when we compile such primitives (or their callers) with
> -fsanitize=undefined, we won't be able to catch integer-overflow
> bugs automatically at runtime, since unsigned integer arithmetic
> silently wraps around even when -fsanitize=undefined is used.
I don't envision many primitives to need this kind of change, so
again, the disadvantage doesn't sound too significant to me. The
advantage is IMO significant, as doing so will remove the need for
checks that a size_t value doesn't overflow a ptrdiff_t value, so we
will have an overall speedup. Emacs is accused of being slow, so I
think we should avoid overhead that is only needed in a tiny fraction
of valid use cases.
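For concreteness, the kind of check being discussed looks roughly like this
(a hypothetical sketch, not code from any of the patches; the function name
is made up):

  /* A primitive that receives a size from a C library as size_t and
     wants to use it as a Lisp object size currently needs this guard
     before treating it as a ptrdiff_t quantity.  */
  static Lisp_Object
  example_make_vector_of_size (size_t size)
  {
    if (size > PTRDIFF_MAX)        /* the check under discussion */
      xsignal0 (Qoverflow_error);
    return Fmake_vector (make_natnum (size), Qnil);
  }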
> I help maintain several GNU programs that use unsigned types for sizes,
> and find that style to be trickier than the style Emacs uses, with
> respect to integer-overflow bugs. I've been gradually changing some of
> the non-Emacs GNU code to use signed types, and the results have been
> encouraging: the code is more readable and more obviously correct.
I'm not sure that experience is 100% applicable to Emacs, because
Emacs has special needs due to the fact that our integers are narrower
than the corresponding C integral types.
And once again, I'm not arguing for a wholesale switch to size_t, only
for its judicious use in a few primitives that create Lisp objects.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-06 7:40 ` Eli Zaretskii
@ 2017-10-06 19:36 ` Paul Eggert
2017-10-06 21:03 ` Eli Zaretskii
0 siblings, 1 reply; 81+ messages in thread
From: Paul Eggert @ 2017-10-06 19:36 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: p.stephani2, emacs-devel
On 10/06/2017 12:40 AM, Eli Zaretskii wrote:
> Those valid-but-enormous values are either invalid (if they are larger
> than PTRDIFF_MAX), or easily uncovered by looking at higher call-stack
> frames in the debugger
I'm not quite following, since "valid-but-enormous values" cannot be
invalid. However, I don't normally use a debugger to find or debug
integer-overflow problems. I more often use static analysis, for which
signed integers work better since static analysis can more-easily detect
signed operations that look dubious. When I do run-time checking, I
normally use -fsanitize=undefined or something like that, instead of
GDB; and here again, signed integers work better than unsigned integers do.
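A tiny illustration of that difference (hypothetical code, not from any
patch in this thread):

  #include <stddef.h>

  /* Built with -fsanitize=undefined, the signed addition is reported
     at run time when it overflows, because signed overflow is
     undefined behavior; the unsigned addition silently wraps around
     and is never reported.  */
  ptrdiff_t
  add_sizes_signed (ptrdiff_t a, ptrdiff_t b)
  {
    return a + b;                /* UBSan flags overflow here */
  }

  size_t
  add_sizes_unsigned (size_t a, size_t b)
  {
    return a + b;                /* wraps silently; no report */
  }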
> I don't envision many primitives to need this kind of change
At present zero primitives need this kind of change for JSON, since the
JSON code doesn't need to do any overflow checking for sizes under the
currently-proposed patches. If we run across the problem in the future
for other libraries, we can revisit the issue then.
> I'm not sure that experience is 100% applicable to Emacs, because
> Emacs has special needs due to the fact that our integers are narrower
> than the corresponding C integral types.
That problem is separate from the ptrdiff_t vs size_t problem, which is
the issue at hand here, and which corresponds directly to the experience
I've had with ptrdiff_t and size_t in other GNU programs.
Preferring ptrdiff_t to size_t (or vice versa) does not affect whether
code needs to check for fixnum overflow.
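To illustrate that separation with a hypothetical fragment (names made up):
whichever C type carries the size, the fixnum check stays the same:

  static Lisp_Object
  example_size_to_lisp (size_t size)   /* ptrdiff_t would work equally */
  {
    if (FIXNUM_OVERFLOW_P (size))      /* needed either way */
      xsignal0 (Qoverflow_error);
    return make_number (size);
  }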
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-06 19:36 ` Paul Eggert
@ 2017-10-06 21:03 ` Eli Zaretskii
0 siblings, 0 replies; 81+ messages in thread
From: Eli Zaretskii @ 2017-10-06 21:03 UTC (permalink / raw)
To: Paul Eggert; +Cc: p.stephani2, emacs-devel
I can never convince you of anything, it seems. It's... frustrating.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-03 20:52 ` Paul Eggert
2017-10-04 5:33 ` Eli Zaretskii
@ 2017-10-08 17:58 ` Philipp Stephani
2017-10-08 18:42 ` Eli Zaretskii
2017-10-09 6:22 ` Paul Eggert
1 sibling, 2 replies; 81+ messages in thread
From: Philipp Stephani @ 2017-10-08 17:58 UTC (permalink / raw)
To: Paul Eggert, Eli Zaretskii; +Cc: emacs-devel
[-- Attachment #1.1: Type: text/plain, Size: 3903 bytes --]
Paul Eggert <eggert@cs.ucla.edu> schrieb am Di., 3. Okt. 2017 um 22:52 Uhr:
> On 10/03/2017 05:26 AM, Philipp Stephani wrote:
>
> > > +#if __has_attribute (warn_unused_result)
> > > +# define ATTRIBUTE_WARN_UNUSED_RESULT __attribute__
> > ((__warn_unused_result__))
> > > +#else
> > > +# define ATTRIBUTE_WARN_UNUSED_RESULT
> > > +#endif
> >
> > Hmm... why do we need this attribute? You use it with 2 static
> > functions, so this sounds like a left-over from the development stage?
> >
> > It's not strictly needed (and if you don't like it, I can remove it),
> > but it helps catching memory leaks.
>
> I've found __warn_unused_result__ to be more trouble than it's worth in
> library functions. Emacs has lib/ignore-value.h in order to work around
> __warn_unused_result__ brain damage, for example. For static functions
> the problem is less, but still, I mildly favor leaving it out.
>
OK, I'll remove it.
>
> >
> > And btw, how can size be greater than SIZE_MAX in this case? This is
> > a valid Lisp object, isn't it? (There are more such tests in the
> > patch, e.g. in lisp_to_json, and I think they, too, are redundant.)
> >
> >
> > Depends on the range of ptrdiff_t and size_t. IIUC nothing in the C
> > standard guarantees PTRDIFF_MAX <= SIZE_MAX. If we want to guarantee
> > that, we should probably add at least a static assertion.
>
> There should be no need for that. No Lisp object can exceed min
> (PTRDIFF_MAX, SIZE_MAX) bytes; alloc.c guarantees this, so that Emacs
> should work even on oddball platforms where SIZE_MAX < PTRDIFF_MAX, and
> there is no need for a runtime check here.
>
Should I at least add an eassert to document this?
>
> > > + if (buffer_and_size->size > PTRDIFF_MAX)
> > > + xsignal1 (Qoverflow_error, build_string ("buffer too large"));
> > > + insert (buffer_and_size->buffer, buffer_and_size->size);
> >
> > I don't think we need this test here, as 'insert' already has the
> > equivalent test in one of its subroutines.
> >
> > It can't, because it takes the byte length as ptrdiff_t. We need to
> > check before whether the size is actually in the valid range of
> ptrdiff_t.
>
> A PTRDIFF_MAX test is needed if the JSON library can return strings
> longer than PTRDIFF_MAX. Please just use buffer_overflow () to signal
> the error, though.
>
Done.
>
> > > + case JSON_INTEGER:
> > > + {
> > > + json_int_t value = json_integer_value (json);
> > > + if (FIXNUM_OVERFLOW_P (value))
> > > + xsignal1 (Qoverflow_error,
> > > + build_string ("JSON integer is too large"));
> > > + return make_number (value);
> >
> > This overflow test is also redundant, as make_number already does it.
> >
> > It can't, because json_int_t can be larger than EMACS_INT. Also,
> > make_number doesn't contain any checks.
>
> You're right that a test is needed. However, elsewhere we just use
> xsignal0 (Qoverflow_error) for this sort of thing, and I suggest being
> consistent and doing that here as well. Similarly for other calls to
> xsignal1 with Qoverflow_error.
>
Done.
>
> > > + case JSON_STRING:
> > > + {
> > > + size_t size = json_string_length (json);
> > > + if (FIXNUM_OVERFLOW_P (size))
> > > + xsignal1 (Qoverflow_error, build_string ("JSON string is
> > too long"));
> > > + return json_make_string (json_string_value (json), size);
> >
> > Once again, the overflow test is redundant, as make_specified_string
> > (called by json_make_string) already includes an equivalent test.
> >
> > And once again, we need to check at least whether the size fits into
> > ptrdiff_t.
> You're right, a test is needed. However, I suggest using string_overflow
> () to signal string overflows.
>
>
Done. I've attached a new patch (which currently segfaults on
decode_coding_gap, but the call to that function doesn't seem to be
required anyway).
[-- Attachment #2: 0001-Implement-native-JSON-support-using-Jansson.txt --]
[-- Type: text/plain, Size: 31728 bytes --]
From 16919f7ffda7fb057d4c4486f92396dab636e5b8 Mon Sep 17 00:00:00 2001
From: Philipp Stephani <phst@google.com>
Date: Mon, 18 Sep 2017 10:51:39 +0200
Subject: [PATCH] Implement native JSON support using Jansson
* configure.ac: New option --with-json.
* src/json.c (Fjson_serialize, Fjson_insert, Fjson_parse_string)
(Fjson_parse_buffer): New defuns.
(json_has_prefix, json_has_suffix, json_make_string)
(json_build_string, json_encode, json_out_of_memory, json_parse_error)
(json_release_object, check_string_without_embedded_nulls, json_check)
(lisp_to_json, lisp_to_json_toplevel, lisp_to_json_toplevel_1)
(json_insert, json_insert_callback, json_to_lisp)
(json_read_buffer_callback, Fjson_parse_buffer, define_error): New
helper functions.
(syms_of_json): New function.
* src/lisp.h: Declaration for syms_of_json.
* src/conf_post.h (ATTRIBUTE_WARN_UNUSED_RESULT): New attribute macro.
* src/emacs.c (main): Enable JSON functions.
* src/eval.c (internal_catch_all, internal_catch_all_1): New helper
functions to catch all signals.
(syms_of_eval): Add uninterned symbol to signify out of memory.
* src/Makefile.in (JSON_LIBS, JSON_CFLAGS, JSON_OBJ, EMACS_CFLAGS)
(base_obj, LIBES): Compile json.c if --with-json is enabled.
* test/src/json-tests.el (json-serialize/roundtrip)
(json-serialize/object, json-parse-string/object)
(json-parse-string/string, json-serialize/string)
(json-parse-string/incomplete, json-parse-string/trailing)
(json-parse-buffer/incomplete, json-parse-buffer/trailing): New unit
tests.
---
configure.ac | 20 +-
etc/NEWS | 7 +
src/Makefile.in | 11 +-
src/conf_post.h | 6 +
src/emacs.c | 4 +
src/eval.c | 54 +++++
src/json.c | 542 +++++++++++++++++++++++++++++++++++++++++++++++++
src/lisp.h | 6 +
test/src/json-tests.el | 97 +++++++++
9 files changed, 743 insertions(+), 4 deletions(-)
create mode 100644 src/json.c
create mode 100644 test/src/json-tests.el
diff --git a/configure.ac b/configure.ac
index 627a392a5b..7d191a32cd 100644
--- a/configure.ac
+++ b/configure.ac
@@ -355,6 +355,7 @@ AC_DEFUN
OPTION_DEFAULT_OFF([cairo],[compile with Cairo drawing (experimental)])
OPTION_DEFAULT_ON([xml2],[don't compile with XML parsing support])
OPTION_DEFAULT_ON([imagemagick],[don't compile with ImageMagick image support])
+OPTION_DEFAULT_ON([json], [don't compile with native JSON support])
OPTION_DEFAULT_ON([xft],[don't use XFT for anti aliased fonts])
OPTION_DEFAULT_ON([libotf],[don't use libotf for OpenType font support])
@@ -2859,6 +2860,22 @@ AC_DEFUN
AC_SUBST(LIBSYSTEMD_LIBS)
AC_SUBST(LIBSYSTEMD_CFLAGS)
+HAVE_JSON=no
+JSON_OBJ=
+
+if test "${with_json}" = yes; then
+ EMACS_CHECK_MODULES([JSON], [jansson >= 2.5],
+ [HAVE_JSON=yes], [HAVE_JSON=no])
+ if test "${HAVE_JSON}" = yes; then
+ AC_DEFINE(HAVE_JSON, 1, [Define if using Jansson.])
+ JSON_OBJ=json.o
+ fi
+fi
+
+AC_SUBST(JSON_LIBS)
+AC_SUBST(JSON_CFLAGS)
+AC_SUBST(JSON_OBJ)
+
NOTIFY_OBJ=
NOTIFY_SUMMARY=no
@@ -5370,7 +5387,7 @@ AC_DEFUN
for opt in XAW3D XPM JPEG TIFF GIF PNG RSVG CAIRO IMAGEMAGICK SOUND GPM DBUS \
GCONF GSETTINGS NOTIFY ACL LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT \
LIBOTF XFT ZLIB TOOLKIT_SCROLL_BARS X_TOOLKIT OLDXMENU X11 NS MODULES \
- XWIDGETS LIBSYSTEMD CANNOT_DUMP LCMS2; do
+ XWIDGETS LIBSYSTEMD JSON CANNOT_DUMP LCMS2; do
case $opt in
CANNOT_DUMP) eval val=\${$opt} ;;
@@ -5420,6 +5437,7 @@ AC_DEFUN
Does Emacs use -lotf? ${HAVE_LIBOTF}
Does Emacs use -lxft? ${HAVE_XFT}
Does Emacs use -lsystemd? ${HAVE_LIBSYSTEMD}
+ Does Emacs use -ljansson? ${HAVE_JSON}
Does Emacs directly use zlib? ${HAVE_ZLIB}
Does Emacs have dynamic modules support? ${HAVE_MODULES}
Does Emacs use toolkit scroll bars? ${USE_TOOLKIT_SCROLL_BARS}
diff --git a/etc/NEWS b/etc/NEWS
index 75a98d1500..4a9976828b 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -25,6 +25,13 @@ When you add a new item, use the appropriate mark if you are sure it applies,
\f
* Installation Changes in Emacs 27.1
+** The new configure option '--with-json' adds support for JSON using
+the Jansson library. It is on by default; use 'configure
+--with-json=no' to build without Jansson support. The new JSON
+functions 'json-serialize', 'json-insert', 'json-parse-string', and
+'json-parse-buffer' are typically much faster than their Lisp
+counterparts from json.el.
+
\f
* Startup Changes in Emacs 27.1
diff --git a/src/Makefile.in b/src/Makefile.in
index 9a8c9c85f0..b395627893 100644
--- a/src/Makefile.in
+++ b/src/Makefile.in
@@ -312,6 +312,10 @@ LIBGNUTLS_CFLAGS =
LIBSYSTEMD_LIBS = @LIBSYSTEMD_LIBS@
LIBSYSTEMD_CFLAGS = @LIBSYSTEMD_CFLAGS@
+JSON_LIBS = @JSON_LIBS@
+JSON_CFLAGS = @JSON_CFLAGS@
+JSON_OBJ = @JSON_OBJ@
+
INTERVALS_H = dispextern.h intervals.h composite.h
GETLOADAVG_LIBS = @GETLOADAVG_LIBS@
@@ -363,7 +367,7 @@ EMACS_CFLAGS=
$(WEBKIT_CFLAGS) \
$(SETTINGS_CFLAGS) $(FREETYPE_CFLAGS) $(FONTCONFIG_CFLAGS) \
$(LIBOTF_CFLAGS) $(M17N_FLT_CFLAGS) $(DEPFLAGS) \
- $(LIBSYSTEMD_CFLAGS) \
+ $(LIBSYSTEMD_CFLAGS) $(JSON_CFLAGS) \
$(LIBGNUTLS_CFLAGS) $(NOTIFY_CFLAGS) $(CAIRO_CFLAGS) \
$(WERROR_CFLAGS)
ALL_CFLAGS = $(EMACS_CFLAGS) $(WARN_CFLAGS) $(CFLAGS)
@@ -397,7 +401,7 @@ base_obj =
thread.o systhread.o \
$(if $(HYBRID_MALLOC),sheap.o) \
$(MSDOS_OBJ) $(MSDOS_X_OBJ) $(NS_OBJ) $(CYGWIN_OBJ) $(FONT_OBJ) \
- $(W32_OBJ) $(WINDOW_SYSTEM_OBJ) $(XGSELOBJ)
+ $(W32_OBJ) $(WINDOW_SYSTEM_OBJ) $(XGSELOBJ) $(JSON_OBJ)
obj = $(base_obj) $(NS_OBJC_OBJ)
## Object files used on some machine or other.
@@ -493,7 +497,8 @@ LIBES =
$(LIBS_TERMCAP) $(GETLOADAVG_LIBS) $(SETTINGS_LIBS) $(LIBSELINUX_LIBS) \
$(FREETYPE_LIBS) $(FONTCONFIG_LIBS) $(LIBOTF_LIBS) $(M17N_FLT_LIBS) \
$(LIBGNUTLS_LIBS) $(LIB_PTHREAD) $(GETADDRINFO_A_LIBS) $(LIBLCMS2) \
- $(NOTIFY_LIBS) $(LIB_MATH) $(LIBZ) $(LIBMODULES) $(LIBSYSTEMD_LIBS)
+ $(NOTIFY_LIBS) $(LIB_MATH) $(LIBZ) $(LIBMODULES) $(LIBSYSTEMD_LIBS) \
+ $(JSON_LIBS)
## FORCE it so that admin/unidata can decide whether these files
## are up-to-date. Although since charprop depends on bootstrap-emacs,
diff --git a/src/conf_post.h b/src/conf_post.h
index febdb8b8bf..1a7f51fa51 100644
--- a/src/conf_post.h
+++ b/src/conf_post.h
@@ -338,6 +338,12 @@ extern int emacs_setenv_TZ (char const *);
# define ATTRIBUTE_NO_SANITIZE_ADDRESS
#endif
+#if __has_attribute (warn_unused_result)
+# define ATTRIBUTE_WARN_UNUSED_RESULT __attribute__ ((__warn_unused_result__))
+#else
+# define ATTRIBUTE_WARN_UNUSED_RESULT
+#endif
+
/* gcc -fsanitize=address does not work with vfork in Fedora 25 x86-64.
For now, assume that this problem occurs on all platforms. */
#if ADDRESS_SANITIZER && !defined vfork
diff --git a/src/emacs.c b/src/emacs.c
index 0fe7d9113b..39761016ef 100644
--- a/src/emacs.c
+++ b/src/emacs.c
@@ -1610,6 +1610,10 @@ Using an Emacs configured with --with-x-toolkit=lucid does not have this problem
syms_of_threads ();
syms_of_profiler ();
+#ifdef HAVE_JSON
+ syms_of_json ();
+#endif
+
keys_of_casefiddle ();
keys_of_cmds ();
keys_of_buffer ();
diff --git a/src/eval.c b/src/eval.c
index acda64e7f0..11804d1819 100644
--- a/src/eval.c
+++ b/src/eval.c
@@ -1416,6 +1416,57 @@ internal_condition_case_n (Lisp_Object (*bfun) (ptrdiff_t, Lisp_Object *),
}
}
+static Lisp_Object
+internal_catch_all_1 (Lisp_Object (*function) (void *), void *argument)
+{
+ struct handler *c = push_handler_nosignal (Qt, CATCHER_ALL);
+ if (c == NULL)
+ return Qcatch_all_memory_full;
+
+ if (sys_setjmp (c->jmp) == 0)
+ {
+ Lisp_Object val = function (argument);
+ eassert (handlerlist == c);
+ handlerlist = c->next;
+ return val;
+ }
+ else
+ {
+ eassert (handlerlist == c);
+ Lisp_Object val = c->val;
+ handlerlist = c->next;
+ Fsignal (Qno_catch, val);
+ }
+}
+
+/* Like a combination of internal_condition_case_1 and internal_catch.
+ Catches all signals and throws. Never exits nonlocally; returns
+ Qcatch_all_memory_full if no handler could be allocated. */
+
+Lisp_Object
+internal_catch_all (Lisp_Object (*function) (void *), void *argument,
+ Lisp_Object (*handler) (Lisp_Object))
+{
+ struct handler *c = push_handler_nosignal (Qt, CONDITION_CASE);
+ if (c == NULL)
+ return Qcatch_all_memory_full;
+
+ if (sys_setjmp (c->jmp) == 0)
+ {
+ Lisp_Object val = internal_catch_all_1 (function, argument);
+ eassert (handlerlist == c);
+ handlerlist = c->next;
+ return val;
+ }
+ else
+ {
+ eassert (handlerlist == c);
+ Lisp_Object val = c->val;
+ handlerlist = c->next;
+ return handler (val);
+ }
+}
+
struct handler *
push_handler (Lisp_Object tag_ch_val, enum handlertype handlertype)
{
@@ -4065,6 +4116,9 @@ alist of active lexical bindings. */);
inhibit_lisp_code = Qnil;
+ DEFSYM (Qcatch_all_memory_full, "catch-all-memory-full");
+ Funintern (Qcatch_all_memory_full, Qnil);
+
defsubr (&Sor);
defsubr (&Sand);
defsubr (&Sif);
diff --git a/src/json.c b/src/json.c
new file mode 100644
index 0000000000..f704479dd7
--- /dev/null
+++ b/src/json.c
@@ -0,0 +1,542 @@
+/* JSON parsing and serialization.
+
+Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GNU Emacs.
+
+GNU Emacs is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, either version 3 of the License, or (at
+your option) any later version.
+
+GNU Emacs is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. */
+
+#include <config.h>
+
+#include <stddef.h>
+#include <stdint.h>
+
+#include <jansson.h>
+
+#include "lisp.h"
+#include "buffer.h"
+#include "coding.h"
+
+static bool
+json_has_prefix (const char *string, const char *prefix)
+{
+ size_t string_len = strlen (string);
+ size_t prefix_len = strlen (prefix);
+ return string_len >= prefix_len && memcmp (string, prefix, prefix_len) == 0;
+}
+
+static bool
+json_has_suffix (const char *string, const char *suffix)
+{
+ size_t string_len = strlen (string);
+ size_t suffix_len = strlen (suffix);
+ return string_len >= suffix_len
+ && memcmp (string + string_len - suffix_len, suffix, suffix_len) == 0;
+}
+
+static Lisp_Object
+json_make_string (const char *data, ptrdiff_t size)
+{
+ return make_specified_string (data, -1, size, true);
+}
+
+static Lisp_Object
+json_build_string (const char *data)
+{
+ size_t size = strlen (data);
+ if (FIXNUM_OVERFLOW_P (size))
+ string_overflow ();
+ return json_make_string (data, size);
+}
+
+static Lisp_Object
+json_encode (Lisp_Object string)
+{
+ return code_convert_string (string, Qutf_8_unix, Qt, true, true, true);
+}
+
+static _Noreturn void
+json_out_of_memory (void)
+{
+ xsignal0 (Qjson_out_of_memory);
+}
+
+static _Noreturn void
+json_parse_error (const json_error_t *error)
+{
+ Lisp_Object symbol;
+ /* FIXME: Upstream Jansson should have a way to return error codes
+ without parsing the error messages. See
+ https://github.com/akheron/jansson/issues/352. */
+ if (json_has_suffix (error->text, "expected near end of file"))
+ symbol = Qjson_end_of_file;
+ else if (json_has_prefix (error->text, "end of file expected"))
+ symbol = Qjson_trailing_content;
+ else
+ symbol = Qjson_parse_error;
+ xsignal (symbol,
+ list5 (json_build_string (error->text),
+ json_build_string (error->source), make_natnum (error->line),
+ make_natnum (error->column), make_natnum (error->position)));
+}
+
+static void
+json_release_object (void *object)
+{
+ json_decref (object);
+}
+
+static void
+check_string_without_embedded_nulls (Lisp_Object object)
+{
+ CHECK_STRING (object);
+ CHECK_TYPE (memchr (SDATA (object), '\0', SBYTES (object)) == NULL,
+ Qstring_without_embedded_nulls_p, object);
+}
+
+static json_t *
+json_check (json_t *object)
+{
+ if (object == NULL)
+ json_out_of_memory ();
+ return object;
+}
+
+static json_t *lisp_to_json (Lisp_Object);
+
+/* This returns Lisp_Object so we can use unbind_to. The return value
+ is always nil. */
+
+static _GL_ARG_NONNULL ((2)) Lisp_Object
+lisp_to_json_toplevel_1 (Lisp_Object lisp, json_t **json)
+{
+ if (VECTORP (lisp))
+ {
+ ptrdiff_t size = ASIZE (lisp);
+ eassert (size >= 0);
+ eassert (size <= SIZE_MAX);
+ *json = json_check (json_array ());
+ ptrdiff_t count = SPECPDL_INDEX ();
+ record_unwind_protect_ptr (json_release_object, json);
+ for (ptrdiff_t i = 0; i < size; ++i)
+ {
+ int status
+ = json_array_append_new (*json, lisp_to_json (AREF (lisp, i)));
+ if (status == -1)
+ json_out_of_memory ();
+ eassert (status == 0);
+ }
+ eassert (json_array_size (*json) == size);
+ clear_unwind_protect (count);
+ return unbind_to (count, Qnil);
+ }
+ else if (HASH_TABLE_P (lisp))
+ {
+ struct Lisp_Hash_Table *h = XHASH_TABLE (lisp);
+ *json = json_check (json_object ());
+ ptrdiff_t count = SPECPDL_INDEX ();
+ record_unwind_protect_ptr (json_release_object, *json);
+ for (ptrdiff_t i = 0; i < HASH_TABLE_SIZE (h); ++i)
+ if (!NILP (HASH_HASH (h, i)))
+ {
+ Lisp_Object key = json_encode (HASH_KEY (h, i));
+ /* We can’t specify the length, so the string must be
+ null-terminated. */
+ check_string_without_embedded_nulls (key);
+ int status = json_object_set_new (*json, SSDATA (key),
+ lisp_to_json (HASH_VALUE (h, i)));
+ if (status == -1)
+ json_out_of_memory ();
+ eassert (status == 0);
+ }
+ clear_unwind_protect (count);
+ return unbind_to (count, Qnil);
+ }
+ wrong_type_argument (Qjson_value_p, lisp);
+}
+
+static json_t *
+lisp_to_json_toplevel (Lisp_Object lisp)
+{
+ if (++lisp_eval_depth > max_lisp_eval_depth)
+ xsignal0 (Qjson_object_too_deep);
+ json_t *json;
+ lisp_to_json_toplevel_1 (lisp, &json);
+ --lisp_eval_depth;
+ return json;
+}
+
+static json_t *
+lisp_to_json (Lisp_Object lisp)
+{
+ if (EQ (lisp, QCnull))
+ return json_check (json_null ());
+ else if (EQ (lisp, QCfalse))
+ return json_check (json_false ());
+ else if (EQ (lisp, Qt))
+ return json_check (json_true ());
+ else if (INTEGERP (lisp))
+ {
+ CHECK_TYPE_RANGED_INTEGER (json_int_t, lisp);
+ return json_check (json_integer (XINT (lisp)));
+ }
+ else if (FLOATP (lisp))
+ return json_check (json_real (XFLOAT_DATA (lisp)));
+ else if (STRINGP (lisp))
+ {
+ Lisp_Object encoded = json_encode (lisp);
+ ptrdiff_t size = SBYTES (encoded);
+ eassert (size >= 0);
+ eassert (size <= SIZE_MAX);
+ return json_check (json_stringn (SSDATA (encoded), size));
+ }
+
+ /* LISP now must be a vector or hashtable. */
+ return lisp_to_json_toplevel (lisp);
+}
+
+DEFUN ("json-serialize", Fjson_serialize, Sjson_serialize, 1, 1, NULL,
+ doc: /* Return the JSON representation of OBJECT as a string.
+OBJECT must be a vector or hashtable, and its elements can recursively
+contain `:null', `:false', t, numbers, strings, or other vectors and
+hashtables. `:null', `:false', and t will be converted to JSON null,
+false, and true values, respectively. Vectors will be converted to
+JSON arrays, and hashtables to JSON objects. Hashtable keys must be
+strings without embedded null characters and must be unique within
+each object. */)
+ (Lisp_Object object)
+{
+ ptrdiff_t count = SPECPDL_INDEX ();
+
+ json_t *json = lisp_to_json_toplevel (object);
+ record_unwind_protect_ptr (json_release_object, json);
+
+ char *string = json_dumps (json, JSON_COMPACT);
+ if (string == NULL)
+ json_out_of_memory ();
+ record_unwind_protect_ptr (free, string);
+
+ return unbind_to (count, json_build_string (string));
+}
+
+struct json_buffer_and_size
+{
+ const char *buffer;
+ size_t nbytes;
+ ptrdiff_t nchars;
+};
+
+static Lisp_Object
+json_insert (void *data)
+{
+ struct json_buffer_and_size *buffer_and_size = data;
+ size_t nbytes = buffer_and_size->nbytes;
+ if (nbytes == 0)
+ return Qnil;
+ if (nbytes > PTRDIFF_MAX)
+ buffer_overflow ();
+ const char *buffer = buffer_and_size->buffer;
+ ptrdiff_t nchars
+ = chars_in_text ((const unsigned char *) buffer_and_size->buffer, nbytes);
+ insert_1_both (buffer, nchars, nbytes, false, false, false);
+ buffer_and_size->nchars = nchars;
+ return Qnil;
+}
+
+struct json_insert_data
+{
+ /* nil if json_insert succeeded, otherwise the symbol
+ Qcatch_all_memory_full or a cons (ERROR-SYMBOL . ERROR-DATA). */
+ Lisp_Object error;
+ ptrdiff_t nchars;
+ ptrdiff_t nbytes;
+};
+
+static int
+json_insert_callback (const char *buffer, size_t size, void *data)
+{
+ /* This function may not exit nonlocally. */
+ struct json_insert_data *d = data;
+ struct json_buffer_and_size buffer_and_size
+ = {.buffer = buffer, .nbytes = size, .nchars = 0};
+ d->error = internal_catch_all (json_insert, &buffer_and_size, Fidentity);
+ bool overflow = INT_ADD_WRAPV (d->nchars, buffer_and_size.nchars, &d->nchars);
+ eassert (!overflow);
+ overflow = INT_ADD_WRAPV (d->nbytes, size, &d->nbytes);
+ eassert (!overflow);
+ return NILP (d->error) ? 0 : -1;
+}
+
+DEFUN ("json-insert", Fjson_insert, Sjson_insert, 1, 1, NULL,
+ doc: /* Insert the JSON representation of OBJECT before point.
+This is the same as (insert (json-serialize OBJECT)), but potentially
+faster. See the function `json-serialize' for allowed values of
+OBJECT. */)
+ (Lisp_Object object)
+{
+ ptrdiff_t count = SPECPDL_INDEX ();
+
+ json_t *json = lisp_to_json (object);
+ record_unwind_protect_ptr (json_release_object, json);
+
+ ptrdiff_t opoint = PT;
+ prepare_to_modify_buffer (opoint, opoint, NULL);
+
+ struct json_insert_data data;
+ int status
+ = json_dump_callback (json, json_insert_callback, &data, JSON_COMPACT);
+ if (status == -1)
+ {
+ if (CONSP (data.error))
+ xsignal (XCAR (data.error), XCDR (data.error));
+ else
+ json_out_of_memory ();
+ }
+ eassert (status == 0);
+ eassert (NILP (data.error));
+
+ struct coding_system coding;
+ setup_coding_system (Qutf_8_unix, &coding);
+ eassert (PT == GPT);
+ decode_coding_gap (&coding, data.nchars, data.nbytes);
+
+ signal_after_change (opoint, 0, data.nchars);
+ update_compositions (opoint, PT, CHECK_BORDER);
+ return unbind_to (count, Qnil);
+}
+
+static _GL_ARG_NONNULL ((1)) Lisp_Object
+json_to_lisp (json_t *json)
+{
+ switch (json_typeof (json))
+ {
+ case JSON_NULL:
+ return QCnull;
+ case JSON_FALSE:
+ return QCfalse;
+ case JSON_TRUE:
+ return Qt;
+ case JSON_INTEGER:
+ {
+ json_int_t value = json_integer_value (json);
+ if (FIXNUM_OVERFLOW_P (value))
+ xsignal0 (Qoverflow_error);
+ return make_number (value);
+ }
+ case JSON_REAL:
+ return make_float (json_real_value (json));
+ case JSON_STRING:
+ {
+ size_t size = json_string_length (json);
+ if (FIXNUM_OVERFLOW_P (size))
+ string_overflow ();
+ return json_make_string (json_string_value (json), size);
+ }
+ case JSON_ARRAY:
+ {
+ if (++lisp_eval_depth > max_lisp_eval_depth)
+ xsignal0 (Qjson_object_too_deep);
+ size_t size = json_array_size (json);
+ if (FIXNUM_OVERFLOW_P (size))
+ xsignal0 (Qoverflow_error);
+ Lisp_Object result = Fmake_vector (make_natnum (size), Qunbound);
+ for (ptrdiff_t i = 0; i < size; ++i)
+ ASET (result, i,
+ json_to_lisp (json_array_get (json, i)));
+ --lisp_eval_depth;
+ return result;
+ }
+ case JSON_OBJECT:
+ {
+ if (++lisp_eval_depth > max_lisp_eval_depth)
+ xsignal0 (Qjson_object_too_deep);
+ size_t size = json_object_size (json);
+ if (FIXNUM_OVERFLOW_P (size))
+ xsignal0 (Qoverflow_error);
+ Lisp_Object result = CALLN (Fmake_hash_table, QCtest, Qequal,
+ QCsize, make_natnum (size));
+ struct Lisp_Hash_Table *h = XHASH_TABLE (result);
+ const char *key_str;
+ json_t *value;
+ json_object_foreach (json, key_str, value)
+ {
+ Lisp_Object key = json_build_string (key_str);
+ EMACS_UINT hash;
+ ptrdiff_t i = hash_lookup (h, key, &hash);
+ eassert (i < 0);
+ hash_put (h, key, json_to_lisp (value), hash);
+ }
+ --lisp_eval_depth;
+ return result;
+ }
+ }
+ /* Can’t get here. */
+ emacs_abort ();
+}
+
+DEFUN ("json-parse-string", Fjson_parse_string, Sjson_parse_string, 1, 1, NULL,
+ doc: /* Parse the JSON STRING into a Lisp object.
+This is essentially the reverse operation of `json-serialize', which
+see. The returned object will be a vector or hashtable. Its elements
+will be `:null', `:false', t, numbers, strings, or further vectors and
+hashtables. If there are duplicate keys in an object, all but the
+last one are ignored. If STRING doesn't contain a valid JSON object,
+an error of type `json-parse-error' is signaled. */)
+ (Lisp_Object string)
+{
+ ptrdiff_t count = SPECPDL_INDEX ();
+ Lisp_Object encoded = json_encode (string);
+ check_string_without_embedded_nulls (encoded);
+
+ json_error_t error;
+ json_t *object = json_loads (SSDATA (encoded), 0, &error);
+ if (object == NULL)
+ json_parse_error (&error);
+
+ /* Avoid leaking the object in case of further errors. */
+ if (object != NULL)
+ record_unwind_protect_ptr (json_release_object, object);
+
+ return unbind_to (count, json_to_lisp (object));
+}
+
+struct json_read_buffer_data
+{
+ ptrdiff_t point;
+};
+
+static size_t
+json_read_buffer_callback (void *buffer, size_t buflen, void *data)
+{
+ struct json_read_buffer_data *d = data;
+
+ /* First, parse from point to the gap or the end of the accessible
+ portion, whatever is closer. */
+ ptrdiff_t point = d->point;
+ ptrdiff_t end;
+ {
+ bool overflow = INT_ADD_WRAPV (BUFFER_CEILING_OF (point), 1, &end);
+ eassert (!overflow);
+ }
+ size_t count;
+ {
+ bool overflow = INT_SUBTRACT_WRAPV (end, point, &count);
+ eassert (!overflow);
+ }
+ if (buflen < count)
+ count = buflen;
+ memcpy (buffer, BYTE_POS_ADDR (point), count);
+ {
+ bool overflow = INT_ADD_WRAPV (d->point, count, &d->point);
+ eassert (!overflow);
+ }
+ return count;
+}
+
+DEFUN ("json-parse-buffer", Fjson_parse_buffer, Sjson_parse_buffer,
+ 0, 0, NULL,
+ doc: /* Read JSON object from current buffer starting at point.
+This is similar to `json-parse-string', which see. Move point after
+the end of the object if parsing was successful. On error, point is
+not moved. */)
+ (void)
+{
+ ptrdiff_t count = SPECPDL_INDEX ();
+
+ ptrdiff_t point = PT_BYTE;
+ struct json_read_buffer_data data = {.point = point};
+ json_error_t error;
+ json_t *object = json_load_callback (json_read_buffer_callback, &data,
+ JSON_DISABLE_EOF_CHECK, &error);
+
+ if (object == NULL)
+ json_parse_error (&error);
+
+ /* Avoid leaking the object in case of further errors. */
+ record_unwind_protect_ptr (json_release_object, object);
+
+ /* Convert and then move point only if everything succeeded. */
+ Lisp_Object lisp = json_to_lisp (object);
+
+ {
+ /* Adjust point by how much we just read. Do this here because
+ tokener->char_offset becomes incorrect below. */
+ bool overflow = INT_ADD_WRAPV (point, error.position, &point);
+ eassert (!overflow);
+ eassert (point <= ZV_BYTE);
+ SET_PT_BOTH (BYTE_TO_CHAR (point), point);
+ }
+
+ return unbind_to (count, lisp);
+}
+
+/* Simplified version of ‘define-error’ that works with pure
+ objects. */
+
+static void
+define_error (Lisp_Object name, const char *message, Lisp_Object parent)
+{
+ eassert (SYMBOLP (name));
+ eassert (SYMBOLP (parent));
+ Lisp_Object parent_conditions = Fget (parent, Qerror_conditions);
+ eassert (CONSP (parent_conditions));
+ eassert (!NILP (Fmemq (parent, parent_conditions)));
+ eassert (NILP (Fmemq (name, parent_conditions)));
+ Fput (name, Qerror_conditions, pure_cons (name, parent_conditions));
+ Fput (name, Qerror_message, build_pure_c_string (message));
+}
+
+void
+syms_of_json (void)
+{
+ DEFSYM (QCnull, ":null");
+ DEFSYM (QCfalse, ":false");
+
+ DEFSYM (Qstring_without_embedded_nulls_p, "string-without-embedded-nulls-p");
+ DEFSYM (Qjson_value_p, "json-value-p");
+
+ DEFSYM (Qutf_8_unix, "utf-8-unix");
+
+ DEFSYM (Qjson_error, "json-error");
+ DEFSYM (Qjson_out_of_memory, "json-out-of-memory");
+ DEFSYM (Qjson_parse_error, "json-parse-error");
+ DEFSYM (Qjson_end_of_file, "json-end-of-file");
+ DEFSYM (Qjson_trailing_content, "json-trailing-content");
+ DEFSYM (Qjson_object_too_deep, "json-object-too-deep");
+ define_error (Qjson_error, "generic JSON error", Qerror);
+ define_error (Qjson_out_of_memory,
+ "not enough memory for creating JSON object", Qjson_error);
+ define_error (Qjson_parse_error, "could not parse JSON stream",
+ Qjson_error);
+ define_error (Qjson_end_of_file, "end of JSON stream", Qjson_parse_error);
+ define_error (Qjson_trailing_content, "trailing content after JSON stream",
+ Qjson_parse_error);
+ define_error (Qjson_object_too_deep,
+ "object cyclic or Lisp evaluation too deep", Qjson_error);
+
+ DEFSYM (Qpure, "pure");
+ DEFSYM (Qside_effect_free, "side-effect-free");
+
+ DEFSYM (Qjson_serialize, "json-serialize");
+ DEFSYM (Qjson_parse_string, "json-parse-string");
+ Fput (Qjson_serialize, Qpure, Qt);
+ Fput (Qjson_serialize, Qside_effect_free, Qt);
+ Fput (Qjson_parse_string, Qpure, Qt);
+ Fput (Qjson_parse_string, Qside_effect_free, Qt);
+
+ defsubr (&Sjson_serialize);
+ defsubr (&Sjson_insert);
+ defsubr (&Sjson_parse_string);
+ defsubr (&Sjson_parse_buffer);
+}
diff --git a/src/lisp.h b/src/lisp.h
index 266370333f..3ab8fdc4f3 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -3441,6 +3441,11 @@ extern int x_bitmap_mask (struct frame *, ptrdiff_t);
extern void reset_image_types (void);
extern void syms_of_image (void);
+#ifdef HAVE_JSON
+/* Defined in json.c. */
+extern void syms_of_json (void);
+#endif
+
/* Defined in insdel.c. */
extern void move_gap_both (ptrdiff_t, ptrdiff_t);
extern _Noreturn void buffer_overflow (void);
@@ -3864,6 +3869,7 @@ extern Lisp_Object internal_condition_case_2 (Lisp_Object (*) (Lisp_Object, Lisp
extern Lisp_Object internal_condition_case_n
(Lisp_Object (*) (ptrdiff_t, Lisp_Object *), ptrdiff_t, Lisp_Object *,
Lisp_Object, Lisp_Object (*) (Lisp_Object, ptrdiff_t, Lisp_Object *));
+extern Lisp_Object internal_catch_all (Lisp_Object (*) (void *), void *, Lisp_Object (*) (Lisp_Object));
extern struct handler *push_handler (Lisp_Object, enum handlertype);
extern struct handler *push_handler_nosignal (Lisp_Object, enum handlertype);
extern void specbind (Lisp_Object, Lisp_Object);
diff --git a/test/src/json-tests.el b/test/src/json-tests.el
new file mode 100644
index 0000000000..5d3c84a136
--- /dev/null
+++ b/test/src/json-tests.el
@@ -0,0 +1,97 @@
+;;; json-tests.el --- unit tests for json.c -*- lexical-binding: t; -*-
+
+;; Copyright (C) 2017 Free Software Foundation, Inc.
+
+;; This file is part of GNU Emacs.
+
+;; GNU Emacs is free software: you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation, either version 3 of the License, or
+;; (at your option) any later version.
+
+;; GNU Emacs is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>.
+
+;;; Commentary:
+
+;; Unit tests for src/json.c.
+
+;;; Code:
+
+(require 'cl-lib)
+(require 'map)
+
+(ert-deftest json-serialize/roundtrip ()
+ (let ((lisp [:null :false t 0 123 -456 3.75 "abcαβγ"])
+ (json "[null,false,true,0,123,-456,3.75,\"abcαβγ\"]"))
+ (should (equal (json-serialize lisp) json))
+ (with-temp-buffer
+ (json-insert lisp)
+ (should (equal (buffer-string) json))
+ (should (eobp)))
+ (should (equal (json-parse-string json) lisp))
+ (with-temp-buffer
+ (insert json)
+ (goto-char 1)
+ (should (equal (json-parse-buffer) lisp))
+ (should (eobp)))))
+
+(ert-deftest json-serialize/object ()
+ (let ((table (make-hash-table :test #'equal)))
+ (puthash "abc" [1 2 t] table)
+ (puthash "def" :null table)
+ (should (equal (json-serialize table)
+ "{\"abc\":[1,2,true],\"def\":null}"))))
+
+(ert-deftest json-parse-string/object ()
+ (let ((actual
+ (json-parse-string
+ "{ \"abc\" : [1, 2, true], \"def\" : null, \"abc\" : [9, false] }\n")))
+ (should (hash-table-p actual))
+ (should (equal (hash-table-count actual) 2))
+ (should (equal (cl-sort (map-pairs actual) #'string< :key #'car)
+ '(("abc" . [9 :false]) ("def" . :null))))))
+
+(ert-deftest json-parse-string/string ()
+ (should-error (json-parse-string "[\"formfeed\f\"]") :type 'json-parse-error)
+ (should (equal (json-parse-string "[\"foo \\\"bar\\\"\"]") ["foo \"bar\""]))
+ (should (equal (json-parse-string "[\"abcαβγ\"]") ["abcαβγ"]))
+ (should (equal (json-parse-string "[\"\\nasd\\u0444\\u044b\\u0432fgh\\t\"]")
+ ["\nasdфывfgh\t"]))
+ (should (equal (json-parse-string "[\"\\uD834\\uDD1E\"]") ["\U0001D11E"]))
+ (should-error (json-parse-string "foo") :type 'json-parse-error))
+
+(ert-deftest json-serialize/string ()
+ (should (equal (json-serialize ["foo"]) "[\"foo\"]"))
+ (should (equal (json-serialize ["a\n\fb"]) "[\"a\\n\\fb\"]"))
+ (should (equal (json-serialize ["\nasdфыв\u001f\u007ffgh\t"])
+ "[\"\\nasdфыв\\u001F\u007ffgh\\t\"]")))
+
+(ert-deftest json-parse-string/incomplete ()
+ (should-error (json-parse-string "[123") :type 'json-end-of-file))
+
+(ert-deftest json-parse-string/trailing ()
+ (should-error (json-parse-string "[123] [456]") :type 'json-trailing-content))
+
+(ert-deftest json-parse-buffer/incomplete ()
+ (with-temp-buffer
+ (insert "[123")
+ (goto-char 1)
+ (should-error (json-parse-buffer) :type 'json-end-of-file)
+ (should (bobp))))
+
+(ert-deftest json-parse-buffer/trailing ()
+ (with-temp-buffer
+ (insert "[123] [456]")
+ (goto-char 1)
+ (should (equal (json-parse-buffer) [123]))
+ (should-not (bobp))
+ (should (looking-at-p (rx " [456]" eos)))))
+
+(provide 'json-tests)
+;;; json-tests.el ends here
--
2.14.2
^ permalink raw reply related [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-08 17:58 ` Philipp Stephani
@ 2017-10-08 18:42 ` Eli Zaretskii
2017-10-08 23:14 ` Philipp Stephani
2017-10-09 6:22 ` Paul Eggert
1 sibling, 1 reply; 81+ messages in thread
From: Eli Zaretskii @ 2017-10-08 18:42 UTC (permalink / raw)
To: Philipp Stephani; +Cc: eggert, emacs-devel
> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Sun, 08 Oct 2017 17:58:14 +0000
> Cc: emacs-devel@gnu.org
>
> I've attached a new patch (which currently segfaults on decode_coding_gap, but the call to that function
> doesn't seem to be required anyway).
I think decode_coding_gap might segfault because you use insert_1_both
to put the text into the buffer. By contrast, decode_coding_gap
assumes you've copied the bytes into the gap, like we do in
Finsert_file_contents.
I don't think we can bypass decoding in this case; the reason you
think it isn't needed is that you have only tried it with valid UTF-8
encoded text. We never rely on that elsewhere, AFAIK.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-05 1:48 ` Paul Eggert
2017-10-05 7:14 ` Eli Zaretskii
@ 2017-10-08 22:52 ` Philipp Stephani
2017-10-09 5:54 ` Paul Eggert
2017-10-09 6:38 ` Eli Zaretskii
1 sibling, 2 replies; 81+ messages in thread
From: Philipp Stephani @ 2017-10-08 22:52 UTC (permalink / raw)
To: Paul Eggert, Eli Zaretskii; +Cc: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 690 bytes --]
Paul Eggert <eggert@cs.ucla.edu> schrieb am Do., 5. Okt. 2017 um 03:48 Uhr:
>
> One idea here is
> that there is no need for eassert (E) unless there's a genuine doubt
> that E will be true (in some cases the removed eassert (E) calls were
> ineffective anyway, due to preceding eassume (E) calls). The patch cuts
> down the number of integer overflow checks to six in json.c, if I'm
> counting correctly, and that should be good enough.
>
>
I don't understand why minimizing the number of checks and assertions
should be a worthwhile goal. At the very least, the assertions document the
assumptions that we make about the values, and as such they are valuable
even if they never trigger.
[-- Attachment #2: Type: text/html, Size: 996 bytes --]
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-05 7:12 ` Eli Zaretskii
2017-10-06 1:58 ` Paul Eggert
@ 2017-10-08 23:04 ` Philipp Stephani
2017-10-09 6:47 ` Eli Zaretskii
1 sibling, 1 reply; 81+ messages in thread
From: Philipp Stephani @ 2017-10-08 23:04 UTC (permalink / raw)
To: Eli Zaretskii, Paul Eggert; +Cc: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 4019 bytes --]
Eli Zaretskii <eliz@gnu.org> schrieb am Do., 5. Okt. 2017 um 09:12 Uhr:
> > Cc: p.stephani2@gmail.com, emacs-devel@gnu.org
> > From: Paul Eggert <eggert@cs.ucla.edu>
> > Date: Wed, 4 Oct 2017 14:24:59 -0700
> >
> > On 10/04/2017 12:38 PM, Eli Zaretskii wrote:
> > > if we did use size_t for the arguments which can clearly only be
> > > non-negative, the problems which we are discussing would not have
> > > happened
> > Sure, but we would also have worse problems, as size_t is inherently
> > more error-prone. ptrdiff_t overflows are reliably diagnosed when Emacs
> > is compiled with suitable GCC compiler options. size_t overflows cannot
> > be diagnosed, are all too common, and can cause serious trouble.
>
> If ptrdiff_t overflows are reliably diagnosed, then why do we have to
> test for them explicitly in our code, as in the proposed json.c?
> AFAIU, ptrdiff_t overflows are the _only_ reason for json.c checks
> whether a size_t value is too large, because similar checks for
> ptrdiff_t values are already in the low-level subroutines involved in
> creating Lisp objects. So why couldn't those checks be avoided by
> simply assigning to a ptrdiff_t variables?
Signed integer overflow is only diagnosed when compiling with -ftrapv or
-fsanitize=undefined, which normally doesn't happen in production builds.
IIUC lossy numeric conversions are never diagnosed because they are not
undefined behavior, so we always need to check for them explicitly.
We also need to strictly distinguish between the case where an overflow is
a bug (incorrect assumptions in the Emacs source code itself) and the case
where it is a legitimate outcome due to user input (in this case, return
values from Jansson that overflow ptrdiff_t). The two cases need to be
treated differently: in the first case we can say it's UB and rely on
(hopefully regular and comprehensive) UBSan runs; in the second case we
need explicit checks and normal signalling as opposed to assertions.
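To make the distinction concrete, here is a minimal standalone sketch
(plain C with made-up names, not the actual json.c code; plain assert
stands in for eassert):

  #include <assert.h>
  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  /* Second case: SIZE comes from an external library (e.g. a value
     reported by Jansson), so exceeding PTRDIFF_MAX is a legitimate
     runtime condition.  Check explicitly and let the caller signal
     an error.  */
  static bool
  checked_size_to_ptrdiff (size_t size, ptrdiff_t *out)
  {
    if (size > PTRDIFF_MAX)
      return false;
    *out = (ptrdiff_t) size;
    return true;
  }

  /* First case: SIZE is computed by our own code and is known to be
     in range; a violation would be a bug in Emacs itself, so an
     assertion documents the invariant and checks it in debug builds
     only.  */
  static ptrdiff_t
  internal_size_to_ptrdiff (size_t size)
  {
    assert (size <= PTRDIFF_MAX);
    return (ptrdiff_t) size;
  }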
>
> > The Emacs internals occasionally use size_t because underlying
> > primitives like 'malloc' do, so we do make some exceptions. Perhaps
> > there should be an exception here, for convenience with the JSON
> > library. The code snippets I've seen so far in this thread are not
> > enough context to judge whether an exception would be helpful in this
> > case. Generally speaking, though, unsigned types should be avoided
> > because they are more error-prone. This has long been the style in Emacs
> > internals, and it's served us well.
>
> I'm not arguing for general replacement of ptrdiff_t with size_t, only
> for doing that in those primitives where negative values are a clear
> mistake/bug.
>
> For example, let's take this case from your proposed changes:
>
> static Lisp_Object
> -json_make_string (const char *data, ptrdiff_t size)
> +json_make_string (const char *data, size_t size)
> {
> + if (PTRDIFF_MAX < size)
> + string_overflow ();
> return make_specified_string (data, -1, size, true);
> }
>
> If we were to change make_specified_string (and its subroutines, like
> make_uninit_multibyte_string etc.) to accept a size_t value in its 3rd
> argument, the need for the above check against PTRDIFF_MAX would
> disappear.
>
It wouldn't disappear; it would merely be shifted around. Arguments could
be made for either choice, but somewhere the check needs to happen.
It would also make the higher-level code more
> reliable, because application-level programmers will not need to
> understand all the non-trivial intricacies of this stuff.
Every C programmer needs to understand these issues, no matter what
codebase they are working with. Lossy integral conversions are fundamental
design choices of the C language that can't be avoided.
C is a nasty language full of traps. You can try to hide some of the traps,
but you can't remove them.
(Arguably I've yet to come across a language that doesn't have nasty
integer types. Python 3, with integers auto-upgrading to bignums, might be
closest.)
[-- Attachment #2: Type: text/html, Size: 5117 bytes --]
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-06 1:58 ` Paul Eggert
2017-10-06 7:40 ` Eli Zaretskii
@ 2017-10-08 23:09 ` Philipp Stephani
2017-10-09 6:19 ` Paul Eggert
1 sibling, 1 reply; 81+ messages in thread
From: Philipp Stephani @ 2017-10-08 23:09 UTC (permalink / raw)
To: Paul Eggert, Eli Zaretskii; +Cc: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 405 bytes --]
Paul Eggert <eggert@cs.ucla.edu> schrieb am Fr., 6. Okt. 2017 um 03:58 Uhr:
> The basic idea here is that json.c should be using xmalloc for
> allocation anyway, for reasons other than size overflow checking.
>
>
I don't think Jansson can use xmalloc because xmalloc can exit nonlocally,
which is not expected by a third-party library such as Jansson. It could
use a suitable wrapper of lmalloc, though.
[-- Attachment #2: Type: text/html, Size: 689 bytes --]
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-08 18:42 ` Eli Zaretskii
@ 2017-10-08 23:14 ` Philipp Stephani
2017-10-09 6:53 ` Eli Zaretskii
0 siblings, 1 reply; 81+ messages in thread
From: Philipp Stephani @ 2017-10-08 23:14 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: eggert, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 816 bytes --]
Eli Zaretskii <eliz@gnu.org> schrieb am So., 8. Okt. 2017 um 20:42 Uhr:
>
> I don't think we can bypass decoding in this case; the reason why you
> think it isn't needed is because you only tried that with valid UTF-8
> encoded text. We never rely on that elsewhere, AFAIK.
>
Jansson only accepts UTF-8 strings, and at least in our usage will also
only hand out UTF-8 strings.
I'd rather not include coding steps: given their complexity, they are very
slow and make the code much more complex than necessary. Since we only deal
with UTF-8 strings, and we can assume that Emacs strings are a superset of
UTF-8 strings, we can completely avoid any coding operations.
It's totally OK to rely on this assumption since all code that's involved
here is part of the Emacs core, so it can rely on implementation details.
[-- Attachment #2: Type: text/html, Size: 1131 bytes --]
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-08 22:52 ` Philipp Stephani
@ 2017-10-09 5:54 ` Paul Eggert
2017-10-29 20:48 ` Philipp Stephani
2017-10-09 6:38 ` Eli Zaretskii
1 sibling, 1 reply; 81+ messages in thread
From: Paul Eggert @ 2017-10-09 5:54 UTC (permalink / raw)
To: Philipp Stephani, Eli Zaretskii; +Cc: emacs-devel
Philipp Stephani wrote:
> I don't understand why minimizing the number of checks and assertions
> should be a worthwhile goal. At the very least, the assertions document the
> assumptions that we make about the values, and as such they are valuable
> even if they never trigger.
One can take the process too far. To take a deliberately extreme example,
'eassert (INT_MIN < 0)' would clutter the code unnecessarily, and would be
discarded by the compiler anyway. Although none of the assertions in question
were *that* obvious, some did have that flavor (and indeed, were optimized away
by GCC). The patch that I proposed eliminated those, while retaining the ones
that conveyed useful and nonobvious information. Admittedly some of the removals
were judgment calls; however, the point remains that easserts should not waste
the reader's time unnecessarily.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-08 23:09 ` Philipp Stephani
@ 2017-10-09 6:19 ` Paul Eggert
2017-10-29 20:48 ` Philipp Stephani
0 siblings, 1 reply; 81+ messages in thread
From: Paul Eggert @ 2017-10-09 6:19 UTC (permalink / raw)
To: Philipp Stephani, Eli Zaretskii; +Cc: emacs-devel
Philipp Stephani wrote:
> I don't think Jansson can use xmalloc because xmalloc can exit nonlocally,
> which is not expected by a third-party library such as Jansson. It could
> use a suitable wrapper of lmalloc, though.
That would be overkill, as lmalloc arranges for Lisp alignment, which Jansson
does not need. We could define new functions (smalloc and srealloc, say), that
act like malloc and realloc except they return NULL for requests larger than
PTRDIFF_MAX. Right now, I expect only the JSON code needs this sort of thing so
we could put the new functions in json.c. If other code needs it later we could
move these new functions to alloc.c.
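For concreteness, such wrappers could look roughly like this (a sketch
only; smalloc/srealloc are the proposed names, not existing Emacs
functions):

  #include <errno.h>
  #include <stdint.h>
  #include <stdlib.h>

  /* Like malloc, but refuse requests larger than PTRDIFF_MAX, so that
     any object handed out has a size representable as ptrdiff_t.  */
  static void *
  smalloc (size_t size)
  {
    if (size > PTRDIFF_MAX)
      {
        errno = ENOMEM;
        return NULL;
      }
    return malloc (size);
  }

  /* Like realloc, with the same PTRDIFF_MAX cap.  */
  static void *
  srealloc (void *ptr, size_t size)
  {
    if (size > PTRDIFF_MAX)
      {
        errno = ENOMEM;
        return NULL;
      }
    return realloc (ptr, size);
  }

Callers would then treat a NULL return as an out-of-memory condition,
exactly as they already do for malloc and realloc.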
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-08 17:58 ` Philipp Stephani
2017-10-08 18:42 ` Eli Zaretskii
@ 2017-10-09 6:22 ` Paul Eggert
1 sibling, 0 replies; 81+ messages in thread
From: Paul Eggert @ 2017-10-09 6:22 UTC (permalink / raw)
To: Philipp Stephani, Eli Zaretskii; +Cc: emacs-devel
Philipp Stephani wrote:
> Should I at least add an eassert to document this?
I wouldn't. Many calls to memory allocators would have problems if they request
more than PTRDIFF_MAX bytes, given the problems that C programs have when doing
pointer arithmetic on large objects. It would be a waste of time to document
this in every call by doing an eassert. Simply calling a memory allocator that
is documented not to return such objects should make it clear to the reader
what is going on.
> I've attached a new patch (which currently segfaults on
> decode_coding_gap, but the call to that function doesn't seem to be
> required anyway).
Thanks, I plan to take a look at it after the decode_coding_gap issue is addressed.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-08 22:52 ` Philipp Stephani
2017-10-09 5:54 ` Paul Eggert
@ 2017-10-09 6:38 ` Eli Zaretskii
1 sibling, 0 replies; 81+ messages in thread
From: Eli Zaretskii @ 2017-10-09 6:38 UTC (permalink / raw)
To: Philipp Stephani; +Cc: eggert, emacs-devel
> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Sun, 08 Oct 2017 22:52:36 +0000
> Cc: emacs-devel@gnu.org
>
> Paul Eggert <eggert@cs.ucla.edu> schrieb am Do., 5. Okt. 2017 um 03:48 Uhr:
>
> One idea here is
> that there is no need for eassert (E) unless there's a genuine doubt
> that E will be true (in some cases the removed eassert (E) calls were
> ineffective anyway, due to preceding eassume (E) calls). The patch cuts
> down the number of integer overflow checks to six in json.c, if I'm
> counting correctly, and that should be good enough.
>
> I don't understand why minimizing the number of checks and assertions should be a worthwhile goal. At the
> very least, the assertions document the assumptions that we make about the values, and as such they are
> valuable even if they never trigger.
My criterion for adding eassert(E) is that the following conditions
all hold:
. the code below it assumes E is true
. the code below it will not work correctly if E is false, but will
not immediately abort or segfault (which also means the code has
no defenses against E being false)
. E being false "should never happen", i.e. it's deemed impossible
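For instance (a made-up example, using plain assert in place of
eassert so that it stands alone):

  #include <assert.h>
  #include <string.h>

  /* The code below assumes BUF is NUL-terminated (first condition).
     If it were not, strlen would quietly read past the array instead
     of aborting right away (second condition).  And every caller is
     written to terminate the buffer, so a violation should never
     happen (third condition).  */
  static size_t
  field_length (const char buf[64])
  {
    assert (memchr (buf, '\0', 64) != NULL);
    return strlen (buf);
  }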
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-08 23:04 ` Philipp Stephani
@ 2017-10-09 6:47 ` Eli Zaretskii
0 siblings, 0 replies; 81+ messages in thread
From: Eli Zaretskii @ 2017-10-09 6:47 UTC (permalink / raw)
To: Philipp Stephani; +Cc: eggert, emacs-devel
> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Sun, 08 Oct 2017 23:04:10 +0000
> Cc: emacs-devel@gnu.org
>
> If we were to change make_specified_string (and its subroutines, like
> make_uninit_multibyte_string etc.) to accept a size_t value in its 3rd
> argument, the need for the above check against PTRDIFF_MAX would
> disappear.
>
> It wouldn't disappear, it would merely be shifted around. Arguments could be made for either choice, but
> somewhere the check needs to happen.
No, the need will entirely disappear, because
make_uninit_multibyte_string already has a similar check; it just uses
signed types for it currently.
> It would also make the higher-level code more
> reliable, because application-level programmers will not need to
> understand all the non-trivial intricacies of this stuff.
>
> Every C programmer needs to understand these issues, no matter what codebase they are working with.
I agree that it's very useful to understand these issues, but by
requiring "every C programmer" to understand them, you are raising the
bar for contributing to Emacs's C code considerably and IMO without
any good reason. A C programmer for Emacs who needs to use primitives
that create Lisp objects should expect those primitives to be safe,
just like you expect libc function to be safe, once they pass the
compilation with -Wall.
> Lossy integral conversions are fundamental design choices of the C language that can't be avoided.
They can and should be avoided on the level of json.c, as long as the
code itself doesn't do anything silly, which will then be flagged by
the compiler.
> C is a nasty language full of traps.
I certainly disagree with that, having programmed in C for 35 years.
> You can try to hide some of the traps, but you can't remove them.
I want to hide them, as much as possible, from application-level code.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-08 23:14 ` Philipp Stephani
@ 2017-10-09 6:53 ` Eli Zaretskii
2017-10-29 20:41 ` Philipp Stephani
0 siblings, 1 reply; 81+ messages in thread
From: Eli Zaretskii @ 2017-10-09 6:53 UTC (permalink / raw)
To: Philipp Stephani; +Cc: eggert, emacs-devel
> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Sun, 08 Oct 2017 23:14:18 +0000
> Cc: eggert@cs.ucla.edu, emacs-devel@gnu.org
>
> Jansson only accepts UTF-8 strings, and at least in our usage will also only hand out UTF-8 strings.
How can we be 100% sure of that? We don't trust any other libraries
with such high fidelity; we always decode any external data.
> It's totally OK to rely on this assumption since all code that's involved here is part of the Emacs core, so it can
> rely on implementation details.
That is in stark contrast with your usual coding style, which tends to
place checks and assertions where they are not always needed. Could
it be that you underestimate the damage that a broken non-ASCII byte
stream can cause Emacs if inserted directly into a buffer or a string?
Doing so will usually cause Emacs to die a horrible death quite soon,
because code that processes buffer or string text has no defenses
against such calamities.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-09 6:53 ` Eli Zaretskii
@ 2017-10-29 20:41 ` Philipp Stephani
0 siblings, 0 replies; 81+ messages in thread
From: Philipp Stephani @ 2017-10-29 20:41 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: eggert, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1661 bytes --]
Eli Zaretskii <eliz@gnu.org> schrieb am Mo., 9. Okt. 2017 um 08:54 Uhr:
> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Sun, 08 Oct 2017 23:14:18 +0000
> > Cc: eggert@cs.ucla.edu, emacs-devel@gnu.org
> >
> > Jansson only accepts UTF-8 strings, and at least in our usage will also
> only hand out UTF-8 strings.
>
> How can we be 100% sure of that? We don't trust any other libraries
> with such high fidelity, we always decode any external data.
>
We also trust glibc's malloc to never return overlapping non-freed blocks,
right? This "trust" isn't different. Of course we can assume that libraries
behave according to their specification.
>
> > It's totally OK to rely on this assumption since all code that's
> involved here is part of the Emacs core, so it can
> > rely on implementation details.
>
> That is in stark contrast with your usual coding style, which tends to
> place checks and assertions where they are not always needed.
I wouldn't mind placing an assertion here as well. An assertion primarily
documents the assumptions made in the code and as a side effect is also
tested in debug builds. It's generally a good idea to add such
documentation.
> Could
> it be that you underestimate the damage that broken non-ASCII byte
> stream can cause Emacs if inserted directly into a buffer or a string?
> Doing so will usually cause Emacs die a horrible death quite soon,
> because code that processes buffer or string text has no defenses
> against such calamities.
>
If and when such a bug happens, we can work around it (after filing a bug
against Jansson). But we can't work around merely potential bugs in
libraries; see above.
[-- Attachment #2: Type: text/html, Size: 2515 bytes --]
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-09 5:54 ` Paul Eggert
@ 2017-10-29 20:48 ` Philipp Stephani
0 siblings, 0 replies; 81+ messages in thread
From: Philipp Stephani @ 2017-10-29 20:48 UTC (permalink / raw)
To: Paul Eggert; +Cc: Eli Zaretskii, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1434 bytes --]
Paul Eggert <eggert@cs.ucla.edu> schrieb am Mo., 9. Okt. 2017 um 07:54 Uhr:
> Philipp Stephani wrote:
> > I don't understand why minimizing the number of checks and assertions
> > should be a worthwhile goal. At the very least, the assertions document
> the
> > assumptions that we make about the values, and as such they are valuable
> > even if they never trigger.
>
> One can take the process too far. To take a deliberately extreme example,
> 'eassert (INT_MIN < 0)' would clutter the code unnecessarily, and would be
> discarded by the compiler anyway. Although none of the assertions in
> question
> were *that* obvious, some did have that flavor (and indeed, were optimized
> away
> by GCC). The patch that I proposed eliminated those, while retaining the
> ones
> that conveyed useful and nonobvious information. Admittedly some of the
> removals
> were judgment calls; however, the point remains that easserts should not
> waste
> the reader's time unnecessarily.
>
I agree, but I think anything that's not obvious deserves to be documented
(preferably in the form of an assertion). Such documentation avoids the
need to search for other places in the codebase and figure out which
invariants are guaranteed by the implementation. INT_MIN < 0 is already
guaranteed by the C standard (although even that is somewhat subtle, given
the possibility of different widths and representations of integers,
padding bits, etc.),
[-- Attachment #2: Type: text/html, Size: 1770 bytes --]
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-09 6:19 ` Paul Eggert
@ 2017-10-29 20:48 ` Philipp Stephani
2017-10-29 22:49 ` Philipp Stephani
0 siblings, 1 reply; 81+ messages in thread
From: Philipp Stephani @ 2017-10-29 20:48 UTC (permalink / raw)
To: Paul Eggert; +Cc: Eli Zaretskii, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 795 bytes --]
Paul Eggert <eggert@cs.ucla.edu> schrieb am Mo., 9. Okt. 2017 um 08:19 Uhr:
> Philipp Stephani wrote:
> > I don't think Jansson can use xmalloc because xmalloc can exit
> nonlocally,
> > which is not expected by a third-party library such as Jansson. It could
> > use a suitable wrapper of lmalloc, though.
>
> That would be overkill, as lmalloc arranges for Lisp alignment, which
> Jansson
> does not need. We could define new functions (smalloc and srealloc, say),
> that
> act like malloc and realloc except they return NULL for requests larger
> than
> PTRDIFF_MAX. Right now, I expect only the JSON code needs this sort of
> thing so
> we could put the new functions in json.c. If other code needs it later we
> could
> move these new functions to alloc.c.
>
Yes, that sounds reasonable.
[-- Attachment #2: Type: text/html, Size: 1099 bytes --]
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-29 20:48 ` Philipp Stephani
@ 2017-10-29 22:49 ` Philipp Stephani
2017-12-09 23:05 ` Philipp Stephani
0 siblings, 1 reply; 81+ messages in thread
From: Philipp Stephani @ 2017-10-29 22:49 UTC (permalink / raw)
To: Paul Eggert; +Cc: Eli Zaretskii, emacs-devel
[-- Attachment #1.1: Type: text/plain, Size: 1403 bytes --]
Philipp Stephani <p.stephani2@gmail.com> schrieb am So., 29. Okt. 2017 um
21:48 Uhr:
> Paul Eggert <eggert@cs.ucla.edu> schrieb am Mo., 9. Okt. 2017 um
> 08:19 Uhr:
>
>> Philipp Stephani wrote:
>> > I don't think Jansson can use xmalloc because xmalloc can exit
>> nonlocally,
>> > which is not expected by a third-party library such as Jansson. It could
>> > use a suitable wrapper of lmalloc, though.
>>
>> That would be overkill, as lmalloc arranges for Lisp alignment, which
>> Jansson
>> does not need. We could define new functions (smalloc and srealloc, say),
>> that
>> act like malloc and realloc except they return NULL for requests larger
>> than
>> PTRDIFF_MAX. Right now, I expect only the JSON code needs this sort of
>> thing so
>> we could put the new functions in json.c. If other code needs it later we
>> could
>> move these new functions to alloc.c.
>>
>
> Yes, that sounds reasonable.
>
Here's a new patch that incorporates some of these changes. Specifically:
- I've removed some of the assertions
- I've installed a custom allocator, as you suggested
- Reverted to creating a temporary string and inserting that into the
buffer. Anything else just doesn't seem to work or seems way too complex.
- Introduced explicit encoding and decoding. I suspect that will lead to a
massive performance hit, but I haven't done any benchmarks yet.
- Added manual section and NEWS entry
[-- Attachment #1.2: Type: text/html, Size: 2097 bytes --]
[-- Attachment #2: 0001-Implement-native-JSON-support-using-Jansson.txt --]
[-- Type: text/plain, Size: 36968 bytes --]
From a7a74cfc40c4cacab047f5da239c7b7150bb9573 Mon Sep 17 00:00:00 2001
From: Philipp Stephani <phst@google.com>
Date: Mon, 18 Sep 2017 10:51:39 +0200
Subject: [PATCH] Implement native JSON support using Jansson
* configure.ac: New option --with-json.
* src/json.c (Fjson_serialize, Fjson_insert, Fjson_parse_string)
(Fjson_parse_buffer): New defuns.
(json_malloc, json_free, json_has_prefix, json_has_suffix)
(json_make_string, json_build_string, json_encode)
(json_out_of_memory, json_parse_error)
(json_release_object, check_string_without_embedded_nulls, json_check)
(lisp_to_json, lisp_to_json_toplevel, lisp_to_json_toplevel_1)
(json_insert, json_insert_callback, json_to_lisp)
(json_read_buffer_callback, Fjson_parse_buffer, define_error): New
helper functions.
(init_json, syms_of_json): New file.
* src/lisp.h: Declaration for init_json and syms_of_json.
* src/emacs.c (main): Enable JSON functions.
* src/eval.c (internal_catch_all, internal_catch_all_1): New helper
functions to catch all signals.
(syms_of_eval): Add uninterned symbol to signify out of memory.
* src/Makefile.in (JSON_LIBS, JSON_CFLAGS, JSON_OBJ, EMACS_CFLAGS)
(base_obj, LIBES): Compile json.c if --with-json is enabled.
* test/src/json-tests.el (json-serialize/roundtrip)
(json-serialize/object, json-parse-string/object)
(json-parse-string/string, json-serialize/string)
(json-parse-string/incomplete, json-parse-string/trailing)
(json-parse-buffer/incomplete, json-parse-buffer/trailing): New unit
tests.
* doc/lispref/text.texi (Parsing JSON): New manual section.
---
configure.ac | 20 +-
doc/lispref/text.texi | 87 ++++++++
etc/NEWS | 11 +
src/Makefile.in | 11 +-
src/emacs.c | 8 +
src/eval.c | 54 +++++
src/json.c | 576 +++++++++++++++++++++++++++++++++++++++++++++++++
src/lisp.h | 7 +
test/src/json-tests.el | 97 +++++++++
9 files changed, 867 insertions(+), 4 deletions(-)
create mode 100644 src/json.c
create mode 100644 test/src/json-tests.el
diff --git a/configure.ac b/configure.ac
index 7437eb90d2..ba8600c61a 100644
--- a/configure.ac
+++ b/configure.ac
@@ -355,6 +355,7 @@ AC_DEFUN
OPTION_DEFAULT_OFF([cairo],[compile with Cairo drawing (experimental)])
OPTION_DEFAULT_ON([xml2],[don't compile with XML parsing support])
OPTION_DEFAULT_ON([imagemagick],[don't compile with ImageMagick image support])
+OPTION_DEFAULT_ON([json], [don't compile with native JSON support])
OPTION_DEFAULT_ON([xft],[don't use XFT for anti aliased fonts])
OPTION_DEFAULT_ON([libotf],[don't use libotf for OpenType font support])
@@ -2859,6 +2860,22 @@ AC_DEFUN
AC_SUBST(LIBSYSTEMD_LIBS)
AC_SUBST(LIBSYSTEMD_CFLAGS)
+HAVE_JSON=no
+JSON_OBJ=
+
+if test "${with_json}" = yes; then
+ EMACS_CHECK_MODULES([JSON], [jansson >= 2.5],
+ [HAVE_JSON=yes], [HAVE_JSON=no])
+ if test "${HAVE_JSON}" = yes; then
+ AC_DEFINE(HAVE_JSON, 1, [Define if using Jansson.])
+ JSON_OBJ=json.o
+ fi
+fi
+
+AC_SUBST(JSON_LIBS)
+AC_SUBST(JSON_CFLAGS)
+AC_SUBST(JSON_OBJ)
+
NOTIFY_OBJ=
NOTIFY_SUMMARY=no
@@ -5370,7 +5387,7 @@ AC_DEFUN
for opt in XAW3D XPM JPEG TIFF GIF PNG RSVG CAIRO IMAGEMAGICK SOUND GPM DBUS \
GCONF GSETTINGS NOTIFY ACL LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT \
LIBOTF XFT ZLIB TOOLKIT_SCROLL_BARS X_TOOLKIT OLDXMENU X11 NS MODULES \
- XWIDGETS LIBSYSTEMD CANNOT_DUMP LCMS2; do
+ XWIDGETS LIBSYSTEMD JSON CANNOT_DUMP LCMS2; do
case $opt in
CANNOT_DUMP) eval val=\${$opt} ;;
@@ -5420,6 +5437,7 @@ AC_DEFUN
Does Emacs use -lotf? ${HAVE_LIBOTF}
Does Emacs use -lxft? ${HAVE_XFT}
Does Emacs use -lsystemd? ${HAVE_LIBSYSTEMD}
+ Does Emacs use -ljansson? ${HAVE_JSON}
Does Emacs directly use zlib? ${HAVE_ZLIB}
Does Emacs have dynamic modules support? ${HAVE_MODULES}
Does Emacs use toolkit scroll bars? ${USE_TOOLKIT_SCROLL_BARS}
diff --git a/doc/lispref/text.texi b/doc/lispref/text.texi
index baa3c708e9..217a9ba88b 100644
--- a/doc/lispref/text.texi
+++ b/doc/lispref/text.texi
@@ -61,6 +61,7 @@ Text
* Checksum/Hash:: Computing cryptographic hashes.
* GnuTLS Cryptography:: Cryptographic algorithms imported from GnuTLS.
* Parsing HTML/XML:: Parsing HTML and XML.
+* Parsing JSON:: Parsing and generating JSON values.
* Atomic Changes:: Installing several buffer changes atomically.
* Change Hooks:: Supplying functions to be run when text is changed.
@end menu
@@ -4911,6 +4912,92 @@ Document Object Model
@end table
+@node Parsing JSON
+@section Parsing and generating JSON values
+@cindex JSON
+
+ When Emacs is compiled with JSON support, it provides a couple of
+functions to convert between Lisp objects and JSON values. Any JSON
+value can be converted to a Lisp object, but not vice versa.
+Specifically:
+
+@itemize
+
+@item
+JSON has a couple of keywords: @code{null}, @code{false}, and
+@code{true}. These are represented in Lisp using the keywords
+@code{:null}, @code{:false}, and @code{t}, respectively.
+
+@item
+JSON only has floating-point numbers. They can represent both Lisp
+integers and Lisp floating-point numbers.
+
+@item
+JSON strings are always Unicode strings. Lisp strings can contain
+non-Unicode characters.
+
+@item
+JSON has only one sequence type, the array. JSON arrays are
+represented using Lisp vectors.
+
+@item
+JSON has only one map type, the object. JSON objects are represented
+using Lisp hashtables.
+
+@end itemize
+
+@noindent
+Note that @code{nil} doesn't represent any JSON values: this is to
+avoid confusion, because @code{nil} could either represent
+@code{null}, @code{false}, or an empty array, all of which are
+different JSON values.
+
+ If some Lisp object can't be represented in JSON, the serialization
+functions will signal an error of type @code{wrong-type-argument}.
+The parsing functions will signal the following errors:
+
+@table @code
+
+@item json-end-of-file
+ Signaled when encountering a premature end of the input text.
+
+@item json-trailing-content
+ Signaled when encountering unexpected input after the first JSON
+ object parsed.
+
+@item json-parse-error
+ Signaled when encountering invalid JSON syntax.
+
+@end table
+
+ Only top-level values (arrays and objects) can be serialized to
+JSON. The subobjects within these top-level values can be of any
+type. Likewise, the parsing functions will only return vectors and
+hashtables.
+
+@defun json-serialize object
+This function returns a new Lisp string which contains the JSON
+representation of @var{object}.
+@end defun
+
+@defun json-insert object
+This function inserts the JSON representation of @var{object} into the
+current buffer before point.
+@end defun
+
+@defun json-parse-string string
+This function parses the JSON value in @var{string}, which must be a
+Lisp string.
+@end defun
+
+@defun json-parse-buffer
+This function reads the next JSON value from the current buffer,
+starting at point. It moves point to the position immediately after
+the value if a value could be read and converted to Lisp; otherwise it
+doesn't move point.
+@end defun
+
+
@node Atomic Changes
@section Atomic Change Groups
@cindex atomic changes
diff --git a/etc/NEWS b/etc/NEWS
index 9ae36bdb03..cf84d80395 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -24,6 +24,13 @@ When you add a new item, use the appropriate mark if you are sure it applies,
\f
* Installation Changes in Emacs 27.1
+** The new configure option '--with-json' adds support for JSON using
+the Jansson library. It is on by default; use 'configure
+--with-json=no' to build without Jansson support. The new JSON
+functions 'json-serialize', 'json-insert', 'json-parse-string', and
+'json-parse-buffer' are typically much faster than their Lisp
+counterparts from json.el.
+
\f
* Startup Changes in Emacs 27.1
@@ -123,6 +130,10 @@ bug on OS X 10.8 and later (Bug#28639).
** The function 'get-free-disk-space' returns now a non-nil value for
remote systems, which support this check.
+** New JSON parsing and serialization functions 'json-serialize',
+'json-insert', 'json-parse-string', and 'json-parse-buffer'. These
+are implemented in C using the Jansson library.
+
\f
* Changes in Emacs 27.1 on Non-Free Operating Systems
diff --git a/src/Makefile.in b/src/Makefile.in
index 9a8c9c85f0..b395627893 100644
--- a/src/Makefile.in
+++ b/src/Makefile.in
@@ -312,6 +312,10 @@ LIBGNUTLS_CFLAGS =
LIBSYSTEMD_LIBS = @LIBSYSTEMD_LIBS@
LIBSYSTEMD_CFLAGS = @LIBSYSTEMD_CFLAGS@
+JSON_LIBS = @JSON_LIBS@
+JSON_CFLAGS = @JSON_CFLAGS@
+JSON_OBJ = @JSON_OBJ@
+
INTERVALS_H = dispextern.h intervals.h composite.h
GETLOADAVG_LIBS = @GETLOADAVG_LIBS@
@@ -363,7 +367,7 @@ EMACS_CFLAGS=
$(WEBKIT_CFLAGS) \
$(SETTINGS_CFLAGS) $(FREETYPE_CFLAGS) $(FONTCONFIG_CFLAGS) \
$(LIBOTF_CFLAGS) $(M17N_FLT_CFLAGS) $(DEPFLAGS) \
- $(LIBSYSTEMD_CFLAGS) \
+ $(LIBSYSTEMD_CFLAGS) $(JSON_CFLAGS) \
$(LIBGNUTLS_CFLAGS) $(NOTIFY_CFLAGS) $(CAIRO_CFLAGS) \
$(WERROR_CFLAGS)
ALL_CFLAGS = $(EMACS_CFLAGS) $(WARN_CFLAGS) $(CFLAGS)
@@ -397,7 +401,7 @@ base_obj =
thread.o systhread.o \
$(if $(HYBRID_MALLOC),sheap.o) \
$(MSDOS_OBJ) $(MSDOS_X_OBJ) $(NS_OBJ) $(CYGWIN_OBJ) $(FONT_OBJ) \
- $(W32_OBJ) $(WINDOW_SYSTEM_OBJ) $(XGSELOBJ)
+ $(W32_OBJ) $(WINDOW_SYSTEM_OBJ) $(XGSELOBJ) $(JSON_OBJ)
obj = $(base_obj) $(NS_OBJC_OBJ)
## Object files used on some machine or other.
@@ -493,7 +497,8 @@ LIBES =
$(LIBS_TERMCAP) $(GETLOADAVG_LIBS) $(SETTINGS_LIBS) $(LIBSELINUX_LIBS) \
$(FREETYPE_LIBS) $(FONTCONFIG_LIBS) $(LIBOTF_LIBS) $(M17N_FLT_LIBS) \
$(LIBGNUTLS_LIBS) $(LIB_PTHREAD) $(GETADDRINFO_A_LIBS) $(LIBLCMS2) \
- $(NOTIFY_LIBS) $(LIB_MATH) $(LIBZ) $(LIBMODULES) $(LIBSYSTEMD_LIBS)
+ $(NOTIFY_LIBS) $(LIB_MATH) $(LIBZ) $(LIBMODULES) $(LIBSYSTEMD_LIBS) \
+ $(JSON_LIBS)
## FORCE it so that admin/unidata can decide whether these files
## are up-to-date. Although since charprop depends on bootstrap-emacs,
diff --git a/src/emacs.c b/src/emacs.c
index 0fe7d9113b..202ecb0ebb 100644
--- a/src/emacs.c
+++ b/src/emacs.c
@@ -1262,6 +1262,10 @@ Using an Emacs configured with --with-x-toolkit=lucid does not have this problem
running_asynch_code = 0;
init_random ();
+#ifdef HAVE_JSON
+ init_json ();
+#endif
+
no_loadup
= argmatch (argv, argc, "-nl", "--no-loadup", 6, NULL, &skip_args);
@@ -1610,6 +1614,10 @@ Using an Emacs configured with --with-x-toolkit=lucid does not have this problem
syms_of_threads ();
syms_of_profiler ();
+#ifdef HAVE_JSON
+ syms_of_json ();
+#endif
+
keys_of_casefiddle ();
keys_of_cmds ();
keys_of_buffer ();
diff --git a/src/eval.c b/src/eval.c
index 063deb4ba0..5cc521b9de 100644
--- a/src/eval.c
+++ b/src/eval.c
@@ -1416,6 +1416,57 @@ internal_condition_case_n (Lisp_Object (*bfun) (ptrdiff_t, Lisp_Object *),
}
}
+static Lisp_Object
+internal_catch_all_1 (Lisp_Object (*function) (void *), void *argument)
+{
+ struct handler *c = push_handler_nosignal (Qt, CATCHER_ALL);
+ if (c == NULL)
+ return Qcatch_all_memory_full;
+
+ if (sys_setjmp (c->jmp) == 0)
+ {
+ Lisp_Object val = function (argument);
+ eassert (handlerlist == c);
+ handlerlist = c->next;
+ return val;
+ }
+ else
+ {
+ eassert (handlerlist == c);
+ Lisp_Object val = c->val;
+ handlerlist = c->next;
+ Fsignal (Qno_catch, val);
+ }
+}
+
+/* Like a combination of internal_condition_case_1 and internal_catch.
+ Catches all signals and throws. Never exits nonlocally; returns
+ Qcatch_all_memory_full if no handler could be allocated. */
+
+Lisp_Object
+internal_catch_all (Lisp_Object (*function) (void *), void *argument,
+ Lisp_Object (*handler) (Lisp_Object))
+{
+ struct handler *c = push_handler_nosignal (Qt, CONDITION_CASE);
+ if (c == NULL)
+ return Qcatch_all_memory_full;
+
+ if (sys_setjmp (c->jmp) == 0)
+ {
+ Lisp_Object val = internal_catch_all_1 (function, argument);
+ eassert (handlerlist == c);
+ handlerlist = c->next;
+ return val;
+ }
+ else
+ {
+ eassert (handlerlist == c);
+ Lisp_Object val = c->val;
+ handlerlist = c->next;
+ return handler (val);
+ }
+}
+
struct handler *
push_handler (Lisp_Object tag_ch_val, enum handlertype handlertype)
{
@@ -4066,6 +4117,9 @@ alist of active lexical bindings. */);
inhibit_lisp_code = Qnil;
+ DEFSYM (Qcatch_all_memory_full, "catch-all-memory-full");
+ Funintern (Qcatch_all_memory_full, Qnil);
+
defsubr (&Sor);
defsubr (&Sand);
defsubr (&Sif);
diff --git a/src/json.c b/src/json.c
new file mode 100644
index 0000000000..dc449e43e1
--- /dev/null
+++ b/src/json.c
@@ -0,0 +1,576 @@
+/* JSON parsing and serialization.
+
+Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GNU Emacs.
+
+GNU Emacs is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, either version 3 of the License, or (at
+your option) any later version.
+
+GNU Emacs is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. */
+
+#include <config.h>
+
+#include <errno.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdlib.h>
+
+#include <jansson.h>
+
+#include "lisp.h"
+#include "buffer.h"
+#include "coding.h"
+
+/* We install a custom allocator so that we can avoid objects larger
+ than PTRDIFF_MAX. Such objects wouldn’t play well with the rest of
+ Emacs’s codebase, which generally uses ptrdiff_t for sizes and
+ indices. The other functions in this file also generally assume
+ that size_t values never exceed PTRDIFF_MAX. */
+
+static void *
+json_malloc (size_t size)
+{
+ if (size > PTRDIFF_MAX)
+ {
+ errno = ENOMEM;
+ return NULL;
+ }
+ return malloc (size);
+}
+
+static void
+json_free (void *ptr)
+{
+ free (ptr);
+}
+
+void
+init_json (void)
+{
+ json_set_alloc_funcs (json_malloc, json_free);
+}
+
+/* Return whether STRING starts with PREFIX. */
+
+static bool
+json_has_prefix (const char *string, const char *prefix)
+{
+ size_t string_len = strlen (string);
+ size_t prefix_len = strlen (prefix);
+ return string_len >= prefix_len && memcmp (string, prefix, prefix_len) == 0;
+}
+
+/* Return whether STRING ends with SUFFIX. */
+
+static bool
+json_has_suffix (const char *string, const char *suffix)
+{
+ size_t string_len = strlen (string);
+ size_t suffix_len = strlen (suffix);
+ return string_len >= suffix_len
+ && memcmp (string + string_len - suffix_len, suffix, suffix_len) == 0;
+}
+
+/* Create a multibyte Lisp string from the UTF-8 string in
+ [DATA, DATA + SIZE). If the range [DATA, DATA + SIZE) does not
+ contain a valid UTF-8 string, an unspecified string is
+ returned. */
+
+static Lisp_Object
+json_make_string (const char *data, ptrdiff_t size)
+{
+ return code_convert_string (make_specified_string (data, -1, size, false),
+ Qutf_8_unix, Qt, false, true, true);
+}
+
+/* Create a multibyte Lisp string from the null-terminated UTF-8
+ string beginning at DATA. If the string is not a valid UTF-8
+ string, an unspecified string is returned. */
+
+static Lisp_Object
+json_build_string (const char *data)
+{
+ return json_make_string (data, strlen (data));
+}
+
+/* Return a unibyte string containing the sequence of UTF-8 encoding
+ units of the UTF-8 representation of STRING. If STRING does not
+ represent a sequence of Unicode scalar values, return a string with
+ unspecified contents. */
+
+static Lisp_Object
+json_encode (Lisp_Object string)
+{
+ return code_convert_string (string, Qutf_8_unix, Qt, true, true, true);
+}
+
+static _Noreturn void
+json_out_of_memory (void)
+{
+ xsignal0 (Qjson_out_of_memory);
+}
+
+/* Signal a Lisp error corresponding to the JSON ERROR. */
+
+static _Noreturn void
+json_parse_error (const json_error_t *error)
+{
+ Lisp_Object symbol;
+ /* FIXME: Upstream Jansson should have a way to return error codes
+ without parsing the error messages. See
+ https://github.com/akheron/jansson/issues/352. */
+ if (json_has_suffix (error->text, "expected near end of file"))
+ symbol = Qjson_end_of_file;
+ else if (json_has_prefix (error->text, "end of file expected"))
+ symbol = Qjson_trailing_content;
+ else
+ symbol = Qjson_parse_error;
+ xsignal (symbol,
+ list5 (json_build_string (error->text),
+ json_build_string (error->source), make_natnum (error->line),
+ make_natnum (error->column), make_natnum (error->position)));
+}
+
+static void
+json_release_object (void *object)
+{
+ json_decref (object);
+}
+
+/* Signal an error if OBJECT is not a string, or if OBJECT contains
+ embedded null characters. */
+
+static void
+check_string_without_embedded_nulls (Lisp_Object object)
+{
+ CHECK_STRING (object);
+ CHECK_TYPE (memchr (SDATA (object), '\0', SBYTES (object)) == NULL,
+ Qstring_without_embedded_nulls_p, object);
+}
+
+/* Signal an error of type `json-out-of-memory' if OBJECT is
+ NULL. */
+
+static json_t *
+json_check (json_t *object)
+{
+ if (object == NULL)
+ json_out_of_memory ();
+ return object;
+}
+
+static json_t *lisp_to_json (Lisp_Object);
+
+/* Convert a Lisp object to a toplevel JSON object (array or object).
+ This returns Lisp_Object so we can use unbind_to. The return value
+ is always nil. */
+
+static _GL_ARG_NONNULL ((2)) Lisp_Object
+lisp_to_json_toplevel_1 (Lisp_Object lisp, json_t **json)
+{
+ if (VECTORP (lisp))
+ {
+ ptrdiff_t size = ASIZE (lisp);
+ *json = json_check (json_array ());
+ ptrdiff_t count = SPECPDL_INDEX ();
+ record_unwind_protect_ptr (json_release_object, json);
+ for (ptrdiff_t i = 0; i < size; ++i)
+ {
+ int status
+ = json_array_append_new (*json, lisp_to_json (AREF (lisp, i)));
+ if (status == -1)
+ json_out_of_memory ();
+ }
+ eassert (json_array_size (*json) == size);
+ clear_unwind_protect (count);
+ return unbind_to (count, Qnil);
+ }
+ else if (HASH_TABLE_P (lisp))
+ {
+ struct Lisp_Hash_Table *h = XHASH_TABLE (lisp);
+ *json = json_check (json_object ());
+ ptrdiff_t count = SPECPDL_INDEX ();
+ record_unwind_protect_ptr (json_release_object, *json);
+ for (ptrdiff_t i = 0; i < HASH_TABLE_SIZE (h); ++i)
+ if (!NILP (HASH_HASH (h, i)))
+ {
+ Lisp_Object key = json_encode (HASH_KEY (h, i));
+ /* We can’t specify the length, so the string must be
+ null-terminated. */
+ check_string_without_embedded_nulls (key);
+ int status = json_object_set_new (*json, SSDATA (key),
+ lisp_to_json (HASH_VALUE (h, i)));
+ if (status == -1)
+ json_out_of_memory ();
+ }
+ clear_unwind_protect (count);
+ return unbind_to (count, Qnil);
+ }
+ wrong_type_argument (Qjson_value_p, lisp);
+}
+
+/* Convert LISP to a toplevel JSON object (array or object). Signal
+ an error of type `wrong-type-argument' if LISP is not a vector or
+ hashtable. */
+
+static json_t *
+lisp_to_json_toplevel (Lisp_Object lisp)
+{
+ if (++lisp_eval_depth > max_lisp_eval_depth)
+ xsignal0 (Qjson_object_too_deep);
+ json_t *json;
+ lisp_to_json_toplevel_1 (lisp, &json);
+ --lisp_eval_depth;
+ return json;
+}
+
+/* Convert LISP to any JSON object. Signal an error of type
+ `wrong-type-argument' if the type of LISP can't be converted to a
+ JSON object. */
+
+static json_t *
+lisp_to_json (Lisp_Object lisp)
+{
+ if (EQ (lisp, QCnull))
+ return json_check (json_null ());
+ else if (EQ (lisp, QCfalse))
+ return json_check (json_false ());
+ else if (EQ (lisp, Qt))
+ return json_check (json_true ());
+ else if (INTEGERP (lisp))
+ {
+ CHECK_TYPE_RANGED_INTEGER (json_int_t, lisp);
+ return json_check (json_integer (XINT (lisp)));
+ }
+ else if (FLOATP (lisp))
+ return json_check (json_real (XFLOAT_DATA (lisp)));
+ else if (STRINGP (lisp))
+ {
+ Lisp_Object encoded = json_encode (lisp);
+ ptrdiff_t size = SBYTES (encoded);
+ return json_check (json_stringn (SSDATA (encoded), size));
+ }
+
+ /* LISP now must be a vector or hashtable. */
+ return lisp_to_json_toplevel (lisp);
+}
+
+DEFUN ("json-serialize", Fjson_serialize, Sjson_serialize, 1, 1, NULL,
+ doc: /* Return the JSON representation of OBJECT as a string.
+OBJECT must be a vector or hashtable, and its elements can recursively
+contain `:null', `:false', t, numbers, strings, or other vectors and
+hashtables. `:null', `:false', and t will be converted to JSON null,
+false, and true values, respectively. Vectors will be converted to
+JSON arrays, and hashtables to JSON objects. Hashtable keys must be
+strings without embedded null characters and must be unique within
+each object. */)
+ (Lisp_Object object)
+{
+ ptrdiff_t count = SPECPDL_INDEX ();
+
+ json_t *json = lisp_to_json_toplevel (object);
+ record_unwind_protect_ptr (json_release_object, json);
+
+ char *string = json_dumps (json, JSON_COMPACT);
+ if (string == NULL)
+ json_out_of_memory ();
+ record_unwind_protect_ptr (free, string);
+
+ return unbind_to (count, json_build_string (string));
+}
+
+struct json_buffer_and_size
+{
+ const char *buffer;
+ ptrdiff_t size;
+};
+
+static Lisp_Object
+json_insert (void *data)
+{
+ struct json_buffer_and_size *buffer_and_size = data;
+ /* FIXME: This should be possible without creating an intermediate
+ string object. */
+ Lisp_Object string
+ = json_make_string (buffer_and_size->buffer, buffer_and_size->size);
+ insert1 (string);
+ return Qnil;
+}
+
+struct json_insert_data
+{
+ /* nil if json_insert succeeded, otherwise the symbol
+ Qcatch_all_memory_full or a cons (ERROR-SYMBOL . ERROR-DATA). */
+ Lisp_Object error;
+};
+
+/* Callback for json_dump_callback that inserts the UTF-8 string in
+ [BUFFER, BUFFER + SIZE) into the current buffer.
+ If [BUFFER, BUFFER + SIZE) does not contain a valid UTF-8 string,
+ an unspecified string is inserted into the buffer. DATA must point
+ to a structure of type json_insert_data. This function may not
+ exit nonlocally. It catches all nonlocal exits and stores them in
+ data->error for reraising. */
+
+static int
+json_insert_callback (const char *buffer, size_t size, void *data)
+{
+ struct json_insert_data *d = data;
+ struct json_buffer_and_size buffer_and_size
+ = {.buffer = buffer, .size = size};
+ d->error = internal_catch_all (json_insert, &buffer_and_size, Fidentity);
+ return NILP (d->error) ? 0 : -1;
+}
+
+DEFUN ("json-insert", Fjson_insert, Sjson_insert, 1, 1, NULL,
+ doc: /* Insert the JSON representation of OBJECT before point.
+This is the same as (insert (json-serialize OBJECT)), but potentially
+faster. See the function `json-serialize' for allowed values of
+OBJECT. */)
+ (Lisp_Object object)
+{
+ ptrdiff_t count = SPECPDL_INDEX ();
+
+ json_t *json = lisp_to_json (object);
+ record_unwind_protect_ptr (json_release_object, json);
+
+ struct json_insert_data data;
+ int status
+ = json_dump_callback (json, json_insert_callback, &data, JSON_COMPACT);
+ if (status == -1)
+ {
+ if (CONSP (data.error))
+ xsignal (XCAR (data.error), XCDR (data.error));
+ else
+ json_out_of_memory ();
+ }
+
+ return unbind_to (count, Qnil);
+}
+
+/* Convert a JSON object to a Lisp object. */
+
+static _GL_ARG_NONNULL ((1)) Lisp_Object
+json_to_lisp (json_t *json)
+{
+ switch (json_typeof (json))
+ {
+ case JSON_NULL:
+ return QCnull;
+ case JSON_FALSE:
+ return QCfalse;
+ case JSON_TRUE:
+ return Qt;
+ case JSON_INTEGER:
+ /* Return an integer if possible, a floating-point number
+ otherwise. This loses precision for integers with large
+ magnitude; however, such integers tend to be nonportable
+ anyway because many JSON implementations use only 64-bit
+ floating-point numbers with 53 mantissa bits. See
+ https://tools.ietf.org/html/rfc7159#section-6 for some
+ discussion. */
+ return make_fixnum_or_float (json_integer_value (json));
+ case JSON_REAL:
+ return make_float (json_real_value (json));
+ case JSON_STRING:
+ return json_make_string (json_string_value (json),
+ json_string_length (json));
+ case JSON_ARRAY:
+ {
+ if (++lisp_eval_depth > max_lisp_eval_depth)
+ xsignal0 (Qjson_object_too_deep);
+ size_t size = json_array_size (json);
+ if (FIXNUM_OVERFLOW_P (size))
+ xsignal0 (Qoverflow_error);
+ Lisp_Object result = Fmake_vector (make_natnum (size), Qunbound);
+ for (ptrdiff_t i = 0; i < size; ++i)
+ ASET (result, i,
+ json_to_lisp (json_array_get (json, i)));
+ --lisp_eval_depth;
+ return result;
+ }
+ case JSON_OBJECT:
+ {
+ if (++lisp_eval_depth > max_lisp_eval_depth)
+ xsignal0 (Qjson_object_too_deep);
+ size_t size = json_object_size (json);
+ if (FIXNUM_OVERFLOW_P (size))
+ xsignal0 (Qoverflow_error);
+ Lisp_Object result = CALLN (Fmake_hash_table, QCtest, Qequal,
+ QCsize, make_natnum (size));
+ struct Lisp_Hash_Table *h = XHASH_TABLE (result);
+ const char *key_str;
+ json_t *value;
+ json_object_foreach (json, key_str, value)
+ {
+ Lisp_Object key = json_build_string (key_str);
+ EMACS_UINT hash;
+ ptrdiff_t i = hash_lookup (h, key, &hash);
+ /* Keys in JSON objects are unique, so the key can’t be
+ present yet. */
+ eassert (i < 0);
+ hash_put (h, key, json_to_lisp (value), hash);
+ }
+ --lisp_eval_depth;
+ return result;
+ }
+ }
+ /* Can’t get here. */
+ emacs_abort ();
+}
+
+DEFUN ("json-parse-string", Fjson_parse_string, Sjson_parse_string, 1, 1, NULL,
+ doc: /* Parse the JSON STRING into a Lisp object.
+This is essentially the reverse operation of `json-serialize', which
+see. The returned object will be a vector or hashtable. Its elements
+will be `:null', `:false', t, numbers, strings, or further vectors and
+hashtables. If there are duplicate keys in an object, all but the
+last one are ignored. If STRING doesn't contain a valid JSON object,
+an error of type `json-parse-error' is signaled. */)
+ (Lisp_Object string)
+{
+ ptrdiff_t count = SPECPDL_INDEX ();
+ Lisp_Object encoded = json_encode (string);
+ check_string_without_embedded_nulls (encoded);
+
+ json_error_t error;
+ json_t *object = json_loads (SSDATA (encoded), 0, &error);
+ if (object == NULL)
+ json_parse_error (&error);
+
+ /* Avoid leaking the object in case of further errors. */
+ if (object != NULL)
+ record_unwind_protect_ptr (json_release_object, object);
+
+ return unbind_to (count, json_to_lisp (object));
+}
+
+struct json_read_buffer_data
+{
+ /* Byte position of position to read the next chunk from. */
+ ptrdiff_t point;
+};
+
+/* Callback for json_load_callback that reads from the current buffer.
+ DATA must point to a structure of type json_read_buffer_data.
+ data->point must point to the byte position to read from; after
+ reading, data->point is advanced accordingly. The buffer point
+ itself is ignored. This function may not exit nonlocally. */
+
+static size_t
+json_read_buffer_callback (void *buffer, size_t buflen, void *data)
+{
+ struct json_read_buffer_data *d = data;
+
+ /* First, parse from point to the gap or the end of the accessible
+ portion, whatever is closer. */
+ ptrdiff_t point = d->point;
+ ptrdiff_t end = BUFFER_CEILING_OF (point) + 1;
+ ptrdiff_t count = end - point;
+ if (buflen < count)
+ count = buflen;
+ memcpy (buffer, BYTE_POS_ADDR (point), count);
+ d->point += count;
+ return count;
+}
+
+DEFUN ("json-parse-buffer", Fjson_parse_buffer, Sjson_parse_buffer,
+ 0, 0, NULL,
+ doc: /* Read JSON object from current buffer starting at point.
+This is similar to `json-parse-string', which see. Move point after
+the end of the object if parsing was successful. On error, point is
+not moved. */)
+ (void)
+{
+ ptrdiff_t count = SPECPDL_INDEX ();
+
+ ptrdiff_t point = PT_BYTE;
+ struct json_read_buffer_data data = {.point = point};
+ json_error_t error;
+ json_t *object = json_load_callback (json_read_buffer_callback, &data,
+ JSON_DISABLE_EOF_CHECK, &error);
+
+ if (object == NULL)
+ json_parse_error (&error);
+
+ /* Avoid leaking the object in case of further errors. */
+ record_unwind_protect_ptr (json_release_object, object);
+
+ /* Convert and then move point only if everything succeeded. */
+ Lisp_Object lisp = json_to_lisp (object);
+
+ /* Adjust point by how much we just read. */
+ point += error.position;
+ SET_PT_BOTH (BYTE_TO_CHAR (point), point);
+
+ return unbind_to (count, lisp);
+}
+
+/* Simplified version of ‘define-error’ that works with pure
+ objects. */
+
+static void
+define_error (Lisp_Object name, const char *message, Lisp_Object parent)
+{
+ eassert (SYMBOLP (name));
+ eassert (SYMBOLP (parent));
+ Lisp_Object parent_conditions = Fget (parent, Qerror_conditions);
+ eassert (CONSP (parent_conditions));
+ eassert (!NILP (Fmemq (parent, parent_conditions)));
+ eassert (NILP (Fmemq (name, parent_conditions)));
+ Fput (name, Qerror_conditions, pure_cons (name, parent_conditions));
+ Fput (name, Qerror_message, build_pure_c_string (message));
+}
+
+void
+syms_of_json (void)
+{
+ DEFSYM (QCnull, ":null");
+ DEFSYM (QCfalse, ":false");
+
+ DEFSYM (Qstring_without_embedded_nulls_p, "string-without-embedded-nulls-p");
+ DEFSYM (Qjson_value_p, "json-value-p");
+
+ DEFSYM (Qutf_8_unix, "utf-8-unix");
+
+ DEFSYM (Qjson_error, "json-error");
+ DEFSYM (Qjson_out_of_memory, "json-out-of-memory");
+ DEFSYM (Qjson_parse_error, "json-parse-error");
+ DEFSYM (Qjson_end_of_file, "json-end-of-file");
+ DEFSYM (Qjson_trailing_content, "json-trailing-content");
+ DEFSYM (Qjson_object_too_deep, "json-object-too-deep");
+ define_error (Qjson_error, "generic JSON error", Qerror);
+ define_error (Qjson_out_of_memory,
+ "not enough memory for creating JSON object", Qjson_error);
+ define_error (Qjson_parse_error, "could not parse JSON stream",
+ Qjson_error);
+ define_error (Qjson_end_of_file, "end of JSON stream", Qjson_parse_error);
+ define_error (Qjson_trailing_content, "trailing content after JSON stream",
+ Qjson_parse_error);
+ define_error (Qjson_object_too_deep,
+ "object cyclic or Lisp evaluation too deep", Qjson_error);
+
+ DEFSYM (Qpure, "pure");
+ DEFSYM (Qside_effect_free, "side-effect-free");
+
+ DEFSYM (Qjson_serialize, "json-serialize");
+ DEFSYM (Qjson_parse_string, "json-parse-string");
+ Fput (Qjson_serialize, Qpure, Qt);
+ Fput (Qjson_serialize, Qside_effect_free, Qt);
+ Fput (Qjson_parse_string, Qpure, Qt);
+ Fput (Qjson_parse_string, Qside_effect_free, Qt);
+
+ defsubr (&Sjson_serialize);
+ defsubr (&Sjson_insert);
+ defsubr (&Sjson_parse_string);
+ defsubr (&Sjson_parse_buffer);
+}
diff --git a/src/lisp.h b/src/lisp.h
index 266370333f..8ccad2c224 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -3441,6 +3441,12 @@ extern int x_bitmap_mask (struct frame *, ptrdiff_t);
extern void reset_image_types (void);
extern void syms_of_image (void);
+#ifdef HAVE_JSON
+/* Defined in json.c. */
+extern void init_json (void);
+extern void syms_of_json (void);
+#endif
+
/* Defined in insdel.c. */
extern void move_gap_both (ptrdiff_t, ptrdiff_t);
extern _Noreturn void buffer_overflow (void);
@@ -3864,6 +3870,7 @@ extern Lisp_Object internal_condition_case_2 (Lisp_Object (*) (Lisp_Object, Lisp
extern Lisp_Object internal_condition_case_n
(Lisp_Object (*) (ptrdiff_t, Lisp_Object *), ptrdiff_t, Lisp_Object *,
Lisp_Object, Lisp_Object (*) (Lisp_Object, ptrdiff_t, Lisp_Object *));
+extern Lisp_Object internal_catch_all (Lisp_Object (*) (void *), void *, Lisp_Object (*) (Lisp_Object));
extern struct handler *push_handler (Lisp_Object, enum handlertype);
extern struct handler *push_handler_nosignal (Lisp_Object, enum handlertype);
extern void specbind (Lisp_Object, Lisp_Object);
diff --git a/test/src/json-tests.el b/test/src/json-tests.el
new file mode 100644
index 0000000000..5d3c84a136
--- /dev/null
+++ b/test/src/json-tests.el
@@ -0,0 +1,97 @@
+;;; json-tests.el --- unit tests for json.c -*- lexical-binding: t; -*-
+
+;; Copyright (C) 2017 Free Software Foundation, Inc.
+
+;; This file is part of GNU Emacs.
+
+;; GNU Emacs is free software: you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation, either version 3 of the License, or
+;; (at your option) any later version.
+
+;; GNU Emacs is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>.
+
+;;; Commentary:
+
+;; Unit tests for src/json.c.
+
+;;; Code:
+
+(require 'cl-lib)
+(require 'map)
+
+(ert-deftest json-serialize/roundtrip ()
+ (let ((lisp [:null :false t 0 123 -456 3.75 "abcαβγ"])
+ (json "[null,false,true,0,123,-456,3.75,\"abcαβγ\"]"))
+ (should (equal (json-serialize lisp) json))
+ (with-temp-buffer
+ (json-insert lisp)
+ (should (equal (buffer-string) json))
+ (should (eobp)))
+ (should (equal (json-parse-string json) lisp))
+ (with-temp-buffer
+ (insert json)
+ (goto-char 1)
+ (should (equal (json-parse-buffer) lisp))
+ (should (eobp)))))
+
+(ert-deftest json-serialize/object ()
+ (let ((table (make-hash-table :test #'equal)))
+ (puthash "abc" [1 2 t] table)
+ (puthash "def" :null table)
+ (should (equal (json-serialize table)
+ "{\"abc\":[1,2,true],\"def\":null}"))))
+
+(ert-deftest json-parse-string/object ()
+ (let ((actual
+ (json-parse-string
+ "{ \"abc\" : [1, 2, true], \"def\" : null, \"abc\" : [9, false] }\n")))
+ (should (hash-table-p actual))
+ (should (equal (hash-table-count actual) 2))
+ (should (equal (cl-sort (map-pairs actual) #'string< :key #'car)
+ '(("abc" . [9 :false]) ("def" . :null))))))
+
+(ert-deftest json-parse-string/string ()
+ (should-error (json-parse-string "[\"formfeed\f\"]") :type 'json-parse-error)
+ (should (equal (json-parse-string "[\"foo \\\"bar\\\"\"]") ["foo \"bar\""]))
+ (should (equal (json-parse-string "[\"abcαβγ\"]") ["abcαβγ"]))
+ (should (equal (json-parse-string "[\"\\nasd\\u0444\\u044b\\u0432fgh\\t\"]")
+ ["\nasdфывfgh\t"]))
+ (should (equal (json-parse-string "[\"\\uD834\\uDD1E\"]") ["\U0001D11E"]))
+ (should-error (json-parse-string "foo") :type 'json-parse-error))
+
+(ert-deftest json-serialize/string ()
+ (should (equal (json-serialize ["foo"]) "[\"foo\"]"))
+ (should (equal (json-serialize ["a\n\fb"]) "[\"a\\n\\fb\"]"))
+ (should (equal (json-serialize ["\nasdфыв\u001f\u007ffgh\t"])
+ "[\"\\nasdфыв\\u001F\u007ffgh\\t\"]")))
+
+(ert-deftest json-parse-string/incomplete ()
+ (should-error (json-parse-string "[123") :type 'json-end-of-file))
+
+(ert-deftest json-parse-string/trailing ()
+ (should-error (json-parse-string "[123] [456]") :type 'json-trailing-content))
+
+(ert-deftest json-parse-buffer/incomplete ()
+ (with-temp-buffer
+ (insert "[123")
+ (goto-char 1)
+ (should-error (json-parse-buffer) :type 'json-end-of-file)
+ (should (bobp))))
+
+(ert-deftest json-parse-buffer/trailing ()
+ (with-temp-buffer
+ (insert "[123] [456]")
+ (goto-char 1)
+ (should (equal (json-parse-buffer) [123]))
+ (should-not (bobp))
+ (should (looking-at-p (rx " [456]" eos)))))
+
+(provide 'json-tests)
+;;; json-tests.el ends here
--
2.14.2
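A note for readers skimming the patch: because `json-parse-buffer' moves point only past input it has parsed successfully, and leaves it untouched on error, a caller can drain a buffer holding several concatenated JSON values in a simple loop and stop cleanly on `json-end-of-file'. A minimal sketch, assuming the patch above is installed; the helper name is hypothetical and not part of the patch:

;; Hypothetical helper built on the new `json-parse-buffer' primitive.
(defun my-json-drain-buffer ()
  "Collect consecutive JSON values from point to the end of the buffer.
Stop at the first incomplete value, leaving point in front of it so
that more input can be appended and parsed later.  Malformed input
still signals `json-parse-error'."
  (let (values)
    (condition-case nil
        (while t
          ;; Each successful call advances point past one parsed value.
          (push (json-parse-buffer) values))
      (json-end-of-file nil))
    (nreverse values)))

This pattern is a natural fit for process filters that receive a stream of whitespace-separated JSON messages, since any unparsed tail simply stays in the buffer until more input arrives.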
^ permalink raw reply related [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-10-29 22:49 ` Philipp Stephani
@ 2017-12-09 23:05 ` Philipp Stephani
2017-12-10 7:08 ` Eli Zaretskii
0 siblings, 1 reply; 81+ messages in thread
From: Philipp Stephani @ 2017-12-09 23:05 UTC (permalink / raw)
To: Paul Eggert; +Cc: Eli Zaretskii, emacs-devel
Philipp Stephani <p.stephani2@gmail.com> wrote on Sun, 29 Oct 2017 at 23:49:
> Philipp Stephani <p.stephani2@gmail.com> wrote on Sun, 29 Oct 2017 at 21:48:
>
>> Paul Eggert <eggert@cs.ucla.edu> wrote on Mon, 9 Oct 2017 at 08:19:
>>
>>> Philipp Stephani wrote:
>>> > I don't think Jansson can use xmalloc because xmalloc can exit
>>> nonlocally,
>>> > which is not expected by a third-party library such as Jansson. It
>>> could
>>> > use a suitable wrapper of lmalloc, though.
>>>
>>> That would be overkill, as lmalloc arranges for Lisp alignment, which
>>> Jansson
>>> does not need. We could define new functions (smalloc and srealloc,
>>> say), that
>>> act like malloc and realloc except they return NULL for requests larger
>>> than
>>> PTRDIFF_MAX. Right now, I expect only the JSON code needs this sort of
>>> thing so
>>> we could put the new functions in json.c. If other code needs it later
>>> we could
>>> move these new functions to alloc.c.
>>>
>>
>> Yes, that sounds reasonable.
>>
>
> Here's a new patch that incorporates some of these changes. Specifically:
>
> - I've removed some of the assertions
> - I've installed a custom allocator, as you suggested
> - Reverted back to creating a temporary string and inserting that into the
> buffer. Anything else just doesn't seem to work or seems way too complex.
> - Introduced explicit encoding and decoding. I suspect that will lead to a
> massive performance hit, but I haven't done any benchmarks yet.
> - Added manual section and NEWS entry
>
Is that patch OK for master?
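On the encoding/decoding overhead mentioned in the change list above: a rough way to measure it, once the patch is installed, is to time the new primitive against the pure-Lisp parser in json.el. A minimal sketch; "large.json" is just a stand-in for any sizable JSON document, and the numbers will of course vary by machine:

;; Rough timing sketch comparing the new C primitive with json.el.
(require 'json)       ; pure-Lisp parser, for comparison only
(require 'benchmark)
(with-temp-buffer
  (insert-file-contents "large.json")   ; stand-in for a real test file
  (let ((str (buffer-string)))
    ;; `benchmark-run' returns (elapsed-seconds gc-count gc-seconds).
    (list (benchmark-run 10 (json-parse-string str))         ; new C primitive
          (benchmark-run 10 (json-read-from-string str)))))  ; json.el

Each result list gives elapsed time, number of garbage collections, and time spent in GC for the respective parser, which is enough for a first comparison of the two.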
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-12-09 23:05 ` Philipp Stephani
@ 2017-12-10 7:08 ` Eli Zaretskii
2017-12-10 13:26 ` Philipp Stephani
0 siblings, 1 reply; 81+ messages in thread
From: Eli Zaretskii @ 2017-12-10 7:08 UTC (permalink / raw)
To: Philipp Stephani; +Cc: eggert, emacs-devel
> From: Philipp Stephani <p.stephani2@gmail.com>
> Date: Sat, 09 Dec 2017 23:05:35 +0000
> Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel@gnu.org
>
> Is that patch OK for master?
Yes, thanks.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-12-10 7:08 ` Eli Zaretskii
@ 2017-12-10 13:26 ` Philipp Stephani
2017-12-10 13:32 ` Ted Zlatanov
0 siblings, 1 reply; 81+ messages in thread
From: Philipp Stephani @ 2017-12-10 13:26 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: eggert, emacs-devel
Eli Zaretskii <eliz@gnu.org> wrote on Sun, 10 Dec 2017 at 08:08:
> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Sat, 09 Dec 2017 23:05:35 +0000
> > Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel@gnu.org
> >
> > Is that patch OK for master?
>
> Yes, thanks.
>
Thanks, pushed as ab203e36d5.
^ permalink raw reply [flat|nested] 81+ messages in thread
* Re: JSON/YAML/TOML/etc. parsing performance
2017-12-10 13:26 ` Philipp Stephani
@ 2017-12-10 13:32 ` Ted Zlatanov
0 siblings, 0 replies; 81+ messages in thread
From: Ted Zlatanov @ 2017-12-10 13:32 UTC (permalink / raw)
To: Philipp Stephani; +Cc: Eli Zaretskii, eggert, emacs-devel
On Sun, 10 Dec 2017 13:26:03 +0000 Philipp Stephani <p.stephani2@gmail.com> wrote:
PS> Eli Zaretskii <eliz@gnu.org> wrote on Sun, 10 Dec 2017 at 08:08:
>> > From: Philipp Stephani <p.stephani2@gmail.com>
>> > Date: Sat, 09 Dec 2017 23:05:35 +0000
>> > Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel@gnu.org
>> >
>> > Is that patch OK for master?
>>
>> Yes, thanks.
>>
PS> Thanks, pushed as ab203e36d5.
Thank you for working on this and providing tests and documentation.
Ted
^ permalink raw reply [flat|nested] 81+ messages in thread
end of thread
Thread overview: 81+ messages
2017-09-16 15:54 JSON/YAML/TOML/etc. parsing performance Ted Zlatanov
2017-09-16 16:02 ` Mark Oteiza
2017-09-17 0:02 ` Richard Stallman
2017-09-17 3:13 ` Mark Oteiza
2017-09-18 0:00 ` Richard Stallman
2017-09-17 0:02 ` Richard Stallman
2017-09-18 13:46 ` Ted Zlatanov
2017-09-17 18:46 ` Philipp Stephani
2017-09-17 19:05 ` Eli Zaretskii
2017-09-17 20:27 ` Philipp Stephani
2017-09-17 22:41 ` Mark Oteiza
2017-09-18 13:53 ` Ted Zlatanov
2017-09-17 21:17 ` Speed of Elisp (was: JSON/YAML/TOML/etc. parsing performance) Stefan Monnier
2017-09-18 13:26 ` JSON/YAML/TOML/etc. parsing performance Philipp Stephani
2017-09-18 13:58 ` Mark Oteiza
2017-09-18 14:14 ` Philipp Stephani
2017-09-18 14:28 ` Mark Oteiza
2017-09-18 14:36 ` Philipp Stephani
2017-09-18 15:02 ` Eli Zaretskii
2017-09-18 16:14 ` Philipp Stephani
2017-09-18 17:33 ` Eli Zaretskii
2017-09-18 19:57 ` Thien-Thi Nguyen
2017-09-18 14:57 ` Eli Zaretskii
2017-09-18 15:07 ` Mark Oteiza
2017-09-18 15:51 ` Eli Zaretskii
2017-09-18 16:22 ` Philipp Stephani
2017-09-18 18:08 ` Eli Zaretskii
2017-09-19 19:32 ` Richard Stallman
2017-09-18 17:26 ` Glenn Morris
2017-09-18 18:16 ` Eli Zaretskii
2017-09-18 16:08 ` Philipp Stephani
2017-09-19 8:18 ` Philipp Stephani
2017-09-19 19:09 ` Eli Zaretskii
2017-09-28 21:19 ` Philipp Stephani
2017-09-28 21:27 ` Stefan Monnier
2017-09-29 19:55 ` Eli Zaretskii
2017-09-30 22:02 ` Philipp Stephani
2017-10-01 18:06 ` Eli Zaretskii
2017-10-03 12:26 ` Philipp Stephani
2017-10-03 15:31 ` Eli Zaretskii
2017-10-03 15:52 ` Philipp Stephani
2017-10-03 16:26 ` Eli Zaretskii
2017-10-03 17:10 ` Eli Zaretskii
2017-10-03 18:37 ` Philipp Stephani
2017-10-03 20:52 ` Paul Eggert
2017-10-04 5:33 ` Eli Zaretskii
2017-10-04 6:41 ` Paul Eggert
2017-10-04 8:03 ` Eli Zaretskii
2017-10-04 17:51 ` Paul Eggert
2017-10-04 19:38 ` Eli Zaretskii
2017-10-04 21:24 ` Paul Eggert
2017-10-05 1:48 ` Paul Eggert
2017-10-05 7:14 ` Eli Zaretskii
2017-10-08 22:52 ` Philipp Stephani
2017-10-09 5:54 ` Paul Eggert
2017-10-29 20:48 ` Philipp Stephani
2017-10-09 6:38 ` Eli Zaretskii
2017-10-05 7:12 ` Eli Zaretskii
2017-10-06 1:58 ` Paul Eggert
2017-10-06 7:40 ` Eli Zaretskii
2017-10-06 19:36 ` Paul Eggert
2017-10-06 21:03 ` Eli Zaretskii
2017-10-08 23:09 ` Philipp Stephani
2017-10-09 6:19 ` Paul Eggert
2017-10-29 20:48 ` Philipp Stephani
2017-10-29 22:49 ` Philipp Stephani
2017-12-09 23:05 ` Philipp Stephani
2017-12-10 7:08 ` Eli Zaretskii
2017-12-10 13:26 ` Philipp Stephani
2017-12-10 13:32 ` Ted Zlatanov
2017-10-08 23:04 ` Philipp Stephani
2017-10-09 6:47 ` Eli Zaretskii
2017-10-08 17:58 ` Philipp Stephani
2017-10-08 18:42 ` Eli Zaretskii
2017-10-08 23:14 ` Philipp Stephani
2017-10-09 6:53 ` Eli Zaretskii
2017-10-29 20:41 ` Philipp Stephani
2017-10-09 6:22 ` Paul Eggert
2017-10-01 18:38 ` Eli Zaretskii
2017-10-03 12:12 ` Philipp Stephani
2017-10-03 14:54 ` Eli Zaretskii
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git