* two json.el bugs
@ 2017-05-05 17:51 Theresa O'Connor
2017-05-05 18:44 ` Yuri Khan
2017-05-27 13:48 ` Philipp Stephani
0 siblings, 2 replies; 13+ messages in thread
From: Theresa O'Connor @ 2017-05-05 17:51 UTC (permalink / raw)
To: emacs-devel
hi all,
i wanted to write up two bugs in json.el that i haven't had a chance
to fix (in years, ooof). one is super minor and the other might
require a bit of thinking to come up with an elegant fix.
1. `json-skip-whitespace' uses a hardcoded list of whitespace
characters, which means it fails to skip over other WSpace=Y
characters. It should probably use `search-whitespace-regexp' from
isearch.el (or an equivalent value from elsewhere) instead.
2. Order of key-value pairs is not explicitly preserved in either the
reader or the serializer, and if there are duplicate keys, either the
first or the last wins depending on a number of factors. While json
objects are technically defined to be unordered (and therefore
json.el's current behavior is conforming), the standard JS
implementation preserves order and a convention has developed whereby
duplicate keys are used to provide "comments", e.g.
{
"foo": "the foo property is used for blah blah blah.",
"foo": 4
}
Last key should always win when reading, and order needs to be
preserved when serializing so that these "comments" can be generated.
This is a fairly serious interoperability issue.
thanks,
tess
p.s. for extra credit, if someone could s/Edward O'Connor/Theresa
O'Connor/g everywhere i would really appreciate it.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: two json.el bugs
2017-05-05 17:51 two json.el bugs Theresa O'Connor
@ 2017-05-05 18:44 ` Yuri Khan
2017-05-05 19:25 ` Drew Adams
` (2 more replies)
2017-05-27 13:48 ` Philipp Stephani
1 sibling, 3 replies; 13+ messages in thread
From: Yuri Khan @ 2017-05-05 18:44 UTC (permalink / raw)
To: Theresa O'Connor; +Cc: Emacs developers
On Sat, May 6, 2017 at 12:51 AM, Theresa O'Connor <hober0@gmail.com> wrote:
> 1. `json-skip-whitespace' uses a hardcoded list of whitespace
> characters, which means it fails to skip over other WSpace=Y
> characters. It should probably use `search-whitespace-regexp' from
> isearch.el (or an equivalent value from elsewhere) instead.
JSON is defined by the standard [ECMA-404][1].
[1]: http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf
Section 4, paragraph the last:
| Insignificant whitespace is allowed before or after any token.
| The whitespace characters are: character tabulation (U+0009),
| line feed (U+000A), carriage return (U+000D), and space (U+0020).
In my copy of Emacs 25.1, json-skip-whitespace looks like this:
(defun json-skip-whitespace ()
"Skip past the whitespace at point."
(skip-chars-forward "\t\r\n\f\b "))
that is, it skips the whitespace characters defined by the spec, and
in addition to that, also form feed (U+000C) and backspace (U+0007).
If anything, \f\b should be *removed* from characters to be skipped.
^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: two json.el bugs
2017-05-05 18:44 ` Yuri Khan
@ 2017-05-05 19:25 ` Drew Adams
2017-05-20 15:51 ` [PATCH] Fix definition of whitespace in JSON Philipp Stephani
2017-05-27 13:32 ` two json.el bugs Philipp Stephani
2 siblings, 0 replies; 13+ messages in thread
From: Drew Adams @ 2017-05-05 19:25 UTC (permalink / raw)
To: Yuri Khan, Theresa O'Connor; +Cc: Emacs developers
> > 1. `json-skip-whitespace' uses a hardcoded list of whitespace
> > characters, which means it fails to skip over other WSpace=Y
> > characters. It should probably use `search-whitespace-regexp' from
> > isearch.el (or an equivalent value from elsewhere) instead.
>
> JSON is defined by the standard [ECMA-404][1]...
> Section 4, paragraph the last:
>
> | Insignificant whitespace is allowed before or after any token.
> | The whitespace characters are: character tabulation (U+0009),
> | line feed (U+000A), carriage return (U+000D), and space (U+0020).
>
> In my copy of Emacs 25.1, json-skip-whitespace looks like this:
>
> (defun json-skip-whitespace ()
> "Skip past the whitespace at point."
> (skip-chars-forward "\t\r\n\f\b "))
>
> that is, it skips the whitespace characters defined by the spec, and
> in addition to that, also form feed (U+000C) and backspace (U+0007).
>
> If anything, \f\b should be *removed* from characters to be skipped.
+1
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] Fix definition of whitespace in JSON
2017-05-05 18:44 ` Yuri Khan
2017-05-05 19:25 ` Drew Adams
@ 2017-05-20 15:51 ` Philipp Stephani
2017-05-20 17:28 ` Dmitry Gutov
2017-05-27 13:32 ` two json.el bugs Philipp Stephani
2 siblings, 1 reply; 13+ messages in thread
From: Philipp Stephani @ 2017-05-20 15:51 UTC (permalink / raw)
To: emacs-devel; +Cc: Philipp Stephani
See
https://lists.gnu.org/archive/html/emacs-devel/2017-05/msg00115.html.
* lisp/json.el (json-skip-whitespace): Fix definition.
* test/lisp/json-tests.el (test-json-skip-whitespace): Adapt unit
test.
---
lisp/json.el | 6 +++++-
test/lisp/json-tests.el | 5 ++++-
2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/lisp/json.el b/lisp/json.el
index 5f403a411b..3def94ce04 100644
--- a/lisp/json.el
+++ b/lisp/json.el
@@ -206,7 +206,11 @@ json-pop
(defun json-skip-whitespace ()
"Skip past the whitespace at point."
- (skip-chars-forward "\t\r\n\f\b "))
+ ;; See
+ ;; https://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf
+ ;; or https://tools.ietf.org/html/rfc7159#section-2 for the
+ ;; definition of whitespace in JSON.
+ (skip-chars-forward "\t\r\n "))
\f
diff --git a/test/lisp/json-tests.el b/test/lisp/json-tests.el
index 38672de066..c6bd295d66 100644
--- a/test/lisp/json-tests.el
+++ b/test/lisp/json-tests.el
@@ -89,7 +89,10 @@ json-tests--with-temp-buffer
(ert-deftest test-json-skip-whitespace ()
(json-tests--with-temp-buffer "\t\r\n\f\b { \"a\": 1 }"
(json-skip-whitespace)
- (should (equal (char-after (point)) ?{))))
+ (should (equal (char-after) ?\f)))
+ (json-tests--with-temp-buffer "\t\r\n\t { \"a\": 1 }"
+ (json-skip-whitespace)
+ (should (equal (char-after) ?{))))
;;; Paths
--
2.13.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] Fix definition of whitespace in JSON
2017-05-20 15:51 ` [PATCH] Fix definition of whitespace in JSON Philipp Stephani
@ 2017-05-20 17:28 ` Dmitry Gutov
2017-05-21 21:03 ` Philipp Stephani
0 siblings, 1 reply; 13+ messages in thread
From: Dmitry Gutov @ 2017-05-20 17:28 UTC (permalink / raw)
To: Philipp Stephani, emacs-devel; +Cc: Philipp Stephani
On 20.05.2017 18:51, Philipp Stephani wrote:
> See
> https://lists.gnu.org/archive/html/emacs-devel/2017-05/msg00115.html.
>
> * lisp/json.el (json-skip-whitespace): Fix definition.
> * test/lisp/json-tests.el (test-json-skip-whitespace): Adapt unit
> test.
LGTM, please install.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] Fix definition of whitespace in JSON
2017-05-20 17:28 ` Dmitry Gutov
@ 2017-05-21 21:03 ` Philipp Stephani
0 siblings, 0 replies; 13+ messages in thread
From: Philipp Stephani @ 2017-05-21 21:03 UTC (permalink / raw)
To: Dmitry Gutov, emacs-devel; +Cc: Philipp Stephani
[-- Attachment #1: Type: text/plain, Size: 399 bytes --]
Dmitry Gutov <dgutov@yandex.ru> schrieb am Sa., 20. Mai 2017 um 19:28 Uhr:
> On 20.05.2017 18:51, Philipp Stephani wrote:
> > See
> > https://lists.gnu.org/archive/html/emacs-devel/2017-05/msg00115.html.
> >
> > * lisp/json.el (json-skip-whitespace): Fix definition.
> > * test/lisp/json-tests.el (test-json-skip-whitespace): Adapt unit
> > test.
>
> LGTM, please install.
>
Pushed as 32f80eb678.
[-- Attachment #2: Type: text/html, Size: 840 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: two json.el bugs
2017-05-05 18:44 ` Yuri Khan
2017-05-05 19:25 ` Drew Adams
2017-05-20 15:51 ` [PATCH] Fix definition of whitespace in JSON Philipp Stephani
@ 2017-05-27 13:32 ` Philipp Stephani
2 siblings, 0 replies; 13+ messages in thread
From: Philipp Stephani @ 2017-05-27 13:32 UTC (permalink / raw)
To: Yuri Khan, Theresa O'Connor; +Cc: Emacs developers
[-- Attachment #1: Type: text/plain, Size: 1273 bytes --]
Yuri Khan <yuri.v.khan@gmail.com> schrieb am Fr., 5. Mai 2017 um 20:45 Uhr:
> On Sat, May 6, 2017 at 12:51 AM, Theresa O'Connor <hober0@gmail.com>
> wrote:
>
> > 1. `json-skip-whitespace' uses a hardcoded list of whitespace
> > characters, which means it fails to skip over other WSpace=Y
> > characters. It should probably use `search-whitespace-regexp' from
> > isearch.el (or an equivalent value from elsewhere) instead.
>
> JSON is defined by the standard [ECMA-404][1].
>
> [1]:
> http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf
>
> Section 4, paragraph the last:
>
> | Insignificant whitespace is allowed before or after any token.
> | The whitespace characters are: character tabulation (U+0009),
> | line feed (U+000A), carriage return (U+000D), and space (U+0020).
>
> In my copy of Emacs 25.1, json-skip-whitespace looks like this:
>
> (defun json-skip-whitespace ()
> "Skip past the whitespace at point."
> (skip-chars-forward "\t\r\n\f\b "))
>
> that is, it skips the whitespace characters defined by the spec, and
> in addition to that, also form feed (U+000C) and backspace (U+0007).
>
> If anything, \f\b should be *removed* from characters to be skipped.
>
>
I've done that in commit 32f80eb678c4dc6335063cc39975bbce2766829a.
[-- Attachment #2: Type: text/html, Size: 1879 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: two json.el bugs
2017-05-05 17:51 two json.el bugs Theresa O'Connor
2017-05-05 18:44 ` Yuri Khan
@ 2017-05-27 13:48 ` Philipp Stephani
2017-05-28 1:30 ` Drew Adams
1 sibling, 1 reply; 13+ messages in thread
From: Philipp Stephani @ 2017-05-27 13:48 UTC (permalink / raw)
To: Theresa O'Connor, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1441 bytes --]
Theresa O'Connor <hober0@gmail.com> schrieb am Fr., 5. Mai 2017 um
19:52 Uhr:
>
> 2. Order of key-value pairs is not explicitly preserved in either the
> reader or the serializer, and if there are duplicate keys, either the
> first or the last wins depending on a number of factors. While json
> objects are technically defined to be unordered (and therefore
> json.el's current behavior is conforming), the standard JS
> implementation preserves order and a convention has developed whereby
> duplicate keys are used to provide "comments", e.g.
>
> {
> "foo": "the foo property is used for blah blah blah.",
> "foo": 4
> }
>
> Last key should always win when reading, and order needs to be
> preserved when serializing so that these "comments" can be generated.
> This is a fairly serious interoperability issue.
>
>
This one needs to be designed carefully. We can't simply
serialize/desterialize alists in key order because the order is reversed in
Emacs. (In Emacs alists, if there are duplicates, the first one wins, in
JSON the last one wins.)
So maybe we should just reverse the order on serializing? Unfortunately
this means that serializing and then deserializing is no longer a
round-trip. Should we then also reverse the order on deserializing? If
there are duplicates it would still not be a round-trip, but at least would
allow the "comment" syntax.
Alternatively, we might just throw an error when encountering duplicates.
[-- Attachment #2: Type: text/html, Size: 1862 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: two json.el bugs
2017-05-27 13:48 ` Philipp Stephani
@ 2017-05-28 1:30 ` Drew Adams
2017-05-28 18:47 ` Philipp Stephani
0 siblings, 1 reply; 13+ messages in thread
From: Drew Adams @ 2017-05-28 1:30 UTC (permalink / raw)
To: Philipp Stephani, Theresa O'Connor, emacs-devel
> We can't simply serialize/desterialize alists in key order
> because the order is reversed in Emacs. (In Emacs alists,
> if there are duplicates, the first one wins,
> in JSON the last one wins.)
^^^^^^^^^^^^^^^^^^^^^^^^^
If you're talking about JSON objects then no, no particular
meaning or behavior is defined for duplicate fields (keys)
in a JSON object.
A given application that handles JSON object is free to
act as you say. And it is free to act otherwise.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: two json.el bugs
2017-05-28 1:30 ` Drew Adams
@ 2017-05-28 18:47 ` Philipp Stephani
2017-05-28 22:12 ` Drew Adams
0 siblings, 1 reply; 13+ messages in thread
From: Philipp Stephani @ 2017-05-28 18:47 UTC (permalink / raw)
To: Drew Adams, Theresa O'Connor, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 627 bytes --]
Drew Adams <drew.adams@oracle.com> schrieb am So., 28. Mai 2017 um
03:30 Uhr:
> > We can't simply serialize/desterialize alists in key order
> > because the order is reversed in Emacs. (In Emacs alists,
> > if there are duplicates, the first one wins,
> > in JSON the last one wins.)
> ^^^^^^^^^^^^^^^^^^^^^^^^^
>
> If you're talking about JSON objects then no, no particular
> meaning or behavior is defined for duplicate fields (keys)
> in a JSON object.
>
> A given application that handles JSON object is free to
> act as you say. And it is free to act otherwise.
>
Please see Theresa's initial post why this matters.
[-- Attachment #2: Type: text/html, Size: 963 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: two json.el bugs
2017-05-28 18:47 ` Philipp Stephani
@ 2017-05-28 22:12 ` Drew Adams
2017-05-28 22:44 ` Bruce V Chiarelli
0 siblings, 1 reply; 13+ messages in thread
From: Drew Adams @ 2017-05-28 22:12 UTC (permalink / raw)
To: Philipp Stephani, Theresa O'Connor, emacs-devel
> > > We can't simply serialize/desterialize alists in key order
> > > because the order is reversed in Emacs. (In Emacs alists,
> > > if there are duplicates, the first one wins,
> > > in JSON the last one wins.)
> > ^^^^^^^^^^^^^^^^^^^^^^^^^
> >
> > If you're talking about JSON objects then no, no particular
> > meaning or behavior is defined for duplicate fields (keys)
> > in a JSON object.
> >
> > A given application that handles JSON object is free to
> > act as you say. And it is free to act otherwise.
>
> Please see Theresa's initial post why this matters.
It is generally good to preserve the order. With
that general intention I agree.
What's not true is that "in JSON the last one wins"
when there are duplicates. JSON itself imposes no
such semantics.
And I question mention of "the standard JS
implementation". No such thing, AFAIK.
An application can do anything it wants with JSON
data, of course. In particular, it can if it wants
(but typically would not) depend on the order of
duplicate fields.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: two json.el bugs
2017-05-28 22:12 ` Drew Adams
@ 2017-05-28 22:44 ` Bruce V Chiarelli
2017-05-29 0:42 ` Drew Adams
0 siblings, 1 reply; 13+ messages in thread
From: Bruce V Chiarelli @ 2017-05-28 22:44 UTC (permalink / raw)
To: Drew Adams; +Cc: Philipp Stephani, Theresa O'Connor, emacs-devel
Drew Adams <drew.adams@oracle.com> writes:
>> > > We can't simply serialize/desterialize alists in key order
>> > > because the order is reversed in Emacs. (In Emacs alists,
>> > > if there are duplicates, the first one wins,
>> > > in JSON the last one wins.)
>> > ^^^^^^^^^^^^^^^^^^^^^^^^^
>> >
>> > If you're talking about JSON objects then no, no particular
>> > meaning or behavior is defined for duplicate fields (keys)
>> > in a JSON object.
>> >
>> > A given application that handles JSON object is free to
>> > act as you say. And it is free to act otherwise.
>>
>> Please see Theresa's initial post why this matters.
>
> It is generally good to preserve the order. With
> that general intention I agree.
>
> What's not true is that "in JSON the last one wins"
> when there are duplicates. JSON itself imposes no
> such semantics.
>
> And I question mention of "the standard JS
> implementation". No such thing, AFAIK.
ECMA 404 doesn't specify anything about duplicate keys, but the JSON
parser behavior required by Section 24.3 of ECMA 262 does indeed. It's
not an implementation of course, but it does dictate how compliant JS
implementations should handle JSON.
From the end of listing 24.3.1.1: "In the case where there are
duplicate name Strings within an object, lexically preceeding values
for the same key shall be overwritten"
(https://www.ecma-international.org/ecma-262/7.0/index.html#sec-internalizejsonproperty).
It could be argued that Emacs does not need to have (or pretend to
have) a 262-conforming parser. After all, it's not a JS
interpreter. But as Tess mentioned, this particular point creates
interoperability issues with conforming interpreters.
---
Bruce V. Chiarelli
^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: two json.el bugs
2017-05-28 22:44 ` Bruce V Chiarelli
@ 2017-05-29 0:42 ` Drew Adams
0 siblings, 0 replies; 13+ messages in thread
From: Drew Adams @ 2017-05-29 0:42 UTC (permalink / raw)
To: Bruce V Chiarelli; +Cc: Philipp Stephani, Theresa O'Connor, emacs-devel
> >> > If you're talking about JSON objects then no, no particular
> >> > meaning or behavior is defined for duplicate fields (keys)
> >> > in a JSON object.
> >> >
> >> > A given application that handles JSON object is free to
> >> > act as you say. And it is free to act otherwise.
> >>
> >> Please see Theresa's initial post why this matters.
> >
> > It is generally good to preserve the order. With
> > that general intention I agree.
> >
> > What's not true is that "in JSON the last one wins"
> > when there are duplicates. JSON itself imposes no
> > such semantics.
> ECMA 404 doesn't specify anything about duplicate keys,
And rightfully so.
> but the JSON parser behavior
_JavaScript_ JSON parser behavior
> required by Section 24.3 of ECMA 262 does indeed. It's
> not an implementation of course, but it does dictate how
> compliant JS implementations should handle JSON.
Yes, compliant _JavaScript_ implementations.
Is `json.el' intended to provide JavaScript handling
(compliant or not) of JSON data? Dunno.
JavaScript is not JSON is not JavaScript.
JSON is almost, but not quite, a subset of JavaScript
object-literal notation (thus the name). JSON allows
unescaped Unicode chars U+2028 (LINE SEPARATOR) and
U+2029 (PARAGRAPH SEPARATOR) in strings. JavaScript
notation requires such control chars to be escaped in
strings.
JavaScript is one language that parses and makes use
of JSON data. It is not the only one.
> From the end of listing 24.3.1.1:
(24.5.1.1 in the latest draft)
> It could be argued that Emacs does not need to have (or pretend
> to have) a 262-conforming parser. After all, it's not a JS
> interpreter. But as Tess mentioned, this particular point
> creates interoperability issues with conforming interpreters.
Other things being equal, Emacs should preserve the
order of JSON object fields, for maximum fidelity.
Emacs should at least provide a way to preserve the order.
And if there is no good reason to do otherwise then
preserving the order should be the default behavior.
Insofar as Emacs tries to handle JSON data as would a
_JavaScript_ parser, stringifier, or interpreter, yes,
it would be good for it to act similarly to what ECMA
262 prescribes.
But that's about JavaScript, not JSON.
Is `json.el' trying to provide a JavaScript parser for
JSON? Dunno. The names and comments talk only about
JSON and JSON parsing. JavaScript parsing of JSON is
one way to go. (I have nothing against that, a priori.)
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2017-05-29 0:42 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-05-05 17:51 two json.el bugs Theresa O'Connor
2017-05-05 18:44 ` Yuri Khan
2017-05-05 19:25 ` Drew Adams
2017-05-20 15:51 ` [PATCH] Fix definition of whitespace in JSON Philipp Stephani
2017-05-20 17:28 ` Dmitry Gutov
2017-05-21 21:03 ` Philipp Stephani
2017-05-27 13:32 ` two json.el bugs Philipp Stephani
2017-05-27 13:48 ` Philipp Stephani
2017-05-28 1:30 ` Drew Adams
2017-05-28 18:47 ` Philipp Stephani
2017-05-28 22:12 ` Drew Adams
2017-05-28 22:44 ` Bruce V Chiarelli
2017-05-29 0:42 ` Drew Adams
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).