unofficial mirror of emacs-devel@gnu.org 
* Improving JSON pretty printing, how to represent floats?
@ 2024-04-05  9:39 Herman, Géza
  2024-04-05 12:35 ` Eli Zaretskii
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Herman, Géza @ 2024-04-05  9:39 UTC (permalink / raw)
  To: emacs-devel

Hi,

I'm thinking about using the new JSON parser and encoder for 
json-pretty-print.

There is a thing that I don't like about the current 
json-pretty-print: as it parses floats, pretty printing can be 
lossy.  If one pretty prints this

--8<---------------cut here---------------start------------->8---
{
    "a": 3.333333333333000000000000000001
}
--8<---------------cut here---------------end--------------->8---

then the float gets rounded.  I think pretty printing should be 
lossless (keeping the exact format, exp notation, etc.).

What would be the best representation for numbers in this case? 
I'm thinking about using symbols, but I'm not sure this is the 
best approach.  The parser/encoder would have a new keyword 
parameter, like ":numbers-as-symbols t".  If this is specified, 
then numbers wouldn't be parsed, but kept as symbols.  What do you 
think, is this a good approach?

Geza
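
As an illustration of the rounding described above, here is a minimal
sketch in Python rather than Emacs Lisp (it is not code from this
thread).  It assumes that Python's json module, which also parses
numbers into 64-bit floats, behaves like a parse-then-print pretty
printer.

```python
import json

# The document from the message above, as raw text.
text = '{"a": 3.333333333333000000000000000001}'

# Parsing turns the number into a binary64 float; printing emits the
# shortest decimal string that reads back as that same float.
value = json.loads(text)
printed = json.dumps(value)

print(printed)                       # the long tail of digits is gone
print(printed == text)               # False: the text changed ...
print(json.loads(printed) == value)  # True: ... but the float did not
```

The float value survives the round trip, but the spelling of the
number in the file does not.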




* Re: Improving JSON pretty printing, how to represent floats?
  2024-04-05  9:39 Improving JSON pretty printing, how to represent floats? Herman, Géza
@ 2024-04-05 12:35 ` Eli Zaretskii
  2024-04-05 12:53   ` Mattias Engdegård
  2024-04-05 12:55   ` Herman, Géza
  2024-04-05 13:37 ` Dmitry Gutov
  2024-04-05 14:18 ` Mattias Engdegård
  2 siblings, 2 replies; 14+ messages in thread
From: Eli Zaretskii @ 2024-04-05 12:35 UTC (permalink / raw)
  To: Géza Herman; +Cc: emacs-devel

> From: Herman, Géza <geza.herman@gmail.com>
> Date: Fri, 05 Apr 2024 11:39:10 +0200
> 
> There is a thing that I don't like about the current 
> json-pretty-print: as it parses floats, pretty printing can be 
> lossy.  If one pretty prints this
> 
> --8<---------------cut here---------------start------------->8---
> {
>     "a": 3.333333333333000000000000000001
> }
> --8<---------------cut here---------------end--------------->8---
> 
> then the float gets rounded.  I think pretty printing should be 
> lossless (keeping the exact format, exp notation, etc.).

The digits in 3.333333333333000000000000000001 after 16th digit are
meaningless: they aren't supported by IEEE floating-point standard, so
they are just numerical noise.  Thus, talking about "lossless" wrt
them makes little sense to me.




* Re: Improving JSON pretty printing, how to represent floats?
  2024-04-05 12:35 ` Eli Zaretskii
@ 2024-04-05 12:53   ` Mattias Engdegård
  2024-04-05 12:55   ` Herman, Géza
  1 sibling, 0 replies; 14+ messages in thread
From: Mattias Engdegård @ 2024-04-05 12:53 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Géza Herman, emacs-devel

5 apr. 2024 kl. 14.35 skrev Eli Zaretskii <eliz@gnu.org>:

>> --8<---------------cut here---------------start------------->8---
>> {
>>    "a": 3.333333333333000000000000000001
>> }
>> --8<---------------cut here---------------end--------------->8---
>> 
>> then the float gets rounded.  I think pretty printing should be 
>> lossless (keeping the exact format, exp notation, etc.).
> 
> The digits in 3.333333333333000000000000000001 after 16th digit are
> meaningless: they aren't supported by IEEE floating-point standard, so
> they are just numerical noise.

The float printer we use tries to produce the shortest output that, when read back in using IEEE-754 64-bit floats and the default rounding mode, results in the same number. There is no loss of information.

The JSON specs mandate little about range and precision for numbers and implementations indeed vary a lot; see [1] for a recent overview. However, given its JavaScript origins and the prevalence of 64-bit floats, we are pretty safe here for anything that doesn't have very special requirements (such as decimal FP). Our support for arbitrary big integers is more than many other implementations offer.

[1] https://blog.trl.sn/blog/what-is-a-json-number/
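
To make the "shortest output that reads back as the same number"
property concrete, here is a hedged sketch in Python, whose repr() for
floats follows the same shortest-round-trip idea.  It only illustrates
the principle and says nothing about how the Emacs float printer is
implemented.

```python
import json

# Shortest round-trip printing: repr() picks the fewest digits that
# still read back as the identical binary64 value.
x = 0.1 + 0.2
s = repr(x)
print(s)              # 0.30000000000000004
print(float(s) == x)  # True

# Integers are not routed through floats, so large ones survive.
big = "123456789012345678901234567890"
print(json.dumps(json.loads(big)) == big)  # True
```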





* Re: Improving JSON pretty printing, how to represent floats?
  2024-04-05 12:35 ` Eli Zaretskii
  2024-04-05 12:53   ` Mattias Engdegård
@ 2024-04-05 12:55   ` Herman, Géza
  2024-04-05 13:09     ` Eli Zaretskii
  1 sibling, 1 reply; 14+ messages in thread
From: Herman, Géza @ 2024-04-05 12:55 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Géza Herman, emacs-devel


Eli Zaretskii <eliz@gnu.org> writes:

>> From: Herman, Géza <geza.herman@gmail.com>
>> Date: Fri, 05 Apr 2024 11:39:10 +0200
>>
>> There is a thing that I don't like about the current
>> json-pretty-print: as it parses floats, pretty printing can be
>> lossy.  If one pretty prints this
>>
>> --8<---------------cut here---------------start------------->8---
>> {
>>     "a": 3.333333333333000000000000000001
>> }
>> --8<---------------cut here---------------end--------------->8---
>>
>> then the float gets rounded.  I think pretty printing should be
>> lossless (keeping the exact format, exp notation, etc.).
>
> The digits in 3.333333333333000000000000000001 after 16th digit 
> are
> meaningless: they aren't supported by IEEE floating-point 
> standard, so
> they are just numerical noise.  Thus, talking about "lossless" 
> wrt
> them makes little sense to me.

You mean the 64-bit binary format. IEEE-754 describes more formats 
which have more precision, and also there are more formats used in 
practice than what IEEE-754 describes.

Plus, I don't think that the JSON standard mandates IEEE-754 
numbers.  It just describes numbers, so I think it's good behavior 
to not change numbers during formatting.
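
One way a consumer can keep more digits than binary64 offers is to
parse numbers into a decimal type.  The sketch below uses Python's
json module with its parse_float hook and the standard decimal module.
It is an illustration under that assumption, not something the Emacs
JSON parser provides.

```python
import json
from decimal import Decimal

text = '{"a": 3.333333333333000000000000000001}'

# parse_float receives the literal's source text, so a decimal type
# can keep every digit that binary64 would round away.
doc = json.loads(text, parse_float=Decimal)

print(doc["a"])  # 3.333333333333000000000000000001
```

Note that this preserves the value, not necessarily the spelling: a
literal such as 1e2 would come back from Decimal as 1E+2.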




* Re: Improving JSON pretty printing, how to represent floats?
  2024-04-05 12:55   ` Herman, Géza
@ 2024-04-05 13:09     ` Eli Zaretskii
  2024-04-05 13:16       ` Herman, Géza
  0 siblings, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2024-04-05 13:09 UTC (permalink / raw)
  To: Géza Herman; +Cc: emacs-devel

> From: Herman, Géza <geza.herman@gmail.com>
> Cc: Géza Herman <geza.herman@gmail.com>,
>  emacs-devel@gnu.org
> Date: Fri, 05 Apr 2024 14:55:26 +0200
> 
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > The digits in 3.333333333333000000000000000001 after 16th digit
> > are meaningless: they aren't supported by IEEE floating-point
> > standard, so they are just numerical noise.  Thus, talking about
> > "lossless" wrt them makes little sense to me.
> 
> You mean the 64-bit binary format. IEEE-754 describes more formats 
> which have more precision, and also there are more formats used in 
> practice than what IEEE-754 describes.
> 
> Plus, I don't think that the JSON standard mandates IEEE-754 
> numbers.  It just describes numbers, so I think it's good behavior 
> to not change numbers during formatting.

The fact that Emacs supports IEEE-754 on almost all platforms is
prominently documented in the ELisp reference manual, see the node
"Float Basics" there.




* Re: Improving JSON pretty printing, how to represent floats?
  2024-04-05 13:09     ` Eli Zaretskii
@ 2024-04-05 13:16       ` Herman, Géza
  2024-04-05 14:01         ` tomas
  2024-04-05 15:34         ` Eli Zaretskii
  0 siblings, 2 replies; 14+ messages in thread
From: Herman, Géza @ 2024-04-05 13:16 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Géza Herman, emacs-devel


Eli Zaretskii <eliz@gnu.org> writes:

>> From: Herman, Géza <geza.herman@gmail.com>
>> Cc: Géza Herman <geza.herman@gmail.com>,
>>  emacs-devel@gnu.org
>> Date: Fri, 05 Apr 2024 14:55:26 +0200
>>
>>
>> Eli Zaretskii <eliz@gnu.org> writes:
>>
>> > The digits in 3.333333333333000000000000000001 after 16th 
>> > digit
>> > are meaningless: they aren't supported by IEEE floating-point
>> > standard, so they are just numerical noise.  Thus, talking 
>> > about
>> > "lossless" wrt them makes little sense to me.
>>
>> You mean the 64-bit binary format. IEEE-754 describes more 
>> formats
>> which have more precision, and also there are more formats used 
>> in
>> practice than what IEEE-754 describes.
>>
>> Plus, I don't think that the JSON standard mandates IEEE-754
>> numbers.  It just describes numbers, so I think it's good 
>> behavior
>> to not change numbers during formatting.
>
> The fact that Emacs supports IEEE-754 on almost all platforms is
> prominently documented in the ELisp reference manual, see the 
> node
> "Float Basics" there.

I don't think it's relevant here.  If I reformat a JSON, it should 
really be a formatting operation, it shouldn't matter what kind of 
floating point numbers a platform supports.  We are not talking 
about reading a JSON, but formatting it.  What if I open a .json 
file because I want to edit it a little bit (renaming some 
members, etc.), but then I realize that the format of the json is 
bad, so I reformat it.  But I want to maintain the precision, 
because I intend to use the .json in some other software which 
supports arbitrary floating point precision.




* Re: Improving JSON pretty printing, how to represent floats?
  2024-04-05  9:39 Improving JSON pretty printing, how to represent floats? Herman, Géza
  2024-04-05 12:35 ` Eli Zaretskii
@ 2024-04-05 13:37 ` Dmitry Gutov
  2024-04-05 14:20   ` Herman, Géza
  2024-04-05 14:18 ` Mattias Engdegård
  2 siblings, 1 reply; 14+ messages in thread
From: Dmitry Gutov @ 2024-04-05 13:37 UTC (permalink / raw)
  To: Herman, Géza, emacs-devel

On 05/04/2024 12:39, Herman, Géza wrote:
> 
> What would be the best representation for numbers in this case? I'm 
> thinking about using symbols, but I'm not sure this is the best 
> approach.  The parser/encoder would have a new keyword parameter, like 
> ":numbers-as-symbols t".  If this is specified, then numbers wouldn't be 
> parsed, but kept as symbols.  What do you think, is this a good approach?

Why not just use strings? Bignum implementations in various libs (other 
languages) often use strings in constructors.

Interning numbers in obarray doesn't seem like too great an idea.




* Re: Improving JSON pretty printing, how to represent floats?
  2024-04-05 13:16       ` Herman, Géza
@ 2024-04-05 14:01         ` tomas
  2024-04-05 15:34         ` Eli Zaretskii
  1 sibling, 0 replies; 14+ messages in thread
From: tomas @ 2024-04-05 14:01 UTC (permalink / raw)
  To: Herman, Géza; +Cc: Eli Zaretskii, emacs-devel


On Fri, Apr 05, 2024 at 03:16:44PM +0200, Herman, Géza wrote:

[...]

> I don't think it's relevant here.  If I reformat a JSON, it should really be
> a formatting operation, it shouldn't matter what kind of floating point
> numbers a platform supports.  We are not talking about reading a JSON, but
> formatting it [...]

It seems you have no choice but to keep the string representation around,
then. There are other possible anomalies with numbers, like "leading zeros",
where a conversion to number and back would transform 0042 -> 42. Likewise
with 1e2 -> 100 and so on. When transformed into numbers, you lose that
information.

Note that 0.2 is an infinite binary fraction, i.e. if you go binary and
back under most JSON implementations, you'd always get a result depending
on the underlying precision, rounding strategies, etc.

Cheers
-- 
t
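
The two effects mentioned above (exponent notation being rewritten,
and 0.2 having no finite binary expansion) can be observed with a
short Python sketch.  Python is used here only as a convenient
stand-in for any implementation that parses numbers into 64-bit
floats.

```python
import json
from decimal import Decimal

# Exponent notation is not preserved by a parse/print round trip.
print(json.dumps(json.loads("1e2")))  # 100.0

# Decimal(float) shows the exact binary64 value stored for 0.2.
print(Decimal(0.2))
# 0.200000000000000011102230246251565404236316680908203125
```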



* Re: Improving JSON pretty printing, how to represent floats?
  2024-04-05  9:39 Improving JSON pretty printing, how to represent floats? Herman, Géza
  2024-04-05 12:35 ` Eli Zaretskii
  2024-04-05 13:37 ` Dmitry Gutov
@ 2024-04-05 14:18 ` Mattias Engdegård
  2024-04-05 14:25   ` Herman, Géza
  2 siblings, 1 reply; 14+ messages in thread
From: Mattias Engdegård @ 2024-04-05 14:18 UTC (permalink / raw)
  To: "Herman, Géza"; +Cc: emacs-devel

5 apr. 2024 kl. 11.39 skrev Herman, Géza <geza.herman@gmail.com>:

> I'm thinking about using the new JSON parser and encoder for json-pretty-print.

Sorry, I didn't read your question carefully enough. Now pretty-printing is unlikely to be dominated by JSON parsing or serialising costs so you could have another all-Lisp implementation that keeps numbers exactly as they were written, if that's important.

I doubt it is (and smell gold-plating here).

> What would be the best representation for numbers in this case? I'm thinking about using symbols, but I'm not sure this is the best approach.  The parser/encoder would have a new keyword parameter, like ":numbers-as-symbols t".  If this is specified, then numbers wouldn't be parsed, but kept as symbols.  What do you think, is this a good approach?

Frankly, it sounds dubious at best. Just leave the C code alone. Its business is fast conversion; let's not overload it.





* Re: Improving JSON pretty printing, how to represent floats?
  2024-04-05 13:37 ` Dmitry Gutov
@ 2024-04-05 14:20   ` Herman, Géza
  2024-04-05 16:47     ` Dmitry Gutov
  0 siblings, 1 reply; 14+ messages in thread
From: Herman, Géza @ 2024-04-05 14:20 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Herman, Géza, emacs-devel


Dmitry Gutov <dmitry@gutov.dev> writes:

> On 05/04/2024 12:39, Herman, Géza wrote:
>> What would be the best representation for numbers in this case? 
>> I'm thinking about using symbols, but I'm not sure this is the
>> best approach.  The parser/encoder would have a new keyword 
>> parameter, like ":numbers-as-symbols t".  If this is specified, 
>> then
>> numbers wouldn't be parsed, but kept as symbols.  What do you 
>> think, is this a good approach?
>
> Why not just use strings? Bignum implementations in various libs 
> (other languages) often use strings in constructors.

Because then the original type is lost: Later it's not possible to 
tell whether a value was a number or a string originally.

>
> Interning numbers in obarray doesn't seem like too great an 
> idea.

That's correct, my idea is to use uninterned symbols 
(make-symbol).
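
The type-distinction argument can be sketched in Python, where a small
wrapper class plays the role that an uninterned symbol would play in
Lisp.  RawNumber is a hypothetical name invented for this example;
parse_float and parse_int are real hooks of Python's json module.

```python
import json

class RawNumber:
    """Hypothetical marker type: a JSON number kept as source text."""
    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return f"RawNumber({self.text!r})"

# Both hooks receive the number's source text instead of a parsed value.
doc = json.loads('{"n": 1.50e3, "s": "1.50e3"}',
                 parse_float=RawNumber, parse_int=RawNumber)

print(doc["n"])  # RawNumber('1.50e3')
print(doc["s"])  # 1.50e3  (an ordinary str)
print(isinstance(doc["n"], RawNumber), isinstance(doc["s"], str))  # True True
```

Because the number and the string end up as different types, an
encoder can still tell which one to quote.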




* Re: Improving JSON pretty printing, how to represent floats?
  2024-04-05 14:18 ` Mattias Engdegård
@ 2024-04-05 14:25   ` Herman, Géza
  0 siblings, 0 replies; 14+ messages in thread
From: Herman, Géza @ 2024-04-05 14:25 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: Herman, Géza, emacs-devel


Mattias Engdegård <mattias.engdegard@gmail.com> writes:

> 5 apr. 2024 kl. 11.39 skrev Herman, Géza 
> <geza.herman@gmail.com>:
>
>> I'm thinking about using the new JSON parser and encoder for 
>> json-pretty-print.
>
> Sorry, I didn't read your question carefully enough. Now 
> pretty-printing is unlikely to be dominated by JSON parsing or 
> serialising
> costs so you could have another all-Lisp implementation that 
> keeps numbers exactly as they were written, if that's important.
>
> I doubt it is (and smell gold-plating here).

Maybe I'm not aware of something trivial, but isn't pretty 
printing about parsing and encoding?  What else dominates, if 
not these?

>
>> What would be the best representation for numbers in this case? 
>> I'm thinking about using symbols, but I'm not sure this is the
>> best approach.  The parser/encoder would have a new keyword 
>> parameter, like ":numbers-as-symbols t".  If this is specified, 
>> then
>> numbers wouldn't be parsed, but kept as symbols.  What do you 
>> think, is this a good approach?
>
> Frankly, it sounds dubious at best. Just leave the C code 
> alone. Its business is fast conversion; let's not overload it.

What do you mean by overloading? It's simple to add this feature 
to it. The parser already supports reading arrays/objects into 
different formats; an additional ":number-type 'default" and 
":number-type 'symbol" fits the design perfectly, and I really 
doubt that it has a significant performance impact.  It can be 
implemented in 10-20 lines of code.
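
The shape of such an option can be sketched as follows.  This is a
Python analogy only: parse_json and its number_type parameter are
hypothetical names for this example and do not describe the C parser
in Emacs.

```python
import json

def parse_json(text, number_type="default"):
    """Hypothetical wrapper; number_type is "default" or "text"."""
    if number_type == "default":
        return json.loads(text)
    if number_type == "text":
        # Keep each number as ("number", source-text) instead of parsing it.
        keep = lambda literal: ("number", literal)
        return json.loads(text, parse_float=keep, parse_int=keep)
    raise ValueError(f"unknown number_type: {number_type!r}")

print(parse_json("[1.0e1, 7]"))                      # [10.0, 7]
print(parse_json("[1.0e1, 7]", number_type="text"))
# [('number', '1.0e1'), ('number', '7')]
```

The default path is untouched, so callers that want ordinary numbers
see no change in behavior.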




* Re: Improving JSON pretty printing, how to represent floats?
  2024-04-05 13:16       ` Herman, Géza
  2024-04-05 14:01         ` tomas
@ 2024-04-05 15:34         ` Eli Zaretskii
  2024-04-05 16:46           ` Herman, Géza
  1 sibling, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2024-04-05 15:34 UTC (permalink / raw)
  To: Géza Herman; +Cc: emacs-devel

> From: Herman, Géza <geza.herman@gmail.com>
> Cc: Géza Herman <geza.herman@gmail.com>,
>  emacs-devel@gnu.org
> Date: Fri, 05 Apr 2024 15:16:44 +0200
> 
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> You mean the 64-bit binary format. IEEE-754 describes more
> >> formats which have more precision, and also there are more
> >> formats used in practice than what IEEE-754 describes.
> >>
> >> Plus, I don't think that the JSON standard mandates IEEE-754
> >> numbers.  It just describes numbers, so I think it's good
> >> behavior to not change numbers during formatting.
> >
> > The fact that Emacs supports IEEE-754 on almost all platforms is
> > prominently documented in the ELisp reference manual, see the node
> > "Float Basics" there.
> 
> I don't think it's relevant here.

It is relevant to me, because a float in a JSON has a meaning, it
isn't just a string.  If you want strings, use strings, and then they
will not be changed by pretty-printing.

> If I reformat a JSON, it should really be a formatting operation, it
> shouldn't matter what kind of floating point numbers a platform
> supports.

Any JSON that uses a float which needs more than 64 bits of precision
is not portable to most of today's machines.  I'm not interested in
complicating Emacs to support imaginary use cases, sorry.  When enough
important platforms switch to more bits of precision, so will we, and
then we will be able to have more digits in floats -- but even then
the number of significant digits will be finite.

> What if I open a .json file because I want to edit it a little bit
> (renaming some members, etc.), but then I realize that the format of
> the json is bad, so I reformat it.  But I want to maintain the
> precision, because I intend to use the .json in some other software
> which supports arbitrary floating point precision.

Well, you can't.  It makes no sense.  Floats are objects with certain
semantics, they are not just arbitrary strings.
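
The "floats have semantics" view can be illustrated with a small
Python sketch, assuming a platform where Python floats are IEEE-754
binary64 (true of essentially all current ones).  Under that reading,
two different spellings that select the same float are the same
number.

```python
import sys

# binary64 carries a 53-bit significand, about 15-17 decimal digits.
print(sys.float_info.mant_dig)  # 53 on IEEE-754 binary64 platforms
print(sys.float_info.dig)       # 15

# Different spellings, same float: the extra digits select no new value.
a = float("3.333333333333")
b = float("3.333333333333000000000000000001")
print(a == b)  # True
```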




* Re: Improving JSON pretty printing, how to represent floats?
  2024-04-05 15:34         ` Eli Zaretskii
@ 2024-04-05 16:46           ` Herman, Géza
  0 siblings, 0 replies; 14+ messages in thread
From: Herman, Géza @ 2024-04-05 16:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Géza Herman, emacs-devel


Eli Zaretskii <eliz@gnu.org> writes:

>> From: Herman, Géza <geza.herman@gmail.com>
>> Cc: Géza Herman <geza.herman@gmail.com>,
>>  emacs-devel@gnu.org
>> Date: Fri, 05 Apr 2024 15:16:44 +0200
>>
>>
>> Eli Zaretskii <eliz@gnu.org> writes:
>>
>> What if I open a .json file because I want to edit it a little 
>> bit
>> (renaming some members, etc.), but then I realize that the 
>> format of
>> the json is bad, so I reformat it.  But I want to maintain the
>> precision, because I intend to use the .json in some other 
>> software
>> which supports arbitrary floating point precision.
>
> Well, you can't.  It makes no sense.  Floats are objects with 
> certain
> semantics, they are not just arbitrary strings.

I wrote "arbitrary floating point precision", not arbitrary string. 
There are libraries which work with arbitrary precision floating 
point values, and I'm sure that there are JSON files which contain 
such numbers.  If a scientist uses Emacs to format their JSON 
files, they might be surprised that the values in the JSON file 
are modified by the pretty-printing process.  Again, this 
is simply bad behavior.

Anyway, I don't want to argue about this further.  It's clear as 
day to me that a formatter should not change values in a JSON 
file; you have a different opinion.

I thought that it made sense to improve this in Emacs (as it seems 
pretty trivial to do), but then I won't.




* Re: Improving JSON pretty printing, how to represent floats?
  2024-04-05 14:20   ` Herman, Géza
@ 2024-04-05 16:47     ` Dmitry Gutov
  0 siblings, 0 replies; 14+ messages in thread
From: Dmitry Gutov @ 2024-04-05 16:47 UTC (permalink / raw)
  To: Herman, Géza; +Cc: emacs-devel

On 05/04/2024 17:20, Herman, Géza wrote:
> 
> Dmitry Gutov <dmitry@gutov.dev> writes:
> 
>> On 05/04/2024 12:39, Herman, Géza wrote:
>>> What would be the best representation for numbers in this case? I'm 
>>> thinking about using symbols, but I'm not sure this is the
>>> best approach.  The parser/encoder would have a new keyword 
>>> parameter, like ":numbers-as-symbols t".  If this is specified, then
>>> numbers wouldn't be parsed, but kept as symbols.  What do you think, 
>>> is this a good approach?
>>
>> Why not just use strings? Bignum implementations in various libs 
>> (other languages) often use strings in constructors.
> 
> Because then the original type is lost: Later it's not possible to tell 
> whether a value was a number or a string originally.
> 
>>
>> Interning numbers in obarray doesn't seem like too great an idea.
> 
> That's correct, my idea is to use uninterned symbols (make-symbol).

Okay, then that's not a problem. You could use a more complex type (like 
a struct), but that doesn't seem urgent.

OTOH, Mattias also has a point that high performance is not as important 
here, and maybe a Lisp implementation would be preferable over adding a 
slightly clunkier interface to the parser function.
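
To show how small a high-level implementation that keeps numbers
verbatim can be, here is a hedged sketch in Python rather than Lisp.
RawNumber, pretty and pretty_print_lossless are hypothetical names
made up for this example; the only library features used are the
parse_float and parse_int hooks of Python's json module.

```python
import json

class RawNumber:
    """Hypothetical marker: a JSON number kept as its source text."""
    def __init__(self, text):
        self.text = text

def pretty(value, level=0, step=2):
    """Return VALUE as indented JSON text, emitting numbers verbatim."""
    pad = " " * (step * (level + 1))  # indentation of child items
    end = " " * (step * level)        # indentation of the closing bracket
    if isinstance(value, RawNumber):
        return value.text
    if isinstance(value, dict):
        if not value:
            return "{}"
        items = [pad + json.dumps(k) + ": " + pretty(v, level + 1, step)
                 for k, v in value.items()]
        return "{\n" + ",\n".join(items) + "\n" + end + "}"
    if isinstance(value, list):
        if not value:
            return "[]"
        items = [pad + pretty(v, level + 1, step) for v in value]
        return "[\n" + ",\n".join(items) + "\n" + end + "]"
    # Strings, true, false and null; json.dumps escapes non-ASCII
    # characters by default, so string spelling may still change.
    return json.dumps(value)

def pretty_print_lossless(text):
    doc = json.loads(text, parse_float=RawNumber, parse_int=RawNumber)
    return pretty(doc)

source = '{"a":3.333333333333000000000000000001,"b":[1e2,"1e2",null]}'
result = pretty_print_lossless(source)
print(result)
```

In this sketch the long fraction and the 1e2 literal come out exactly
as written, while the string "1e2" stays quoted.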


