Checking for loss of information on integer conversion

* Checking for loss of information on integer conversion
@ 2018-02-18  1:27 Paul Eggert
  2018-02-18 22:31 ` Juliusz Chroboczek
       [not found] ` <83y3jq9q4m.fsf@gnu.org>
  0 siblings, 2 replies; 25+ messages in thread
From: Paul Eggert @ 2018-02-18  1:27 UTC (permalink / raw)
  To: Emacs Development

[-- Attachment #1: Type: text/plain, Size: 1705 bytes --]

Bug#30408 reminded me of a problem that's bitten me before, which is that Emacs 
Lisp reading and printing sometimes loses information when converting from 
integers to strings or vice versa, and this loss of information can lead to real 
problems. Without going to bignums we can't fix this problem nicely in general; 
however, Emacs can do a better job of signaling an error when information is 
lost in the current implementation. I've proposed a patch here:

https://debbugs.gnu.org/30408#16

and am attaching it to this email for convenience.

Briefly, the patch is twofold. First, it causes calls like (format "%d" 
18446744073709551616) to return the mathematically-correct value 
"18446744073709551616" instead of silently returning the 
mathematically-incorrect value "9223372036854775807" as they do now. If the 
function cannot return the correct value due to an implementation limit -- e.g., 
(format "%x" 18446744073709551616) -- the patch causes the function to signal an 
overflow.
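
To illustrate the difference outside of Emacs, here is a minimal standalone C sketch. It is not the patch itself: the helper names format_clamped and format_exact are hypothetical, intmax_t stands in for Emacs's printmax_t, and NaN is deliberately not handled.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Old approach (sketch): clamp the double into intmax_t range, then
   print it as an integer.  Out-of-range values silently saturate.
   NaN is not handled here.  */
int
format_clamped (char *buf, size_t size, double d)
{
  intmax_t x;
  if (d < 0)
    x = d <= (double) INTMAX_MIN ? INTMAX_MIN : (intmax_t) d;
  else
    x = d >= (double) INTMAX_MAX ? INTMAX_MAX : (intmax_t) d;
  return snprintf (buf, size, "%" PRIdMAX, x);
}

/* Patched approach (sketch): let printf format the double directly
   with zero fractional digits, so no clamping takes place.  */
int
format_exact (char *buf, size_t size, double d)
{
  return snprintf (buf, size, "%.0f", d);
}
```

With a 64-bit intmax_t, passing 18446744073709551616.0 to format_clamped should produce the saturated string, while format_exact should produce all twenty digits.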

Second, although Emacs still reads large integers like 18446744073709551616 as 
if they were floating-point, it now signals an error if information is lost in 
the process. For example, the number 18446744073709551615 now causes the reader 
to signal an error, since it cannot be represented exactly either as a fixnum or 
as a floating-point number. If you want inexact representation, you can append 
".0" or "e0" to the integer.
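
The idea behind the reader check can be sketched in standalone C as a round trip: convert the digits to a double, print the double back with "%.0f", and compare the two strings. This is a simplified illustration, not the code in the patch; the function name decimal_survives_double is hypothetical, and it handles only unsigned digit strings with no trailing period.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return true if the unsigned decimal digit string DIGITS can be
   converted to a double and printed back without any change.  */
bool
decimal_survives_double (char const *digits)
{
  /* Skip leading zeros, but keep a lone "0".  */
  while (digits[0] == '0' && digits[1] != '\0')
    digits++;

  double value = strtod (digits, NULL);

  /* The largest finite double has 309 integer digits, so 400 bytes
     is enough for any finite value, as well as for "inf".  */
  char check[400];
  int len = snprintf (check, sizeof check, "%.0f", value);
  return (0 < len && (size_t) len < sizeof check
	  && strcmp (digits, check) == 0);
}
```

Under this sketch "18446744073709551616" (2^64) passes, whereas "18446744073709551615" fails because the nearest double is 2^64.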

As these are incompatible changes to Emacs I thought I'd mention them on this list.

Another possibility would be for Emacs to signal an error when reading any 
integer that does not fit in fixnum bounds. That would be a bigger change, though.

[-- Attachment #2: 0001-Avoid-losing-info-when-converting-integers.patch --]
[-- Type: text/x-patch; name="0001-Avoid-losing-info-when-converting-integers.patch", Size: 8127 bytes --]

From e1865be990e1a520feddc07507a71916d097d633 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sat, 17 Feb 2018 16:45:17 -0800
Subject: [PATCH] Avoid losing info when converting integers

This fixes some glitches with large integers (Bug#30408).
* doc/lispref/numbers.texi (Integer Basics): Say that
decimal integers out of fixnum range must be representable
exactly as floating-point.
* etc/NEWS: Mention this.
* src/data.c (syms_of_data): Add Qinexact_error.
* src/editfns.c (styled_format): Use %.0f when formatting %d or %i
values outside machine integer range, to avoid losing info.
Signal an error for %o or %x values that are too large to be
formatted, to avoid losing info.
* src/lread.c (string_to_number): When converting an integer-format
string to floating-point, signal an error if info is lost.
---
 doc/lispref/numbers.texi |  8 +++--
 etc/NEWS                 |  9 +++++
 src/data.c               |  1 +
 src/editfns.c            | 93 ++++++++++++++++++++----------------------------
 src/lread.c              | 14 ++++++++
 5 files changed, 67 insertions(+), 58 deletions(-)

diff --git a/doc/lispref/numbers.texi b/doc/lispref/numbers.texi
index e692ee1cc2..252aafd8fd 100644
--- a/doc/lispref/numbers.texi
+++ b/doc/lispref/numbers.texi
@@ -53,9 +53,11 @@ Integer Basics
 chapter assume the minimum integer width of 30 bits.
 @cindex overflow
 
-  The Lisp reader reads an integer as a sequence of digits with optional
-initial sign and optional final period.  An integer that is out of the
-Emacs range is treated as a floating-point number.
+  The Lisp reader can read an integer as a nonempty sequence of
+decimal digits with optional initial sign and optional final period.
+A decimal integer that is out of the Emacs range is treated as
+floating-point if it can be represented exactly as a floating-point
+number.
 
 @example
  1               ; @r{The integer 1.}
diff --git a/etc/NEWS b/etc/NEWS
index 8db638e5ed..36cbcf6500 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -248,6 +248,12 @@ as new-style, bind the new variable 'force-new-style-backquotes' to t.
 'cl-struct-define' whose name clashes with a builtin type (e.g.,
 'integer' or 'hash-table') now signals an error.
 
+** When formatting a floating-point number as an octal or hexadecimal
+integer, Emacs now signals an error if the number is too large for the
+implementation to format.  When reading an integer outside Emacs
+fixnum range, Emacs now signals an error if the integer cannot be
+represented exactly as a floating-point number.  See Bug#30408.
+
 \f
 * Lisp Changes in Emacs 27.1
 
@@ -289,6 +295,9 @@ remote systems, which support this check.
 If the optional third argument is non-nil, 'make-string' will produce
 a multibyte string even if its second argument is an ASCII character.
 
+** (format "%d" X) no longer mishandles floating-point X values that
+do not fit in a machine integer (Bug#30408).
+
 ** New JSON parsing and serialization functions 'json-serialize',
 'json-insert', 'json-parse-string', and 'json-parse-buffer'.  These
 are implemented in C using the Jansson library.
diff --git a/src/data.c b/src/data.c
index 72abfefb01..8856583f13 100644
--- a/src/data.c
+++ b/src/data.c
@@ -3729,6 +3729,7 @@ syms_of_data (void)
   DEFSYM (Qrange_error, "range-error");
   DEFSYM (Qdomain_error, "domain-error");
   DEFSYM (Qsingularity_error, "singularity-error");
+  DEFSYM (Qinexact_error, "inexact-error");
   DEFSYM (Qoverflow_error, "overflow-error");
   DEFSYM (Qunderflow_error, "underflow-error");
 
diff --git a/src/editfns.c b/src/editfns.c
index 96bb271b2d..d26549ddb8 100644
--- a/src/editfns.c
+++ b/src/editfns.c
@@ -4563,32 +4563,30 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
 		 and with pM inserted for integer formats.
 		 At most two flags F can be specified at once.  */
 	      char convspec[sizeof "%FF.*d" + max (INT_AS_LDBL, pMlen)];
-	      {
-		char *f = convspec;
-		*f++ = '%';
-		/* MINUS_FLAG and ZERO_FLAG are dealt with later.  */
-		*f = '+'; f +=  plus_flag;
-		*f = ' '; f += space_flag;
-		*f = '#'; f += sharp_flag;
-                *f++ = '.';
-                *f++ = '*';
-		if (float_conversion)
-		  {
-		    if (INT_AS_LDBL)
-		      {
-			*f = 'L';
-			f += INTEGERP (arg);
-		      }
-		  }
-		else if (conversion != 'c')
-		  {
-		    memcpy (f, pMd, pMlen);
-		    f += pMlen;
-		    zero_flag &= ! precision_given;
-		  }
-		*f++ = conversion;
-		*f = '\0';
-	      }
+	      char *f = convspec;
+	      *f++ = '%';
+	      /* MINUS_FLAG and ZERO_FLAG are dealt with later.  */
+	      *f = '+'; f +=  plus_flag;
+	      *f = ' '; f += space_flag;
+	      *f = '#'; f += sharp_flag;
+	      *f++ = '.';
+	      *f++ = '*';
+	      if (float_conversion)
+		{
+		  if (INT_AS_LDBL)
+		    {
+		      *f = 'L';
+		      f += INTEGERP (arg);
+		    }
+		}
+	      else if (conversion != 'c')
+		{
+		  memcpy (f, pMd, pMlen);
+		  f += pMlen;
+		  zero_flag &= ! precision_given;
+		}
+	      *f++ = conversion;
+	      *f = '\0';
 
 	      int prec = -1;
 	      if (precision_given)
@@ -4630,29 +4628,18 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
 		}
 	      else if (conversion == 'd' || conversion == 'i')
 		{
-		  /* For float, maybe we should use "%1.0f"
-		     instead so it also works for values outside
-		     the integer range.  */
-		  printmax_t x;
 		  if (INTEGERP (arg))
-		    x = XINT (arg);
+		    {
+		      printmax_t x = XINT (arg);
+		      sprintf_bytes = sprintf (sprintf_buf, convspec, prec, x);
+		    }
 		  else
 		    {
-		      double d = XFLOAT_DATA (arg);
-		      if (d < 0)
-			{
-			  x = TYPE_MINIMUM (printmax_t);
-			  if (x < d)
-			    x = d;
-			}
-		      else
-			{
-			  x = TYPE_MAXIMUM (printmax_t);
-			  if (d < x)
-			    x = d;
-			}
+		      strcpy (f - pMlen - 1, "f");
+		      prec = 0;
+		      double x = XFLOAT_DATA (arg);
+		      sprintf_bytes = sprintf (sprintf_buf, convspec, prec, x);
 		    }
-		  sprintf_bytes = sprintf (sprintf_buf, convspec, prec, x);
 		}
 	      else
 		{
@@ -4663,22 +4650,18 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
 		  else
 		    {
 		      double d = XFLOAT_DATA (arg);
-		      if (d < 0)
-			x = 0;
-		      else
-			{
-			  x = TYPE_MAXIMUM (uprintmax_t);
-			  if (d < x)
-			    x = d;
-			}
+		      if (! (0 <= d && d < TYPE_MAXIMUM (uprintmax_t)))
+			xsignal1 (Qoverflow_error, arg);
+		      x = d;
 		    }
 		  sprintf_bytes = sprintf (sprintf_buf, convspec, prec, x);
 		}
 
 	      /* Now the length of the formatted item is known, except it omits
 		 padding and excess precision.  Deal with excess precision
-		 first.  This happens only when the format specifies
-		 ridiculously large precision.  */
+		 first.  This happens when the format specifies
+		 ridiculously large precision, or when %d or %i has
+		 nonzero precision and formats a float.  */
 	      ptrdiff_t excess_precision
 		= precision_given ? precision - prec : 0;
 	      ptrdiff_t leading_zeros = 0, trailing_zeros = 0;
diff --git a/src/lread.c b/src/lread.c
index d009bd0cd2..cfeaac8030 100644
--- a/src/lread.c
+++ b/src/lread.c
@@ -3794,6 +3794,20 @@ string_to_number (char const *string, int base, bool ignore_trailing)
   if (! value)
     value = atof (string + signedp);
 
+  if (! float_syntax)
+    {
+      /* Check that converting the integer-format STRING to a
+	 floating-point number does not lose info.  See Bug#30408.  */
+      char const *bp = string + signedp;
+      while (*bp == '0')
+	bp++;
+      char checkbuf[DBL_MAX_10_EXP + 2];
+      int checkbuflen = sprintf (checkbuf, "%.0f", value);
+      if (! (cp - bp - !!(state & DOT_CHAR) == checkbuflen
+	     && memcmp (bp, checkbuf, checkbuflen) == 0))
+	xsignal1 (Qinexact_error, build_string (string));
+    }
+
   return make_float (negative ? -value : value);
 }
 
-- 
2.14.3



* Re: Checking for loss of information on integer conversion
  2018-02-18  1:27 Checking for loss of information on integer conversion Paul Eggert
@ 2018-02-18 22:31 ` Juliusz Chroboczek
  2018-02-18 22:41   ` Stefan Monnier
  2018-02-19  6:03   ` John Wiegley
       [not found] ` <83y3jq9q4m.fsf@gnu.org>
  1 sibling, 2 replies; 25+ messages in thread
From: Juliusz Chroboczek @ 2018-02-18 22:31 UTC (permalink / raw)
  To: emacs-devel

> Second, although Emacs still reads large integers like
> 18446744073709551616 as if they were floating-point, it now signals an
> error if information is lost in the process.

That's better, but still horrible.

> Another possibility would be for Emacs to signal an error when reading any
> integer that does not fit in fixnum bounds.

Please do that.





* Re: Checking for loss of information on integer conversion
  2018-02-18 22:31 ` Juliusz Chroboczek
@ 2018-02-18 22:41   ` Stefan Monnier
  2018-02-18 23:46     ` Juliusz Chroboczek
  2018-02-19  6:03   ` John Wiegley
  1 sibling, 1 reply; 25+ messages in thread
From: Stefan Monnier @ 2018-02-18 22:41 UTC (permalink / raw)
  To: emacs-devel

>> Another possibility would be for Emacs to signal an error when reading any
>> integer that does not fit in fixnum bounds.
> Please do that.

That would be a regression.  On 32bit systems, there are various
circumstances where we need to read a 32bit ID (i.e. one that
doesn't fit within our 30bit fixnums), as well as situations where we
read something like a file size which may also fail to fit.

On 64bit systems (and 32bit systems built with wide-ints),
I don't see such a clear need to convert a large integer into a float,
so on those systems I think it's OK to just signal an error.
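
The 32-bit case can be made concrete with a small C sketch. It assumes the 30-bit fixnum width (sign included) mentioned in the manual excerpt earlier in the thread; the names FIXNUM_BITS, fits_fixnum30 and fits_double are made up for the illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed fixnum width on a 32-bit build without wide ints:
   30 bits including the sign bit.  */
enum { FIXNUM_BITS = 30 };

/* Return true if N lies within the assumed 30-bit fixnum range.  */
bool
fits_fixnum30 (int64_t n)
{
  int64_t max = (INT64_C (1) << (FIXNUM_BITS - 1)) - 1;
  int64_t min = -max - 1;
  return min <= n && n <= max;
}

/* Return true if the 32-bit ID survives a round trip through double.
   A double has a 53-bit significand, so this holds for every ID.  */
bool
fits_double (uint32_t id)
{
  return (uint32_t) (double) id == id;
}
```

A value such as UINT32_MAX is rejected by fits_fixnum30 but accepted by fits_double, which is why reading it as a float loses nothing on such systems.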

        Stefan "still living in the 32bit world"


* Re: Checking for loss of information on integer conversion
  2018-02-18 22:41   ` Stefan Monnier
@ 2018-02-18 23:46     ` Juliusz Chroboczek
  2018-02-19  1:47       ` Stefan Monnier
  2018-02-19 15:05       ` Richard Stallman
  0 siblings, 2 replies; 25+ messages in thread
From: Juliusz Chroboczek @ 2018-02-18 23:46 UTC (permalink / raw)
  To: emacs-devel

>>> Another possibility would be for Emacs to signal an error when
>>> reading any integer that does not fit in fixnum bounds.

>> Please do that.

> That would be a regression.  On 32bit systems, there are various
> circumstances where we need to read a 32bit ID

I can see how this could be a problem, but I still find the current
semantics pretty horrible, and completely different from what any Lisp
hacker would expect.

Perhaps Emacs could acquire small bignums?  Say, boxed 64-bit integers
with signal on overflow?
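
As a rough sketch of what "signal on overflow" could mean for 64-bit arithmetic, here is an addition in plain C that reports overflow instead of wrapping. The function name checked_add_int64 is hypothetical, boxing is left out, and a real implementation would signal a Lisp error where this one returns false.

```c
#include <stdbool.h>
#include <stdint.h>

/* Add A and B.  On success store the result in *SUM and return true.
   If the mathematical sum does not fit in int64_t, leave *SUM alone
   and return false so that the caller can signal an overflow.  */
bool
checked_add_int64 (int64_t a, int64_t b, int64_t *sum)
{
  if ((b > 0 && a > INT64_MAX - b)
      || (b < 0 && a < INT64_MIN - b))
    return false;
  *sum = a + b;
  return true;
}
```

The range test runs before the addition, so the signed addition itself never overflows.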

> On 64bit systems (and 32bit systems built with wide-ints),
> I don't see such a clear need to convert a large integer into a float,
> so on those systems I think it's OK to just signal an error.

Please.

>         Stefan "still living in the 32bit world"

We like you nonetheless.

-- Juliusz





* Re: Checking for loss of information on integer conversion
  2018-02-18 23:46     ` Juliusz Chroboczek
@ 2018-02-19  1:47       ` Stefan Monnier
  2018-02-19  2:22         ` Paul Eggert
  2018-02-19 15:05       ` Richard Stallman
  1 sibling, 1 reply; 25+ messages in thread
From: Stefan Monnier @ 2018-02-19  1:47 UTC (permalink / raw)
  To: emacs-devel

>> That would be a regression.  On 32bit systems, there are various
>> circumstances where we need to read a 32bit ID
> I can see how this could be a problem, but I still find the current
> semantics pretty horrible,

It's not pretty indeed.

> Perhaps Emacs could acquire small bignums?  Say, boxed 64-bit integers
> with signal on overflow?

I don't think adding actual bignums via (say) libgmp would be
significantly harder than adding such "small bignums".


        Stefan





* Re: Checking for loss of information on integer conversion
  2018-02-19  1:47       ` Stefan Monnier
@ 2018-02-19  2:22         ` Paul Eggert
  2018-02-19  3:20           ` Drew Adams
  0 siblings, 1 reply; 25+ messages in thread
From: Paul Eggert @ 2018-02-19  2:22 UTC (permalink / raw)
  To: Stefan Monnier, emacs-devel

Stefan Monnier wrote:
> I don't think adding actual bignums via (say) libgmp would be
> significantly harder than adding such "small bignums".

Although I suspect libgmp would be considerably more of a pain than small 
bignums (e.g., due to the memory-allocation hassle) I agree we should spend our 
limited development time on true bignums rather than on small ones. Emacs 
already links to libgmp so this shouldn't introduce any new dependencies. 
However, this is all a matter for a later day.

 > On 64bit systems (and 32bit systems built with wide-ints),
 > I don't see such a clear need to convert a large integer into a float,
 > so on those systems I think it's OK to just signal an error.

I'll take a look into doing things that way, and into following Eli's suggestion 
to make it configurable.


* RE: Checking for loss of information on integer conversion
  2018-02-19  2:22         ` Paul Eggert
@ 2018-02-19  3:20           ` Drew Adams
  0 siblings, 0 replies; 25+ messages in thread
From: Drew Adams @ 2018-02-19  3:20 UTC (permalink / raw)
  To: Paul Eggert, Stefan Monnier, emacs-devel

Big smallnums first, then small bignums, then big bignums? ;-)

(Anyway, +1 for an intention to add bignums.)




* Re: Checking for loss of information on integer conversion
  2018-02-18 23:46     ` Juliusz Chroboczek
  2018-02-19  1:47       ` Stefan Monnier
@ 2018-02-19 15:05       ` Richard Stallman
  2018-02-22 16:31         ` Juliusz Chroboczek
  1 sibling, 1 reply; 25+ messages in thread
From: Richard Stallman @ 2018-02-19 15:05 UTC (permalink / raw)
  To: Juliusz Chroboczek; +Cc: emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > I can see how this could be a problem, but I still find the current
  > semantics pretty horrible, and completely different from what any Lisp
  > hacker would expect.

We should not exaggerate the importance of this.  The aim of Emacs
Lisp is to "get the job done", not to be maximally elegant.

  > Perhaps Emacs could acquire small bignums?  Say, boxed 64-bit integers
  > with signal on overflow?

Adding real bignums probably would not be much more work than that.
So IF we decide to do work in that area, let's add real bignums.
GNU MP does the hard part.

However, I'd rather prioritize progress in computing horizontal
alignment and widths with variable-width fonts.  That is something
we really need.

-- 
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
Skype: No way! See https://stallman.org/skype.html.


* Re: Checking for loss of information on integer conversion
  2018-02-19 15:05       ` Richard Stallman
@ 2018-02-22 16:31         ` Juliusz Chroboczek
  2018-02-22 17:01           ` Eli Zaretskii
  2018-02-23  9:49           ` Richard Stallman
  0 siblings, 2 replies; 25+ messages in thread
From: Juliusz Chroboczek @ 2018-02-22 16:31 UTC (permalink / raw)
  To: emacs-devel

> Adding real bignums probably would not be much more work than that.
> So IF we decide to do work in that area, let's add real bignums.
> GNU MP does the hard part.

> However, I'd rather prioritize progress in computing horizontal
> alignment and widths with variable-width fonts.  That is something
> we really need.

I am tempted to argue that silently returning the wrong answer is a more
serious (but admittedly less visible) issue than minor cosmetic limitations
of the redisplay engine.

But who am I to argue?

-- Juliusz





* Re: Checking for loss of information on integer conversion
  2018-02-22 16:31         ` Juliusz Chroboczek
@ 2018-02-22 17:01           ` Eli Zaretskii
  2018-02-22 19:31             ` Stefan Monnier
  2018-02-23  9:49           ` Richard Stallman
  1 sibling, 1 reply; 25+ messages in thread
From: Eli Zaretskii @ 2018-02-22 17:01 UTC (permalink / raw)
  To: Juliusz Chroboczek; +Cc: emacs-devel

> From: Juliusz Chroboczek <jch@irif.fr>
> Date: Thu, 22 Feb 2018 17:31:47 +0100
> 
> > However, I'd rather prioritize progress in computing horizontal
> > alignment and widths with variable-width fonts.  That is something
> > we really need.
> 
> I am tempted to argue that silently returning the wrong answer is a more
> serious (but admittedly less visible) issue than minor cosmetic limitations
> of the redisplay engine.

It's not a minor cosmetic limitation, it's a basic inability to line
up text displayed with a variable-pitch font.  It makes Emacs look
ugly and basically unusable when variable-pitch fonts are used to
display program source code, formatted text, tables, etc.  These are
all important and frequent use cases these days.

By contrast, silently returning a wrong result due to overflow is a
much more rare situation, especially since today 64-bit machines are
so ubiquitous.


* Re: Checking for loss of information on integer conversion
  2018-02-22 17:01           ` Eli Zaretskii
@ 2018-02-22 19:31             ` Stefan Monnier
  0 siblings, 0 replies; 25+ messages in thread
From: Stefan Monnier @ 2018-02-22 19:31 UTC (permalink / raw)
  To: emacs-devel

> It's not a minor cosmetic limitation, it's a basic inability to line
> up text displayed with a variable-pitch font.  It makes Emacs look
> ugly and basically unusable when variable-pitch fonts are used to
> display program source code, formatted text, tables, etc.  These are
> all important and frequent use cases these days.
>
> By contrast, silently returning a wrong result due to overflow is a
> much more rare situation, especially since today 64-bit machines are
> so ubiquitous.

In any case, the relative importance doesn't matter: we all agree that
both are problems that deserve fixing; and I think it's unlikely that
the relative order we place them will influence which gets fixed first
(which depends rather on who decides to scratch which itch first).

Personally, my money is on libgmp being implemented first (to a large
extent because I don't think anyone even has a clear idea of what
a solution to the variable-pitch-alignment problem would look like (in
terms of API and code, not in terms of visual display, of course)).

        Stefan


* Re: Checking for loss of information on integer conversion
  2018-02-22 16:31         ` Juliusz Chroboczek
  2018-02-22 17:01           ` Eli Zaretskii
@ 2018-02-23  9:49           ` Richard Stallman
  1 sibling, 0 replies; 25+ messages in thread
From: Richard Stallman @ 2018-02-23  9:49 UTC (permalink / raw)
  To: Juliusz Chroboczek; +Cc: emacs-devel

  > I am tempted to argue that silently returning the wrong answer is a more
  > serious (but admittedly less visible) issue

That's a bug and we should fix it, but I don't expect
it to require big changes.

Adding any sort of bignums is a big change.


* Re: Checking for loss of information on integer conversion
  2018-02-18 22:31 ` Juliusz Chroboczek
  2018-02-18 22:41   ` Stefan Monnier
@ 2018-02-19  6:03   ` John Wiegley
  1 sibling, 0 replies; 25+ messages in thread
From: John Wiegley @ 2018-02-19  6:03 UTC (permalink / raw)
  To: Juliusz Chroboczek; +Cc: emacs-devel

>>>>> "JC" == Juliusz Chroboczek <jch@irif.fr> writes:

>> Second, although Emacs still reads large integers like
>> 18446744073709551616 as if they were floating-point, it now signals an
>> error if information is lost in the process.

JC> That's better, but still horrible.

Now you've made it hard for me to sleep soundly tonight.

-- 
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2




[parent not found: <83y3jq9q4m.fsf@gnu.org>]

* Re: Checking for loss of information on integer conversion
       [not found] ` <83y3jq9q4m.fsf@gnu.org>
@ 2018-02-18 20:04   ` Paul Eggert
  2018-02-18 20:24     ` Eli Zaretskii
                       ` (2 more replies)
  2018-02-18 20:04   ` Paul Eggert
  2018-03-27 23:19   ` Paul Eggert
  2 siblings, 3 replies; 25+ messages in thread
From: Paul Eggert @ 2018-02-18 20:04 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 30408, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 2618 bytes --]

Eli Zaretskii wrote:

> Emacs Lisp is not used to write software that controls
> aircraft and spaceships

Actually, I maintain Emacs Lisp code that controls timestamps used in aircraft 
and spaceships. I'm not saying that Emacs itself runs the aircraft and 
spaceships, but it definitely is used to develop software and data used there. 
As luck would have it, I'm currently engaged in an email thread about time 
transfer between Earth and Mars (yes, this is really a thing and people are 
trying to do it with millisecond precision) that is related to a project where I 
regularly use Emacs Lisp. See the thread containing this message:

https://mm.icann.org/pipermail/tz/2018-February/026257.html

> More generally, why is signaling an error by default in this case a
> good idea? ...  That would
> be similar to behavior of equivalent constructs in C programs

Sure, and C compilers typically issue diagnostics for situations similar to 
what's in Bug#30408. For example, for this C program:

int a = 18446744073709553664;

GCC issues a diagnostic, whereas for the similar Emacs Lisp program:

(setq b 18446744073709553664)

Emacs silently substitutes a number that is off by 2048. It's the latter 
behavior that causes the sort of problem seen in Bug#30408.
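
The figure of 2048 can be checked with a few lines of C. This is only a sketch with a hypothetical helper name, excess_over_2_64, and it assumes IEEE 754 doubles and a correctly rounding strtod. Between 2^64 and 2^65 adjacent doubles are 4096 apart, so 2^64 + 2048 falls exactly halfway between two of them and rounds to the even one, which is 2^64.

```c
#include <stdlib.h>

/* Return by how much the double nearest to the decimal string S
   exceeds 2^64.  Meaningful only for S in the range [2^64, 2^65),
   where the subtraction below is exact.  */
double
excess_over_2_64 (char const *s)
{
  double two_64 = 18446744073709551616.0;
  return strtod (s, NULL) - two_64;
}
```

For "18446744073709553664" the excess comes out as 0 rather than 2048, while "18446744073709555712" (2^64 + 4096) is stored exactly.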

When people write a floating-point number they naturally expect it to have some 
fuzz. But when they write an integer they expect it to be represented exactly, 
and not to be rounded.  Emacs already reports an overflow error for the 
following code that attempts to use the same mathematical value:

(setq c #x10000000000000800)

so it's not like it would be a huge change to do something similar for decimal 
integers.

When Emacs was originally developed, its integers were typically 28 bits (not 
counting sign) and floating-point numbers could typically represent integers 
exactly up to 53 bits (not counting sign), so the old Emacs behavior was 
somewhat defensible: although it didn't do bignums, at least it could represent 
integers nearly twice as wide as fixnums. However, nowadays Emacs integers 
typically have more precision than floating point numbers, and the old Emacs 
behavior is more likely to lead to counterintuitive results such as those 
described in Bug#30408.
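
A short C sketch shows the precision gap. It assumes a 62-bit fixnum (sign included), which is what a typical 64-bit build uses, and IEEE 754 doubles; the names FIXNUM_BITS and fixnum_survives_double are invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed fixnum width on a typical 64-bit build: 62 bits
   including the sign bit.  */
enum { FIXNUM_BITS = 62 };

/* Return true if N is within the assumed fixnum range and converting
   it to double and back yields N again.  */
bool
fixnum_survives_double (int64_t n)
{
  int64_t max = (INT64_C (1) << (FIXNUM_BITS - 1)) - 1;
  if (n < -max - 1 || max < n)
    return false;		/* Not a fixnum under this assumption.  */
  /* Within fixnum range the double is below 2^63 in magnitude,
     so the conversion back to int64_t is well defined.  */
  return (int64_t) (double) n == n;
}
```

Every integer up to 2^53 survives, but 2^53 + 1 does not, and neither does the largest assumed fixnum, 2^61 - 1.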

On thinking about it in the light of your comments, I suppose it's confusing 
that the proposal used a new signal 'inexact', whereas it should just signal 
overflow. After all, that's what string_to_number already does for out-of-range 
hexadecimal integers. That issue is easily fixed. Revised patch attached.


[-- Attachment #2: 0001-Avoid-losing-info-when-converting-integers.patch --]
[-- Type: text/x-patch; name="0001-Avoid-losing-info-when-converting-integers.patch", Size: 7603 bytes --]

From 49895e55ed7ac41dbf3752ab534cd665ef45ee71 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sun, 18 Feb 2018 11:37:22 -0800
Subject: [PATCH] Avoid losing info when converting integers

This fixes some glitches with large integers (Bug#30408).
* doc/lispref/numbers.texi (Integer Basics): Say that
decimal integers out of fixnum range must be representable
exactly as floating-point.
* etc/NEWS: Mention this.
* src/editfns.c (styled_format): Use %.0f when formatting %d or %i
values outside machine integer range, to avoid losing info.
Signal an error for %o or %x values that are too large to be
formatted, to avoid losing info.
* src/lread.c (string_to_number): When converting an integer-format
string to floating-point, signal an error if info is lost.
---
 doc/lispref/numbers.texi |  8 +++--
 etc/NEWS                 |  9 +++++
 src/editfns.c            | 93 ++++++++++++++++++++----------------------------
 src/lread.c              | 14 ++++++++
 4 files changed, 66 insertions(+), 58 deletions(-)

diff --git a/doc/lispref/numbers.texi b/doc/lispref/numbers.texi
index e692ee1cc2..252aafd8fd 100644
--- a/doc/lispref/numbers.texi
+++ b/doc/lispref/numbers.texi
@@ -53,9 +53,11 @@ Integer Basics
 chapter assume the minimum integer width of 30 bits.
 @cindex overflow
 
-  The Lisp reader reads an integer as a sequence of digits with optional
-initial sign and optional final period.  An integer that is out of the
-Emacs range is treated as a floating-point number.
+  The Lisp reader can read an integer as a nonempty sequence of
+decimal digits with optional initial sign and optional final period.
+A decimal integer that is out of the Emacs range is treated as
+floating-point if it can be represented exactly as a floating-point
+number.
 
 @example
  1               ; @r{The integer 1.}
diff --git a/etc/NEWS b/etc/NEWS
index 8db638e5ed..36cbcf6500 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -248,6 +248,12 @@ as new-style, bind the new variable 'force-new-style-backquotes' to t.
 'cl-struct-define' whose name clashes with a builtin type (e.g.,
 'integer' or 'hash-table') now signals an error.
 
+** When formatting a floating-point number as an octal or hexadecimal
+integer, Emacs now signals an error if the number is too large for the
+implementation to format.  When reading an integer outside Emacs
+fixnum range, Emacs now signals an error if the integer cannot be
+represented exactly as a floating-point number.  See Bug#30408.
+
 \f
 * Lisp Changes in Emacs 27.1
 
@@ -289,6 +295,9 @@ remote systems, which support this check.
 If the optional third argument is non-nil, 'make-string' will produce
 a multibyte string even if its second argument is an ASCII character.
 
+** (format "%d" X) no longer mishandles floating-point X values that
+do not fit in a machine integer (Bug#30408).
+
 ** New JSON parsing and serialization functions 'json-serialize',
 'json-insert', 'json-parse-string', and 'json-parse-buffer'.  These
 are implemented in C using the Jansson library.
diff --git a/src/editfns.c b/src/editfns.c
index 96bb271b2d..d26549ddb8 100644
--- a/src/editfns.c
+++ b/src/editfns.c
@@ -4563,32 +4563,30 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
 		 and with pM inserted for integer formats.
 		 At most two flags F can be specified at once.  */
 	      char convspec[sizeof "%FF.*d" + max (INT_AS_LDBL, pMlen)];
-	      {
-		char *f = convspec;
-		*f++ = '%';
-		/* MINUS_FLAG and ZERO_FLAG are dealt with later.  */
-		*f = '+'; f +=  plus_flag;
-		*f = ' '; f += space_flag;
-		*f = '#'; f += sharp_flag;
-                *f++ = '.';
-                *f++ = '*';
-		if (float_conversion)
-		  {
-		    if (INT_AS_LDBL)
-		      {
-			*f = 'L';
-			f += INTEGERP (arg);
-		      }
-		  }
-		else if (conversion != 'c')
-		  {
-		    memcpy (f, pMd, pMlen);
-		    f += pMlen;
-		    zero_flag &= ! precision_given;
-		  }
-		*f++ = conversion;
-		*f = '\0';
-	      }
+	      char *f = convspec;
+	      *f++ = '%';
+	      /* MINUS_FLAG and ZERO_FLAG are dealt with later.  */
+	      *f = '+'; f +=  plus_flag;
+	      *f = ' '; f += space_flag;
+	      *f = '#'; f += sharp_flag;
+	      *f++ = '.';
+	      *f++ = '*';
+	      if (float_conversion)
+		{
+		  if (INT_AS_LDBL)
+		    {
+		      *f = 'L';
+		      f += INTEGERP (arg);
+		    }
+		}
+	      else if (conversion != 'c')
+		{
+		  memcpy (f, pMd, pMlen);
+		  f += pMlen;
+		  zero_flag &= ! precision_given;
+		}
+	      *f++ = conversion;
+	      *f = '\0';
 
 	      int prec = -1;
 	      if (precision_given)
@@ -4630,29 +4628,18 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
 		}
 	      else if (conversion == 'd' || conversion == 'i')
 		{
-		  /* For float, maybe we should use "%1.0f"
-		     instead so it also works for values outside
-		     the integer range.  */
-		  printmax_t x;
 		  if (INTEGERP (arg))
-		    x = XINT (arg);
+		    {
+		      printmax_t x = XINT (arg);
+		      sprintf_bytes = sprintf (sprintf_buf, convspec, prec, x);
+		    }
 		  else
 		    {
-		      double d = XFLOAT_DATA (arg);
-		      if (d < 0)
-			{
-			  x = TYPE_MINIMUM (printmax_t);
-			  if (x < d)
-			    x = d;
-			}
-		      else
-			{
-			  x = TYPE_MAXIMUM (printmax_t);
-			  if (d < x)
-			    x = d;
-			}
+		      strcpy (f - pMlen - 1, "f");
+		      prec = 0;
+		      double x = XFLOAT_DATA (arg);
+		      sprintf_bytes = sprintf (sprintf_buf, convspec, prec, x);
 		    }
-		  sprintf_bytes = sprintf (sprintf_buf, convspec, prec, x);
 		}
 	      else
 		{
@@ -4663,22 +4650,18 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
 		  else
 		    {
 		      double d = XFLOAT_DATA (arg);
-		      if (d < 0)
-			x = 0;
-		      else
-			{
-			  x = TYPE_MAXIMUM (uprintmax_t);
-			  if (d < x)
-			    x = d;
-			}
+		      if (! (0 <= d && d < TYPE_MAXIMUM (uprintmax_t)))
+			xsignal1 (Qoverflow_error, arg);
+		      x = d;
 		    }
 		  sprintf_bytes = sprintf (sprintf_buf, convspec, prec, x);
 		}
 
 	      /* Now the length of the formatted item is known, except it omits
 		 padding and excess precision.  Deal with excess precision
-		 first.  This happens only when the format specifies
-		 ridiculously large precision.  */
+		 first.  This happens when the format specifies
+		 ridiculously large precision, or when %d or %i has
+		 nonzero precision and formats a float.  */
 	      ptrdiff_t excess_precision
 		= precision_given ? precision - prec : 0;
 	      ptrdiff_t leading_zeros = 0, trailing_zeros = 0;
diff --git a/src/lread.c b/src/lread.c
index d009bd0cd2..9500ed8341 100644
--- a/src/lread.c
+++ b/src/lread.c
@@ -3794,6 +3794,20 @@ string_to_number (char const *string, int base, bool ignore_trailing)
   if (! value)
     value = atof (string + signedp);
 
+  if (! float_syntax)
+    {
+      /* Check that converting the integer-format STRING to a
+	 floating-point number does not lose info.  See Bug#30408.  */
+      char const *bp = string + signedp;
+      while (*bp == '0')
+	bp++;
+      char checkbuf[DBL_MAX_10_EXP + 2];
+      int checkbuflen = sprintf (checkbuf, "%.0f", value);
+      if (! (cp - bp - !!(state & DOT_CHAR) == checkbuflen
+	     && memcmp (bp, checkbuf, checkbuflen) == 0))
+	xsignal1 (Qoverflow_error, build_string (string));
+    }
+
   return make_float (negative ? -value : value);
 }
 
-- 
2.14.3



* Re: Checking for loss of information on integer conversion
  2018-02-18 20:04   ` Paul Eggert
@ 2018-02-18 20:24     ` Eli Zaretskii
  2018-03-09  5:00       ` bug#30408: " Paul Eggert
  2018-02-18 20:24     ` Eli Zaretskii
  2018-02-18 21:52     ` Drew Adams
  2 siblings, 1 reply; 25+ messages in thread
From: Eli Zaretskii @ 2018-02-18 20:24 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 30408, emacs-devel

> From: Paul Eggert <eggert@cs.ucla.edu>
> Cc: emacs-devel@gnu.org, 30408@debbugs.gnu.org
> Date: Sun, 18 Feb 2018 12:04:20 -0800
> 
> > Emacs Lisp is not used to write software that controls
> > aircraft and spaceships
> 
> Actually, I maintain Emacs Lisp code that controls timestamps used in aircraft 
> and spaceships. I'm not saying that Emacs itself runs the aircraft and 
> spaceships, but it definitely is used to develop software and data used there. 
> As luck would have it, I'm currently engaged in an email thread about time 
> transfer between Earth and Mars (yes, this is really a thing and people are 
> trying to do it with millisecond precision) that is related to a project where I 
> regularly use Emacs Lisp. See the thread containing this message:

Interesting, but not really relevant to the issue at hand, IMO.  I was
talking about real-time control, not off-line calculations.  And I did
propose to have this feature as opt-in, so the kind of calculations
that transfer me to Mars could still be held safely and accurately.

> > More generally, why signaling an error by default in this case is a
> > good idea? ...  That would
> > be similar to behavior of equivalent constructs in C programs
> 
> Sure, and C compilers typically issue diagnostics for situations similar to 
> what's in Bug#30408. For example, for this C program:
> 
> int a = 18446744073709553664;
> 
> GCC issues a diagnostic, whereas for the similar Emacs Lisp program:
> 
> (setq b 18446744073709553664)
> 
> Emacs silently substitutes a number that is off by 2048.

I'm okay with flagging such constants during byte compilation.  I was
talking only about run-time diagnostics, not compile-time diagnostics.

> When people write a floating-point number they naturally expect it to have some 
> fuzz. But when they write an integer they expect it to be represented exactly, 
> and not to be rounded.

That is true, but Emacs behaved like it does today for many years, and
I'm worried by the possible breakage such a significant behavior
change could have, including on our own code.




* bug#30408: Checking for loss of information on integer conversion
  2018-02-18 20:24     ` Eli Zaretskii
@ 2018-03-09  5:00       ` Paul Eggert
  2018-03-09  8:22         ` Eli Zaretskii
  0 siblings, 1 reply; 25+ messages in thread
From: Paul Eggert @ 2018-03-09  5:00 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 30408

[-- Attachment #1: Type: text/plain, Size: 368 bytes --]

Since the qualms expressed on this topic had to do with converting strings to 
integers, I installed into master the noncontroversial part affecting conversion 
of integers to strings (see attached patch; it also fixes some minor glitches in 
the previous proposal). I'll think about the string-to-integer conversion a bit 
more and propose an updated patch for that.
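
As a rough illustration of the integer-to-string change, here is a minimal standalone C sketch, not the Emacs code itself. The names format_d_clamped and format_d_exact are hypothetical, and intmax_t stands in for the printmax_t type used in editfns.c. The first function mimics the old clamping logic; the second prints the double directly with "%.0f", as the patch does for %d and %i.

```c
#include <stdint.h>
#include <stdio.h>

/* Old approach (sketch): clamp the double into intmax_t range before
   printing, so any value beyond that range collapses to the extreme.  */
int
format_d_clamped (char *out, size_t size, double d)
{
  intmax_t x;
  if (d < 0)
    {
      x = INTMAX_MIN;
      if (x < d)
        x = (intmax_t) d;
    }
  else
    {
      x = INTMAX_MAX;
      if (d < x)
        x = (intmax_t) d;
    }
  return snprintf (out, size, "%jd", x);
}

/* Patched approach (sketch): print the double itself with no fractional
   digits.  Note that "%.0f" rounds a fractional part rather than
   truncating it, unlike the integer conversion above.  */
int
format_d_exact (char *out, size_t size, double d)
{
  return snprintf (out, size, "%.0f", d);
}
```

With d equal to 18446744073709551616.0 (2^64), the clamped version should produce the largest intmax_t value, while the "%.0f" version should produce the digits of 2^64 itself, assuming a C library that prints doubles exactly.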

[-- Attachment #2: 0001-Avoid-losing-info-when-formatting-integers.patch --]
[-- Type: text/x-patch, Size: 6220 bytes --]

From 80e145fc96765cc0a0f48ae2425294c8c92bce56 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Thu, 8 Mar 2018 20:55:55 -0800
Subject: [PATCH] Avoid losing info when formatting integers

* doc/lispref/numbers.texi (Integer Basics): Clarify that
out-of-range integers are treated as floating point only when the
integers are decimal.
* etc/NEWS: Mention changes.
* src/editfns.c (styled_format): Use %.0f when formatting %d or %i
values outside machine integer range, to avoid losing info.
Signal an error for %o or %x values that are too large to be
formatted, to avoid losing info.
---
 doc/lispref/numbers.texi |  5 ++-
 etc/NEWS                 |  7 ++++
 src/editfns.c            | 96 +++++++++++++++++++++---------------------------
 3 files changed, 51 insertions(+), 57 deletions(-)

diff --git a/doc/lispref/numbers.texi b/doc/lispref/numbers.texi
index e692ee1..f1180cf 100644
--- a/doc/lispref/numbers.texi
+++ b/doc/lispref/numbers.texi
@@ -53,8 +53,9 @@ Integer Basics
 chapter assume the minimum integer width of 30 bits.
 @cindex overflow
 
-  The Lisp reader reads an integer as a sequence of digits with optional
-initial sign and optional final period.  An integer that is out of the
+  The Lisp reader reads an integer as a nonempty sequence
+of decimal digits with optional initial sign and optional
+final period.  A decimal integer that is out of the
 Emacs range is treated as a floating-point number.
 
 @example
diff --git a/etc/NEWS b/etc/NEWS
index 07f6d04..14926ba 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -302,6 +302,10 @@ as new-style, bind the new variable 'force-new-style-backquotes' to t.
 'cl-struct-define' whose name clashes with a builtin type (e.g.,
 'integer' or 'hash-table') now signals an error.
 
+** When formatting a floating-point number as an octal or hexadecimal
+integer, Emacs now signals an error if the number is too large for the
+implementation to format (Bug#30408).
+
 \f
 * Lisp Changes in Emacs 27.1
 
@@ -343,6 +347,9 @@ remote systems, which support this check.
 If the optional third argument is non-nil, 'make-string' will produce
 a multibyte string even if its second argument is an ASCII character.
 
+** (format "%d" X) no longer mishandles a floating-point number X that
+does not fit in a machine integer (Bug#30408).
+
 ** New JSON parsing and serialization functions 'json-serialize',
 'json-insert', 'json-parse-string', and 'json-parse-buffer'.  These
 are implemented in C using the Jansson library.
diff --git a/src/editfns.c b/src/editfns.c
index 96bb271..3a34dd0 100644
--- a/src/editfns.c
+++ b/src/editfns.c
@@ -4563,32 +4563,30 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
 		 and with pM inserted for integer formats.
 		 At most two flags F can be specified at once.  */
 	      char convspec[sizeof "%FF.*d" + max (INT_AS_LDBL, pMlen)];
-	      {
-		char *f = convspec;
-		*f++ = '%';
-		/* MINUS_FLAG and ZERO_FLAG are dealt with later.  */
-		*f = '+'; f +=  plus_flag;
-		*f = ' '; f += space_flag;
-		*f = '#'; f += sharp_flag;
-                *f++ = '.';
-                *f++ = '*';
-		if (float_conversion)
-		  {
-		    if (INT_AS_LDBL)
-		      {
-			*f = 'L';
-			f += INTEGERP (arg);
-		      }
-		  }
-		else if (conversion != 'c')
-		  {
-		    memcpy (f, pMd, pMlen);
-		    f += pMlen;
-		    zero_flag &= ! precision_given;
-		  }
-		*f++ = conversion;
-		*f = '\0';
-	      }
+	      char *f = convspec;
+	      *f++ = '%';
+	      /* MINUS_FLAG and ZERO_FLAG are dealt with later.  */
+	      *f = '+'; f +=  plus_flag;
+	      *f = ' '; f += space_flag;
+	      *f = '#'; f += sharp_flag;
+	      *f++ = '.';
+	      *f++ = '*';
+	      if (float_conversion)
+		{
+		  if (INT_AS_LDBL)
+		    {
+		      *f = 'L';
+		      f += INTEGERP (arg);
+		    }
+		}
+	      else if (conversion != 'c')
+		{
+		  memcpy (f, pMd, pMlen);
+		  f += pMlen;
+		  zero_flag &= ! precision_given;
+		}
+	      *f++ = conversion;
+	      *f = '\0';
 
 	      int prec = -1;
 	      if (precision_given)
@@ -4630,29 +4628,20 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
 		}
 	      else if (conversion == 'd' || conversion == 'i')
 		{
-		  /* For float, maybe we should use "%1.0f"
-		     instead so it also works for values outside
-		     the integer range.  */
-		  printmax_t x;
 		  if (INTEGERP (arg))
-		    x = XINT (arg);
+		    {
+		      printmax_t x = XINT (arg);
+		      sprintf_bytes = sprintf (sprintf_buf, convspec, prec, x);
+		    }
 		  else
 		    {
-		      double d = XFLOAT_DATA (arg);
-		      if (d < 0)
-			{
-			  x = TYPE_MINIMUM (printmax_t);
-			  if (x < d)
-			    x = d;
-			}
-		      else
-			{
-			  x = TYPE_MAXIMUM (printmax_t);
-			  if (d < x)
-			    x = d;
-			}
+		      strcpy (f - pMlen - 1, "f");
+		      double x = XFLOAT_DATA (arg);
+		      sprintf_bytes = sprintf (sprintf_buf, convspec, 0, x);
+		      char c0 = sprintf_buf[0];
+		      bool signedp = ! ('0' <= c0 && c0 <= '9');
+		      prec = min (precision, sprintf_bytes - signedp);
 		    }
-		  sprintf_bytes = sprintf (sprintf_buf, convspec, prec, x);
 		}
 	      else
 		{
@@ -4663,22 +4652,19 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
 		  else
 		    {
 		      double d = XFLOAT_DATA (arg);
-		      if (d < 0)
-			x = 0;
-		      else
-			{
-			  x = TYPE_MAXIMUM (uprintmax_t);
-			  if (d < x)
-			    x = d;
-			}
+		      double uprintmax = TYPE_MAXIMUM (uprintmax_t);
+		      if (! (0 <= d && d < uprintmax + 1))
+			xsignal1 (Qoverflow_error, arg);
+		      x = d;
 		    }
 		  sprintf_bytes = sprintf (sprintf_buf, convspec, prec, x);
 		}
 
 	      /* Now the length of the formatted item is known, except it omits
 		 padding and excess precision.  Deal with excess precision
-		 first.  This happens only when the format specifies
-		 ridiculously large precision.  */
+		 first.  This happens when the format specifies ridiculously
+		 large precision, or when %d or %i formats a float that would
+		 ordinarily need fewer digits than a specified precision.  */
 	      ptrdiff_t excess_precision
 		= precision_given ? precision - prec : 0;
 	      ptrdiff_t leading_zeros = 0, trailing_zeros = 0;
-- 
2.7.4



* bug#30408: Checking for loss of information on integer conversion
  2018-03-09  5:00       ` bug#30408: " Paul Eggert
@ 2018-03-09  8:22         ` Eli Zaretskii
  2018-03-21 19:13           ` Paul Eggert
  0 siblings, 1 reply; 25+ messages in thread
From: Eli Zaretskii @ 2018-03-09  8:22 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 30408

> Cc: 30408@debbugs.gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Thu, 8 Mar 2018 21:00:42 -0800
> 
> Since the qualms expressed on this topic had to do with converting strings to 
> integers, I installed into master the noncontroversial part affecting conversion 
> of integers to strings (see attached patch; it also fixes some minor glitches in 
> the previous proposal). I'll think about the string-to-integer conversion a bit 
> more and propose an updated patch for that.

Thanks.  May I suggest to add a couple of tests for this feature?






* bug#30408: Checking for loss of information on integer conversion
  2018-03-09  8:22         ` Eli Zaretskii
@ 2018-03-21 19:13           ` Paul Eggert
  2018-03-21 19:29             ` Eli Zaretskii
  0 siblings, 1 reply; 25+ messages in thread
From: Paul Eggert @ 2018-03-21 19:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 30408

[-- Attachment #1: Type: text/plain, Size: 138 bytes --]

On 03/09/2018 12:22 AM, Eli Zaretskii wrote:
> May I suggest to add a couple of tests for this feature?

Sure, I installed the attached.


[-- Attachment #2: 0001-Add-tests-for-Bug-30408.patch --]
[-- Type: text/x-patch, Size: 1766 bytes --]

From 84dbd740c440b4c048f675486af7292cdf251c9f Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Wed, 21 Mar 2018 12:10:11 -0700
Subject: [PATCH] Add tests for Bug#30408

* test/src/editfns-tests.el (format-%d-large-float)
(format-%x-large-float, format-%o-invalid-float): New tests.
---
 test/src/editfns-tests.el | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/test/src/editfns-tests.el b/test/src/editfns-tests.el
index 69ea6f5cc8..6e1f730166 100644
--- a/test/src/editfns-tests.el
+++ b/test/src/editfns-tests.el
@@ -142,6 +142,27 @@ transpose-test-get-byte-positions
   (should (string-equal (format "%#05X" #x10) "0X010"))
   (should (string-equal (format "%#04x" 0) "0000")))
 
+;;; Test Bug#30408.
+(ert-deftest format-%d-large-float ()
+  (should (string-equal (format "%d" 18446744073709551616.0)
+                        "18446744073709551616"))
+  (should (string-equal (format "%d" -18446744073709551616.0)
+                        "-18446744073709551616")))
+
+;;; Another test for Bug#30408.
+;;; Perhaps Emacs will be improved someday to return the correct
+;;; answer for positive numbers instead of overflowing; in
+;;; that case this test will need to be changed.  In the meantime make
+;;; sure Emacs is reporting the overflow correctly.
+(ert-deftest format-%x-large-float ()
+  (should-error (format "%x" 18446744073709551616.0)
+                :type 'overflow-error))
+
+;;; Another test for Bug#30408.
+(ert-deftest format-%o-invalid-float ()
+  (should-error (format "%o" -1e-37)
+                :type 'overflow-error))
+
 ;;; Check format-time-string with various TZ settings.
 ;;; Use only POSIX-compatible TZ values, since the tests should work
 ;;; even if tzdb is not in use.
-- 
2.14.3



* bug#30408: Checking for loss of information on integer conversion
  2018-03-21 19:13           ` Paul Eggert
@ 2018-03-21 19:29             ` Eli Zaretskii
  0 siblings, 0 replies; 25+ messages in thread
From: Eli Zaretskii @ 2018-03-21 19:29 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 30408

> Cc: 30408@debbugs.gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Wed, 21 Mar 2018 12:13:49 -0700
> 
> On 03/09/2018 12:22 AM, Eli Zaretskii wrote:
> > May I suggest to add a couple of tests for this feature?
> 
> Sure, I installed the attached.

Thanks!






* bug#30408: Checking for loss of information on integer conversion
  2018-02-18 20:04   ` Paul Eggert
  2018-02-18 20:24     ` Eli Zaretskii
  2018-02-18 20:24     ` Eli Zaretskii
@ 2018-02-18 21:52     ` Drew Adams
  2 siblings, 0 replies; 25+ messages in thread
From: Drew Adams @ 2018-02-18 21:52 UTC (permalink / raw)
  To: Paul Eggert, Eli Zaretskii; +Cc: 30408, emacs-devel

Do you really need to send this thread to both the bug
list and emacs-devel?






* bug#30408: Checking for loss of information on integer conversion
       [not found] ` <83y3jq9q4m.fsf@gnu.org>
  2018-02-18 20:04   ` Paul Eggert
@ 2018-02-18 20:04   ` Paul Eggert
  2018-03-27 23:19   ` Paul Eggert
  2 siblings, 0 replies; 25+ messages in thread
From: Paul Eggert @ 2018-02-18 20:04 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 30408, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 2618 bytes --]

Eli Zaretskii wrote:

> Emacs Lisp is not used to write software that controls
> aircraft and spaceships

Actually, I maintain Emacs Lisp code that controls timestamps used in aircraft 
and spaceships. I'm not saying that Emacs itself runs the aircraft and 
spaceships, but it definitely is used to develop software and data used there. 
As luck would have it, I'm currently engaged in an email thread about time 
transfer between Earth and Mars (yes, this is really a thing and people are 
trying to do it with millisecond precision) that is related to a project where I 
regularly use Emacs Lisp. See the thread containing this message:

https://mm.icann.org/pipermail/tz/2018-February/026257.html

> More generally, why signaling an error by default in this case is a
> good idea? ...  That would
> be similar to behavior of equivalent constructs in C programs

Sure, and C compilers typically issue diagnostics for situations similar to 
what's in Bug#30408. For example, for this C program:

int a = 18446744073709553664;

GCC issues a diagnostic, whereas for the similar Emacs Lisp program:

(setq b 18446744073709553664)

Emacs silently substitutes a number that is off by 2048. It's the latter 
behavior that causes the sort of problem seen in Bug#30408.
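
The off-by-2048 substitution can be reproduced outside Emacs. The following is a small C sketch (the function name nearest_double_as_decimal is hypothetical) that converts a decimal literal to the nearest double and prints that double back as an exact decimal integer. It assumes IEEE 754 doubles and a C library that rounds correctly in strtod and prints doubles exactly.

```c
#include <stdio.h>
#include <stdlib.h>

/* Convert the decimal integer LITERAL to the nearest double, then write
   that double as a decimal integer into OUT (SIZE bytes).  Returns the
   number of characters written, as snprintf does.  */
int
nearest_double_as_decimal (const char *literal, char *out, size_t size)
{
  double value = strtod (literal, NULL);
  return snprintf (out, size, "%.0f", value);
}
```

Under those assumptions, passing "18446744073709553664" (2^64 + 2048) should yield "18446744073709551616" (2^64), because adjacent doubles in that range are 4096 apart and the literal sits exactly halfway between two of them.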

When people write a floating-point number they naturally expect it to have some 
fuzz. But when they write an integer they expect it to be represented exactly, 
and not to be rounded.  Emacs already reports an overflow error for the 
following code that attempts to use the same mathematical value:

(setq c #x10000000000000800)

so it's not like it would be a huge change to do something similar for decimal 
integers.

When Emacs was originally developed, its integers were typically 28 bits (not 
counting sign) and floating-point numbers could typically represent integers 
exactly up to 53 bits (not counting sign), so the old Emacs behavior was 
somewhat defensible: although it didn't do bignums, at least it could represent 
integers nearly twice as wide as fixnums. However, nowadays Emacs integers 
typically have more precision than floating point numbers, and the old Emacs 
behavior is more likely to lead to counterintuitive results such as those 
described in Bug#30408.
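
To make the precision gap concrete, here is a hedged C sketch. The function name fits_in_double is hypothetical, and the 2^61 - 1 value used below assumes a 64-bit Emacs build of this era, where that is most-positive-fixnum. It checks whether an integer survives a round trip through a double.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if N survives conversion to double and back unchanged.
   Assumes |N| <= 2^62, so converting back to int64_t cannot overflow.  */
bool
fits_in_double (int64_t n)
{
  double d = (double) n;
  return (int64_t) d == n;
}
```

With IEEE 754 doubles, fits_in_double (9007199254740992) (2^53) holds, while fits_in_double (9007199254740993) (2^53 + 1) and fits_in_double (2305843009213693951) (2^61 - 1) do not, which is the sense in which fixnums now carry more precision than floats.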

On thinking about it in the light of your comments, I suppose it's confusing 
that the proposal used a new signal 'inexact', whereas it should just signal 
overflow. After all, that's what string_to_number already does for out-of-range 
hexadecimal integers. That issue is easily fixed. Revised patch attached.
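
The check added to string_to_number amounts to a round trip: convert the digits to a double, print the double with "%.0f", and compare against the original digits. A simplified standalone sketch follows. The name decimal_roundtrips is hypothetical, the input is assumed to be a nonempty, unsigned string of decimal digits, and the real patch works on the reader's own state rather than a plain string.

```c
#include <float.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return true if the unsigned decimal string DIGITS is represented
   exactly by the double that strtod produces for it.  */
bool
decimal_roundtrips (const char *digits)
{
  /* Skip leading zeros, but keep at least one digit.  */
  const char *bp = digits;
  while (bp[0] == '0' && bp[1] != '\0')
    bp++;

  double value = strtod (bp, NULL);

  /* Large enough for the longest finite double printed with "%.0f".  */
  char checkbuf[DBL_MAX_10_EXP + 2];
  int len = snprintf (checkbuf, sizeof checkbuf, "%.0f", value);

  return (len >= 0
          && strlen (bp) == (size_t) len
          && memcmp (bp, checkbuf, (size_t) len) == 0);
}
```

In this sketch, "18446744073709551616" round-trips, whereas "18446744073709551615" and "18446744073709553664" do not; the caller would signal overflow in the latter cases.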


[-- Attachment #2: 0001-Avoid-losing-info-when-converting-integers.patch --]
[-- Type: text/x-patch; name="0001-Avoid-losing-info-when-converting-integers.patch", Size: 7603 bytes --]

From 49895e55ed7ac41dbf3752ab534cd665ef45ee71 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sun, 18 Feb 2018 11:37:22 -0800
Subject: [PATCH] Avoid losing info when converting integers

This fixes some glitches with large integers (Bug#30408).
* doc/lispref/numbers.texi (Integer Basics): Say that
decimal integers out of fixnum range must be representable
exactly as floating-point.
* etc/NEWS: Mention this.
* src/editfns.c (styled_format): Use %.0f when formatting %d or %i
values outside machine integer range, to avoid losing info.
Signal an error for %o or %x values that are too large to be
formatted, to avoid losing info.
* src/lread.c (string_to_number): When converting an integer-format
string to floating-point, signal an error if info is lost.
---
 doc/lispref/numbers.texi |  8 +++--
 etc/NEWS                 |  9 +++++
 src/editfns.c            | 93 ++++++++++++++++++++----------------------------
 src/lread.c              | 14 ++++++++
 4 files changed, 66 insertions(+), 58 deletions(-)

diff --git a/doc/lispref/numbers.texi b/doc/lispref/numbers.texi
index e692ee1cc2..252aafd8fd 100644
--- a/doc/lispref/numbers.texi
+++ b/doc/lispref/numbers.texi
@@ -53,9 +53,11 @@ Integer Basics
 chapter assume the minimum integer width of 30 bits.
 @cindex overflow
 
-  The Lisp reader reads an integer as a sequence of digits with optional
-initial sign and optional final period.  An integer that is out of the
-Emacs range is treated as a floating-point number.
+  The Lisp reader can read an integer as a nonempty sequence of
+decimal digits with optional initial sign and optional final period.
+A decimal integer that is out of the Emacs range is treated as
+floating-point if it can be represented exactly as a floating-point
+number.
 
 @example
  1               ; @r{The integer 1.}
diff --git a/etc/NEWS b/etc/NEWS
index 8db638e5ed..36cbcf6500 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -248,6 +248,12 @@ as new-style, bind the new variable 'force-new-style-backquotes' to t.
 'cl-struct-define' whose name clashes with a builtin type (e.g.,
 'integer' or 'hash-table') now signals an error.
 
+** When formatting a floating-point number as an octal or hexadecimal
+integer, Emacs now signals an error if the number is too large for the
+implementation to format.  When reading an integer outside Emacs
+fixnum range, Emacs now signals an error if the integer cannot be
+represented exactly as a floating-point number.  See Bug#30408.
+
 \f
 * Lisp Changes in Emacs 27.1
 
@@ -289,6 +295,9 @@ remote systems, which support this check.
 If the optional third argument is non-nil, 'make-string' will produce
 a multibyte string even if its second argument is an ASCII character.
 
+** (format "%d" X) no longer mishandles floating-point X values that
+do not fit in a machine integer (Bug#30408).
+
 ** New JSON parsing and serialization functions 'json-serialize',
 'json-insert', 'json-parse-string', and 'json-parse-buffer'.  These
 are implemented in C using the Jansson library.
diff --git a/src/editfns.c b/src/editfns.c
index 96bb271b2d..d26549ddb8 100644
--- a/src/editfns.c
+++ b/src/editfns.c
@@ -4563,32 +4563,30 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
 		 and with pM inserted for integer formats.
 		 At most two flags F can be specified at once.  */
 	      char convspec[sizeof "%FF.*d" + max (INT_AS_LDBL, pMlen)];
-	      {
-		char *f = convspec;
-		*f++ = '%';
-		/* MINUS_FLAG and ZERO_FLAG are dealt with later.  */
-		*f = '+'; f +=  plus_flag;
-		*f = ' '; f += space_flag;
-		*f = '#'; f += sharp_flag;
-                *f++ = '.';
-                *f++ = '*';
-		if (float_conversion)
-		  {
-		    if (INT_AS_LDBL)
-		      {
-			*f = 'L';
-			f += INTEGERP (arg);
-		      }
-		  }
-		else if (conversion != 'c')
-		  {
-		    memcpy (f, pMd, pMlen);
-		    f += pMlen;
-		    zero_flag &= ! precision_given;
-		  }
-		*f++ = conversion;
-		*f = '\0';
-	      }
+	      char *f = convspec;
+	      *f++ = '%';
+	      /* MINUS_FLAG and ZERO_FLAG are dealt with later.  */
+	      *f = '+'; f +=  plus_flag;
+	      *f = ' '; f += space_flag;
+	      *f = '#'; f += sharp_flag;
+	      *f++ = '.';
+	      *f++ = '*';
+	      if (float_conversion)
+		{
+		  if (INT_AS_LDBL)
+		    {
+		      *f = 'L';
+		      f += INTEGERP (arg);
+		    }
+		}
+	      else if (conversion != 'c')
+		{
+		  memcpy (f, pMd, pMlen);
+		  f += pMlen;
+		  zero_flag &= ! precision_given;
+		}
+	      *f++ = conversion;
+	      *f = '\0';
 
 	      int prec = -1;
 	      if (precision_given)
@@ -4630,29 +4628,18 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
 		}
 	      else if (conversion == 'd' || conversion == 'i')
 		{
-		  /* For float, maybe we should use "%1.0f"
-		     instead so it also works for values outside
-		     the integer range.  */
-		  printmax_t x;
 		  if (INTEGERP (arg))
-		    x = XINT (arg);
+		    {
+		      printmax_t x = XINT (arg);
+		      sprintf_bytes = sprintf (sprintf_buf, convspec, prec, x);
+		    }
 		  else
 		    {
-		      double d = XFLOAT_DATA (arg);
-		      if (d < 0)
-			{
-			  x = TYPE_MINIMUM (printmax_t);
-			  if (x < d)
-			    x = d;
-			}
-		      else
-			{
-			  x = TYPE_MAXIMUM (printmax_t);
-			  if (d < x)
-			    x = d;
-			}
+		      strcpy (f - pMlen - 1, "f");
+		      prec = 0;
+		      double x = XFLOAT_DATA (arg);
+		      sprintf_bytes = sprintf (sprintf_buf, convspec, prec, x);
 		    }
-		  sprintf_bytes = sprintf (sprintf_buf, convspec, prec, x);
 		}
 	      else
 		{
@@ -4663,22 +4650,18 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
 		  else
 		    {
 		      double d = XFLOAT_DATA (arg);
-		      if (d < 0)
-			x = 0;
-		      else
-			{
-			  x = TYPE_MAXIMUM (uprintmax_t);
-			  if (d < x)
-			    x = d;
-			}
+		      if (! (0 <= d && d < TYPE_MAXIMUM (uprintmax_t)))
+			xsignal1 (Qoverflow_error, arg);
+		      x = d;
 		    }
 		  sprintf_bytes = sprintf (sprintf_buf, convspec, prec, x);
 		}
 
 	      /* Now the length of the formatted item is known, except it omits
 		 padding and excess precision.  Deal with excess precision
-		 first.  This happens only when the format specifies
-		 ridiculously large precision.  */
+		 first.  This happens when the format specifies
+		 ridiculously large precision, or when %d or %i has
+		 nonzero precision and formats a float.  */
 	      ptrdiff_t excess_precision
 		= precision_given ? precision - prec : 0;
 	      ptrdiff_t leading_zeros = 0, trailing_zeros = 0;
diff --git a/src/lread.c b/src/lread.c
index d009bd0cd2..9500ed8341 100644
--- a/src/lread.c
+++ b/src/lread.c
@@ -3794,6 +3794,20 @@ string_to_number (char const *string, int base, bool ignore_trailing)
   if (! value)
     value = atof (string + signedp);
 
+  if (! float_syntax)
+    {
+      /* Check that converting the integer-format STRING to a
+	 floating-point number does not lose info.  See Bug#30408.  */
+      char const *bp = string + signedp;
+      while (*bp == '0')
+	bp++;
+      char checkbuf[DBL_MAX_10_EXP + 2];
+      int checkbuflen = sprintf (checkbuf, "%.0f", value);
+      if (! (cp - bp - !!(state & DOT_CHAR) == checkbuflen
+	     && memcmp (bp, checkbuf, checkbuflen) == 0))
+	xsignal1 (Qoverflow_error, build_string (string));
+    }
+
   return make_float (negative ? -value : value);
 }
 
-- 
2.14.3



* bug#30408: Checking for loss of information on integer conversion
       [not found] ` <83y3jq9q4m.fsf@gnu.org>
  2018-02-18 20:04   ` Paul Eggert
  2018-02-18 20:04   ` Paul Eggert
@ 2018-03-27 23:19   ` Paul Eggert
  2018-03-29 11:11     ` Eli Zaretskii
  2 siblings, 1 reply; 25+ messages in thread
From: Paul Eggert @ 2018-03-27 23:19 UTC (permalink / raw)
  To: Eli Zaretskii, 30408

[-- Attachment #1: Type: text/plain, Size: 1098 bytes --]

Here's a patch that I hope addresses the main problem. The basic idea is 
to avoid the confusion exemplified in Bug#30408 by changing Emacs so 
that it ordinarily signals an error if it reads a program that contains 
an integer literal that is out of fixnum range. However, if the 
out-of-range literal is followed by '.' then Emacs continues to silently 
convert it to floating-point; this is intended as an escape hatch for 
any programs that need the old behavior (I expect this'll be rare). 
Thus, on 32-bit Emacs, plain '536870912' in a program causes Emacs to 
signal an overflow while loading the program, whereas '536870912.' is 
treated as a floating-point number as before. (On 64-bit Emacs, the same 
two literals are both integers, as before.)

Unlike my previous proposal, this patch does not affect the behavior of 
string-to-integer. As I understand it, that was a primary source of 
qualms about the previous proposal.

I've tested this on both 32- and 64-bit Emacs on master. This patch has 
helped me to find a couple of integer portability bugs which I already 
fixed on master.
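
The reader rule described above can be sketched in standalone C. This is an illustration under stated assumptions, not the lread.c implementation: the names read_result and classify_integer_literal are hypothetical, the input is assumed to be a well-formed decimal literal with an optional sign and optional trailing period, and the fixnum width is passed in (30 for the 32-bit case above, and presumably 62 for a 64-bit build).

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

enum read_result { READ_FIXNUM, READ_FLOAT, READ_OVERFLOW };

/* Classify the decimal literal S for a reader whose fixnums are
   FIXNUM_BITS wide, sign included (2 <= FIXNUM_BITS <= 63).
   An out-of-range literal is an overflow unless it ends in '.',
   in which case it falls back to floating point.  */
enum read_result
classify_integer_literal (const char *s, int fixnum_bits)
{
  long long max = (1LL << (fixnum_bits - 1)) - 1;
  long long min = -max - 1;

  errno = 0;
  char *end;
  long long n = strtoll (s, &end, 10);
  bool in_range = errno != ERANGE && min <= n && n <= max;

  if (in_range)
    return READ_FIXNUM;
  return *end == '.' ? READ_FLOAT : READ_OVERFLOW;
}
```

With fixnum_bits set to 30, "536870912" classifies as an overflow and "536870912." as a float; with 62, both classify as fixnums, matching the two cases in the message.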

[-- Attachment #2: 0001-Lisp-reader-now-checks-for-integer-overflow.patch --]
[-- Type: text/x-patch, Size: 11154 bytes --]

From 94b7a1a171de3113cd5250315dee7bdef5f51890 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 27 Mar 2018 15:48:47 -0700
Subject: [PATCH] Lisp reader now checks for integer overflow

* doc/lispref/numbers.texi (Integer Basics), etc/NEWS:
Document this.
* src/lisp.h (S2N_IGNORE_TRAILING, S2N_OVERFLOW_TO_FLOAT):
New constants.
* src/lread.c (string_to_number): Change trailing bool arg to
integer argument with flags, to support S2N_OVERFLOW_TO_FLOAT.
All uses changed.
* test/src/editfns-tests.el (read-large-integer): New test.
---
 doc/lispref/numbers.texi  | 14 ++++++++++----
 etc/NEWS                  |  7 +++++++
 src/data.c                |  9 ++++-----
 src/lisp.h                |  3 ++-
 src/lread.c               | 35 ++++++++++++++++++++---------------
 src/process.c             |  2 +-
 test/src/editfns-tests.el | 22 ++++++++++++++++++----
 7 files changed, 62 insertions(+), 30 deletions(-)

diff --git a/doc/lispref/numbers.texi b/doc/lispref/numbers.texi
index c2cb6651d4..2fed2b642f 100644
--- a/doc/lispref/numbers.texi
+++ b/doc/lispref/numbers.texi
@@ -55,16 +55,13 @@ Integer Basics
 
   The Lisp reader reads an integer as a nonempty sequence
 of decimal digits with optional initial sign and optional
-final period.  A decimal integer that is out of the
-Emacs range is treated as a floating-point number.
+final period.
 
 @example
  1               ; @r{The integer 1.}
  1.              ; @r{The integer 1.}
 +1               ; @r{Also the integer 1.}
 -1               ; @r{The integer @minus{}1.}
- 9000000000000000000
-                 ; @r{The floating-point number 9e18.}
  0               ; @r{The integer 0.}
 -0               ; @r{The integer 0.}
 @end example
@@ -94,6 +91,15 @@ Integer Basics
 #24r1k @result{} 44
 @end example
 
+  If an integer is outside the Emacs range, the Lisp reader ordinarily
+signals an overflow.  However, if a too-large plain integer ends in a
+period, the Lisp reader treats it as a floating-point number instead.
+This lets an Emacs Lisp program specify a large integer that is
+quietly approximated by a floating-point number on machines with
+limited word width.  For example, @samp{536870912.} is a
+floating-point number if Emacs integers are only 30 bits wide and is
+an integer otherwise.
+
   To understand how various functions work on integers, especially the
 bitwise operators (@pxref{Bitwise Operations}), it is often helpful to
 view the numbers in their binary form.
diff --git a/etc/NEWS b/etc/NEWS
index fd1d04b8a0..cb74f512cc 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -349,6 +349,13 @@ as new-style, bind the new variable 'force-new-style-backquotes' to t.
 integer, Emacs now signals an error if the number is too large for the
 implementation to format (Bug#30408).
 
++++
+** The Lisp reader now signals an overflow for plain decimal integers
+that do not end in '.' and are outside Emacs range.  Formerly the Lisp
+reader silently converted them to floating-point numbers, and signaled
+overflow only for integers with a radix that are outside machine range
+(Bug#30408).
+
 ---
 ** Some functions and variables obsolete since Emacs 22 have been removed:
 archive-mouse-extract, assoc-ignore-case, assoc-ignore-representation,
diff --git a/src/data.c b/src/data.c
index a7fab1ef58..6f23a26757 100644
--- a/src/data.c
+++ b/src/data.c
@@ -2716,9 +2716,7 @@ present, base 10 is used.  BASE must be between 2 and 16 (inclusive).
 If the base used is not 10, STRING is always parsed as an integer.  */)
   (register Lisp_Object string, Lisp_Object base)
 {
-  register char *p;
-  register int b;
-  Lisp_Object val;
+  int b;
 
   CHECK_STRING (string);
 
@@ -2732,11 +2730,12 @@ If the base used is not 10, STRING is always parsed as an integer.  */)
       b = XINT (base);
     }
 
-  p = SSDATA (string);
+  char *p = SSDATA (string);
   while (*p == ' ' || *p == '\t')
     p++;
 
-  val = string_to_number (p, b, true);
+  int flags = S2N_IGNORE_TRAILING | S2N_OVERFLOW_TO_FLOAT;
+  Lisp_Object val = string_to_number (p, b, flags);
   return NILP (val) ? make_number (0) : val;
 }
 \f
diff --git a/src/lisp.h b/src/lisp.h
index f0c0c5a14a..b931d23bf3 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -3899,7 +3899,8 @@ LOADHIST_ATTACH (Lisp_Object x)
 }
 extern int openp (Lisp_Object, Lisp_Object, Lisp_Object,
                   Lisp_Object *, Lisp_Object, bool);
-extern Lisp_Object string_to_number (char const *, int, bool);
+enum { S2N_IGNORE_TRAILING = 1, S2N_OVERFLOW_TO_FLOAT = 2 };
+extern Lisp_Object string_to_number (char const *, int, int);
 extern void map_obarray (Lisp_Object, void (*) (Lisp_Object, Lisp_Object),
                          Lisp_Object);
 extern void dir_warning (const char *, Lisp_Object);
diff --git a/src/lread.c b/src/lread.c
index 381f9cf20c..a774524ee4 100644
--- a/src/lread.c
+++ b/src/lread.c
@@ -2339,7 +2339,7 @@ character_name_to_code (char const *name, ptrdiff_t name_len)
      monstrosities like "U+-0000".  */
   Lisp_Object code
     = (name[0] == 'U' && name[1] == '+'
-       ? string_to_number (name + 1, 16, false)
+       ? string_to_number (name + 1, 16, 0)
        : call2 (Qchar_from_name, make_unibyte_string (name, name_len), Qt));
 
   if (! RANGED_INTEGERP (0, code, MAX_UNICODE_CHAR)
@@ -2693,7 +2693,7 @@ read_integer (Lisp_Object readcharfun, EMACS_INT radix)
       invalid_syntax (buf);
     }
 
-  return string_to_number (buf, radix, false);
+  return string_to_number (buf, radix, 0);
 }
 
 
@@ -3502,7 +3502,7 @@ read1 (Lisp_Object readcharfun, int *pch, bool first_in_list)
 
 	if (!quoted && !uninterned_symbol)
 	  {
-	    Lisp_Object result = string_to_number (read_buffer, 10, false);
+	    Lisp_Object result = string_to_number (read_buffer, 10, 0);
 	    if (! NILP (result))
 	      return unbind_to (count, result);
 	  }
@@ -3667,16 +3667,17 @@ substitute_in_interval (INTERVAL interval, void *arg)
 }
 
 \f
-/* Convert STRING to a number, assuming base BASE.  Return a fixnum if
-   STRING has integer syntax and fits in a fixnum, else return the
-   nearest float if STRING has either floating point or integer syntax
-   and BASE is 10, else return nil.  If IGNORE_TRAILING, consider just
-   the longest prefix of STRING that has valid floating point syntax.
-   Signal an overflow if BASE is not 10 and the number has integer
-   syntax but does not fit.  */
+/* Convert STRING to a number, assuming base BASE.  When STRING has
+   floating point syntax and BASE is 10, return a nearest float.  When
+   STRING has integer syntax, return a fixnum if the integer fits, and
+   signal an overflow otherwise (unless BASE is 10 and STRING ends in
+   period or FLAGS & S2N_OVERFLOW_TO_FLOAT is nonzero; in this case,
+   return a nearest float instead).  Otherwise, return nil.  If FLAGS
+   & S2N_IGNORE_TRAILING is nonzero, consider just the longest prefix
+   of STRING that has valid syntax.  */
 
 Lisp_Object
-string_to_number (char const *string, int base, bool ignore_trailing)
+string_to_number (char const *string, int base, int flags)
 {
   char const *cp = string;
   bool float_syntax = 0;
@@ -3759,9 +3760,10 @@ string_to_number (char const *string, int base, bool ignore_trailing)
 		      || (state & ~INTOVERFLOW) == (LEAD_INT|E_EXP));
     }
 
-  /* Return nil if the number uses invalid syntax.  If IGNORE_TRAILING, accept
-     any prefix that matches.  Otherwise, the entire string must match.  */
-  if (! (ignore_trailing
+  /* Return nil if the number uses invalid syntax.  If FLAGS &
+     S2N_IGNORE_TRAILING, accept any prefix that matches.  Otherwise,
+     the entire string must match.  */
+  if (! (flags & S2N_IGNORE_TRAILING
 	 ? ((state & LEAD_INT) != 0 || float_syntax)
 	 : (!*cp && ((state & ~(INTOVERFLOW | DOT_CHAR)) == LEAD_INT
 		     || float_syntax))))
@@ -3776,7 +3778,7 @@ string_to_number (char const *string, int base, bool ignore_trailing)
 	  /* Unfortunately there's no simple and accurate way to convert
 	     non-base-10 numbers that are out of C-language range.  */
 	  if (base != 10)
-	    xsignal1 (Qoverflow_error, build_string (string));
+	    flags = 0;
 	}
       else if (n <= (negative ? -MOST_NEGATIVE_FIXNUM : MOST_POSITIVE_FIXNUM))
 	{
@@ -3785,6 +3787,9 @@ string_to_number (char const *string, int base, bool ignore_trailing)
 	}
       else
 	value = n;
+
+      if (! (state & DOT_CHAR) && ! (flags & S2N_OVERFLOW_TO_FLOAT))
+	xsignal1 (Qoverflow_error, build_string (string));
     }
 
   /* Either the number uses float syntax, or it does not fit into a fixnum.
diff --git a/src/process.c b/src/process.c
index 2aaa238f60..ed2cab7b51 100644
--- a/src/process.c
+++ b/src/process.c
@@ -6842,7 +6842,7 @@ SIGCODE may be an integer, or a symbol whose name is a signal name.  */)
     {
       Lisp_Object tem = Fget_process (process);
       if (NILP (tem))
-	tem = string_to_number (SSDATA (process), 10, true);
+	tem = string_to_number (SSDATA (process), 10, S2N_OVERFLOW_TO_FLOAT);
       process = tem;
     }
   else if (!NUMBERP (process))
diff --git a/test/src/editfns-tests.el b/test/src/editfns-tests.el
index 6e1f730166..442ad08937 100644
--- a/test/src/editfns-tests.el
+++ b/test/src/editfns-tests.el
@@ -142,27 +142,41 @@ transpose-test-get-byte-positions
   (should (string-equal (format "%#05X" #x10) "0X010"))
   (should (string-equal (format "%#04x" 0) "0000")))
 
-;;; Test Bug#30408.
+
+;;; Tests for Bug#30408.
+
 (ert-deftest format-%d-large-float ()
   (should (string-equal (format "%d" 18446744073709551616.0)
                         "18446744073709551616"))
   (should (string-equal (format "%d" -18446744073709551616.0)
                         "-18446744073709551616")))
 
-;;; Another test for Bug#30408.
 ;;; Perhaps Emacs will be improved someday to return the correct
 ;;; answer for positive numbers instead of overflowing; in
-;;; that case this test will need to be changed.  In the meantime make
+;;; that case these tests will need to be changed.  In the meantime make
 ;;; sure Emacs is reporting the overflow correctly.
 (ert-deftest format-%x-large-float ()
   (should-error (format "%x" 18446744073709551616.0)
                 :type 'overflow-error))
+(ert-deftest read-large-integer ()
+  (should-error (read (format "%d0" most-negative-fixnum))
+                :type 'overflow-error)
+  (should-error (read (format "%+d" (* -8.0 most-negative-fixnum)))
+                :type 'overflow-error)
+  (should-error (read (substring (format "%d" most-negative-fixnum) 1))
+                :type 'overflow-error)
+  (should-error (read (format "#x%x" most-negative-fixnum))
+                :type 'overflow-error)
+  (should-error (read (format "#o%o" most-negative-fixnum))
+                :type 'overflow-error)
+  (should-error (read (format "#32rG%x" most-positive-fixnum))
+                :type 'overflow-error))
 
-;;; Another test for Bug#30408.
 (ert-deftest format-%o-invalid-float ()
   (should-error (format "%o" -1e-37)
                 :type 'overflow-error))
 
+
 ;;; Check format-time-string with various TZ settings.
 ;;; Use only POSIX-compatible TZ values, since the tests should work
 ;;; even if tzdb is not in use.
-- 
2.14.3



* bug#30408: Checking for loss of information on integer conversion
  2018-03-27 23:19   ` Paul Eggert
@ 2018-03-29 11:11     ` Eli Zaretskii
  2018-03-29 18:09       ` Paul Eggert
  0 siblings, 1 reply; 25+ messages in thread
From: Eli Zaretskii @ 2018-03-29 11:11 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 30408

> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Tue, 27 Mar 2018 16:19:21 -0700
> 
> Here's a patch that I hope addresses the main problem. The basic idea is 
> to avoid the confusion exemplified in Bug#30408 by changing Emacs so 
> that it ordinarily signals an error if it reads a program that contains 
> an integer literal that is out of fixnum range. However, if the 
> out-of-range literal is followed by '.' then Emacs continues to silently 
> convert it to floating-point; this is intended as an escape hatch for 
> any programs that need the old behavior (I expect this'll be rare). 

I'd suggest, for good measure, adding a variable that forces the
conversion to floats, avoiding an error even without the trailing
period.  We can later remove that variable, or make it a no-op, if the
danger of breaking existing code turns out to be low or non-existent.






* bug#30408: Checking for loss of information on integer conversion
  2018-03-29 11:11     ` Eli Zaretskii
@ 2018-03-29 18:09       ` Paul Eggert
  0 siblings, 0 replies; 25+ messages in thread
From: Paul Eggert @ 2018-03-29 18:09 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 30408-done, David Sitsky

[-- Attachment #1: Type: text/plain, Size: 696 bytes --]

On 03/29/2018 04:11 AM, Eli Zaretskii wrote:
> I'd suggest, for good measure, adding a variable that forces the
> conversion to floats, avoiding an error even without the trailing
> period.  We can later remove that variable, or make it a no-op, if the
> danger of breaking existing code turns out to be low or non-existent.

OK, I did that, by installing the attached into master, after installing 
the proposed patch.

As a result, unless the user sets the new variable 
read-integer-overflow-as-float, the Lisp reader now rejects the program 
(format "%x" 2738188573457603759) by signaling an overflow error. As 
this was the basis of the original bug report, I'm marking the bug as done.
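
As a rough illustration of how the new variable interacts with the trailing-period rule, here is a minimal C sketch. It is a simplified model rather than the code in src/lread.c: the two flag values are copied from the patch, but every name prefixed with `model_` or `MODEL_` is hypothetical, and the sketch assumes a 64-bit Emacs whose most-positive-fixnum is 2**61 - 1 and handles base-10 literals only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Flag values as defined by the patch to src/lisp.h.  */
enum { S2N_IGNORE_TRAILING = 1, S2N_OVERFLOW_TO_FLOAT = 2 };

/* Assumption: 2**61 - 1, most-positive-fixnum on 64-bit Emacs.  */
#define MODEL_MOST_POSITIVE_FIXNUM 2305843009213693951LL

/* Hypothetical stand-in for the Lisp variable
   read-integer-overflow-as-float; false (nil) is the default.  */
static bool model_read_integer_overflow_as_float = false;

/* Mirror the flag selection that the second patch adds to read1.  */
static int
model_reader_flags (void)
{
  return model_read_integer_overflow_as_float ? S2N_OVERFLOW_TO_FLOAT : 0;
}

/* Return true if reading the decimal LITERAL would signal an
   overflow in this model, false if it would yield a fixnum or float.  */
static bool
model_read_signals_overflow (const char *literal)
{
  assert (literal != NULL);

  int flags = model_reader_flags ();
  size_t len = strlen (literal);
  bool ends_in_dot = len > 0 && literal[len - 1] == '.';

  errno = 0;
  long long n = strtoll (literal, NULL, 10);
  bool fits = (errno != ERANGE
               && -MODEL_MOST_POSITIVE_FIXNUM - 1 <= n
               && n <= MODEL_MOST_POSITIVE_FIXNUM);

  /* Same shape as the check in the first patch: signal only when there
     is no trailing '.' and S2N_OVERFLOW_TO_FLOAT is not set.  */
  return !fits && !ends_in_dot && !(flags & S2N_OVERFLOW_TO_FLOAT);
}
```

With the default setting, `model_read_signals_overflow ("2738188573457603759")` is true, which corresponds to the reader rejecting the literal from the original bug report. After setting `model_read_integer_overflow_as_float` to true, or after appending '.' to the literal, it is false.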


[-- Attachment #2: 0001-New-experimental-variable-read-integer-overflow-as-f.patch --]
[-- Type: text/x-patch, Size: 2597 bytes --]

From c213f465ba8038ce93314b96fd53ec3e35d34609 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Thu, 29 Mar 2018 11:01:38 -0700
Subject: [PATCH] New experimental variable read-integer-overflow-as-float.

Following a suggestion by Eli Zaretskii (Bug#30408#46).
* etc/NEWS: Mention it.
* src/lread.c (syms_of_lread): Add it.
(read1): Treat out-of-range integers as floats if
read-integer-overflow-as-float is non-nil.
---
 etc/NEWS    |  6 ++++--
 src/lread.c | 11 ++++++++++-
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/etc/NEWS b/etc/NEWS
index 9161f2bd32..9dddc90213 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -356,8 +356,10 @@ implementation to format (Bug#30408).
 ** The Lisp reader now signals an overflow for plain decimal integers
 that do not end in '.' and are outside Emacs range.  Formerly the Lisp
 reader silently converted them to floating-point numbers, and signaled
-overflow only for integers with a radix that are outside machine range
-(Bug#30408).
+overflow only for integers with a radix that are outside machine range.
+To get the old behavior, set the new, experimental variable
+read-integer-overflow-as-float to t and please email
+30408@debbugs.gnu.org if you need that.  (Bug#30408).
 
 ---
 ** Some functions and variables obsolete since Emacs 22 have been removed:
diff --git a/src/lread.c b/src/lread.c
index a774524ee4..8fb61f5633 100644
--- a/src/lread.c
+++ b/src/lread.c
@@ -3502,7 +3502,9 @@ read1 (Lisp_Object readcharfun, int *pch, bool first_in_list)
 
 	if (!quoted && !uninterned_symbol)
 	  {
-	    Lisp_Object result = string_to_number (read_buffer, 10, 0);
+	    int flags = (read_integer_overflow_as_float
+			 ? S2N_OVERFLOW_TO_FLOAT : 0);
+	    Lisp_Object result = string_to_number (read_buffer, 10, flags);
 	    if (! NILP (result))
 	      return unbind_to (count, result);
 	  }
@@ -4830,6 +4832,13 @@ were read in.  */);
 	       doc: /* Non-nil means read recursive structures using #N= and #N# syntax.  */);
   Vread_circle = Qt;
 
+  DEFVAR_BOOL ("read-integer-overflow-as-float",
+	       read_integer_overflow_as_float,
+	       doc: /* Non-nil means `read' quietly treats an out-of-range integer as floating point.
+Nil (the default) means signal an overflow unless the integer ends in `.'.
+This variable is experimental; email 30408@debbugs.gnu.org if you need it.  */);
+  read_integer_overflow_as_float = false;
+
   DEFVAR_LISP ("load-path", Vload_path,
 	       doc: /* List of directories to search for files to load.
 Each element is a string (directory file name) or nil (meaning
-- 
2.14.3



end of thread, other threads:[~2018-03-29 18:09 UTC | newest]

Thread overview: 25+ messages
2018-02-18  1:27 Checking for loss of information on integer conversion Paul Eggert
2018-02-18 22:31 ` Juliusz Chroboczek
2018-02-18 22:41   ` Stefan Monnier
2018-02-18 23:46     ` Juliusz Chroboczek
2018-02-19  1:47       ` Stefan Monnier
2018-02-19  2:22         ` Paul Eggert
2018-02-19  3:20           ` Drew Adams
2018-02-19 15:05       ` Richard Stallman
2018-02-22 16:31         ` Juliusz Chroboczek
2018-02-22 17:01           ` Eli Zaretskii
2018-02-22 19:31             ` Stefan Monnier
2018-02-23  9:49           ` Richard Stallman
2018-02-19  6:03   ` John Wiegley
     [not found] ` <83y3jq9q4m.fsf@gnu.org>
2018-02-18 20:04   ` Paul Eggert
2018-02-18 20:24     ` Eli Zaretskii
2018-03-09  5:00       ` bug#30408: " Paul Eggert
2018-03-09  8:22         ` Eli Zaretskii
2018-03-21 19:13           ` Paul Eggert
2018-03-21 19:29             ` Eli Zaretskii
2018-02-18 20:24     ` Eli Zaretskii
2018-02-18 21:52     ` Drew Adams
2018-02-18 20:04   ` Paul Eggert
2018-03-27 23:19   ` Paul Eggert
2018-03-29 11:11     ` Eli Zaretskii
2018-03-29 18:09       ` Paul Eggert
