Re: Checking for loss of information on integer conversion

unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed

From: Paul Eggert <eggert@cs.ucla.edu>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 30408@debbugs.gnu.org, emacs-devel@gnu.org
Subject: Re: Checking for loss of information on integer conversion
Date: Sun, 18 Feb 2018 12:04:20 -0800	[thread overview]
Message-ID: <74ac7b77-a756-95a9-b490-6952cf106f21@cs.ucla.edu> (raw)
In-Reply-To: <83y3jq9q4m.fsf@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 2618 bytes --]

Eli Zaretskii wrote:

> Emacs Lisp is not used to write software that controls
> aircraft and spaceships

Actually, I maintain Emacs Lisp code that controls timestamps used in aircraft 
and spaceships. I'm not saying that Emacs itself runs the aircraft and 
spaceships, but it definitely is used to develop software and data used there. 
As luck would have it, I'm currently engaged in an email thread about time 
transfer between Earth and Mars (yes, this is really a thing and people are 
trying to do it with millisecond precision) that is related to a project where I 
regularly use Emacs Lisp. See the thread containing this message:

https://mm.icann.org/pipermail/tz/2018-February/026257.html

> More generally, why signaling an error by default in this case is a
> good idea? ...  That would
> be similar to behavior of equivalent constructs in C programs

Sure, and C compilers typically issue diagnostics for situations similar to 
what's in Bug#30408. For example, for this C program:

int a = 18446744073709553664;

GCC issues a diagnostic, whereas for the similar Emacs Lisp program:

(setq b 18446744073709553664)

Emacs silently substitutes a number that is off by 2048. It's the latter 
behavior that causes the sort of problem seen in Bug#30408.

When people write a floating-point number they naturally expect it to have some 
fuzz. But when they write an integer they expect it to be represented exactly, 
and not to be rounded.  Emacs already reports an overflow error for the 
following code that attempts to use the same mathematical value:

(setq c #x10000000000000800)

so it's not like it would be a huge change to do something similar for decimal 
integers.

When Emacs was originally developed, its integers were typically 28 bits (not 
counting sign) and floating-point numbers could typically represent integers 
exactly up to 53 bits (not counting sign), so the old Emacs behavior was 
somewhat defensible: although it didn't do bignums, at least it could represent 
integers nearly twice as wide as fixnums. However, nowadays Emacs integers 
typically have more precision than floating point numbers, and the old Emacs 
behavior is more likely to lead to counterintuitive results such as those 
described in Bug#30408.

On thinking about it in the light of your comments, I suppose it's confusing 
that the proposal used a new signal 'inexact', whereas it should just signal 
overflow. After all, that's what string_to_number already does for out-of-range 
hexadecimal integers. That issue is easily fixed. Revised patch attached.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Avoid-losing-info-when-converting-integers.patch --]
[-- Type: text/x-patch; name="0001-Avoid-losing-info-when-converting-integers.patch", Size: 7603 bytes --]

From 49895e55ed7ac41dbf3752ab534cd665ef45ee71 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sun, 18 Feb 2018 11:37:22 -0800
Subject: [PATCH] Avoid losing info when converting integers

This fixes some glitches with large integers (Bug#30408).
* doc/lispref/numbers.texi (Integer Basics): Say that
decimal integers out of fixnum range must be representable
exactly as floating-point.
* etc/NEWS: Mention this.
* src/editfns.c (styled_format): Use %.0f when formatting %d or %i
values outside machine integer range, to avoid losing info.
Signal an error for %o or %x values that are too large to be
formatted, to avoid losing info.
* src/lread.c (string_to_number): When converting an integer-format
string to floating-point, signal an error if info is lost.
---
 doc/lispref/numbers.texi |  8 +++--
 etc/NEWS                 |  9 +++++
 src/editfns.c            | 93 ++++++++++++++++++++----------------------------
 src/lread.c              | 14 ++++++++
 4 files changed, 66 insertions(+), 58 deletions(-)

diff --git a/doc/lispref/numbers.texi b/doc/lispref/numbers.texi
index e692ee1cc2..252aafd8fd 100644
--- a/doc/lispref/numbers.texi
+++ b/doc/lispref/numbers.texi
@@ -53,9 +53,11 @@ Integer Basics
 chapter assume the minimum integer width of 30 bits.
 @cindex overflow
 
-  The Lisp reader reads an integer as a sequence of digits with optional
-initial sign and optional final period.  An integer that is out of the
-Emacs range is treated as a floating-point number.
+  The Lisp reader can read an integer as a nonempty sequence of
+decimal digits with optional initial sign and optional final period.
+A decimal integer that is out of the Emacs range is treated as
+floating-point if it can be represented exactly as a floating-point
+number.
 
 @example
  1               ; @r{The integer 1.}
diff --git a/etc/NEWS b/etc/NEWS
index 8db638e5ed..36cbcf6500 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -248,6 +248,12 @@ as new-style, bind the new variable 'force-new-style-backquotes' to t.
 'cl-struct-define' whose name clashes with a builtin type (e.g.,
 'integer' or 'hash-table') now signals an error.
 
+** When formatting a floating-point number as an octal or hexadecimal
+integer, Emacs now signals an error if the number is too large for the
+implementation to format.  When reading an integer outside Emacs
+fixnum range, Emacs now signals an error if the integer cannot be
+represented exactly as a floating-point number.  See Bug#30408.
+
 \f
 * Lisp Changes in Emacs 27.1
 
@@ -289,6 +295,9 @@ remote systems, which support this check.
 If the optional third argument is non-nil, 'make-string' will produce
 a multibyte string even if its second argument is an ASCII character.
 
+** (format "%d" X) no longer mishandles floating-point X values that
+do not fit in a machine integer (Bug#30408).
+
 ** New JSON parsing and serialization functions 'json-serialize',
 'json-insert', 'json-parse-string', and 'json-parse-buffer'.  These
 are implemented in C using the Jansson library.
diff --git a/src/editfns.c b/src/editfns.c
index 96bb271b2d..d26549ddb8 100644
--- a/src/editfns.c
+++ b/src/editfns.c
@@ -4563,32 +4563,30 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
 		 and with pM inserted for integer formats.
 		 At most two flags F can be specified at once.  */
 	      char convspec[sizeof "%FF.*d" + max (INT_AS_LDBL, pMlen)];
-	      {
-		char *f = convspec;
-		*f++ = '%';
-		/* MINUS_FLAG and ZERO_FLAG are dealt with later.  */
-		*f = '+'; f +=  plus_flag;
-		*f = ' '; f += space_flag;
-		*f = '#'; f += sharp_flag;
-                *f++ = '.';
-                *f++ = '*';
-		if (float_conversion)
-		  {
-		    if (INT_AS_LDBL)
-		      {
-			*f = 'L';
-			f += INTEGERP (arg);
-		      }
-		  }
-		else if (conversion != 'c')
-		  {
-		    memcpy (f, pMd, pMlen);
-		    f += pMlen;
-		    zero_flag &= ! precision_given;
-		  }
-		*f++ = conversion;
-		*f = '\0';
-	      }
+	      char *f = convspec;
+	      *f++ = '%';
+	      /* MINUS_FLAG and ZERO_FLAG are dealt with later.  */
+	      *f = '+'; f +=  plus_flag;
+	      *f = ' '; f += space_flag;
+	      *f = '#'; f += sharp_flag;
+	      *f++ = '.';
+	      *f++ = '*';
+	      if (float_conversion)
+		{
+		  if (INT_AS_LDBL)
+		    {
+		      *f = 'L';
+		      f += INTEGERP (arg);
+		    }
+		}
+	      else if (conversion != 'c')
+		{
+		  memcpy (f, pMd, pMlen);
+		  f += pMlen;
+		  zero_flag &= ! precision_given;
+		}
+	      *f++ = conversion;
+	      *f = '\0';
 
 	      int prec = -1;
 	      if (precision_given)
@@ -4630,29 +4628,18 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
 		}
 	      else if (conversion == 'd' || conversion == 'i')
 		{
-		  /* For float, maybe we should use "%1.0f"
-		     instead so it also works for values outside
-		     the integer range.  */
-		  printmax_t x;
 		  if (INTEGERP (arg))
-		    x = XINT (arg);
+		    {
+		      printmax_t x = XINT (arg);
+		      sprintf_bytes = sprintf (sprintf_buf, convspec, prec, x);
+		    }
 		  else
 		    {
-		      double d = XFLOAT_DATA (arg);
-		      if (d < 0)
-			{
-			  x = TYPE_MINIMUM (printmax_t);
-			  if (x < d)
-			    x = d;
-			}
-		      else
-			{
-			  x = TYPE_MAXIMUM (printmax_t);
-			  if (d < x)
-			    x = d;
-			}
+		      strcpy (f - pMlen - 1, "f");
+		      prec = 0;
+		      double x = XFLOAT_DATA (arg);
+		      sprintf_bytes = sprintf (sprintf_buf, convspec, prec, x);
 		    }
-		  sprintf_bytes = sprintf (sprintf_buf, convspec, prec, x);
 		}
 	      else
 		{
@@ -4663,22 +4650,18 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
 		  else
 		    {
 		      double d = XFLOAT_DATA (arg);
-		      if (d < 0)
-			x = 0;
-		      else
-			{
-			  x = TYPE_MAXIMUM (uprintmax_t);
-			  if (d < x)
-			    x = d;
-			}
+		      if (! (0 <= d && d < TYPE_MAXIMUM (uprintmax_t)))
+			xsignal1 (Qoverflow_error, arg);
+		      x = d;
 		    }
 		  sprintf_bytes = sprintf (sprintf_buf, convspec, prec, x);
 		}
 
 	      /* Now the length of the formatted item is known, except it omits
 		 padding and excess precision.  Deal with excess precision
-		 first.  This happens only when the format specifies
-		 ridiculously large precision.  */
+		 first.  This happens when the format specifies
+		 ridiculously large precision, or when %d or %i has
+		 nonzero precision and formats a float.  */
 	      ptrdiff_t excess_precision
 		= precision_given ? precision - prec : 0;
 	      ptrdiff_t leading_zeros = 0, trailing_zeros = 0;
diff --git a/src/lread.c b/src/lread.c
index d009bd0cd2..9500ed8341 100644
--- a/src/lread.c
+++ b/src/lread.c
@@ -3794,6 +3794,20 @@ string_to_number (char const *string, int base, bool ignore_trailing)
   if (! value)
     value = atof (string + signedp);
 
+  if (! float_syntax)
+    {
+      /* Check that converting the integer-format STRING to a
+	 floating-point number does not lose info.  See Bug#30408.  */
+      char const *bp = string + signedp;
+      while (*bp == '0')
+	bp++;
+      char checkbuf[DBL_MAX_10_EXP + 2];
+      int checkbuflen = sprintf (checkbuf, "%.0f", value);
+      if (! (cp - bp - !!(state & DOT_CHAR) == checkbuflen
+	     && memcmp (bp, checkbuf, checkbuflen) == 0))
+	xsignal1 (Qoverflow_error, build_string (string));
+    }
+
   return make_float (negative ? -value : value);
 }
 
-- 
2.14.3

next prev parent reply	other threads:[~2018-02-18 20:04 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-18  1:27 Checking for loss of information on integer conversion Paul Eggert
     [not found] ` <83y3jq9q4m.fsf@gnu.org>
2018-02-18 20:04   ` Paul Eggert [this message]
2018-02-18 20:24     ` Eli Zaretskii
2018-02-18 21:52     ` bug#30408: " Drew Adams
2018-02-18 22:31 ` Juliusz Chroboczek
2018-02-18 22:41   ` Stefan Monnier
2018-02-18 23:46     ` Juliusz Chroboczek
2018-02-19  1:47       ` Stefan Monnier
2018-02-19  2:22         ` Paul Eggert
2018-02-19  3:20           ` Drew Adams
2018-02-19 15:05       ` Richard Stallman
2018-02-22 16:31         ` Juliusz Chroboczek
2018-02-22 17:01           ` Eli Zaretskii
2018-02-22 19:31             ` Stefan Monnier
2018-02-23  9:49           ` Richard Stallman
2018-02-19  6:03   ` John Wiegley

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:e692ee1cc dfblob:252aafd8f dfblob:8db638e5e dfblob:36cbcf650
dfblob:96bb271b2 dfblob:d26549ddb dfblob:d009bd0cd dfblob:9500ed834 )
 OR (
bs:"Avoid losing info when converting integers" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=74ac7b77-a756-95a9-b490-6952cf106f21@cs.ucla.edu \
    --to=eggert@cs.ucla.edu \
    --cc=30408@debbugs.gnu.org \
    --cc=eliz@gnu.org \
    --cc=emacs-devel@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).