From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!.POSTED!not-for-mail
From: Paul Eggert <eggert@cs.ucla.edu>
Newsgroups: gmane.emacs.bugs
Subject: bug#30408: Checking for loss of information on integer conversion
Date: Tue, 27 Mar 2018 16:19:21 -0700
Organization: UCLA Computer Science Department
Message-ID: <f1e3e23f-7ac2-9840-403b-e4d76f1ed83c@cs.ucla.edu>
References: <7432641a-cedc-942c-d75c-0320fce5ba39@cs.ucla.edu>
	<83y3jq9q4m.fsf@gnu.org>
NNTP-Posting-Host: blaine.gmane.org
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="------------C8997759E81B0890A121F27F"
X-Trace: blaine.gmane.org 1522192695 23229 195.159.176.226 (27 Mar 2018 23:18:15 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Tue, 27 Mar 2018 23:18:15 +0000 (UTC)
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
	Thunderbird/52.6.0
To: Eli Zaretskii <eliz@gnu.org>, 30408@debbugs.gnu.org
Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org Wed Mar 28 01:18:11 2018
Return-path: <bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org>
Envelope-to: geb-bug-gnu-emacs@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by blaine.gmane.org with esmtp (Exim 4.84_2)
	(envelope-from <bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org>)
	id 1f0xqy-0005uH-2Y
	for geb-bug-gnu-emacs@m.gmane.org; Wed, 28 Mar 2018 01:18:08 +0200
Original-Received: from localhost ([::1]:36517 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org>)
	id 1f0xt1-00034X-FK
	for geb-bug-gnu-emacs@m.gmane.org; Tue, 27 Mar 2018 19:20:15 -0400
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:39808)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <Debian-debbugs@debbugs.gnu.org>) id 1f0xst-0002jn-P7
	for bug-gnu-emacs@gnu.org; Tue, 27 Mar 2018 19:20:09 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <Debian-debbugs@debbugs.gnu.org>) id 1f0xso-0001Eq-Oi
	for bug-gnu-emacs@gnu.org; Tue, 27 Mar 2018 19:20:07 -0400
Original-Received: from debbugs.gnu.org ([208.118.235.43]:48472)
	by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16)
	(Exim 4.71) (envelope-from <Debian-debbugs@debbugs.gnu.org>)
	id 1f0xso-0001Em-JS
	for bug-gnu-emacs@gnu.org; Tue, 27 Mar 2018 19:20:02 -0400
Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2)
	(envelope-from <Debian-debbugs@debbugs.gnu.org>) id 1f0xso-0007O0-BU
	for bug-gnu-emacs@gnu.org; Tue, 27 Mar 2018 19:20:02 -0400
X-Loop: help-debbugs@gnu.org
Resent-From: Paul Eggert <eggert@cs.ucla.edu>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
Resent-CC: bug-gnu-emacs@gnu.org
Resent-Date: Tue, 27 Mar 2018 23:20:02 +0000
Resent-Message-ID: <handler.30408.B30408.152219277328348@debbugs.gnu.org>
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: followup 30408
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: 
Original-Received: via spool by 30408-submit@debbugs.gnu.org id=B30408.152219277328348
	(code B ref 30408); Tue, 27 Mar 2018 23:20:02 +0000
Original-Received: (at 30408) by debbugs.gnu.org; 27 Mar 2018 23:19:33 +0000
Original-Received: from localhost ([127.0.0.1]:56369 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1f0xsK-0007NA-Ff
	for submit@debbugs.gnu.org; Tue, 27 Mar 2018 19:19:32 -0400
Original-Received: from zimbra.cs.ucla.edu ([131.179.128.68]:36094)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <eggert@cs.ucla.edu>) id 1f0xsH-0007Mw-Nz
	for 30408@debbugs.gnu.org; Tue, 27 Mar 2018 19:19:30 -0400
Original-Received: from localhost (localhost [127.0.0.1])
	by zimbra.cs.ucla.edu (Postfix) with ESMTP id 9DF84161446;
	Tue, 27 Mar 2018 16:19:23 -0700 (PDT)
Original-Received: from zimbra.cs.ucla.edu ([127.0.0.1])
	by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032)
	with ESMTP id LVCKYMWmqoys; Tue, 27 Mar 2018 16:19:22 -0700 (PDT)
Original-Received: from localhost (localhost [127.0.0.1])
	by zimbra.cs.ucla.edu (Postfix) with ESMTP id 3ADFB1614F9;
	Tue, 27 Mar 2018 16:19:22 -0700 (PDT)
X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu
Original-Received: from zimbra.cs.ucla.edu ([127.0.0.1])
	by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026)
	with ESMTP id n-3JNygiUATs; Tue, 27 Mar 2018 16:19:22 -0700 (PDT)
Original-Received: from Penguin.CS.UCLA.EDU (Penguin.CS.UCLA.EDU [131.179.64.200])
	by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 1934D161446;
	Tue, 27 Mar 2018 16:19:22 -0700 (PDT)
In-Reply-To: <83y3jq9q4m.fsf@gnu.org>
Content-Language: en-US
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Received-From: 208.118.235.43
X-BeenThere: bug-gnu-emacs@gnu.org
List-Id: "Bug reports for GNU Emacs,
	the Swiss army knife of text editors" <bug-gnu-emacs.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/bug-gnu-emacs>,
	<mailto:bug-gnu-emacs-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/bug-gnu-emacs/>
List-Post: <mailto:bug-gnu-emacs@gnu.org>
List-Help: <mailto:bug-gnu-emacs-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/bug-gnu-emacs>,
	<mailto:bug-gnu-emacs-request@gnu.org?subject=subscribe>
Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org
Original-Sender: "bug-gnu-emacs"
	<bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane.org@gnu.org>
Xref: news.gmane.org gmane.emacs.bugs:144663
Archived-At: <http://permalink.gmane.org/gmane.emacs.bugs/144663>

This is a multi-part message in MIME format.
--------------C8997759E81B0890A121F27F
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

Here's a patch that I hope addresses the main problem. The basic idea is 
to avoid the confusion exemplified in Bug#30408 by changing Emacs so 
that it ordinarily signals an error if it reads a program that contains 
an integer literal that is out of fixnum range. However, if the 
out-of-range literal is followed by '.' then Emacs continues to silently 
convert it to floating-point; this is intended as an escape hatch for 
any programs that need the old behavior (I expect this'll be rare). 
Thus, on 32-bit Emacs, plain '536870912' in a program causes Emacs to 
signal an overflow while loading the program, whereas '536870912.' is 
treated as a floating-point number as before. (On 64-bit Emacs, the same 
two literals are both integers, as before.)

Unlike my previous proposal, this patch does not affect the behavior of 
string-to-integer. As I understand it, that was a primary source of 
qualms about the previous proposal.

I've tested this on both 32- and 64-bit Emacs on master. This patch has 
helped me to find a couple of integer portability bugs which I already 
fixed on master.

--------------C8997759E81B0890A121F27F
Content-Type: text/x-patch;
 name="0001-Lisp-reader-now-checks-for-integer-overflow.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="0001-Lisp-reader-now-checks-for-integer-overflow.patch"

>From 94b7a1a171de3113cd5250315dee7bdef5f51890 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Tue, 27 Mar 2018 15:48:47 -0700
Subject: [PATCH] Lisp reader now checks for integer overflow

* doc/lispref/numbers.texi (Integer Basics), etc/NEWS:
Document this.
* src/lisp.h (S2N_IGNORE_TRAILING, S2N_OVERFLOW_TO_FLOAT):
New constants.
* src/lread.c (string_to_number): Change trailing bool arg to
integer argument with flags, to support S2N_OVERFLOW_TO_FLOAT.
All uses changed.
* test/src/editfns-tests.el (read-large-integer): New test.
---
 doc/lispref/numbers.texi  | 14 ++++++++++----
 etc/NEWS                  |  7 +++++++
 src/data.c                |  9 ++++-----
 src/lisp.h                |  3 ++-
 src/lread.c               | 35 ++++++++++++++++++++---------------
 src/process.c             |  2 +-
 test/src/editfns-tests.el | 22 ++++++++++++++++++----
 7 files changed, 62 insertions(+), 30 deletions(-)

diff --git a/doc/lispref/numbers.texi b/doc/lispref/numbers.texi
index c2cb6651d4..2fed2b642f 100644
--- a/doc/lispref/numbers.texi
+++ b/doc/lispref/numbers.texi
@@ -55,16 +55,13 @@ Integer Basics
 
   The Lisp reader reads an integer as a nonempty sequence
 of decimal digits with optional initial sign and optional
-final period.  A decimal integer that is out of the
-Emacs range is treated as a floating-point number.
+final period.
 
 @example
  1               ; @r{The integer 1.}
  1.              ; @r{The integer 1.}
 +1               ; @r{Also the integer 1.}
 -1               ; @r{The integer @minus{}1.}
- 9000000000000000000
-                 ; @r{The floating-point number 9e18.}
  0               ; @r{The integer 0.}
 -0               ; @r{The integer 0.}
 @end example
@@ -94,6 +91,15 @@ Integer Basics
 #24r1k @result{} 44
 @end example
 
+  If an integer is outside the Emacs range, the Lisp reader ordinarily
+signals an overflow.  However, if a too-large plain integer ends in a
+period, the Lisp reader treats it as a floating-point number instead.
+This lets an Emacs Lisp program specify a large integer that is
+quietly approximated by a floating-point number on machines with
+limited word width.  For example, @samp{536870912.} is a
+floating-point number if Emacs integers are only 30 bits wide and is
+an integer otherwise.
+
   To understand how various functions work on integers, especially the
 bitwise operators (@pxref{Bitwise Operations}), it is often helpful to
 view the numbers in their binary form.
diff --git a/etc/NEWS b/etc/NEWS
index fd1d04b8a0..cb74f512cc 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -349,6 +349,13 @@ as new-style, bind the new variable 'force-new-style-backquotes' to t.
 integer, Emacs now signals an error if the number is too large for the
 implementation to format (Bug#30408).
 
++++
+** The Lisp reader now signals an overflow for plain decimal integers
+that do not end in '.' and are outside Emacs range.  Formerly the Lisp
+reader silently converted them to floating-point numbers, and signaled
+overflow only for integers with a radix that are outside machine range
+(Bug#30408).
+
 ---
 ** Some functions and variables obsolete since Emacs 22 have been removed:
 archive-mouse-extract, assoc-ignore-case, assoc-ignore-representation,
diff --git a/src/data.c b/src/data.c
index a7fab1ef58..6f23a26757 100644
--- a/src/data.c
+++ b/src/data.c
@@ -2716,9 +2716,7 @@ present, base 10 is used.  BASE must be between 2 and 16 (inclusive).
 If the base used is not 10, STRING is always parsed as an integer.  */)
   (register Lisp_Object string, Lisp_Object base)
 {
-  register char *p;
-  register int b;
-  Lisp_Object val;
+  int b;
 
   CHECK_STRING (string);
 
@@ -2732,11 +2730,12 @@ If the base used is not 10, STRING is always parsed as an integer.  */)
       b = XINT (base);
     }
 
-  p = SSDATA (string);
+  char *p = SSDATA (string);
   while (*p == ' ' || *p == '\t')
     p++;
 
-  val = string_to_number (p, b, true);
+  int flags = S2N_IGNORE_TRAILING | S2N_OVERFLOW_TO_FLOAT;
+  Lisp_Object val = string_to_number (p, b, flags);
   return NILP (val) ? make_number (0) : val;
 }
 
diff --git a/src/lisp.h b/src/lisp.h
index f0c0c5a14a..b931d23bf3 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -3899,7 +3899,8 @@ LOADHIST_ATTACH (Lisp_Object x)
 }
 extern int openp (Lisp_Object, Lisp_Object, Lisp_Object,
                   Lisp_Object *, Lisp_Object, bool);
-extern Lisp_Object string_to_number (char const *, int, bool);
+enum { S2N_IGNORE_TRAILING = 1, S2N_OVERFLOW_TO_FLOAT = 2 };
+extern Lisp_Object string_to_number (char const *, int, int);
 extern void map_obarray (Lisp_Object, void (*) (Lisp_Object, Lisp_Object),
                          Lisp_Object);
 extern void dir_warning (const char *, Lisp_Object);
diff --git a/src/lread.c b/src/lread.c
index 381f9cf20c..a774524ee4 100644
--- a/src/lread.c
+++ b/src/lread.c
@@ -2339,7 +2339,7 @@ character_name_to_code (char const *name, ptrdiff_t name_len)
      monstrosities like "U+-0000".  */
   Lisp_Object code
     = (name[0] == 'U' && name[1] == '+'
-       ? string_to_number (name + 1, 16, false)
+       ? string_to_number (name + 1, 16, 0)
        : call2 (Qchar_from_name, make_unibyte_string (name, name_len), Qt));
 
   if (! RANGED_INTEGERP (0, code, MAX_UNICODE_CHAR)
@@ -2693,7 +2693,7 @@ read_integer (Lisp_Object readcharfun, EMACS_INT radix)
       invalid_syntax (buf);
     }
 
-  return string_to_number (buf, radix, false);
+  return string_to_number (buf, radix, 0);
 }
 
 
@@ -3502,7 +3502,7 @@ read1 (Lisp_Object readcharfun, int *pch, bool first_in_list)
 
 	if (!quoted && !uninterned_symbol)
 	  {
-	    Lisp_Object result = string_to_number (read_buffer, 10, false);
+	    Lisp_Object result = string_to_number (read_buffer, 10, 0);
 	    if (! NILP (result))
 	      return unbind_to (count, result);
 	  }
@@ -3667,16 +3667,17 @@ substitute_in_interval (INTERVAL interval, void *arg)
 }
 
 
-/* Convert STRING to a number, assuming base BASE.  Return a fixnum if
-   STRING has integer syntax and fits in a fixnum, else return the
-   nearest float if STRING has either floating point or integer syntax
-   and BASE is 10, else return nil.  If IGNORE_TRAILING, consider just
-   the longest prefix of STRING that has valid floating point syntax.
-   Signal an overflow if BASE is not 10 and the number has integer
-   syntax but does not fit.  */
+/* Convert STRING to a number, assuming base BASE.  When STRING has
+   floating point syntax and BASE is 10, return a nearest float.  When
+   STRING has integer syntax, return a fixnum if the integer fits, and
+   signal an overflow otherwise (unless BASE is 10 and STRING ends in
+   period or FLAGS & S2N_OVERFLOW_TO_FLOAT is nonzero; in this case,
+   return a nearest float instead).  Otherwise, return nil.  If FLAGS
+   & S2N_IGNORE_TRAILING is nonzero, consider just the longest prefix
+   of STRING that has valid syntax.  */
 
 Lisp_Object
-string_to_number (char const *string, int base, bool ignore_trailing)
+string_to_number (char const *string, int base, int flags)
 {
   char const *cp = string;
   bool float_syntax = 0;
@@ -3759,9 +3760,10 @@ string_to_number (char const *string, int base, bool ignore_trailing)
 		      || (state & ~INTOVERFLOW) == (LEAD_INT|E_EXP));
     }
 
-  /* Return nil if the number uses invalid syntax.  If IGNORE_TRAILING, accept
-     any prefix that matches.  Otherwise, the entire string must match.  */
-  if (! (ignore_trailing
+  /* Return nil if the number uses invalid syntax.  If FLAGS &
+     S2N_IGNORE_TRAILING, accept any prefix that matches.  Otherwise,
+     the entire string must match.  */
+  if (! (flags & S2N_IGNORE_TRAILING
 	 ? ((state & LEAD_INT) != 0 || float_syntax)
 	 : (!*cp && ((state & ~(INTOVERFLOW | DOT_CHAR)) == LEAD_INT
 		     || float_syntax))))
@@ -3776,7 +3778,7 @@ string_to_number (char const *string, int base, bool ignore_trailing)
 	  /* Unfortunately there's no simple and accurate way to convert
 	     non-base-10 numbers that are out of C-language range.  */
 	  if (base != 10)
-	    xsignal1 (Qoverflow_error, build_string (string));
+	    flags = 0;
 	}
       else if (n <= (negative ? -MOST_NEGATIVE_FIXNUM : MOST_POSITIVE_FIXNUM))
 	{
@@ -3785,6 +3787,9 @@ string_to_number (char const *string, int base, bool ignore_trailing)
 	}
       else
 	value = n;
+
+      if (! (state & DOT_CHAR) && ! (flags & S2N_OVERFLOW_TO_FLOAT))
+	xsignal1 (Qoverflow_error, build_string (string));
     }
 
   /* Either the number uses float syntax, or it does not fit into a fixnum.
diff --git a/src/process.c b/src/process.c
index 2aaa238f60..ed2cab7b51 100644
--- a/src/process.c
+++ b/src/process.c
@@ -6842,7 +6842,7 @@ SIGCODE may be an integer, or a symbol whose name is a signal name.  */)
     {
       Lisp_Object tem = Fget_process (process);
       if (NILP (tem))
-	tem = string_to_number (SSDATA (process), 10, true);
+	tem = string_to_number (SSDATA (process), 10, S2N_OVERFLOW_TO_FLOAT);
       process = tem;
     }
   else if (!NUMBERP (process))
diff --git a/test/src/editfns-tests.el b/test/src/editfns-tests.el
index 6e1f730166..442ad08937 100644
--- a/test/src/editfns-tests.el
+++ b/test/src/editfns-tests.el
@@ -142,27 +142,41 @@ transpose-test-get-byte-positions
   (should (string-equal (format "%#05X" #x10) "0X010"))
   (should (string-equal (format "%#04x" 0) "0000")))
 
-;;; Test Bug#30408.
+
+;;; Tests for Bug#30408.
+
 (ert-deftest format-%d-large-float ()
   (should (string-equal (format "%d" 18446744073709551616.0)
                         "18446744073709551616"))
   (should (string-equal (format "%d" -18446744073709551616.0)
                         "-18446744073709551616")))
 
-;;; Another test for Bug#30408.
 ;;; Perhaps Emacs will be improved someday to return the correct
 ;;; answer for positive numbers instead of overflowing; in
-;;; that case this test will need to be changed.  In the meantime make
+;;; that case these tests will need to be changed.  In the meantime make
 ;;; sure Emacs is reporting the overflow correctly.
 (ert-deftest format-%x-large-float ()
   (should-error (format "%x" 18446744073709551616.0)
                 :type 'overflow-error))
+(ert-deftest read-large-integer ()
+  (should-error (read (format "%d0" most-negative-fixnum))
+                :type 'overflow-error)
+  (should-error (read (format "%+d" (* -8.0 most-negative-fixnum)))
+                :type 'overflow-error)
+  (should-error (read (substring (format "%d" most-negative-fixnum) 1))
+                :type 'overflow-error)
+  (should-error (read (format "#x%x" most-negative-fixnum))
+                :type 'overflow-error)
+  (should-error (read (format "#o%o" most-negative-fixnum))
+                :type 'overflow-error)
+  (should-error (read (format "#32rG%x" most-positive-fixnum))
+                :type 'overflow-error))
 
-;;; Another test for Bug#30408.
 (ert-deftest format-%o-invalid-float ()
   (should-error (format "%o" -1e-37)
                 :type 'overflow-error))
 
+
 ;;; Check format-time-string with various TZ settings.
 ;;; Use only POSIX-compatible TZ values, since the tests should work
 ;;; even if tzdb is not in use.
-- 
2.14.3


--------------C8997759E81B0890A121F27F--