From: Paul Eggert
Newsgroups: gmane.emacs.devel
Subject: Checking for loss of information on integer conversion
Date: Sat, 17 Feb 2018 17:27:37 -0800
Organization: UCLA Computer Science Department
Message-ID: <7432641a-cedc-942c-d75c-0320fce5ba39@cs.ucla.edu>
Reply-To: 30408@debbugs.gnu.org
To: Emacs Development

Bug#30408 reminded me of a problem that's bitten me before, which is
that Emacs Lisp reading and printing sometimes loses information when
converting from integers to strings or vice versa, and this loss of
information can lead to real problems. Without going to bignums we
can't fix this problem nicely in general; however, Emacs can do a
better job of signaling an error when information is lost in the
current implementation. I've proposed a patch here:

https://debbugs.gnu.org/30408#16

and am attaching it to this email for convenience.

Briefly, the patch is twofold. First, it causes calls like
(format "%d" 18446744073709551616) to return the mathematically-correct
value "18446744073709551616" instead of silently returning the
mathematically-incorrect value "9223372036854775807" as they do now.
If the function cannot return the correct value due to an
implementation limit -- e.g., (format "%x" 18446744073709551616) --
the patch causes the function to signal an overflow.

Second, although Emacs still reads large integers like
18446744073709551616 as if they were floating-point, it now signals an
error if information is lost in the process. For example, the number
18446744073709551615 now causes the reader to signal an error, since
it cannot be represented exactly either as a fixnum or as a
floating-point number. If you want inexact representation, you can
append ".0" or "e0" to the integer.

As these are incompatible changes to Emacs I thought I'd mention them
on this list.

Another possibility would be for Emacs to signal an error when reading
any integer that does not fit in fixnum bounds. That would be a bigger
change, though.

Attachment: 0001-Avoid-losing-info-when-converting-integers.patch (text/x-patch)

From e1865be990e1a520feddc07507a71916d097d633 Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Sat, 17 Feb 2018 16:45:17 -0800
Subject: [PATCH] Avoid losing info when converting integers

This fixes some glitches with large integers (Bug#30408).
* doc/lispref/numbers.texi (Integer Basics): Say that
decimal integers out of fixnum range must be representable exactly
as floating-point.
* etc/NEWS: Mention this.
* src/data.c (syms_of_data): Add Qinexact_error.
* src/editfns.c (styled_format): Use %.0f when formatting %d or %i
values outside machine integer range, to avoid losing info.
Signal an error for %o or %x values that are too large to be
formatted, to avoid losing info.
* src/lread.c (string_to_number): When converting an
integer-format string to floating-point, signal an error if
info is lost.
---
 doc/lispref/numbers.texi |  8 +++--
 etc/NEWS                 |  9 +++++
 src/data.c               |  1 +
 src/editfns.c            | 93 ++++++++++++++++++++----------------------------
 src/lread.c              | 14 ++++++++
 5 files changed, 67 insertions(+), 58 deletions(-)

diff --git a/doc/lispref/numbers.texi b/doc/lispref/numbers.texi
index e692ee1cc2..252aafd8fd 100644
--- a/doc/lispref/numbers.texi
+++ b/doc/lispref/numbers.texi
@@ -53,9 +53,11 @@ Integer Basics
 chapter assume the minimum integer width of 30 bits.
 @cindex overflow
 
-  The Lisp reader reads an integer as a sequence of digits with optional
-initial sign and optional final period. An integer that is out of the
-Emacs range is treated as a floating-point number.
+  The Lisp reader can read an integer as a nonempty sequence of
+decimal digits with optional initial sign and optional final period.
+A decimal integer that is out of the Emacs range is treated as
+floating-point if it can be represented exactly as a floating-point
+number.
 
 @example
 1 ; @r{The integer 1.}
diff --git a/etc/NEWS b/etc/NEWS
index 8db638e5ed..36cbcf6500 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -248,6 +248,12 @@ as new-style, bind the new variable 'force-new-style-backquotes' to t.
 'cl-struct-define' whose name clashes with a builtin type (e.g.,
 'integer' or 'hash-table') now signals an error.
 
+** When formatting a floating-point number as an octal or hexadecimal
+integer, Emacs now signals an error if the number is too large for the
+implementation to format. When reading an integer outside Emacs
+fixnum range, Emacs now signals an error if the integer cannot be
+represented exactly as a floating-point number. See Bug#30408.
+
 ^L
 * Lisp Changes in Emacs 27.1
 
@@ -289,6 +295,9 @@ remote systems, which support this check.
 If the optional third argument is non-nil, 'make-string' will produce
 a multibyte string even if its second argument is an ASCII character.
 
+** (format "%d" X) no longer mishandles floating-point X values that
+do not fit in a machine integer (Bug#30408).
+
 ** New JSON parsing and serialization functions 'json-serialize',
 'json-insert', 'json-parse-string', and 'json-parse-buffer'. These
 are implemented in C using the Jansson library.
diff --git a/src/data.c b/src/data.c
index 72abfefb01..8856583f13 100644
--- a/src/data.c
+++ b/src/data.c
@@ -3729,6 +3729,7 @@ syms_of_data (void)
   DEFSYM (Qrange_error, "range-error");
   DEFSYM (Qdomain_error, "domain-error");
   DEFSYM (Qsingularity_error, "singularity-error");
+  DEFSYM (Qinexact_error, "inexact-error");
   DEFSYM (Qoverflow_error, "overflow-error");
   DEFSYM (Qunderflow_error, "underflow-error");
 
diff --git a/src/editfns.c b/src/editfns.c
index 96bb271b2d..d26549ddb8 100644
--- a/src/editfns.c
+++ b/src/editfns.c
@@ -4563,32 +4563,30 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
                  and with pM inserted for integer formats.
                  At most two flags F can be specified at once. */
               char convspec[sizeof "%FF.*d" + max (INT_AS_LDBL, pMlen)];
-              {
-                char *f = convspec;
-                *f++ = '%';
-                /* MINUS_FLAG and ZERO_FLAG are dealt with later. */
-                *f = '+'; f += plus_flag;
-                *f = ' '; f += space_flag;
-                *f = '#'; f += sharp_flag;
-                *f++ = '.';
-                *f++ = '*';
-                if (float_conversion)
-                  {
-                    if (INT_AS_LDBL)
-                      {
-                        *f = 'L';
-                        f += INTEGERP (arg);
-                      }
-                  }
-                else if (conversion != 'c')
-                  {
-                    memcpy (f, pMd, pMlen);
-                    f += pMlen;
-                    zero_flag &= ! precision_given;
-                  }
-                *f++ = conversion;
-                *f = '\0';
-              }
+              char *f = convspec;
+              *f++ = '%';
+              /* MINUS_FLAG and ZERO_FLAG are dealt with later. */
+              *f = '+'; f += plus_flag;
+              *f = ' '; f += space_flag;
+              *f = '#'; f += sharp_flag;
+              *f++ = '.';
+              *f++ = '*';
+              if (float_conversion)
+                {
+                  if (INT_AS_LDBL)
+                    {
+                      *f = 'L';
+                      f += INTEGERP (arg);
+                    }
+                }
+              else if (conversion != 'c')
+                {
+                  memcpy (f, pMd, pMlen);
+                  f += pMlen;
+                  zero_flag &= ! precision_given;
+                }
+              *f++ = conversion;
+              *f = '\0';
 
               int prec = -1;
               if (precision_given)
@@ -4630,29 +4628,18 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
                 }
               else if (conversion == 'd' || conversion == 'i')
                 {
-                  /* For float, maybe we should use "%1.0f"
-                     instead so it also works for values outside
-                     the integer range. */
-                  printmax_t x;
                   if (INTEGERP (arg))
-                    x = XINT (arg);
+                    {
+                      printmax_t x = XINT (arg);
+                      sprintf_bytes = sprintf (sprintf_buf, convspec, prec, x);
+                    }
                   else
                     {
-                      double d = XFLOAT_DATA (arg);
-                      if (d < 0)
-                        {
-                          x = TYPE_MINIMUM (printmax_t);
-                          if (x < d)
-                            x = d;
-                        }
-                      else
-                        {
-                          x = TYPE_MAXIMUM (printmax_t);
-                          if (d < x)
-                            x = d;
-                        }
+                      strcpy (f - pMlen - 1, "f");
+                      prec = 0;
+                      double x = XFLOAT_DATA (arg);
+                      sprintf_bytes = sprintf (sprintf_buf, convspec, prec, x);
                     }
-                  sprintf_bytes = sprintf (sprintf_buf, convspec, prec, x);
                 }
               else
                 {
@@ -4663,22 +4650,18 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
                   else
                     {
                       double d = XFLOAT_DATA (arg);
-                      if (d < 0)
-                        x = 0;
-                      else
-                        {
-                          x = TYPE_MAXIMUM (uprintmax_t);
-                          if (d < x)
-                            x = d;
-                        }
+                      if (! (0 <= d && d < TYPE_MAXIMUM (uprintmax_t)))
+                        xsignal1 (Qoverflow_error, arg);
+                      x = d;
                     }
                   sprintf_bytes = sprintf (sprintf_buf, convspec, prec, x);
                 }
 
               /* Now the length of the formatted item is known, except it omits
                  padding and excess precision. Deal with excess precision
-                 first. This happens only when the format specifies
-                 ridiculously large precision. */
+                 first. This happens when the format specifies
+                 ridiculously large precision, or when %d or %i has
+                 nonzero precision and formats a float. */
               ptrdiff_t excess_precision
                 = precision_given ? precision - prec : 0;
               ptrdiff_t leading_zeros = 0, trailing_zeros = 0;
diff --git a/src/lread.c b/src/lread.c
index d009bd0cd2..cfeaac8030 100644
--- a/src/lread.c
+++ b/src/lread.c
@@ -3794,6 +3794,20 @@ string_to_number (char const *string, int base, bool ignore_trailing)
       if (! value)
         value = atof (string + signedp);
 
+      if (! float_syntax)
+        {
+          /* Check that converting the integer-format STRING to a
+             floating-point number does not lose info. See Bug#30408. */
+          char const *bp = string + signedp;
+          while (*bp == '0')
+            bp++;
+          char checkbuf[DBL_MAX_10_EXP + 2];
+          int checkbuflen = sprintf (checkbuf, "%.0f", value);
+          if (! (cp - bp - !!(state & DOT_CHAR) == checkbuflen
+                 && memcmp (bp, checkbuf, checkbuflen) == 0))
+            xsignal1 (Qinexact_error, build_string (string));
+        }
+
       return make_float (negative ? -value : value);
     }
 
-- 
2.14.3
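
For readers who want to try the two ideas outside Emacs, here is a small
standalone C sketch. It is not code from the patch. The function names
integer_string_exact_as_double, format_float_as_integer and
format_float_clamped are hypothetical, and the sketch assumes a printf
that prints the exact decimal value for "%.0f" (as glibc does) and a
64-bit intmax_t. The first function mimics the round-trip check that the
patch adds to the reader; the other two contrast "%.0f" formatting with
clamping to the machine integer range.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the patch): return true if the decimal
   integer in DIGITS, with an optional leading sign, survives conversion
   to double without losing information.  */
static bool
integer_string_exact_as_double (char const *digits)
{
  assert (digits != NULL);
  char const *bp = digits;
  if (*bp == '-' || *bp == '+')
    bp++;

  /* Accept only a nonempty run of decimal digits after the sign.  */
  size_t len = strlen (bp);
  if (len == 0 || strspn (bp, "0123456789") != len)
    return false;

  double value = strtod (bp, NULL);

  /* "%.0f" never prints leading zeros, so skip them (keeping one digit).  */
  while (bp[0] == '0' && bp[1] != '\0')
    bp++;

  /* Print the double back as an integer and compare digit by digit.
     DBL_MAX has DBL_MAX_10_EXP + 1 digits; add one byte for the NUL.  */
  char checkbuf[DBL_MAX_10_EXP + 2];
  int checkbuflen = snprintf (checkbuf, sizeof checkbuf, "%.0f", value);
  return (0 < checkbuflen
          && (size_t) checkbuflen < sizeof checkbuf
          && strlen (bp) == (size_t) checkbuflen
          && memcmp (bp, checkbuf, (size_t) checkbuflen) == 0);
}

/* Hypothetical helper: format D with "%.0f", so that values outside
   intmax_t range keep all their digits.  */
static int
format_float_as_integer (char *buf, size_t size, double d)
{
  return snprintf (buf, size, "%.0f", d);
}

/* Hypothetical helper: approximate the old behavior by clamping D into
   intmax_t range first, which silently loses information.  */
static int
format_float_clamped (char *buf, size_t size, double d)
{
  intmax_t x;
  if (d < (double) INTMAX_MIN)
    x = INTMAX_MIN;
  else if (d < (double) INTMAX_MAX)
    x = (intmax_t) d;   /* In range, so the conversion is well defined.  */
  else
    x = INTMAX_MAX;     /* Also reached for NaN.  */
  return snprintf (buf, size, "%jd", x);
}
```

Under those assumptions, integer_string_exact_as_double accepts
"18446744073709551616" (2**64 is exactly representable as a double) and
rejects "18446744073709551615", which rounds to 2**64 on conversion.
Likewise format_float_clamped turns 18446744073709551616.0 into
"9223372036854775807", whereas format_float_as_integer keeps all the
digits.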