From: Paul Eggert
Newsgroups: gmane.emacs.devel
Subject: Re: Checking for loss of information on integer conversion
Date: Sun, 18 Feb 2018 12:04:20 -0800
Organization: UCLA Computer Science Department
Message-ID: <74ac7b77-a756-95a9-b490-6952cf106f21@cs.ucla.edu>
References: <7432641a-cedc-942c-d75c-0320fce5ba39@cs.ucla.edu> <83y3jq9q4m.fsf@gnu.org>
To: Eli Zaretskii
Cc: 30408@debbugs.gnu.org, emacs-devel@gnu.org

Eli Zaretskii wrote:
> Emacs Lisp is not used to write software that controls
> aircraft and spaceships

Actually, I maintain Emacs Lisp code that controls timestamps used in aircraft and spaceships.
I'm not saying that Emacs itself runs the aircraft and spaceships, but it definitely is used to develop software and data used there. As luck would have it, I'm currently engaged in an email thread about time transfer between Earth and Mars (yes, this is really a thing and people are trying to do it with millisecond precision) that is related to a project where I regularly use Emacs Lisp. See the thread containing this message:

https://mm.icann.org/pipermail/tz/2018-February/026257.html

> More generally, why signaling an error by default in this case is a
> good idea? ... That would
> be similar to behavior of equivalent constructs in C programs

Sure, and C compilers typically issue diagnostics for situations similar to what's in Bug#30408. For example, for this C program:

int a = 18446744073709553664;

GCC issues a diagnostic, whereas for the similar Emacs Lisp program:

(setq b 18446744073709553664)

Emacs silently substitutes 18446744073709551616 (that is, 2^64), a number that is off by 2048. It's the latter behavior that causes the sort of problem seen in Bug#30408.

When people write a floating-point number they naturally expect it to have some fuzz. But when they write an integer they expect it to be represented exactly, and not to be rounded. Emacs already reports an overflow error for the following code, which attempts to use the same mathematical value:

(setq c #x10000000000000800)

so it's not like it would be a huge change to do something similar for decimal integers.

When Emacs was originally developed, its integers were typically 28 bits (not counting sign) and floating-point numbers could typically represent integers exactly up to 53 bits (not counting sign), so the old Emacs behavior was somewhat defensible: although it didn't do bignums, at least it could represent integers nearly twice as wide as fixnums.
However, nowadays Emacs integers typically have more precision than floating-point numbers, and the old Emacs behavior is more likely to lead to counterintuitive results such as those described in Bug#30408.

On thinking about it in the light of your comments, I suppose it's confusing that the proposal used a new signal 'inexact', whereas it should just signal overflow. After all, that's what string_to_number already does for out-of-range hexadecimal integers. That issue is easily fixed. Revised patch attached.

Attachment: 0001-Avoid-losing-info-when-converting-integers.patch

From 49895e55ed7ac41dbf3752ab534cd665ef45ee71 Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Sun, 18 Feb 2018 11:37:22 -0800
Subject: [PATCH] Avoid losing info when converting integers

This fixes some glitches with large integers (Bug#30408).
* doc/lispref/numbers.texi (Integer Basics): Say that decimal integers
out of fixnum range must be representable exactly as floating-point.
* etc/NEWS: Mention this.
* src/editfns.c (styled_format): Use %.0f when formatting %d or %i
values outside machine integer range, to avoid losing info.
Signal an error for %o or %x values that are too large to be
formatted, to avoid losing info.
* src/lread.c (string_to_number): When converting an integer-format
string to floating-point, signal an error if info is lost.
---
 doc/lispref/numbers.texi |  8 +++--
 etc/NEWS                 |  9 +++++
 src/editfns.c            | 93 ++++++++++++++++++++----------------------------
 src/lread.c              | 14 ++++++++
 4 files changed, 66 insertions(+), 58 deletions(-)

diff --git a/doc/lispref/numbers.texi b/doc/lispref/numbers.texi
index e692ee1cc2..252aafd8fd 100644
--- a/doc/lispref/numbers.texi
+++ b/doc/lispref/numbers.texi
@@ -53,9 +53,11 @@ Integer Basics
 chapter assume the minimum integer width of 30 bits.
 @cindex overflow
 
-  The Lisp reader reads an integer as a sequence of digits with optional
-initial sign and optional final period.  An integer that is out of the
-Emacs range is treated as a floating-point number.
+  The Lisp reader can read an integer as a nonempty sequence of
+decimal digits with optional initial sign and optional final period.
+A decimal integer that is out of the Emacs range is treated as
+floating-point if it can be represented exactly as a floating-point
+number.
 
 @example
 1               ; @r{The integer 1.}
diff --git a/etc/NEWS b/etc/NEWS
index 8db638e5ed..36cbcf6500 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -248,6 +248,12 @@ as new-style, bind the new variable 'force-new-style-backquotes' to t.
 'cl-struct-define' whose name clashes with a builtin type (e.g.,
 'integer' or 'hash-table') now signals an error.
 
+** When formatting a floating-point number as an octal or hexadecimal
+integer, Emacs now signals an error if the number is too large for the
+implementation to format.  When reading an integer outside Emacs
+fixnum range, Emacs now signals an error if the integer cannot be
+represented exactly as a floating-point number.  See Bug#30408.
+
 ^L
 * Lisp Changes in Emacs 27.1
 
@@ -289,6 +295,9 @@ remote systems, which support this check.
 If the optional third argument is non-nil, 'make-string' will produce
 a multibyte string even if its second argument is an ASCII character.
 
+** (format "%d" X) no longer mishandles floating-point X values that
+do not fit in a machine integer (Bug#30408).
+
 ** New JSON parsing and serialization functions 'json-serialize',
 'json-insert', 'json-parse-string', and 'json-parse-buffer'.  These
 are implemented in C using the Jansson library.
diff --git a/src/editfns.c b/src/editfns.c
index 96bb271b2d..d26549ddb8 100644
--- a/src/editfns.c
+++ b/src/editfns.c
@@ -4563,32 +4563,30 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
             and with pM inserted for integer formats.
             At most two flags F can be specified at once.  */
          char convspec[sizeof "%FF.*d" + max (INT_AS_LDBL, pMlen)];
-         {
-           char *f = convspec;
-           *f++ = '%';
-           /* MINUS_FLAG and ZERO_FLAG are dealt with later.  */
-           *f = '+'; f += plus_flag;
-           *f = ' '; f += space_flag;
-           *f = '#'; f += sharp_flag;
-           *f++ = '.';
-           *f++ = '*';
-           if (float_conversion)
-             {
-               if (INT_AS_LDBL)
-                 {
-                   *f = 'L';
-                   f += INTEGERP (arg);
-                 }
-             }
-           else if (conversion != 'c')
-             {
-               memcpy (f, pMd, pMlen);
-               f += pMlen;
-               zero_flag &= ! precision_given;
-             }
-           *f++ = conversion;
-           *f = '\0';
-         }
+         char *f = convspec;
+         *f++ = '%';
+         /* MINUS_FLAG and ZERO_FLAG are dealt with later.  */
+         *f = '+'; f += plus_flag;
+         *f = ' '; f += space_flag;
+         *f = '#'; f += sharp_flag;
+         *f++ = '.';
+         *f++ = '*';
+         if (float_conversion)
+           {
+             if (INT_AS_LDBL)
+               {
+                 *f = 'L';
+                 f += INTEGERP (arg);
+               }
+           }
+         else if (conversion != 'c')
+           {
+             memcpy (f, pMd, pMlen);
+             f += pMlen;
+             zero_flag &= ! precision_given;
+           }
+         *f++ = conversion;
+         *f = '\0';
 
          int prec = -1;
          if (precision_given)
@@ -4630,29 +4628,18 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
            }
          else if (conversion == 'd' || conversion == 'i')
            {
-             /* For float, maybe we should use "%1.0f"
-                instead so it also works for values outside
-                the integer range.  */
-             printmax_t x;
              if (INTEGERP (arg))
-               x = XINT (arg);
+               {
+                 printmax_t x = XINT (arg);
+                 sprintf_bytes = sprintf (sprintf_buf, convspec, prec, x);
+               }
              else
                {
-                 double d = XFLOAT_DATA (arg);
-                 if (d < 0)
-                   {
-                     x = TYPE_MINIMUM (printmax_t);
-                     if (x < d)
-                       x = d;
-                   }
-                 else
-                   {
-                     x = TYPE_MAXIMUM (printmax_t);
-                     if (d < x)
-                       x = d;
-                   }
+                 strcpy (f - pMlen - 1, "f");
+                 prec = 0;
+                 double x = XFLOAT_DATA (arg);
+                 sprintf_bytes = sprintf (sprintf_buf, convspec, prec, x);
                }
-             sprintf_bytes = sprintf (sprintf_buf, convspec, prec, x);
            }
          else
            {
@@ -4663,22 +4650,18 @@ styled_format (ptrdiff_t nargs, Lisp_Object *args, bool message)
              else
                {
                  double d = XFLOAT_DATA (arg);
-                 if (d < 0)
-                   x = 0;
-                 else
-                   {
-                     x = TYPE_MAXIMUM (uprintmax_t);
-                     if (d < x)
-                       x = d;
-                   }
+                 if (! (0 <= d && d < TYPE_MAXIMUM (uprintmax_t)))
+                   xsignal1 (Qoverflow_error, arg);
+                 x = d;
                }
              sprintf_bytes = sprintf (sprintf_buf, convspec, prec, x);
            }
 
          /* Now the length of the formatted item is known, except it omits
             padding and excess precision.  Deal with excess precision
-            first.  This happens only when the format specifies
-            ridiculously large precision.  */
+            first.  This happens when the format specifies
+            ridiculously large precision, or when %d or %i has
+            nonzero precision and formats a float.  */
          ptrdiff_t excess_precision
            = precision_given ? precision - prec : 0;
          ptrdiff_t leading_zeros = 0, trailing_zeros = 0;
diff --git a/src/lread.c b/src/lread.c
index d009bd0cd2..9500ed8341 100644
--- a/src/lread.c
+++ b/src/lread.c
@@ -3794,6 +3794,20 @@ string_to_number (char const *string, int base, bool ignore_trailing)
       if (! value)
         value = atof (string + signedp);
 
+      if (! float_syntax)
+        {
+          /* Check that converting the integer-format STRING to a
+             floating-point number does not lose info.  See Bug#30408.  */
+          char const *bp = string + signedp;
+          while (*bp == '0')
+            bp++;
+          char checkbuf[DBL_MAX_10_EXP + 2];
+          int checkbuflen = sprintf (checkbuf, "%.0f", value);
+          if (! (cp - bp - !!(state & DOT_CHAR) == checkbuflen
+                 && memcmp (bp, checkbuf, checkbuflen) == 0))
+            xsignal1 (Qoverflow_error, build_string (string));
+        }
+
       return make_float (negative ? -value : value);
     }
 
-- 
2.14.3