unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#48678: [PATCH] lex floats with trailing dot and exponent correctly
@ 2021-05-26 16:56 Mattias Engdegård
  2021-05-26 17:27 ` Eli Zaretskii
  0 siblings, 1 reply; 12+ messages in thread
From: Mattias Engdegård @ 2021-05-26 16:56 UTC (permalink / raw)
  To: 48678

[-- Attachment #1: Type: text/plain, Size: 915 bytes --]

Motivation: I poured lots of numeric data into Emacs for a computation, but the results weren't as expected at all. Yet my code was correct, and so was the data.

After hours of debugging, it turned out that Emacs reads a number like 1.e6 as the integer 1, not the float 1000000.0. The exponent is silently ignored!

Now Emacs has always treated numbers like 123. as integers rather than floats, but
(1) it's documented,
(2) it's what Common Lisp does, and
(3) it actually doesn't affect the numeric value most of the time.

(Common Lisp probably got this from Maclisp, the rationale being that a trailing dot can be used to write integers in base 10 even when the current input radix is set to something else, something that Emacs Lisp doesn't need.)

Obviously this doesn't apply to 1.e6 which any sane person agrees is the float 1.0e+6 (including Common Lisp).

The attached patch fixes this bug.
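
As an illustration only (this is not part of the patch), the intended rule can be sketched as a small standalone C classifier. The names `token_kind` and `classify_decimal_token` are hypothetical, and the sketch assumes plain base-10 tokens; it ignores everything else the real reader handles.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type for this sketch; not an Emacs type.  */
enum token_kind { TOKEN_NOT_NUMBER, TOKEN_INTEGER, TOKEN_FLOAT };

/* Classify a complete base-10 token: a float has digits after the dot,
   or leading digits followed by an exponent.  A trailing dot with no
   exponent ("123.") still yields an integer.  */
enum token_kind
classify_decimal_token (const char *s)
{
  assert (s != NULL);
  bool lead = false, trail = false, has_exp = false;

  if (*s == '+' || *s == '-')
    s++;
  while (isdigit ((unsigned char) *s))
    {
      lead = true;
      s++;
    }
  if (*s == '.')
    {
      s++;
      while (isdigit ((unsigned char) *s))
        {
          trail = true;
          s++;
        }
    }
  if (*s == 'e' || *s == 'E')
    {
      /* Only accept the exponent if at least one digit follows it.  */
      const char *e = s + 1;
      if (*e == '+' || *e == '-')
        e++;
      if (isdigit ((unsigned char) *e))
        {
          has_exp = true;
          while (isdigit ((unsigned char) *e))
            e++;
          s = e;
        }
    }

  if (*s != '\0')
    return TOKEN_NOT_NUMBER;    /* Trailing junk: not a number token.  */
  if (trail || (lead && has_exp))
    return TOKEN_FLOAT;
  return lead ? TOKEN_INTEGER : TOKEN_NOT_NUMBER;
}
```

With this rule, "1.e6" and "13e4" classify as floats, "123." as an integer, and ".e6" or "1e.6" as non-numbers.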


[-- Attachment #2: 0001-Fix-lexing-of-numbers-with-trailing-decimal-point-an.patch --]
[-- Type: application/octet-stream, Size: 5931 bytes --]

From a0b69a9fc17c42b0c15b28c5894ffb2a1a9327e3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Thu, 20 May 2021 18:26:15 +0200
Subject: [PATCH] Fix lexing of numbers with trailing decimal point and
 exponent

Numbers with a trailing dot and an exponent were incorrectly read as
integers (with the exponent ignored) instead of the floats they should
be.  For example, 1.e6 was read as the integer 1, not 1000000.0 as
every sane person would agree was meant.

Numbers with a trailing dot but no exponent are still read as
integers.

* src/lread.c (string_to_number): Fix float lexing.
* test/src/lread-tests.el (lread-float): Add test.
* doc/lispref/numbers.texi (Float Basics): Clarify syntax.
---
 doc/lispref/numbers.texi |  3 +-
 src/lread.c              | 10 +++---
 test/src/lread-tests.el  | 67 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 74 insertions(+), 6 deletions(-)

diff --git a/doc/lispref/numbers.texi b/doc/lispref/numbers.texi
index 4c5f72126e..d28e15869a 100644
--- a/doc/lispref/numbers.texi
+++ b/doc/lispref/numbers.texi
@@ -237,7 +237,8 @@ Float Basics
 @samp{+15e2}, @samp{15.0e+2}, @samp{+1500000e-3}, and @samp{.15e4} are
 five ways of writing a floating-point number whose value is 1500.
 They are all equivalent.  Like Common Lisp, Emacs Lisp requires at
-least one digit after any decimal point in a floating-point number;
+least one digit after a decimal point in a floating-point number that
+does not have an exponent;
 @samp{1500.} is an integer, not a floating-point number.
 
   Emacs Lisp treats @code{-0.0} as numerically equal to ordinary zero
diff --git a/src/lread.c b/src/lread.c
index bca53a9a37..0b33fd0f25 100644
--- a/src/lread.c
+++ b/src/lread.c
@@ -3938,8 +3938,7 @@ string_to_number (char const *string, int base, ptrdiff_t *plen)
   bool signedp = negative | positive;
   cp += signedp;
 
-  enum { INTOVERFLOW = 1, LEAD_INT = 2, DOT_CHAR = 4, TRAIL_INT = 8,
-	 E_EXP = 16 };
+  enum { INTOVERFLOW = 1, LEAD_INT = 2, TRAIL_INT = 4, E_EXP = 16 };
   int state = 0;
   int leading_digit = digit_to_number (*cp, base);
   uintmax_t n = leading_digit;
@@ -3959,7 +3958,6 @@ string_to_number (char const *string, int base, ptrdiff_t *plen)
   char const *after_digits = cp;
   if (*cp == '.')
     {
-      state |= DOT_CHAR;
       cp++;
     }
 
@@ -4008,8 +4006,10 @@ string_to_number (char const *string, int base, ptrdiff_t *plen)
 	    cp = ecp;
 	}
 
-      float_syntax = ((state & (DOT_CHAR|TRAIL_INT)) == (DOT_CHAR|TRAIL_INT)
-		      || (state & ~INTOVERFLOW) == (LEAD_INT|E_EXP));
+      /* A float has digits after the dot or an exponent.
+	 This excludes numbers like "1." which are lexed as integers. */
+      float_syntax = ((state & TRAIL_INT)
+		      || ((state & LEAD_INT) && (state & E_EXP)));
     }
 
   if (plen)
diff --git a/test/src/lread-tests.el b/test/src/lread-tests.el
index f2a60bcf32..dac8f95bc4 100644
--- a/test/src/lread-tests.el
+++ b/test/src/lread-tests.el
@@ -196,4 +196,71 @@ test-inhibit-interaction
     (should-error (read-event "foo: "))
     (should-error (read-char-exclusive "foo: "))))
 
+(ert-deftest lread-float ()
+  (should (equal (read "13") 13))
+  (should (equal (read "+13") 13))
+  (should (equal (read "-13") -13))
+  (should (equal (read "13.") 13))
+  (should (equal (read "+13.") 13))
+  (should (equal (read "-13.") -13))
+  (should (equal (read "13.25") 13.25))
+  (should (equal (read "+13.25") 13.25))
+  (should (equal (read "-13.25") -13.25))
+  (should (equal (read ".25") 0.25))
+  (should (equal (read "+.25") 0.25))
+  (should (equal (read "-.25") -0.25))
+  (should (equal (read "13e4") 130000.0))
+  (should (equal (read "+13e4") 130000.0))
+  (should (equal (read "-13e4") -130000.0))
+  (should (equal (read "13e+4") 130000.0))
+  (should (equal (read "+13e+4") 130000.0))
+  (should (equal (read "-13e+4") -130000.0))
+  (should (equal (read "625e-4") 0.0625))
+  (should (equal (read "+625e-4") 0.0625))
+  (should (equal (read "-625e-4") -0.0625))
+  (should (equal (read "1.25e2") 125.0))
+  (should (equal (read "+1.25e2") 125.0))
+  (should (equal (read "-1.25e2") -125.0))
+  (should (equal (read "1.25e+2") 125.0))
+  (should (equal (read "+1.25e+2") 125.0))
+  (should (equal (read "-1.25e+2") -125.0))
+  (should (equal (read "1.25e-1") 0.125))
+  (should (equal (read "+1.25e-1") 0.125))
+  (should (equal (read "-1.25e-1") -0.125))
+  (should (equal (read "4.e3") 4000.0))
+  (should (equal (read "+4.e3") 4000.0))
+  (should (equal (read "-4.e3") -4000.0))
+  (should (equal (read "4.e+3") 4000.0))
+  (should (equal (read "+4.e+3") 4000.0))
+  (should (equal (read "-4.e+3") -4000.0))
+  (should (equal (read "5.e-1") 0.5))
+  (should (equal (read "+5.e-1") 0.5))
+  (should (equal (read "-5.e-1") -0.5))
+  (should (equal (read "0") 0))
+  (should (equal (read "+0") 0))
+  (should (equal (read "-0") 0))
+  (should (equal (read "0.") 0))
+  (should (equal (read "+0.") 0))
+  (should (equal (read "-0.") 0))
+  (should (equal (read "0.0") 0.0))
+  (should (equal (read "+0.0") 0.0))
+  (should (equal (read "-0.0") -0.0))
+  (should (equal (read "0e5") 0.0))
+  (should (equal (read "+0e5") 0.0))
+  (should (equal (read "-0e5") -0.0))
+  (should (equal (read "0e-5") 0.0))
+  (should (equal (read "+0e-5") 0.0))
+  (should (equal (read "-0e-5") -0.0))
+  (should (equal (read ".0e-5") 0.0))
+  (should (equal (read "+.0e-5") 0.0))
+  (should (equal (read "-.0e-5") -0.0))
+  (should (equal (read "0.0e-5") 0.0))
+  (should (equal (read "+0.0e-5") 0.0))
+  (should (equal (read "-0.0e-5") -0.0))
+  (should (equal (read "0.e-5") 0.0))
+  (should (equal (read "+0.e-5") 0.0))
+  (should (equal (read "-0.e-5") -0.0))
+  )
+
+
 ;;; lread-tests.el ends here
-- 
2.21.1 (Apple Git-122.3)



* bug#48678: [PATCH] lex floats with trailing dot and exponent correctly
  2021-05-26 16:56 bug#48678: [PATCH] lex floats with trailing dot and exponent correctly Mattias Engdegård
@ 2021-05-26 17:27 ` Eli Zaretskii
  2021-05-26 22:13   ` Lars Ingebrigtsen
  2021-05-27 12:20   ` Mattias Engdegård
  0 siblings, 2 replies; 12+ messages in thread
From: Eli Zaretskii @ 2021-05-26 17:27 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 48678

> From: Mattias Engdegård <mattiase@acm.org>
> Date: Wed, 26 May 2021 18:56:43 +0200
> 
> Now Emacs has always treated numbers like 123. as integers rather than floats, but
> (1) it's documented,
> (2) it's what Common Lisp does, and
> (3) it actually doesn't affect the numeric value most of the time.
> 
> (Common Lisp probably got this from Maclisp, the rationale being that a trailing dot can be used to write integers in base 10 even when the current input radix is set to something else, something that Emacs Lisp doesn't need.)
> 
> Obviously this doesn't apply to 1.e6 which any sane person agrees is the float 1.0e+6 (including Common Lisp).
> 
> The attached patch fixes this bug.

Brace for massive breakage.






* bug#48678: [PATCH] lex floats with trailing dot and exponent correctly
  2021-05-26 17:27 ` Eli Zaretskii
@ 2021-05-26 22:13   ` Lars Ingebrigtsen
  2021-05-27  7:32     ` Andreas Schwab
  2021-05-27 12:20   ` Mattias Engdegård
  1 sibling, 1 reply; 12+ messages in thread
From: Lars Ingebrigtsen @ 2021-05-26 22:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Mattias Engdegård, 48678

Eli Zaretskii <eliz@gnu.org> writes:

>> Obviously this doesn't apply to 1.e6 which any sane person agrees is
>> the float 1.0e+6 (including Common Lisp).
>> 
>> The attached patch fixes this bug.
>
> Brace for massive breakage.

Yes, it's a rather scary change -- people will have code that sloppily
parses noisy things like "1foo" and expect to get a 1 out, and ".e6"
could well be noise that they expect to have ignored.

So I'm sceptical.
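
The worry here is about lenient prefix parsing, where a parser takes whatever number it can from the front of a string and ignores the rest. A rough C analogy, using the standard `strtol` and `strtod` as stand-ins (they are not the Emacs functions, and the wrapper names below are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Parse an integer prefix of S in base 10 and report in *CONSUMED
   how many characters were used.  Anything after that is ignored.  */
long
parse_int_prefix (const char *s, size_t *consumed)
{
  assert (s != NULL && consumed != NULL);
  char *end;
  long value = strtol (s, &end, 10);
  *consumed = (size_t) (end - s);
  return value;
}

/* Same idea, but accepting floating-point syntax.  */
double
parse_float_prefix (const char *s, size_t *consumed)
{
  assert (s != NULL && consumed != NULL);
  char *end;
  double value = strtod (s, &end);
  *consumed = (size_t) (end - s);
  return value;
}
```

An integer prefix parse of "1foo" or "1.e6" yields 1 and stops after one character, whereas a float parse of "1.e6" consumes all four characters and yields 1000000.0. Code that relied on the first behaviour would see a different result under the second.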

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#48678: [PATCH] lex floats with trailing dot and exponent correctly
  2021-05-26 22:13   ` Lars Ingebrigtsen
@ 2021-05-27  7:32     ` Andreas Schwab
  2021-05-27  7:40       ` Lars Ingebrigtsen
  0 siblings, 1 reply; 12+ messages in thread
From: Andreas Schwab @ 2021-05-27  7:32 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: Mattias Engdegård, 48678

On May 27 2021, Lars Ingebrigtsen wrote:

> Yes, it's a rather scary change -- people will have code that sloppily
> parses noisy things like "1foo" and expect to get a 1 out, and ".e6"
> could well be noise that they expect to have ignored.
>
> So I'm sceptical.

But then 1.e6 should be parsed as a symbol.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."






* bug#48678: [PATCH] lex floats with trailing dot and exponent correctly
  2021-05-27  7:32     ` Andreas Schwab
@ 2021-05-27  7:40       ` Lars Ingebrigtsen
  2021-05-27 12:28         ` Mattias Engdegård
  2021-05-27 12:36         ` Philipp Stephani
  0 siblings, 2 replies; 12+ messages in thread
From: Lars Ingebrigtsen @ 2021-05-27  7:40 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Mattias Engdegård, 48678

Andreas Schwab <schwab@linux-m68k.org> writes:

> On May 27 2021, Lars Ingebrigtsen wrote:
>
>> Yes, it's a rather scary change -- people will have code that sloppily
>> parses noisy things like "1foo" and expect to get a 1 out, and ".e6"
>> could well be noise that they expect to have ignored.
>>
>> So I'm sceptical.
>
> But then 1.e6 should be parsed as a symbol.

Oops, I thought this was about string-to-number, which it wasn't at all.

Hm.  Currently 1.e6 reads to 1?  Weird.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#48678: [PATCH] lex floats with trailing dot and exponent correctly
  2021-05-26 17:27 ` Eli Zaretskii
  2021-05-26 22:13   ` Lars Ingebrigtsen
@ 2021-05-27 12:20   ` Mattias Engdegård
  2021-05-29  6:03     ` Lars Ingebrigtsen
  1 sibling, 1 reply; 12+ messages in thread
From: Mattias Engdegård @ 2021-05-27 12:20 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 48678

On 26 May 2021 at 19:27, Eli Zaretskii <eliz@gnu.org> wrote:

> Brace for massive breakage.

Challenge accepted! Now in master. Bring it on!







* bug#48678: [PATCH] lex floats with trailing dot and exponent correctly
  2021-05-27  7:40       ` Lars Ingebrigtsen
@ 2021-05-27 12:28         ` Mattias Engdegård
  2021-05-27 12:36         ` Philipp Stephani
  1 sibling, 0 replies; 12+ messages in thread
From: Mattias Engdegård @ 2021-05-27 12:28 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: Andreas Schwab, 48678

On 27 May 2021 at 09:40, Lars Ingebrigtsen <larsi@gnus.org> wrote:

> Hm.  Currently 1.e6 reads to 1?  Weird.

Yes, this behaviour was probably not intended at all. Such things happen; the best we can do is to put things right.







* bug#48678: [PATCH] lex floats with trailing dot and exponent correctly
  2021-05-27  7:40       ` Lars Ingebrigtsen
  2021-05-27 12:28         ` Mattias Engdegård
@ 2021-05-27 12:36         ` Philipp Stephani
  2021-05-27 12:37           ` Philipp Stephani
  1 sibling, 1 reply; 12+ messages in thread
From: Philipp Stephani @ 2021-05-27 12:36 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: Mattias Engdegård, Andreas Schwab, 48678

On Thu, 27 May 2021 at 09:41, Lars Ingebrigtsen <larsi@gnus.org> wrote:
>
> Andreas Schwab <schwab@linux-m68k.org> writes:
>
> > On May 27 2021, Lars Ingebrigtsen wrote:
> >
> >> Yes, it's a rather scary change -- people will have code that sloppily
> >> parses noisy things like "1foo" and expect to get a 1 out, and ".e6"
> >> could well be noise that they expect to have ignored.
> >>
> >> So I'm sceptical.
> >
> > But then 1.e6 should be parsed as a symbol.
>
> Oops, I thought this was about string-to-number, which it wasn't at all.
>
> Hm.  Currently 1.e6 reads to 1?  Weird.

At least for me it's parsed as a symbol.






* bug#48678: [PATCH] lex floats with trailing dot and exponent correctly
  2021-05-27 12:36         ` Philipp Stephani
@ 2021-05-27 12:37           ` Philipp Stephani
  0 siblings, 0 replies; 12+ messages in thread
From: Philipp Stephani @ 2021-05-27 12:37 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: Mattias Engdegård, Andreas Schwab, 48678

On Thu, 27 May 2021 at 14:36, Philipp Stephani
<p.stephani2@gmail.com> wrote:
>
> On Thu, 27 May 2021 at 09:41, Lars Ingebrigtsen <larsi@gnus.org> wrote:
> >
> > Andreas Schwab <schwab@linux-m68k.org> writes:
> >
> > > On May 27 2021, Lars Ingebrigtsen wrote:
> > >
> > >> Yes, it's a rather scary change -- people will have code that sloppily
> > >> parses noisy things like "1foo" and expect to get a 1 out, and ".e6"
> > >> could well be noise that they expect to have ignored.
> > >>
> > >> So I'm sceptical.
> > >
> > > But then 1.e6 should be parsed as a symbol.
> >
> > Oops, I thought this was about string-to-number, which it wasn't at all.
> >
> > Hm.  Currently 1.e6 reads to 1?  Weird.
>
> At least for me it's parsed as a symbol.

Oops, taking that back, I checked 1e.6 instead of 1.e6.






* bug#48678: [PATCH] lex floats with trailing dot and exponent correctly
  2021-05-27 12:20   ` Mattias Engdegård
@ 2021-05-29  6:03     ` Lars Ingebrigtsen
  2021-05-29  7:47       ` Mattias Engdegård
  0 siblings, 1 reply; 12+ messages in thread
From: Lars Ingebrigtsen @ 2021-05-29  6:03 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 48678

Mattias Engdegård <mattiase@acm.org> writes:

> On 26 May 2021 at 19:27, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> Brace for massive breakage.
>
> Challenge accepted! Now in master. Bring it on!

:-)

There don't seem to be any reported breakages from this yet.  It does
seem quite NEWS-worthy, though, so I've added an entry, and I'm closing
this bug report.  If serious breakages do happen, we should consider
backing out the change.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#48678: [PATCH] lex floats with trailing dot and exponent correctly
  2021-05-29  6:03     ` Lars Ingebrigtsen
@ 2021-05-29  7:47       ` Mattias Engdegård
  2021-05-30  4:07         ` Lars Ingebrigtsen
  0 siblings, 1 reply; 12+ messages in thread
From: Mattias Engdegård @ 2021-05-29  7:47 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 48678

On 29 May 2021 at 08:03, Lars Ingebrigtsen <larsi@gnus.org> wrote:

> There don't seem to be any reported breakages from this yet.  It does
> seem quite NEWS-worthy, though, so I've added an entry, and I'm closing
> this bug report.

Excellent! I was going to write a NEWS entry, so thank you for forcing my hand. I took the liberty to make a few minor changes to it for precision; if it isn't to your liking, do tell.

>  If serious breakages do happen, we should consider
> backing out the change.

Most certainly, but I'm confident in the change. It wasn't done without serious preparation: I scanned hundreds of Emacs packages, and checked all boolean combinations in the reader condition to guarantee correctness (which showed that a flag in the condition was redundant and could be removed). There is now a serious test.
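
A check over all flag combinations of this kind could look roughly like the sketch below. It is not the script actually used; the function names are hypothetical. The two conditions are copied from the patch, using the pre-patch flag values so that both can be evaluated on the same state. The reachability filter (trailing digits imply a dot) is an assumption about how the flags get set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values as in the pre-patch enum in string_to_number.  */
enum { F_INTOVERFLOW = 1, F_LEAD_INT = 2, F_DOT_CHAR = 4,
       F_TRAIL_INT = 8, F_E_EXP = 16, F_STATE_COUNT = 32 };

/* Condition before the patch.  */
static bool
old_float_syntax (int state)
{
  return ((state & (F_DOT_CHAR | F_TRAIL_INT)) == (F_DOT_CHAR | F_TRAIL_INT)
          || (state & ~F_INTOVERFLOW) == (F_LEAD_INT | F_E_EXP));
}

/* Condition after the patch; it no longer looks at the dot flag.  */
static bool
new_float_syntax (int state)
{
  return ((state & F_TRAIL_INT)
          || ((state & F_LEAD_INT) && (state & F_E_EXP)));
}

/* Assumption of this sketch: trailing digits can only follow a dot.  */
static bool
reachable (int state)
{
  return !(state & F_TRAIL_INT) || (state & F_DOT_CHAR);
}

/* Store every reachable state where the two conditions disagree in OUT
   (room for F_STATE_COUNT entries) and return how many there are.  */
int
differing_states (int *out)
{
  assert (out != NULL);
  int count = 0;
  for (int state = 0; state < F_STATE_COUNT; state++)
    if (reachable (state)
        && old_float_syntax (state) != new_float_syntax (state))
      out[count++] = state;
  return count;
}
```

Under that assumption the conditions disagree only for leading digits, a dot, no trailing digits, and an exponent (with or without integer overflow), which is the 1.e6 case. Since the dot flag makes no difference anywhere else, it can be dropped.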

Looking for the origin I also ran Maclisp on a PDP-10 and can confirm that it does not have the bug, so it must have been endogenous to Emacs.







* bug#48678: [PATCH] lex floats with trailing dot and exponent correctly
  2021-05-29  7:47       ` Mattias Engdegård
@ 2021-05-30  4:07         ` Lars Ingebrigtsen
  0 siblings, 0 replies; 12+ messages in thread
From: Lars Ingebrigtsen @ 2021-05-30  4:07 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 48678

Mattias Engdegård <mattiase@acm.org> writes:

> Excellent! I was going to write a NEWS entry, so thank you for forcing
> my hand. I took the liberty to make a few minor changes to it for
> precision; if it isn't to your liking, do tell.

Looks good to me; thanks.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no







Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
