Strange problems with data-tests.el

unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed

* Strange problems with data-tests.el
@ 2018-08-12 18:37 Eli Zaretskii
  2018-08-12 22:43 ` Paul Eggert
  2018-08-13 10:06 ` Pip Cet
  0 siblings, 2 replies; 10+ messages in thread
From: Eli Zaretskii @ 2018-08-12 18:37 UTC (permalink / raw)
  To: emacs-devel

The newly built master (including bignums and after removing misc)
passed src/data-tests.el for an unoptimized build and for an -O2
optimized build.  When I tried the -Og optimized build, it failed the
tests.  The reason for the failures seems to be that the following
expression yields nil, whereas the test expects it to yield t:

  (< 0.5 most-positive-fixnum (+ 1.0 most-positive-fixnum))

(This is a 32-bit build --with-wide-int, using GCC 7.3.0.)

I stepped with a debugger through arithcompare.  The values involved
aren't easy to display, and optimizations don't help, but after some
tinkering with "nexti" and "info float", it looks like the values of
f1 and f2 are loaded into FP registers when this line is executed:

	  fneq = f1 != f2;

which correctly yields 'true' (as there's no loss of precision in both
values), but when we get to this line:

    case ARITH_LESS:
      test = fneq ? f1 < f2 : i1 < i2;

what I seem to see in the 2 FP registers is the same value(??), and in
any case 'test' comes out 'false'.  ("info address" at this point
tells me that both f1 and f2 are "complex DWARF expressions", so
perhaps what I see in the FP registers is not the real values.)

This all looks like a compiler bug, but the code looks too simple for
that, so perhaps it's some subtlety that only trips us in this
combination: 32-bit build with 64-bit EMACS_INT and some FP values in
80-bit extended-precision registers?

Rebuilding with -O2 makes the problem go away, but why should -Og
(whose optimizations are almost exactly -O1) fail to work here?

Any ideas for how to pursue this further?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Strange problems with data-tests.el
  2018-08-12 18:37 Strange problems with data-tests.el Eli Zaretskii
@ 2018-08-12 22:43 ` Paul Eggert
  2018-08-13 15:05   ` Eli Zaretskii
  2018-08-13 10:06 ` Pip Cet
  1 sibling, 1 reply; 10+ messages in thread
From: Paul Eggert @ 2018-08-12 22:43 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1004 bytes --]

Eli Zaretskii wrote:
> This all looks like a compiler bug

Yes and no. In this area, GCC deliberately doesn't conform to the C standard 
unless you use an option like -fexcess-precision=standard that causes GCC to 
conform but often generate worse code.

The odd results you observed can occur when a C 'double' variable spontaneously 
changes value, e.g., when an 80-bit floating-point register is spilled to a 
64-bit memory location. I reproduced the problem on Fedora 28 and fixed it by 
installing the attached patch into master; please give it a try.

I suspect that it's only bad luck that exposed the problem, that is, that the 
recent bignum or misc changes triggered the bug only because they caused GCC to 
optimize in a different way. In theory if we're unlucky the bug could recur, 
since GCC doesn't make promises in this area. However, I changed arithcompare in 
such a way that GCC would have to be pretty perverse to cause the problem to 
resurface, and similarly for other C compilers.

[-- Attachment #2: 0001-Port-better-to-x86-fexcess-precision-fast.patch --]
[-- Type: text/x-patch, Size: 3508 bytes --]

From a84cef90957f2379cc0df6bd908317fc441971ce Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sun, 12 Aug 2018 15:28:20 -0700
Subject: [PATCH] Port better to x86 -fexcess-precision=fast

Problem reported by Eli Zaretskii in:
https://lists.gnu.org/r/emacs-devel/2018-08/msg00380.html
* src/data.c (arithcompare): Work around incompatibility
between gcc -fexcess-precision=fast and the C standard on x86,
by capturing the results of floating-point comparisons before
the excess precision spontaneously decays.  Although this fix
might not work in general, it does work here and is probably
good enough for the platforms we care about.
---
 src/data.c | 43 +++++++++++++++++++++++++++++--------------
 1 file changed, 29 insertions(+), 14 deletions(-)

diff --git a/src/data.c b/src/data.c
index c8a9c6b378..7b8dd45c94 100644
--- a/src/data.c
+++ b/src/data.c
@@ -2492,7 +2492,7 @@ arithcompare (Lisp_Object num1, Lisp_Object num2,
 {
   double f1, f2;
   EMACS_INT i1, i2;
-  bool fneq;
+  bool lt, eq, gt;
   bool test;
 
   CHECK_NUMBER_COERCE_MARKER (num1);
@@ -2502,10 +2502,13 @@ arithcompare (Lisp_Object num1, Lisp_Object num2,
     return bignumcompare (num1, num2, comparison);
 
   /* If either arg is floating point, set F1 and F2 to the 'double'
-     approximations of the two arguments, and set FNEQ if floating-point
-     comparison reports that F1 is not equal to F2, possibly because F1
-     or F2 is a NaN.  Regardless, set I1 and I2 to integers that break
-     ties if the floating-point comparison is either not done or reports
+     approximations of the two arguments, and set LT, EQ, and GT to
+     the <, ==, > floating-point comparisons of F1 and F2
+     respectively, taking care to avoid problems if either is a NaN,
+     and trying to avoid problems on platforms where variables (in
+     violation of the C standard) can contain excess precision.
+     Regardless, set I1 and I2 to integers that break ties if the
+     floating-point comparison is either not done or reports
      equality.  */
 
   if (FLOATP (num1))
@@ -2528,7 +2531,9 @@ arithcompare (Lisp_Object num1, Lisp_Object num2,
 	     to I2 will break the tie correctly.  */
 	  i1 = f2 = i2 = XFIXNUM (num2);
 	}
-      fneq = f1 != f2;
+      lt = f1 < f2;
+      eq = f1 == f2;
+      gt = f1 > f2;
     }
   else
     {
@@ -2539,39 +2544,49 @@ arithcompare (Lisp_Object num1, Lisp_Object num2,
 	     converse of comparing float to integer (see above).  */
 	  i2 = f1 = i1;
 	  f2 = XFLOAT_DATA (num2);
-	  fneq = f1 != f2;
+	  lt = f1 < f2;
+	  eq = f1 == f2;
+	  gt = f1 > f2;
 	}
       else
 	{
 	  i2 = XFIXNUM (num2);
-	  fneq = false;
+	  eq = true;
 	}
     }
 
+  if (eq)
+    {
+      /* Break a floating-point tie by comparing the integers.  */
+      lt = i1 < i2;
+      eq = i1 == i2;
+      gt = i1 > i2;
+    }
+
   switch (comparison)
     {
     case ARITH_EQUAL:
-      test = !fneq && i1 == i2;
+      test = eq;
       break;
 
     case ARITH_NOTEQUAL:
-      test = fneq || i1 != i2;
+      test = !eq;
       break;
 
     case ARITH_LESS:
-      test = fneq ? f1 < f2 : i1 < i2;
+      test = lt;
       break;
 
     case ARITH_LESS_OR_EQUAL:
-      test = fneq ? f1 <= f2 : i1 <= i2;
+      test = lt | eq;
       break;
 
     case ARITH_GRTR:
-      test = fneq ? f1 > f2 : i1 > i2;
+      test = gt;
       break;
 
     case ARITH_GRTR_OR_EQUAL:
-      test = fneq ? f1 >= f2 : i1 >= i2;
+      test = gt | eq;
       break;
 
     default:
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: Strange problems with data-tests.el
  2018-08-12 18:37 Strange problems with data-tests.el Eli Zaretskii
  2018-08-12 22:43 ` Paul Eggert
@ 2018-08-13 10:06 ` Pip Cet
  2018-08-13 14:44   ` Eli Zaretskii
  1 sibling, 1 reply; 10+ messages in thread
From: Pip Cet @ 2018-08-13 10:06 UTC (permalink / raw)
  To: eliz; +Cc: emacs-devel

On Sun, Aug 12, 2018 at 6:38 PM Eli Zaretskii <eliz@gnu.org> wrote:
>   (< 0.5 most-positive-fixnum (+ 1.0 most-positive-fixnum))

I think that test is a bit misleading, as (< most-positive-fixnum (+
-128.0 most-positive-fixnum)) also returns true. It's not the addition
that makes the RHS greater than the LHS, it's the rounding that
happens when it's converted to a float. (This is probably obvious to
most people, but it surprised me).



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Strange problems with data-tests.el
  2018-08-13 10:06 ` Pip Cet
@ 2018-08-13 14:44   ` Eli Zaretskii
  2018-08-13 14:58     ` Pip Cet
  0 siblings, 1 reply; 10+ messages in thread
From: Eli Zaretskii @ 2018-08-13 14:44 UTC (permalink / raw)
  To: Pip Cet; +Cc: emacs-devel

> From: Pip Cet <pipcet@gmail.com>
> Date: Mon, 13 Aug 2018 10:06:12 +0000
> Cc: emacs-devel@gnu.org
> 
> On Sun, Aug 12, 2018 at 6:38 PM Eli Zaretskii <eliz@gnu.org> wrote:
> >   (< 0.5 most-positive-fixnum (+ 1.0 most-positive-fixnum))
> 
> I think that test is a bit misleading, as (< most-positive-fixnum (+
> -128.0 most-positive-fixnum)) also returns true.

Not on a 32-bit host with a 32-bit EMACS_INT, it doesn't.

> It's not the addition that makes the RHS greater than the LHS, it's
> the rounding that happens when it's converted to a float.

With 64-bit EMACS_INT, yes.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Strange problems with data-tests.el
  2018-08-13 14:44   ` Eli Zaretskii
@ 2018-08-13 14:58     ` Pip Cet
  2018-08-13 16:43       ` Paul Eggert
  0 siblings, 1 reply; 10+ messages in thread
From: Pip Cet @ 2018-08-13 14:58 UTC (permalink / raw)
  To: eliz; +Cc: emacs-devel

On Mon, Aug 13, 2018 at 2:44 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > I think that test is a bit misleading, as (< most-positive-fixnum (+
> > -128.0 most-positive-fixnum)) also returns true.
>
> Not on a 32-bit host with a 32-bit EMACS_INT, it doesn't.

You're right, of course. Which is what I meant when I said the test
was misleading: it's true for subtly different reasons on different
hosts; (< x (+ 1.0 x)) is false on some hosts for some x smaller than
most-positive-fixnum, after all.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Strange problems with data-tests.el
  2018-08-12 22:43 ` Paul Eggert
@ 2018-08-13 15:05   ` Eli Zaretskii
  2018-08-13 16:24     ` Paul Eggert
  0 siblings, 1 reply; 10+ messages in thread
From: Eli Zaretskii @ 2018-08-13 15:05 UTC (permalink / raw)
  To: Paul Eggert; +Cc: emacs-devel

> From: Paul Eggert <eggert@cs.ucla.edu>
> Cc: emacs-devel@gnu.org
> Date: Sun, 12 Aug 2018 15:43:14 -0700
> 
> I suspect that it's only bad luck that exposed the problem, that is, that the 
> recent bignum or misc changes triggered the bug only because they caused GCC to 
> optimize in a different way.

No, it didn't happen because of the bignum merge.  I found a 26.0.90
pretest that was built with -Og, and sure enough, the problem is there
as well.  And when I installed your patch in that version and
recompiled, the problem went away.

So I think this is a relatively old bug, and we should backport it to
the emacs-26 branch.  WDYT?

> In theory if we're unlucky the bug could recur, since GCC doesn't
> make promises in this area. However, I changed arithcompare in such
> a way that GCC would have to be pretty perverse to cause the problem
> to resurface, and similarly for other C compilers.

Thanks.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Strange problems with data-tests.el
  2018-08-13 15:05   ` Eli Zaretskii
@ 2018-08-13 16:24     ` Paul Eggert
  2018-08-13 16:49       ` Eli Zaretskii
  0 siblings, 1 reply; 10+ messages in thread
From: Paul Eggert @ 2018-08-13 16:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii wrote:
> So I think this is a relatively old bug, and we should backport it to
> the emacs-26 branch.  WDYT?

Although you could talk me into it, I think it's not serious enough to backport. 
It's merely a rounding error with numbers that are very close to each other, and 
rounding errors happen all the time anyway.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Strange problems with data-tests.el
  2018-08-13 14:58     ` Pip Cet
@ 2018-08-13 16:43       ` Paul Eggert
  0 siblings, 0 replies; 10+ messages in thread
From: Paul Eggert @ 2018-08-13 16:43 UTC (permalink / raw)
  To: Pip Cet, eliz; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 457 bytes --]

Pip Cet wrote:
> Which is what I meant when I said the test
> was misleading: it's true for subtly different reasons on different
> hosts

Who would have thought that the tedium of writing floating-point tests could be 
overwhelmed by the tedium of documenting them and worse yet, writing ChangeLog 
entries for them? Anyway, I just now experienced this, and installed the 
attached into emacs-26. Though I was strongly tempted to remove the tests instead.

[-- Attachment #2: 0001-Add-comment-about-floating-point-test.patch --]
[-- Type: text/x-patch, Size: 3493 bytes --]

From 217a88557da50be8bcd76f55b90137d121eb2ecf Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Mon, 13 Aug 2018 09:36:11 -0700
Subject: [PATCH] Add comment about floating point test

* test/src/data-tests.el (data-tests--float-greater-than-fixnums):
New constant.
(data-tests-=, data-tests-<, data-tests->, data-tests-<=)
(data-tests->=, data-tests-min): Use it.
---
 test/src/data-tests.el | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/test/src/data-tests.el b/test/src/data-tests.el
index 91463db113..b444dc70f1 100644
--- a/test/src/data-tests.el
+++ b/test/src/data-tests.el
@@ -23,13 +23,21 @@
 
 (require 'cl-lib)
 
+(defconst data-tests--float-greater-than-fixnums (+ 1.0 most-positive-fixnum)
+  "A floating-point value that is greater than all fixnums.
+It is also as small as conveniently possible, to make the tests sharper.
+Adding 1.0 to most-positive-fixnum should suffice on all
+practical Emacs platforms, since the result is a power of 2 and
+this is exactly representable and is greater than
+most-positive-fixnum, which is just less than a power of 2.")
+
 (ert-deftest data-tests-= ()
   (should-error (=))
   (should (= 1))
   (should (= 2 2))
   (should (= 9 9 9 9 9 9 9 9 9))
   (should (= most-negative-fixnum (float most-negative-fixnum)))
-  (should-not (= most-positive-fixnum (+ 1.0 most-positive-fixnum)))
+  (should-not (= most-positive-fixnum data-tests--float-greater-than-fixnums))
   (should-not (apply #'= '(3 8 3)))
   (should-error (= 9 9 'foo))
   ;; Short circuits before getting to bad arg
@@ -40,7 +48,7 @@
   (should (< 1))
   (should (< 2 3))
   (should (< -6 -1 0 2 3 4 8 9 999))
-  (should (< 0.5 most-positive-fixnum (+ 1.0 most-positive-fixnum)))
+  (should (< 0.5 most-positive-fixnum data-tests--float-greater-than-fixnums))
   (should-not (apply #'< '(3 8 3)))
   (should-error (< 9 10 'foo))
   ;; Short circuits before getting to bad arg
@@ -51,7 +59,7 @@
   (should (> 1))
   (should (> 3 2))
   (should (> 6 1 0 -2 -3 -4 -8 -9 -999))
-  (should (> (+ 1.0 most-positive-fixnum) most-positive-fixnum 0.5))
+  (should (> data-tests--float-greater-than-fixnums most-positive-fixnum 0.5))
   (should-not (apply #'> '(3 8 3)))
   (should-error (> 9 8 'foo))
   ;; Short circuits before getting to bad arg
@@ -62,7 +70,7 @@
   (should (<= 1))
   (should (<= 2 3))
   (should (<= -6 -1 -1 0 0 0 2 3 4 8 999))
-  (should (<= 0.5 most-positive-fixnum (+ 1.0 most-positive-fixnum)))
+  (should (<= 0.5 most-positive-fixnum data-tests--float-greater-than-fixnums))
   (should-not (apply #'<= '(3 8 3 3)))
   (should-error (<= 9 10 'foo))
   ;; Short circuits before getting to bad arg
@@ -73,7 +81,7 @@
   (should (>= 1))
   (should (>= 3 2))
   (should (>= 666 1 0 0 -2 -3 -3 -3 -4 -8 -8 -9 -999))
-  (should (>= (+ 1.0 most-positive-fixnum) most-positive-fixnum))
+  (should (>= data-tests--float-greater-than-fixnums most-positive-fixnum))
   (should-not (apply #'>= '(3 8 3)))
   (should-error (>= 9 8 'foo))
   ;; Short circuits before getting to bad arg
@@ -97,7 +105,7 @@
   (should (= 2 (min 3 2)))
   (should (= -999 (min 666 1 0 0 -2 -3 -3 -3 -4 -8 -8 -9 -999)))
   (should (= most-positive-fixnum
-             (min (+ 1.0 most-positive-fixnum) most-positive-fixnum)))
+             (min data-tests--float-greater-than-fixnums most-positive-fixnum)))
   (should (= 3 (apply #'min '(3 8 3))))
   (should-error (min 9 8 'foo))
   (should-error (min (make-marker)))
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: Strange problems with data-tests.el
  2018-08-13 16:24     ` Paul Eggert
@ 2018-08-13 16:49       ` Eli Zaretskii
  2018-08-13 17:02         ` Paul Eggert
  0 siblings, 1 reply; 10+ messages in thread
From: Eli Zaretskii @ 2018-08-13 16:49 UTC (permalink / raw)
  To: Paul Eggert; +Cc: emacs-devel

> Cc: emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Mon, 13 Aug 2018 09:24:23 -0700
> 
> Eli Zaretskii wrote:
> > So I think this is a relatively old bug, and we should backport it to
> > the emacs-26 branch.  WDYT?
> 
> Although you could talk me into it, I think it's not serious enough to backport. 

But is there a downside to backporting it?



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Strange problems with data-tests.el
  2018-08-13 16:49       ` Eli Zaretskii
@ 2018-08-13 17:02         ` Paul Eggert
  0 siblings, 0 replies; 10+ messages in thread
From: Paul Eggert @ 2018-08-13 17:02 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii wrote:
> But is there a downside to backporting it?

Just that it's more work for little real benefit, which means more-important 
stuff won't get done. It's just a teensy rounding error on an obsolescent 
platform configured with unusual optimization options, after all.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-08-13 17:02 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-08-12 18:37 Strange problems with data-tests.el Eli Zaretskii
2018-08-12 22:43 ` Paul Eggert
2018-08-13 15:05   ` Eli Zaretskii
2018-08-13 16:24     ` Paul Eggert
2018-08-13 16:49       ` Eli Zaretskii
2018-08-13 17:02         ` Paul Eggert
2018-08-13 10:06 ` Pip Cet
2018-08-13 14:44   ` Eli Zaretskii
2018-08-13 14:58     ` Pip Cet
2018-08-13 16:43       ` Paul Eggert

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).