all messages for Emacs-related lists mirrored at yhetil.org
* bug#72145: rare Emacs screwups on x86 due to GCC bug 58416
@ 2024-07-16 23:26 Paul Eggert
  2024-07-17  0:57 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Paul Eggert @ 2024-07-16 23:26 UTC (permalink / raw)
  To: 72145

[-- Attachment #1: Type: text/plain, Size: 2683 bytes --]

While testing GNU Emacs built on Fedora 40 with gcc (GCC) 14.1.1 
20240607 (Red Hat 14.1.1-5) with -m32 for x86 and configured 
--with-wide-int, I discovered that Emacs misbehaved in a hard-to-debug 
way due to GCC bug 58416. This bug causes GCC to generate wrong x86 
machine instructions when a C program accesses a union containing a 
'double'.

The bug I observed is that if you have something like this:

    union u { double d; long long int i; } u;

then GCC sometimes generates x86 instructions that copy u.i by using 
fldl/fstpl instruction pairs to push the 64-bit quantity onto the 387 
floating point stack, and then pop the stack into another memory 
location. Unfortunately the fldl/fstpl trick fails in the unusual case 
when the bit pattern of u.i, when interpreted as a double, is a NaN, as 
that can cause the fldl/fstpl pair to store a different NaN with a 
different bit pattern, which means the destination integer disagrees 
with u.i.
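To make the failure mode concrete: when the x87 loads a signaling NaN (all-ones exponent, most significant fraction bit clear, remaining fraction bits not all zero) with the invalid-operation exception masked, which is the default, it converts it to the corresponding quiet NaN by setting that fraction bit, so fstpl writes back a different bit pattern. Quiet NaNs and ordinary doubles survive the fldl/fstpl round trip unchanged. The C sketch below models this on raw 64-bit patterns. The function and macro names are hypothetical, and the code models the hardware behavior rather than reproducing GCC's code generation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IEEE 754 binary64 fields: 1 sign bit, 11 exponent bits, 52 fraction
   bits.  On x86 the top fraction bit (bit 51) marks a NaN as quiet.  */
#define EXP_MASK  UINT64_C (0x7FF0000000000000)
#define FRAC_MASK UINT64_C (0x000FFFFFFFFFFFFF)
#define QUIET_BIT UINT64_C (0x0008000000000000)

/* True if BITS, reinterpreted as a double, is a NaN.  */
bool
bits_are_nan (uint64_t bits)
{
  return (bits & EXP_MASK) == EXP_MASK && (bits & FRAC_MASK) != 0;
}

/* True if BITS is a signaling NaN, i.e., a NaN whose quiet bit is clear.  */
bool
bits_are_snan (uint64_t bits)
{
  return bits_are_nan (bits) && (bits & QUIET_BIT) == 0;
}

/* Model of an x87 fldl/fstpl round trip with the invalid-operation
   exception masked: a signaling NaN comes back with its quiet bit set,
   and every other bit pattern comes back unchanged.  */
uint64_t
x87_roundtrip_model (uint64_t bits)
{
  return bits_are_snan (bits) ? bits | QUIET_BIT : bits;
}
```

For example, x87_roundtrip_model (UINT64_C (0x7FF0000000000001)) returns 0x7FF8000000000001, so a long long int holding the former value would be corrupted by such a copy, while a value like 42 passes through untouched.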

The bug is obscure, since the bug's presence depends on the GCC version, 
on the optimization options used, on the exact source code, and on the 
exact integer value at runtime (the value is typically copied correctly 
even when GCC has generated the incorrect machine code, since most long 
long int values don't alias with NaNs).
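A rough count shows why most values are safe. Assume, purely for illustration, that the 64-bit values being copied are uniformly distributed (real Emacs data is not). The patterns altered by an x87 round trip are then the signaling NaNs: either sign, all-ones exponent, quiet bit clear, and a nonzero remaining 51-bit fraction.

```latex
N_{\mathrm{sNaN}} = 2\,(2^{51} - 1) \approx 2^{52},
\qquad
\frac{N_{\mathrm{sNaN}}}{2^{64}} \approx \frac{2^{52}}{2^{64}} = 2^{-12} = \frac{1}{4096} \approx 2.4 \times 10^{-4}
```

Counting every NaN pattern, 2(2^52 - 1), roughly doubles this to about 2^-11, but only the signaling ones are changed by the round trip. Real data is far from uniform (small nonnegative integers, for example, have all-zero high bits and can never look like a NaN), so this is only an order-of-magnitude illustration.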

In short the bug appears to be rare.

Here are some possible courses of action:

* Do nothing and hope x86 users won't run into this rare bug.

* Have the GCC folks fix the bug. However, given that the bug has been
reported multiple times over more than a decade without a fix, it seems
that fixing it is too difficult and/or too low a priority for this aging
platform. Also, even if the bug is fixed in a future GCC release, it will
still affect people using older GCC versions.

* Build with Clang or some other compiler instead. We should be 
encouraging GCC, though.

* Rewrite Emacs to never use 'double' (or 'float' or 'long double') 
inside a union. This could be painful and hardly seems worthwhile.

* When using GCC to build Emacs on x86, compile with safer options that 
make the bug impossible. The attached proposed patch does that, by 
telling GCC not to use the 387 stack. (This patch fixed the Emacs 
misbehavior in my experimental build.) The downside is that the 
resulting Emacs executables need SSE2, introduced for the Pentium 4 in 
2000 <https://en.wikipedia.org/wiki/SSE2>. Nowadays few users need to 
run Emacs on non-SSE2 x86, so this may be good enough. Also, the 
proposed patch gives the builder an option to compile Emacs without the 
safer options, for people who want to build for older Intel-compatible 
platforms and who don't mind an occasional wrong answer or crash.

[-- Attachment #2: 0001-Work-around-GCC-bug-58416-when-building-for-x86.patch --]
[-- Type: text/x-patch, Size: 3125 bytes --]

From b6c6c0545607687e36fc47e5d5079aec4f58d591 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Mon, 15 Jul 2024 10:26:47 -0700
Subject: [PATCH] Work around GCC bug 58416 when building for x86

* configure.ac (C_SWITCH_MACHINE): Add -mfpmath=sse and perhaps
-msse2 to work around GCC bug 58416.
---
 configure.ac | 43 +++++++++++++++++++++++++++++++++++++++++++
 etc/NEWS     |  9 +++++++++
 2 files changed, 52 insertions(+)

diff --git a/configure.ac b/configure.ac
index e2b6dc2fc4d..4e74d66c65f 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2325,6 +2325,49 @@ AC_DEFUN
     fi
   ;;
 esac
+
+AC_CACHE_CHECK([for flags to work around GCC bug 58416],
+  [emacs_cv_gcc_bug_58416_CFLAGS],
+  [emacs_cv_gcc_bug_58416_CFLAGS='none needed'
+   AS_CASE([$canonical],
+     [[i[3456]86-* | x86_64-*]],
+	[AS_IF([test "$GCC" = yes],
+	   [old_CFLAGS=$CFLAGS
+	    for emacs_cv_gcc_bug_58416_CFLAGS in \
+		'none needed' '-mfpmath=sse' '-msse2 -mfpmath=sse' 'none work'
+	    do
+	      AS_CASE([$emacs_cv_gcc_bug_58416_CFLAGS],
+	        ['none work'], [break],
+		['none needed'], [],
+		[CFLAGS="$old_CFLAGS $emacs_cv_gcc_bug_58416_CFLAGS"])
+	      AC_COMPILE_IFELSE(
+		[AC_LANG_DEFINES_PROVIDED
+		 [/* Work around GCC bug with double in unions on x86,
+		     where the generated insns copy non-floating-point data
+		     via fldl/fstpl instruction pairs.  This can misbehave
+		     when the data's bit pattern looks like a NaN.  See, e.g.:
+			https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58416
+			https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93271
+			https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114659
+		     Problem observed with 'gcc -m32' with GCC 14.1.1
+		     20240607 (Red Hat 14.1.1-5) on x86-64.  */
+		  #include <float.h>
+		  #if \
+		      ((defined __GNUC__ && !defined __clang__) \
+		       && (defined __i386__ || defined __x86_64__) \
+		       && ! (defined FLT_EVAL_METHOD \
+			     && 0 <= FLT_EVAL_METHOD \
+			     && FLT_EVAL_METHOD <= 1))
+		  # error "GCC bug 58416 is possibly present"
+		  #endif
+		]],
+		[break])
+	    done
+	    CFLAGS=$old_CFLAGS])])])
+AS_CASE([$emacs_cv_gcc_bug_58416_CFLAGS],
+  [-*],
+    [C_SWITCH_MACHINE="$C_SWITCH_MACHINE $emacs_cv_gcc_bug_58416_CFLAGS"])
+
 AC_SUBST([C_SWITCH_MACHINE])
 
 C_SWITCH_SYSTEM=
diff --git a/etc/NEWS b/etc/NEWS
index f10f9ae4d65..f3852750b9a 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -24,6 +24,15 @@ applies, and please also update docstrings as needed.
 \f
 * Installation Changes in Emacs 31.1
 
+** When using GCC to build Emacs on 32-bit x86 systems, 'configure' now
+defaults to specifying the GCC options -msse2 and -mfpmath=sse to work
+around GCC bug 58416.  As a result, the resulting Emacs executable now
+requires support for SSE2, introduced by Intel in 2000 for the Pentium 4
+and by AMD in 2003 for the Opteron and Athlon 64.  To build Emacs with
+GCC for older x86 processors, pass 'emacs_cv_gcc_bug_58416_CFLAGS=no' to
+'configure'; although the resulting Emacs may generate incorrect results
+or dump core, any such misbehavior should be rare.
+
 \f
 * Startup Changes in Emacs 31.1
 
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* bug#72145: rare Emacs screwups on x86 due to GCC bug 58416
  2024-07-16 23:26 bug#72145: rare Emacs screwups on x86 due to GCC bug 58416 Paul Eggert
@ 2024-07-17  0:57 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-07-17  5:01   ` Paul Eggert
  2024-07-18  3:22 ` Richard Stallman
  2024-07-18 14:19 ` Andrea Corallo
  2 siblings, 1 reply; 12+ messages in thread
From: Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-07-17  0:57 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 72145

Paul Eggert <eggert@cs.ucla.edu> writes:

> * When using GCC to build Emacs on x86, compile with safer options
>   that make the bug impossible. The attached proposed patch does that,
>   by telling GCC not to use the 387 stack. (This patch fixed the Emacs
>   misbehavior in my experimental build.) The downside is that the
>   resulting Emacs executables need SSE2, introduced for the Pentium 4
>   in 2000 <https://en.wikipedia.org/wiki/SSE2>. Nowadays few users
>   need to run Emacs on non-SSE2 x86, so this may be good enough. Also,
>   the proposed patch gives the builder an option to compile Emacs
>   without the safer options, for people who want to build for older
>   Intel-compatible platforms and who don't mind an occasional wrong
>   answer or crash.

Wouldn't it be better if configure attempted to detect the presence of
SSE2 on the host system?

* bug#72145: rare Emacs screwups on x86 due to GCC bug 58416
  2024-07-17  0:57 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-07-17  5:01   ` Paul Eggert
  2024-07-17 21:56     ` Paul Eggert
  0 siblings, 1 reply; 12+ messages in thread
From: Paul Eggert @ 2024-07-17  5:01 UTC (permalink / raw)
  To: Po Lu; +Cc: 72145

On 2024-07-16 17:57, Po Lu wrote:
> Wouldn't it be better if configure attempted to detect the presence of
> SSE2 on the host system?

We could add an AC_RUN_IFELSE test for SSE2, though I doubt whether it 
would affect builds significantly in practice. Build systems invariably 
support SSE2 nowadays and AC_RUN_IFELSE tests the build system, not the 
host system.

* bug#72145: rare Emacs screwups on x86 due to GCC bug 58416
  2024-07-17  5:01   ` Paul Eggert
@ 2024-07-17 21:56     ` Paul Eggert
  2024-07-18  2:39       ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 12+ messages in thread
From: Paul Eggert @ 2024-07-17 21:56 UTC (permalink / raw)
  To: Po Lu; +Cc: 72145

[-- Attachment #1: Type: text/plain, Size: 571 bytes --]

On 2024-07-16 22:01, Paul Eggert wrote:
> We could add an AC_RUN_IFELSE test for SSE2, though I doubt whether it 
> would affect builds significantly in practice.

On second thought the rare Arch or Gentoo user could still be building 
Emacs for the Pentium III, and for such a user a run-time test on the 
build host would be a win. This can be done via the attached revised 
patch. It uses AC_LINK_IFELSE to compile and run a single program, 
instead of AC_RUN_IFELSE which (when combined with AC_COMPILE_IFELSE) 
would mean compiling two test programs and running one.
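The idea of asking the build host about SSE2 can be illustrated in C. The sketch below is not part of the patch: it queries the CPU through the __builtin_cpu_init and __builtin_cpu_supports built-ins, which GCC (4.8 and later) and Clang provide for x86. The function name host_sse2_status is hypothetical. A CPUID-style query like this does not necessarily confirm that the operating system saves and restores the XMM registers; actually executing SSE2 code, as the patch does by running conftest, exercises both the CPU and the OS.

```c
#include <assert.h>

/* Return 1 if the CPU running this program reports SSE2, 0 if it does
   not, and -1 if the question does not apply (not x86, or a compiler
   without the GCC-style CPU-detection built-ins).  */
int
host_sse2_status (void)
{
#if defined __x86_64__
  /* SSE2 is part of the baseline x86-64 instruction set.  */
  return 1;
#elif defined __i386__ && defined __GNUC__
  /* Initialize the CPU model data; harmless if already done.  */
  __builtin_cpu_init ();
  return __builtin_cpu_supports ("sse2") ? 1 : 0;
#else
  return -1;
#endif
}
```

As the patch's own comment notes, such a test is meaningful only for native builds: when cross-compiling it describes the build machine, not the machine that will run Emacs.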

[-- Attachment #2: 0001-Work-around-GCC-bug-58416-when-building-for-x86.patch --]
[-- Type: text/x-patch, Size: 3465 bytes --]

From 5b14689f2389df1a4573c36ef9b597c1bd7c6326 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Mon, 15 Jul 2024 10:26:47 -0700
Subject: [PATCH] Work around GCC bug 58416 when building for x86

* configure.ac (C_SWITCH_MACHINE): Add -mfpmath=sse and perhaps
-msse2 to work around GCC bug 58416.
---
 configure.ac | 53 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 etc/NEWS     | 10 ++++++++++
 2 files changed, 63 insertions(+)

diff --git a/configure.ac b/configure.ac
index e2b6dc2fc4d..8d6ff2c4db3 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2325,6 +2325,59 @@ AC_DEFUN
     fi
   ;;
 esac
+
+AC_CACHE_CHECK([for flags to work around GCC bug 58416],
+  [emacs_cv_SSE2_CFLAGS],
+  [emacs_cv_SSE2_CFLAGS='none needed'
+   AS_CASE([$canonical],
+     [[i[3456]86-* | x86_64-*]],
+	[AS_IF([test "$GCC" = yes],
+	   [old_CFLAGS=$CFLAGS
+	    for emacs_cv_SSE2_CFLAGS in \
+		'none needed' '-mfpmath=sse' '-msse2 -mfpmath=sse' 'none work'
+	    do
+	      AS_CASE([$emacs_cv_SSE2_CFLAGS],
+	        ['none work'], [break],
+		['none needed'], [],
+		[CFLAGS="$old_CFLAGS $emacs_cv_SSE2_CFLAGS"])
+	      AC_LINK_IFELSE(
+		[AC_LANG_DEFINES_PROVIDED
+		 [/* Work around GCC bug with double in unions on x86,
+		     where the generated insns copy non-floating-point data
+		     via fldl/fstpl instruction pairs.  This can misbehave
+		     when the data's bit pattern looks like a NaN.  See, e.g.:
+			https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58416
+			https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93271
+			https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114659
+		     Problem observed with 'gcc -m32' with GCC 14.1.1
+		     20240607 (Red Hat 14.1.1-5) on x86-64.  */
+		  #include <float.h>
+		  #if \
+		      ((defined __GNUC__ && !defined __clang__) \
+		       && (defined __i386__ || defined __x86_64__) \
+		       && ! (defined FLT_EVAL_METHOD \
+			     && 0 <= FLT_EVAL_METHOD \
+			     && FLT_EVAL_METHOD <= 1))
+		  # error "GCC bug 58416 is possibly present"
+		  #endif
+		  int
+		  main (int argc, char **argv)
+		  {
+		    return argc / 1.3;
+		  }
+		]],
+		[# When not cross-compiling, test that the SSE2 code runs.
+		 # This lets native builds work even on ancient systems
+		 # (e.g., Pentium III, last new model introduced 2003).
+		 AS_CASE([$cross_compiling,$emacs_cv_SSE2_CFLAGS],
+		   [no,-*],
+		     [./conftest$EXEEXT]) && break])
+	    done
+	    CFLAGS=$old_CFLAGS])])])
+AS_CASE([$emacs_cv_SSE2_CFLAGS],
+  [-*],
+    [C_SWITCH_MACHINE="$C_SWITCH_MACHINE $emacs_cv_SSE2_CFLAGS"])
+
 AC_SUBST([C_SWITCH_MACHINE])
 
 C_SWITCH_SYSTEM=
diff --git a/etc/NEWS b/etc/NEWS
index 60bde2abb40..8ab60503ba4 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -24,6 +24,16 @@ applies, and please also update docstrings as needed.
 \f
 * Installation Changes in Emacs 31.1
 
+** When using GCC to build Emacs on 32-bit x86 systems, 'configure' now
+defaults to specifying the GCC options -msse2 and -mfpmath=sse to work
+around GCC bug 58416.  As a result, the resulting Emacs executable now
+requires support for SSE2, introduced by Intel in 2000 for the Pentium 4
+and by AMD in 2003 for the Opteron and Athlon 64.  To build Emacs with
+GCC for older x86, either build natively on a system that lacks SSE2, or
+pass 'emacs_cv_SSE2_CFLAGS=no' to 'configure'; although the resulting
+Emacs may generate incorrect results or dump core, any such misbehavior
+should be rare.
+
 \f
 * Startup Changes in Emacs 31.1
 
-- 
2.45.2

* bug#72145: rare Emacs screwups on x86 due to GCC bug 58416
  2024-07-17 21:56     ` Paul Eggert
@ 2024-07-18  2:39       ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-07-18  5:14         ` Eli Zaretskii
  0 siblings, 1 reply; 12+ messages in thread
From: Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-07-18  2:39 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 72145

Paul Eggert <eggert@cs.ucla.edu> writes:

> On 2024-07-16 22:01, Paul Eggert wrote:
>> We could add an AC_RUN_IFELSE test for SSE2, though I doubt whether
>> it would affect builds significantly in practice.
>
> On second thought the rare Arch or Gentoo user could still be building
> Emacs for the Pentium III, and for such a user a run-time test on the
> build host would be a win. This can be done via the attached revised
> patch. It uses AC_LINK_IFELSE to compile and run a single program,
> instead of AC_RUN_IFELSE which (when combined with AC_COMPILE_IFELSE)
> would mean compiling two test programs and running one.

I'm thinking of the computer where I produce binaries for Windows 9X,
which, being a Windows 98 system, probably does not support SSE2.

* bug#72145: rare Emacs screwups on x86 due to GCC bug 58416
  2024-07-16 23:26 bug#72145: rare Emacs screwups on x86 due to GCC bug 58416 Paul Eggert
  2024-07-17  0:57 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-07-18  3:22 ` Richard Stallman
  2024-07-18 12:38   ` Paul Eggert
  2024-07-18 14:19 ` Andrea Corallo
  2 siblings, 1 reply; 12+ messages in thread
From: Richard Stallman @ 2024-07-18  3:22 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 72145

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > * Rewrite Emacs to never use 'double' (or 'float' or 'long double') 
  > inside a union. This could be painful and hardly seems worthwhile.

Where does Emacs use those types inside a union?
Maybe this is not difficult.

-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)

* bug#72145: rare Emacs screwups on x86 due to GCC bug 58416
  2024-07-18  2:39       ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-07-18  5:14         ` Eli Zaretskii
  0 siblings, 0 replies; 12+ messages in thread
From: Eli Zaretskii @ 2024-07-18  5:14 UTC (permalink / raw)
  To: Po Lu; +Cc: 72145, eggert

> Cc: 72145@debbugs.gnu.org
> Date: Thu, 18 Jul 2024 10:39:42 +0800
> From:  Po Lu via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
> 
> Paul Eggert <eggert@cs.ucla.edu> writes:
> 
> > On 2024-07-16 22:01, Paul Eggert wrote:
> >> We could add an AC_RUN_IFELSE test for SSE2, though I doubt whether
> >> it would affect builds significantly in practice.
> >
> > On second thought the rare Arch or Gentoo user could still be building
> > Emacs for the Pentium III, and for such a user a run-time test on the
> > build host would be a win. This can be done via the attached revised
> > patch. It uses AC_LINK_IFELSE to compile and run a single program,
> > instead of AC_RUN_IFELSE which (when combined with AC_COMPILE_IFELSE)
> > would mean compiling two test programs and running one.
> 
> I'm thinking of the computer where I produce binaries for Windows 9X,
> which, being a Windows 98 system, probably does not support SSE2.

Look at the Properties to see what kind of CPU it has.  Then you can
establish whether it supports SSE2.

But I think the problem is not where you produce the binaries, the
problem is where people will run them.  On Windows, it is very
frequently a completely different system, so a test on the build host
is insufficient.  I think builds for Windows 9X should use the
'emacs_cv_SSE2_CFLAGS=no' thing regardless of what the build host
supports, because otherwise the binary will simply refuse to run on
the target.

* bug#72145: rare Emacs screwups on x86 due to GCC bug 58416
  2024-07-18  3:22 ` Richard Stallman
@ 2024-07-18 12:38   ` Paul Eggert
  2024-07-18 15:19     ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 12+ messages in thread
From: Paul Eggert @ 2024-07-18 12:38 UTC (permalink / raw)
  To: rms; +Cc: 72145

On 2024-07-17 20:22, Richard Stallman wrote:
>    > * Rewrite Emacs to never use 'double' (or 'float' or 'long double')
>    > inside a union. This could be painful and hardly seems worthwhile.
> 
> Where does Emacs use those types inside a union?
> Maybe this is not difficult.

I found the bug in src/timefns.c, which uses a union to represent 
timestamp forms (one of which represents an Emacs float). Other uses 
that come to mind are src/lisp.h's struct Lisp_Float, which uses a union 
to save space when representing Lisp floats, and src/lread.c's and 
src/print.c's use of <ieee754.h>'s unions to deal with NaNs when reading 
and printing Lisp floats. Although I have not done an audit I expect 
there are other places too, and I expect it would take some time to 
audit, rewrite and thoroughly test Emacs to not use floating point in 
these places, with runtime performance degraded somewhat as a result.
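For readers weighing the rewrite option, here is a sketch of union-free access to a double's bits, using memcpy where code might otherwise use a union such as those in <ieee754.h>. It illustrates a common idiom and is not code from Emacs; the function names are hypothetical. Whether avoiding unions this way would keep GCC from ever emitting fldl/fstpl copies has not been established in this thread. Also, on 32-bit x86, passing or returning a double by value may itself go through the x87 stack, so a signaling NaN could still be quieted in transit; the quiet NaNs built below are not affected.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* These helpers assume double is IEEE 754 binary64.  */
_Static_assert (sizeof (double) == sizeof (uint64_t),
                "double must be 64 bits wide");

/* Return the bit pattern of D, copied byte for byte.  */
uint64_t
double_to_bits (double d)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);
  return bits;
}

/* Return the double whose bit pattern is BITS.  */
double
bits_to_double (uint64_t bits)
{
  double d;
  memcpy (&d, &bits, sizeof d);
  return d;
}

/* Return a positive quiet NaN carrying PAYLOAD in the low 51 fraction
   bits; higher bits of PAYLOAD are ignored.  */
double
make_quiet_nan (uint64_t payload)
{
  return bits_to_double (UINT64_C (0x7FF8000000000000)
                         | (payload & UINT64_C (0x0007FFFFFFFFFFFF)));
}

/* Return the NaN payload of D, i.e., its low 51 fraction bits.  */
uint64_t
nan_payload (double d)
{
  return double_to_bits (d) & UINT64_C (0x0007FFFFFFFFFFFF);
}
```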

Although that effort might be worth it if the bug was likely and there 
was no other workaround, the bug is quite rare (we've lived with it for 
decades and I'm the first person to notice it, or at least track it 
down), and with the proposed compiler-flag workaround the remaining 
affected platforms are so obsolescent (decades-old CPUs) that they're 
also rare. I doubt whether it's worth significantly contorting the C 
code (possibly introducing bugs on mainstream platforms) to fix these 
exceedingly rare bugs in obsolescent platforms.

* bug#72145: rare Emacs screwups on x86 due to GCC bug 58416
  2024-07-16 23:26 bug#72145: rare Emacs screwups on x86 due to GCC bug 58416 Paul Eggert
  2024-07-17  0:57 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-07-18  3:22 ` Richard Stallman
@ 2024-07-18 14:19 ` Andrea Corallo
  2 siblings, 0 replies; 12+ messages in thread
From: Andrea Corallo @ 2024-07-18 14:19 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 72145

Paul Eggert <eggert@cs.ucla.edu> writes:

> While testing GNU Emacs built on Fedora 40 with gcc (GCC) 14.1.1
> 20240607 (Red Hat 14.1.1-5) with -m32 for x86 and configured
> --with-wide-int, I discovered that Emacs misbehaved in a hard-to-debug
> way due to GCC bug 58416. This bug causes GCC to generate wrong x86
> machine instructions when a C program accesses a union containing a
> 'double'.

Mmmh nice one :)

I asked GCC people if they have a suggestion on how to work around this
bug <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58416#c9>.

Thanks

  Andrea

* bug#72145: rare Emacs screwups on x86 due to GCC bug 58416
  2024-07-18 12:38   ` Paul Eggert
@ 2024-07-18 15:19     ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-07-19 21:31       ` Paul Eggert
  0 siblings, 1 reply; 12+ messages in thread
From: Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-07-18 15:19 UTC (permalink / raw)
  To: Paul Eggert; +Cc: 72145, rms

On Thursday, July 18th, 2024 at 12:38, Paul Eggert <eggert@cs.ucla.edu> wrote:
> On 2024-07-17 20:22, Richard Stallman wrote:
> 
> > > * Rewrite Emacs to never use 'double' (or 'float' or 'long double')
> > > inside a union. This could be painful and hardly seems worthwhile.
> > 
> > Where does Emacs use those types inside a union?
> > Maybe this is not difficult.
> 
> Although that effort might be worth it if the bug was likely and there
> was no other workaround, the bug is quite rare (we've lived with it for
> decades and I'm the first person to notice it, or at least track it
> down), and with the proposed compiler-flag workaround the remaining
> affected platforms are so obsolescent (decades-old CPUs) that they're
> also rare. I doubt whether it's worth significantly contorting the C
> code (possibly introducing bugs on mainstream platforms) to fix these
> exceedingly rare bugs in obsolescent platforms.

It should be mentioned that this isn't just about the CPU: the OS also needs to enable the XMM register set, right? That means we might end up dropping support for many old platforms as well as old CPUs and emulators, and I'm not sure that's a good idea.

Pip

* bug#72145: rare Emacs screwups on x86 due to GCC bug 58416
  2024-07-18 15:19     ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-07-19 21:31       ` Paul Eggert
  2024-08-22  6:44         ` Paul Eggert
  0 siblings, 1 reply; 12+ messages in thread
From: Paul Eggert @ 2024-07-19 21:31 UTC (permalink / raw)
  To: Pip Cet; +Cc: 72145-done, rms

[-- Attachment #1: Type: text/plain, Size: 640 bytes --]

On 2024-07-18 08:19, Pip Cet wrote:
> It should be mentioned that this isn't just about the CPU: the OS also needs to enable the XMM register set, right?

Right.

In <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58416#c10> GCC's 
Richard Biener suggested a more portable workaround: use -fno-tree-sra 
when generating 32-bit x86 code for which it is not known that SSE2 is 
supported. (With SSE2, -mfpmath=sse is a better workaround.) Using 
-fno-tree-sra means we needn't worry whether the build and host 
platforms use different CPU types.

I did that by installing the attached patch to Emacs on savannah, and am 
closing the bug report.

[-- Attachment #2: 0001-Work-around-GCC-bug-58416-on-32-bit-x86.patch --]
[-- Type: text/x-patch, Size: 3142 bytes --]

From 9f4fc6608212191e1a9e07bf89f38ba9e4ea786c Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Fri, 19 Jul 2024 13:39:21 -0700
Subject: [PATCH] Work around GCC bug 58416 on 32-bit x86

* configure.ac (C_SWITCH_MACHINE): On 32-bit x86 with GCC 4+,
append -mfpmath=sse (if SSE2 is known to work) or -fno-tree-sra
(otherwise) to work around GCC bug 58416.
* etc/NEWS: Mention this.
---
 configure.ac | 45 +++++++++++++++++++++++++++++++++++++++++++++
 etc/NEWS     |  6 ++++++
 2 files changed, 51 insertions(+)

diff --git a/configure.ac b/configure.ac
index b6acdf2e456..67da852667d 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2333,6 +2333,51 @@ AC_DEFUN
     fi
   ;;
 esac
+
+AC_CACHE_CHECK([for flags to work around GCC bug 58416],
+  [emacs_cv_gcc_bug_58416_CFLAGS],
+  [emacs_cv_gcc_bug_58416_CFLAGS='none needed'
+   AS_CASE([$canonical],
+     [[i[3456]86-* | x86_64-*]],
+       [AS_IF([test "$GCC" = yes],
+	  [old_CFLAGS=$CFLAGS
+	   # If no flags are needed (e.g., not GCC 4+), don't use any.
+	   # Otherwise, use -mfpmath=sse if already assuming SSE2.
+	   # Otherwise, use -fno-tree-sra.
+	   for emacs_cv_gcc_bug_58416_CFLAGS in \
+	       'none needed' -mfpmath=sse -fno-tree-sra
+	   do
+	     AS_CASE([$emacs_cv_gcc_bug_58416_CFLAGS],
+	       ['none needed'], [],
+	       [-fno-tree-sra], [break],
+	       [CFLAGS="$old_CFLAGS $emacs_cv_gcc_bug_58416_CFLAGS"])
+	     AC_COMPILE_IFELSE(
+	       [AC_LANG_DEFINES_PROVIDED
+		[/* Work around GCC bug with double in unions on x86,
+		    where the generated insns copy non-floating-point data
+		    via fldl/fstpl instruction pairs.  This can misbehave
+		    when the data's bit pattern looks like a NaN.  See, e.g.:
+		       https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58416#c10
+		       https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71460
+		       https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93271
+		       https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114659
+		    Problem observed with 'gcc -m32' with GCC 14.1.1
+		    20240607 (Red Hat 14.1.1-5) on x86-64.  */
+		 #include <float.h>
+		 #if \
+		     (4 <= __GNUC__ && !defined __clang__ \
+		      && (defined __i386__ || defined __x86_64__) \
+		      && ! (0 <= FLT_EVAL_METHOD && FLT_EVAL_METHOD <= 1))
+		 # error "GCC bug 58416 is possibly present"
+		 #endif
+	       ]],
+	       [break])
+	   done
+	   CFLAGS=$old_CFLAGS])])])
+AS_CASE([$emacs_cv_gcc_bug_58416_CFLAGS],
+  [-*],
+    [C_SWITCH_MACHINE="$C_SWITCH_MACHINE $emacs_cv_gcc_bug_58416_CFLAGS"])
+
 AC_SUBST([C_SWITCH_MACHINE])
 
 C_SWITCH_SYSTEM=
diff --git a/etc/NEWS b/etc/NEWS
index 5429db1dded..0e13f471c74 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -24,6 +24,12 @@ applies, and please also update docstrings as needed.
 \f
 * Installation Changes in Emacs 31.1
 
+** When using GCC 4 or later to build Emacs on 32-bit x86 systems,
+'configure' now defaults to using the GCC options -mfpmath=sse (if the
+host system supports SSE2) or -fno-tree-sra (if not).  These GCC options
+work around GCC bug 58416, which can cause Emacs to behave incorrectly
+in rare cases.
+
 \f
 * Startup Changes in Emacs 31.1
 
-- 
2.43.0

* bug#72145: rare Emacs screwups on x86 due to GCC bug 58416
  2024-07-19 21:31       ` Paul Eggert
@ 2024-08-22  6:44         ` Paul Eggert
  0 siblings, 0 replies; 12+ messages in thread
From: Paul Eggert @ 2024-08-22  6:44 UTC (permalink / raw)
  To: Pip Cet; +Cc: 72145, rms

[-- Attachment #1: Type: text/plain, Size: 217 bytes --]

GCC bug 58416 has been fixed, and the fix should appear in the 
forthcoming GCC 15. I installed the attached patch into GNU Emacs, so 
that Emacs no longer attempts to work around the bug if GCC 15+ is being 
used.
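Taken together, the installed patch and this follow-up choose flags as sketched below. The C function is a hypothetical model written for this discussion, not part of Emacs. Its parameters stand in for what the configure probe learns from the compiler: the GCC major version, whether the compiler is Clang, whether the target is x86, the default FLT_EVAL_METHOD, and whether SSE2 code generation is already enabled (e.g., by -msse2 or a suitable -march). It assumes that adding -mfpmath=sse brings FLT_EVAL_METHOD to 0 exactly when SSE2 is enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Mirror of the configure probe's #if: true if the probe would stop
   with "GCC bug 58416 is possibly present".  */
static bool
probe_flags_bug (int gcc_major, bool is_clang, bool is_x86,
                 int flt_eval_method)
{
  return (4 <= gcc_major && gcc_major <= 14 && !is_clang && is_x86
          && ! (0 <= flt_eval_method && flt_eval_method <= 1));
}

/* Return the workaround the configure loop would settle on:
   "none needed", "-mfpmath=sse" or "-fno-tree-sra".  */
const char *
gcc_bug_58416_flags (int gcc_major, bool is_clang, bool is_x86,
                     int flt_eval_method, bool sse2_enabled)
{
  /* First iteration: the compiler's defaults already pass the probe.  */
  if (!probe_flags_bug (gcc_major, is_clang, is_x86, flt_eval_method))
    return "none needed";
  /* Second iteration: -mfpmath=sse moves double arithmetic off the x87
     (FLT_EVAL_METHOD 0) only if SSE2 is already enabled.  */
  if (sse2_enabled)
    return "-mfpmath=sse";
  /* Last resort: disable scalar replacement of aggregates.  */
  return "-fno-tree-sra";
}
```

For instance, a 32-bit GCC 14 build whose defaults use the x87 (FLT_EVAL_METHOD 2) without SSE2 lands on -fno-tree-sra, the same configuration with SSE2 enabled gets -mfpmath=sse, and GCC 15 needs no workaround.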

[-- Attachment #2: 0001-GCC-bug-58416-is-fixed-in-GCC-15.patch --]
[-- Type: text/x-patch, Size: 1059 bytes --]

From 3d1d4f109ed4115256a08c74eeb704259d91c9f4 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Wed, 21 Aug 2024 23:36:45 -0700
Subject: [PATCH] GCC bug 58416 is fixed in GCC 15
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* configure.ac (emacs_cv_gcc_bug_58416_CFLAGS):
No need for a workaround in GCC 15.

2024-08-19  Paul Eggert  <eggert@cs.ucla.edu>

Remove obsolete comment about ‘volatile’
---
 configure.ac | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index 67da852667d..d1a5b63b924 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2365,7 +2365,7 @@ AC_DEFUN
 		    20240607 (Red Hat 14.1.1-5) on x86-64.  */
 		 #include <float.h>
 		 #if \
-		     (4 <= __GNUC__ && !defined __clang__ \
+		     (4 <= __GNUC__ && __GNUC__ <= 14 && !defined __clang__ \
 		      && (defined __i386__ || defined __x86_64__) \
 		      && ! (0 <= FLT_EVAL_METHOD && FLT_EVAL_METHOD <= 1))
 		 # error "GCC bug 58416 is possibly present"
-- 
2.43.0

end of thread

Thread overview: 12+ messages
2024-07-16 23:26 bug#72145: rare Emacs screwups on x86 due to GCC bug 58416 Paul Eggert
2024-07-17  0:57 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-07-17  5:01   ` Paul Eggert
2024-07-17 21:56     ` Paul Eggert
2024-07-18  2:39       ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-07-18  5:14         ` Eli Zaretskii
2024-07-18  3:22 ` Richard Stallman
2024-07-18 12:38   ` Paul Eggert
2024-07-18 15:19     ` Pip Cet via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-07-19 21:31       ` Paul Eggert
2024-08-22  6:44         ` Paul Eggert
2024-07-18 14:19 ` Andrea Corallo

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git
