unofficial mirror of notmuch@notmuchmail.org
 help / color / mirror / code / Atom feed
* failing T810-tsan on ppc64el
@ 2023-08-25 11:07 David Bremner
  2023-08-26 13:34 ` Kevin Boulain
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: David Bremner @ 2023-08-25 11:07 UTC (permalink / raw)
  To: Kevin Boulain; +Cc: notmuch


Hi Kevin;

Does this failure make any sense to you?

     https://buildd.debian.org/status/fetch.php?pkg=notmuch&arch=ppc64el&ver=0.38%7Erc0-1&stamp=1692959868&raw=0

I can just disable tsan tests on ppc64el for Debian, but I wondered if
there is an underlying bug that only shows up on ppc64el

d

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: failing T810-tsan on ppc64el
  2023-08-25 11:07 failing T810-tsan on ppc64el David Bremner
@ 2023-08-26 13:34 ` Kevin Boulain
  2023-08-27 10:31   ` David Bremner
  2023-08-30 12:52 ` [PATCH 1/1] test: suppress all interceptors in glib Kevin Boulain
  2023-08-30 13:07 ` failing T810-tsan on ppc64el Kevin Boulain
  2 siblings, 1 reply; 6+ messages in thread
From: Kevin Boulain @ 2023-08-26 13:34 UTC (permalink / raw)
  To: David Bremner; +Cc: notmuch

On 2023-08-25 at 08:07 -03, David Bremner <david@tethera.net> wrote:
> Does this failure make any sense to you?
>
>      https://buildd.debian.org/status/fetch.php?pkg=notmuch&arch=ppc64el&ver=0.38%7Erc0-1&stamp=1692959868&raw=0

It doesn't ring any bell. Do you happen to know if this is reproducible
or if we could get a better stack trace?

From what I can tell, there are only three references to g_strdup* in
Notmuch and they seem to be safe. I also couldn't spot any usage of
iconv so it could mean this issue is happening in GLib, like a few
others:
https://git.notmuchmail.org/git?p=notmuch;a=blob;f=test/T810-tsan.suppressions;h=dbd16a94971134c7f675debdfc741f15f2c7abeb;hb=HEAD

So I took a quick look at GLib and their issue tracker but I couldn't
see anything obvious (e.g.: the logging system appears to use iconv when
the console isn't UTF-8 but the Debian build system exports
LC_ALL=C.UTF-8, or sometimes filenames are converted to UTF-8, ...).

> I can just disable tsan tests on ppc64el for Debian, but I wondered if
> there is an underlying bug that only shows up on ppc64el

I can't say if this is unrelated to the architecture but if this test is
too annoying to maintain, feel free to disable it.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: failing T810-tsan on ppc64el
  2023-08-26 13:34 ` Kevin Boulain
@ 2023-08-27 10:31   ` David Bremner
  0 siblings, 0 replies; 6+ messages in thread
From: David Bremner @ 2023-08-27 10:31 UTC (permalink / raw)
  To: notmuch@notmuchmail.org

Kevin Boulain <kevin@boula.in> writes:

> On 2023-08-25 at 08:07 -03, David Bremner <david@tethera.net> wrote:
>> Does this failure make any sense to you?
>>
>>      https://buildd.debian.org/status/fetch.php?pkg=notmuch&arch=ppc64el&ver=0.38%7Erc0-1&stamp=1692959868&raw=0
>
> It doesn't ring any bell. Do you happen to know if this is reproducible
> or if we could get a better stack trace?

It seems to be reproducible. I couldn't figure out how to get a
meaningful stack trace, but I did run with TSAN_OPTIONS verbosity=2, and
here is what I got

line 12: 19
==================
WARNING: ThreadSanitizer: data race (pid=1922026)
  Read of size 8 at 0x7ffff4000ae0 by thread T2:
    #0 strlen ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:458 (libtsan.so.2+0x4be2c) (BuildId: f73f09a6be7c39b64e5181d8bff5d6428a26fb06)
    #1 strlen ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:450 (libtsan.so.2+0x4be2c)
    #2 g_strdup <null> (libglib-2.0.so.0+0xac71c) (BuildId: a7cf71e4a2880e9668a36d104e5ba37ce43fcca7)

  Previous write of size 8 at 0x7ffff4000ae0 by thread T1:
    #0 iconv ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:5515 (libtsan.so.2+0x65dd4) (BuildId: f73f09a6be7c39b64e5181d8bff5d6428a26fb06)
    #1 g_iconv <null> (libglib-2.0.so.0+0x42694) (BuildId: a7cf71e4a2880e9668a36d104e5ba37ce43fcca7)

  Location is heap block of size 20 at 0x7ffff4000ae0 allocated by thread T1:
    #0 malloc ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:681 (libtsan.so.2+0x43d6c) (BuildId: f73f09a6be7c39b64e5181d8bff5d6428a26fb06)
    #1 g_malloc <null> (libglib-2.0.so.0+0x82f40) (BuildId: a7cf71e4a2880e9668a36d104e5ba37ce43fcca7)

  Thread T2 (tid=1922029, running) created by main thread at:
    #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1036 (libtsan.so.2+0x45084) (BuildId: f73f09a6be7c39b64e5181d8bff5d6428a26fb06)
    #1 main /home/bremner/notmuch-0.38~rc1/test/tmp.foo/test0.c:24 (test0+0xef0) (BuildId: 3ccb421a18018bc8c9014a7dcb9d821c7dc288bf)

  Thread T1 (tid=1922028, running) created by main thread at:
    #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1036 (libtsan.so.2+0x45084) (BuildId: f73f09a6be7c39b64e5181d8bff5d6428a26fb06)
    #1 main /home/bremner/notmuch-0.38~rc1/test/tmp.foo/test0.c:23 (test0+0xea0) (BuildId: 3ccb421a18018bc8c9014a7dcb9d821c7dc288bf)

SUMMARY: ThreadSanitizer: data race (/usr/lib/powerpc64le-linux-gnu/libglib-2.0.so.0+0xac71c) (BuildId: a7cf71e4a2880e9668a36d104e5ba37ce43fcca7) in g_strdup
==================

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/1] test: suppress all interceptors in glib
  2023-08-25 11:07 failing T810-tsan on ppc64el David Bremner
  2023-08-26 13:34 ` Kevin Boulain
@ 2023-08-30 12:52 ` Kevin Boulain
  2023-09-02 10:51   ` David Bremner
  2023-08-30 13:07 ` failing T810-tsan on ppc64el Kevin Boulain
  2 siblings, 1 reply; 6+ messages in thread
From: Kevin Boulain @ 2023-08-30 12:52 UTC (permalink / raw)
  To: notmuch; +Cc: Kevin Boulain

On ppc64el, races are detected by TSan:
  WARNING: ThreadSanitizer: data race (pid=4520)
    Read of size 8 at 0x7ffff20016c0 by thread T1:
      #0 strlen ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:386 (libtsan.so.2+0x77c0c)
      #1 strlen ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:378 (libtsan.so.2+0x77c0c)
      #2 g_strdup ../../../glib/gstrfuncs.c:362 (libglib-2.0.so.0+0xa4ac4)

    Previous write of size 8 at 0x7ffff20016c0 by thread T2:
      #0 malloc ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:647 (libtsan.so.2+0x471f0)
      #1 g_malloc ../../../glib/gmem.c:130 (libglib-2.0.so.0+0x7bb68)

    Location is heap block of size 20 at 0x7ffff20016c0 allocated by thread T2:
      #0 malloc ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:647 (libtsan.so.2+0x471f0)
      #1 g_malloc ../../../glib/gmem.c:130 (libglib-2.0.so.0+0x7bb68)

This appears to be a false positive in GLib, as explained at
https://gitlab.gnome.org/GNOME/glib/-/issues/1672#note_1831968
In short, a call to fstat fails under TSan and GLib's g_sterror will
intern the error message, which will be reused by other threads.

Since upstream appears to be aware that GLib doesn't play nicely with
TSan, suppress everything coming from the library instead of
maintaining a fine grained list.

Reported at
https://buildd.debian.org/status/fetch.php?pkg=notmuch&arch=ppc64el&ver=0.38%7Erc0-1&stamp=1692959868&raw=0
---
 test/T810-tsan.suppressions | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/test/T810-tsan.suppressions b/test/T810-tsan.suppressions
index dbd16a94..80dc062f 100644
--- a/test/T810-tsan.suppressions
+++ b/test/T810-tsan.suppressions
@@ -1,5 +1,3 @@
 # It's unclear how TSan-friendly GLib is:
 # https://gitlab.gnome.org/GNOME/glib/-/issues/1672
-race:g_rw_lock_reader_lock
-# https://gitlab.gnome.org/GNOME/glib/-/issues/1952
-race:g_slice_alloc0
+called_from_lib:libglib*.so

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: failing T810-tsan on ppc64el
  2023-08-25 11:07 failing T810-tsan on ppc64el David Bremner
  2023-08-26 13:34 ` Kevin Boulain
  2023-08-30 12:52 ` [PATCH 1/1] test: suppress all interceptors in glib Kevin Boulain
@ 2023-08-30 13:07 ` Kevin Boulain
  2 siblings, 0 replies; 6+ messages in thread
From: Kevin Boulain @ 2023-08-30 13:07 UTC (permalink / raw)
  To: David Bremner; +Cc: notmuch

On 2023-08-25 at 08:07 -03, David Bremner <david@tethera.net> wrote:
> I can just disable tsan tests on ppc64el for Debian, but I wondered if
> there is an underlying bug that only shows up on ppc64el

I see you've skipped T810-tsan in 90c61828, I think it's fair for the
time being. I did some digging and sent a patch to silence the races
reported by TSan in
https://buildd.debian.org/status/fetch.php?pkg=notmuch&arch=ppc64el&ver=0.38%7Erc0-1&stamp=1692959868&raw=0

However, the races are just a red herring and the diff actually shows
the key problem (easier to see once the suppressions have been updated):
  line 12: 19
Surely, you're more familiar than me with the output so you might have
noticed already, but it means this line fails:
  https://git.notmuchmail.org/git?p=notmuch;a=blob;f=test/T810-tsan.sh;h=4071e2968f2ad62eb6642c68b39f2750327682d0;hb=HEAD#l68

The call stack is as follow (I'm linking to the latest versions for
simplicity, not the ones packaged by Debian):
  _load_key_file https://git.notmuchmail.org/git?p=notmuch;a=blob;f=lib/open.cc;h=54d1faf30127c8a72f3c96e838cf9d4edba7e70a;hb=HEAD#l155
  g_key_file_load_from_file https://gitlab.gnome.org/GNOME/glib/-/blob/197e6d6f5df6bc809bd3deaadec9519af2951cd2/glib/gkeyfile.c#L937
  g_key_file_load_from_fd https://gitlab.gnome.org/GNOME/glib/-/blob/197e6d6f5df6bc809bd3deaadec9519af2951cd2/glib/gkeyfile.c#L829

The rest is available at
https://gitlab.gnome.org/GNOME/glib/-/issues/1672#note_1831968
but doesn't matter in this case.

So, fstat failed but the syscall isn't actually performed: strace only
reports an open followed nearly immediately by a close for this fd. That
means glibc must be returning the EINVAL that we're seeing. There's a
case where ___fxstat64 can actually set errno to that:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/fxstat64.c;h=b8c4c0a13c7c9d5a426a9c17a96d5db8fd08455e;hb=HEAD#l52

So, something is feeding an invalid 'vers' to glibc (inspecting the
assembly in gdb reveals it's set to 0 and it's checked against a 1).
TSan actually installs interceptors for fstat:
  __interceptor_fstat64 (fd=3, buf=0x7fffee63d418) at ../../../../src/libsanitizer/tsan/tsan_rtl.h:242
  [...]
  ___fxstat64 (vers=0, fd=3, buf=0x7fffee63d418) at ../sysdeps/unix/sysv/linux/fxstat64.c:50

Because it's called __interceptor_fstat64 I believe it's
https://github.com/llvm/llvm-project/blob/04b1276ad3b8976241228be8a966b1557f63492f/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp#L1634
(not sure how to get the sources for TSan under Debian), which does
hardcode 'vers' to 0. Commenting out
  TEST_CFLAGS="${TEST_CFLAGS:-} -fsanitize=thread"
to eliminate the interceptor in the test suffices to make fstat succeed.

I'll see if I can find something more, otherwise I'll probably just
report this problem upstream.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/1] test: suppress all interceptors in glib
  2023-08-30 12:52 ` [PATCH 1/1] test: suppress all interceptors in glib Kevin Boulain
@ 2023-09-02 10:51   ` David Bremner
  0 siblings, 0 replies; 6+ messages in thread
From: David Bremner @ 2023-09-02 10:51 UTC (permalink / raw)
  To: Kevin Boulain, notmuch; +Cc: Kevin Boulain

Kevin Boulain <kevin@boula.in> writes:

> On ppc64el, races are detected by TSan:
>   WARNING: ThreadSanitizer: data race (pid=4520)
>     Read of size 8 at 0x7ffff20016c0 by thread T1:
>       #0 strlen ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:386 (libtsan.so.2+0x77c0c)
>       #1 strlen ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:378 (libtsan.so.2+0x77c0c)
>       #2 g_strdup ../../../glib/gstrfuncs.c:362 (libglib-2.0.so.0+0xa4ac4)
>

Applied to master.

d

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-09-02 10:51 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-25 11:07 failing T810-tsan on ppc64el David Bremner
2023-08-26 13:34 ` Kevin Boulain
2023-08-27 10:31   ` David Bremner
2023-08-30 12:52 ` [PATCH 1/1] test: suppress all interceptors in glib Kevin Boulain
2023-09-02 10:51   ` David Bremner
2023-08-30 13:07 ` failing T810-tsan on ppc64el Kevin Boulain

Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).