unofficial mirror of guix-devel@gnu.org 
 help / color / mirror / code / Atom feed
* fftwf tests running for 25+ hours - is this normal?
@ 2019-10-15  3:46 Chris Marusich
  2019-10-15  5:52 ` Bengt Richter
  2019-10-15 14:12 ` Matteo Frigo
  0 siblings, 2 replies; 8+ messages in thread
From: Chris Marusich @ 2019-10-15  3:46 UTC (permalink / raw)
  To: guix-devel; +Cc: fftw

[-- Attachment #1: Type: text/plain, Size: 1115 bytes --]

Hi,

I've been trying to reconfigure my system on my x200 laptop.  It's been
a day or two, and I've found that my computer has spent the last 25
hours running fftwf tests.  I know my laptop is rather weak (2 cores, 8
GB of memory, an SSD) compared to most desktops today, but still, 25
hours seems pretty long.

Guix is running

  make check -j 2

which ultimately calls

  perl -w ./check.pl -r -c=30 -v --nthreads=2 /tmp/guix-build-fftwf-3.3.8.drv-0/fftw-3.3.8/tests/bench

which calls

  /tmp/guix-build-fftwf-3.3.8.drv-0/fftw-3.3.8/tests/.libs/bench -o nthreads=2 --verbose=1 --verify //obcd34320 [... a lot of arguments ...]

Despite the fact that "-j 2" was provided to make, only 1 CPU is pegged
at 100% utilization.  I suppose I'll just wait and see what happens, but
25 hours feels very long.  It seems on ci.guix.gnu.org, the build times
are generally less than 1 hour:

  https://ci.guix.gnu.org/search?query=fftwf

Is this normal, or should I kill the build and try again?  What kind of
system is ci.guix.gnu.org building on (how many cores, how much memory)?

-- 
Chris

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: fftwf tests running for 25+ hours - is this normal?
  2019-10-15  3:46 fftwf tests running for 25+ hours - is this normal? Chris Marusich
@ 2019-10-15  5:52 ` Bengt Richter
  2019-10-15  6:38   ` Pierre Neidhardt
  2019-10-15 18:32   ` Chris Marusich
  2019-10-15 14:12 ` Matteo Frigo
  1 sibling, 2 replies; 8+ messages in thread
From: Bengt Richter @ 2019-10-15  5:52 UTC (permalink / raw)
  To: Chris Marusich; +Cc: guix-devel, fftw

On +2019-10-14 20:46:01 -0700, Chris Marusich wrote:
> Hi,
> 
> I've been trying to reconfigure my system on my x200 laptop.  It's been
> a day or two, and I've found that my computer has spent the last 25
> hours running fftwf tests.  I know my laptop is rather weak (2 cores, 8
> GB of memory, an SSD) compared to most desktops today, but still, 25
> hours seems pretty long.
> 
> Guix is running
> 
>   make check -j 2
> 
> which ultimately calls
> 
>   perl -w ./check.pl -r -c=30 -v --nthreads=2 /tmp/guix-build-fftwf-3.3.8.drv-0/fftw-3.3.8/tests/bench
> 
> which calls
> 
>   /tmp/guix-build-fftwf-3.3.8.drv-0/fftw-3.3.8/tests/.libs/bench -o nthreads=2 --verbose=1 --verify //obcd34320 [... a lot of arguments ...]
> 
> Despite the fact that "-j 2" was provided to make, only 1 CPU is pegged
> at 100% utilization.  I suppose I'll just wait and see what happens, but
> 25 hours feels very long.  It seems on ci.guix.gnu.org, the build times
> are generally less than 1 hour:
> 
>   https://ci.guix.gnu.org/search?query=fftwf
> 
> Is this normal, or should I kill the build and try again?  What kind of
> system is ci.guix.gnu.org building on (how many cores, how much memory)?
> 
> -- 
> Chris

Hi Chris,

Have you checked sensors for overheating that might induce CPU clock throttling?

That happened to me when I did guix pull; guix upgrade; and it got to doing racket,
which started out fast, but got really slow -- though it did finally crawl across
the finish line, and cooled off.

I guess it would show up in dmesg or journalctl -xe but I didn't look, sorry.

HTH
--
Regards,
Bengt Richter

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: fftwf tests running for 25+ hours - is this normal?
  2019-10-15  5:52 ` Bengt Richter
@ 2019-10-15  6:38   ` Pierre Neidhardt
  2019-10-15 18:32   ` Chris Marusich
  1 sibling, 0 replies; 8+ messages in thread
From: Pierre Neidhardt @ 2019-10-15  6:38 UTC (permalink / raw)
  To: Bengt Richter, Chris Marusich; +Cc: guix-devel, fftw

[-- Attachment #1: Type: text/plain, Size: 410 bytes --]

Bengt Richter <bokr@bokr.com> writes:

> That happened to me when I did guix pull; guix upgrade; and it got to doing racket,
> which started out fast, but got really slow -- though it did finally crawl across
> the finish line, and cooled off.

I don't know if this is related, but racket takes ages to go through the
"install" phase here on my desktop.

-- 
Pierre Neidhardt
https://ambrevar.xyz/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 487 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: fftwf tests running for 25+ hours - is this normal?
  2019-10-15  3:46 fftwf tests running for 25+ hours - is this normal? Chris Marusich
  2019-10-15  5:52 ` Bengt Richter
@ 2019-10-15 14:12 ` Matteo Frigo
  1 sibling, 0 replies; 8+ messages in thread
From: Matteo Frigo @ 2019-10-15 14:12 UTC (permalink / raw)
  To: Chris Marusich; +Cc: guix-devel, fftw

Chris,

thanks for reporting this issue.  (By the way, did the test ever
finish?)

Most likely, the problem is in FFTW and not in your machine.  Would you
mind reporting the exact configure options (I care specifically about
--enable-sse2, --enable-avx, etc.) and the exact command line that is
taking 25h?

The issue is probably related to the funny integer 34320.  The FFT
algorithm depends on the factorization of the size, 34320 in this case,
which is 34320=2*2*2*2*3*5*11*13.  FFTW tries to find the optimal way to
decompose 34320 into small problems that it can solve.  For example, it
may reduce a DFT of size 34320 into 34320/13 FFTs of size 13 followed by
13 FFTs of size 34320/13.  Or it may reduce a DFT of size 34320 into
34320/11 FFTs of size 11 followed by 11 FFTs of size 34320/11.  Because
there are many distinct prime factors, the number of factorizations is
really large and FFTW is pretty much trying them all.  The problem is
even harder once you consider the fact that some sizes are odd, and thus
they don't naturally match the vector length of SSE/AVX instructions.

FFTW has some heuristics to limit the search space in cases like this,
but these heuristics were implemented in a simpler world (before AVX)
and maybe they are not adequate any longer, so I want to look into this.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: fftwf tests running for 25+ hours - is this normal?
  2019-10-15  5:52 ` Bengt Richter
  2019-10-15  6:38   ` Pierre Neidhardt
@ 2019-10-15 18:32   ` Chris Marusich
  2019-10-16  5:11     ` Bengt Richter
  1 sibling, 1 reply; 8+ messages in thread
From: Chris Marusich @ 2019-10-15 18:32 UTC (permalink / raw)
  To: Bengt Richter, Matteo Frigo; +Cc: guix-devel, fftw


[-- Attachment #1.1: Type: text/plain, Size: 17274 bytes --]

Hi Bengt and Matteo,

Bengt Richter <bokr@bokr.com> writes:

> Have you checked sensors for overheating that might induce CPU clock throttling?

Actually, yes, I happened to be watching dmesg output at the time, and I
did notice these messages (no similar messages have been printed since
then, which was many hours ago):

--8<---------------cut here---------------start------------->8---
[180270.045081] mce: CPU1: Core temperature above threshold, cpu clock throttled (total events = 124733)
[180270.045567] mce: CPU1: Core temperature/speed normal
[180570.044352] mce: CPU1: Core temperature above threshold, cpu clock throttled (total events = 134902)
[180570.044838] mce: CPU1: Core temperature/speed normal
[180875.663432] mce: CPU1: Core temperature above threshold, cpu clock throttled (total events = 143897)
[180875.663918] mce: CPU1: Core temperature/speed normal
[181175.748616] mce: CPU1: Core temperature above threshold, cpu clock throttled (total events = 153503)
[181175.749103] mce: CPU1: Core temperature/speed normal
[181476.496915] mce: CPU1: Core temperature above threshold, cpu clock throttled (total events = 171377)
[181476.497401] mce: CPU1: Core temperature/speed normal
[181776.914264] mce: CPU1: Core temperature above threshold, cpu clock throttled (total events = 185828)
[181776.914751] mce: CPU1: Core temperature/speed normal
[182076.914391] mce: CPU1: Core temperature/speed normal
[182377.322112] mce: CPU1: Core temperature above threshold, cpu clock throttled (total events = 231578)
[182377.322598] mce: CPU1: Core temperature/speed normal
--8<---------------cut here---------------end--------------->8---

Is this throttling permanent, or is the throttling released after the
temperature returns to normal?

Matteo Frigo <athena@fftw.org> writes:

> thanks for reporting this issue.  (By the way, did the test ever
> finish?)

It's still running.  It's been 38 hours now.

> Most likely, the problem is in FFTW and not in your machine.  Would you
> mind reporting the exact configure options (I care specifically about
> --enable-sse2, --enable-avx, etc.) and the exact command line that is
> taking 25h?

These two options were included in the configure invocation.

> The issue is probably related to the funny integer 34320.  The FFT
> algorithm depends on the factorization of the size, 34320 in this case,
> which is 34320=2*2*2*2*3*5*11*13.  FFTW tries to find the optimal way to
> decompose 34320 into small problems that it can solve.  For example, it
> may reduce a DFT of size 34320 into 34320/13 FFTs of size 13 followed by
> 13 FFTs of size 34320/13.  Or it may reduce a DFT of size 34320 into
> 34320/11 FFTs of size 11 followed by 11 FFTs of size 34320/11.  Because
> there are many distinct prime factors, the number of factorizations is
> really large and FFTW is pretty much trying them all.  The problem is
> even harder once you consider the fact that some sizes are odd, and thus
> they don't naturally match the vector length of SSE/AVX instructions.
>
> FFTW has some heuristics to limit the search space in cases like this,
> but these heuristics were implemented in a simpler world (before AVX)
> and maybe they are not adequate any longer, so I want to look into this.

Thank you for taking a look!  Here is how the test is being run.

Output of "pstree -lap 20098":

--8<---------------cut here---------------start------------->8---
make,20098 check -j 2
  `-bash,20099 -c fail=; \\\012if (target_option=k; case ${target_option-} in ?) ;; *) echo "am__make_running_with_option: internal error: invalid" "target option '${target_option-}' specified" >&2; exit 1;; esac; has_opt=no; sane_makeflags=$MAKEFLAGS; if { if test -z '0'; then false; elif test -n 'x86_64-unknown-linux-gnu'; then true; elif test -n '4.2.1' && test -n '/tmp/guix-build-fftwf-3.3.8.drv-0/fftw-3.3.8'; then true; else false; fi; }; then sane_makeflags=$MFLAGS; else case $MAKEFLAGS in *\\\\[\\ \\\011]*) bs=\\\\; sane_makeflags=`printf '%s\\n' "$MAKEFLAGS" | sed "s/$bs$bs[$bs $bs\011]*//g"`;; esac; fi; skip_next=no; strip_trailopt () { flg=`printf '%s\\n' "$flg" | sed "s/$1.*$//"`; }; for flg in $sane_makeflags; do test $skip_next = yes && { skip_next=no; continue; }; case $flg in *=*|--*) continue;; -*I) strip_trailopt 'I'; skip_next=yes;; -*I?*) strip_trailopt 'I';; -*O) strip_trailopt 'O'; skip_next=yes;; -*O?*) strip_trailopt 'O';; -*l) strip_trailopt 'l'; skip_next=yes;; -*l?*) strip_trailopt 'l';; -[dEDm]) skip_next=yes;; -[JT]) skip_next=yes;; esac; case $flg in *$target_option*) has_opt=yes; break;; esac; done; test $has_opt = yes); then \\\012  failcom='fail=yes'; \\\012else \\\012  failcom='exit 1'; \\\012fi; \\\012dot_seen=no; \\\012target=`echo check-recursive | sed s/-recursive//`; \\\012case "check-recursive" in \\\012  distclean-* | maintainer-clean-*) list='support genfft kernel simd-support dft rdft reodft api libbench2 . threads tests mpi doc tools m4' ;; \\\012  *) list='support  kernel simd-support dft rdft reodft api libbench2 . threads tests mpi doc tools m4' ;; \\\012esac; \\\012for subdir in $list; do \\\012  echo "Making $target in $subdir"; \\\012  if test "$subdir" = "."; then \\\012    dot_seen=yes; \\\012    local_target="$target-am"; \\\012  else \\\012    local_target="$target"; \\\012  fi; \\\012  (CDPATH="${ZSH_VERSION+.}:" && cd $subdir && make  $local_target) \\\012  || eval $failcom; \\\012done; \\\012if test "$dot_seen" = "no"; then \\\012  make  "$target-am" || exit 1; \\\012fi; test -z "$fail"
      `-bash,20267 -c fail=; \\\012if (target_option=k; case ${target_option-} in ?) ;; *) echo "am__make_running_with_option: internal error: invalid" "target option '${target_option-}' specified" >&2; exit 1;; esac; has_opt=no; sane_makeflags=$MAKEFLAGS; if { if test -z '0'; then false; elif test -n 'x86_64-unknown-linux-gnu'; then true; elif test -n '4.2.1' && test -n '/tmp/guix-build-fftwf-3.3.8.drv-0/fftw-3.3.8'; then true; else false; fi; }; then sane_makeflags=$MFLAGS; else case $MAKEFLAGS in *\\\\[\\ \\\011]*) bs=\\\\; sane_makeflags=`printf '%s\\n' "$MAKEFLAGS" | sed "s/$bs$bs[$bs $bs\011]*//g"`;; esac; fi; skip_next=no; strip_trailopt () { flg=`printf '%s\\n' "$flg" | sed "s/$1.*$//"`; }; for flg in $sane_makeflags; do test $skip_next = yes && { skip_next=no; continue; }; case $flg in *=*|--*) continue;; -*I) strip_trailopt 'I'; skip_next=yes;; -*I?*) strip_trailopt 'I';; -*O) strip_trailopt 'O'; skip_next=yes;; -*O?*) strip_trailopt 'O';; -*l) strip_trailopt 'l'; skip_next=yes;; -*l?*) strip_trailopt 'l';; -[dEDm]) skip_next=yes;; -[JT]) skip_next=yes;; esac; case $flg in *$target_option*) has_opt=yes; break;; esac; done; test $has_opt = yes); then \\\012  failcom='fail=yes'; \\\012else \\\012  failcom='exit 1'; \\\012fi; \\\012dot_seen=no; \\\012target=`echo check-recursive | sed s/-recursive//`; \\\012case "check-recursive" in \\\012  distclean-* | maintainer-clean-*) list='support genfft kernel simd-support dft rdft reodft api libbench2 . threads tests mpi doc tools m4' ;; \\\012  *) list='support  kernel simd-support dft rdft reodft api libbench2 . threads tests mpi doc tools m4' ;; \\\012esac; \\\012for subdir in $list; do \\\012  echo "Making $target in $subdir"; \\\012  if test "$subdir" = "."; then \\\012    dot_seen=yes; \\\012    local_target="$target-am"; \\\012  else \\\012    local_target="$target"; \\\012  fi; \\\012  (CDPATH="${ZSH_VERSION+.}:" && cd $subdir && make  $local_target) \\\012  || eval $failcom; \\\012done; \\\012if test "$dot_seen" = "no"; then \\\012  make  "$target-am" || exit 1; \\\012fi; test -z "$fail"
          `-make,20268 check
              `-make,20269 check-local
                  `-perl,20431 -w ./check.pl -r -c=30 -v --nthreads=2 /tmp/guix-build-fftwf-3.3.8.drv-0/fftw-3.3.8/tests/bench
                      `-bench,20572 -o nthreads=2 --verbose=1 --verify //obcd34320 --verify //ibcd34320 --verify //ofcd34320 --verify //ifcd34320 --verify obcd34320 --verify ibcd34320 --verify ofcd34320 --verify ifcd34320 --verify okd1200e11 --verify ikd1200e11 --verify obr9x5x6v12 --verify ibr9x5x6v12 --verify ofr9x5x6v12 --verify ifr9x5x6v12 --verify //obc9x5x6v12 --verify //ibc9x5x6v12 --verify //ofc9x5x6v12 --verify //ifc9x5x6v12 --verify obc9x5x6v12 --verify ibc9x5x6v12 --verify ofc9x5x6v12 --verify ifc9x5x6v12 --verify ok65e01x36e10 --verify ik65e01x36e10 --verify obr8x9x3*17 --verify ibr8x9x3*17 --verify ofr8x9x3*17 --verify ifr8x9x3*17 --verify //obc8x9x3*17 --verify //ibc8x9x3*17 --verify //ofc8x9x3*17 --verify //ifc8x9x3*17 --verify obc8x9x3*17 --verify ibc8x9x3*17 --verify ofc8x9x3*17 --verify ifc8x9x3*17 --verify ok7o10x28e00v21 --verify ik7o10x28e00v21 --verify obr8x4x4x5v3 --verify ibr8x4x4x5v3 --verify ofr8x4x4x5v3 --verify ifr8x4x4x5v3 --verify //obc8x4x4x5v3 --verify //ibc8x4x4x5v3
                          `-{bench},20586
--8<---------------cut here---------------end--------------->8---

For some reason, the pstree output seems to have a lot of backslash
escapes in it.  That might be a little hard to read, so here is the same
thing from "ps --forest -ef", which for some reason does not have the
same number of backslash escapes:

--8<---------------cut here---------------start------------->8---
UID        PID  PPID  C STIME TTY          TIME CMD
[... omitted ...]
guixbui+ 20098 31417  0 Oct13 ?        00:00:00          \_ make check -j 2
guixbui+ 20099 20098  0 Oct13 ?        00:00:00              \_ /gnu/store/vd5fcqira1q4ibq5q7bfdfpcmdyy6fxg-bash-minimal-5.0.7/bin/bash -c fail=; \ if (target_option=k; case ${target_option-} in ?) ;; *) echo "am__make_running_with_option: internal error: invalid" "target option '${target_option-}' specified" >&2; exit 1;; esac; has_opt=no; sane_makeflags=$MAKEFLAGS; if { if test -z '0'; then false; elif test -n 'x86_64-unknown-linux-gnu'; then true; elif test -n '4.2.1' && test -n '/tmp/guix-build-fftwf-3.3.8.drv-0/fftw-3.3.8'; then true; else false; fi; }; then sane_makeflags=$MFLAGS; else case $MAKEFLAGS in *\\[\ \?]*) bs=\\; sane_makeflags=`printf '%s\n' "$MAKEFLAGS" | sed "s/$bs$bs[$bs $bs?]*//g"`;; esac; fi; skip_next=no; strip_trailopt () { flg=`printf '%s\n' "$flg" | sed "s/$1.*$//"`; }; for flg in $sane_makeflags; do test $skip_next = yes && { skip_next=no; continue; }; case $flg in *=*|--*) continue;; -*I) strip_trailopt 'I'; skip_next=yes;; -*I?*) strip_trailopt 'I';; -*O) strip_trailopt 'O'; skip_next=yes;; -*O?*) strip_trailopt 'O';; -*l) strip_trailopt 'l'; skip_next=yes;; -*l?*) strip_trailopt 'l';; -[dEDm]) skip_next=yes;; -[JT]) skip_next=yes;; esac; case $flg in *$target_option*) has_opt=yes; break;; esac; done; test $has_opt = yes); then \   failcom='fail=yes'; \ else \   failcom='exit 1'; \ fi; \ dot_seen=no; \ target=`echo check-recursive | sed s/-recursive//`; \ case "check-recursive" in \   distclean-* | maintainer-clean-*) list='support genfft kernel simd-support dft rdft reodft api libbench2 . threads tests mpi doc tools m4' ;; \   *) list='support  kernel simd-support dft rdft reodft api libbench2 . threads tests mpi doc tools m4' ;; \ esac; \ for subdir in $list; do \   echo "Making $target in $subdir"; \   if test "$subdir" = "."; then \     dot_seen=yes; \     local_target="$target-am"; \   else \     local_target="$target"; \   fi; \   (CDPATH="${ZSH_VERSION+.}:" && cd $subdir && make  $local_target) \   || eval $failcom; \ done; \ if test "$dot_seen" = "no"; then \   make  "$target-am" || exit 1; \ fi; test -z "$fail"
guixbui+ 20267 20099  0 Oct13 ?        00:00:00                  \_ /gnu/store/vd5fcqira1q4ibq5q7bfdfpcmdyy6fxg-bash-minimal-5.0.7/bin/bash -c fail=; \ if (target_option=k; case ${target_option-} in ?) ;; *) echo "am__make_running_with_option: internal error: invalid" "target option '${target_option-}' specified" >&2; exit 1;; esac; has_opt=no; sane_makeflags=$MAKEFLAGS; if { if test -z '0'; then false; elif test -n 'x86_64-unknown-linux-gnu'; then true; elif test -n '4.2.1' && test -n '/tmp/guix-build-fftwf-3.3.8.drv-0/fftw-3.3.8'; then true; else false; fi; }; then sane_makeflags=$MFLAGS; else case $MAKEFLAGS in *\\[\ \?]*) bs=\\; sane_makeflags=`printf '%s\n' "$MAKEFLAGS" | sed "s/$bs$bs[$bs $bs?]*//g"`;; esac; fi; skip_next=no; strip_trailopt () { flg=`printf '%s\n' "$flg" | sed "s/$1.*$//"`; }; for flg in $sane_makeflags; do test $skip_next = yes && { skip_next=no; continue; }; case $flg in *=*|--*) continue;; -*I) strip_trailopt 'I'; skip_next=yes;; -*I?*) strip_trailopt 'I';; -*O) strip_trailopt 'O'; skip_next=yes;; -*O?*) strip_trailopt 'O';; -*l) strip_trailopt 'l'; skip_next=yes;; -*l?*) strip_trailopt 'l';; -[dEDm]) skip_next=yes;; -[JT]) skip_next=yes;; esac; case $flg in *$target_option*) has_opt=yes; break;; esac; done; test $has_opt = yes); then \   failcom='fail=yes'; \ else \   failcom='exit 1'; \ fi; \ dot_seen=no; \ target=`echo check-recursive | sed s/-recursive//`; \ case "check-recursive" in \   distclean-* | maintainer-clean-*) list='support genfft kernel simd-support dft rdft reodft api libbench2 . threads tests mpi doc tools m4' ;; \   *) list='support  kernel simd-support dft rdft reodft api libbench2 . threads tests mpi doc tools m4' ;; \ esac; \ for subdir in $list; do \   echo "Making $target in $subdir"; \   if test "$subdir" = "."; then \     dot_seen=yes; \     local_target="$target-am"; \   else \     local_target="$target"; \   fi; \   (CDPATH="${ZSH_VERSION+.}:" && cd $subdir && make  $local_target) \   || eval $failcom; \ done; \ if test "$dot_seen" = "no"; then \   make  "$target-am" || exit 1; \ fi; test -z "$fail"
guixbui+ 20268 20267  0 Oct13 ?        00:00:00                      \_ make check
guixbui+ 20269 20268  0 Oct13 ?        00:00:00                          \_ make check-local
guixbui+ 20431 20269  0 Oct13 ?        00:00:00                              \_ perl -w ./check.pl -r -c=30 -v --nthreads=2 /tmp/guix-build-fftwf-3.3.8.drv-0/fftw-3.3.8/tests/bench
guixbui+ 20572 20431 99 Oct13 ?        1-15:47:52                                  \_ /tmp/guix-build-fftwf-3.3.8.drv-0/fftw-3.3.8/tests/.libs/bench -o nthreads=2 --verbose=1 --verify //obcd34320 --verify //ibcd34320 --verify //ofcd34320 --verify //ifcd34320 --verify obcd34320 --verify ibcd34320 --verify ofcd34320 --verify ifcd34320 --verify okd1200e11 --verify ikd1200e11 --verify obr9x5x6v12 --verify ibr9x5x6v12 --verify ofr9x5x6v12 --verify ifr9x5x6v12 --verify //obc9x5x6v12 --verify //ibc9x5x6v12 --verify //ofc9x5x6v12 --verify //ifc9x5x6v12 --verify obc9x5x6v12 --verify ibc9x5x6v12 --verify ofc9x5x6v12 --verify ifc9x5x6v12 --verify ok65e01x36e10 --verify ik65e01x36e10 --verify obr8x9x3*17 --verify ibr8x9x3*17 --verify ofr8x9x3*17 --verify ifr8x9x3*17 --verify //obc8x9x3*17 --verify //ibc8x9x3*17 --verify //ofc8x9x3*17 --verify //ifc8x9x3*17 --verify obc8x9x3*17 --verify ibc8x9x3*17 --verify ofc8x9x3*17 --verify ifc8x9x3*17 --verify ok7o10x28e00v21 --verify ik7o10x28e00v21 --verify obr8x4x4x5v3 --verify ibr8x4x4x5v3 --verify ofr8x4x4x5v3 --verify ifr8x4x4x5v3 --verify //obc8x4x4x5v3 --verify //ibc8x4x4x5v3
--8<---------------cut here---------------end--------------->8---

The configure invocation, according to the config.log, was:

--8<---------------cut here---------------start------------->8---
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by fftw configure 3.3.8, which was
generated by GNU Autoconf 2.69.  Invocation command line was

  $ ./configure CONFIG_SHELL=/gnu/store/vd5fcqira1q4ibq5q7bfdfpcmdyy6fxg-bash-minimal-5.0.7/bin/bash SHELL=/gnu/store/vd5fcqira1q4ibq5q7bfdfpcmdyy6fxg-bash-minimal-5.0.7/bin/bash --prefix=/gnu/store/w21mrlm5bl2q54x8xlij0cbf4pjzxcii-fftwf-3.3.8 --enable-fast-install --build=x86_64-unknown-linux-gnu --enable-single --enable-shared --enable-openmp --enable-threads --enable-sse2 --enable-avx --enable-avx2 --enable-avx512 --enable-avx-128-fma ax_cv_c_flags__mtune_native=no
--8<---------------cut here---------------end--------------->8---

I've attached the config.log (config.log.gz) to this email, since it
might be helpful.  I've also attached the build log
(fftwf-3.3.8.drv.bz2), which contains all the lines that the build logic
has printed up to this point.

Additionally, if you're familiar with Guix, you might find the package
definition to be helpful, as well.  It is defined here:

https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/algebra.scm?id=6c50e1dc0625f89884cff40b22627091efa37708#n774

To be specific, I'm building fftwf.  The fftwf package builds a
single-precision version of fftw by customizing the fftw package's
configure options a little bit.  I'm building fftwf from a custom Guix
checkout with a few personal commits unrelated to fftw, but the commit
6c50e1dc0625f89884cff40b22627091efa37708 is similar enough that you
should be able to reproduce the issue with Guix if you run "guix pull
--commit=6c50e1dc0625f89884cff40b22627091efa37708" and then run "guix
build --keep-failed fftwf".

For now, I'll keep on waiting and see what happens...  If any other info
would be helpful, let me know and I'll see what I can do!

Thank you for your help,

-- 
Chris

[-- Attachment #1.2: config.log.gz --]
[-- Type: application/octet-stream, Size: 14832 bytes --]

[-- Attachment #1.3: fftwf-3.3.8.drv.bz2 --]
[-- Type: application/octet-stream, Size: 65536 bytes --]

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: fftwf tests running for 25+ hours - is this normal?
  2019-10-15 18:32   ` Chris Marusich
@ 2019-10-16  5:11     ` Bengt Richter
  2019-10-16  5:29       ` Bengt Richter
  0 siblings, 1 reply; 8+ messages in thread
From: Bengt Richter @ 2019-10-16  5:11 UTC (permalink / raw)
  To: Chris Marusich; +Cc: guix-devel, fftw, Matteo Frigo

On +2019-10-15 11:32:38 -0700, Chris Marusich wrote:
> Hi Bengt and Matteo,
> 
> Bengt Richter <bokr@bokr.com> writes:
> 
> > Have you checked sensors for overheating that might induce CPU clock throttling?
> 
> Actually, yes, I happened to be watching dmesg output at the time, and I
> did notice these messages (no similar messages have been printed since
> then, which was many hours ago):
> 
> --8<---------------cut here---------------start------------->8---
> [180270.045081] mce: CPU1: Core temperature above threshold, cpu clock throttled (total events = 124733)
> [180270.045567] mce: CPU1: Core temperature/speed normal
> [180570.044352] mce: CPU1: Core temperature above threshold, cpu clock throttled (total events = 134902)
> [180570.044838] mce: CPU1: Core temperature/speed normal
> [180875.663432] mce: CPU1: Core temperature above threshold, cpu clock throttled (total events = 143897)
> [180875.663918] mce: CPU1: Core temperature/speed normal
> [181175.748616] mce: CPU1: Core temperature above threshold, cpu clock throttled (total events = 153503)
> [181175.749103] mce: CPU1: Core temperature/speed normal
> [181476.496915] mce: CPU1: Core temperature above threshold, cpu clock throttled (total events = 171377)
> [181476.497401] mce: CPU1: Core temperature/speed normal
> [181776.914264] mce: CPU1: Core temperature above threshold, cpu clock throttled (total events = 185828)
> [181776.914751] mce: CPU1: Core temperature/speed normal
> [182076.914391] mce: CPU1: Core temperature/speed normal
> [182377.322112] mce: CPU1: Core temperature above threshold, cpu clock throttled (total events = 231578)
> [182377.322598] mce: CPU1: Core temperature/speed normal
> --8<---------------cut here---------------end--------------->8---
> 
> Is this throttling permanent, or is the throttling released after the
> temperature returns to normal?

In my case, throttling ceased as soon as temps normalized.
If it happens to me again, I will try

lscpu|grep -i '^CPU m'

to see if the throttling is shown in the first number
which right now gets me:

CPU MHz:                         2697.781
CPU max MHz:                     3500.0000
CPU min MHz:                     800.0000

again:

CPU MHz:                         2102.444
CPU max MHz:                     3500.0000
CPU min MHz:                     800.0000

Yeah, so that first number is a current sample.
So I guess 800 could make something run 3x or more longer.

... Hm. I think I am going to see about dust-bunnies in the air ducts :) ...
--
Regards,
Bengt Richter

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: fftwf tests running for 25+ hours - is this normal?
  2019-10-16  5:11     ` Bengt Richter
@ 2019-10-16  5:29       ` Bengt Richter
  2019-10-17  4:06         ` Chris Marusich
  0 siblings, 1 reply; 8+ messages in thread
From: Bengt Richter @ 2019-10-16  5:29 UTC (permalink / raw)
  To: Chris Marusich; +Cc: guix-devel, fftw, Matteo Frigo

Hi Chris and Matteo,

On +2019-10-15 22:11:50 -0700, Bengt Richter wrote:
> On +2019-10-15 11:32:38 -0700, Chris Marusich wrote:
> > Hi Bengt and Matteo,
> > 
> > Bengt Richter <bokr@bokr.com> writes:
> > 
> > > Have you checked sensors for overheating that might induce CPU clock throttling?
> > 
> > Actually, yes, I happened to be watching dmesg output at the time, and I
> > did notice these messages (no similar messages have been printed since
> > then, which was many hours ago):
> > 
> > --8<---------------cut here---------------start------------->8---
> > [180270.045081] mce: CPU1: Core temperature above threshold, cpu clock throttled (total events = 124733)
> > [180270.045567] mce: CPU1: Core temperature/speed normal
> > [180570.044352] mce: CPU1: Core temperature above threshold, cpu clock throttled (total events = 134902)
> > [180570.044838] mce: CPU1: Core temperature/speed normal
> > [180875.663432] mce: CPU1: Core temperature above threshold, cpu clock throttled (total events = 143897)
> > [180875.663918] mce: CPU1: Core temperature/speed normal
> > [181175.748616] mce: CPU1: Core temperature above threshold, cpu clock throttled (total events = 153503)
> > [181175.749103] mce: CPU1: Core temperature/speed normal
> > [181476.496915] mce: CPU1: Core temperature above threshold, cpu clock throttled (total events = 171377)
> > [181476.497401] mce: CPU1: Core temperature/speed normal
> > [181776.914264] mce: CPU1: Core temperature above threshold, cpu clock throttled (total events = 185828)
> > [181776.914751] mce: CPU1: Core temperature/speed normal
> > [182076.914391] mce: CPU1: Core temperature/speed normal
> > [182377.322112] mce: CPU1: Core temperature above threshold, cpu clock throttled (total events = 231578)
> > [182377.322598] mce: CPU1: Core temperature/speed normal
> > --8<---------------cut here---------------end--------------->8---
> > 
> > Is this throttling permanent, or is the throttling released after the
> > temperature returns to normal?
> 
> In my case, throttling ceased as soon as temps normalized.
> If it happens to me again, I will try
> 
> lscpu|grep -i '^CPU m'
> 
> to see if the throttling is shown in the first number
> which right now gets me:
> 
> CPU MHz:                         2697.781
> CPU max MHz:                     3500.0000
> CPU min MHz:                     800.0000
> 
> again:
> 
> CPU MHz:                         2102.444
> CPU max MHz:                     3500.0000
> CPU min MHz:                     800.0000
> 
> Yeah, so that first number is a current sample.
> So I guess 800 could make something run 3x or more longer.
> 
> ... Hm. I think I am going to see about dust-bunnies in the air ducts :) ...
> --

I note that lscpu only displays one core speed, not all of them.
lscpu is a prettified view of info from /proc/cpuinfo AFAIK,
but the latter shows all cpus:

grep '^cpu MHz' /proc/cpuinfo
right now gets me:

cpu MHz		: 2617.409
cpu MHz		: 2396.652
cpu MHz		: 2167.756
cpu MHz		: 2393.218
cpu MHz		: 2406.541
cpu MHz		: 2524.652
cpu MHz		: 2444.346
cpu MHz		: 2443.544

HTH
__
Regards,
Bengt Richter











> Regards,
> Bengt Richter
> 

-- 
NB: If you have bokr@oz.net in your contact info, please
    update it to bokr@bokr.com. Thank you :)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: fftwf tests running for 25+ hours - is this normal?
  2019-10-16  5:29       ` Bengt Richter
@ 2019-10-17  4:06         ` Chris Marusich
  0 siblings, 0 replies; 8+ messages in thread
From: Chris Marusich @ 2019-10-17  4:06 UTC (permalink / raw)
  To: Bengt Richter, Matteo Frigo; +Cc: guix-devel, fftw

[-- Attachment #1: Type: text/plain, Size: 1665 bytes --]

Hi Bengt, Matteo, and others,

I've stopped the build.  The process ran for 73 hours and didn't
complete.  I've also noticed that there are no recent builds in Cuirass,
so I wonder: has anyone built fftwf successfully recently?  I was trying
to build fftwf based off of what is currently in the core-updates
branch.

The latest successful build for fftwf 3.3.8 on x86_64-linux was on
September 1st (of this year, I think):

https://ci.guix.gnu.org/build/1640133/details

It supposedly took 13145 seconds to complete.  However, I'm not sure how
accurate that is, since this other entry for a build on armhf-linux says
it took 1570098173 seconds (is probably some kind of Cuirass bug, since
that puts the supposed start time around the epoch):

https://ci.guix.gnu.org/build/1690957/details

For now, I'm going to try building an older version of Guix or fftwf and
see how that goes.

Bengt Richter <bokr@bokr.com> writes:

> lscpu|grep -i '^CPU m'
> [...]
> grep '^cpu MHz' /proc/cpuinfo

Thanks for this tip!  I can see the CPU speed was reduced to 800 MHz at
one point, but it seems it's back to 2400 MHz now:

--8<---------------cut here---------------start------------->8---
$ lscpu  | grep -i '^cpu m'
CPU MHz:             2400.035
CPU max MHz:         2400.0000
CPU min MHz:         800.0000
--8<---------------cut here---------------end--------------->8---

Same info from /proc/cpuinfo:

--8<---------------cut here---------------start------------->8---
$ cat /proc/cpuinfo | grep '^cpu MHz'
cpu MHz		: 2400.035
cpu MHz		: 2400.035
--8<---------------cut here---------------end--------------->8---

-- 
Chris

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-10-17  4:06 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-15  3:46 fftwf tests running for 25+ hours - is this normal? Chris Marusich
2019-10-15  5:52 ` Bengt Richter
2019-10-15  6:38   ` Pierre Neidhardt
2019-10-15 18:32   ` Chris Marusich
2019-10-16  5:11     ` Bengt Richter
2019-10-16  5:29       ` Bengt Richter
2019-10-17  4:06         ` Chris Marusich
2019-10-15 14:12 ` Matteo Frigo

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).