unofficial mirror of bug-guix@gnu.org 
* bug#34033: Offloading sometimes hangs
From: Ludovic Courtès @ 2019-01-10 16:09 UTC
  To: 34033

Hello,

So there’s another situation where offloading regularly hangs on
berlin.  The ‘guix offload’ process looks like this:

--8<---------------cut here---------------start------------->8---
(gdb) bt
#0  0x00007f1f715686a1 in __GI___poll (fds=0x14e9b30, nfds=1, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007f1f673b94e7 in ssh_poll (timeout=<optimized out>, nfds=<optimized out>, fds=<optimized out>)
    at /tmp/guix-build-libssh-0.7.7.drv-0/libssh-0.7.7-checkout/src/poll.c:98
#2  ssh_poll_ctx_dopoll (ctx=ctx@entry=0x14ee2e0, timeout=timeout@entry=-1)
    at /tmp/guix-build-libssh-0.7.7.drv-0/libssh-0.7.7-checkout/src/poll.c:612
#3  0x00007f1f673ba449 in ssh_handle_packets (session=session@entry=0x2249360, timeout=timeout@entry=-1)
    at /tmp/guix-build-libssh-0.7.7.drv-0/libssh-0.7.7-checkout/src/session.c:634
#4  0x00007f1f673ba51d in ssh_handle_packets_termination (session=session@entry=0x2249360, timeout=<optimized out>,
    timeout@entry=-3, fct=fct@entry=0x7f1f673a4430 <ssh_channel_read_termination>, user=user@entry=0x7ffce23953f0)
    at /tmp/guix-build-libssh-0.7.7.drv-0/libssh-0.7.7-checkout/src/session.c:696
#5  0x00007f1f673a6aaf in ssh_channel_read_timeout (channel=0x224e360, dest=dest@entry=0x18ef020,
    count=count@entry=8, is_stderr=<optimized out>, timeout=-3, timeout@entry=-1)
    at /tmp/guix-build-libssh-0.7.7.drv-0/libssh-0.7.7-checkout/src/channels.c:2705
#6  0x00007f1f673a6bbb in ssh_channel_read (channel=<optimized out>, dest=dest@entry=0x18ef020, count=count@entry=8,
    is_stderr=<optimized out>) at /tmp/guix-build-libssh-0.7.7.drv-0/libssh-0.7.7-checkout/src/channels.c:2621
#7  0x00007f1f67413a23 in read_from_channel_port (
    channel=<error reading variable: ERROR: In procedure gdbscm_memory_port_fill_input: error reading memory>0x22f01a0, dst=<optimized out>, start=0, count=8) at channel-type.c:161
#8  0x00007f1f71b65287 in scm_i_read_bytes (
    port=port@entry=<error reading variable: ERROR: In procedure gdbscm_memory_port_fill_input: error reading memory>0x22f01a0, dst=dst@entry="#<vu8vector>" = {...}, start=start@entry=0, count=count@entry=8) at ports.c:1559
#9  0x00007f1f71b6996c in scm_c_read_bytes (
    port=port@entry=<error reading variable: ERROR: In procedure gdbscm_memory_port_fill_input: error reading memory>0x22f01a0, dst=dst@entry="#<vu8vector>" = {...}, start=start@entry=0, count=count@entry=8) at ports.c:1639
#10 0x00007f1f71b6fd80 in scm_get_bytevector_n (
    port=<error reading variable: ERROR: In procedure gdbscm_memory_port_fill_input: error reading memory>0x22f01a0,
    count=<optimized out>) at r6rs-ports.c:421
#11 0x00007f1f71ba4715 in vm_regular_engine (thread=0x14e9b30, vp=0xc31f30, registers=0xffffffff, resume=1901495969)
    at vm-engine.c:786

[...]

(gdb) p *fds
$1 = {fd = 15, events = 1, revents = 0}
(gdb) shell ls -l /proc/12185/fd
total 0
lr-x------ 1 root root 64 Jan 10 16:56 0 -> 'pipe:[76778016]'
l-wx------ 1 root root 64 Jan 10 16:56 1 -> 'pipe:[76778015]'
lr-x------ 1 root root 64 Jan 10 16:56 10 -> 'pipe:[76838317]'
l-wx------ 1 root root 64 Jan 10 16:56 11 -> 'pipe:[76838317]'
lr-x------ 1 root root 64 Jan 10 16:56 12 -> 'pipe:[76851360]'
l-wx------ 1 root root 64 Jan 10 16:56 13 -> 'pipe:[76851360]'
l-wx------ 1 root root 64 Jan 10 16:56 14 -> /var/guix/offload/overdrive1.guixsd.org/1
lrwx------ 1 root root 64 Jan 10 16:56 15 -> 'socket:[76860702]'
lr-x------ 1 root root 64 Jan 10 16:56 16 -> /dev/urandom
l-wx------ 1 root root 64 Jan 10 16:56 2 -> 'pipe:[76778015]'
lr-x------ 1 root root 64 Jan 10 16:56 3 -> 'pipe:[76838313]'
l-wx------ 1 root root 64 Jan 10 16:56 4 -> 'pipe:[76778017]'
l-wx------ 1 root root 64 Jan 10 16:56 5 -> 'pipe:[76838313]'
lr-x------ 1 root root 64 Jan 10 16:56 6 -> 'pipe:[76838316]'
l-wx------ 1 root root 64 Jan 10 16:56 7 -> 'pipe:[76838316]'
lr-x------ 1 root root 64 Jan 10 16:56 8 -> 'pipe:[76841414]'
l-wx------ 1 root root 64 Jan 10 16:56 9 -> 'pipe:[76841414]'
--8<---------------cut here---------------end--------------->8---

It’s a ‘get-bytevector-n’ for 8 bytes, so it looks like a read of the
daemon protocol.  At that point the socket being polled (file
descriptor 15) is actually dead: if I log in to the remote machine
(overdrive1.guixsd.org), I can see that there are no other open SSH
sessions.
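
To illustrate why an 8-byte read points at the daemon protocol: that
protocol transmits integers as 64-bit little-endian values, so a reader
waiting for the next integer asks for exactly 8 bytes.  Below is a
minimal Python sketch of that encoding; the helper names ‘encode_int’
and ‘decode_int’ are hypothetical and this is not the Guix code itself.

```python
import struct


def encode_int(n):
    """Encode N as an 8-byte little-endian unsigned integer."""
    return struct.pack("<Q", n)


def decode_int(buf):
    """Decode the 8-byte little-endian unsigned integer stored in BUF."""
    (n,) = struct.unpack("<Q", buf)
    return n


# A reader expecting an integer needs all 8 bytes before it can proceed.
wire = encode_int(42)
value = decode_int(wire)
```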

A simple fix would be to somehow get libssh to poll for
POLLIN | POLLRDHUP instead of just POLLIN (the ‘events = 1’ shown above).
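
As a rough illustration of what the extra flag reports, here is a small
Python sketch (not libssh code) that polls a local socket pair after
the peer has closed its end.  POLLRDHUP is Linux-specific, and the
helper name ‘wait_for_input_or_hangup’ is made up for this example.

```python
import select
import socket


def wait_for_input_or_hangup(sock, timeout_ms=None):
    """Poll SOCK for readable data or a peer hangup; return the revents mask.

    Returns 0 if the timeout expires before any event is reported.
    """
    poller = select.poll()
    poller.register(sock.fileno(), select.POLLIN | select.POLLRDHUP)
    events = poller.poll(timeout_ms)
    return events[0][1] if events else 0


# Stand-in for the SSH connection: a connected pair of local sockets.
local, remote = socket.socketpair()
remote.close()  # the peer goes away in an orderly fashion

revents = wait_for_input_or_hangup(local, timeout_ms=1000)
peer_hung_up = bool(revents & select.POLLRDHUP)
local.close()
```

With POLLRDHUP in the requested mask, the caller can tell a hangup
apart from ordinary incoming data by checking the returned mask.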

Additionally, we could change Guile-SSH so that we can specify a timeout
when reading from a channel.
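
The backtrace shows libssh’s ‘ssh_channel_read_timeout’ being reached
with timeout=-1, that is, waiting forever.  The idea of a bounded read
can be sketched in Python on a plain socket as follows; this is only an
illustration under that assumption, not the Guile-SSH API, and
‘read_exactly’ is a hypothetical helper.

```python
import select
import socket
import time


def read_exactly(sock, count, timeout_s):
    """Read COUNT bytes from SOCK within TIMEOUT_S seconds.

    Raise TimeoutError if the deadline passes first, or EOFError if the
    peer closes the connection before COUNT bytes have arrived.
    """
    deadline = time.monotonic() + timeout_s
    poller = select.poll()
    poller.register(sock.fileno(), select.POLLIN)
    data = b""
    while len(data) < count:
        remaining_ms = max(0, int((deadline - time.monotonic()) * 1000))
        if not poller.poll(remaining_ms):
            raise TimeoutError(f"got {len(data)} of {count} bytes before timeout")
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise EOFError("peer closed the connection")
        data += chunk
    return data


# A peer that stays connected but never sends anything.
local, silent_peer = socket.socketpair()
try:
    read_exactly(local, 8, timeout_s=0.2)
    timed_out = False
except TimeoutError:
    timed_out = True
finally:
    local.close()
    silent_peer.close()
```

A deadline like this also covers the case where the remote side
vanishes without closing the connection, which poll flags alone cannot
detect.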

Ludo’.


Thread overview: 9+ messages
2019-01-10 16:09 bug#34033: Offloading sometimes hangs Ludovic Courtès
2019-01-14 22:45 ` Ludovic Courtès
2020-02-22  4:37   ` Maxim Cournoyer
2020-02-22 20:35     ` Ludovic Courtès
2020-02-24 13:59       ` Maxim Cournoyer
2020-02-24 14:59         ` Ludovic Courtès
2020-07-02 14:20   ` Mathieu Othacehe
2020-07-03  7:05     ` Mathieu Othacehe
2020-07-03 13:58       ` Ludovic Courtès
