unofficial mirror of bug-guix@gnu.org
* bug#65858: mumi crashes
@ 2023-09-11  1:28 Maxim Cournoyer
  2023-10-08  3:13 ` Maxim Cournoyer
  0 siblings, 1 reply; 11+ messages in thread
From: Maxim Cournoyer @ 2023-09-11  1:28 UTC (permalink / raw)
  To: 65858; +Cc: guix-sysadmin

[-- Attachment #1: Type: text/plain, Size: 4564 bytes --]

Hello,

As of today, Mumi has been crashing with the following errors in
/var/log/mumi.log:

--8<---------------cut here---------------start------------->8---
2023-09-10 11:49:01 GET GC Warning: Repeated allocation of very large block (appr. size 4194304):
2023-09-10 11:49:01     May lead to memory leak and poor performance
2023-09-10 12:02:37 Throw to key `match-error' with args `("match" "no matching pattern" (HEAD "63641"))'.
2023-09-10 12:06:00 Uncaught exception in fiber ##f:
2023-09-10 12:06:00 In ice-9/boot-9.scm:
2023-09-10 12:06:00   1752:10  5 (with-exception-handler _ _ #:unwind? _ # _)
2023-09-10 12:06:00 In fibers/web/server.scm:
2023-09-10 12:06:00    211:21  4 (_)
2023-09-10 12:06:00 In ice-9/suspendable-ports.scm:
2023-09-10 12:06:00      83:4  3 (write-bytes #<closed: file 7f4efcabc8c0> #vu8(60 33 …) …)
2023-09-10 12:06:00 In unknown file:
2023-09-10 12:06:00            2 (port-write #<closed: file 7f4efcabc8c0> #vu8(60 33 # …) …)
2023-09-10 12:06:00 In ice-9/boot-9.scm:
2023-09-10 12:06:00   1685:16  1 (raise-exception _ #:continuable? _)
2023-09-10 12:06:00   1685:16  0 (raise-exception _ #:continuable? _)
2023-09-10 12:06:00 ice-9/boot-9.scm:1685:16: In procedure raise-exception:
2023-09-10 12:06:00 In procedure fport_write: Broken pipe
2023-09-10 12:15:36 /graphql
--8<---------------cut here---------------end--------------->8---

And also:

--8<---------------cut here---------------start------------->8---
[...]
2023-09-10 22:23:07 ice-9/boot-9.scm:1685:16: In procedure raise-exception:
2023-09-10 22:23:07 In procedure fport_write: Broken pipe
2023-09-10 22:23:07 ice-9/boot-9.scm:1685:16: In procedure raise-exception:
2023-09-10 22:23:07 In procedure fport_write: Broken pipe
2023-09-10 22:23:07 Uncaught exception in fiber ##f:
2023-09-10 22:23:07 In ice-9/boot-9.scm:
2023-09-10 22:23:07   1752:10  5 (with-exception-handler _ _ #:unwind? _ # _)
2023-09-10 22:23:07 In fibers/web/server.scm:
2023-09-10 22:23:07    211:21  4 (_)
2023-09-10 22:23:07 In ice-9/suspendable-ports.scm:
2023-09-10 22:23:07      83:4  3 (write-bytes #<closed: file 7f1caec72d90> #vu8(60 33 …) …)
2023-09-10 22:23:07 In unknown file:
2023-09-10 22:23:07            2 (port-write #<closed: file 7f1caec72d90> #vu8(60 33 # …) …)
2023-09-10 22:23:07 In ice-9/boot-9.scm:
2023-09-10 22:23:07   1685:16  1 (raise-exception _ #:continuable? _)
2023-09-10 22:23:07 Uncaught exception in fiber ##f:
2023-09-10 22:23:07   1685:16  0 (raise-exception _ #:continuable? _)
2023-09-10 22:23:07 In ice-9/boot-9.scm:
2023-09-10 22:23:07   1752:10  5 (with-exception-handler _ _ #:unwind? _ # _)
2023-09-10 22:23:07 In fibers/web/server.scm:
2023-09-10 22:23:07 ice-9/boot-9.scm:1685:16: In procedure raise-exception:
2023-09-10 22:23:07 In procedure fport_write: Broken pipe
2023-09-10 22:23:07    211:21  4 (_)
2023-09-10 22:23:07 In ice-9/suspendable-ports.scm:
2023-09-10 22:23:07      83:4  3 (write-bytes #<closed: file 7f1cb6303e00> #vu8(60 33 …) …)
2023-09-10 22:23:07 In unknown file:
2023-09-10 22:23:07            2 (port-write #<closed: file 7f1cb6303e00> #vu8(60 33 # …) …)
2023-09-10 22:23:07 In ice-9/boot-9.scm:
2023-09-10 22:23:07   1685:16  1 (raise-exception _ #:continuable? _)
2023-09-10 22:23:07   1685:16  0 (raise-exception _ #:continuable? _)
2023-09-10 22:23:07 ice-9/boot-9.scm:1685:16: In procedure raise-exception:
2023-09-10 22:23:07 In procedure fport_write: Broken pipe
2023-09-10 22:23:07 Uncaught exception in fiber ##f:
2023-09-10 22:23:07 In ice-9/boot-9.scm:
2023-09-10 22:23:07   1752:10  5 (with-exception-handler _ _ #:unwind? _ # _)
2023-09-10 22:23:07 In fibers/web/server.scm:
2023-09-10 22:23:07    211:21  4 (_)
2023-09-10 22:23:07 In ice-9/suspendable-ports.scm:
2023-09-10 22:23:07      83:4  3 (write-bytes #<closed: file 7f1cac7101c0> #vu8(60 33 …) …)
2023-09-10 22:23:07 In unknown file:
2023-09-10 22:23:07            2 (port-write #<closed: file 7f1cac7101c0> #vu8(60 33 # …) …)
2023-09-10 22:23:07 In ice-9/boot-9.scm:
2023-09-10 22:23:07   1685:16  1 (raise-exception _ #:continuable? _)
2023-09-10 22:23:07   1685:16  0 (raise-exception _ #:continuable? _)
2023-09-10 22:23:07 ice-9/boot-9.scm:1685:16: In procedure raise-exception:
2023-09-10 22:23:07 In procedure fport_write: Broken pipe
--8<---------------cut here---------------end--------------->8---

I've restarted the 'mumi' service and it's up and running again, for
now.

Attached is the complete mumi log for reference.


[-- Attachment #2: mumi.log.gz --]
[-- Type: application/octet-stream, Size: 123757 bytes --]

[-- Attachment #3: Type: text/plain, Size: 19 bytes --]


-- 
Thanks,
Maxim


* bug#65858: mumi crashes
  2023-09-11  1:28 bug#65858: mumi crashes Maxim Cournoyer
@ 2023-10-08  3:13 ` Maxim Cournoyer
  2023-10-09 23:02   ` Arun Isaac
  0 siblings, 1 reply; 11+ messages in thread
From: Maxim Cournoyer @ 2023-10-08  3:13 UTC (permalink / raw)
  To: 65858; +Cc: guix-sysadmin

Hello,

Today, Mumi (version 0.0.5-4.2453a5a) crashed again, with the website
returning 502 Bad Gateway errors; it seems to be the same error:

--8<---------------cut here---------------start------------->8---
2023-10-07 11:48:52 In procedure fport_write: Broken pipe
2023-10-07 11:48:52 Uncaught exception in fiber ##f:
2023-10-07 11:48:52 In ice-9/boot-9.scm:
2023-10-07 11:48:52   1752:10  5 (with-exception-handler _ _ #:unwind? _ # _)
2023-10-07 11:48:52 In fibers/web/server.scm:
2023-10-07 11:48:52    211:21  4 (_)
2023-10-07 11:48:52 In ice-9/suspendable-ports.scm:
2023-10-07 11:48:52      83:4  3 (write-bytes #<closed: file 7f55f4a47070> #vu8(60 33 …) …)
2023-10-07 11:48:52 In unknown file:
2023-10-07 11:48:52            2 (port-write #<closed: file 7f55f4a47070> #vu8(60 33 # …) …)
2023-10-07 11:48:52 In ice-9/boot-9.scm:
2023-10-07 11:48:52   1685:16  1 (raise-exception _ #:continuable? _)
2023-10-07 11:48:52   1685:16  0 (raise-exception _ #:continuable? _)
2023-10-07 11:48:52 ice-9/boot-9.scm:1685:16: In procedure raise-exception:
2023-10-07 11:48:52 In procedure fport_write: Broken pipe
--8<---------------cut here---------------end--------------->8---

That's from /var/log/mumi.log.1.gz.  The worker was peaking at 1 to 4
CPUs; here's what strace looked like at some point:

--8<---------------cut here---------------start------------->8---

rt_sigprocmask(SIG_BLOCK, ~[], [], 8)   = 0
clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7f831b49a910, parent_tid=0x7f831b49a910, exit_signal=0, stack=0x7f831ac9a000, stack_size=0x7fff00, tls=0x7f831b49a640} => {parent_tid=[87808]}, 88) = 87808
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
futex(0x7f8328b87f74, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = 0
futex(0x7f831e3177c0, FUTEX_WAKE_PRIVATE, 1) = 0
rt_sigprocmask(SIG_BLOCK, ~[], [], 8)   = 0
clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7f8319c97910, parent_tid=0x7f8319c97910, exit_signal=0, stack=0x7f8319497000, stack_size=0x7fff00, tls=0x7f8319c97640} => {parent_tid=[87809]}, 88) = 87809
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
futex(0x7f8328b87f70, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = 0
futex(0x7f831e317780, FUTEX_WAKE_PRIVATE, 1) = 0
rt_sigprocmask(SIG_BLOCK, ~[], [], 8)   = 0
clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7f831ac99910, parent_tid=0x7f831ac99910, exit_signal=0, stack=0x7f831a499000, stack_size=0x7fff00, tls=0x7f831ac99640} => {parent_tid=[87810]}, 88) = 87810
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
futex(0x7f8328b87f74, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x7f831e317740, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f8329190c80, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x7f8329190c80, FUTEX_WAKE_PRIVATE, 1) = 0
rt_sigprocmask(SIG_BLOCK, ~[], [], 8)   = 0
clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7f831a498910, parent_tid=0x7f831a498910, exit_signal=0, stack=0x7f8319c98000, stack_size=0x7fff00, tls=0x7f831a498640} => {parent_tid=[87811]}, 88) = 87811
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
futex(0x7f8328b87f70, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x7f831e317700, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f8328b87f74, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGPWR {si_signo=SIGPWR, si_code=SI_TKILL, si_pid=75049, si_uid=940} ---
futex(0x7f832917fce0, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigsuspend(~[INT QUIT ABRT BUS SEGV TERM XCPU RTMIN RT_1], 8) = ? ERESTARTNOHAND (To be restarted if no handler)
--- SIGXCPU {si_signo=SIGXCPU, si_code=SI_TKILL, si_pid=75049, si_uid=940} ---
rt_sigreturn({mask=~[INT QUIT ABRT BUS KILL SEGV TERM STOP RTMIN RT_1]}) = -1 EINTR (Interrupted system call)
futex(0x7f832917fce0, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigreturn({mask=[]})                 = 202
futex(0x7f8328b87f74, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGPWR {si_signo=SIGPWR, si_code=SI_TKILL, si_pid=75049, si_uid=940} ---
futex(0x7f832917fce0, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigsuspend(~[INT QUIT ABRT BUS SEGV TERM XCPU RTMIN RT_1], 8) = ? ERESTARTNOHAND (To be restarted if no handler)
--- SIGXCPU {si_signo=SIGXCPU, si_code=SI_TKILL, si_pid=75049, si_uid=940} ---
rt_sigreturn({mask=~[INT QUIT ABRT BUS KILL SEGV TERM STOP RTMIN RT_1]}) = -1 EINTR (Interrupted system call)
futex(0x7f832917fce0, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigreturn({mask=[]})                 = 202
futex(0x7f8328b87f74, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGPWR {si_signo=SIGPWR, si_code=SI_TKILL, si_pid=75049, si_uid=940} ---
futex(0x7f832917fce0, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigsuspend(~[INT QUIT ABRT BUS SEGV TERM XCPU RTMIN RT_1], 8) = ? ERESTARTNOHAND (To be restarted if no handler)
--- SIGXCPU {si_signo=SIGXCPU, si_code=SI_TKILL, si_pid=75049, si_uid=940} ---
rt_sigreturn({mask=~[INT QUIT ABRT BUS KILL SEGV TERM STOP RTMIN RT_1]}) = -1 EINTR (Interrupted system call)
futex(0x7f832917fce0, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigreturn({mask=[]})                 = 202
futex(0x7f8328b87f74, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGPWR {si_signo=SIGPWR, si_code=SI_TKILL, si_pid=75049, si_uid=940} ---
futex(0x7f832917fce0, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigsuspend(~[INT QUIT ABRT BUS SEGV TERM XCPU RTMIN RT_1], 8) = ? ERESTARTNOHAND (To be restarted if no handler)
--- SIGXCPU {si_signo=SIGXCPU, si_code=SI_TKILL, si_pid=75049, si_uid=940} ---
rt_sigreturn({mask=~[INT QUIT ABRT BUS KILL SEGV TERM STOP RTMIN RT_1]}) = -1 EINTR (Interrupted system call)
futex(0x7f832917fce0, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigreturn({mask=[]})                 = 202
futex(0x7f8328b87f74, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGPWR {si_signo=SIGPWR, si_code=SI_TKILL, si_pid=75049, si_uid=940} ---
futex(0x7f832917fce0, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigsuspend(~[INT QUIT ABRT BUS SEGV TERM XCPU RTMIN RT_1], 8) = ? ERESTARTNOHAND (To be restarted if no handler)
--- SIGXCPU {si_signo=SIGXCPU, si_code=SI_TKILL, si_pid=75049, si_uid=940} ---
rt_sigreturn({mask=~[INT QUIT ABRT BUS KILL SEGV TERM STOP RTMIN RT_1]}) = -1 EINTR (Interrupted system call)
futex(0x7f832917fce0, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigreturn({mask=[]})                 = 202
futex(0x7f8328b87f74, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGPWR {si_signo=SIGPWR, si_code=SI_TKILL, si_pid=75049, si_uid=940} ---
futex(0x7f832917fce0, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigsuspend(~[INT QUIT ABRT BUS SEGV TERM XCPU RTMIN RT_1], 8) = ? ERESTARTNOHAND (To be restarted if no handler)
--- SIGXCPU {si_signo=SIGXCPU, si_code=SI_TKILL, si_pid=75049, si_uid=940} ---
rt_sigreturn({mask=~[INT QUIT ABRT BUS KILL SEGV TERM STOP RTMIN RT_1]}) = -1 EINTR (Interrupted system call)
futex(0x7f832917fce0, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigreturn({mask=[]})                 = 202
futex(0x7f8328b87f74, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = 0
futex(0x7f831e317700, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f8328b87f70, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = 0
futex(0x7f831e317780, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f8328b87f74, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = 0
futex(0x7f831e3177c0, FUTEX_WAKE_PRIVATE, 1) = 0
newfstatat(AT_FDCWD, "/var/mumi/db/mumi.xapian", {st_mode=S_IFDIR|0755, st_size=144, ...}, 0) = 0
newfstatat(AT_FDCWD, "/var/mumi/db/mumi.xapian/iamchert", 0x7ffcd7e8ec60, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/var/mumi/db/mumi.xapian/iamglass", {st_mode=S_IFREG|0644, st_size=155, ...}, 0) = 0
newfstatat(AT_FDCWD, "/var/mumi/db/mumi.xapian/postlist.glass", {st_mode=S_IFREG|0644, st_size=389939200, ...}, 0) = 0
openat(AT_FDCWD, "/var/mumi/db/mumi.xapian/flintlock", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 10
fcntl(10, F_OFD_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = 0
--8<---------------cut here---------------end--------------->8---

Restarting the service resolved the issue for now.

-- 
Thanks,
Maxim





* bug#65858: mumi crashes
  2023-10-08  3:13 ` Maxim Cournoyer
@ 2023-10-09 23:02   ` Arun Isaac
  2023-10-10  3:51     ` Maxim Cournoyer
  0 siblings, 1 reply; 11+ messages in thread
From: Arun Isaac @ 2023-10-09 23:02 UTC (permalink / raw)
  To: Maxim Cournoyer, 65858; +Cc: guix-sysadmin


Hi Maxim,

Thanks for monitoring mumi. I'm not sure why these mumi crashes are
happening. I can think of two leads to start working on.

1. I see many lines, such as those below, warning of garbage
collection issues with "repeated allocation of very large block". I
am not sure, but this sounds like a limitation of the Guile garbage
collector. I suspect these warnings arise because the guile-email
parser thrashes the garbage collector too much. I could look into
rewriting parts of guile-email to be more careful with memory use.

--8<---------------cut here---------------start------------->8---
2023-09-09 16:21:00 GC Warning: Repeated allocation of very large block (appr. size 16777216):
2023-09-09 16:21:00 	May lead to memory leak and poor performance
2023-09-09 16:30:24 GC Warning: Repeated allocation of very large block (appr. size 16138240):
2023-09-09 16:30:24 	May lead to memory leak and poor performance
2023-09-09 16:39:11 GC Warning: Repeated allocation of very large block (appr. size 16781312):
2023-09-09 16:39:11 	May lead to memory leak and poor performance
--8<---------------cut here---------------end--------------->8---
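
For illustration, here is the kind of change I have in mind: reading
messages in fixed-size chunks instead of slurping each one into a
single large bytevector. This is a hypothetical sketch, not
guile-email's actual parser; both procedure names are made up.

--8<---------------cut here---------------start------------->8---
;; Hypothetical sketch, not guile-email code.  Slurping allocates one
;; block proportional to the whole message, which is what triggers
;; libgc's "Repeated allocation of very large block" warning.
(use-modules (ice-9 binary-ports)
             (rnrs bytevectors))

(define (read-whole-message port)
  (get-bytevector-all port))

;; Chunked reading reuses one 64 KiB buffer; PROC is called once per
;; chunk with the buffer and the number of bytes actually read.
(define (for-each-chunk proc port)
  (let ((buf (make-bytevector 65536)))
    (let loop ()
      (let ((n (get-bytevector-n! port buf 0 65536)))
        (unless (eof-object? n)
          (proc buf n)
          (loop))))))
--8<---------------cut here---------------end--------------->8---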

2. If there is a specific query that causes mumi to crash, I'd like to
find out. This could be a GraphQL query. Unfortunately, the actual
GraphQL document lives in the query part of the URI, and we don't log
that part. I can patch mumi to log it.
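
Roughly like this, using the (web request) and (web uri) accessors
from Guile's web modules (a sketch; `log-request' is a made-up name,
not the actual mumi handler):

--8<---------------cut here---------------start------------->8---
;; Sketch of logging the full URI, query string included.
(use-modules (web request)
             (web uri))

(define (log-request request)
  (let* ((uri (request-uri request))
         (query (uri-query uri)))             ;string or #f
    (format (current-error-port) "~a ~a~a~%"
            (request-method request)          ;e.g. GET
            (uri-path uri)
            (if query (string-append "?" query) ""))))
--8<---------------cut here---------------end--------------->8---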

WDYT?

Regards,
Arun





* bug#65858: mumi crashes
  2023-10-09 23:02   ` Arun Isaac
@ 2023-10-10  3:51     ` Maxim Cournoyer
  2023-10-10 22:57       ` Arun Isaac
  0 siblings, 1 reply; 11+ messages in thread
From: Maxim Cournoyer @ 2023-10-10  3:51 UTC (permalink / raw)
  To: Arun Isaac; +Cc: 65858, guix-sysadmin

Hi Arun,

Arun Isaac <arunisaac@systemreboot.net> writes:

> Hi Maxim,
>
> Thanks for monitoring mumi. I'm not sure why these mumi crashes are
> happening. I can think of two leads to start working on.
>
> 1. I see many lines, such as those below, warning of garbage
> collection issues with "repeated allocation of very large block". I
> am not sure, but this sounds like a limitation of the Guile garbage
> collector. I suspect these warnings arise because the guile-email
> parser thrashes the garbage collector too much. I could look into
> rewriting parts of guile-email to be more careful with memory use.
>
> 2023-09-09 16:21:00 GC Warning: Repeated allocation of very large block (appr. size 16777216):
> 2023-09-09 16:21:00 	May lead to memory leak and poor performance
> 2023-09-09 16:30:24 GC Warning: Repeated allocation of very large block (appr. size 16138240):
> 2023-09-09 16:30:24 	May lead to memory leak and poor performance
> 2023-09-09 16:39:11 GC Warning: Repeated allocation of very large block (appr. size 16781312):
> 2023-09-09 16:39:11 	May lead to memory leak and poor performance

Perhaps that could be related, but unfortunately I don't have the
experience to corroborate it.

> 2. If there is a specific query that causes mumi to crash, I'd like to
> find out. This could be a GraphQL query. Unfortunately, the actual
> GraphQL document lives in the query part of the URI, and we don't log
> that part. I can patch mumi to log it.
>
> WDYT?

I think adding more logging to narrow down where the problem is
happening would be a fine step to take here!

-- 
Thanks,
Maxim





* bug#65858: mumi crashes
  2023-10-10  3:51     ` Maxim Cournoyer
@ 2023-10-10 22:57       ` Arun Isaac
  2023-10-11  1:29         ` Maxim Cournoyer
  0 siblings, 1 reply; 11+ messages in thread
From: Arun Isaac @ 2023-10-10 22:57 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: 65858, guix-sysadmin


Hi Maxim,

I have made a number of changes to mumi and reconfigured berlin with the
latest mumi. Here is a quick summary of the main changes to mumi.

- We now log the complete URI and the response code for every request to
  mumi.
- We now handle HEAD requests correctly. This should eliminate some of
  the crashes we saw in the mumi log.
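
For context, the `match-error' with args `(HEAD "63641")' in the
first log excerpt presumably came from the request router matching
only GET requests, so HEAD requests fell through every pattern. A
simplified sketch of the idea behind the fix (not the actual mumi
router; `dispatch' is a made-up name):

--8<---------------cut here---------------start------------->8---
;; Simplified sketch: accept HEAD wherever GET is accepted, so
;; (HEAD "63641") no longer raises a match-error.
(use-modules (ice-9 match))

(define (dispatch method path)
  (match (list method path)
    (((or 'GET 'HEAD) path)             ;previously only 'GET
     (string-append "issue page for " path))
    (_ "not found")))

;; (dispatch 'HEAD "63641") => "issue page for 63641"
--8<---------------cut here---------------end--------------->8---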

Regards,
Arun





* bug#65858: mumi crashes
  2023-10-10 22:57       ` Arun Isaac
@ 2023-10-11  1:29         ` Maxim Cournoyer
  2023-10-24  4:26           ` Maxim Cournoyer
  0 siblings, 1 reply; 11+ messages in thread
From: Maxim Cournoyer @ 2023-10-11  1:29 UTC (permalink / raw)
  To: Arun Isaac; +Cc: 65858, guix-sysadmin

Hi Arun,

Arun Isaac <arunisaac@systemreboot.net> writes:

> Hi Maxim,
>
> I have made a number of changes to mumi and reconfigured berlin with the
> latest mumi. Here is a quick summary of the main changes to mumi.
>
> - We now log the complete URI and the response code for every request to
>   mumi.
> - We now handle HEAD requests correctly. This should eliminate some of
>   the crashes we saw in the mumi log.

Thanks!  Let's keep an eye on things.

-- 
Thanks,
Maxim





* bug#65858: mumi crashes
  2023-10-11  1:29         ` Maxim Cournoyer
@ 2023-10-24  4:26           ` Maxim Cournoyer
  2023-10-24  7:13             ` Christopher Baines
  0 siblings, 1 reply; 11+ messages in thread
From: Maxim Cournoyer @ 2023-10-24  4:26 UTC (permalink / raw)
  To: Arun Isaac; +Cc: 65858, guix-sysadmin

Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Hi Arun,
>
> Arun Isaac <arunisaac@systemreboot.net> writes:
>
>> Hi Maxim,
>>
>> I have made a number of changes to mumi and reconfigured berlin with the
>> latest mumi. Here is a quick summary of the main changes to mumi.
>>
>> - We now log the complete URI and the response code for every request to
>>   mumi.
>> - We now handle HEAD requests correctly. This should eliminate some of
>>   the crashes we saw in the mumi log.
>
> Thanks!  Let's keep an eye on things.

Here's a fresh crash (on berlin):

--8<---------------cut here---------------start------------->8---
2023-10-24 06:22:58 GET /graphql?query=query%20%7B%0A%20%20issue%28number%3A%2065806%29%20%7B%0A%20%20%20%
20open%0A%20%20%7D%0A%7D%0A&variables=%7B%7D 200
2023-10-24 06:22:58 GET /issue/29433/attachment/1/ 200
2023-10-24 06:22:58 GET /graphql?query=query%20%7B%0A%20%20issue%28number%3A%2065853%29%20%7B%0A%20%20%20%
20open%0A%20%20%7D%0A%7D%0A&variables=%7B%7D 200
2023-10-24 06:22:58 GET /graphql?query=query%20%7B%0A%20%20issue%28number%3A%2065869%29%20%7B%0A%20%20%20%
20open%0A%20%20%7D%0A%7D%0A&variables=%7B%7D 200
2023-10-24 06:23:15 GET Uncaught exception in task:
2023-10-24 06:23:15 In procedure port-auxiliary-write-buffer: Wrong type argument in position 1 (expecting
 open port): #<closed: file 7f2ac30077e0>
2023-10-24 06:23:15 In fibers.scm:
2023-10-24 06:23:15     172:8  6 (_)
2023-10-24 06:23:15 In ice-9/boot-9.scm:
2023-10-24 06:23:15   1752:10  5 (with-exception-handler _ _ #:unwind? _ # _)
2023-10-24 06:23:15 In fibers/web/server.scm:
2023-10-24 06:23:15    214:25  4 (_)
2023-10-24 06:23:15 In ice-9/suspendable-ports.scm:
2023-10-24 06:23:15      83:4  3 (write-bytes #<closed: file 7f2abfe0e770> #vu8(47 42 …) …)
2023-10-24 06:23:15 In unknown file:
2023-10-24 06:23:15            2 (port-write #<closed: file 7f2abfe0e770> #vu8(47 42 # …) …)
2023-10-24 06:23:15 In ice-9/boot-9.scm:
2023-10-24 06:23:15   1685:16  1 (raise-exception _ #:continuable? _)
2023-10-24 06:23:15   1685:16  0 (raise-exception _ #:continuable? _)
2023-10-24 06:23:15 ice-9/boot-9.scm:1685:16: In procedure raise-exception:
2023-10-24 06:23:15 In procedure fport_write: Broken pipe
--8<---------------cut here---------------end--------------->8---

-- 
Thanks,
Maxim





* bug#65858: mumi crashes
  2023-10-24  4:26           ` Maxim Cournoyer
@ 2023-10-24  7:13             ` Christopher Baines
  2023-10-24 12:47               ` Maxim Cournoyer
  0 siblings, 1 reply; 11+ messages in thread
From: Christopher Baines @ 2023-10-24  7:13 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: 65858, arunisaac, guix-sysadmin

[-- Attachment #1: Type: text/plain, Size: 2598 bytes --]


Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Hi,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:
>
>> Hi Arun,
>>
>> Arun Isaac <arunisaac@systemreboot.net> writes:
>>
>>> Hi Maxim,
>>>
>>> I have made a number of changes to mumi and reconfigured berlin with the
>>> latest mumi. Here is a quick summary of the main changes to mumi.
>>>
>>> - We now log the complete URI and the response code for every request to
>>>   mumi.
>>> - We now handle HEAD requests correctly. This should eliminate some of
>>>   the crashes we saw in the mumi log.
>>
>> Thanks!  Let's keep an eye on things.
>
> Here's a fresh crash (on berlin):
>
> 2023-10-24 06:22:58 GET /graphql?query=query%20%7B%0A%20%20issue%28number%3A%2065806%29%20%7B%0A%20%20%20%
> 20open%0A%20%20%7D%0A%7D%0A&variables=%7B%7D 200
> 2023-10-24 06:22:58 GET /issue/29433/attachment/1/ 200
> 2023-10-24 06:22:58 GET /graphql?query=query%20%7B%0A%20%20issue%28number%3A%2065853%29%20%7B%0A%20%20%20%
> 20open%0A%20%20%7D%0A%7D%0A&variables=%7B%7D 200
> 2023-10-24 06:22:58 GET /graphql?query=query%20%7B%0A%20%20issue%28number%3A%2065869%29%20%7B%0A%20%20%20%
> 20open%0A%20%20%7D%0A%7D%0A&variables=%7B%7D 200
> 2023-10-24 06:23:15 GET Uncaught exception in task:
> 2023-10-24 06:23:15 In procedure port-auxiliary-write-buffer: Wrong type argument in position 1 (expecting
>  open port): #<closed: file 7f2ac30077e0>
> 2023-10-24 06:23:15 In fibers.scm:
> 2023-10-24 06:23:15     172:8  6 (_)
> 2023-10-24 06:23:15 In ice-9/boot-9.scm:
> 2023-10-24 06:23:15   1752:10  5 (with-exception-handler _ _ #:unwind? _ # _)
> 2023-10-24 06:23:15 In fibers/web/server.scm:
> 2023-10-24 06:23:15    214:25  4 (_)
> 2023-10-24 06:23:15 In ice-9/suspendable-ports.scm:
> 2023-10-24 06:23:15      83:4  3 (write-bytes #<closed: file 7f2abfe0e770> #vu8(47 42 …) …)
> 2023-10-24 06:23:15 In unknown file:
> 2023-10-24 06:23:15            2 (port-write #<closed: file 7f2abfe0e770> #vu8(47 42 # …) …)
> 2023-10-24 06:23:15 In ice-9/boot-9.scm:
> 2023-10-24 06:23:15   1685:16  1 (raise-exception _ #:continuable? _)
> 2023-10-24 06:23:15   1685:16  0 (raise-exception _ #:continuable? _)
> 2023-10-24 06:23:15 ice-9/boot-9.scm:1685:16: In procedure raise-exception:
> 2023-10-24 06:23:15 In procedure fport_write: Broken pipe

I think this is kind of expected. If nginx hits the proxy_read_timeout,
it will return a 504 to the client and close the connection to Mumi.
What you're seeing here is likely Mumi trying to respond to a request
on a connection that nginx has already closed.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 987 bytes --]


* bug#65858: mumi crashes
  2023-10-24  7:13             ` Christopher Baines
@ 2023-10-24 12:47               ` Maxim Cournoyer
  2023-10-24 13:15                 ` Christopher Baines
  2023-10-26  0:16                 ` Arun Isaac
  0 siblings, 2 replies; 11+ messages in thread
From: Maxim Cournoyer @ 2023-10-24 12:47 UTC (permalink / raw)
  To: Christopher Baines; +Cc: 65858, Arun Isaac, guix-sysadmin

Hi Christopher,

Christopher Baines <mail@cbaines.net> writes:

[...]

>> Here's a fresh crash (on berlin):
>>
>> 2023-10-24 06:22:58 GET
>> /graphql?query=query%20%7B%0A%20%20issue%28number%3A%2065806%29%20%7B%0A%20%20%20%
>> 20open%0A%20%20%7D%0A%7D%0A&variables=%7B%7D 200
>> 2023-10-24 06:22:58 GET /issue/29433/attachment/1/ 200
>> 2023-10-24 06:22:58 GET
>> /graphql?query=query%20%7B%0A%20%20issue%28number%3A%2065853%29%20%7B%0A%20%20%20%
>> 20open%0A%20%20%7D%0A%7D%0A&variables=%7B%7D 200
>> 2023-10-24 06:22:58 GET
>> /graphql?query=query%20%7B%0A%20%20issue%28number%3A%2065869%29%20%7B%0A%20%20%20%
>> 20open%0A%20%20%7D%0A%7D%0A&variables=%7B%7D 200
>> 2023-10-24 06:23:15 GET Uncaught exception in task:
>> 2023-10-24 06:23:15 In procedure port-auxiliary-write-buffer: Wrong
>> type argument in position 1 (expecting
>>  open port): #<closed: file 7f2ac30077e0>
>> 2023-10-24 06:23:15 In fibers.scm:
>> 2023-10-24 06:23:15     172:8  6 (_)
>> 2023-10-24 06:23:15 In ice-9/boot-9.scm:
>> 2023-10-24 06:23:15   1752:10  5 (with-exception-handler _ _ #:unwind? _ # _)
>> 2023-10-24 06:23:15 In fibers/web/server.scm:
>> 2023-10-24 06:23:15    214:25  4 (_)
>> 2023-10-24 06:23:15 In ice-9/suspendable-ports.scm:
>> 2023-10-24 06:23:15      83:4  3 (write-bytes #<closed: file 7f2abfe0e770> #vu8(47 42 …) …)
>> 2023-10-24 06:23:15 In unknown file:
>> 2023-10-24 06:23:15            2 (port-write #<closed: file 7f2abfe0e770> #vu8(47 42 # …) …)
>> 2023-10-24 06:23:15 In ice-9/boot-9.scm:
>> 2023-10-24 06:23:15   1685:16  1 (raise-exception _ #:continuable? _)
>> 2023-10-24 06:23:15   1685:16  0 (raise-exception _ #:continuable? _)
>> 2023-10-24 06:23:15 ice-9/boot-9.scm:1685:16: In procedure raise-exception:
>> 2023-10-24 06:23:15 In procedure fport_write: Broken pipe
>
> I think this is kind of expected. If nginx hits the proxy_read_timeout,
> it will return a 504 to the client and close the connection to Mumi.
> What you're seeing here is likely Mumi trying to respond to a request
> on a connection that nginx has already closed.

If it's expected, we should handle it and produce a useful warning
instead of crashing, right?
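
Something along these lines, perhaps: wrap the response write and
downgrade the error to a one-line warning. This is only a sketch;
`safely-write-response' and `client-port' are hypothetical names, not
the fibers or mumi API.

--8<---------------cut here---------------start------------->8---
;; Sketch: a peer-closed connection becomes a one-line warning
;; instead of an uncaught exception with a backtrace.
(use-modules (ice-9 binary-ports))

(define (safely-write-response client-port bytes)
  (with-exception-handler
      (lambda (exn)
        (format (current-error-port)
                "warning: client closed connection before response~%"))
    (lambda ()
      (put-bytevector client-port bytes))
    #:unwind? #t))
--8<---------------cut here---------------end--------------->8---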

Thanks for chipping in!

-- 
Thanks,
Maxim





* bug#65858: mumi crashes
  2023-10-24 12:47               ` Maxim Cournoyer
@ 2023-10-24 13:15                 ` Christopher Baines
  2023-10-26  0:16                 ` Arun Isaac
  1 sibling, 0 replies; 11+ messages in thread
From: Christopher Baines @ 2023-10-24 13:15 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: 65858, Arun Isaac, guix-sysadmin

[-- Attachment #1: Type: text/plain, Size: 3092 bytes --]


Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Hi Christopher,
>
> Christopher Baines <mail@cbaines.net> writes:
>
> [...]
>
>>> Here's a fresh crash (on berlin):
>>>
>>> 2023-10-24 06:22:58 GET
>>> /graphql?query=query%20%7B%0A%20%20issue%28number%3A%2065806%29%20%7B%0A%20%20%20%
>>> 20open%0A%20%20%7D%0A%7D%0A&variables=%7B%7D 200
>>> 2023-10-24 06:22:58 GET /issue/29433/attachment/1/ 200
>>> 2023-10-24 06:22:58 GET
>>> /graphql?query=query%20%7B%0A%20%20issue%28number%3A%2065853%29%20%7B%0A%20%20%20%
>>> 20open%0A%20%20%7D%0A%7D%0A&variables=%7B%7D 200
>>> 2023-10-24 06:22:58 GET
>>> /graphql?query=query%20%7B%0A%20%20issue%28number%3A%2065869%29%20%7B%0A%20%20%20%
>>> 20open%0A%20%20%7D%0A%7D%0A&variables=%7B%7D 200
>>> 2023-10-24 06:23:15 GET Uncaught exception in task:
>>> 2023-10-24 06:23:15 In procedure port-auxiliary-write-buffer: Wrong
>>> type argument in position 1 (expecting
>>>  open port): #<closed: file 7f2ac30077e0>
>>> 2023-10-24 06:23:15 In fibers.scm:
>>> 2023-10-24 06:23:15     172:8  6 (_)
>>> 2023-10-24 06:23:15 In ice-9/boot-9.scm:
>>> 2023-10-24 06:23:15   1752:10  5 (with-exception-handler _ _ #:unwind? _ # _)
>>> 2023-10-24 06:23:15 In fibers/web/server.scm:
>>> 2023-10-24 06:23:15    214:25  4 (_)
>>> 2023-10-24 06:23:15 In ice-9/suspendable-ports.scm:
>>> 2023-10-24 06:23:15      83:4  3 (write-bytes #<closed: file 7f2abfe0e770> #vu8(47 42 …) …)
>>> 2023-10-24 06:23:15 In unknown file:
>>> 2023-10-24 06:23:15            2 (port-write #<closed: file 7f2abfe0e770> #vu8(47 42 # …) …)
>>> 2023-10-24 06:23:15 In ice-9/boot-9.scm:
>>> 2023-10-24 06:23:15   1685:16  1 (raise-exception _ #:continuable? _)
>>> 2023-10-24 06:23:15   1685:16  0 (raise-exception _ #:continuable? _)
>>> 2023-10-24 06:23:15 ice-9/boot-9.scm:1685:16: In procedure raise-exception:
>>> 2023-10-24 06:23:15 In procedure fport_write: Broken pipe
>>
>> I think this is kind of expected. If nginx hits the proxy_read_timeout,
>> it will return a 504 to the client and close the connection to Mumi.
>> What you're seeing here is likely Mumi trying to respond to a request
>> on a connection that nginx has already closed.
>
> If it's expected, we should handle it and produce a useful warning
> instead of crashing, right?

As you can see from the above stack trace, this doesn't involve Mumi;
it's more of an issue for the guile-fibers web server. But yes, it
would be nice not to have this clutter in the logs.

That being said, if there are ways of having the application time out
internally before nginx does, that can help to avoid this kind of
issue. Maybe that's not that easy for Mumi though.
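
To sketch what I mean (a rough sketch using the guile-fibers
operations API from memory; `call-with-timeout' is a made-up helper,
and this is untested):

--8<---------------cut here---------------start------------->8---
;; Race the real work, done on another fiber, against a timer, so
;; the application can answer before nginx's proxy_read_timeout.
(use-modules (fibers)
             (fibers channels)
             (fibers operations)
             (fibers timers))

(define (call-with-timeout seconds thunk)
  (let ((reply (make-channel)))
    (spawn-fiber
     (lambda ()
       (put-message reply (thunk))))
    (perform-operation
     (choice-operation
      (get-operation reply)                     ;result of THUNK
      (wrap-operation (sleep-operation seconds)
                      (lambda () 'timed-out))))))
--8<---------------cut here---------------end--------------->8---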

I think I remember Ricardo doing some investigation and experimenting
with switching to non-suspendable ports when writing responses, and I
think that might have exhibited different behaviour.

It's probably worth trying to reproduce this in isolation, and
double-checking whether the ports implementation has any effect
(obviously suspendable ports are better, but this might be
informative).


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 987 bytes --]


* bug#65858: mumi crashes
  2023-10-24 12:47               ` Maxim Cournoyer
  2023-10-24 13:15                 ` Christopher Baines
@ 2023-10-26  0:16                 ` Arun Isaac
  1 sibling, 0 replies; 11+ messages in thread
From: Arun Isaac @ 2023-10-26  0:16 UTC (permalink / raw)
  To: Maxim Cournoyer, Christopher Baines; +Cc: 65858, guix-sysadmin


Hi Chris and Maxim,

>> I think this is kind of expected. If nginx hits the proxy_read_timeout,
>> it will return a 504 to the client and close the connection to Mumi.
>> What you're seeing here is likely Mumi trying to respond to a request
>> on a connection that nginx has already closed.

Thanks for the tip. This indeed makes a lot of sense!

But I don't understand why a proxy read timeout happens. The requests
in question seem to be very simple requests that should be processed
quickly. Should we increase nginx's proxy_read_timeout?

> If it's expected, we should handle it and produce a useful warning
> instead of crashing, right?

A couple of weeks ago, I put in a mechanism to better catch and report
errors.

https://git.savannah.gnu.org/cgit/guix/mumi.git/commit/?id=7a71c12a1f7f033ae93f26fb09387f6c04c143ba

I'm not sure this is working correctly, though. I have used `guard` in the
commit above. Does `guard` catch all exceptions correctly, or should I
instead use `catch`?
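
To make the question concrete (standalone snippets, not mumi code):
my understanding is that `guard` runs only a clause whose test matches
the condition and re-raises otherwise, while `catch` with key #t traps
every throw.

--8<---------------cut here---------------start------------->8---
(use-modules (ice-9 exceptions))

;; guard: dispatches on the condition object; re-raises if no
;; clause matches.
(guard (exn ((error? exn)
             (format #t "guard caught: ~a~%" exn)))
  (raise-exception (make-error)))

;; catch with key #t: traps any throw, including old-style
;; (throw 'match-error ...) exceptions.
(catch #t
  (lambda () (throw 'match-error "no matching pattern"))
  (lambda (key . args)
    (format #t "catch caught: ~a ~a~%" key args)))
--8<---------------cut here---------------end--------------->8---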

Also, I believe the logs are written in parallel by fibers. This may
mangle the logs a bit, but I'm not sure we're actually seeing requests
rapid enough for this to happen.
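
If it does turn out to be a problem, one fix would be to serialize
all writes through a single logger fiber (sketch only; these names
are not in the current mumi code):

--8<---------------cut here---------------start------------->8---
;; Only one fiber ever writes to the log port, so lines from
;; concurrent requests cannot interleave.
(use-modules (fibers)
             (fibers channels))

(define %log-channel (make-channel))

(define (start-logger port)
  (spawn-fiber
   (lambda ()
     (let loop ()
       (display (get-message %log-channel) port)
       (newline port)
       (loop)))))

(define (log-line fmt . args)
  (put-message %log-channel (apply format #f fmt args)))
--8<---------------cut here---------------end--------------->8---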

Regards,
Arun




