unofficial mirror of bug-guix@gnu.org 
* bug#42548: Cuirass 504 errors
@ 2020-07-26 16:10 Mathieu Othacehe
  2020-07-27 22:11 ` zimoun
  2020-07-30 14:47 ` Mathieu Othacehe
  0 siblings, 2 replies; 8+ messages in thread
From: Mathieu Othacehe @ 2020-07-26 16:10 UTC
  To: 42548


Hello,

Back from holidays, perfect time to fix some Cuirass issues :) The
Cuirass web interface frequently serves 504 errors for all requests,
requiring a service restart on berlin.

Looking at /var/log/cuirass-web.log, it seems that we do indeed have
multiple things going wrong.

A first problem is caused by checkout entries pointing to removed
inputs. This should be fixed by f71f026a41d8e68e4a7f11ef6e708964594a599c
in Cuirass.

A second issue is caused when a build product download is started, then
aborted. In that case, sendfile throws an exception or enters an endless
loop.

There's a third issue, but the cause is not clear to me:

--8<---------------cut here---------------start------------->8---
Uncaught exception in fiber ##f:
In ice-9/boot-9.scm:
  1736:10  5 (with-exception-handler _ _ #:unwind? _ # _)
In web/server/fiberized.scm:
   160:26  4 (_)
In ice-9/suspendable-ports.scm:
     83:4  3 (write-bytes #<closed: file 7f3a4ed46310> #vu8(60 33 ?) ?)
In unknown file:
           2 (port-write #<closed: file 7f3a4ed46310> #vu8(60 33 # ?) ?)
In ice-9/boot-9.scm:
  1669:16  1 (raise-exception _ #:continuable? _)
  1669:16  0 (raise-exception _ #:continuable? _)
ice-9/boot-9.scm:1669:16: In procedure raise-exception:
--8<---------------cut here---------------end--------------->8---

Thanks,

Mathieu





* bug#42548: Cuirass 504 errors
  2020-07-26 16:10 bug#42548: Cuirass 504 errors Mathieu Othacehe
@ 2020-07-27 22:11 ` zimoun
  2020-07-28  7:32   ` Mathieu Othacehe
  2020-07-30 14:47 ` Mathieu Othacehe
  1 sibling, 1 reply; 8+ messages in thread
From: zimoun @ 2020-07-27 22:11 UTC
  To: Mathieu Othacehe, 42548

Hi Mathieu,

On Sun, 26 Jul 2020 at 18:10, Mathieu Othacehe <othacehe@gnu.org> wrote:

> A second issue is caused when a build product download is started, then
> aborted. In that case, sendfile throws an exception or enters an endless
> loop.

What do you mean by “build product download is started, then aborted”?

Cheers,
simon





* bug#42548: Cuirass 504 errors
  2020-07-27 22:11 ` zimoun
@ 2020-07-28  7:32   ` Mathieu Othacehe
  2020-07-28  8:49     ` zimoun
  0 siblings, 1 reply; 8+ messages in thread
From: Mathieu Othacehe @ 2020-07-28  7:32 UTC
  To: zimoun; +Cc: 42548


Hey zimoun,

> What do you mean by “build product download is started, then aborted”?

Here I mean clicking on the downloadable image here[1] and then hitting
"cancel" when the download popup appears, or the abort button later on,
once the download has started.

Thanks,

Mathieu

[1]: https://ci.guix.gnu.org/build/3031091/details





* bug#42548: Cuirass 504 errors
  2020-07-28  7:32   ` Mathieu Othacehe
@ 2020-07-28  8:49     ` zimoun
  2020-07-28 14:56       ` Mathieu Othacehe
  0 siblings, 1 reply; 8+ messages in thread
From: zimoun @ 2020-07-28  8:49 UTC
  To: Mathieu Othacehe; +Cc: 42548

Hi Mathieu,

On Tue, 28 Jul 2020 at 09:32, Mathieu Othacehe <othacehe@gnu.org> wrote:

> Here I mean clicking on the downloadable image here[1] and then hitting
> "cancel" when the download popup appears, or the abort button later on,
> once the download has started.

Ah, that’s annoying indeed. :-)

And does it mess up Cuirass if the connection is lost, e.g. when the
network goes down?

Cheers,
simon





* bug#42548: Cuirass 504 errors
  2020-07-28  8:49     ` zimoun
@ 2020-07-28 14:56       ` Mathieu Othacehe
  2020-08-04 16:48         ` Mathieu Othacehe
  0 siblings, 1 reply; 8+ messages in thread
From: Mathieu Othacehe @ 2020-07-28 14:56 UTC
  To: zimoun; +Cc: 42548


> And does it mess up Cuirass if the connection is lost, e.g. when the
> network goes down?

Not sure yet, I also found this message:

--8<---------------cut here---------------start------------->8---
Uncaught exception in fiber ##f:
In ice-9/boot-9.scm:
  1736:10  5 (with-exception-handler _ _ #:unwind? _ # _)
In web/server/fiberized.scm:
   160:26  4 (_)
In ice-9/suspendable-ports.scm:
     83:4  3 (write-bytes #<closed: file 7ff11c2dec40> #vu8(60 33 ?) ?)
In unknown file:
           2 (port-write #<closed: file 7ff11c2dec40> #vu8(60 33 # ?) ?)
In ice-9/boot-9.scm:
  1669:16  1 (raise-exception _ #:continuable? _)
  1669:16  0 (raise-exception _ #:continuable? _)
ice-9/boot-9.scm:1669:16: In procedure raise-exception:
In procedure fport_write: Broken pipe
--8<---------------cut here---------------end--------------->8---

that suggests that we try to write something to a closed file.

To be investigated :)

Mathieu





* bug#42548: Cuirass 504 errors
  2020-07-26 16:10 bug#42548: Cuirass 504 errors Mathieu Othacehe
  2020-07-27 22:11 ` zimoun
@ 2020-07-30 14:47 ` Mathieu Othacehe
  1 sibling, 0 replies; 8+ messages in thread
From: Mathieu Othacehe @ 2020-07-30 14:47 UTC
  To: 42548


Hey,

> A second issue is caused when a build product download is started, then
> aborted. In that case, sendfile throws an exception or enters an endless
> loop.

Ok, so I found a couple of errors here. First, I noticed that it was not
possible to download two build products simultaneously, because the
first download was blocking the whole process.

This is solved by 6ad9c602697ffe33c8fbb09ccd796b74bf600223. In short,
current-fiber was set to #f, both in the context of the caller and in
the spawned thread. So I think the get-message operation was blocking
the whole thread instead of suspending the current fiber. But if someone
else could take a look, that would be nice :).
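
To illustrate the difference between blocking the whole thread and
suspending only the current task, here is a minimal sketch. It is an
analogy written with Python's asyncio, not the Cuirass or Guile Fibers
code; blocking_read, count_ticks and run are hypothetical names. A
"ticker" task counts how often the scheduler gets to run while a slow
call is in progress:

```python
import asyncio
import time


def blocking_read(seconds):
    """Stand-in for a call that blocks the calling OS thread."""
    time.sleep(seconds)
    return "done"


async def count_ticks(stop):
    """Count how often the event loop gets to run until STOP is set."""
    ticks = 0
    while not stop.is_set():
        ticks += 1
        await asyncio.sleep(0.01)
    return ticks


async def run(offload):
    stop = asyncio.Event()
    ticker = asyncio.create_task(count_ticks(stop))
    await asyncio.sleep(0)  # give the ticker a chance to start
    if offload:
        # The blocking call runs in a worker thread; only this task waits.
        await asyncio.to_thread(blocking_read, 0.2)
    else:
        # The blocking call runs on the loop thread; every other task stalls.
        blocking_read(0.2)
    stop.set()
    return await ticker


ticks_blocked = asyncio.run(run(offload=False))
ticks_offloaded = asyncio.run(run(offload=True))
print(ticks_blocked, ticks_offloaded)
```

When the call runs on the scheduler's own thread, the ticker barely
advances; when it is offloaded, the ticker keeps running, which is the
behaviour wanted for two simultaneous downloads.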

Second issue: sendfile may throw EPIPE or ECONNRESET if the client
disconnects before the end of the transfer. I think that, besides the
dirty backtrace, it was not harmful. But anyway, it's better to catch
this as we are doing in "guix publish", see:
0955a11abd9e27c96a1375cca6a1c97869b5780a.
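
The idea of catching EPIPE and ECONNRESET around the transfer can be
sketched as follows. This is a Python illustration under stated
assumptions, not the code from the commit above: send_file_quietly is a
hypothetical helper, and the aborted download is simulated by closing
one end of a socket pair before sending.

```python
import socket
import tempfile


def send_file_quietly(sock, path):
    """Send the file at PATH over SOCK; return False if the client went away."""
    try:
        with open(path, "rb") as f:
            sock.sendfile(f)
        return True
    except (BrokenPipeError, ConnectionResetError):
        # EPIPE / ECONNRESET: the peer disconnected during the transfer.
        return False


# Simulate an aborted download: the "client" end is closed before we send.
server, client = socket.socketpair()
client.close()

with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"x" * 4096)
    tmp.flush()
    result = send_file_quietly(server, tmp.name)

server.close()
print(result)
```

Only the two "client went away" errors are swallowed; any other error
still propagates, so genuine problems keep producing a backtrace.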

I fear it won't be enough to fix the 504 errors, but at least it's a
start.

Thanks,

Mathieu





* bug#42548: Cuirass 504 errors
  2020-07-28 14:56       ` Mathieu Othacehe
@ 2020-08-04 16:48         ` Mathieu Othacehe
  2020-08-06  8:16           ` Mathieu Othacehe
  0 siblings, 1 reply; 8+ messages in thread
From: Mathieu Othacehe @ 2020-08-04 16:48 UTC
  To: 42548


Hello,

> that suggests that we try to write something to a closed file.
>
> To be investigated :)

Ok, so I have a better grasp of what's going on. The Cuirass web server
was receiving some requests such as "/builds/1234)" which were not
rejected but, worse, caused SQL queries such as "select * from Builds".

As the table is quite large, it caused some of the DB workers to
hang. Once all DB workers were hanging, the queries started to
accumulate until the open fd limit (1024) was reached.

I consolidated the validation of HTTP requests, and the Cuirass web
server has now been running for 48 hours, which has not happened in
months, I think.
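
A minimal sketch of this kind of strict request validation, in Python
for illustration only. The route shape "/build/<id>", the names
BUILD_PATH, parse_build_id and handle, and the "id" column are
assumptions, not the actual Cuirass code. The point is that a malformed
path is rejected before any query is built, and that a valid one maps to
a single-row, parameterized query:

```python
import re

# Hypothetical route shape: "/build/<numeric id>".
BUILD_PATH = re.compile(r"/build/([0-9]+)")


def parse_build_id(path):
    """Return the build id as an int, or None if PATH is malformed."""
    # fullmatch rejects trailing junk such as the ")" in "/build/1234)".
    match = BUILD_PATH.fullmatch(path)
    return int(match.group(1)) if match else None


def handle(path):
    """Return (status, query); the query is None for rejected requests."""
    build_id = parse_build_id(path)
    if build_id is None:
        return 404, None  # reject early, never touch the database
    # Parameterized single-row lookup instead of an unbounded table scan.
    return 200, ("SELECT * FROM Builds WHERE id = ?", (build_id,))


print(handle("/build/1234"))
print(handle("/builds/1234)"))
```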

I also added some warnings to detect DB workers hanging for more than 5
seconds. The next step is to log all SQL queries using [1]. This should
allow us to spot this kind of issue more easily. Logging the duration
of each query should also help us optimize the queries.
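
Logging the duration of each query could look roughly like this. It is
a Python sketch using the standard sqlite3 module, not the
guile-sqlite3 change referenced in [1]; timed_query and the table
layout are hypothetical, and only the 5-second threshold comes from the
message above. Note that it reports a slow query only once the query
returns; detecting a worker that is still hanging needs a separate
timer.

```python
import logging
import sqlite3
import time

log = logging.getLogger("db")
SLOW_SECONDS = 5.0  # threshold mentioned above


def timed_query(conn, sql, params=(), slow=SLOW_SECONDS):
    """Run SQL, log its duration, and warn when it exceeds SLOW seconds."""
    start = time.monotonic()
    rows = conn.execute(sql, params).fetchall()
    elapsed = time.monotonic() - start
    level = logging.WARNING if elapsed > slow else logging.DEBUG
    log.log(level, "%.3fs %s %r", elapsed, sql, params)
    return rows, elapsed


# Small in-memory table standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Builds (id INTEGER PRIMARY KEY, status INTEGER)")
conn.executemany("INSERT INTO Builds VALUES (?, ?)", [(i, 0) for i in range(100)])

rows, elapsed = timed_query(conn, "SELECT * FROM Builds WHERE id = ?", (42,))
print(rows)
```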

I'm still waiting a few days before closing this issue.

Thanks,

Mathieu

[1]: https://notabug.org/guile-sqlite3/guile-sqlite3/pulls/16 





* bug#42548: Cuirass 504 errors
  2020-08-04 16:48         ` Mathieu Othacehe
@ 2020-08-06  8:16           ` Mathieu Othacehe
  0 siblings, 0 replies; 8+ messages in thread
From: Mathieu Othacehe @ 2020-08-06  8:16 UTC
  To: 42548-done


Hello,

> I'm still waiting a few days before closing this issue.

No issues so far, closing this one.

Mathieu



