unofficial mirror of meta@public-inbox.org
* Occasional public-inbox-httpd flakiness
@ 2024-11-05 22:31 Jonathan Corbet
  2024-11-05 23:24 ` Eric Wong
  0 siblings, 1 reply; 10+ messages in thread
From: Jonathan Corbet @ 2024-11-05 22:31 UTC (permalink / raw)
  To: meta

The LWN archive server is running Debian's 1.9.0 public-inbox package.
Every now and then, usually after at least a week of operation, HTTP
requests will start returning empty messages; I find stuff like this in
the log:

Nov 05 15:24:37 archive2.lwn.net public-inbox-httpd[1267166]: Use of uninitialized value in index at /usr/share/perl5/PublicInbox/Eml.pm line 109.
Nov 05 15:24:37 archive2.lwn.net public-inbox-httpd[1267166]: Use of uninitialized value in index at /usr/share/perl5/PublicInbox/Eml.pm line 109.
Nov 05 15:24:37 archive2.lwn.net public-inbox-httpd[1267166]: Use of uninitialized value in pattern match (m//) at /usr/share/perl5/PublicInbox/Eml.pm line 115.
Nov 05 15:24:37 archive2.lwn.net public-inbox-httpd[1267166]: Use of uninitialized value in pattern match (m//) at /usr/share/perl5/PublicInbox/Eml.pm line 120.
Nov 05 15:24:37 archive2.lwn.net public-inbox-httpd[1267166]: Use of uninitialized value in pattern match (m//) at /usr/share/perl5/PublicInbox/Eml.pm line 127.
Nov 05 15:24:37 archive2.lwn.net public-inbox-httpd[1267166]: Use of uninitialized value in substitution (s///) at /usr/share/perl5/PublicInbox/Mbox.pm line 115.

The pattern is pretty much always the same.  Restarting
public-inbox-httpd makes the problem go away again.

Is this a problem that anybody else has seen, or am I especially
lucky...?

Thanks,

jon

* Re: Occasional public-inbox-httpd flakiness
  2024-11-05 22:31 Occasional public-inbox-httpd flakiness Jonathan Corbet
@ 2024-11-05 23:24 ` Eric Wong
  2024-11-05 23:29   ` Jonathan Corbet
  2024-11-12 19:14   ` Jonathan Corbet
  0 siblings, 2 replies; 10+ messages in thread
From: Eric Wong @ 2024-11-05 23:24 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: meta

Jonathan Corbet <corbet@lwn.net> wrote:
> The LWN archive server is running Debian's 1.9.0 public-inbox package.
> Every now and then, usually after at least a week of operation, HTTP
> requests will start returning empty messages; I find stuff like this in
> the log:
> 
> Nov 05 15:24:37 archive2.lwn.net public-inbox-httpd[1267166]: Use of uninitialized value in index at /usr/share/perl5/PublicInbox/Eml.pm line 109.
> Nov 05 15:24:37 archive2.lwn.net public-inbox-httpd[1267166]: Use of uninitialized value in index at /usr/share/perl5/PublicInbox/Eml.pm line 109.
> Nov 05 15:24:37 archive2.lwn.net public-inbox-httpd[1267166]: Use of uninitialized value in pattern match (m//) at /usr/share/perl5/PublicInbox/Eml.pm line 115.
> Nov 05 15:24:37 archive2.lwn.net public-inbox-httpd[1267166]: Use of uninitialized value in pattern match (m//) at /usr/share/perl5/PublicInbox/Eml.pm line 120.
> Nov 05 15:24:37 archive2.lwn.net public-inbox-httpd[1267166]: Use of uninitialized value in pattern match (m//) at /usr/share/perl5/PublicInbox/Eml.pm line 127.
> Nov 05 15:24:37 archive2.lwn.net public-inbox-httpd[1267166]: Use of uninitialized value in substitution (s///) at /usr/share/perl5/PublicInbox/Mbox.pm line 115.

Definitely something that's popped up in my recollection; but
hasn't happened in a while for Eml.pm and Mbox.pm (yeah, it's
been a while since v1.9 :x).

Are the git cat-file (or Gcf2) processes still running?  Are any
successful responses returned for requests to mail messages?

Error handling should be improved in .git nowadays but I'm still
struggling to get a release out due to coderepo <=> inbox mapping
messiness :<

> The pattern is pretty much always the same.  Restarting
> public-inbox-httpd makes the problem go away again.

Usually, uninitialized value errors are isolated to a single
request (e.g. broken emails) and there shouldn't be a
need to restart unless every request is failing.

> Is this a problem that anybody else has seen, or am I especially
> lucky...?

I've had some similar problems from inboxes/coderepos getting
removed; also there's OOMs on my HW causing git processes to fail.

* Re: Occasional public-inbox-httpd flakiness
  2024-11-05 23:24 ` Eric Wong
@ 2024-11-05 23:29   ` Jonathan Corbet
  2024-11-05 23:34     ` Eric Wong
  2024-11-12 19:14   ` Jonathan Corbet
  1 sibling, 1 reply; 10+ messages in thread
From: Jonathan Corbet @ 2024-11-05 23:29 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong <e@80x24.org> writes:

> Jonathan Corbet <corbet@lwn.net> wrote:
>> The LWN archive server is running Debian's 1.9.0 public-inbox package.
>> Every now and then, usually after at least a week of operation, HTTP
>> requests will start returning empty messages; I find stuff like this in
>> the log:
>> 
>> Nov 05 15:24:37 archive2.lwn.net public-inbox-httpd[1267166]: Use of uninitialized value in index at /usr/share/perl5/PublicInbox/Eml.pm line 109.
>> Nov 05 15:24:37 archive2.lwn.net public-inbox-httpd[1267166]: Use of uninitialized value in index at /usr/share/perl5/PublicInbox/Eml.pm line 109.
>> Nov 05 15:24:37 archive2.lwn.net public-inbox-httpd[1267166]: Use of uninitialized value in pattern match (m//) at /usr/share/perl5/PublicInbox/Eml.pm line 115.
>> Nov 05 15:24:37 archive2.lwn.net public-inbox-httpd[1267166]: Use of uninitialized value in pattern match (m//) at /usr/share/perl5/PublicInbox/Eml.pm line 120.
>> Nov 05 15:24:37 archive2.lwn.net public-inbox-httpd[1267166]: Use of uninitialized value in pattern match (m//) at /usr/share/perl5/PublicInbox/Eml.pm line 127.
>> Nov 05 15:24:37 archive2.lwn.net public-inbox-httpd[1267166]: Use of uninitialized value in substitution (s///) at /usr/share/perl5/PublicInbox/Mbox.pm line 115.
>
> Definitely something that's popped up in my recollection; but
> hasn't happened in a while for Eml.pm and Mbox.pm (yeah, it's
> been a while since v1.9 :x).
>
> Are the git cat-file (or Gcf2) processes still running?  Are any
> successful responses returned for requests to mail messages?

As for the first, I'll have to look the next time it happens - I know of
no way to force that, so it's a matter of waiting.

There are definitely still successful responses; my guess has always
been that one of the public-inbox-httpd processes has gone weird while
the others still work.

>> The pattern is pretty much always the same.  Restarting
>> public-inbox-httpd makes the problem go away again.
>
> Usually, uninitialized value errors are isolated to a single
> request (e.g. broken emails) and there shouldn't be a
> need to restart unless every request is failing.

It's not the email that is the issue - a specific URL that fails before
the restart will work afterward.

>> Is this a problem that anybody else has seen, or am I especially
>> lucky...?
>
> I've had some similar problems from inboxes/coderepos getting
> removed; also there's OOMs on my HW causing git processes to fail.

The system as a whole is far from any sort of OOM state; that was one of
the first things I looked for.

Thanks,

jon

* Re: Occasional public-inbox-httpd flakiness
  2024-11-05 23:29   ` Jonathan Corbet
@ 2024-11-05 23:34     ` Eric Wong
  0 siblings, 0 replies; 10+ messages in thread
From: Eric Wong @ 2024-11-05 23:34 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: meta

Jonathan Corbet <corbet@lwn.net> wrote:
> There are still definitely successful responses; my guess has always
> been that one of the public-inbox-httpd processes has gone weird while
> the others still work.

Btw, if you're running with the -W<num> switch for multiple
workers, you can kill the specific worker and leave the rest of
the workers, too.
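
A minimal sketch of how one might pick out the workers, assuming procps
`ps` and assuming that workers show up as public-inbox-httpd processes
whose parent is itself a public-inbox-httpd process (the master).  The
helper name `httpd_workers` is made up for illustration; it only prints
PIDs and kills nothing.

```shell
# Hypothetical helper: print PIDs of public-inbox-httpd workers.
# Reads "PID PPID ARGS..." lines on stdin, one process per line.
# A process counts as a worker when its parent PID also belongs to a
# public-inbox-httpd process (assumed to be the master).
httpd_workers() {
    awk '/public-inbox-httpd/ && $3 !~ /awk$/ { pid[$1] = 1; ppid[$1] = $2 }
         END { for (p in pid) if (ppid[p] in pid) print p }' | sort -n
}

# Usage (the trailing "=" suppresses ps column headers):
#   ps -eo pid=,ppid=,args= | httpd_workers
# Then, once the worker logging the warnings is identified:
#   kill "$bad_pid"
```

The PID in the syslog prefix (public-inbox-httpd[1267166] in the log
above) is the process that emitted the warnings, so it is the natural
candidate to compare against this list.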

* Re: Occasional public-inbox-httpd flakiness
  2024-11-05 23:24 ` Eric Wong
  2024-11-05 23:29   ` Jonathan Corbet
@ 2024-11-12 19:14   ` Jonathan Corbet
  2024-11-12 19:20     ` Eric Wong
  1 sibling, 1 reply; 10+ messages in thread
From: Jonathan Corbet @ 2024-11-12 19:14 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong <e@80x24.org> writes:

> Jonathan Corbet <corbet@lwn.net> wrote:
>> The LWN archive server is running Debian's 1.9.0 public-inbox package.
>> Every now and then, usually after at least a week of operation, HTTP
>> requests will start returning empty messages; I find stuff like this in
>> the log:
>> 
>> Nov 05 15:24:37 archive2.lwn.net public-inbox-httpd[1267166]: Use of uninitialized value in index at /usr/share/perl5/PublicInbox/Eml.pm line 109.
>> Nov 05 15:24:37 archive2.lwn.net public-inbox-httpd[1267166]: Use of uninitialized value in index at /usr/share/perl5/PublicInbox/Eml.pm line 109.
>> Nov 05 15:24:37 archive2.lwn.net public-inbox-httpd[1267166]: Use of uninitialized value in pattern match (m//) at /usr/share/perl5/PublicInbox/Eml.pm line 115.
>> Nov 05 15:24:37 archive2.lwn.net public-inbox-httpd[1267166]: Use of uninitialized value in pattern match (m//) at /usr/share/perl5/PublicInbox/Eml.pm line 120.
>> Nov 05 15:24:37 archive2.lwn.net public-inbox-httpd[1267166]: Use of uninitialized value in pattern match (m//) at /usr/share/perl5/PublicInbox/Eml.pm line 127.
>> Nov 05 15:24:37 archive2.lwn.net public-inbox-httpd[1267166]: Use of uninitialized value in substitution (s///) at /usr/share/perl5/PublicInbox/Mbox.pm line 115.
>
> Definitely something that's popped up in my recollection; but
> hasn't happened in a while for Eml.pm and Mbox.pm (yeah, it's
> been a while since v1.9 :x).
>
> Are the git cat-file (or Gcf2) processes still running?  Are any
> successful responses returned for requests to mail messages?

Just to add a data point...the problem just recurred, and there are
definitely cat-file processes running:

$  ps ax | fgrep git
2640024 ?        S      0:40 /usr/bin/git --git-dir=repos/ALL.git -c core.abbrev=40 cat-file --batch
2735080 ?        S      0:18 /usr/bin/git --git-dir=repos/ALL.git -c core.abbrev=40 cat-file --batch
3184082 ?        S      0:03 /usr/bin/git --git-dir=repos/ALL.git -c core.abbrev=40 cat-file --batch
3723223 ?        Z      0:00 [git] <defunct>
3723227 ?        Z      0:00 [git] <defunct>

jon

* Re: Occasional public-inbox-httpd flakiness
  2024-11-12 19:14   ` Jonathan Corbet
@ 2024-11-12 19:20     ` Eric Wong
  2024-11-12 21:25       ` Jonathan Corbet
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Wong @ 2024-11-12 19:20 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: meta

Jonathan Corbet <corbet@lwn.net> wrote:
> Just to add a data point...the problem just recurred, and there are
> definitely cat-file processes running:
> 
> $  ps ax | fgrep git
> 2640024 ?        S      0:40 /usr/bin/git --git-dir=repos/ALL.git -c core.abbrev=40 cat-file --batch
> 2735080 ?        S      0:18 /usr/bin/git --git-dir=repos/ALL.git -c core.abbrev=40 cat-file --batch
> 3184082 ?        S      0:03 /usr/bin/git --git-dir=repos/ALL.git -c core.abbrev=40 cat-file --batch
> 3723223 ?        Z      0:00 [git] <defunct>
> 3723227 ?        Z      0:00 [git] <defunct>
 
Can you see if the worker process causing warnings is connected to defunct gits?
Should've been fixed in master a while ago, but there's a lot of changes :x
master should be fine as long as you're not using -cindex + coderepos yet.

* Re: Occasional public-inbox-httpd flakiness
  2024-11-12 19:20     ` Eric Wong
@ 2024-11-12 21:25       ` Jonathan Corbet
  2024-11-12 21:41         ` Eric Wong
  0 siblings, 1 reply; 10+ messages in thread
From: Jonathan Corbet @ 2024-11-12 21:25 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong <e@80x24.org> writes:

> Jonathan Corbet <corbet@lwn.net> wrote:
>> Just to add a data point...the problem just recurred, and there are
>> definitely cat-file processes running:
>> 
>> $  ps ax | fgrep git
>> 2640024 ?        S      0:40 /usr/bin/git --git-dir=repos/ALL.git -c core.abbrev=40 cat-file --batch
>> 2735080 ?        S      0:18 /usr/bin/git --git-dir=repos/ALL.git -c core.abbrev=40 cat-file --batch
>> 3184082 ?        S      0:03 /usr/bin/git --git-dir=repos/ALL.git -c core.abbrev=40 cat-file --batch
>> 3723223 ?        Z      0:00 [git] <defunct>
>> 3723227 ?        Z      0:00 [git] <defunct>
>  
> Can you see if the worker process causing warnings is connected to defunct gits?
> Should've been fixed in master a while ago, but there's a lot of changes :x
> master should be fine as long as you're not using -cindex + coderepos yet.

By "connected to" you mean "is the parent of"?

I've long since restarted things - the LWN show must go on - but can
certainly look for that the next time around.

Thanks,

jon

* Re: Occasional public-inbox-httpd flakiness
  2024-11-12 21:25       ` Jonathan Corbet
@ 2024-11-12 21:41         ` Eric Wong
  2024-11-12 21:46           ` Jonathan Corbet
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Wong @ 2024-11-12 21:41 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: meta

Jonathan Corbet <corbet@lwn.net> wrote:
> Eric Wong <e@80x24.org> writes:
> > Jonathan Corbet <corbet@lwn.net> wrote:
> >> Just to add a data point...the problem just recurred, and there are
> >> definitely cat-file processes running:
> >> 
> >> $  ps ax | fgrep git
> >> 2640024 ?        S      0:40 /usr/bin/git --git-dir=repos/ALL.git -c core.abbrev=40 cat-file --batch
> >> 2735080 ?        S      0:18 /usr/bin/git --git-dir=repos/ALL.git -c core.abbrev=40 cat-file --batch
> >> 3184082 ?        S      0:03 /usr/bin/git --git-dir=repos/ALL.git -c core.abbrev=40 cat-file --batch
> >> 3723223 ?        Z      0:00 [git] <defunct>
> >> 3723227 ?        Z      0:00 [git] <defunct>
> >  
> > Can you see if the worker process causing warnings is connected to defunct gits?
> > Should've been fixed in master a while ago, but there's a lot of changes :x
> > master should be fine as long as you're not using -cindex + coderepos yet.
> 
> By "connected to" you mean "is the parent of"?

Yes.  lsof +E should show how pipes are connecting processes.
Just wondering, those git zombies lingered until the restart, right?

IOW, they didn't disappear after a few seconds if the -httpd
worker was busy with other things.  During heavy traffic you'll
inevitably see short-lived zombies as the -httpd may not reap
fast enough, but zombies shouldn't linger indefinitely.

> I've long since restarted things - the LWN show must go on - but can
> certainly look for that the next time around.

Understood.
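
A rough sketch of the parent check discussed above, assuming procps
`ps` on Linux.  The helper name `zombie_parents` is hypothetical.  It
lists each zombie with its parent PID, so the parent can be compared
against the worker PID seen in the log.  Note that etime is the time
since the process started, so it is only an upper bound on how long
the process has been a zombie.

```shell
# Hypothetical helper: report zombies and their parents.
# Reads "PID PPID STAT ETIME COMM" lines on stdin and prints
# "PID PPID ETIME" for every process whose state starts with Z.
zombie_parents() {
    awk '$3 ~ /^Z/ { print $1, $2, $4 }'
}

# Usage (the trailing "=" suppresses ps column headers):
#   ps -eo pid=,ppid=,stat=,etime=,comm= | zombie_parents
# Pipe endpoints of a suspect worker, as suggested above:
#   lsof +E -p "$worker_pid"
```

Running the `ps` line a few times, some seconds apart, would show
whether the same zombie PIDs persist or get reaped and replaced.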

* Re: Occasional public-inbox-httpd flakiness
  2024-11-12 21:41         ` Eric Wong
@ 2024-11-12 21:46           ` Jonathan Corbet
  2024-11-12 21:54             ` Eric Wong
  0 siblings, 1 reply; 10+ messages in thread
From: Jonathan Corbet @ 2024-11-12 21:46 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong <e@80x24.org> writes:

> Jonathan Corbet <corbet@lwn.net> wrote:
>> Eric Wong <e@80x24.org> writes:
>> > Jonathan Corbet <corbet@lwn.net> wrote:
>> >> Just to add a data point...the problem just recurred, and there are
>> >> definitely cat-file processes running:
>> >> 
>> >> $  ps ax | fgrep git
>> >> 2640024 ?        S      0:40 /usr/bin/git --git-dir=repos/ALL.git -c core.abbrev=40 cat-file --batch
>> >> 2735080 ?        S      0:18 /usr/bin/git --git-dir=repos/ALL.git -c core.abbrev=40 cat-file --batch
>> >> 3184082 ?        S      0:03 /usr/bin/git --git-dir=repos/ALL.git -c core.abbrev=40 cat-file --batch
>> >> 3723223 ?        Z      0:00 [git] <defunct>
>> >> 3723227 ?        Z      0:00 [git] <defunct>
>> >  
>> > Can you see if the worker process causing warnings is connected to defunct gits?
>> > Should've been fixed in master a while ago, but there's a lot of changes :x
>> > master should be fine as long as you're not using -cindex + coderepos yet.
>> 
>> By "connected to" you mean "is the parent of"?
>
> Yes.  lsof +E should show how pipes are connecting processes.
> Just wondering, those git zombies lingered until the restart, right?
>
> IOW, they didn't disappear after a few seconds if the -httpd
> worker was busy with other things.  During heavy traffic you'll
> inevitably see short-lived zombies as the -httpd may not reap
> fast enough, but zombies shouldn't linger indefinitely.

Looking back through the terminal history, it looks like the zombies
hung out for a bit, but then went away.  There were a couple of zombies
every time I looked, but the PIDs eventually changed.

Thanks,

jon

* Re: Occasional public-inbox-httpd flakiness
  2024-11-12 21:46           ` Jonathan Corbet
@ 2024-11-12 21:54             ` Eric Wong
  0 siblings, 0 replies; 10+ messages in thread
From: Eric Wong @ 2024-11-12 21:54 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: meta

Jonathan Corbet <corbet@lwn.net> wrote:
> Eric Wong <e@80x24.org> writes:
> > Jonathan Corbet <corbet@lwn.net> wrote:
> >> Eric Wong <e@80x24.org> writes:
> >> > Jonathan Corbet <corbet@lwn.net> wrote:
> >> >> Just to add a data point...the problem just recurred, and there are
> >> >> definitely cat-file processes running:
> >> >> 
> >> >> $  ps ax | fgrep git
> >> >> 2640024 ?        S      0:40 /usr/bin/git --git-dir=repos/ALL.git -c core.abbrev=40 cat-file --batch
> >> >> 2735080 ?        S      0:18 /usr/bin/git --git-dir=repos/ALL.git -c core.abbrev=40 cat-file --batch
> >> >> 3184082 ?        S      0:03 /usr/bin/git --git-dir=repos/ALL.git -c core.abbrev=40 cat-file --batch
> >> >> 3723223 ?        Z      0:00 [git] <defunct>
> >> >> 3723227 ?        Z      0:00 [git] <defunct>
> >> >  
> >> > Can you see if the worker process causing warnings is connected to defunct gits?
> >> > Should've been fixed in master a while ago, but there's a lot of changes :x
> >> > master should be fine as long as you're not using -cindex + coderepos yet.
> >> 
> >> By "connected to" you mean "is the parent of"?
> >
> > Yes.  lsof +E should show how pipes are connecting processes.
> > Just wondering, those git zombies lingered until the restart, right?
> >
> > IOW, they didn't disappear after a few seconds if the -httpd
> > worker was busy with other things.  During heavy traffic you'll
> > inevitably see short-lived zombies as the -httpd may not reap
> > fast enough, but zombies shouldn't linger indefinitely.
> 
> Looking back through the terminal history, it looks like the zombies
> hung out for a bit, but then went away.  There were a couple of zombies
> every time I looked, but the PIDs eventually changed.

OK.  Any idea how long the zombies lingered?

How much load does the LWN -httpd instance see?

AFAIK, the main cause of zombies I've seen was from the blob solver,
but that requires coderepos, which AFAIK nobody else ever used...

Also, it would likely be useful if you got an strace on the worker PID
that was causing warnings.
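
A sketch of one possible strace invocation, not a prescribed one.  The
helper name `trace_cmd` and the output path under /tmp are assumptions
made for illustration.  The helper only prints the command so it can
be reviewed before running it as a user allowed to attach to the
worker.

```shell
# Hypothetical helper: print an strace command line for a worker PID.
#   -f   follow child processes (e.g. git spawned by the worker)
#   -tt  prefix each line with a timestamp including microseconds
#   -o   write the trace to a file instead of stderr
#   -p   attach to an already-running process
trace_cmd() {
    case "$1" in
        ''|*[!0-9]*) echo "usage: trace_cmd PID" >&2; return 1 ;;
    esac
    printf 'strace -f -tt -o /tmp/httpd-%s.strace -p %s\n' "$1" "$1"
}

# Usage, with the PID taken from the syslog prefix of the warnings:
#   trace_cmd 1267166
```

Reproducing a failing request while the trace is attached should
capture the reads and writes on the pipes to git.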

Thread overview: 10+ messages
2024-11-05 22:31 Occasional public-inbox-httpd flakiness Jonathan Corbet
2024-11-05 23:24 ` Eric Wong
2024-11-05 23:29   ` Jonathan Corbet
2024-11-05 23:34     ` Eric Wong
2024-11-12 19:14   ` Jonathan Corbet
2024-11-12 19:20     ` Eric Wong
2024-11-12 21:25       ` Jonathan Corbet
2024-11-12 21:41         ` Eric Wong
2024-11-12 21:46           ` Jonathan Corbet
2024-11-12 21:54             ` Eric Wong
