* Race condition in public-inbox-nntpd?
@ 2018-10-13 18:46 Jonathan Corbet
  2018-10-16  6:36 ` Eric Wong
  2018-12-06  3:59 ` [PATCH] nntp: prevent event_read from firing twice in a row Eric Wong
  0 siblings, 2 replies; 4+ messages in thread

From: Jonathan Corbet @ 2018-10-13 18:46 UTC (permalink / raw)
To: meta; +Cc: Eric Wong

So for a little while, I've been seeing occasional NNTP weirdness,
associated with log lines like:

Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: error from:
XOVER 196731-196731 (BUG: nested long response at /usr/local/share/perl5/PublicInbox/NNTP.pm line 588.

Such complaints tend to be immediately followed by significant
disgruntlement on the client side.

I use gnus to read the various NNTP feeds, and I mark articles (with "u")
when I want to keep them around; I typically have quite a few of them
marked at any given time in a group like linux-kernel.  When I open the
group in gnus, it will do an XOVER on each of those marked articles,
generating dozens of single-article XOVERs in quick succession.  It's
always the single-article ones that fail; the big XOVER for all of the
new stuff at the end works just fine.

Another user has complained that things fail with Thunderbird, again
with the same symptoms on the server side.

I have "fixed" the problem with this little patch:

diff --git a/lib/PublicInbox/NNTP.pm b/lib/PublicInbox/NNTP.pm
index 022bb80..017ad88 100644
--- a/lib/PublicInbox/NNTP.pm
+++ b/lib/PublicInbox/NNTP.pm
@@ -951,6 +951,10 @@ sub event_read {
 		$self->{rbuf} .= $$buf;
 	}
 	my $r = 1;
+	if ($self->{long_res}) {
+		err($self, "long-res on event read");
+		$r = 0;
+	}
 	while ($r > 0 && $self->{rbuf} =~ s/\A\s*([^\r\n]*)\r?\n//) {
 		my $line = $1;
 		return $self->close if $line =~ /[[:cntrl:]]/s;

That makes things work, but it is clearly papering over the real
problem.  I've spent a fair while staring at the code.
As far as I can tell, the logic there should be sufficient to prevent
this from happening; it's not supposed to be reading while a long
response is in the works.  But somehow it happens.

Does anybody have any thoughts on how this could be coming about?

Thanks,

jon

^ permalink raw reply related	[flat|nested] 4+ messages in thread
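[Editor's sketch: the failure mode described above can be modeled outside of Perl. The toy Ruby event-loop below is illustrative only — class and method names are invented, not the Danga::Socket API — and shows why delivering a read-readiness event while a deferred "long response" is still draining trips the nested-response check.]

```ruby
# Toy model of the "BUG: nested long response" condition.
# A connection may have at most one deferred (long) response in
# progress; starting a second one before the first is drained is a bug.
class ToyConn
  attr_reader :long_res

  def initialize
    @long_res = nil
  end

  # Starting a long response while one is still active is exactly what
  # the server logs as "BUG: nested long response".
  def long_response(&blk)
    raise 'BUG: nested long response' if @long_res
    @long_res = blk
  end

  # One chunk of deferred work; clears long_res when the block is done.
  def tick
    @long_res = nil if @long_res.call
  end

  # Read-readiness callback: parse a command, maybe start a long response.
  def event_read(cmd)
    long_response { true } if cmd.start_with?('XOVER')
  end
end

conn = ToyConn.new
conn.event_read('XOVER 1-1')   # first pipelined command starts a long response
nested_err = nil
begin
  # If the event loop delivers read-readiness again *before* tick() has
  # drained the previous long response, the two nest:
  conn.event_read('XOVER 2-2')
rescue RuntimeError => e
  nested_err = e.message
end
conn.tick                      # drain the deferred response
conn.event_read('XOVER 3-3')   # now a new long response is fine
puts nested_err                # => BUG: nested long response
```

Two XOVERs dispatched back-to-back without a tick in between reproduce the error; once the pending response is drained, the same command succeeds.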
* Re: Race condition in public-inbox-nntpd?
  2018-10-13 18:46 Race condition in public-inbox-nntpd? Jonathan Corbet
@ 2018-10-16  6:36 ` Eric Wong
  2018-10-16 14:04   ` Jonathan Corbet
  2018-12-06  3:59 ` [PATCH] nntp: prevent event_read from firing twice in a row Eric Wong
  1 sibling, 1 reply; 4+ messages in thread

From: Eric Wong @ 2018-10-16 6:36 UTC (permalink / raw)
To: Jonathan Corbet; +Cc: meta

Jonathan Corbet <corbet@lwn.net> wrote:
> So for a little while, I've been seeing occasional NNTP weirdness,
> associated with log lines like:
>
> Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: error from:
> XOVER 196731-196731 (BUG: nested long response at /usr/local/share/perl5/PublicInbox/NNTP.pm line 588.
>
> Such complaints tend to be immediately followed by significant
> disgruntlement on the client side.

Do you have any logs of client commands which hit this?
public-inbox-nntpd should log commands + timings to stdout.

> I use gnus to read the various NNTP feeds, and I mark articles (with "u")
> when I want to keep them around; I typically have quite a few of them
> marked at any given time in a group like linux-kernel.  When I open the
> group in gnus, it will do an XOVER on each of those marked articles,
> generating dozens of single-article XOVERs in quick succession.  It's
> always the single-article ones that fail; the big XOVER for all of the
> new stuff at the end works just fine.
>
> Another user has complained that things fail with Thunderbird, again
> with the same symptoms on the server side.

No idea how to use either gnus or tbird, here; so a command sequence
which can be fed into nc or socat would be helpful in reproducing the
problem.

Any relation to group size in reproducing this?

> I have "fixed" the problem with this little patch:

<snip>

> That makes things work, but it is clearly papering over the real
> problem.

Agreed.

> I've spent a fair while staring at the code.  As far as I can tell,
> the logic there should be sufficient to prevent this from happening;
> it's not supposed to be reading while a long response is in the works.
> But somehow it happens.

I stared at it some, too; but I'm not seeing it right now, either.
It's been a tiring few weeks for me, so I'm not at my sharpest.

Thanks for bringing this to everyone's attention.

^ permalink raw reply	[flat|nested] 4+ messages in thread
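[Editor's sketch: for anyone wanting the nc/socat route requested above, the short Ruby script below just prints a pipelined burst of single-article XOVER commands to stdout, suitable for piping into nc. The group name and article numbers are placeholders taken from the log excerpt later in this thread, not a confirmed reproducer.]

```ruby
# Emit an NNTP command burst that mimics what gnus does: one GROUP
# followed by many single-article XOVERs with no waiting in between.
# Usage (hypothetical host): ruby gen_xover.rb | nc archive.lwn.net 119
cmds = "GROUP lwn.kernel.lkml\r\n".dup
[196023, 196034, 196073].each do |a|
  cmds << "XOVER #{a}-#{a}\r\n"   # single-article range, as in the logs
end
cmds << "QUIT\r\n"
$stdout.write(cmds)
```

Because everything is written in one burst, the server sees the commands already buffered while earlier deferred responses are still being generated, which is the pipelining pattern at issue.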
* Re: Race condition in public-inbox-nntpd?
  2018-10-16  6:36 ` Eric Wong
@ 2018-10-16 14:04   ` Jonathan Corbet
  0 siblings, 0 replies; 4+ messages in thread

From: Jonathan Corbet @ 2018-10-16 14:04 UTC (permalink / raw)
To: Eric Wong; +Cc: meta

On Tue, 16 Oct 2018 06:36:20 +0000
Eric Wong <e@80x24.org> wrote:

> Jonathan Corbet <corbet@lwn.net> wrote:
> > So for a little while, I've been seeing occasional NNTP weirdness,
> > associated with log lines like:
> >
> > Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: error from:
> > XOVER 196731-196731 (BUG: nested long response at /usr/local/share/perl5/PublicInbox/NNTP.pm line 588.
> >
> > Such complaints tend to be immediately followed by significant
> > disgruntlement on the client side.
>
> Do you have any logs of client commands which hit this?

I'll attach a snippet.  The pattern always looks about the same - a lot
of XOVER commands; at some point they start failing.

> Any relation to group size in reproducing this?

It tends to happen on big groups, but the real common factor seems to
be a lot of saved messages, resulting in a lot of single-article XOVER
commands in rapid sequence.

> I stared at it some, too; but I'm not seeing it right now, either;
> but it's been a tiring few weeks for me so I'm not at my sharpest.

I can relate to that!
Thanks,

jon

Oct 13 18:01:04 archive.lwn.net public-inbox-nntpd[12352]: [6] 1 GROUP lwn.kernel.lkml - 0.001696
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 153433-153447 - 0.087349 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 158027-158032 - 0.009294 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 181909-181914 - 0.009046 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 182996-183003 - 0.010729 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 187687-187687 - 0.001941 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 188205-188205 - 0.003165 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 188351-188351 - 0.002194 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 188571-188573 - 0.005668 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 188580-188580 - 0.001533 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 188680-188684 - 0.005570 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 188691-188697 - 0.008083 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 188792-188792 - 0.003481 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 188798-188798 - 0.001609 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 188808-188808 - 0.001487 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 188818-188818 - 0.002147 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 188841-188841 - 0.001679 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 189272-189272 - 0.001514 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 189524-189528 - 0.008616 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 189547-189547 - 0.001627 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 189736-189736 - 0.002618 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 189868-189877 - 0.012032 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 189950-189952 - 0.004263 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 190022-190022 - 0.001705 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 190123-190123 - 0.001533 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 190207-190207 - 0.001467 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 190224-190224 - 0.002197 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 190452-190452 - 0.001644 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 190603-190603 - 0.001524 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 190771-190773 - 0.003532 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 191276-191276 - 0.001874 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 191290-191290 - 0.001543 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 191448-191448 - 0.001545 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 191519-191519 - 0.001578 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 192078-192078 - 0.001462 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 192278-192278 - 0.001610 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 192385-192385 - 0.001401 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 192458-192458 - 0.001678 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 192534-192534 - 0.001726 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 193216-193224 - 0.009579 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 193719-193719 - 0.001444 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 193736-193743 - 0.008615 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 194152-194152 - 0.001519 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 194261-194264 - 0.004747 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 194371-194371 - 0.001522 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 194390-194390 - 0.001397 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 194840-194856 - 0.017661 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 194991-195022 - 0.031130 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 0 XOVER 196023-196023 - 0.001578 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: error from: XOVER 196034-196034 (BUG: nested long response at /usr/local/share/perl5/PublicInbox/NNTP.pm line 588.
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: )
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 1 XOVER 196034-196034 - 0.000694 deferred[6]
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: error from: XOVER 196073-196073 (BUG: nested long response at /usr/local/share/perl5/PublicInbox/NNTP.pm line 588.
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: )
Oct 13 18:01:06 archive.lwn.net public-inbox-nntpd[12352]: [6] 1 XOVER 196073-196073 - 0.000563 deferred[6]

^ permalink raw reply	[flat|nested] 4+ messages in thread
* [PATCH] nntp: prevent event_read from firing twice in a row
  2018-10-13 18:46 Race condition in public-inbox-nntpd? Jonathan Corbet
  2018-10-16  6:36 ` Eric Wong
@ 2018-12-06  3:59 ` Eric Wong
  1 sibling, 0 replies; 4+ messages in thread

From: Eric Wong @ 2018-12-06 3:59 UTC (permalink / raw)
To: Jonathan Corbet; +Cc: meta

Jonathan Corbet <corbet@lwn.net> wrote:
> index 022bb80..017ad88 100644
> --- a/lib/PublicInbox/NNTP.pm
> +++ b/lib/PublicInbox/NNTP.pm
> @@ -951,6 +951,10 @@ sub event_read {
>  		$self->{rbuf} .= $$buf;
>  	}
>  	my $r = 1;
> +	if ($self->{long_res}) {
> +		err($self, "long-res on event read");
> +		$r = 0;
> +	}

It's actually fine, too; and I considered returning early from
event_read if long_res was set once I figured out the problem.
I would at least have made it exit the function before calling
$self->read a few lines above, however; otherwise rbuf might get
too huge if a client kept sending.

But my alternative patch below is fewer syscalls with epoll[*],
since it avoids the needless registration and immediate wakeup

---------8<---------
Subject: [PATCH] nntp: prevent event_read from firing twice in a row

When a client starts pipelining requests to us which trigger
long responses, we need to keep socket readiness checks disabled
and only enable them when our socket rbuf is drained.

Failure to do this caused aborted clients with
"BUG: nested long response" when Danga::Socket calls event_read
for read-readiness after our "next_tick" sub fires in the same
event loop iteration.

Reported-by: Jonathan Corbet <corbet@lwn.net>
cf. https://public-inbox.org/meta/20181013124658.23b9f9d2@lwn.net/
---
 lib/PublicInbox/NNTP.pm | 26 ++++++++++++++++++--------
 t/nntpd.t               | 21 +++++++++++++++++++++
 2 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/lib/PublicInbox/NNTP.pm b/lib/PublicInbox/NNTP.pm
index 022bb80..90a5a3a 100644
--- a/lib/PublicInbox/NNTP.pm
+++ b/lib/PublicInbox/NNTP.pm
@@ -51,8 +51,16 @@ ()
 			# before finishing reading:
 			if (my $long_cb = $nntp->{long_res}) {
 				$nntp->write($long_cb);
-			} elsif (&Danga::Socket::POLLIN & $nntp->{event_watch}) {
+			} else {
+				# pipelined request, we bypassed socket-readiness
+				# checks to get here:
 				event_read($nntp);
+
+				# maybe there's more pipelined data, or we'll have
+				# to register it for socket-readiness notifications
+				if (!$nntp->{long_res} && !$nntp->{closed}) {
+					check_read($nntp);
+				}
 			}
 		}
 	}
@@ -609,7 +617,7 @@ ($$)
 				now() - $t0);
 		} else {
 			update_idle_time($self);
-			$self->watch_read(1);
+			check_read($self);
 		}
 	} elsif ($more) { # $self->{write_buf_size}:
 		# no recursion, schedule another call ASAP
@@ -620,7 +628,7 @@ ($$)
 		$nextt ||= PublicInbox::EvCleanup::asap(*next_tick);
 	} else { # all done!
 		$self->{long_res} = undef;
-		$self->watch_read(1);
+		check_read($self);
 		res($self, '.');
 		out($self, " deferred[$fd] done - %0.6f", now() - $t0);
 	}
@@ -968,10 +976,9 @@ sub event_read {
 	update_idle_time($self);
 }

-sub watch_read {
-	my ($self, $bool) = @_;
-	my $rv = $self->SUPER::watch_read($bool);
-	if ($bool && index($self->{rbuf}, "\n") >= 0) {
+sub check_read {
+	my ($self) = @_;
+	if (index($self->{rbuf}, "\n") >= 0) {
 		# Force another read if there is a pipelined request.
 		# We don't know if the socket has anything for us to read,
 		# and we must double-check again by the time the timer fires
 		# another long response.
 		push @$nextq, $self;
 		$nextt ||= PublicInbox::EvCleanup::asap(*next_tick);
+	} else {
+		# no pipelined requests available, let the kernel know
+		# to wake us up if there's more
+		$self->watch_read(1); # Danga::Socket::watch_read
 	}
-	$rv;
 }

 sub not_idle_long ($$) {
diff --git a/t/nntpd.t b/t/nntpd.t
index 9c1d076..ffed437 100644
--- a/t/nntpd.t
+++ b/t/nntpd.t
@@ -252,6 +252,27 @@ EOF
 		ok($date <= $t1, 'valid date before stop');
 	}

+	# pipelined requests:
+	{
+		my $nreq = 90;
+		syswrite($s, "GROUP $group\r\n");
+		my $res = <$s>;
+		my $rdr = fork;
+		if ($rdr == 0) {
+			use POSIX qw(_exit);
+			for (1..$nreq) {
+				<$s> =~ /\A224 / or _exit(1);
+				<$s> =~ /\A1/ or _exit(2);
+				<$s> eq ".\r\n" or _exit(3);
+			}
+			_exit(0);
+		}
+		for (1..$nreq) {
+			syswrite($s, "XOVER 1\r\n");
+		}
+		is($rdr, waitpid($rdr, 0), 'reader done');
+		is($? >> 8, 0, 'no errors');
+	}
 	{
 		setsockopt($s, IPPROTO_TCP, TCP_NODELAY, 1);
 		syswrite($s, 'HDR List-id 1-');

[*] With my noisy system, the time it took to do all the git@vger XOVER
    one-by-one with the below script went from ~3:58 to ~3:52 on my
    system by avoiding unnecessary watch_read (epoll_ctl) calls and
    epoll_wait wakeups.
Quick-and-dirty Ruby script I wrote to trigger the problem
(Ruby since it's what I've been spending too much of my time on :x)

--------------------------------
#!/usr/bin/ruby
# frozen_string_literal: true
require 'socket'
require 'uri'
usage = "usage: #$0 news://example.com/inbox.group.name"
uri = ARGV.shift or abort usage
uri = URI(uri)
uri.port ||= 119
group = uri.path.sub(%r{\A/+}, '') # String#delete_prefix requires Ruby 2.5+
s = Socket.tcp(uri.host, uri.port)
l = s.gets
l =~ /\A2\d\d / or abort "bad greeting: #{l}"
s.puts "GROUP #{group}"
l = s.gets
code, _, min, max = l.chomp!.split.map!(&:to_i)
code == 211 or abort "bad GROUP response: #{l}"
rdr = Thread.new do
  nres = 0
  while l = s.gets
    l.start_with?('205 ') and break # cmd_quit
    l.start_with?('224 ') or abort "bad OVER response: #{l}"
    while l = s.gets
      if l == ".\r\n"
        nres += 1
        break
      end
    end
  end
  nres
end
range = min..max
range.each { |i| s.puts "XOVER #{i}" }
puts "requests=#{range.size} #{Time.now}"
s.puts "QUIT"
puts "responses=#{rdr.value} #{Time.now}"

^ permalink raw reply related	[flat|nested] 4+ messages in thread
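[Editor's sketch: the heart of the patch above is the check_read() decision — consult the local read buffer before touching epoll. The Ruby paraphrase below uses hash keys and a next_queue argument as stand-ins for the Perl object fields and the Danga::Socket/EvCleanup plumbing; it is not the actual server code.]

```ruby
# Paraphrase of the patch's check_read() logic: if a full pipelined
# request line is already buffered, schedule another tick instead of
# re-enabling kernel read-readiness notifications (which would cause
# an immediate, needless epoll_ctl + wakeup).
def check_read(conn, next_queue)
  if conn[:rbuf].include?("\n")
    # a pipelined request is already buffered: process it on the next
    # event-loop tick; leave socket-readiness checks disabled
    next_queue << conn
  else
    # rbuf drained: ask the kernel to wake us when more data arrives
    conn[:watch_read] = true
  end
end

next_queue = []

# pipelined case: two full commands already sitting in rbuf
conn = { rbuf: "XOVER 1-1\r\nXOVER 2-2\r\n", watch_read: false }
check_read(conn, next_queue)
puts next_queue.size     # => 1 (tick scheduled, epoll left alone)

# drained case: nothing buffered, so re-arm read-readiness
conn2 = { rbuf: "", watch_read: false }
check_read(conn2, next_queue)
puts conn2[:watch_read]  # => true
```

Keeping read-readiness disabled while buffered requests remain is what prevents event_read from firing twice in one loop iteration, which was the trigger for the nested long response.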
end of thread, other threads:[~2018-12-06  3:59 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --)
2018-10-13 18:46 Race condition in public-inbox-nntpd? Jonathan Corbet
2018-10-16  6:36 ` Eric Wong
2018-10-16 14:04   ` Jonathan Corbet
2018-12-06  3:59 ` [PATCH] nntp: prevent event_read from firing twice in a row Eric Wong
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).