unofficial mirror of meta@public-inbox.org
* [PATCH] doc: technical/ds.txt: describe PublicInbox::DS divergences
@ 2020-01-10 20:35 Eric Wong
From: Eric Wong @ 2020-01-10 20:35 UTC
  To: meta

Danga::Socket 1.62 was released a few months back and
the maintainer indicated it would be the last release.
We've diverged significantly in incompatible ways...

While most of this should've already been documented in
commit messages, putting it all into one document could
make it easier to digest.

It's also a strange design for anybody used to conventional
event loops.  Maybe this is an unconventional project :P
---
 Documentation/technical/ds.txt | 112 +++++++++++++++++++++++++++++++++
 MANIFEST                       |   1 +
 lib/PublicInbox/DS.pm          |  16 ++---
 3 files changed, 121 insertions(+), 8 deletions(-)
 create mode 100644 Documentation/technical/ds.txt

diff --git a/Documentation/technical/ds.txt b/Documentation/technical/ds.txt
new file mode 100644
index 00000000..cbd06cfb
--- /dev/null
+++ b/Documentation/technical/ds.txt
@@ -0,0 +1,112 @@
+PublicInbox::DS - event loop and async I/O base class
+
+Our PublicInbox::DS event loop, which powers public-inbox-nntpd
+and public-inbox-httpd, diverges significantly from the
+unmaintained Danga::Socket package we forked from.  In fact,
+it's probably different from most other event loops out there.
+
+Most notably:
+
+* There is one and only one callback: ->event_step.  Unlike other
+  event loops, there are no separate callbacks for read, write,
+  error or hangup events.  In fact, we never care which kevent
+  filter or poll/epoll event flag (e.g. POLLIN/POLLOUT/POLLHUP)
+  triggers a call.
+
+  The lack of a read/write callback distinction is driven by the
+  fact that TLS libraries (e.g. OpenSSL via IO::Socket::SSL) may
+  return SSL_WANT_READ from SSL_write() and SSL_WANT_WRITE from
+  SSL_read().  So we let each user object decide whether to make
+  read or write calls based on its own internal state, completely
+  independent of the event loop.
+
+  Error and hangup (POLLERR and POLLHUP) callbacks are redundant and
+  only triggered in rare cases.  They're redundant because the
+  result of every read and write call in ->event_step must be
+  checked anyway.  At best, callbacks for POLLHUP and POLLERR can
+  save one syscall per socket lifetime, which is not worth the
+  extra code they impose.
+
+  Reducing the user-supplied code down to a single callback allows
+  subclasses to keep their logic self-contained.  The combination
+  of this change and one-shot wakeups (see below) for bidirectional
+  data flows makes asynchronous code easier to reason about.
+
+Other divergences:
+
+* ->write buffering uses temporary files whereas Danga::Socket used
+  the heap.  The rationale is that the kernel already provides
+  ample (and configurable) space for socket buffers.  Modern kernels
+  also cache FS operations aggressively, so systems with ample RAM
+  are unlikely to notice degradation, while small systems are less
+  likely to suffer unpredictable heap fragmentation, swap and OOM
+  penalties.
+
+  In the future, we may introduce sendfile and mmap+SSL_write to
+  reduce data copies, and use FALLOC_FL_PUNCH_HOLE on Linux to
+  release space after the buffer is partially cleared.
+
+Augmented features:
+
+* obj->write(CODEREF) passes the object itself to the CODEREF.
+  Being able to enqueue subroutine calls is a powerful feature in
+  Danga::Socket for keeping linear logic in an asynchronous environment.
+  Unfortunately, each anonymous subroutine takes several kilobytes of
+  memory.  One small change to Danga::Socket is to pass the receiver
+  object (aka "$self") to the CODEREF.  $self can store whatever
+  state a normal (named) subroutine needs.  This allows us to put the
+  same sub into multiple queues without paying a large memory penalty
+  for each one.
+
+  This idea is also more easily ported to C or other languages which
+  lack anonymous subroutines (aka "closures").
+
+* ->requeue support.  An optimization of the AddTimer(0, ...) idiom
+  for immediately dispatching code at the next event loop iteration.
+  public-inbox uses this for fairly generating large responses
+  iteratively (see PublicInbox::NNTP::long_response or the use of
+  ->getline callbacks for generating gigantic gzipped mboxes).
+
+New features:
+
+* One-shot wakeups are available via EPOLLONESHOT or EV_DISPATCH.  These
+  flags allow us to simplify code in ->event_step callbacks for
+  bidirectional sockets (NNTP and HTTP).  Instead of merely reacting
+  to events, control is handed over at ->event_step in one-shot scenarios.
+  The ->event_step implementation (NNTP or HTTP) then proactively declares
+  which (if any) events it's interested in for the next loop iteration.
+
+* Edge-triggering is available via EPOLLET or EV_CLEAR.  These reduce wakeups
+  for unidirectional classes (e.g. PublicInbox::Listener sockets,
+  and pipes via PublicInbox::HTTPD::Async).
+
+* IO::Socket::SSL support (for NNTPS, STARTTLS+NNTP, HTTPS)
+
+* dwaitpid (waitpid wrapper) support for reaping dead children
+
+* Reliable signal wakeups are supported via signalfd on Linux and
+  EVFILT_SIGNAL on *BSDs via IO::KQueue.
+
+Removed features:
+
+* Many fields removed or moved to subclasses, so the underlying
+  hash is smaller and suitable for FDs other than stream sockets.
+  Some fields we keep (e.g. wbuf, wbuf_off) are autovivified only
+  as needed, to save memory when they're unused.
+
+* TCP_CORK support removed.  Instead, we use MSG_MORE on non-TLS sockets,
+  and we may use vectored I/O support via GnuTLS in the future
+  for TLS sockets.
+
+* per-FD PLCMap (post-loop callback) removed; ->requeue replaces it
+  and requires no extra hash lookups or assignments.
+
+* Read pushbacks removed.  Some subclasses use a read buffer ({rbuf}),
+  but the subclasses control it, not this event loop.
+
+* Profiling and debug logging removed.  Perl and OS-specific tracers
+  and profilers are sufficient.
+
+* ->AddOtherFds support removed.  Everything watched is a subclass of
+  PublicInbox::DS, but we've slimmed down the fields to eliminate
+  the per-object memory penalty.
diff --git a/MANIFEST b/MANIFEST
index 914015ad..3736c777 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -34,6 +34,7 @@ Documentation/public-inbox-watch.pod
 Documentation/public-inbox-xcpdb.pod
 Documentation/public-inbox.cgi.pod
 Documentation/standards.perl
+Documentation/technical/ds.txt
 Documentation/txt2pre
 HACKING
 INSTALL
diff --git a/lib/PublicInbox/DS.pm b/lib/PublicInbox/DS.pm
index 09dc3992..058b1358 100644
--- a/lib/PublicInbox/DS.pm
+++ b/lib/PublicInbox/DS.pm
@@ -3,15 +3,15 @@
 #
 # This license differs from the rest of public-inbox
 #
-# This is a fork of the (for now) unmaintained Danga::Socket 1.61.
-# Unused features will be removed, and updates will be made to take
-# advantage of newer kernels.
+# This is a fork of the unmaintained Danga::Socket (1.61) with
+# significant changes.  See Documentation/technical/ds.txt in our
+# source for details.
 #
-# API changes to diverge from Danga::Socket will happen to better
-# accomodate new features and improve scalability.  Do not expect
-# this to be a stable API like Danga::Socket.
-# Bugs encountered (and likely fixed) are reported to
-# bug-Danga-Socket@rt.cpan.org and visible at:
+# Do not expect this to be a stable API like Danga::Socket,
+# but it will evolve to suit our needs and to take advantage of
+# newer Linux and *BSD features.
+# Bugs encountered were reported to bug-Danga-Socket@rt.cpan.org,
+# fixed in Danga::Socket 1.62 and visible at:
 # https://rt.cpan.org/Public/Dist/Display.html?Name=Danga-Socket
 package PublicInbox::DS;
 use strict;
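
The single ->event_step callback and the one-shot wakeups described in ds.txt above can be illustrated with a small core-Perl toy. This is a sketch under stated assumptions, not PublicInbox::DS. Core Perl has no epoll or kqueue binding, so IO::Poll (poll(2)) stands in for them. EPOLLONESHOT/EV_DISPATCH is emulated by removing each handle from the poll set before its callback runs. ToyLoop, ToyReader, arm, pending and run_once are hypothetical names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical toy loop (NOT the PublicInbox::DS API).  Each watched
# object has exactly one callback, ->event_step.  Wakeups are one-shot:
# the loop disarms a handle before dispatching it, so the object must
# call ToyLoop::arm() again if it wants another wakeup.
package ToyLoop;
use IO::Poll qw(POLLIN POLLOUT POLLERR POLLHUP);

my $poll = IO::Poll->new;
my %obj_of;    # fileno => armed object

# Declare interest in $events (POLLIN and/or POLLOUT) for one wakeup.
sub arm {
    my ($obj, $events) = @_;
    $obj_of{fileno($obj->{sock})} = $obj;
    $poll->mask($obj->{sock} => $events);
}

sub pending { scalar keys %obj_of }

# One loop iteration; $timeout is in seconds, as IO::Poll expects.
sub run_once {
    my ($timeout) = @_;
    return 0 if $poll->poll($timeout) <= 0;
    my @ready = $poll->handles(POLLIN | POLLOUT | POLLERR | POLLHUP);
    for my $fh (@ready) {
        my $obj = delete $obj_of{fileno($fh)};
        $poll->remove($fh);    # one-shot: disarm before dispatch
        $obj->event_step;      # no hint about which event fired
    }
    return scalar @ready;
}

# Hypothetical subclass: it tries the I/O its own state calls for and
# checks the result.  EOF and errors arrive as sysread return values,
# so no separate POLLHUP/POLLERR callback is needed.
package ToyReader;
use IO::Poll qw(POLLIN);

sub new {
    my ($class, $sock) = @_;
    bless { sock => $sock, got => '' }, $class;
}

sub event_step {
    my ($self) = @_;
    my $n = sysread($self->{sock}, my $buf, 4096);
    if (!defined $n) {
        return ToyLoop::arm($self, POLLIN) if $!{EAGAIN};  # spurious wakeup
        $self->{err} = "$!";
        return close($self->{sock});
    }
    if ($n == 0) {                     # peer hung up
        $self->{eof} = 1;
        return close($self->{sock});
    }
    $self->{got} .= $buf;
    ToyLoop::arm($self, POLLIN);       # still interested in reads
}

package main;
use IO::Handle;
use Socket;

socketpair(my $writer, my $sock, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
$sock->blocking(0);
my $reader = ToyReader->new($sock);
ToyLoop::arm($reader, IO::Poll::POLLIN());
syswrite($writer, "hello\n") or die "syswrite: $!";
close($writer);                        # hang up our end
ToyLoop::run_once(1) while ToyLoop::pending();
print $reader->{got};                  # prints "hello"
```

The loop never reports which event fired. ToyReader therefore simply attempts the read its own state calls for, and it sees the hangup as an ordinary zero-length sysread. This is the reasoning ds.txt gives for dropping POLLHUP/POLLERR callbacks.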
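
ds.txt says ->write buffering spills to temporary files rather than the heap. Below is a minimal sketch of that idea, assuming a non-blocking stream socket. Whatever syswrite cannot hand to the kernel goes into an anonymous temporary file, opened with Perl's open(..., '+>', undef). flush_write later drains that file from a saved offset once the socket is writable. ToyWriter and flush_write are hypothetical. The wbuf and wbuf_off field names echo ds.txt, but their layout here is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch (NOT the PublicInbox::DS code).  Bytes the kernel
# will not accept right now are spilled into an anonymous, already
# unlinked temporary file instead of a Perl string on the heap.  The
# wbuf/wbuf_off field names echo ds.txt; their layout here is assumed.
package ToyWriter;
use Fcntl qw(SEEK_SET SEEK_END);

sub new {
    my ($class, $sock) = @_;
    bless { sock => $sock }, $class;
}

# True if everything reached the socket, false if some of it was buffered.
sub write {
    my ($self, $data) = @_;
    unless ($self->{wbuf}) {           # nothing queued: try the socket first
        my $n = syswrite($self->{sock}, $data);
        if (!defined $n) {
            die "syswrite: $!" unless $!{EAGAIN};
            $n = 0;
        }
        return 1 if $n == length($data);
        substr($data, 0, $n, '');      # keep only the unsent tail
        open($self->{wbuf}, '+>', undef) or die "tmpfile: $!";
        $self->{wbuf_off} = 0;
    }
    my $fh = $self->{wbuf};
    sysseek($fh, 0, SEEK_END) or die "sysseek: $!";
    my $w = syswrite($fh, $data);
    die "tmpfile write: $!" unless defined($w) && $w == length($data);
    return 0;                          # caller should wait for POLLOUT
}

# Call when the socket is writable; returns true once the file is drained.
sub flush_write {
    my ($self) = @_;
    my $fh = $self->{wbuf} or return 1;
    while (1) {
        sysseek($fh, $self->{wbuf_off}, SEEK_SET) or die "sysseek: $!";
        my $r = sysread($fh, my $buf, 65536);
        die "tmpfile read: $!" unless defined $r;
        if ($r == 0) {                 # drained: drop (and free) the file
            delete @$self{qw(wbuf wbuf_off)};
            return 1;
        }
        my $n = syswrite($self->{sock}, $buf);
        if (!defined $n) {
            return 0 if $!{EAGAIN};    # socket full again: wait for POLLOUT
            die "syswrite: $!";
        }
        $self->{wbuf_off} += $n;
    }
}

package main;
use IO::Handle;
use Socket;

socketpair(my $out, my $in, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
$_->blocking(0) for $out, $in;
my $payload = 'abcdefgh' x (512 * 1024);   # 4 MiB, likely beyond the socket buffer
my $conn = ToyWriter->new($out);
my $done = $conn->write($payload);
my $spilled = $done ? 0 : 1;
my $got = '';
until ($done) {
    # play a slow peer: take whatever the kernel has queued so far
    while (sysread($in, my $buf, 65536)) { $got .= $buf }
    $done = $conn->flush_write;
}
close($out);
$in->blocking(1);
while (sysread($in, my $buf, 65536)) { $got .= $buf }
printf "spilled=%d received=%d bytes\n", $spilled, length($got);
```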
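
The obj->write(CODEREF) convention can be sketched as a write queue that holds strings or code references, where each CODEREF is called with the receiver object. ToyConn, flush_queue and next_chunk are hypothetical names, and a string accumulator stands in for real socket output. The point is that one named sub is shared by every object. Per-object state lives in $self instead of a freshly allocated closure.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical write queue (NOT the PublicInbox::DS code) holding either
# strings or CODEREFs.  A queued CODEREF is invoked as $code->($self).
package ToyConn;

sub new { bless { wq => [], out => '' }, shift }

sub write {
    my ($self, $item) = @_;
    push @{$self->{wq}}, $item;
}

# Drain the queue in order; strings go to {out}, a stand-in for a socket.
sub flush_queue {
    my ($self) = @_;
    while (defined(my $item = shift @{$self->{wq}})) {
        if (ref($item) eq 'CODE') {
            $item->($self);            # pass the receiver object in
        } else {
            $self->{out} .= $item;
        }
    }
}

package main;

# One named sub shared by every connection: it emits the next chunk and
# queues itself again until three chunks have been produced.
sub next_chunk {
    my ($self) = @_;
    my $n = ++$self->{chunks};
    $self->write("chunk $n\n");
    $self->write(\&next_chunk) if $n < 3;
}

my @conns = (ToyConn->new, ToyConn->new);
for my $c (@conns) {
    $c->write("begin\n");
    $c->write(\&next_chunk);           # same CODEREF, no per-object closure
    $c->flush_queue;
}
print $conns[0]{out};   # prints begin, chunk 1, chunk 2, chunk 3 (one per line)
```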
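
ds.txt describes ->requeue as a replacement for the AddTimer(0, ...) idiom when code should run on the next loop iteration. The sketch below assumes the queue is a plain array. It shows how two long responses interleave fairly when each one emits a single chunk per wakeup and then requeues itself. ToyQueue, run_queued and LongResponse are hypothetical names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical requeue mechanism (NOT the PublicInbox::DS code): instead
# of arming a zero-second timer, an object appends itself to an array
# that the loop drains once per iteration.
package ToyQueue;

my @nextq;    # objects to dispatch at the next loop iteration

sub requeue { push @nextq, $_[0] }

# One iteration: run only what was queued before this call started, so
# an object that requeues itself waits behind everyone else.
sub run_queued {
    my @q = splice(@nextq);
    $_->event_step for @q;
    return scalar @q;
}

# Hypothetical long response: one chunk per wakeup, then yield.
package LongResponse;
our @ISA = ('ToyQueue');

sub new {
    my ($class, $name, $chunks) = @_;
    bless { name => $name, left => $chunks }, $class;
}

sub event_step {
    my ($self) = @_;
    push @main::log, "$self->{name}$self->{left}";
    $self->requeue if --$self->{left} > 0;
}

package main;
our @log;

$_->requeue for LongResponse->new('a', 3), LongResponse->new('b', 2);
1 while ToyQueue::run_queued();
print "@log\n";   # prints "a3 b2 a2 b1 a1"
```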
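
Edge-triggering (EPOLLET or EV_CLEAR) reduces wakeups, but it places an obligation on ->event_step. Readiness is reported once per change, so a listener must keep accepting until the kernel returns EAGAIN. Core Perl has no epoll or kqueue binding, so this hypothetical sketch simulates a single edge-triggered wakeup by calling ->event_step once against a non-blocking UNIX listener. ToyListener is a made-up name, not PublicInbox::Listener.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch (NOT PublicInbox::Listener).  With edge-triggered
# notification a readiness change is reported once, so ->event_step
# must keep accepting until EAGAIN; anything left in the backlog would
# otherwise wait for a wakeup that never comes.
package ToyListener;

sub new {
    my ($class, $lsock) = @_;
    $lsock->blocking(0);
    bless { sock => $lsock, accepted => [] }, $class;
}

sub event_step {
    my ($self) = @_;
    while (1) {
        if (accept(my $client, $self->{sock})) {
            push @{$self->{accepted}}, $client;   # a server would wrap this
            next;
        }
        last if $!{EAGAIN};                       # drained until next edge
        die "accept: $!";
    }
}

package main;
use File::Temp qw(tempdir);
use IO::Socket::UNIX;

my $dir   = tempdir(CLEANUP => 1);
my $path  = "$dir/sock";
my $lsock = IO::Socket::UNIX->new(Local => $path, Listen => 8)
    or die "listen: $!";
my @clients;
for (1 .. 3) {
    my $c = IO::Socket::UNIX->new(Peer => $path) or die "connect: $!";
    push @clients, $c;
}
my $listener = ToyListener->new($lsock);
$listener->event_step;    # one simulated edge-triggered wakeup
printf "accepted %d of %d\n", scalar @{$listener->{accepted}}, scalar @clients;
# prints "accepted 3 of 3"
```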
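
ds.txt only names dwaitpid as a waitpid wrapper for reaping dead children. The sketch below shows one plausible shape for such a wrapper: callbacks keyed by PID, plus a non-blocking reap using WNOHANG. toy_dwaitpid and reap are hypothetical, and their signatures are not taken from PublicInbox::DS.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# Hypothetical waitpid wrapper in the spirit of dwaitpid; names and
# signatures are illustrative, NOT PublicInbox::DS's.  Callers register
# a callback per child PID, and the event loop calls reap() (e.g. after
# a SIGCHLD wakeup) instead of blocking in waitpid.
my %on_exit;    # pid => callback

sub toy_dwaitpid {
    my ($pid, $cb) = @_;
    $on_exit{$pid} = $cb;
}

# Reap every child that has already exited, without blocking.
sub reap {
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        my $status = $?;
        my $cb = delete $on_exit{$pid} or next;   # unregistered child
        $cb->($pid, $status);
    }
}

my %exit_code;
for my $code (0, 3) {
    my $pid = fork;
    die "fork: $!" unless defined $pid;
    POSIX::_exit($code) if $pid == 0;     # child: exit right away
    toy_dwaitpid($pid, sub { my ($p, $st) = @_; $exit_code{$p} = $st >> 8 });
}
# A real loop would sleep in poll/epoll/kqueue until SIGCHLD; this demo
# simply retries with short sleeps until both callbacks have fired.
while (%on_exit) {
    reap();
    select(undef, undef, undef, 0.05);
}
print join(' ', sort { $a <=> $b } values %exit_code), "\n";   # prints "0 3"
```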

