unofficial mirror of meta@public-inbox.org
* [PATCH 0/7] doc updates, fixups, and more
@ 2021-03-11 10:45 Eric Wong
  2021-03-11 10:45 ` [PATCH 1/7] doc: glossary: add information for dates and timestamps Eric Wong
                   ` (6 more replies)
  0 siblings, 7 replies; 9+ messages in thread
From: Eric Wong @ 2021-03-11 10:45 UTC (permalink / raw)
  To: meta

Eric Wong (7):
  doc: glossary: add information for dates and timestamps
  searchidx: remove smsg_from_doc
  lei_curl: note proposed master/client mode for curl
  doc: update 1.7 release notes, tuning, TODO
  imapclient: disable workaround for Mail::IMAPClient 3.43+
  config: use '-f' key to store config file pathname
  v2writable: fix undocumented --xapian-only

 Documentation/RelNotes/v1.7.0.wip       | 59 ++++++++++++++++++++++++-
 Documentation/public-inbox-glossary.pod | 14 ++++++
 Documentation/public-inbox-tuning.pod   |  9 ++++
 TODO                                    | 16 +++----
 lib/PublicInbox/Config.pm               |  3 +-
 lib/PublicInbox/IMAPClient.pm           | 12 ++---
 lib/PublicInbox/LeiCurl.pm              |  2 +
 lib/PublicInbox/Search.pm               |  4 +-
 lib/PublicInbox/SearchIdx.pm            | 12 -----
 lib/PublicInbox/V2Writable.pm           |  2 +-
 t/v2reindex.t                           |  5 +++
 11 files changed, 105 insertions(+), 33 deletions(-)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/7] doc: glossary: add information for dates and timestamps
  2021-03-11 10:45 [PATCH 0/7] doc updates, fixups, and more Eric Wong
@ 2021-03-11 10:45 ` Eric Wong
  2021-03-11 10:45 ` [PATCH 2/7] searchidx: remove smsg_from_doc Eric Wong
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Eric Wong @ 2021-03-11 10:45 UTC (permalink / raw)
  To: meta

These have been confusing to me in the past, too.
---
 Documentation/public-inbox-glossary.pod | 14 ++++++++++++++
 lib/PublicInbox/Search.pm               |  4 ++--
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/Documentation/public-inbox-glossary.pod b/Documentation/public-inbox-glossary.pod
index e188e563..61e1e9f8 100644
--- a/Documentation/public-inbox-glossary.pod
+++ b/Documentation/public-inbox-glossary.pod
@@ -83,6 +83,20 @@ the same email into one or more virtual folders for
 ease-of-filtering.  This is NOT tied to public-inbox names, as
 messages stored by lei may not be public.
 
+=item IMAP INTERNALDATE, JMAP receivedAt, rt: search prefix
+
+The first valid timestamp from the Received: headers (top-most first).
+If no Received: header exists, the Date: header is used, falling
+back to the current time if neither header exists.  When mirroring
+via git, this is the git commit time.
+
+=item IMAP SENT*, JMAP sentAt, dt: and d: search prefixes
+
+The first valid timestamp from the Date: header(s).
+If no Date: header exists, the time from the Received: header(s) is
+used, falling back to the current time if neither header exists.
+When mirroring via git, this is the git author time.
+
 =head1 COPYRIGHT
 
 Copyright 2021 all contributors L<mailto:meta@public-inbox.org>
diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm
index 209969c5..c7d52daf 100644
--- a/lib/PublicInbox/Search.pm
+++ b/lib/PublicInbox/Search.pm
@@ -13,9 +13,9 @@ use POSIX qw(strftime);
 # values for searching, changing the numeric value breaks
 # compatibility with old indices (so don't change them it)
 use constant {
-	TS => 0, # Received: header in Unix time (IMAP INTERNALDATE)
+	TS => 0, # Received: in Unix time (IMAP INTERNALDATE, JMAP receivedAt)
 	YYYYMMDD => 1, # Date: header for searching in the WWW UI
-	DT => 2, # Date: YYYYMMDDHHMMSS
+	DT => 2, # Date: YYYYMMDDHHMMSS (IMAP SENT*, JMAP sentAt)
 
 	# added for public-inbox 1.6.0+
 	BYTES => 3, # IMAP RFC822.SIZE
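
The fallback order in the two glossary entries above can be sketched in
Perl.  This is a minimal sketch under stated assumptions, not
public-inbox's implementation: headers are assumed to be already split
into a hashref of lists in header order, and parse_date,
received_dates, first_valid, received_ts and sent_ts are hypothetical
names.  The date parser is deliberately simplified (numeric zones
only), and the git commit/author times used when mirroring are not
modeled.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helpers, NOT public-inbox's own code.
my %MON = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4, Jun => 5,
	Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Parse "Thu, 11 Mar 2021 10:45:00 +0000"-style dates into Unix time.
# Returns undef for invalid input.  Only numeric zones are handled;
# a missing zone is treated as UTC.
sub parse_date {
	my ($s) = @_;
	return unless defined $s;
	my ($d, $mon, $y, $H, $M, $S, $sign, $zh, $zm) = ($s =~ /
		(\d{1,2})\s+([A-Z][a-z]{2})\s+(\d{4})\s+
		(\d{2}):(\d{2})(?::(\d{2}))?
		\s*(?:([+-])(\d{2})(\d{2}))?
	/x) or return;
	return unless exists $MON{$mon};
	my $t = eval { timegm($S // 0, $M, $H, $d, $MON{$mon}, $y) };
	return unless defined $t;
	if (defined $sign) { # local time = UTC + offset, so subtract it
		my $off = ($zh * 60 + $zm) * 60;
		$t -= $sign eq '+' ? $off : -$off;
	}
	$t;
}

# The date in a Received: header follows its last ';'
sub received_dates { map { /;\s*([^;]+)\z/ ? $1 : () } @_ }

# First string that parses as a valid date, else undef
sub first_valid {
	for my $s (@_) {
		my $t = parse_date($s);
		return $t if defined $t;
	}
	undef;
}

# $hdr is a hypothetical { received => [...], date => [...] } hashref
sub received_ts { # rt:, IMAP INTERNALDATE, JMAP receivedAt
	my ($hdr) = @_;
	first_valid(received_dates(@{$hdr->{received} // []}))
		// first_valid(@{$hdr->{date} // []}) // time;
}

sub sent_ts { # dt: and d:, IMAP SENT*, JMAP sentAt
	my ($hdr) = @_;
	first_valid(@{$hdr->{date} // []})
		// first_valid(received_dates(@{$hdr->{received} // []}))
		// time;
}
```

With both headers present, received_ts picks the top-most Received:
date and sent_ts picks Date:; each falls back to the other header,
then to the current time.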


* [PATCH 2/7] searchidx: remove smsg_from_doc
  2021-03-11 10:45 [PATCH 0/7] doc updates, fixups, and more Eric Wong
  2021-03-11 10:45 ` [PATCH 1/7] doc: glossary: add information for dates and timestamps Eric Wong
@ 2021-03-11 10:45 ` Eric Wong
  2021-03-11 10:45 ` [PATCH 3/7] lei_curl: note proposed master/client mode for curl Eric Wong
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Eric Wong @ 2021-03-11 10:45 UTC (permalink / raw)
  To: meta

We no longer read Xapian docdata and favor hitting over.sqlite3
instead, as Xapian is less likely to be available than SQLite.
---
 lib/PublicInbox/SearchIdx.pm | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 826302de..3372bea5 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -542,18 +542,6 @@ sub remove_keywords {
 	$self->{xdb}->replace_document($docid, $doc) if $replace;
 }
 
-sub smsg_from_doc ($) {
-	my ($doc) = @_;
-	my $data = $doc->get_data or return;
-	my $smsg = bless {}, 'PublicInbox::Smsg';
-	$smsg->{ts} = int_val($doc, PublicInbox::Search::TS());
-	my $dt = int_val($doc, PublicInbox::Search::DT());
-	my ($yyyy, $mon, $dd, $hh, $mm, $ss) = unpack('A4A2A2A2A2A2', $dt);
-	$smsg->{ds} = timegm($ss, $mm, $hh, $dd, $mon - 1, $yyyy);
-	$smsg->load_from_data($data);
-	$smsg;
-}
-
 sub xdb_remove {
 	my ($self, @docids) = @_;
 	$self->begin_txn_lazy;
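
The DT value slot holds the Date: header as a YYYYMMDDHHMMSS string,
and the removed smsg_from_doc turned it back into Unix time with
unpack and timegm, which implies the string is in UTC.  For reference,
the round trip can be sketched as follows; ts2dt and dt2ts are
hypothetical helper names, not functions in public-inbox.

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use Time::Local qw(timegm);

# Unix time -> "YYYYMMDDHHMMSS" in UTC (hypothetical helper)
sub ts2dt { strftime('%Y%m%d%H%M%S', gmtime($_[0])) }

# "YYYYMMDDHHMMSS" -> Unix time, using the same unpack template and
# timegm() call as the removed smsg_from_doc (hypothetical helper)
sub dt2ts {
	my ($yyyy, $mon, $dd, $hh, $mm, $ss) = unpack('A4A2A2A2A2A2', $_[0]);
	timegm($ss, $mm, $hh, $dd, $mon - 1, $yyyy);
}

print ts2dt(1615459500), "\n"; # 20210311104500
```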


* [PATCH 3/7] lei_curl: note proposed master/client mode for curl
  2021-03-11 10:45 [PATCH 0/7] doc updates, fixups, and more Eric Wong
  2021-03-11 10:45 ` [PATCH 1/7] doc: glossary: add information for dates and timestamps Eric Wong
  2021-03-11 10:45 ` [PATCH 2/7] searchidx: remove smsg_from_doc Eric Wong
@ 2021-03-11 10:45 ` Eric Wong
  2021-03-11 10:45 ` [PATCH 4/7] doc: update 1.7 release notes, tuning, TODO Eric Wong
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Eric Wong @ 2021-03-11 10:45 UTC (permalink / raw)
  To: meta

Who knows, maybe stuff learned during lei development
can be used to implement a master/client mode in curl:
https://curl.se/mail/archive-2021-02/0031.html
---
 lib/PublicInbox/LeiCurl.pm | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/PublicInbox/LeiCurl.pm b/lib/PublicInbox/LeiCurl.pm
index 3a79fbf8..69c64cdf 100644
--- a/lib/PublicInbox/LeiCurl.pm
+++ b/lib/PublicInbox/LeiCurl.pm
@@ -4,6 +4,8 @@
 # common option and torsocks(1) wrapping for curl(1)
 # Eventually, we may support using libcurl via Inline::C and/or
 # WWW::Curl; but curl(1) is most prevalent and widely-installed.
+# n.b. curl may support a daemon/client model like lei someday:
+#   https://github.com/curl/curl/wiki/curl-tool-master-client
 package PublicInbox::LeiCurl;
 use strict;
 use v5.10.1;


* [PATCH 4/7] doc: update 1.7 release notes, tuning, TODO
  2021-03-11 10:45 [PATCH 0/7] doc updates, fixups, and more Eric Wong
                   ` (2 preceding siblings ...)
  2021-03-11 10:45 ` [PATCH 3/7] lei_curl: note proposed master/client mode for curl Eric Wong
@ 2021-03-11 10:45 ` Eric Wong
  2021-03-11 10:45 ` [PATCH 5/7] imapclient: disable workaround for Mail::IMAPClient 3.43+ Eric Wong
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Eric Wong @ 2021-03-11 10:45 UTC (permalink / raw)
  To: meta

Some stuff done, some stuff still needs doing.
---
 Documentation/RelNotes/v1.7.0.wip     | 59 ++++++++++++++++++++++++++-
 Documentation/public-inbox-tuning.pod |  9 ++++
 TODO                                  | 16 +++-----
 3 files changed, 72 insertions(+), 12 deletions(-)

diff --git a/Documentation/RelNotes/v1.7.0.wip b/Documentation/RelNotes/v1.7.0.wip
index a35ff227..f71f447f 100644
--- a/Documentation/RelNotes/v1.7.0.wip
+++ b/Documentation/RelNotes/v1.7.0.wip
@@ -4,12 +4,69 @@ MIME-Version: 1.0
 Content-Type: text/plain; charset=utf-8
 Content-Disposition: inline
 
-TODO: gcf2, detached indices, JMAP, ...
+Another big release focused on multi-inbox search and scalability.
+
+* general changes
+
+  config file parsing is 2x faster with 50K inboxes
+
+* read-only public-inbox-daemon (-httpd, -nntpd, -imapd):
+
+  libgit2 may be used via Inline::C to avoid hitting system pipe
+  and process limits.  See the public-inbox-tuning(7) manpage
+  for more details.
+
+* public-inbox-extindex
+
+  A new Xapian + SQLite index able to search across several inboxes.
+  This may be configured to replace per-inbox Xapian DBs
+  (but not per-inbox SQLite indices) and speed up manifest.js.gz
+  generation.
+
+  See public-inbox-extindex-format(5) and
+  public-inbox-extindex(1) manpages for more details.
+
+* public-inbox-nntpd
+
+  - startup is 6x faster with 50K inboxes if using -extindex
+
+* PublicInbox::WWW
+
+  - mboxrd search results are returned in reverse Xapian docid order,
+    so more recent results are more likely to show up first
+
+  - d: and dt: search prefixes allow "approxidate" formats supported
+    by "git log --since="
+
+  - manifest.js.gz generation is ~25x faster with -extindex
+
+* lei - local email interface
+
+  An experimental, subject-to-change, likely-to-eat-your-mail tool for
+  personal mail as well as interacting with public-inboxes on the local
+  filesystem or over HTTP(S).  See lei(1), lei-overview(7), and other
+  lei-* manpages for details.
+
+* public-inbox-watch
+
+  - IMAP and NNTP code is now shared with lei, fixing an off-by-one
+    error in IMAP synchronization for single-message IMAP folders.
+
+  - \Deleted and \Draft messages are now ignored for IMAP, as they
+    are for Maildir.
+
+  - IMAP and NNTP connections (including git-credential prompts)
+    are now established in config file order.
 
 Compatibility:
 
 * Rollbacks all the way to public-inbox 1.2.0 remain supported
 
+Internal changes:
+
+* public-inbox-index switched to new internal IPC code shared
+  with lei
+
 Please report bugs via plain-text mail to: meta@public-inbox.org
 
 See archives at https://public-inbox.org/meta/ for all history.
diff --git a/Documentation/public-inbox-tuning.pod b/Documentation/public-inbox-tuning.pod
index e9702416..b3a2b411 100644
--- a/Documentation/public-inbox-tuning.pod
+++ b/Documentation/public-inbox-tuning.pod
@@ -55,6 +55,15 @@ public-inbox processes.
 More (optional) L<Inline::C> use will be introduced in the future
 to lower memory use and improve scalability.
 
+=head2 libgit2 usage via Inline::C
+
+If libgit2 development files are installed and L<Inline::C>
+is enabled (described above), per-inbox C<git cat-file --batch>
+processes are replaced with a single L<perl(1)> process running
+C<PublicInbox::Gcf2::loop> in read-only daemons.
+
+Available as of public-inbox 1.7.0.
+
 =head2 Performance on rotational hard disk drives
 
 Random I/O performance is poor on rotational HDDs.  Xapian indexing
diff --git a/TODO b/TODO
index 53907efd..4993b02c 100644
--- a/TODO
+++ b/TODO
@@ -86,7 +86,10 @@ all need to be considered for everything we introduce)
 
 * more and better test cases (use git fast-import to speed up creation)
 
-* large mbox/Maildir/MH/NNTP spool import (see PublicInbox::Import)
+* large mbox/Maildir/MH/NNTP spool import (in lei, but not
+  for public-facing inboxes)
+
+* MH import support (read-only, at least)
 
 * Read-only WebDAV interface to the git repo so it can be mounted
   via davfs2 or fusedav to avoid full clones.
@@ -133,18 +136,9 @@ all need to be considered for everything we introduce)
 
   - inotify-based manifest.js.gz updates
 
-  - process/FD reduction (needs to be slow-storage friendly)
-
   ...
 
-* command-line tool (similar to mairix/notmuch, but solver+git-aware)
-
-* consider removing doc_data from Xapian, redundant with over.sqlite3
-  It's no longer read as of public-inbox 1.6.0, but still written for
-  compatibility.
-
-* share "git cat-file --batch" processes across inboxes to avoid
-  bumping into /proc/sys/fs/pipe-user-pages-* limits
+* lei - see %CMD in lib/PublicInbox/LEI.pm
 
 * make "git cat-file --batch" detect unlinked packfiles so we don't
   have to restart processes (very long-term)
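
The tuning text in this patch is about replacing per-inbox
`git cat-file --batch` processes.  Each such process reads object IDs
on stdin and answers with "<oid> SP <type> SP <size> LF <contents> LF",
or "<oid> SP missing LF", as documented in git-cat-file(1).  The
following is a minimal sketch of reading one response; the function
name is hypothetical and this is not how PublicInbox::Gcf2 works
internally.

```perl
use strict;
use warnings;

# Hypothetical reader for one `git cat-file --batch` response.
# Returns ($oid, $type, $size, \$content), or ($oid, 'missing').
sub read_batch_response {
	my ($fh) = @_;
	my $hdr = <$fh>;
	die "unexpected EOF\n" unless defined $hdr;
	chomp $hdr;
	my ($oid, $type, $size) = split(/ /, $hdr);
	die "bad header: $hdr\n" unless defined $type;
	return ($oid, 'missing') if $type eq 'missing';
	die "bad header: $hdr\n" unless defined $size && $size =~ /\A\d+\z/;
	my $buf = '';
	while (length($buf) < $size) { # read() may return short counts
		my $r = read($fh, $buf, $size - length($buf), length($buf));
		die "read: $!\n" unless defined $r;
		die "unexpected EOF\n" if $r == 0;
	}
	my $lf = '';
	die "missing trailing LF\n"
		unless (read($fh, $lf, 1) // 0) == 1 && $lf eq "\n";
	($oid, $type, $size, \$buf);
}
```

A real caller would write "$oid\n" to the process's stdin (for example
via IPC::Open2) before reading each response.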


* [PATCH 5/7] imapclient: disable workaround for Mail::IMAPClient 3.43+
  2021-03-11 10:45 [PATCH 0/7] doc updates, fixups, and more Eric Wong
                   ` (3 preceding siblings ...)
  2021-03-11 10:45 ` [PATCH 4/7] doc: update 1.7 release notes, tuning, TODO Eric Wong
@ 2021-03-11 10:45 ` Eric Wong
  2021-03-11 20:34   ` [SQUASH] " Eric Wong
  2021-03-11 10:45 ` [PATCH 6/7] config: use '-f' key to store config file pathname Eric Wong
  2021-03-11 10:45 ` [PATCH 7/7] v2writable: fix undocumented --xapian-only Eric Wong
  6 siblings, 1 reply; 9+ messages in thread
From: Eric Wong @ 2021-03-11 10:45 UTC (permalink / raw)
  To: meta

These fixes are in the recently-released Mail::IMAPClient 3.43:

https://metacpan.org/source/PLOBBES/Mail-IMAPClient-3.43/Changes
---
 lib/PublicInbox/IMAPClient.pm | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/IMAPClient.pm b/lib/PublicInbox/IMAPClient.pm
index 33deee9e..945b10fa 100644
--- a/lib/PublicInbox/IMAPClient.pm
+++ b/lib/PublicInbox/IMAPClient.pm
@@ -4,17 +4,18 @@
 #
 # The license for this file differs from the rest of public-inbox.
 #
-# Workaround some bugs in upstream Mail::IMAPClient when
+# Work around some bugs in upstream Mail::IMAPClient <= 3.42 when
 # compression is enabled:
 # - reference cycle: https://rt.cpan.org/Ticket/Display.html?id=132654
 # - read starvation: https://rt.cpan.org/Ticket/Display.html?id=132720
 package PublicInbox::IMAPClient;
 use strict;
 use parent 'Mail::IMAPClient';
-use Errno qw(EAGAIN);
+unless (eval('use Mail::IMAPClient 3.43')) {
+require Errno;
 
 # RFC4978 COMPRESS
-sub compress {
+*compress = sub {
     my ($self) = @_;
 
     # BUG? strict check on capability commented out for now...
@@ -101,7 +102,7 @@ sub compress {
             # I/O readiness notifications (select, poll).  Refactoring
             # callers will be needed in the unlikely case somebody wants
             # to use edge-triggered notifications (EV_CLEAR, EPOLLET).
-            $! = EAGAIN;
+            $! = Errno::EAGAIN();
             return undef;
         }
 
@@ -114,6 +115,7 @@ sub compress {
     };
 
     return $self;
-}
+};
+} # $Mail::IMAPClient::VERSION < 3.43
 
 1;
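
The patch above installs the compress override through a glob
assignment only when the upstream module is older than the fixed
release.  A minimal, self-contained sketch of that pattern follows;
My::Upstream and My::Client are hypothetical stand-ins for
Mail::IMAPClient and PublicInbox::IMAPClient, and Time::Local with a
minimum of 999 is only a placeholder so the check fails and the
override gets installed.

```perl
use strict;
use warnings;

# True if $mod loads and its $VERSION is at least $min.
sub module_at_least {
	my ($mod, $min) = @_;
	(my $file = "$mod.pm") =~ s{::}{/}g;
	# ->VERSION($min) dies if too old; the trailing 1 makes success
	# explicit rather than relying on what require/use evaluate to
	eval { require $file; $mod->VERSION($min); 1 } ? 1 : 0;
}

package My::Upstream; # hypothetical stand-in for Mail::IMAPClient
sub new { bless {}, shift }
sub compress { 'upstream' }

package My::Client; # hypothetical stand-in for PublicInbox::IMAPClient
use parent -norequire, 'My::Upstream';

unless (main::module_at_least('Time::Local', 999)) {
	no warnings 'once'; # *compress is only mentioned once
	*compress = sub { 'workaround' };
}

package main;
```

When the version check passes, no glob is assigned and the subclass
simply inherits the upstream method.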


* [PATCH 6/7] config: use '-f' key to store config file pathname
  2021-03-11 10:45 [PATCH 0/7] doc updates, fixups, and more Eric Wong
                   ` (4 preceding siblings ...)
  2021-03-11 10:45 ` [PATCH 5/7] imapclient: disable workaround for Mail::IMAPClient 3.43+ Eric Wong
@ 2021-03-11 10:45 ` Eric Wong
  2021-03-11 10:45 ` [PATCH 7/7] v2writable: fix undocumented --xapian-only Eric Wong
  6 siblings, 0 replies; 9+ messages in thread
From: Eric Wong @ 2021-03-11 10:45 UTC (permalink / raw)
  To: meta

This fixes ->urlmatch use from lei, which already sets '-f'.
I noticed this because imap.$URL.compress was ignored in
my lei config file.
---
 lib/PublicInbox/Config.pm | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm
index a4b1756d..87a03fd3 100644
--- a/lib/PublicInbox/Config.pm
+++ b/lib/PublicInbox/Config.pm
@@ -26,6 +26,7 @@ sub new {
 		$self = config_fh_parse($fh, "\n", '=');
 	} else {
 		$self = git_config_dump($file);
+		$self->{'-f'} = $file;
 	}
 	bless $self, $class;
 	# caches
@@ -505,7 +506,7 @@ sub urlmatch {
 	my ($self, $key, $url) = @_;
 	state $urlmatch_broken; # requires git 1.8.5
 	return if $urlmatch_broken;
-	my $file = default_file();
+	my $file = $self->{'-f'} // default_file();
 	my $cmd = [qw/git config -z --includes --get-urlmatch/,
 		"--file=$file", $key, $url ];
 	my $fh = popen_rd($cmd);
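
The effect of this patch on urlmatch can be sketched as below: the
command now targets the file the config object was loaded from (its
'-f' key), so a lei config file is consulted for keys such as
imap.$URL.compress.  This is a minimal sketch assuming a plain hashref
stands in for a PublicInbox::Config object; default_file here is a
hypothetical stub, and the value is assumed to be NUL-terminated as
`git config -z` implies.  Running git is left out so the sketch has no
external dependencies.

```perl
use strict;
use warnings;

# Hypothetical stub standing in for PublicInbox::Config::default_file
sub default_file { '/path/to/default/config' }

# Build the git-config(1) command, preferring the config's own file
sub urlmatch_cmd {
	my ($cfg, $key, $url) = @_;
	my $file = $cfg->{'-f'} // default_file();
	[ qw(git config -z --includes --get-urlmatch), "--file=$file",
		$key, $url ];
}

# With -z, the matched value is terminated by a NUL byte
sub read_z_value {
	my ($fh) = @_;
	local $/ = "\0";
	my $val = <$fh>;
	chomp $val if defined $val; # strips the trailing NUL
	$val;
}
```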


* [PATCH 7/7] v2writable: fix undocumented --xapian-only
  2021-03-11 10:45 [PATCH 0/7] doc updates, fixups, and more Eric Wong
                   ` (5 preceding siblings ...)
  2021-03-11 10:45 ` [PATCH 6/7] config: use '-f' key to store config file pathname Eric Wong
@ 2021-03-11 10:45 ` Eric Wong
  6 siblings, 0 replies; 9+ messages in thread
From: Eric Wong @ 2021-03-11 10:45 UTC (permalink / raw)
  To: meta

We can't pass $self and GLOBs across IPC channels transparently.
I only noticed this because I'm testing the application/octet-stream
fallback with https://public-inbox.org/meta/20210311014539.19756-1-e@80x24.org/

Fixes: bf8df8160076d7a1 ("searchidxshard: use PublicInbox::IPC to kill lots of code")
---
 lib/PublicInbox/V2Writable.pm | 2 +-
 t/v2reindex.t                 | 5 +++++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index cbd4f003..03590850 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -1216,7 +1216,7 @@ sub sync_ranges ($$) {
 
 sub index_xap_only { # git->cat_async callback
 	my ($bref, $oid, $type, $size, $smsg) = @_;
-	my $self = $smsg->{self};
+	my $self = delete $smsg->{self};
 	my $idx = idx_shard($self, $smsg->{num});
 	$idx->index_eml(PublicInbox::Eml->new($bref), $smsg);
 	$self->{transact_bytes} += $smsg->{bytes};
diff --git a/t/v2reindex.t b/t/v2reindex.t
index 05ea952f..56540c8b 100644
--- a/t/v2reindex.t
+++ b/t/v2reindex.t
@@ -543,4 +543,9 @@ EOF
 $check_rethread->('3-headed-monster once');
 $check_rethread->('3-headed-monster twice');
 
+my $rdr = { 2 => \(my $err = '') };
+ok(run_script([qw(-index --reindex --xapian-only), $inboxdir], undef, $rdr),
+	'--xapian-only works');
+is($err, '', 'no errors from --xapian-only');
+
 done_testing();
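
Why `delete $smsg->{self}` matters can be shown with Storable, assuming
the IPC layer serializes arguments in a Storable-like way (an
assumption for illustration): a hash that still references $self drags
along any GLOB (filehandle) $self holds, and Storable refuses to store
GLOBs.  All variables below are hypothetical stand-ins.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Hypothetical stand-ins: $self owns a GLOB (an open filehandle) and
# $smsg carries a back-reference to it, like $smsg->{self} did.
open(my $fh, '<', \(my $buf = '')) or die "open: $!";
my $self = { fh => $fh, transact_bytes => 0 };
my $smsg = { num => 1, bytes => 123, self => $self };

# Serializing $smsg as-is drags $self and its GLOB along, and fails
my $ok = eval { freeze($smsg); 1 };
my $err = $@;
print $ok ? "unexpectedly froze\n" : "freeze failed: $err";

# Detaching the back-reference first leaves plain data that
# serializes fine, while the caller keeps $self for local use
my $local_self = delete $smsg->{self};
my $copy = thaw(freeze($smsg));
print "num=$copy->{num} bytes=$copy->{bytes}\n";
```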


* [SQUASH] imapclient: disable workaround for Mail::IMAPClient 3.43+
  2021-03-11 10:45 ` [PATCH 5/7] imapclient: disable workaround for Mail::IMAPClient 3.43+ Eric Wong
@ 2021-03-11 20:34   ` Eric Wong
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Wong @ 2021-03-11 20:34 UTC (permalink / raw)
  To: meta

Will squash this in to silence the 'once' warning:

diff --git a/lib/PublicInbox/IMAPClient.pm b/lib/PublicInbox/IMAPClient.pm
index 945b10fa..56001517 100644
--- a/lib/PublicInbox/IMAPClient.pm
+++ b/lib/PublicInbox/IMAPClient.pm
@@ -13,6 +13,7 @@ use strict;
 use parent 'Mail::IMAPClient';
 unless (eval('use Mail::IMAPClient 3.43')) {
 require Errno;
+no warnings 'once';
 
 # RFC4978 COMPRESS
 *compress = sub {




This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).