* [PATCH 1/7] doc: glossary: add information for dates and timestamps
2021-03-11 10:45 [PATCH 0/7] doc updates, fixups, and more Eric Wong
@ 2021-03-11 10:45 ` Eric Wong
2021-03-11 10:45 ` [PATCH 2/7] searchidx: remove smsg_from_doc Eric Wong
` (5 subsequent siblings)
6 siblings, 0 replies; 9+ messages in thread
From: Eric Wong @ 2021-03-11 10:45 UTC
To: meta
These have been confusing to me in the past, too.
---
Documentation/public-inbox-glossary.pod | 14 ++++++++++++++
lib/PublicInbox/Search.pm | 4 ++--
2 files changed, 16 insertions(+), 2 deletions(-)
diff --git a/Documentation/public-inbox-glossary.pod b/Documentation/public-inbox-glossary.pod
index e188e563..61e1e9f8 100644
--- a/Documentation/public-inbox-glossary.pod
+++ b/Documentation/public-inbox-glossary.pod
@@ -83,6 +83,20 @@ the same email into one or more virtual folders for
ease-of-filtering. This is NOT tied to public-inbox names, as
messages stored by lei may not be public.
+=item IMAP INTERNALDATE, JMAP receivedAt, rt: search prefix
+
+The first valid timestamp value of Received: headers (top first).
+If no Received: header exists, the Date: header is used, and the
+current time if neither header exists. When mirroring via
+git, this is the git commit time.
+
+=item IMAP SENT*, JMAP sentAt, dt: and d: search prefixes
+
+The first valid timestamp value of the Date: header(s).
+If no Date: header exists, the time from the Received: header is
+used, and then the current time if neither header exists.
+When mirroring via git, this is the git author time.
+
=head1 COPYRIGHT
Copyright 2021 all contributors L<mailto:meta@public-inbox.org>
diff --git a/lib/PublicInbox/Search.pm b/lib/PublicInbox/Search.pm
index 209969c5..c7d52daf 100644
--- a/lib/PublicInbox/Search.pm
+++ b/lib/PublicInbox/Search.pm
@@ -13,9 +13,9 @@ use POSIX qw(strftime);
# values for searching, changing the numeric value breaks
# compatibility with old indices (so don't change them it)
use constant {
- TS => 0, # Received: header in Unix time (IMAP INTERNALDATE)
+ TS => 0, # Received: in Unix time (IMAP INTERNALDATE, JMAP receivedAt)
YYYYMMDD => 1, # Date: header for searching in the WWW UI
- DT => 2, # Date: YYYYMMDDHHMMSS
+ DT => 2, # Date: YYYYMMDDHHMMSS (IMAP SENT*, JMAP sentAt)
# added for public-inbox 1.6.0+
BYTES => 3, # IMAP RFC822.SIZE
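
The fallback order described in the two new glossary items can be sketched roughly as below. This is only an illustration under assumptions, not the code public-inbox uses: parse_date, first_valid, received_at and sent_at are hypothetical names, the message is modelled as a plain hash of header arrays, and the parser only understands the common "11 Mar 2021 10:45:00 +0000" form.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %MON = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4, Jun => 5,
           Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Hypothetical helper: parse one common RFC 5322 date form,
# return Unix time or undef if the value is not usable.
sub parse_date {
    my ($str) = @_;
    return unless defined $str;
    $str =~ s/.*;//s; # Received: carries its date after the last ';'
    $str =~ /(\d{1,2})\s+([A-Za-z]{3})\s+(\d{4})\s+
             (\d{2}):(\d{2})(?::(\d{2}))?\s+([+-])(\d{2})(\d{2})/x
        or return;
    my ($dd, $mon, $yyyy, $hh, $mm, $ss, $sign, $zh, $zm) =
        ($1, $MON{ucfirst lc $2}, $3, $4, $5, $6 // 0, $7, $8, $9);
    return unless defined $mon;
    my $t = eval { timegm($ss, $mm, $hh, $dd, $mon, $yyyy) };
    return unless defined $t; # out-of-range fields make timegm die
    my $off = ($zh * 60 + $zm) * 60;
    $sign eq '+' ? $t - $off : $t + $off; # normalize to UTC
}

# first header value (top first) that parses
sub first_valid {
    for my $hdr (@_) {
        my $t = parse_date($hdr);
        return $t if defined $t;
    }
    undef;
}

# INTERNALDATE / receivedAt / rt: Received, then Date, then now
sub received_at {
    my ($eml, $now) = @_;
    first_valid(@{$eml->{received} // []})
        // first_valid(@{$eml->{date} // []}) // $now;
}

# SENT* / sentAt / dt: / d: Date, then Received, then now
sub sent_at {
    my ($eml, $now) = @_;
    first_valid(@{$eml->{date} // []})
        // first_valid(@{$eml->{received} // []}) // $now;
}

my $eml = {
    received => [ 'from a by b; Fri, 12 Mar 2021 01:00:00 +0000' ],
    date => [ 'Thu, 11 Mar 2021 10:45:00 +0000' ],
};
printf "receivedAt=%d sentAt=%d\n",
    received_at($eml, time), sent_at($eml, time);
```

The two subs differ only in which header is tried first, which is the whole point of the glossary entries.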
* [PATCH 2/7] searchidx: remove smsg_from_doc
From: Eric Wong @ 2021-03-11 10:45 UTC
To: meta
We no longer read Xapian docdata and favor hitting over.sqlite3
instead, as Xapian is less likely to be available than SQLite.
---
lib/PublicInbox/SearchIdx.pm | 12 ------------
1 file changed, 12 deletions(-)
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 826302de..3372bea5 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -542,18 +542,6 @@ sub remove_keywords {
$self->{xdb}->replace_document($docid, $doc) if $replace;
}
-sub smsg_from_doc ($) {
- my ($doc) = @_;
- my $data = $doc->get_data or return;
- my $smsg = bless {}, 'PublicInbox::Smsg';
- $smsg->{ts} = int_val($doc, PublicInbox::Search::TS());
- my $dt = int_val($doc, PublicInbox::Search::DT());
- my ($yyyy, $mon, $dd, $hh, $mm, $ss) = unpack('A4A2A2A2A2A2', $dt);
- $smsg->{ds} = timegm($ss, $mm, $hh, $dd, $mon - 1, $yyyy);
- $smsg->load_from_data($data);
- $smsg;
-}
-
sub xdb_remove {
my ($self, @docids) = @_;
$self->begin_txn_lazy;
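
For anyone who still needs the conversion the removed sub performed, the DT value (YYYYMMDDHHMMSS, UTC) maps to and from Unix time as in this sketch. dt_to_epoch and epoch_to_dt are hypothetical names chosen for illustration; only the unpack template and the timegm call are taken from the deleted code.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);
use POSIX qw(strftime);

# YYYYMMDDHHMMSS (UTC) => Unix time, same unpack template as the
# removed smsg_from_doc
sub dt_to_epoch {
    my ($dt) = @_;
    my ($yyyy, $mon, $dd, $hh, $mm, $ss) = unpack('A4A2A2A2A2A2', $dt);
    timegm($ss, $mm, $hh, $dd, $mon - 1, $yyyy); # timegm months are 0-based
}

# Unix time => YYYYMMDDHHMMSS (UTC)
sub epoch_to_dt { strftime('%Y%m%d%H%M%S', gmtime($_[0])) }

my $epoch = dt_to_epoch('20210311104500');
print "$epoch ", epoch_to_dt($epoch), "\n";
```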
* [PATCH 3/7] lei_curl: note proposed master/client mode for curl
From: Eric Wong @ 2021-03-11 10:45 UTC
To: meta
Who knows, maybe stuff learned during lei development
can be used to implement it in curl:
https://curl.se/mail/archive-2021-02/0031.html
---
lib/PublicInbox/LeiCurl.pm | 2 ++
1 file changed, 2 insertions(+)
diff --git a/lib/PublicInbox/LeiCurl.pm b/lib/PublicInbox/LeiCurl.pm
index 3a79fbf8..69c64cdf 100644
--- a/lib/PublicInbox/LeiCurl.pm
+++ b/lib/PublicInbox/LeiCurl.pm
@@ -4,6 +4,8 @@
# common option and torsocks(1) wrapping for curl(1)
# Eventually, we may support using libcurl via Inline::C and/or
# WWW::Curl; but curl(1) is most prevalent and widely-installed.
+# n.b. curl may support a daemon/client model like lei someday:
+# https://github.com/curl/curl/wiki/curl-tool-master-client
package PublicInbox::LeiCurl;
use strict;
use v5.10.1;
* [PATCH 4/7] doc: update 1.7 release notes, tuning, TODO
From: Eric Wong @ 2021-03-11 10:45 UTC
To: meta
Some stuff done, some stuff still needs doing.
---
Documentation/RelNotes/v1.7.0.wip | 59 ++++++++++++++++++++++++++-
Documentation/public-inbox-tuning.pod | 9 ++++
TODO | 16 +++-----
3 files changed, 72 insertions(+), 12 deletions(-)
diff --git a/Documentation/RelNotes/v1.7.0.wip b/Documentation/RelNotes/v1.7.0.wip
index a35ff227..f71f447f 100644
--- a/Documentation/RelNotes/v1.7.0.wip
+++ b/Documentation/RelNotes/v1.7.0.wip
@@ -4,12 +4,69 @@ MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
-TODO: gcf2, detached indices, JMAP, ...
+Another big release focused on multi-inbox search and scalability.
+
+* general changes
+
+ config file parsing is 2x faster with 50K inboxes
+
+* read-only public-inbox-daemon (-httpd, -nntpd, -imapd):
+
+ libgit2 may be used via Inline::C to avoid hitting system pipe
+ and process limits. See public-inbox-tuning(7) manpage
+ for more details.
+
+* public-inbox-extindex
+
+ A new Xapian + SQLite index able to search across several inboxes.
+ This may be configured to replace per-inbox Xapian DBs,
+ (but not per-inbox SQLite indices) and speed up manifest.js.gz
+ generation.
+
+ See public-inbox-extindex-format(5) and
+ public-inbox-extindex(1) manpages for more details.
+
+* public-inbox-nntpd
+
+ - startup is 6x faster with 50K inboxes if using -extindex
+
+* PublicInbox::WWW
+
+ - mboxrd search results are returned in reverse Xapian docid order,
+ so more recent results are more likely to show up first
+
+ - d: and dt: search prefixes allow "approxidate" formats supported
+ by "git log --since="
+
+ - manifest.js.gz generation is ~25x faster with -extindex
+
+* lei - local email interface
+
+ An experimental, subject-to-change, likely-to-eat-your-mail tool for
+ personal mail as well as interacting with public-inboxes on the local
+ filesystem or over HTTP(S). See lei(1), lei-overview(7), and other
+ lei-* manpages for details.
+
+* public-inbox-watch
+
+ - IMAP and NNTP code shared with lei, fixing an off-by-one error
+ in IMAP synchronization for single-message IMAP folders.
+
+ - \Deleted and \Draft messages ignored for IMAP, as they are for
+ Maildir.
+
+ - IMAP and NNTP connection establishment (including git-credential
+ prompts) ordering is now tied to config file order.
Compatibility:
* Rollbacks all the way to public-inbox 1.2.0 remain supported
+Internal changes
+
+* public-inbox-index switched to new internal IPC code shared
+ with lei
+
Please report bugs via plain-text mail to: meta@public-inbox.org
See archives at https://public-inbox.org/meta/ for all history.
diff --git a/Documentation/public-inbox-tuning.pod b/Documentation/public-inbox-tuning.pod
index e9702416..b3a2b411 100644
--- a/Documentation/public-inbox-tuning.pod
+++ b/Documentation/public-inbox-tuning.pod
@@ -55,6 +55,15 @@ public-inbox processes.
More (optional) L<Inline::C> use will be introduced in the future
to lower memory use and improve scalability.
+=head2 libgit2 usage via Inline::C
+
+If libgit2 development files are installed and L<Inline::C>
+is enabled (described above), per-inbox C<git cat-file --batch>
+processes are replaced with a single L<perl(1)> process running
+C<PublicInbox::Gcf2::loop> in read-only daemons.
+
+Available as of public-inbox 1.7.0.
+
=head2 Performance on rotational hard disk drives
Random I/O performance is poor on rotational HDDs. Xapian indexing
diff --git a/TODO b/TODO
index 53907efd..4993b02c 100644
--- a/TODO
+++ b/TODO
@@ -86,7 +86,10 @@ all need to be considered for everything we introduce)
* more and better test cases (use git fast-import to speed up creation)
-* large mbox/Maildir/MH/NNTP spool import (see PublicInbox::Import)
+* large mbox/Maildir/MH/NNTP spool import (in lei, but not
+ for public-facing inboxes)
+
+* MH import support (read-only, at least)
* Read-only WebDAV interface to the git repo so it can be mounted
via davfs2 or fusedav to avoid full clones.
@@ -133,18 +136,9 @@ all need to be considered for everything we introduce)
- inotify-based manifest.js.gz updates
- - process/FD reduction (needs to be slow-storage friendly)
-
...
-* command-line tool (similar to mairix/notmuch, but solver+git-aware)
-
-* consider removing doc_data from Xapian, redundant with over.sqlite3
- It's no longer read as of public-inbox 1.6.0, but still written for
- compatibility.
-
-* share "git cat-file --batch" processes across inboxes to avoid
- bumping into /proc/sys/fs/pipe-user-pages-* limits
+* lei - see %CMD in lib/PublicInbox/LEI.pm
* make "git cat-file --batch" detect unlinked packfiles so we don't
have to restart processes (very long-term)
* [PATCH 5/7] imapclient: disable workaround for Mail::IMAPClient 3.43+
2021-03-11 10:45 [PATCH 0/7] doc updates, fixups, and more Eric Wong
` (3 preceding siblings ...)
2021-03-11 10:45 ` [PATCH 4/7] doc: update 1.7 release notes, tuning, TODO Eric Wong
@ 2021-03-11 10:45 ` Eric Wong
2021-03-11 20:34 ` [SQUASH] " Eric Wong
2021-03-11 10:45 ` [PATCH 6/7] config: use '-f' key to store config file pathname Eric Wong
2021-03-11 10:45 ` [PATCH 7/7] v2writable: fix undocumented --xapian-only Eric Wong
6 siblings, 1 reply; 9+ messages in thread
From: Eric Wong @ 2021-03-11 10:45 UTC
To: meta
These fixes are in the recently-released Mail::IMAPClient 3.43:
https://metacpan.org/source/PLOBBES/Mail-IMAPClient-3.43/Changes
---
lib/PublicInbox/IMAPClient.pm | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/lib/PublicInbox/IMAPClient.pm b/lib/PublicInbox/IMAPClient.pm
index 33deee9e..945b10fa 100644
--- a/lib/PublicInbox/IMAPClient.pm
+++ b/lib/PublicInbox/IMAPClient.pm
@@ -4,17 +4,18 @@
#
# The license for this file differs from the rest of public-inbox.
#
-# Workaround some bugs in upstream Mail::IMAPClient when
+# Workaround some bugs in upstream Mail::IMAPClient <= 3.42 when
# compression is enabled:
# - reference cycle: https://rt.cpan.org/Ticket/Display.html?id=132654
# - read starvation: https://rt.cpan.org/Ticket/Display.html?id=132720
package PublicInbox::IMAPClient;
use strict;
use parent 'Mail::IMAPClient';
-use Errno qw(EAGAIN);
+unless (eval('use Mail::IMAPClient 3.43; 1')) {
+require Errno;
# RFC4978 COMPRESS
-sub compress {
+*compress = sub {
my ($self) = @_;
# BUG? strict check on capability commented out for now...
@@ -101,7 +102,7 @@ sub compress {
# I/O readiness notifications (select, poll). Refactoring
# callers will be needed in the unlikely case somebody wants
# to use edge-triggered notifications (EV_CLEAR, EPOLLET).
- $! = EAGAIN;
+ $! = Errno::EAGAIN();
return undef;
}
@@ -114,6 +115,7 @@ sub compress {
};
return $self;
-}
+};
+} # $Mail::IMAPClient::VERSION < 3.43
1;
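
The pattern in this patch (only install a workaround sub when a new-enough upstream module is missing) can be tried in isolation with a core module. This is a sketch under assumptions: Demo::Client and $HAVE_FIXED are hypothetical names, and List::Util with an impossible version number merely stands in for Mail::IMAPClient 3.43.

```perl
use strict;
use warnings;

package Demo::Client; # hypothetical stand-in for the IMAP client subclass

# The trailing "; 1" gives eval a true value to return when the
# version check passes; on failure eval returns undef and sets $@.
our $HAVE_FIXED = eval('use List::Util 9999; 1') ? 1 : 0;

unless ($HAVE_FIXED) {
    no warnings 'once';
    # runtime glob assignment: the sub only exists if the check failed
    *compress = sub { 'workaround' };
}

package main;

print 'fixed upstream: ', $Demo::Client::HAVE_FIXED, "\n";
print 'compress via: ',
    (Demo::Client->can('compress') ? Demo::Client->compress : 'upstream'),
    "\n";
```

Assigning to the glob at runtime (instead of a plain "sub compress") is what lets the parent class's method stay visible when the workaround is skipped.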
* [PATCH 6/7] config: use '-f' key to store config file pathname
From: Eric Wong @ 2021-03-11 10:45 UTC
To: meta
This fixes ->urlmatch use from lei, which already sets '-f'.
I noticed this because imap.$URL.compress was ignored in
my lei config file.
---
lib/PublicInbox/Config.pm | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm
index a4b1756d..87a03fd3 100644
--- a/lib/PublicInbox/Config.pm
+++ b/lib/PublicInbox/Config.pm
@@ -26,6 +26,7 @@ sub new {
$self = config_fh_parse($fh, "\n", '=');
} else {
$self = git_config_dump($file);
+ $self->{'-f'} = $file;
}
bless $self, $class;
# caches
@@ -505,7 +506,7 @@ sub urlmatch {
my ($self, $key, $url) = @_;
state $urlmatch_broken; # requires git 1.8.5
return if $urlmatch_broken;
- my $file = default_file();
+ my $file = $self->{'-f'} // default_file();
my $cmd = [qw/git config -z --includes --get-urlmatch/,
"--file=$file", $key, $url ];
my $fh = popen_rd($cmd);
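
The effect of the '-f' fallback on the command that ->urlmatch builds can be seen without running git. This is a simplified sketch: urlmatch_cmd is a hypothetical helper that only assembles the argument list, and default_file here is a stand-in returning a placeholder path rather than the real PublicInbox::Config::default_file.

```perl
use strict;
use warnings;

# Stand-in for the real default_file(); the path is a placeholder
sub default_file { '/nonexistent/default-config' }

# Hypothetical helper: build the git-config(1) command as urlmatch
# does, preferring the file the config object was loaded from.
sub urlmatch_cmd {
    my ($self, $key, $url) = @_;
    my $file = $self->{'-f'} // default_file();
    [ qw(git config -z --includes --get-urlmatch),
      "--file=$file", $key, $url ];
}

my $lei_cfg = { '-f' => '/tmp/lei/config' }; # e.g. set by lei
my $plain_cfg = {};                          # no '-f' recorded
for my $cfg ($lei_cfg, $plain_cfg) {
    my $cmd = urlmatch_cmd($cfg, 'imap.compress', 'imaps://example.com/');
    print join(' ', @$cmd), "\n";
}
```

Before this patch the second form was always used, which matches the symptom of imap.$URL.compress being ignored in a lei config file.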
* [PATCH 7/7] v2writable: fix undocumented --xapian-only
From: Eric Wong @ 2021-03-11 10:45 UTC
To: meta
We can't pass $self and GLOBs across IPC channels transparently.
I only noticed this because I'm testing the application/octet-stream
fallback with https://public-inbox.org/meta/20210311014539.19756-1-e@80x24.org/
Fixes: bf8df8160076d7a1 ("searchidxshard: use PublicInbox::IPC to kill lots of code")
---
lib/PublicInbox/V2Writable.pm | 2 +-
t/v2reindex.t | 5 +++++
2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index cbd4f003..03590850 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -1216,7 +1216,7 @@ sub sync_ranges ($$) {
sub index_xap_only { # git->cat_async callback
my ($bref, $oid, $type, $size, $smsg) = @_;
- my $self = $smsg->{self};
+ my $self = delete $smsg->{self};
my $idx = idx_shard($self, $smsg->{num});
$idx->index_eml(PublicInbox::Eml->new($bref), $smsg);
$self->{transact_bytes} += $smsg->{bytes};
diff --git a/t/v2reindex.t b/t/v2reindex.t
index 05ea952f..56540c8b 100644
--- a/t/v2reindex.t
+++ b/t/v2reindex.t
@@ -543,4 +543,9 @@ EOF
$check_rethread->('3-headed-monster once');
$check_rethread->('3-headed-monster twice');
+my $rdr = { 2 => \(my $err = '') };
+ok(run_script([qw(-index --reindex --xapian-only), $inboxdir], undef, $rdr),
+ '--xapian-only works');
+is($err, '', 'no errors from --xapian-only');
+
done_testing();
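
Why "delete" matters here can be reproduced with Storable alone. This is a sketch assuming a Storable-style serializer on the IPC channel; the hashes below are made-up stand-ins for $self and $smsg, not the real objects.

```perl
use strict;
use warnings;
use Storable qw(freeze);

# made-up stand-ins: $self holds a GLOB, $smsg carries a ref to $self
my $self = { out => \*STDOUT, transact_bytes => 0 };
my $smsg = { num => 1, bytes => 42, self => $self };

# freeze refuses GLOBs, so $smsg cannot be serialized as-is
my $before = eval { freeze($smsg); 1 } ? 'ok' : 'fails';

# delete returns the removed value, so the callback still gets $self
# while $smsg becomes plain data again
my $local_self = delete $smsg->{self};
my $after = eval { freeze($smsg); 1 } ? 'ok' : 'fails';

print "with self: $before, without self: $after\n";
```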