* Reimagining notmuch-git/nmbug
@ 2023-03-29 8:41 Felipe Contreras
2023-03-29 9:50 ` Michael J Gruber
2023-04-03 9:49 ` David Bremner
0 siblings, 2 replies; 17+ messages in thread
From: Felipe Contreras @ 2023-03-29 8:41 UTC (permalink / raw)
To: notmuch@notmuchmail.org
Hi,
I noticed you promoted notmuch-git as a user tool to toy around with it.
Very quickly I realized that most of what it does is something I've
been working on for at least 10 years: making git work with other
tools.
I presume you haven't heard of git remote-helpers [1], because they do
precisely what notmuch-git is trying to do.
As a proof of concept I created a remote helper for notmuch [2]. If
you have this script (`git-remote-nm`) anywhere in your path, git will
interpret URLs prefixed with "nm::" as notmuch transports, and you can
do:
git clone nm::$HOME/mail
The contents of this repository are generated by `git-remote-nm`,
which I chose to write in Ruby, but you can use any language you want.
All it needs to do is interpret simple commands, and generate output
understandable by `git fast-import` [3].
For example, this command actually creates a repository:
git fast-import <<EOF
blob
mark :1
data 13
inbox
unread
commit refs/heads/master
mark :2
committer Author <author@example.com> 1680077472 +0000
data 0
M 100644 :1 878we4qdqf.fsf@yoom.home.cworth.org/tags
EOF
You can interact with this repository as you would with any other
repository, because it is a git repository.
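A minimal sketch of how such a stream could be generated from a table of message-ids and tags. The helper name `fast_import_stream` and its parameters are hypothetical, and the sketch assumes message-ids that need no path quoting; it is not the code of `git-remote-nm`.

```ruby
# Build a `git fast-import` stream that stores one "tags" file per message.
# Hypothetical helper for illustration only.
def fast_import_stream(tags_by_mid, ref: 'refs/heads/master',
                       committer: 'Author <author@example.com>',
                       time: Time.now.to_i)
  out = +''
  out << "commit #{ref}\n"
  out << "committer #{committer} #{time} +0000\n"
  out << "data 0\n" # empty commit message
  tags_by_mid.each do |mid, tags|
    body = tags.map { |tag| "#{tag}\n" }.join
    # `inline` means the file content follows as a `data` command.
    out << "M 100644 inline #{mid}/tags\n"
    out << "data #{body.bytesize}\n"
    out << body
  end
  out
end
```

Printing the returned string and piping it into `git fast-import` inside an initialized repository should give a result similar to the heredoc above, with the blob written inline instead of through a mark.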
The only difference is at the time of pull/push from this nm remote,
at which time `git-remote-nm` is invoked again.
When you do `git pull` the local tags will be updated with the tags of
the notmuch database.
And when you do `git push` the tags of the notmuch database are
updated with the local tags.
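The fetch side of that conversation can be sketched roughly as follows. This is a simplified illustration of the remote-helper command loop, not the actual script: the refspec, the single `master` ref and the block-based `import` hook are assumptions, and the `export` (push) side is omitted.

```ruby
require 'stringio'

# Minimal command loop in the style of a git remote helper.
# git writes commands to the helper's stdin and reads replies from its stdout.
def run_helper(input, output)
  pending = []
  input.each_line do |line|
    cmd, arg = line.chomp.split(' ', 2)
    case cmd
    when 'capabilities'
      # One capability per line, terminated by a blank line.
      output.puts 'import', 'refspec refs/heads/*:refs/notmuch/*', ''
    when 'list'
      # "?" means the object name is unknown; "@" declares a symref.
      output.puts '? refs/heads/master', '@refs/heads/master HEAD', ''
    when 'import'
      pending << arg # git sends a batch of import lines, ended by a blank line
    when nil
      break if pending.empty? # blank line with nothing pending: end of session
      # A real helper would write a fast-import stream to `output` here.
      yield pending.dup, output if block_given?
      pending.clear
    end
  end
end
```

A caller would pass `$stdin` and `$stdout`, plus a block that emits the fast-import stream for the requested refs.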
The code that does this is extremely simple, only 180 lines of code.
I wrote some tests using the notmuch default corpus and the last epoch
of the git.git public-inbox and everything works fine. The initial
clone of 28082 messages takes 1.5s and weighs 5.9M on my machine.
Of course, it's only a proof of concept and has very basic features,
but I'm certain the most important features of `notmuch-git` can be
easily implemented.
I see most of the complexity of `notmuch-git` is dealing with caches
and git indexes, but that's a task better left for the tools that were
meant to deal with that: `git fast-import`.
Thoughts?
[1] https://git-scm.com/docs/gitremote-helpers
[2] https://github.com/felipec/git-notmuch
[3] https://git-scm.com/docs/git-fast-import
--
Felipe Contreras
* Re: Reimagining notmuch-git/nmbug
2023-03-29 8:41 Reimagining notmuch-git/nmbug Felipe Contreras
@ 2023-03-29 9:50 ` Michael J Gruber
2023-03-29 12:17 ` Felipe Contreras
2023-04-03 9:49 ` David Bremner
1 sibling, 1 reply; 17+ messages in thread
From: Michael J Gruber @ 2023-03-29 9:50 UTC (permalink / raw)
To: Felipe Contreras; +Cc: notmuch@notmuchmail.org
On Wed, Mar 29, 2023 at 10:41 AM Felipe Contreras
<felipe.contreras@gmail.com> wrote:
>
> Hi,
>
> I noticed you promoted notmuch-git as a user tool to toy around with it.
>
> Very quickly I realized that most of what it does is something I've
> been working on for at least 10 years: making git work with other
> tools.
>
> I presume you haven't heard of git remote-helpers [1], because they do
> precisely what notmuch-git is trying to do.
>
Hi Felipe
that's an interesting idea for sure. When I came across `notmuch-git`
first I wondered whether it should rather be `git-notmuch`, i.e. a
subcommand to `git`. I admit that - given its preexistence as nmbug -
I was never quite sure what to use it for. Maybe sync tags for mail
stores whose content you sync otherwise? `public-inbox` came to my
mind in this context, too. (I wondered about an nm backend for that,
i.e. a public-inbox backed mailstore for notmuch, without multiple
checkouts.)
So, if we consider the notmuch database (more precisely: the dump
output) as a "remote", then what is the history? I understand that we
can transfer and transform its content in the form of blobs as
specific paths encoding mid etc. Is the history stored by current
`notmuch-git` something secondary (say, like the history of notes refs
in git) which can be discarded?
Note that I haven't looked at your code thoroughly yet (I'm not a
rubyist), and I'm all for using git tools to do gittish things and
more; I'm just wondering whether fast-import/export cover what current
`notmuch-git` intends to do. They are probably the best tool for
"cloning" an existing nm-db into a git repo of mid-tag associations.
And if all you want is a gittish transport for nm tags then that's
probably perfect!
`notmuch-git` seems to be about handling both updates (commit etc) and
queries (log etc), too, as a wrapper to git commands. Those may be
candidates for other git tools, such as aliases, diff helpers,
textconv and such.
In summary, I think a notmuch-git repo is more than a conversion of
notmuch-dump output (it adds history and commit messages; we have a
"one-sided inverse" only), and the notmuch-git command is more than a
converter between the respective data stores. It smells more like
`git-lfs` or other filter-based approaches, storing the real objects
outside of the git repo. But I feel I know too little about
`notmuch-git`'s purpose so far.
Cheers
Michael
* Re: Reimagining notmuch-git/nmbug
2023-03-29 9:50 ` Michael J Gruber
@ 2023-03-29 12:17 ` Felipe Contreras
0 siblings, 0 replies; 17+ messages in thread
From: Felipe Contreras @ 2023-03-29 12:17 UTC (permalink / raw)
To: Michael J Gruber; +Cc: notmuch@notmuchmail.org
On Wed, Mar 29, 2023 at 3:50 AM Michael J Gruber
<michaeljgruber+grubix+git@gmail.com> wrote:
>
> On Wed, Mar 29, 2023 at 10:41 AM Felipe Contreras
> <felipe.contreras@gmail.com> wrote:
> >
> > Hi,
> >
> > I noticed you promoted notmuch-git as a user tool to toy around with it.
> >
> > Very quickly I realized that most of what it does is something I've
> > been working on for at least 10 years: making git work with other
> > tools.
> >
> > I presume you haven't heard of git remote-helpers [1], because they do
> > precisely what notmuch-git is trying to do.
> >
>
> Hi Felipe
>
> that's an interesting idea for sure. When I came across `notmuch-git`
> first I wondered whether it should rather be `git-notmuch`, i.e. a
> subcommand to `git`. I admit that - given its preexistence as nmbug -
> I was never quite sure what to use it for. Maybe sync tags for mail
> stores whose content you sync otherwise? `public-inbox` came to my
> mind in this context, too. (I wondered about an nm backend for that,
> i.e. a public-inbox backed mailstore for notmuch, without multiple
> checkouts.)
Yes, I also thought of a public-inbox backend for notmuch, but for
that some notion of virtual files should probably be introduced, and I
think at the moment the current code of notmuch relies on real files.
> So, if we consider the notmuch database (more precisely: the dump
> output) as a "remote", then what is the history? I understand that we
> can transfer and transform its content in the form of blobs as
> specific paths encoding mid etc. Is the history stored by current
> `notmuch-git` something secondary (say, like the history of notes refs
> in git) which can be discarded?
The history is arbitrarily created.
Say you have two `git-remote-nm` repositories keeping track of the
same notmuch database. Except one does a daily `git fetch`, and the
other does it once a month. The former is going to have many more
commits, and thus a more granular history.
Think of it as a `git fetch` just being a simpler version of some
custom `notmuch dump | convert-script | git commit`.
> Note that I haven't looked at your code thoroughly yet (I'm not a
> rubyist),
You don't need to be a rubyist, just copy the script anywhere in your
path, and clone your mail database. As long as you never do `git
push`, the operations are going to be read-only, but if you want to be
extra safe, remove " mode: Notmuch::MODE_READ_WRITE" from the code,
and/or copy the mail database somewhere temporary.
Do `git fetch` regularly, and you'll see how a history of
"origin/master" is being created.
> and I'm all for using git tools to do gittish things and
> more; I'm just wondering whether fast-import/export cover what current
> `notmuch-git` intends to do. They are probably the best tool for
> "cloning" an existing nm-db into a git repo of mid-tag associations.
> And if all you want is a gittish transport for nm tags then that's
> probably perfect!
>
> `notmuch-git` seems to be about handling both updates (commit etc)
You can do the same with `git-notmuch`: just do `git commit`.
I do that in the tests to add a tag [1].
> and queries (log etc),
Ditto: just do `git log`.
If you look at the code of `notmuch-git`, it's just a wrapper for `git
log --name-status --no-renames`.
> In summary, I think a notmuch-git repo is more than a conversion of
> notmuch-dump output (it adds history and commit messages; we have a
> "one-sided inverse" only), and the notmuch-git command is more than a
> converter between the respective data stores.
So is `git-notmuch`: every time you do `git fetch` a commit is created.
The history is all there.
Cheers.
[1] https://github.com/felipec/git-notmuch/blob/cdb2954abf3eb9f2f04f71fd2385a34653f758f5/t/basic.t#L87
--
Felipe Contreras
* Re: Reimagining notmuch-git/nmbug
2023-03-29 8:41 Reimagining notmuch-git/nmbug Felipe Contreras
2023-03-29 9:50 ` Michael J Gruber
@ 2023-04-03 9:49 ` David Bremner
2023-04-03 10:46 ` David Bremner
` (2 more replies)
1 sibling, 3 replies; 17+ messages in thread
From: David Bremner @ 2023-04-03 9:49 UTC (permalink / raw)
To: Felipe Contreras, notmuch@notmuchmail.org
Felipe Contreras <felipe.contreras@gmail.com> writes:
> Hi,
>
> I noticed you promoted notmuch-git as a user tool to toy around with it.
>
> Very quickly I realized that most of what it does is something I've
> been working on for at least 10 years: making git work with other
> tools.
>
> I presume you haven't heard of git remote-helpers [1], because they do
> precisely what notmuch-git is trying to do.
>
> As a proof of concept I created a remote helper for notmuch [2]. If
> you have this script (`git-remote-nm`) anywhere in your path, git will
> interpret URLs prefixed with "nm::" as notmuch transports, and you can
> do:
>
> git clone nm::$HOME/mail
I'm intrigued (and indeed I hadn't really thought about the degree to
which we were re-inventing git-fast-import and friends); however so far
my experiments did not get far enough to say anything conclusive.
I tried your script with the bindings from master (554690) but it does
not seem to like my split configuration, where the database lives in
~/.local/share/share/notmuch/default/xapian.
$ git clone nm::/home/bremner/Maildir
Cloning into 'Maildir'...
/home/bremner/.config/scripts/git-remote-nm:164:in `initialize': failed to read/write file (Notmuch::FileError)
from /home/bremner/.config/scripts/git-remote-nm:164:in `new'
from /home/bremner/.config/scripts/git-remote-nm:164:in `<main>'
If I make a fake .notmuch directory, then it seems to work. I'm not
sure if this is an issue with the bindings or with the script.
Conceptually there is also the question of how to handle split
configurations as a URL.
Performance-wise the initial clone seems pretty slow. For my 600k
messages I have been waiting a while now. htop tells me that
git-fast-import has about 45 minutes of CPU time at this point. This
machine is not that fast, but for comparison an initial (i.e. fresh
repo, no caching) "notmuch git commit" takes about 15-20s.
If you need a larger corpus of messages to play with, the notmuch
performance suite includes about 400k messages, and running T00-new.sh
will build a notmuch database that you can clone.
* Re: Reimagining notmuch-git/nmbug
2023-04-03 9:49 ` David Bremner
@ 2023-04-03 10:46 ` David Bremner
2023-04-04 0:47 ` Felipe Contreras
2023-04-03 11:48 ` Felipe Contreras
2023-04-03 16:01 ` Felipe Contreras
2 siblings, 1 reply; 17+ messages in thread
From: David Bremner @ 2023-04-03 10:46 UTC (permalink / raw)
To: Felipe Contreras, notmuch@notmuchmail.org
David Bremner <david@tethera.net> writes:
>
> I'm intrigued (and indeed I hadn't really thought about the degree to
> which we were re-inventing git-fast-import and friends); however so far
> my experiments did not get far enough to say anything conclusive.
>
I did manage to finish, about 70 minutes elapsed.
Although you're probably right that a file of tags is the right
representation (it is what git-annex uses also), I think we'd need to
define a custom merge driver to take unions of lists in the same way
that git-annex does. Otherwise merging will be less automagic than it is
now.
* Re: Reimagining notmuch-git/nmbug
2023-04-03 9:49 ` David Bremner
2023-04-03 10:46 ` David Bremner
@ 2023-04-03 11:48 ` Felipe Contreras
2023-04-03 16:01 ` Felipe Contreras
2 siblings, 0 replies; 17+ messages in thread
From: Felipe Contreras @ 2023-04-03 11:48 UTC (permalink / raw)
To: David Bremner; +Cc: notmuch@notmuchmail.org
On Mon, Apr 3, 2023 at 4:49 AM David Bremner <david@tethera.net> wrote:
>
> Felipe Contreras <felipe.contreras@gmail.com> writes:
>
> > Hi,
> >
> > I noticed you promoted notmuch-git as a user tool to toy around with it.
> >
> > Very quickly I realized that most of what it does is something I've
> > been working on for at least 10 years: making git work with other
> > tools.
> >
> > I presume you haven't heard of git remote-helpers [1], because they do
> > precisely what notmuch-git is trying to do.
> >
> > As a proof of concept I created a remote helper for notmuch [2]. If
> > you have this script (`git-remote-nm`) anywhere in your path, git will
> > interpret URLs prefixed with "nm::" as notmuch transports, and you can
> > do:
> >
> > git clone nm::$HOME/mail
>
> I'm intrigued (and indeed I hadn't really thought about the degree to
> which we were re-inventing git-fast-import and friends); however so far
> my experiments did not get far enough to say anything conclusive.
>
> I tried your script with the bindings from master (554690) but it does
> not seem to like my split configuration, where the database lives in
> ~/.local/share/share/notmuch/default/xapian.
Just clone the xapian database instead of the Maildir:
% git clone nm::$HOME/.local/share/share/notmuch/default/
> Performance-wise the initial clone seems pretty slow. For my 600k
> messages I have been waiting a while now. htop tells me that
> git-fast-import has about 45 minutes of CPU time at this point. This
> machine is not that fast, but for comparison an initial (i.e. fresh
> repo, no caching) "notmuch git commit" takes about 15-20s.
That's weird. In my tests generating the fast-export output is almost
instantaneous, which means `git fast-import` is the one that is slow.
And it seems it starts to get slow after a certain point, so perhaps
it's not optimized to receive many files in one go.
> If you need a larger corpus of messages to play with, the notmuch
> performance suite includes about 400k messages, and running T00-new.sh
> will build a notmuch database that you can clone.
I tried that: the database has 194562 messages, and it takes 1:43
minutes to clone on my machine.
It's weird that it takes so long on your machine.
Can you try to hardcode a search query to limit the number of messages?
Just put something in here:
$db.query('').search_messages.each
Cheers.
--
Felipe Contreras
* Re: Reimagining notmuch-git/nmbug
2023-04-03 9:49 ` David Bremner
2023-04-03 10:46 ` David Bremner
2023-04-03 11:48 ` Felipe Contreras
@ 2023-04-03 16:01 ` Felipe Contreras
2023-04-03 18:42 ` David Bremner
2 siblings, 1 reply; 17+ messages in thread
From: Felipe Contreras @ 2023-04-03 16:01 UTC (permalink / raw)
To: David Bremner; +Cc: notmuch@notmuchmail.org
On Mon, Apr 3, 2023 at 4:49 AM David Bremner <david@tethera.net> wrote:
> Performance-wise the initial clone seems pretty slow. For my 600k
> messages I have been waiting a while now. htop tells me that
> git-fast-import has about 45 minutes of CPU time at this point. This
> machine is not that fast, but for comparison an initial (i.e. fresh
> repo, no caching) "notmuch git commit" takes about 15-20s.
I found the problem. If all the files are in the same directory, `git
fast-import` spends a lot of time comparing all the paths.
By distributing the files in multiple directories like notmuch-git
does using BLAKE2b, the operation is much faster.
I've pushed the changes, now there's a dependency, but you can just
`gem install blake2b`.
I'm able to clone the database of the performance corpus in 5 seconds:
% git clone --bare nm::$PWD/mail mail.git
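The directory fan-out described above can be sketched like this. SHA-256 from the Ruby standard library is used as a stand-in so the example has no gem dependency (the script itself uses BLAKE2b through the `blake2b` gem), and the message-id is used verbatim in the path, whereas the real script encodes it first. The name `sharded_path` is hypothetical.

```ruby
require 'digest'

# Map a message-id to a path of the form "xx/yy/<message-id>/tags".
# Two bytes of the digest give 256 * 256 = 65536 leaf directories.
def sharded_path(message_id)
  hash = Digest::SHA256.hexdigest(message_id)[0, 4] # first two bytes, as hex
  dir1 = hash[0, 2]
  dir2 = hash[2, 2]
  File.join(dir1, dir2, message_id, 'tags')
end
```

With 600k messages spread over 65536 leaf directories, each directory holds roughly nine entries on average, so no single tree grows large.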
Cheers.
--
Felipe Contreras
* Re: Reimagining notmuch-git/nmbug
2023-04-03 16:01 ` Felipe Contreras
@ 2023-04-03 18:42 ` David Bremner
2023-04-03 19:40 ` David Bremner
0 siblings, 1 reply; 17+ messages in thread
From: David Bremner @ 2023-04-03 18:42 UTC (permalink / raw)
To: Felipe Contreras; +Cc: notmuch@notmuchmail.org
Felipe Contreras <felipe.contreras@gmail.com> writes:
> By distributing the files in multiple directories like notmuch-git
> does using BLAKE2b, the operation is much faster.
>
> I've pushed the changes, now there's a dependency, but you can just
> `gem install blake2b`.
>
> I'm able to clone the database of the performance corpus in 5 seconds:
>
> % git clone --bare nm::$PWD/mail mail.git
Indeed that speeds up the initial clone on this machine from 39 minutes
(I switched machines) to 30s. I will play with it a bit more, and report
back.
I had just finished a pretty graph showing nonlinear growth of the old
version, but I guess nobody cares now ;)
d
* Re: Reimagining notmuch-git/nmbug
2023-04-03 18:42 ` David Bremner
@ 2023-04-03 19:40 ` David Bremner
2023-04-03 20:23 ` Felipe Contreras
0 siblings, 1 reply; 17+ messages in thread
From: David Bremner @ 2023-04-03 19:40 UTC (permalink / raw)
To: Felipe Contreras; +Cc: notmuch@notmuchmail.org
David Bremner <david@tethera.net> writes:
> Indeed that speeds up the initial clone on this machine from 39 minutes
> (I switched machines) to 30s. I will play with it a bit more, and report
> back.
It's not a showstopper, but "git pull" takes about 1/2 the wall time
(about 2/3 of the CPU time) of the original clone, even if there is only
one tag changed.
Two potential improvements I can think of.
- notmuch-dump.c calls notmuch_query_set_sort (query,
NOTMUCH_SORT_UNSORTED). I think I managed to do this (diff below),
but performance gain was negligible.
- Since you cache the lastmod value, you should be able to use it in a
query. This does make a big difference in my experiments. I had to
remove the 'deleteall' (otherwise only the changed messages are left
in the git repo). I'm not 100% sure this is correct; hopefully you will see
quicker than I. In any case the lastmod query is what notmuch-git
uses.
diff --git a/git-remote-nm b/git-remote-nm
index c668b38..cabea26 100755
--- a/git-remote-nm
+++ b/git-remote-nm
@@ -148,9 +148,11 @@ def wr_import(ref)
   wr_data("lastmod: %d\n" % ($lastmod || 0))
   wr_l 'from refs/notmuch/master^0' if $lastmod

-  wr_l 'deleteall'
+#  wr_l 'deleteall'

-  $db.query('').search_messages.each do |msg|
+  $query=$db.query("lastmod:%d.." % ($lastmod || 0) )
+  $query.sort=Notmuch::SORT_UNSORTED
+  $query.search_messages.each do |msg|
     hash = Blake2b.hex(msg.message_id, Blake2b::Key.none, 2)
     dir1, dir2 = hash[..1], hash[2..]
     wr_l 'M 644 inline %s/%s/%s/tags' % [dir1, dir2, encode_filename(msg.message_id)]
* Re: Reimagining notmuch-git/nmbug
2023-04-03 19:40 ` David Bremner
@ 2023-04-03 20:23 ` Felipe Contreras
2023-04-03 23:37 ` David Bremner
0 siblings, 1 reply; 17+ messages in thread
From: Felipe Contreras @ 2023-04-03 20:23 UTC (permalink / raw)
To: David Bremner; +Cc: notmuch@notmuchmail.org
On Mon, Apr 3, 2023 at 2:40 PM David Bremner <david@tethera.net> wrote:
>
> David Bremner <david@tethera.net> writes:
>
> > Indeed that speeds up the initial clone on this machine from 39 minutes
> > (I switched machines) to 30s. I will play with it a bit more, and report
> > back.
>
> It's not a showstopper, but "git pull" takes about 1/2 the wall time
> (about 2/3 of the CPU time) of the original clone, even if there is only
> one tag changed.
Yes, every fetch should take as much time as the original clone.
> Two potential improvements I can think of.
>
> - notmuch-dump.c calls notmuch_query_set_sort (query,
> NOTMUCH_SORT_UNSORTED). I think I managed to do this (diff below),
> but performance gain was negligible.
OK.
> - Since you cache the lastmod value, you should be able to use it in a
> query. This does make a big difference in my experiments. I had to
> remove the 'deleteall' (otherwise only the changed messages are left
> in the git repo). I'm not 100% sure this is correct; hopefully you will see
> quicker than I. In any case the lastmod query is what notmuch-git
> uses.
That should work to update existing tags, but how are we going to
detect if a message has disappeared? Or is that not a thing?
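The concern can be illustrated with a small in-memory simulation. Nothing here touches notmuch or git: `db` stands in for the notmuch database, `tree` for the git tree, and all message-ids, tags and lastmod values are made up.

```ruby
# Each db entry maps a message-id to { tags: [...], lastmod: N }.

# Full import: equivalent to `deleteall` followed by re-adding every message.
def full_import(db)
  db.transform_values { |msg| msg[:tags] }
end

# Incremental import: only messages modified at or after `since` are rewritten;
# everything else in the tree is left untouched.
def incremental_import(tree, db, since)
  changed = db.select { |_mid, msg| msg[:lastmod] >= since }
  tree.merge(changed.transform_values { |msg| msg[:tags] })
end

db = {
  'a@example.com' => { tags: %w[inbox], lastmod: 1 },
  'b@example.com' => { tags: %w[inbox unread], lastmod: 2 }
}
tree = full_import(db)

db.delete('b@example.com')                                    # message disappears
db['a@example.com'] = { tags: %w[inbox flagged], lastmod: 3 } # tag change

stale = incremental_import(tree, db, 3)
fresh = full_import(db)
puts stale.keys.inspect
puts fresh.keys.inspect
```

The incremental result picks up the tag change but still lists `b@example.com`, because a deleted message never shows up in a lastmod query; only the full import drops it.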
> diff --git a/git-remote-nm b/git-remote-nm
> index c668b38..cabea26 100755
> --- a/git-remote-nm
> +++ b/git-remote-nm
> @@ -148,9 +148,11 @@ def wr_import(ref)
> wr_data("lastmod: %d\n" % ($lastmod || 0))
> wr_l 'from refs/notmuch/master^0' if $lastmod
>
> - wr_l 'deleteall'
> +# wr_l 'deleteall'
>
> - $db.query('').search_messages.each do |msg|
> + $query=$db.query("lastmod:%d.." % ($lastmod || 0) )
Does "lastmod:0.." get all the revisions? If so, it might make sense
to set $lastmod to 0 initially.
Then we could unconditionally do:
$db.query('lastmod:%d..' % $lastmod, sort: Notmuch::SORT_UNSORTED)
--
Felipe Contreras
* Re: Reimagining notmuch-git/nmbug
2023-04-03 20:23 ` Felipe Contreras
@ 2023-04-03 23:37 ` David Bremner
2023-04-04 0:29 ` Felipe Contreras
0 siblings, 1 reply; 17+ messages in thread
From: David Bremner @ 2023-04-03 23:37 UTC (permalink / raw)
To: Felipe Contreras; +Cc: notmuch@notmuchmail.org
Felipe Contreras <felipe.contreras@gmail.com> writes:
>
> That should work to update existing tags, but how are we going to
> detect if a message has disappeared? Or is that not a thing?
Indeed the same thought had occurred to me not long ago. I remembered
(belatedly) that I'd been through a similar thought process with nmbug.
Messages can and do disappear. So I guess that optimization is not OK,
at least not without some complications.
> Does "lastmod:0.." get all the revisions? If so, it might make sense
> to set $lastmod to 0 initially.
>
> Then we could unconditionally do:
>
> $db.query('lastmod:%d..' % $lastmod, sort: Notmuch::SORT_UNSORTED)
That would work, but as you point out, we'd need to deal with deletions
somehow. It occurs to me that wr_export also needs to be able to handle
disappearing message-ids. I suppose like notmuch-restore it can just
complain and skip any missing ones. It's tempting to try to do some kind
of lazy cleanup at that point, but I don't really see how that fits with
the remote-helper protocol.
d
* Re: Reimagining notmuch-git/nmbug
2023-04-03 23:37 ` David Bremner
@ 2023-04-04 0:29 ` Felipe Contreras
2023-04-04 17:54 ` David Bremner
0 siblings, 1 reply; 17+ messages in thread
From: Felipe Contreras @ 2023-04-04 0:29 UTC (permalink / raw)
To: David Bremner; +Cc: notmuch@notmuchmail.org
On Mon, Apr 3, 2023 at 6:37 PM David Bremner <david@tethera.net> wrote:
>
> Felipe Contreras <felipe.contreras@gmail.com> writes:
>
> >
> > That should work to update existing tags, but how are we going to
> > detect if a message has disappeared? Or is that not a thing?
>
> Indeed the same thought had occurred to me not long ago. I remembered
> (belatedly) that I'd been through a similar thought process with nmbug.
> Messages can and do disappear. So I guess that optimization is not OK,
> at least not without some complications.
>
> > Does "lastmod:0.." get all the revisions? If so, it might make sense
> > to set $lastmod to 0 initially.
> >
> > Then we could unconditionally do:
> >
> > $db.query('lastmod:%d..' % $lastmod, sort: Notmuch::SORT_UNSORTED)
>
> That would work, but as you point out, we'd need to deal with deletions
> somehow. It occurs to me that wr_export also needs to be able to handle
> disappearing message-ids. I suppose like notmuch-restore it can just
> complain and skip any missing ones. It's tempting to try to do some kind
> of lazy cleanup at that point, but I don't really see how that fits with
> the remote-helper protocol.
We could have an external tool, something like `git-notmuch-fsck`, that
the user has to run regularly, as was once the case with `git fsck`.
Or we could say that after jumping a certain threshold of lastmod we
delete all the messages and start from scratch, perhaps every 1000
revisions.
Or maybe the query could generate a virtual tag if a message was
deleted since the previous lastmod (e.g. "nm::deleted"). Then it would
be trivial for the remote helper to tell that to git.
I lean towards the threshold, because that way the user doesn't need
to do anything, and there's no modifications needed in libnotmuch.
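A minimal sketch of the threshold idea, assuming the helper stores two values between fetches: the lastmod at the last full import and the lastmod seen at the last fetch. The function name, the keyword arguments and the returned hash are hypothetical.

```ruby
FULL_RESYNC_EVERY = 1000 # threshold in lastmod revisions, as suggested above

# Decide whether the next fetch starts from scratch or only asks notmuch
# for messages changed since the previous fetch.
def import_plan(last_full:, last_seen:, current:)
  if last_full.nil? || last_seen.nil? || current - last_full >= FULL_RESYNC_EVERY
    { deleteall: true, query: '' }                         # full import
  else
    { deleteall: false, query: "lastmod:#{last_seen}.." }  # incremental import
  end
end
```

Between full imports a deleted message would linger in the repository, but never for more than the chosen number of revisions.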
Cheers.
--
Felipe Contreras
* Re: Reimagining notmuch-git/nmbug
2023-04-03 10:46 ` David Bremner
@ 2023-04-04 0:47 ` Felipe Contreras
2023-04-04 17:42 ` David Bremner
0 siblings, 1 reply; 17+ messages in thread
From: Felipe Contreras @ 2023-04-04 0:47 UTC (permalink / raw)
To: David Bremner; +Cc: notmuch@notmuchmail.org
On Mon, Apr 3, 2023 at 5:46 AM David Bremner <david@tethera.net> wrote:
>
> David Bremner <david@tethera.net> writes:
>
> >
> > I'm intrigued (and indeed I hadn't really thought about the degree to
> > which we were re-inventing git-fast-import and friends); however so far
> > my experiments did not get far enough to say anything conclusive.
> >
>
> I did manage to finish, about 70 minutes elapsed.
>
> Although you're probably right that a file of tags is the right
> representation (it is what git-annex uses also), I think we'd need to
> define a custom merge driver to take unions of lists in the same way
> that git-annex does. Otherwise merging will be less automagic than it is
> now.
I'm not familiar with git-annex, I would need to see an example of
such merging happening.
One advantage of using the fast-import format is that it's easy to
change it, or support multiple formats.
In fact, the format could be specified in the URL, like
`nm::1:$HOME/mail` for the current notmuch-git format, and
`nm::2:$HOME/mail` for the new.
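A rough sketch of how the helper could split such a URL. git strips the `nm::` prefix and hands the rest to the helper as its URL argument, so only the optional `<format>:` part needs parsing. The function name is hypothetical, and returning `nil` when no format is given is an assumption.

```ruby
# Split the address that follows "nm::" into [format, path].
# "1:/home/user/mail" => [1, "/home/user/mail"]
# "/home/user/mail"   => [nil, "/home/user/mail"]
def parse_nm_url(url)
  if (m = url.match(/\A(\d+):(.*)\z/m))
    [Integer(m[1]), m[2]]
  else
    [nil, url]
  end
end
```

The caller can then pick a default format when the first element is `nil`.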
--
Felipe Contreras
* Re: Reimagining notmuch-git/nmbug
2023-04-04 0:47 ` Felipe Contreras
@ 2023-04-04 17:42 ` David Bremner
0 siblings, 0 replies; 17+ messages in thread
From: David Bremner @ 2023-04-04 17:42 UTC (permalink / raw)
To: Felipe Contreras; +Cc: notmuch@notmuchmail.org
Felipe Contreras <felipe.contreras@gmail.com> writes:
>
> I'm not familiar with git-annex, I would need to see an example of
> such merging happening.
I was confused, git-annex is using the builtin merge strategy "union",
which is not eliminating duplicates or sorting, so probably not
applicable here. I still have to try some merges between different
machines to see what kind of conflicts can arise.
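For illustration, a union that also sorts and removes duplicates could look like the sketch below. It assumes one tag per line in each file, and the function name is hypothetical; it only shows what a custom merge driver might compute, not how it would be wired into git.

```ruby
# Merge two tag files (one tag per line) into a sorted union without duplicates.
def merge_tag_files(ours, theirs)
  tags = (ours.lines(chomp: true) | theirs.lines(chomp: true)).reject(&:empty?)
  tags.sort.map { |tag| "#{tag}\n" }.join
end
```

A plain union never drops a tag, so a tag removed on only one side comes back after the merge. Propagating removals would need a three-way comparison against the common ancestor, which a merge driver configured through gitattributes receives alongside the two sides (the `%O`, `%A` and `%B` placeholders).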
> One advantage of using the fast-import format is that it's easy to
> change it, or support multiple formats.
>
> In fact, the format could be specified in the URL, like
> `nm::1:$HOME/mail` for the current notmuch-git format, and
> `nm::2:$HOME/mail` for the new.
This might also be a way to handle the "prefix" setting that nmbug /
notmuch-git needs in order to sync only certain (e.g. notmuch::*) tags.
* Re: Reimagining notmuch-git/nmbug
2023-04-04 0:29 ` Felipe Contreras
@ 2023-04-04 17:54 ` David Bremner
2023-04-05 16:08 ` Felipe Contreras
0 siblings, 1 reply; 17+ messages in thread
From: David Bremner @ 2023-04-04 17:54 UTC (permalink / raw)
To: Felipe Contreras; +Cc: notmuch@notmuchmail.org
Felipe Contreras <felipe.contreras@gmail.com> writes:
> On Mon, Apr 3, 2023 at 6:37 PM David Bremner <david@tethera.net> wrote:
> Or we could say that after jumping a certain threshold of lastmod we
> delete all the messages and start from scratch, perhaps every 1000
> revisions.
>
> Or maybe the query could generate a virtual tag if a message was
> deleted since the previous lastmod (e.g. "nm::deleted"). Then it would
> be trivial for the remote helper to tell that to git.
A complication here is that tags are attached to mail message documents
in the database, so we would need to generate a so-called "ghost
message", and clean those up somehow.
> I lean towards the threshold, because that way the user doesn't need
> to do anything, and there's no modifications needed in libnotmuch.
This sounds right. Can we use the detection of missing messages in
wr_export to reset the appropriate counters? It looks like yes, given
the call to store_lastmod.
* Re: Reimagining notmuch-git/nmbug
2023-04-04 17:54 ` David Bremner
@ 2023-04-05 16:08 ` Felipe Contreras
2023-04-06 13:50 ` David Bremner
0 siblings, 1 reply; 17+ messages in thread
From: Felipe Contreras @ 2023-04-05 16:08 UTC (permalink / raw)
To: David Bremner; +Cc: notmuch@notmuchmail.org
On Tue, Apr 4, 2023 at 12:54 PM David Bremner <david@tethera.net> wrote:
>
> Felipe Contreras <felipe.contreras@gmail.com> writes:
>
> > On Mon, Apr 3, 2023 at 6:37 PM David Bremner <david@tethera.net> wrote:
>
> > Or we could say that after jumping a certain threshold of lastmod we
> > delete all the messages and start from scratch, perhaps every 1000
> > revisions.
> >
> > Or maybe the query could generate a virtual tag if a message was
> > deleted since the previous lastmod (e.g. "nm::deleted"). Then it would
> > be trivial for the remote helper to tell that to git.
>
> A complication here is that tags are attached to mail message documents
> in the database, so we would need to generate a so-called "ghost
> message", and clean those up somehow.
I thought a little bit more about how I would use git-notmuch, and I
don't see the point in tracking messages that have no tags. In my view
the whole point of the tool is to back up the tags, and the whole point
of a backup is to eventually be able to restore it. But if there's
nothing to restore for a specific message, it might as well not
exist.
So instead of a `nm::deleted` tag, just no tags. I think from the
point of view of git-notmuch it shouldn't make a difference.
> > I lean towards the threshold, because that way the user doesn't need
> > to do anything, and there's no modifications needed in libnotmuch.
>
> This sounds right. Can we use the detection of missing messages in
> wr_export to reset the appropriate counters? It looks like yes, given
> the call to store_lastmod.
We would need to store them and use that information in the next
fetch. Although doable, it seems hacky, and in the past such things
have led to problems that are hard to solve due to inconsistent
states. For example what happens if in the next fetch we tell git that
some files have been removed, but we crash in the middle of it? On the
next fetch we'll tell git that some files were removed, but git might
think they don't exist and fail. I think git was fixed for that particular
problem, so that it doesn't update the files unless the program exits
successfully, but I'm not sure.
I would rather go for a solution that is less hacky, and has less
chance of leaving the user in an unrecoverable state.
Cheers.
--
Felipe Contreras
* Re: Reimagining notmuch-git/nmbug
2023-04-05 16:08 ` Felipe Contreras
@ 2023-04-06 13:50 ` David Bremner
0 siblings, 0 replies; 17+ messages in thread
From: David Bremner @ 2023-04-06 13:50 UTC (permalink / raw)
To: Felipe Contreras; +Cc: notmuch@notmuchmail.org
Felipe Contreras <felipe.contreras@gmail.com> writes:
> On Tue, Apr 4, 2023 at 12:54 PM David Bremner <david@tethera.net> wrote:
>>
>> This sounds right. Can we use the detection of missing messages in
>> wr_export to reset the appropriate counters? It looks like yes, given
>> the call to store_lastmod.
[snip]
> I would rather go for a solution that is less hacky, and has less
> chance of leaving the user in an unrecoverable state.
fair enough. Certainly notmuch-git has too many accumulated performance
hacks.