unofficial mirror of notmuch@notmuchmail.org
* Thanks for notmuch-lore
       [not found]         ` <8735l2b7ui.fsf@waldekranz.com>
@ 2022-03-22  0:00           ` Carl Worth
  2022-03-22  5:15             ` Kyle Meyer
  0 siblings, 1 reply; 3+ messages in thread
From: Carl Worth @ 2022-03-22  0:00 UTC (permalink / raw)
  To: Tobias Waldekranz; +Cc: notmuch


On Tue, Feb 01 2022, Tobias Waldekranz wrote:
> I actually gave up on getting my mailing lists from my email provider;
> now I just download them directly from lore. I hacked together a script
> that will scrape a public-inbox repo and convert it to a Maildir:
>
> https://github.com/wkz/notmuch-lore

Thanks for sharing this, Tobias. I needed exactly this today, and was
happy to have found this.

It looks like you've coded something that efficiently does the work
needed periodically: fetching new emails from the public-inbox git
repository, converting them to maildir files, and pruning away git state
other than a pointer to what's been converted already.

What I'm missing is the piece to convert over the entire archive from
the past.

I can fetch it all easily enough with public-inbox-clone. Maybe what I
want could be captured in a tool named something like:

	public-inbox-export --output=maildir

After which I'd be all bootstrapped and ready to use your notmuch-lore
pre-new hook.

> As you can tell from the name, it is tailored for plugging into notmuch,
> but the guts are pretty generic.

Indeed. And it looks like all the code I would need for the export I
described above is right there in your script. It's as simple as:

	# run inside the public-inbox git repository; $maildir is the target Maildir
	git rev-list HEAD | while read -r sha; do
	    git show "$sha:m" > "$maildir/new/$sha"
	done

So, next I should go put together a patch against public-inbox to add
that.

Thanks again,

-Carl

PS. I debated whether to CC lkml, where the message I was replying to
came from. I decided against it and almost emailed Tobias alone, but I
really do want discussion like this to be archived in public, so I CCed
the notmuch mailing list at least.


* Re: Thanks for notmuch-lore
  2022-03-22  0:00           ` Thanks for notmuch-lore Carl Worth
@ 2022-03-22  5:15             ` Kyle Meyer
  2022-03-22 17:00               ` Carl Worth
  0 siblings, 1 reply; 3+ messages in thread
From: Kyle Meyer @ 2022-03-22  5:15 UTC (permalink / raw)
  To: Carl Worth; +Cc: Tobias Waldekranz, notmuch

Carl Worth writes:

> On Tue, Feb 01 2022, Tobias Waldekranz wrote:
>> I actually gave up on getting my mailing lists from my email provider;
>> now I just download them directly from lore. I hacked together a script
>> that will scrape a public-inbox repo and convert it to a Maildir:
>>
>> https://github.com/wkz/notmuch-lore
>
> Thanks for sharing this, Tobias. I needed exactly this today, and was
> happy to have found this.
>
> It looks like you've coded something that efficiently does the work
> needed periodically: fetching new emails from the public-inbox git
> repository, converting them to maildir files, and pruning away git state
> other than a pointer to what's been converted already.
>
> What I'm missing is the piece to convert over the entire archive from
> the past.

I may be missing something (I didn't know about notmuch-lore before
seeing it mentioned here), but it looks like the initialization step of
notmuch-lore's pre-new handles that already.  You just need to set
`since` far enough back:

--8<---------------cut here---------------start------------->8---
tmphome=$(mktemp -d "${TMPDIR:-/tmp}"/nm-lore-XXXXXXX)
cd "$tmphome"

HOME="$tmphome"
export HOME

mkdir mail
notmuch setup
notmuch new

mkdir -p mail/.notmuch/.lore  mail/.notmuch/hooks

cat >mail/.notmuch/.lore/sources <<'EOF'
[gwl]
url=https://yhetil.org/gwl/git
since=50 years ago
EOF

curl -fSsL \
     https://raw.githubusercontent.com/wkz/notmuch-lore/3e2a13b32b178a4d3296cee6f69ee3491eebdb9f/pre-new \
     >mail/.notmuch/hooks/pre-new
chmod +x mail/.notmuch/hooks/pre-new
./mail/.notmuch/hooks/pre-new
--8<---------------cut here---------------end--------------->8---

That returns the number of messages I expect for that (small) archive:

  $ find mail/gwl -type f | wc -l
  288

Also, just to list some other options in this space, l2md and impibe are
mentioned at <https://public-inbox.org/clients.html> as tools for
converting public-inbox archives into maildir format.  (I haven't used
either myself.)

Tobias, just a note of something I saw when looking over the script:

    $git rev-list $3 | while read sha; do
      $git show $sha:m >$db/$1/new/$sha
    done

This would error if it encounters a deleted message in the archive,
because the commit's tree then has a "d" file instead of an "m" file.
See <https://public-inbox.org/public-inbox-v2-format.html>.
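
One way to guard against that case is to check whether the "m" blob exists
before extracting it. The following is only a sketch, not notmuch-lore's
actual code: the function name export_epoch, its two arguments, and the
choice to name each file after its commit hash are assumptions made for
illustration. It also writes each message into tmp/ first and then renames
it into new/, as the Maildir convention suggests.

```shell
#!/bin/sh
# Sketch: export the messages of one public-inbox git repository into a
# Maildir, skipping commits that have no "m" blob (e.g. deletions).
# export_epoch and its arguments are hypothetical names.
export_epoch() {
    epoch=$1      # path to the git repository holding the messages
    maildir=$2    # destination Maildir root
    mkdir -p "$maildir/cur" "$maildir/new" "$maildir/tmp"
    git -C "$epoch" rev-list HEAD | while read -r sha; do
        # cat-file -e exits non-zero when the named object does not exist
        if git -C "$epoch" cat-file -e "$sha:m" 2>/dev/null; then
            # write to tmp/ first, then rename into new/
            git -C "$epoch" show "$sha:m" >"$maildir/tmp/$sha" &&
                mv "$maildir/tmp/$sha" "$maildir/new/$sha"
        fi
    done
}
```

It could be called as `export_epoch /path/to/repo.git ~/mail/gwl`, where both
paths are placeholders.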


* Re: Thanks for notmuch-lore
  2022-03-22  5:15             ` Kyle Meyer
@ 2022-03-22 17:00               ` Carl Worth
  0 siblings, 0 replies; 3+ messages in thread
From: Carl Worth @ 2022-03-22 17:00 UTC (permalink / raw)
  To: Kyle Meyer; +Cc: Tobias Waldekranz, notmuch


On Tue, Mar 22 2022, Kyle Meyer wrote:
> I may be missing something (I didn't know about notmuch-lore before
> seeing it mentioned here), but it looks like the initialization step of
> notmuch-lore's pre-new handles that already.  You just need to set
> `since` far enough back:

Hmm... I did see the "since" parameter and cranked it back.

It didn't seem to do what I wanted, but it's possible the bug only
affects multi-epoch archives (I was trying to bring in LKML).

From poking at it, it looked like it did perform a "deepening" operation
using the "since" parameter after the initial clone, but then did not
include anything older than the most recent upstream commit in the range
of commits it extracts messages from.

But my examination of the code and behavior was very cursory, I admit.
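
For context on the multi-epoch case: a full export of an archive such as
LKML would have to visit every epoch repository, not only the newest one.
The sketch below is not taken from notmuch-lore. It assumes the v2 layout
in which a cloned inbox keeps its epochs at git/0.git, git/1.git, and so
on, and list_epochs is a hypothetical helper name. It prints the epoch
repositories oldest first.

```shell
#!/bin/sh
# Sketch: list the epoch repositories of a cloned v2 inbox in numeric
# order, oldest first. list_epochs is a hypothetical helper; the
# git/N.git layout is an assumption about how the inbox was cloned.
list_epochs() {
    inbox=$1
    for dir in "$inbox"/git/*.git; do
        [ -d "$dir" ] || continue
        n=${dir##*/}      # strip leading directories
        n=${n%.git}       # strip the .git suffix
        # ignore anything that is not a plain number
        case $n in ''|*[!0-9]*) continue ;; esac
        printf '%s %s\n' "$n" "$dir"
    done | sort -n | cut -d' ' -f2-
}
```

A loop such as
`list_epochs ~/lkml | while read -r epoch; do git -C "$epoch" rev-list --count HEAD; done`
would then report how many commits each epoch holds (the ~/lkml path is a
placeholder).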

> Also, just to list some other options in this space, l2md and impibe are
> mentioned at <https://public-inbox.org/clients.html> as tools for
> converting public-inbox archives into maildir format.  (I haven't used
> either myself.)

Thanks! I clearly didn't look quite hard enough. I appreciate the
pointers.

-Carl


