From: Stewart Smith <stewart@flamingspork.com>
To: Ben Gamari <bgamari@gmail.com>, notmuch <notmuch@notmuchmail.org>
Subject: Re: Mail in git
Date: Wed, 17 Feb 2010 21:07:28 +1100
Message-ID: <87mxz8jhun.fsf@willster.local.flamingspork.com>
In-Reply-To: <87ocjok8yo.fsf@willster.local.flamingspork.com>
[-- Attachment #1: Type: text/plain, Size: 3118 bytes --]
On Wed, 17 Feb 2010 11:21:51 +1100, Stewart Smith <stewart@flamingspork.com> wrote:
> Using fast-import is interesting. Does it update the working tree? The
> big thing I wanted to avoid was creating a working tree (another million
> inodes being created is not ever what I need)
>
> Also interesting is the mention of creating packs on the fly... this
> could save the time in first writing the object and then packing it (as
> my script does).
>
> I'm going to play with this....
and I did.
Good news... on my mailstore (which, as I've previously mentioned, takes
about 10 minutes to run 'du' over, about the same time as 'notmuch new'
takes),
using the (attached) evenless.pl to create a single commit with
everything in it:
$ du -sh .git
3.4G .git
Down from a whopping 14-15GB!!!
My previous effort (git-write-object, create a pack every 1000 messages,
rinse, repeat) took all night and got to 3.7GB.
This took only 108 minutes.
In both cases, I was creating the repository on another spindle (a USB 2.0
disk attached to my laptop).
git-ls-tree and git-cat-file both work for listing and getting objects.
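As a rough sketch of what that looks like from a script (the helper names ls_tree_cmd, cat_file_cmd and run_git are made up for illustration, and "master" is assumed to be the branch evenless.pl wrote to):

```perl
use strict;
use warnings;

# Command that lists every path stored under a revision, NUL-separated
# so that unusual maildir file names survive intact.
sub ls_tree_cmd {
    my ($rev) = @_;
    return ('git', 'ls-tree', '-r', '--name-only', '-z', $rev);
}

# Command that prints the raw bytes of one stored message.
sub cat_file_cmd {
    my ($rev, $path) = @_;
    return ('git', 'cat-file', 'blob', "$rev:$path");
}

# Run a command without going through a shell and return its output.
sub run_git {
    my (@cmd) = @_;
    open(my $out, '-|', @cmd) or die "cannot run @cmd: $!";
    binmode $out;
    my $data = do { local $/; <$out> };
    close($out) or die "@cmd failed\n";
    return defined $data ? $data : '';
}

# Usage, from inside the repository:
#   my @paths = split /\0/, run_git(ls_tree_cmd('master'));
#   print run_git(cat_file_cmd('master', $paths[0]));
```

Neither command needs a working tree, so no extra inodes get created just to read mail back out.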
The next thing to think about is adding objects as they come
in... creating a new commit with just an added file should be pretty
simple and easy... but this means we get to keep a "revision history" of
the mailstore, which is *possibly* not ideal in terms of storage
efficiency (I'll do a trial with mine of doing one message at a time and
seeing what the end size is).
However... commit per added mail (or mails) does give us the advantage
of a really well documented and tested backup system :)
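A minimal sketch of what a commit-per-mail stream for fast-import could look like. This is not part of evenless.pl: the function name incremental_commit_stream and the commit message are invented here, the timestamp uses fast-import's default raw date format, and the path is assumed not to need quoting.

```perl
use strict;
use warnings;

# Build a fast-import stream that adds one message on top of the current
# branch tip. $content must be raw bytes so that length() matches what is
# written to the pipe. $when is seconds since the epoch.
sub incremental_commit_stream {
    my ($path, $content, $when) = @_;
    my $msg = "add $path";
    return join '',
        "commit refs/heads/master\n",
        "committer EvenLess <evenless\@evenless> $when +0000\n",
        "data " . length($msg) . "\n", $msg, "\n",
        # Continue from the existing tip instead of starting a new history.
        "from refs/heads/master^0\n",
        "M 0644 inline $path\n",
        "data " . length($content) . "\n", $content, "\n",
        "\n";
}

# Usage, from inside the repository:
#   open(my $fi, '|-', 'git', 'fast-import') or die "fast-import: $!";
#   binmode $fi;
#   print $fi incremental_commit_stream('cur/some-message', $bytes, time);
#   close($fi) or die "git fast-import failed\n";
```

Each run leaves behind a small new pack, so something would still have to repack periodically.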
Deleting could be hard... if we actually want the objects to go away in a
"permanent" way (rather than just no longer being referenced).
for the stats nerds:
$ time perl /home/stewart/evenless/evenless.pl /home/stewart/Maildir/INBOX
git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects:     785000
Total objects:       781813 (     79023 duplicates                  )
      blobs  :       781363 (     79023 duplicates     708627 deltas)
      trees  :          449 (         0 duplicates          0 deltas)
      commits:            1 (         0 duplicates          0 deltas)
      tags   :            0 (         0 duplicates          0 deltas)
Total branches:           1 (         1 loads     )
      marks:        1048576 (    860386 unique    )
      atoms:         860557
Memory total:        182780 KiB
       pools:        152116 KiB
     objects:         30664 KiB
---------------------------------------------------------------------
pack_report: getpagesize() = 4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit = 8589934592
pack_report: pack_used_ctr = 1
pack_report: pack_mmap_calls = 1
pack_report: pack_open_windows = 1 / 1
pack_report: pack_mapped = 388496447 / 388496447
---------------------------------------------------------------------
real 107m43.130s
user 45m25.430s
sys 2m49.440s
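A quick back-of-the-envelope reading of those numbers, assuming (as the script implies) that every file gets exactly one mark, so that new blobs plus duplicate blobs equals the number of files fed in:

```perl
use strict;
use warnings;

my $new_blobs = 781363;               # "blobs" line: objects actually written
my $dup_blobs = 79023;                # duplicates reported on the same line
my $seconds   = 107 * 60 + 43.130;    # "real" wall-clock time

my $files_seen = $new_blobs + $dup_blobs;    # matches the 860386 unique marks
my $dup_share  = $dup_blobs / $files_seen;   # fraction with identical content
my $rate       = $files_seen / $seconds;     # files imported per second

printf "%d files, %.1f%% duplicates, %.0f files/s\n",
    $files_seen, 100 * $dup_share, $rate;
```

So roughly one file in eleven was a byte-for-byte copy of another and cost no extra space.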
[-- Attachment #2: evenless.pl: maildir to git using fast-import --]
[-- Type: text/x-perl, Size: 1413 bytes --]
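#!/usr/bin/perl -w
# evenless.pl: import a maildir into git as a single commit via fast-import.
# Run from inside an existing git repository.
use strict;

my $FILES = "";             # "M 0644 :<mark> <path>" lines for the final commit
my $mark = 1;               # next fast-import mark to hand out
my $stripdir = $ARGV[0];    # prefix removed from stored paths
die "usage: $0 MAILDIR\n" unless defined $stripdir;

sub fastimport_blobs ($);
sub fastimport_blobs ($)
{
    my $dirname = shift @_;
    opendir(my $dirhandle, $dirname) or die "opendir $dirname: $!";
    my @entries = readdir $dirhandle;
    closedir $dirhandle;
    foreach (@entries)
    {
        next if /^\.\.?$/;
        # Skip index and metadata files left behind by mail clients.
        next if /\.cmeta$/;
        next if /\.ibex\.index$/;
        next if /\.ibex\.index\.data$/;
        next if /\.ev-summary$/;
        next if /\.ev-summary-meta$/;
        next if /\.notmuch$/;
        if (-d "$dirname/$_")
        {
            print STDERR "Recursing into $_/ ";
            fastimport_blobs("$dirname/$_");
            print STDERR "\n";
        }
        else
        {
            # Read the whole message as raw bytes.
            open(FILEIN, '<', "$dirname/$_") or die "open $dirname/$_: $!";
            binmode FILEIN;
            my $content = do { local $/; <FILEIN> };
            close FILEIN;
            $content = "" unless defined $content;

            # Send the length actually read, so a short read or a file that
            # changed size cannot put the stream out of sync.
            print FASTIMPORT "blob\n";
            print FASTIMPORT "mark :$mark\n";
            print FASTIMPORT "data ".length($content)."\n";
            print FASTIMPORT $content;

            # Store the path relative to the maildir given on the command line.
            my $storedir = "$dirname/$_";
            $storedir =~ s/^\Q$stripdir\E//;
            $storedir =~ s/^\///;
            $FILES .= "M 0644 :$mark $storedir\n";
            $mark++;
        }
    }
}

open(FASTIMPORT, '|-', 'git', 'fast-import', '--date-format=rfc2822')
    or die "cannot start git fast-import: $!";
binmode FASTIMPORT;
fastimport_blobs($ARGV[0]);

# One commit that references every blob sent above.
print FASTIMPORT "commit refs/heads/master\n";
print FASTIMPORT "committer EvenLess <evenless\@evenless> ".`date -R`;
print FASTIMPORT "data 11\n";
print FASTIMPORT "mail commit\n";
print FASTIMPORT $FILES;
print FASTIMPORT "\n";
close(FASTIMPORT) or die "git fast-import failed\n";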
[-- Attachment #3: Type: text/plain, Size: 22 bytes --]
--
Stewart Smith