From: ebiederm@xmission.com (Eric W. Biederman)
To: Eric Wong <e@80x24.org>
Cc: meta@public-inbox.org
Subject: Re: Searching via git grep?
Date: Fri, 20 Jul 2018 07:37:09 -0500
Message-ID: <8736weaxsa.fsf@xmission.com>
In-Reply-To: <20180720061106.4f2u2zpdxnsilrxt@dcvr> (Eric Wong's message of "Fri, 20 Jul 2018 06:11:07 +0000")
Eric Wong <e@80x24.org> writes:
> "Eric W. Biederman" <ebiederm@xmission.com> wrote:
>> My current goal is to make it pleasant to read linux-kernel and possibly
>> other large archives on my personal machine. Right now the git
>> trees for linux-kernel are about 6.8G. Small enough to fit in RAM.
>>
>> The Xapian indexes are about 63G. Not small enough to fit in RAM.
>> They are also not fast to update when I pull in a new batch of messages
>> from linux-kernel.
>
> Interesting, how long does it take to do an incremental index
> medium/full for you? Setting XAPIAN_FLUSH_THRESHOLD after my
> patch yesterday should help noticeably, especially if you're on
> HDD.
For a small sample, less than a day's worth of lkml messages,
I get:
$ git --git-dir git/6.git/ fetch
Enter passphrase for key '/home/eric/.ssh/id_rsa':
Fetching origin
From https://git.kernel.org/pub/scm/public-inbox/vger.kernel.org/lkml/6
35280da650..0a97acb7e7 master -> master
remote: Counting objects: 1791, done.
remote: Compressing objects: 100% (1085/1085), done.
remote: Total 1791 (delta 109), reused 1791 (delta 109)
Receiving objects: 100% (1791/1791), 1.94 MiB | 1.98 MiB/s, done.
Resolving deltas: 100% (109/109), done.
From git:/public-inbox/vger.kernel.org/linux-kernel/6
35280da65057..0a97acb7e709 master -> master
$ time public-inbox-index
real 2m1.482s
user 0m26.084s
sys 0m20.792s
I am not on an HDD. I will play with XAPIAN_FLUSH_THRESHOLD next time
and see if things get better. Initially building the Xapian index was
extremely painful: the machine was swapping and the build took over a day.
Subjectively, searching all of 6.git feels faster than those 2 minutes,
if for no other reason than that I get some of the results back immediately.
>> So I am looking at using git grep as a stand-in for the Xapian indexes
>> when indexlevel eq 'basic'.
>>
>> Given my personal ratio of searches to indexing I think I will save
>> time in doing that. I don't have it all wired up yet to know if it will
>> work well, but I suspect it will.
>
> Totally understandable, and yes, if you can fit the LKML repos
> into RAM it should be usable enough for a single user.
>
> "git grep" also has the advantage of being able to use regexps,
> which isn't possible with Xapian at the moment.
My only concern with "git grep" for v2 is how to get it to exclude
messages that have been deleted.
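A rough sketch of what searching with "git grep" across history could look like. It assumes a layout where every commit stores one message in a file named "m" (my reading of the v2 format, so treat it as an assumption), and it builds a throwaway demo repository rather than touching a real lkml mirror. The subjects, the pattern, and the per-commit loop are illustrative only. The loop is simple but slow, and it makes no attempt to skip deleted messages, which is exactly the open question above.

```shell
#!/bin/sh
# Sketch: regexp search over every commit of a one-message-per-commit repo.
# ASSUMPTION: each commit keeps its message in a file named "m".
# The demo repository is a hypothetical stand-in built on the fly.
set -eu

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
git init -q "$demo"
cd "$demo"
git config user.name demo
git config user.email demo@example.com

# Every commit overwrites "m", so older mail lives only in history.
for body in "Subject: Xapian flush threshold" \
            "Subject: git grep speed" \
            "Subject: unrelated"; do
    printf '%s\n' "$body" > m
    git add m
    git commit -q -m "$body"
done

# git grep accepts a revision, so walk all commits and look only at "m".
# Matches print as "<commit>:m:<matching line>"; "|| true" keeps a
# non-matching commit (exit status 1) from aborting the loop.
matches=$(git rev-list HEAD | while read -r rev; do
    git grep -E 'Xapian|git grep' "$rev" -- m || true
done)

count=$(printf '%s\n' "$matches" | grep -c .)
printf '%s\n' "$matches"
echo "$count matching messages"
```

Spawning one "git grep" per commit will not scale to a full archive; passing many revisions to a single invocation would cut the process overhead, at the cost of having to batch the argument list.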
>> Is it only the web interface where the advanced search functionality is
>> available?
>
> Yes. I don't think there's a good way to implement search for
> NNTP on the server side... IMAP has specs for implementing
> search; but I don't know how much overlap there is with what
> our web UI currently offers.
I skimmed the IMAP RFCs earlier and the search sounds very close to what
Xapian makes available. Roughly terms and quoted terms (aka terms with
positions).
If the IMAP interface is sensible it might be worth doing the work to
extend NNTP to provide a search interface modeled on it.
Eric
Thread overview: 6+ messages
2018-07-19 20:47 Searching via git grep? Eric W. Biederman
2018-07-19 21:12 ` Eric Wong
2018-07-19 22:27 ` Eric W. Biederman
2018-07-20 6:11 ` Eric Wong
2018-07-20 12:37 ` Eric W. Biederman [this message]
2018-07-20 23:56 ` Eric W. Biederman