From: Lennart Borgman
Newsgroups: gmane.emacs.devel
Subject: Re: Indexed search with grep-like output
Date: Tue, 4 Jan 2011 09:39:28 +0100
To: Eli Zaretskii
Cc: srackham@gmail.com, monnier@iro.umontreal.ca, emacs-devel@gnu.org
On Tue, Jan 4, 2011 at 9:11 AM, Eli Zaretskii wrote:
>> From: Lennart Borgman
>> Date: Tue, 4 Jan 2011 08:22:09 +0100
>> Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org, srackham@gmail.com
>>
>> > In the directory where you installed docindexer, there's a file named
>> > conf.py, a piece of Python code that describes the docindexer parser
>> > configuration.  Its syntax should be self-explanatory; you can add
>> > entries there for whatever source files you'd like to index.
>>
>> No, you do not have that file if you used the installer and installed
>> the binary version.
>
> Well, I certainly did use the installer, and I do have that file.  Are
> you sure you don't have it?

Yes, I am. I was just repeating what Stuart told me.

> In any case, you can find it in the docindexer source distribution.
>
>> If you want to use that installer, you cannot
>> change how files with different extensions are parsed by docindexer.
>
> But I just did change that.  Here's the exact recipe:
>
>  . Find config.py in the docindexer installation directory and edit it
>    to add a line for *.el files.
>  . Find a file named library.zip in the docindexer installation
>    directory.  This is the class library used by docindexer.
>  . Replace the file docindexer/config.pyc in library.zip with the
>    edited docindexer/config.py.  Note: the .pyc extension means that
>    the file was compiled by Python; the corresponding .py file is not
>    compiled, but it will be used anyway -- this is similar to what
>    Emacs does with *.el and *.elc files.
>  . Run "docindexer --config" and make sure you see the *.el line in
>    the output.

Hm. Nice.

> After performing the above procedure, I have just indexed the entire
> Emacs lisp/ directory.  It took 3 minutes (yes, the indexer is not
> very fast, which is why it's scheduled to run at night when I'm away;
> mkid does the same job 3 times faster).
>
> Moral: Never underestimate the power of Free Software!  When you have
> sources, _you_ are in control, not the software developer.  This is
> what Free Software is all about.

It is nice, but normally you do not want to go through that much
trouble just to try a new piece of software. In many cases you might
rightly suspect that it would be a waste of time.

>> > Having said that, I don't think docindexer is the right tool for
>> > indexing program source files.
>> > Lucene text analyzers are biased towards indexing plain text, so
>> > they typically ignore one-letter words, like "a" and "i", words like
>> > "the", "in", "on", "some", etc. -- which could well be valid
>> > identifiers in a program.  It really isn't the tool for this job.
>>
>> It does not give an index of the kind you want, that is correct.
>> However, I might still find it handy for quickly finding parts of the
>> code.
>
> Is it really handy?  Lisp identifiers include punctuation characters
> such as `-', `>', `:', etc.  I'd guess that plain text indexing will
> not index these identifiers as you'd want to.

That is a good point; I do not know. But I wrote this mainly for
org-mode files and thought it could also be used on code to find
things quickly. I still think it can, if I change one small thing: the
line matcher would be much better if it required all the search terms
to be on the same line (AND).

>> If you want to, then feel free to add support for ID-utils to
>> idxsearch.el. It should typically be a file on its own. The file
>> idxdocindex.el is a good starting example.
>
> I'd rather extend id-utils.el, and eventually add that to Emacs.

Well, it might be good to keep them separate. Another way to handle it
might be to use the parsers from ID-utils and feed their output to a
regular search engine, though you might then still have trouble with
the query language. (Which, of course, can be solved in free
software...)
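Eli's library.zip step can be scripted. Below is a minimal Python sketch. It assumes the standard zipfile module and the member names from his recipe (docindexer/config.pyc and docindexer/config.py), and it has not been tried against a real docindexer installation. The function name replace_in_zip is hypothetical. Because zipfile cannot delete an entry in place, the sketch copies every other entry into a new archive and then swaps that archive in.

```python
import os
import tempfile
import zipfile


def replace_in_zip(zip_path, old_member, new_file, new_member):
    """Rewrite zip_path so old_member is gone and new_file is stored as new_member.

    zipfile cannot remove an entry in place, so all other entries are
    copied into a temporary archive that then replaces the original.
    """
    # Create the temporary archive next to the original so the final
    # os.replace never has to move data across filesystems.
    target_dir = os.path.dirname(os.path.abspath(zip_path))
    fd, tmp_path = tempfile.mkstemp(suffix=".zip", dir=target_dir)
    os.close(fd)
    try:
        with zipfile.ZipFile(zip_path) as src, \
                zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as dst:
            for item in src.infolist():
                # Skip the compiled file and any older copy of the new member.
                if item.filename in (old_member, new_member):
                    continue
                dst.writestr(item, src.read(item.filename))
            dst.write(new_file, arcname=new_member)
        # Overwrite the original archive with the rebuilt one.
        os.replace(tmp_path, zip_path)
    finally:
        # Only left behind if something failed before os.replace.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

For example, from the installation directory, with the edited file saved beside library.zip as config.py (an assumed location): `replace_in_zip("library.zip", "docindexer/config.pyc", "config.py", "docindexer/config.py")`. Keep a copy of the original library.zip first. Then run "docindexer --config" as in the recipe to check that the *.el line appears. The thread calls the file both conf.py and config.py, so use whichever name your installation actually has.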
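Eli's concern about plain-text analyzers and the AND idea can both be made concrete with a toy Python comparison. This sketch does not use Lucene. text_tokens is an assumed, simplified stand-in for a plain-text analyzer: it splits on non-alphanumerics and drops one-letter words and a few stop words. lisp_tokens keeps characters such as `-', `>' and `:' inside a symbol. match_lines is a hypothetical line matcher that can require all query terms (AND) or any of them (OR). None of these names come from idxsearch.el or docindexer.

```python
import re

# Assumed, simplified stop-word list; real Lucene analyzers use their own lists.
STOP_WORDS = {"a", "i", "the", "in", "on", "some"}


def text_tokens(line):
    """Plain-text style: lower-case, split on non-alphanumerics,
    drop one-letter words and stop words."""
    words = re.findall(r"[a-z0-9]+", line.lower())
    return [w for w in words if len(w) > 1 and w not in STOP_WORDS]


def lisp_tokens(line):
    """Lisp-aware: keep symbol characters like '-', '>' and ':' inside a token."""
    return re.findall(r"[\w\-+*/<>=:!?%&$]+", line)


def match_lines(lines, query, tokenize, require_all=True):
    """Return 1-based numbers of lines matching the query terms.

    require_all=True is an AND search (every term on the line);
    require_all=False is an OR search (any term on the line).
    """
    terms = set(tokenize(query))
    hits = []
    for lineno, line in enumerate(lines, start=1):
        found = terms & set(tokenize(line))
        matched = found == terms if require_all else bool(found)
        if terms and matched:
            hits.append(lineno)
    return hits


source = [
    "(defun my-find (a)",
    "  (with-current-buffer buf (find-file a)))",
    "(setq find-buffer-file nil)",
]

print(match_lines(source, "find-file", text_tokens, require_all=False))  # [1, 2, 3]
print(match_lines(source, "find-file", text_tokens))                     # [2, 3]
print(match_lines(source, "find-file", lisp_tokens))                     # [2]
print(match_lines(source, "a", text_tokens))                             # []
print(match_lines(source, "a", lisp_tokens))                             # [1, 2]
```

In this toy setup, AND narrows an OR search for find-file from three lines to two. The remaining false hit, find-buffer-file, disappears only when the tokenizer keeps Lisp symbols intact. A search for the identifier a finds nothing at all with the plain-text tokenizer, which is the stop-word problem Eli describes.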