From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Indexed search with grep-like output
Date: Tue, 04 Jan 2011 03:11:06 -0500
To: Lennart Borgman
Cc: srackham@gmail.com, monnier@iro.umontreal.ca, emacs-devel@gnu.org
In-reply-to: (message from Lennart Borgman on
 Tue, 4 Jan 2011 08:22:09 +0100)

> From: Lennart Borgman
> Date: Tue, 4 Jan 2011 08:22:09 +0100
> Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org, srackham@gmail.com
>
> > In the directory where you installed docindexer, there's a file named
> > conf.py, a piece of Python code that describes the docindexer parser
> > configuration.  Its syntax should be self-explanatory; you can add
> > entries there for whatever source files you'd like to index.
>
> No, you do not have that file if you used the installer and installed
> the binary version.

Well, I certainly did use the installer, and I do have that file.  Are
you sure you don't have it?  In any case, you can find it in the
docindexer source distribution.

> If you want to use that installer you can not
> change the how files with different extensions are parsed by docindex.

But I just did change that.  Here's the exact recipe:

  . Find config.py in the docindexer installation directory and edit
    it to add a line for *.el files.

  . Find a file named library.zip in the docindexer installation
    directory.  This is the class library used by docindexer.

  . Replace the file docindexer/config.pyc in library.zip with the
    edited docindexer/config.py.  Note: the .pyc extension means that
    the file was compiled by Python; the corresponding .py file is not
    compiled, but it will be used anyway -- this is similar to what
    Emacs does with *.el and *.elc files.

  . Run "docindexer --config" and make sure you see the *.el line in
    the output.
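The library.zip step of the recipe can also be scripted.  What follows is a minimal Python sketch, not something shipped with docindexer: the helper name `swap_member` is hypothetical, and it assumes, as the recipe states, that the compiled module is stored as docindexer/config.pyc inside library.zip.  Python's zipfile module cannot delete a member in place, so the sketch copies every other member into a temporary archive, adds the edited .py under the new name, and then moves the new archive over the old one.

```python
import os
import tempfile
import zipfile


def swap_member(zip_path, old_name, new_name, new_file):
    """Hypothetical helper: rebuild zip_path without old_name, storing
    the file new_file inside the archive as new_name.

    zipfile cannot remove members from an existing archive, so every
    other member is copied into a temporary archive in the same
    directory, which then atomically replaces the original.
    """
    dir_name = os.path.dirname(os.path.abspath(zip_path))
    fd, tmp_path = tempfile.mkstemp(suffix=".zip", dir=dir_name)
    os.close(fd)
    try:
        with zipfile.ZipFile(zip_path) as src, \
             zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as dst:
            for info in src.infolist():
                # Drop the compiled module and any stale copy of the new name.
                if info.filename in (old_name, new_name):
                    continue
                # Reusing the ZipInfo keeps each member's name, date and
                # compression method.
                dst.writestr(info, src.read(info.filename))
            # Add the edited source under its in-archive name.
            dst.write(new_file, arcname=new_name)
        os.replace(tmp_path, zip_path)
    except BaseException:
        # Leave the original archive untouched on any failure.
        os.remove(tmp_path)
        raise
```

Assuming it is run from the docindexer installation directory with the edited config.py sitting next to library.zip, the call would be `swap_member("library.zip", "docindexer/config.pyc", "docindexer/config.py", "config.py")`.  Keeping a copy of the original library.zip before running it is prudent.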
After performing the above procedure, I have just indexed the entire
Emacs lisp/ directory.  It took 3 minutes (yes, the indexer is not
very fast, which is why it's scheduled to run at night when I'm away;
mkid does the same job 3 times faster).

Moral: Never underestimate the power of Free Software!  When you have
sources, _you_ are in control, not the software developer.  This is
what Free Software is all about.

> > Having said that, I don't think docindexer is the right tool for
> > indexing program source files.  Lucene text analyzers are biased
> > towards indexing plain text, so they typically ignore one-letter
> > words, like "a" and "i", words like "the", "in", "on", "some", etc. --
> > which could well be valid identifiers in a program.  It really isn't
> > the tool for this job.
>
> It does not give an index of the kind you want, that is correct.
> However I might still find it handy to quickly find parts of the code.

Is it really handy?  Lisp identifiers include punctuation characters
such as `-', `>', `:', etc.  I'd guess that plain-text indexing will
not index these identifiers the way you'd want it to.

> If you want to then feel free to add support for ID-utils to
> idxsearch.el. It should typically be a file on its own. The file
> idxdocindex.el is a good starting example.

I'd rather extend id-utils.el, and eventually add that to Emacs.
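The point about plain-text analyzers and Lisp identifiers can be made concrete with a small Python sketch.  `plain_text_tokens` is a deliberately crude stand-in for a Lucene-style analyzer (lower-casing, splitting on non-alphanumerics, dropping one-letter words and a few stop words); it is not Lucene's actual code, and its stop list is illustrative only, taken from the words mentioned in the quoted text.  `lisp_symbol_tokens` is an equally simplified, hypothetical view of how a Lisp-aware scanner might delimit symbols; it ignores escapes, strings and comments.

```python
import re

# Illustrative stop list built from the words mentioned above; real
# analyzers ship their own (different) lists.
STOP_WORDS = {"a", "i", "the", "in", "on", "some"}


def plain_text_tokens(text):
    """Crude approximation of a plain-text analyzer (not Lucene's code):
    lower-case, split on anything that is not a letter or digit, then
    drop one-character words and stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if len(w) > 1 and w not in STOP_WORDS]


def lisp_symbol_tokens(text):
    """Simplified Lisp view: a symbol is a run of characters other than
    whitespace, parentheses, brackets, quotes, backquote, comma and
    semicolon.  Escapes, strings and comments are not handled."""
    return re.findall(r"""[^\s()'"`,;\[\]]+""", text)


if __name__ == "__main__":
    src = "(let ((a (car-safe list)) (in 0)) (setq in (1+ a)))"
    print(plain_text_tokens(src))   # ['let', 'car', 'safe', 'list', 'setq']
    print(lisp_symbol_tokens(src))  # ['let', 'a', 'car-safe', 'list', 'in', '0', 'setq', 'in', '1+', 'a']
```

With the plain-text tokenizer, `car-safe` and `1+` never appear as index terms, and the variables `a` and `in` are dropped entirely, which is exactly what makes such an index awkward for looking up Lisp identifiers.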