From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Indexed search with grep-like output
Date: Tue, 04 Jan 2011 02:09:04 -0500
References: <831v4wpcue.fsf@gnu.org> <83wrmone2h.fsf@gnu.org>
Reply-To: Eli Zaretskii
To: Lennart Borgman
Cc: srackham@gmail.com, monnier@iro.umontreal.ca, emacs-devel@gnu.org
In-reply-to: (message from Lennart Borgman on Tue, 4 Jan 2011 04:21:22 +0100)

> From: Lennart Borgman
> Date: Tue, 4 Jan 2011 04:21:22 +0100
> Cc: Eli Zaretskii, Emacs-Devel devel,
>     Stuart Rackham
>
> It works nicely for what it does. However unfortunately it is still
> unusable since the binary version of docindexer currently does not let
> you tell which files extensions it should index as text.

In the directory where you installed docindexer, there's a file named
conf.py, a piece of Python code that describes the docindexer parser
configuration. Its syntax should be self-explanatory; you can add
entries there for whatever source files you'd like to index.

Having said that, I don't think docindexer is the right tool for
indexing program source files. Lucene text analyzers are biased
towards indexing plain text, so they typically ignore one-letter words
such as "a" and "i" and common words such as "the", "in", "on", and
"some", any of which could well be a valid identifier in a program.

For indexing source code, ID-utils is what you want. It doesn't
currently have an ELisp parser, but if you are willing to index *.el
files as plain text, you can tweak the id-lang.map file to tell mkid to
treat *.el files as text (a plain-text parser _is_ included in
ID-utils). If that's not good enough, writing an ELisp parser should
not be hard.
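
To make the point about plain-text analyzers concrete, here is a minimal Python sketch of the kind of filtering described above. It is not Lucene's actual analyzer: the stop-word list, the minimum token length, and the function names are all assumptions chosen for illustration.

```python
import re

# Hypothetical stop-word list for illustration; real analyzers ship their own.
STOP_WORDS = {"a", "i", "the", "in", "on", "some"}

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def plain_text_tokens(text, min_length=2):
    """Tokenize the way a prose-oriented analyzer might:
    lowercase, then drop stop words and very short tokens."""
    words = IDENTIFIER.findall(text.lower())
    return [w for w in words if len(w) >= min_length and w not in STOP_WORDS]


def source_tokens(text):
    """Tokenize the way a source indexer would: keep every identifier."""
    return IDENTIFIER.findall(text)


# A line of C-like code whose identifiers happen to look like prose words.
line = "for (i = 0; i < some; i++) a[i] = the + in;"

kept = plain_text_tokens(line)
lost = sorted(set(source_tokens(line)) - set(kept))

print("indexed by the plain-text filter:", kept)
print("identifiers that can no longer be found:", lost)
```

With a filter like this, a search for the variable "i" or "some" would find nothing, which is why an identifier-oriented indexer is the better fit for source code.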