From: Lennart Borgman
Newsgroups: gmane.emacs.devel
Subject: Re: Indexed search with grep-like output
Date: Tue, 4 Jan 2011 08:22:09 +0100
References: <831v4wpcue.fsf@gnu.org> <83wrmone2h.fsf@gnu.org>
To: Eli Zaretskii
Cc: srackham@gmail.com, monnier@iro.umontreal.ca, emacs-devel@gnu.org

On Tue, Jan 4, 2011 at 8:09 AM, Eli Zaretskii wrote:
>> From: Lennart Borgman
>> Date: Tue, 4 Jan 2011 04:21:22 +0100
>> Cc: Eli Zaretskii, Emacs-Devel devel,
>>     Stuart Rackham
>>
>> It works nicely for what it does. However unfortunately it is still
>> unusable since the binary version of docindexer currently does not let
>> you tell which files extensions it should index as text.
>
> In the directory where you installed docindexer, there's a file named
> conf.py, a piece of Python code that describes the docindexer parser
> configuration.
> Its syntax should be self-explanatory; you can add
> entries there for whatever source files you'd like to index.

No, you do not have that file if you used the installer and installed
the binary version. If you use that installer, you cannot change how
docindexer parses files with different extensions.

> Having said that, I don't think docindexer is the right tool for
> indexing program source files.  Lucene text analyzers are biased
> towards indexing plain text, so they typically ignore one-letter
> words, like "a" and "i", words like "the", "in", "on", "some", etc. --
> which could well be valid identifiers in a program.  It really isn't
> the tool for this job.

It does not give an index of the kind you want, that is correct.
However, I might still find it handy for quickly finding parts of the
code.

> For indexing source code, ID-utils is what you want; it doesn't
> currently have an ELisp parser, but if you are willing to index *.el
> files as plain text, you can tweak the id-util.map file to tell mkid
> to treat *.el files as text (a plain text _is_ included in ID-utils).
> If that's not good enough, writing an ELisp parser should be hard.

I have not looked into ID-utils, but perhaps it can use other programs
to extract the ids? In that case it could use Emacs.

If you want to, feel free to add support for ID-utils to
idxsearch.el. It should typically go in a file of its own. The file
idxdocindex.el is a good starting example.
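
To illustrate the stop-word point quoted above, here is a toy sketch in Python. It is not Lucene's or docindexer's actual analyzer: the stop list, the tokenizing regexp, and both function names are made up for the example, and the regexp does not handle hyphenated Lisp symbols. It only shows how a plain-text style filter can drop tokens that are valid identifiers in source code.

```python
import re

# Hypothetical stop list, for illustration only (not Lucene's real list).
STOP_WORDS = {"the", "in", "on", "some", "a", "i"}

# Simplified identifier pattern; an assumption, not what any real tool uses.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def plain_text_terms(text):
    """Index terms the way a plain-text analyzer might:
    lowercase, then drop stop words and one-letter words."""
    tokens = TOKEN_RE.findall(text.lower())
    return [t for t in tokens if len(t) > 1 and t not in STOP_WORDS]


def identifier_terms(text):
    """Index terms the way a source-code indexer might:
    keep every identifier-like token."""
    return TOKEN_RE.findall(text)


source = "(dolist (a some) (setq i (+ i a)))"
print(plain_text_terms(source))  # ['dolist', 'setq']
print(identifier_terms(source))  # ['dolist', 'a', 'some', 'setq', 'i', 'i', 'a']
```

With the plain-text filter, a search for the variables `a`, `i` or `some` finds nothing, although all three occur in the source line.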