From: John Kitchin <jkitchin@andrew.cmu.edu>
To: Oleg Sivokon <olegsivokon@gmail.com>,
"emacs-orgmode@gnu.org" <emacs-orgmode@gnu.org>
Subject: Re: Extract document structure from Org file
Date: Sat, 04 Jul 2015 11:54:39 -0400
Message-ID: <m2pp47ex4w.fsf@andrew.cmu.edu>
In-Reply-To: <87a8vdfacd.fsf@gmail.com>
I worked out a new version of the swish-e org indexer that indexes
custom xml representing the org file that you may find interesting for
your project.
http://kitchingroup.cheme.cmu.edu/blog/2015/07/04/An-xml-representation-of-an-org-document-for-indexing-with-swish-e/
It enables a search like this:
swish-e -f index-org2xml.swish-e -w src-block.language=python -w src-block=diffusion
to find org files with a python source block containing the word
diffusion.
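For illustration only, here is a rough Python sketch of the general idea of
turning source blocks into XML that an indexer could treat as fields. It is
not the indexer from the blog post. The element name `src-block` and the
attribute `language` are assumptions inferred from the metanames in the query
above; the root element `org-file`, the regular expression, and the helper
name `org_src_blocks_to_xml` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern: matches "#+BEGIN_SRC <lang> ... #+END_SRC" blocks.
# A real indexer would more likely walk the org-element parse tree.
SRC_BLOCK = re.compile(
    r"^[ \t]*#\+BEGIN_SRC[ \t]+(\S+)[^\n]*\n(.*?)^[ \t]*#\+END_SRC",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)


def org_src_blocks_to_xml(org_text):
    """Return an XML string with one <src-block> element per source block."""
    root = ET.Element("org-file")  # assumed root element name
    for lang, body in SRC_BLOCK.findall(org_text):
        # The language goes in an attribute, the code in the element text.
        block = ET.SubElement(root, "src-block", {"language": lang})
        block.text = body
    return ET.tostring(root, encoding="unicode")


sample = """* Notes
#+BEGIN_SRC python
print("diffusion coefficient")
#+END_SRC
"""

xml_doc = org_src_blocks_to_xml(sample)
print(xml_doc)
```

Whether swish-e can then answer a query like the one above depends on how its
configuration maps elements and attributes to metanames, which this sketch
does not cover.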
I think swish-e supports ranking
(http://swish-e.org/docs/swish-faq.html#how_is_ranking_calculated_) too,
but I have not tried it.
It is pretty interesting overall!
Oleg Sivokon writes:
> John Kitchin <jkitchin@andrew.cmu.edu> writes:
>
>> You would use org-element. Try org-element-parse-buffer and
>> org-element-map and maybe org-element-interpret-data. There's also a
>> bunch of regexp for identifying/finding particular types of elements.
>
> Thanks! I'm already looking into it.
>
>> That sounds really cool. I recently hacked a swish-e index of my org
>> files (there might have been 3000+!)
>> http://kitchingroup.cheme.cmu.edu/blog/2015/06/25/Integrating-swish-e-and-Emacs/
>>
>> I just updated it to index the HTML version of an org-file so that I
>> can take advantage of the structure in the search:
>> http://kitchingroup.cheme.cmu.edu/blog/2015/07/03/Using-swish-e-to-index-org-files-as-html/
>> It would be cool to have more granular searching though.
>>
>> Is your info project visible anywhere? I can imagine a close-file
>> hook function that updates the database automatically.
>
> Whoa, that's a lot of Org files :) What I wrote so far is on GitHub, but
> it's in a very early stage, so it's not something you could just drop
> into your Emacs directory and start using right away.
> https://github.com/wvxvw/sphinx-mode
> I also looked into Swish some time ago. I thought about using Nepomuk
> too, but in the latter case, I have to admit, I didn't make it through
> the documentation.
>
> The difference in using Sphinx is that it has ranking, and it has a
> relatively terse way of specifying searching criteria. For example, you
> could ask to search for "some words in this phrase"/3 and it would look
> for occurrences of 3 of the 5 words given between the quotes. Or, you could
> ask it to search for @node "R" @contents "printf" "format", and this
> would search for node titles mentioning "R" and having contents with
> words "printf" and "format".
> I have to admit I haven't mastered it fully (there are far more options and
> settings) but it does something that seems reasonable (if I compare it
> to M-x info-apropos).
>
> I'm also still trying to learn what's the best way to do indexing, so
> the project is still very raw, but I'll get there one day :)
>
> The ultimate goal is also to write a more human-friendly interface to
> Sphinx, where one could ask questions in a subset of natural language :)
> (but that's a very long way into the future!)
>
> PS. I see that many posts on this list are titled with [O]. What does
> it mean, should I do that too?
>
> Best.
>
> Oleg
--
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu