* Extract document structure from Org file
From: Oleg Sivokon @ 2015-07-03 8:39 UTC
To: emacs-orgmode
Hello list!
Suppose I want to extract the structure of an Org document. What matters
to me is having it divided into categories: headers, paragraphs of text,
technical information, and inclusions of other documents (code
snippets). How would I do it?
The reason I'm asking is that I'm working on a small project where I'm
trying to enhance document search by combining indexing with queries
based on things like the distance between words, the frequency of a
word in a document, and so on. (I'm using Sphinx for it.)
I've tried this with Info pages and liked the results. However, to do
it more intelligently, I'd like to index the documents with finer
granularity (i.e. so that later on I could search while assigning
different weights to words appearing in headers and words appearing in
comments).
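As a rough illustration of the weighting idea above, here is a minimal
Python sketch. It is not tied to Sphinx, and the weights, the segment
kinds, and the names `WEIGHTS`, `tokenize` and `weighted_score` are all
hypothetical; the only assumption is that the document has already been
split into (kind, text) segments.

```python
import re
from collections import Counter

# Hypothetical weights per segment kind; nothing in this thread fixes these values.
WEIGHTS = {"headline": 3.0, "paragraph": 1.0, "comment": 0.5}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def weighted_score(segments, term, weights=WEIGHTS):
    """Sum occurrences of `term` over (kind, text) segments,
    scaling each count by the weight of the segment's kind."""
    term = term.lower()
    score = 0.0
    for kind, text in segments:
        count = Counter(tokenize(text))[term]
        # Unknown kinds fall back to a neutral weight of 1.0.
        score += weights.get(kind, 1.0) * count
    return score


# Made-up example segments, as a structure-aware extractor might produce them.
segments = [
    ("headline", "Indexing Org files"),
    ("paragraph", "Indexing improves search; an index maps words to files."),
    ("comment", "TODO: revisit indexing"),
]

print(weighted_score(segments, "indexing"))
```

A match in a headline counts for more than the same match in a comment,
which is the kind of ranking that finer-grained indexing would allow.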
Best.
Oleg
* Re: Extract document structure from Org file
From: Rasmus @ 2015-07-03 11:58 UTC
To: emacs-orgmode
Hi Oleg,
Oleg Sivokon <olegsivokon@gmail.com> writes:
> Suppose I want to extract the structure of an Org document. What matters
> to me is having it divided into categories: headers, paragraphs of text,
> technical information, and inclusions of other documents (code
> snippets). How would I do it?
You would use org-element. Try org-element-parse-buffer and
org-element-map, and maybe org-element-interpret-data. There are also a
bunch of regexps for identifying particular types of elements.
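For readers who need a similar split outside Emacs, the following is a
rough, line-based Python approximation of the classification being asked
about. It is not org-element and does not reproduce its parser: it
recognises only headlines, `#+INCLUDE:` lines, other `#+KEY:` keywords,
`#+BEGIN_SRC` blocks and plain paragraphs. The function name `classify`,
the kind labels and the sample text are assumptions made for this sketch.

```python
import re

HEADLINE = re.compile(r"^(\*+)\s+(.*)$")
BLOCK_BEGIN = re.compile(r"^\s*#\+begin_src\b", re.IGNORECASE)
BLOCK_END = re.compile(r"^\s*#\+end_src\b", re.IGNORECASE)
INCLUDE = re.compile(r"^\s*#\+include:\s*(.*)$", re.IGNORECASE)
KEYWORD = re.compile(r"^\s*#\+(\w+):\s*(.*)$")


def classify(text):
    """Return (kind, content) pairs for a small subset of Org syntax."""
    segments = []
    paragraph = []  # lines of the paragraph being accumulated
    source = None   # lines of the current src block, or None outside one

    def flush():
        # Close the pending paragraph, if any.
        if paragraph:
            segments.append(("paragraph", " ".join(paragraph)))
            paragraph.clear()

    for line in text.splitlines():
        if source is not None:
            # Inside a src block: copy lines verbatim until the end marker.
            if BLOCK_END.match(line):
                segments.append(("src-block", "\n".join(source)))
                source = None
            else:
                source.append(line)
            continue
        headline = HEADLINE.match(line)
        include = INCLUDE.match(line)
        keyword = KEYWORD.match(line)
        if BLOCK_BEGIN.match(line):
            flush()
            source = []
        elif headline:
            flush()
            segments.append(("headline", headline.group(2)))
        elif include:
            flush()
            segments.append(("include", include.group(1).strip()))
        elif keyword:
            flush()
            segments.append(("keyword", f"{keyword.group(1)}: {keyword.group(2)}"))
        elif line.strip():
            paragraph.append(line.strip())
        else:
            flush()  # a blank line ends the paragraph
    flush()
    if source is not None:
        # Unterminated block: keep what was read rather than dropping it.
        segments.append(("src-block", "\n".join(source)))
    return segments


SAMPLE = """\
#+TITLE: Demo
* Intro
Some text
spanning two lines.

#+INCLUDE: "other.org"
** Code
#+BEGIN_SRC python
print("hi")
#+END_SRC
"""

for kind, content in classify(SAMPLE):
    print(kind, "|", content)
```

Inside Emacs, org-element-parse-buffer handles far more of the syntax
(drawers, tables, nested lists, inline markup), so a sketch like this is
only a starting point for feeding an external indexer.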
Cheers,
Rasmus
--
To err is human. To screw up 10⁶ times per second, you need a computer
* Re: Extract document structure from Org file
From: John Kitchin @ 2015-07-03 14:20 UTC
To: Oleg Sivokon; +Cc: emacs-orgmode
That sounds really cool. I recently hacked together a swish-e index of my
Org files (there might have been 3000+!):
http://kitchingroup.cheme.cmu.edu/blog/2015/06/25/Integrating-swish-e-and-Emacs/
I just updated it to index the HTML version of an Org file so that I can
take advantage of the structure in the search:
http://kitchingroup.cheme.cmu.edu/blog/2015/07/03/Using-swish-e-to-index-org-files-as-html/
It would be cool to have more granular searching, though.
Is your Info project visible anywhere? I can imagine a close-file hook
function that updates the database automatically.
Oleg Sivokon writes:
> Hello list!
>
> Suppose I want to extract the structure of an Org document. What matters
> to me is having it divided into categories: headers, paragraphs of text,
> technical information, and inclusions of other documents (code
> snippets). How would I do it?
>
> The reason I'm asking is that I'm working on a small project where I'm
> trying to enhance document search by combining indexing with queries
> based on things like the distance between words, the frequency of a
> word in a document, and so on. (I'm using Sphinx for it.)
> I've tried this with Info pages and liked the results. However, to do
> it more intelligently, I'd like to index the documents with finer
> granularity (i.e. so that later on I could search while assigning
> different weights to words appearing in headers and words appearing in
> comments).
>
> Best.
>
> Oleg
--
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu