unofficial mirror of notmuch@notmuchmail.org
* Exporting a single email as JSON
@ 2011-12-10 18:32 Ciprian Dorin Craciun
  2011-12-10 20:15 ` Jameson Graef Rollins
  0 siblings, 1 reply; 6+ messages in thread
From: Ciprian Dorin Craciun @ 2011-12-10 18:32 UTC (permalink / raw)
  To: notmuch

    Hello all!

    Quick question: why isn't it reasonable to export a **single**
email in JSON format (by using the `show` sub-command)? (I mean I
understand that in order to be able to correctly parse the output we
need only one "object" (i.e. a list of threads, containing a list of
emails, etc.). But there might be use cases in which we need a
"twist".)

    My current use case is: I want to import the JSON representation
of my emails in CouchDB, each email in a single document. And as I
already have my emails indexed with Notmuch, I hoped that -- with the
help of some Bash-fu and Curl -- it would be trivial to instruct
notmuch to export all emails matching given criteria as JSON...

    What would have been perfect in this case: each matching email
(with or without the `--entire-thread` flag) should be exported as a
single JSON object on a single line, thus each different email on a
single line. Thus I could have easily used `notmuch show
--output=json-line -- {criteria} | xargs -L 1 -- curl
{couchdb-magic}`.

    For now I'll pre-process the current output in JavaScript.
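
A rough sketch of that kind of pre-processing, written in Python rather than JavaScript. It assumes the nesting that `notmuch show --format=json` is understood to produce: a list of threads, each thread a list of `[message, [replies...]]` pairs, where `message` may be null. The names `iter_messages` and `to_json_lines` are made up for this illustration and are not part of notmuch.

```python
import json
from typing import Iterator, List


def iter_messages(thread_set: list) -> Iterator[dict]:
    """Yield every message dict from the nested output, depth first.

    Assumed layout: thread_set = [thread, ...], thread = [node, ...],
    node = [message_or_None, [node, ...]].
    """
    def walk(node: list) -> Iterator[dict]:
        message, replies = node
        if message is not None:
            yield message
        for reply in replies:
            yield from walk(reply)

    for thread in thread_set:
        for node in thread:
            yield from walk(node)


def to_json_lines(thread_set: list) -> List[str]:
    """Serialize each message as one JSON object on a single line."""
    # json.dumps without `indent` never emits a raw newline; newlines
    # inside strings are escaped as \n, so one message equals one line.
    return [json.dumps(m, separators=(",", ":")) for m in iter_messages(thread_set)]
```

Each returned line could then be handed to a per-document uploader, which is the shape the CouchDB import needs.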

    Thanks,
    Ciprian.

* Re: Exporting a single email as JSON
  2011-12-10 18:32 Exporting a single email as JSON Ciprian Dorin Craciun
@ 2011-12-10 20:15 ` Jameson Graef Rollins
  2011-12-10 22:46   ` Ciprian Dorin Craciun
  0 siblings, 1 reply; 6+ messages in thread
From: Jameson Graef Rollins @ 2011-12-10 20:15 UTC (permalink / raw)
  To: Ciprian Dorin Craciun, notmuch

On Sat, 10 Dec 2011 20:32:22 +0200, Ciprian Dorin Craciun <ciprian.craciun@gmail.com> wrote:
>     Quick question: why isn't it reasonable to export a **single**
> email in JSON format (by using the `show` sub-command)? (I mean I
> understand that in order to be able to correctly parse the output we
> need only one "object" (i.e. a list of threads, containing a list of
> emails, etc.). But there might be use cases in which we need a
> "twist".)

Hi, Ciprian.  I agree that it would be nice to have the ability to
output single messages without the rest of their thread.  I have on
occasion wanted this functionality, but never enough to get around to
implementing it.  It definitely wouldn't be that hard to implement,
though.

The notmuch show function is actually going through a pretty major
overhaul at the moment.  I bet as soon as that's done we can get some
sort of single-message output going.

jamie.

* Re: Exporting a single email as JSON
  2011-12-10 20:15 ` Jameson Graef Rollins
@ 2011-12-10 22:46   ` Ciprian Dorin Craciun
  2011-12-10 23:19     ` Jameson Graef Rollins
  2011-12-11  4:29     ` Austin Clements
  0 siblings, 2 replies; 6+ messages in thread
From: Ciprian Dorin Craciun @ 2011-12-10 22:46 UTC (permalink / raw)
  To: Jameson Graef Rollins; +Cc: notmuch

On Sat, Dec 10, 2011 at 22:15, Jameson Graef Rollins
<jrollins@finestructure.net> wrote:
> On Sat, 10 Dec 2011 20:32:22 +0200, Ciprian Dorin Craciun <ciprian.craciun@gmail.com> wrote:
>>     Quick question: why isn't it reasonable to export a **single**
>> email in JSON format (by using the `show` sub-command)? (I mean I
>> understand that in order to be able to correctly parse the output we
>> need only one "object" (i.e. a list of threads, containing a list of
>> emails, etc.). But there might be use cases in which we need a
>> "twist".)
>
> Hi, Ciprian.  I agree that it would be nice to have the ability to
> output single messages without the rest of their thread.  I have on
> occasion wanted this functionality, but never enough to get around to
> implementing it.  It definitely wouldn't be that hard to implement,
> though.
>
> The notmuch show function is actually going through a pretty major
> overhaul at the moment.  I bet as soon as that's done we can get some
> sort of single-message output going.
>
> jamie.


    I've given a quick look into `notmuch-show.c` (commit from
December 4) and indeed it seems quite trivial to add new formats.

    Thus I wonder:
    a) Is the code suitable for experimenting with such a feature? (I
mean is the "overhaul" almost done, or still in progress?)
    b) What would be the estimate for the "overhaul" completion? (To
start prototyping such a feature...)
    c) Would someone else be interested in such a feature? (Or is it
something so remote that only the two of us stumbled upon it?)

    I think it's quite hard to get this feature "right". I.e. I can
see the following different -- but equally likely -- use-cases:
    * in my use-case I would need each line of the output to be a
standalone JSON object of an individual message; (thus I can script
with Bash `notmuch ... | while read message ; do ... ; done`;)
    * maybe someone else would need the output to contain
**exactly one** such message (maybe the first);
    * and maybe for someone else the use case involves having no
`--entire-thread` by default;
    * furthermore, someone else would actually prefer a "flattened"
list of messages (not the currently nested list);
    * or maybe the separator in the first use case should be `\0`
instead of `\n`;

    Thanks,
    Ciprian.

    P.S.: I think all sub-commands that output line-feed separated
records should also have the option to separate them with `\0`
instead. (`xargs`, for instance, only reads `\0`-separated input when
given `-0`; otherwise it splits on blanks and new-lines.)
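
To make the `\0` idea concrete, here is a minimal Python sketch of the consuming side. It assumes a producer that terminates every record with a NUL byte; no such notmuch option is implied, and `split_records` is a hypothetical helper name.

```python
from typing import List


def split_records(data: bytes, sep: bytes = b"\0") -> List[bytes]:
    """Split a byte stream into records terminated by `sep`."""
    records = data.split(sep)
    # A terminator after the last record leaves one empty trailing
    # piece; drop it so it is not mistaken for an empty record.
    if records and records[-1] == b"":
        records.pop()
    return records
```

The point of the separator is that a record may itself contain a line feed and still come out whole, which line-based splitting cannot guarantee.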

* Re: Exporting a single email as JSON
  2011-12-10 22:46   ` Ciprian Dorin Craciun
@ 2011-12-10 23:19     ` Jameson Graef Rollins
  2011-12-11  1:03       ` Ciprian Dorin Craciun
  2011-12-11  4:29     ` Austin Clements
  1 sibling, 1 reply; 6+ messages in thread
From: Jameson Graef Rollins @ 2011-12-10 23:19 UTC (permalink / raw)
  To: Ciprian Dorin Craciun; +Cc: notmuch

On Sun, 11 Dec 2011 00:46:51 +0200, Ciprian Dorin Craciun <ciprian.craciun@gmail.com> wrote:
>     I've given a quick look into `notmuch-show.c` (commit from
> December 4) and indeed it seems quite trivial to add new formats.
> 
>     Thus I wonder:
>     a) Is the code suitable for experimenting with such a feature? (I
> mean is the "overhaul" almost done, or still in progress?)

I think it's more just beginning rather than nearing completion.  You
can follow what's happened so far in the thread starting here:

id:"1322446871-14986-1-git-send-email-amdragon@mit.edu"

>     c) Would someone else be interested in such a feature? (Or is it
> something so remote that only the two of us stumbled upon it?)

I think it would be a useful feature, and if it's useful for you it's
probably worth implementing.

>     * in my use-case I would need each line of the output to be a
> standalone JSON object of an individual message; (thus I can script
> with Bash `notmuch ... | while read message ; do ... ; done`;)

This is actually a slightly different idea than what I thought you were
originally proposing.  Outputting a series of json objects rather than a
single list has been talked about for notmuch search as well.  I don't
have a good sense of whether this is a sensible idea or not.

>     * maybe someone else would need the output to contain
> **exactly one** such message (maybe the first);

This is what I thought we were talking about.  This is an option I would
like to see, at least.

>     * or maybe the separator in the first use case should be `\0`
> instead of `\n`;

I think specifying the separator should just be an option.

jamie.

* Re: Exporting a single email as JSON
  2011-12-10 23:19     ` Jameson Graef Rollins
@ 2011-12-11  1:03       ` Ciprian Dorin Craciun
  0 siblings, 0 replies; 6+ messages in thread
From: Ciprian Dorin Craciun @ 2011-12-11  1:03 UTC (permalink / raw)
  To: Jameson Graef Rollins; +Cc: notmuch

On Sun, Dec 11, 2011 at 01:19, Jameson Graef Rollins
<jrollins@finestructure.net> wrote:
> On Sun, 11 Dec 2011 00:46:51 +0200, Ciprian Dorin Craciun <ciprian.craciun@gmail.com> wrote:
>>     * in my use-case I would need each line of the output to be a
>> standalone JSON object of an individual message; (thus I can script
>> with Bash `notmuch ... | while read message ; do ... ; done`;)
>
> This is actually a slightly different idea than what I thought you were
> originally proposing.  Outputting a series of json objects rather than a
> single list has been talked about for notmuch search as well.  I don't
> have a good sense of whether this is a sensible idea or not.
>
>>     * maybe someone else would need the output to contain
>> **exactly one** such message (maybe the first);
>
> This is what I thought we were talking about.  This is an option I would
> like to see, at least.


    Indeed, exporting multiple messages as top / root JSON objects
isn't really usable outside of limited import / export use-cases, so
what you propose is more sensible. And in the end, with this
possibility I could implement the solution I'm after as simply as:
~~~~
notmuch search --output=messages -- {criteria} \
| xargs -L 1 -- notmuch show --format=json -- \
| while read message_json ; do ... ; done
~~~~

    But there is only one problem with such an approach: efficiency.
With the snippet above I'll have as many `notmuch` process executions
as messages. (And I do have quite a few of them.) Thus although
Notmuch is quite fast -- as in human imperceptible -- still opening
and closing the Xapian database so many times does have quite an
overhead.
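
One way to cut down the number of processes, sketched in Python under some assumptions: the message IDs arrive as the `id:...` lines that `notmuch search --output=messages` prints, and several of them can be joined with `or` into one query so that a single `notmuch show` call covers a whole batch. `batched_queries` and the default batch size of 100 are hypothetical, and IDs containing spaces or quotes would need extra quoting that this sketch does not attempt.

```python
from typing import Iterable, Iterator, List


def batched_queries(id_terms: Iterable[str], batch_size: int = 100) -> Iterator[str]:
    """Group `id:...` search terms into `or`-joined query strings."""
    batch: List[str] = []
    for term in id_terms:
        term = term.strip()
        if not term:
            continue  # skip blank lines in the input
        batch.append(term)
        if len(batch) == batch_size:
            yield " or ".join(batch)
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield " or ".join(batch)
```

With this, the database would be opened once per batch instead of once per message; the right batch size depends on command-line length limits and would need measuring.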

    So in the end I think a discussion about the needed (/ wanted)
use-cases would be better.

    Ciprian.

    P.S.: I could help implement (or at least prototype) some of these
use-cases, so I'll keep an eye on the thread you've pointed me to.

* Re: Exporting a single email as JSON
  2011-12-10 22:46   ` Ciprian Dorin Craciun
  2011-12-10 23:19     ` Jameson Graef Rollins
@ 2011-12-11  4:29     ` Austin Clements
  1 sibling, 0 replies; 6+ messages in thread
From: Austin Clements @ 2011-12-11  4:29 UTC (permalink / raw)
  To: Ciprian Dorin Craciun; +Cc: notmuch

Just to add to Jameson's email...

Quoth Ciprian Dorin Craciun on Dec 11 at 12:46 am:
> On Sat, Dec 10, 2011 at 22:15, Jameson Graef Rollins
> <jrollins@finestructure.net> wrote:
> > On Sat, 10 Dec 2011 20:32:22 +0200, Ciprian Dorin Craciun <ciprian.craciun@gmail.com> wrote:
> >>     Quick question: why isn't it reasonable to export a **single**
> >> email in JSON format (by using the `show` sub-command)? (I mean I
> >> understand that in order to be able to correctly parse the output we
> >> need only one "object" (i.e. a list of threads, containing a list of
> >> emails, etc.). But there might be use cases in which we need a
> >> "twist".)
> >
> > Hi, Ciprian.  I agree that it would be nice to have the ability to
> > output single messages without the rest of their thread.  I have on
> > occasion wanted this functionality, but never enough to get around to
> > implementing it.  It definitely wouldn't be that hard to implement,
> > though.
> >
> > The notmuch show function is actually going through a pretty major
> > overhaul at the moment.  I bet as soon as that's done we can get some
> > sort of single-message output going.
> >
> > jamie.
> 
> 
>     I've given a quick look into `notmuch-show.c` (commit from
> December 4) and indeed it seems quite trivial to add new formats.

I think it might make sense for formats to accept options that
fine-tune their output in orthogonal ways, rather than guessing what
consumers need.

However, I don't think adding *new* formats is the way to go.  We need
to be careful to limit the formats in order to prevent divergence.
There's a lot of information notmuch show could include in its output
and the few existing formats already include very different subsets of
this information.  We don't want to get into a situation where, say,
the array-JSON format evolves to include one thing while the
line-broken-JSON format evolves to include another and consumers have
to choose based on the information they need and not on what's easiest
for them to consume.

>     I think it's quite hard to get this feature "right". I.e. I can
> see the following different -- but equally likely -- use-cases:
>     * in my use-case I would need each line of the output to be a
> standalone JSON object of an individual message; (thus I can script
> with Bash `notmuch ... | while read message ; do ... ; done`;)

As Jameson mentioned, similar things have been discussed in the
context of notmuch search.  And the motivation there is related: we
want it to be easy to consume one result at a time, which means it
needs to be easy to know when the input is complete enough to pass to
a JSON parser.  In the case of show, this doesn't have to be at odds
with the existing format; we can leave the giant array for consumers
that don't need the complexity of streaming, but ensure that newlines
only appear between top-level array elements and nowhere else,
providing an in-band framing for streaming consumers.  I'm not sure
how you would do this with show, given its nested structure.
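
For the flat case, the framing described above can be sketched in Python as follows. This illustrates the idea only and is not notmuch's actual output: `frame_array` and `iter_framed` are hypothetical names, and the layout assumed is one top-level element per line, with the separating comma at the end of the line.

```python
import json
from typing import Any, Iterable, Iterator, List


def frame_array(elements: List[Any]) -> str:
    """Emit one JSON array with newlines only between top-level elements."""
    # json.dumps without `indent` never produces a raw newline itself.
    body = ",\n".join(json.dumps(e) for e in elements)
    return "[" + body + "]\n"


def iter_framed(lines: Iterable[str]) -> Iterator[Any]:
    """Parse a framed array one line (one element) at a time."""
    first = True
    for line in lines:
        text = line.rstrip("\n")
        if first:
            text = text[1:]  # drop the array's opening bracket
            first = False
        # The last character is either the "," separator or, on the
        # final line, the array's closing bracket; drop it either way.
        text = text[:-1]
        if text:  # empty only for an empty array
            yield json.loads(text)
```

A consumer that does not care about streaming can still pass the whole output to an ordinary JSON parser, since the newlines are plain whitespace between array elements.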

Thread overview: 6+ messages
2011-12-10 18:32 Exporting a single email as JSON Ciprian Dorin Craciun
2011-12-10 20:15 ` Jameson Graef Rollins
2011-12-10 22:46   ` Ciprian Dorin Craciun
2011-12-10 23:19     ` Jameson Graef Rollins
2011-12-11  1:03       ` Ciprian Dorin Craciun
2011-12-11  4:29     ` Austin Clements
