unofficial mirror of notmuch@notmuchmail.org
* Notmuch's search view sucks
From: Gregor Hoffleit @ 2009-12-03 13:33 UTC
  To: notmuch

Hi there,

First, a short introduction: I was a mutt user for ages. When I read
about Sup, I was intrigued. After a short evaluation period, I switched
to Sup, which I have now been using for six months.

Sup has many rough edges of its own, and some of them are not easy to
fix in the current codebase. notmuch looks like a clean restart of the
same idea, but with a different architecture. I like the concept of a
command-line tool with a minimal set of functionality as a common core,
on which different clients can build.


But. Compared to Sup, the current notmuch clients suck :-)


Today's topic: Sup's search-results-mode. It has a lot of polish that is
plainly missing from notmuch.el (or notmuch.vim):

- Sup's display is much terser than notmuch's, yet it still manages to
  show the first few words of the first unread message in the thread.
- If a thread has many authors, Sup shows only their first names.
  If that is still too long to fit, it cuts the list off.
- The user's own name is rewritten as 'me'.
- The message date format needs only 8 characters (notmuch: 12).
- The message count is displayed only when necessary (i.e., when the
  thread has more than one message).
- Threads with unread messages are bold (or highlighted).
- Threads with attachments are marked with an "@".
- Threads with mail addressed to the user are marked with a ">".
- Tags and message content are shown in different colors.
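A minimal Python sketch of the author-list compaction described in the list above (first names only, the user's own name shown as 'me', cut off to fit a width). The helper name `compact_authors`, the ", " separator and the one-character "…" truncation marker are hypothetical choices, not Sup's actual code:

```python
def compact_authors(authors: list[str], user_name: str, width: int) -> str:
    """Render a thread's author list tersely, Sup-style (hypothetical helper).

    Assumes width >= 1.
    """
    names = []
    for full_name in authors:
        if full_name == user_name:
            names.append("me")                          # the user's own name
        elif len(authors) > 1:
            parts = full_name.split()
            names.append(parts[0] if parts else full_name)  # first name only
        else:
            names.append(full_name)                     # a lone author keeps the full name
    text = ", ".join(names)
    if len(text) > width:
        # Cut off to fit, marking the cut with a one-character ellipsis.
        text = text[: width - 1] + "…"
    return text
```

The `width` argument is what lets a client fit the column to the current display, which a fixed-format output line cannot do.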

All in all, 'notmuch search' prints a raw representation of the field
values, while Sup's search-results-mode shows a polished, terse
interpretation of the same values, meant for human readers and even
fitted to the current display width.

Currently, notmuch.el and notmuch.vim just display the output of
'notmuch search' verbatim (perhaps enhanced with regex-based coloring).


I'm experimenting with a notmuch web client (currently 'evenless'),
trying to replicate much of the feeling of Sup, in a web client.

First, I took the output of 'notmuch search', parsed it, and tried to
reformat it the way Sup does. That worked well for every field except
the date: unlike the other fields, notmuch's date representation is
intended for direct consumption by humans (English-speaking ones, that
is ;-).


I noticed this entry in TODO:

    Add a "--format" option to "notmuch search", (something printf-like
    for selecting what gets printed).
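For what it's worth, a printf-like format does not necessarily require a hand-written parser in a Python client: `str.format_map` already understands named fields plus alignment and truncation specs. A minimal sketch, where the field names (`date`, `matched`, `total`, `authors`, `subject`, `tags`) and the template are hypothetical and do not correspond to an existing notmuch option:

```python
# Hypothetical default layout, loosely modeled on a 'notmuch search' line.
DEFAULT_TEMPLATE = "{date} [{matched}/{total}] {authors}; {subject} ({tags})"

def render_thread(thread: dict, template: str = DEFAULT_TEMPLATE) -> str:
    """Fill a printf-like template from a dict of thread fields.

    format_map looks each {name} up in `thread`; standard format specs
    such as ">10" (right-align in 10 columns) or ".6" (keep at most 6
    characters) work without any extra parsing code. An unknown field
    name raises KeyError.
    """
    return template.format_map(thread)

example = {
    "date": "2009-12-03", "matched": 1, "total": 7,
    "authors": "Gregor Hoffleit", "subject": "Notmuch's search view sucks",
    "tags": "inbox",
}
```

For instance, `render_thread(example, "{date:>10} {authors:.6}")` selects and trims fields without any code beyond the template itself.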

Since I'm not eager to write a format parser, I started implementing
--format as an enumerated option, notmuch_format_t. So far, I have
NOTMUCH_FORMAT_DEFAULT and NOTMUCH_FORMAT_SUP; do_search_threads() does
the real work. In notmuch-time.c, I have implemented an alternative,
terse time representation, notmuch_time_relative8_date().
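The message doesn't show what notmuch_time_relative8_date() actually prints, so the following is only a hypothetical Python sketch of one way to keep a date within 8 characters: time of day for today, weekday and day for the past week, month and day for this year, month and year otherwise. `%a` and `%b` follow the process locale, which is one (limited) hook for l10n; in other locales the abbreviations can be longer than three characters.

```python
from datetime import datetime

def relative8_date(then: datetime, now: datetime) -> str:
    """Hypothetical terse date, at most 8 characters in the C locale.

    Assumes naive local datetimes with then <= now.
    """
    days_ago = (now.date() - then.date()).days
    if days_ago == 0:
        return then.strftime("%H:%M")   # e.g. "13:33"
    if days_ago < 7:
        return then.strftime("%a %d")   # e.g. "Thu 03"
    if then.year == now.year:
        return then.strftime("%b %d")   # e.g. "Dec 03"
    return then.strftime("%b %Y")       # e.g. "Dec 2008"
```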

I realized, though, that at this point I would have to hardcode things
like ANSI coloring into NOTMUCH_FORMAT_SUP.

Also, any l10n (e.g. of the time representation) would have to be
hardcoded as well. (By the way, does anybody know a library for
human-readable time representations that supports l10n and i18n?)


So perhaps it's better to move the polishing into the client (Yeah!
Python to the rescue! ;-). But then, 'notmuch search' would need to
return some raw representation of the date field as well.


Any comment? Any other thoughts about this?



Regards,
    Gregor Hoffleit


* Re: Notmuch's search view sucks
From: Carl Worth @ 2009-12-04  0:29 UTC
  To: Gregor Hoffleit, notmuch

[-- Attachment #1: Type: text/plain, Size: 2670 bytes --]

On Thu, 03 Dec 2009 14:33:51 +0100, Gregor Hoffleit <gregor@hoffleit.de> wrote:
> First, a short introduction: I was a mutt user for ages. When I read
> about Sup, I was intrigued. After a short evaluation period, I switched
> to Sup, which I have now been using for six months.

Hi Gregor, welcome to notmuch!

> But. Compared to Sup, the current notmuch clients suck :-)

Hey, we like our rough edges *really* rough, dontcha know?

> I'm experimenting with a notmuch web client (currently 'evenless'),
> trying to replicate much of the feeling of Sup, in a web client.

Hey, that sounds really interesting! I'll definitely look forward to
what you come up with.

> Also, any l10n (e.g. of the time representation) would have to be
> hardcoded as well. (By the way, does anybody know a library for
> human-readable time representations that supports l10n and i18n?)

I'd love to see one. The quick scan I did for human-readable time
formatting found libraries in languages like Perl, Python, and Ruby, but
I didn't notice much in C. I also didn't look closely enough to see
whether any of these have multi-language support.

> So perhaps it's better to move the polishing into the client (Yeah!
> Python to the rescue! ;-). But then, 'notmuch search' would need to
> return some raw representation of the date field as well.

Good point. There's actually a weird mix of raw and cooked output from
the notmuch command line right now. As you noticed, "notmuch search"
cooks the date too much, (and in a way useful only to English speakers).

Meanwhile, the "notmuch show" output is far too raw to be read without a
client prettying it up. (The message{ header{ body{ body} header}
message} stuff is almost as bad as XML.)

> Any comment? Any other thoughts about this?

I think I'd like to see notmuch output get both more cooked and more raw
at the same time. I'd like things to be more cooked by default,
("notmuch show" shouldn't print the ugly delimiters, should indent
messages, and should start up a pager). And then we just need options
that frontends can pass to get the raw output, (but quoted
safely---which the current "notmuch show" output is *not*).
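As one illustration of output that is raw but safely quoted, a frontend-oriented mode could emit each thread as a JSON object: the date stays an integer timestamp for the client to format however it likes, and the serializer escapes quotes, backslashes and newlines inside values. This is a hypothetical Python sketch; the function name and field names are assumptions, not notmuch's actual output format:

```python
import json

def raw_thread_record(thread_id: str, timestamp: int, authors: list[str],
                      subject: str, tags: list[str]) -> str:
    """Serialize one thread as a single line of JSON (hypothetical layout).

    Every field stays raw (epoch seconds, full author names, tag list);
    json.dumps does the quoting, so a subject containing quotes, braces
    or newlines cannot break the framing for the client.
    """
    return json.dumps({
        "thread": thread_id,
        "timestamp": timestamp,
        "authors": authors,
        "subject": subject,
        "tags": tags,
    })
```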

-Carl

PS. If you're worried about multilingual support in notmuch, you'll
want to know that notmuch is (for now) unconditionally instructing
Xapian to use an English-language stemmer when indexing mail.
Obviously we'll want to support a configuration option for specifying
a default stemmer (I believe Xapian has stemmers for many languages).
And a step beyond that would support different languages for
different emails, but that sounds like something "hard" to identify.
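A hypothetical Python sketch of the policy outlined above: a configured default stemmer, optionally overridden per message when a language detector is confident. The stemmer names are an assumed subset (the authoritative list is whatever Xapian::Stem::get_available_languages() reports for the installed Xapian), and the 0.8 threshold is arbitrary:

```python
from typing import Optional

# Assumed subset of stemmer names accepted by Xapian::Stem.
SUPPORTED_STEMMERS = {"dutch", "english", "french", "german",
                      "italian", "spanish", "swedish"}

def choose_stemmer(config_default: str, detected: Optional[str],
                   confidence: float, threshold: float = 0.8) -> str:
    """Pick a stemmer name for one message (hypothetical policy)."""
    # Fall back to English, today's unconditional choice, if the
    # configured default is not a known stemmer.
    default = config_default if config_default in SUPPORTED_STEMMERS else "english"
    # Prefer the per-message guess only when it is confident and stemmable.
    if detected is not None and detected in SUPPORTED_STEMMERS and confidence >= threshold:
        return detected
    return default
```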

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]


* Re: Notmuch's search view sucks
From: Karl Wiberg @ 2009-12-04  7:14 UTC
  To: Carl Worth; +Cc: notmuch

On Fri, Dec 4, 2009 at 1:29 AM, Carl Worth <cworth@cworth.org> wrote:
> And a step beyond that would support different languages for
> different emails, but that sounds like something "hard" to identify.

But probably not as hard as identifying spam. It could probably be
done with a simple Bayesian filter counting word frequencies---but
it'd be much better if somebody else had already solved the problem,
since this smells suspiciously like something that ought to be a
separate project and put in a library ... does anyone know if such a
project already exists? I know Google can do it ...

It'd be very cool to have notmuch automatically tag messages according
to what language they're in.
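A minimal sketch of the "simple Bayesian filter counting word frequencies" idea: a multinomial naive Bayes classifier over whitespace-separated words, with add-one smoothing and equal priors for every language. The function names and the toy training text are hypothetical; a real tagger would need far more training text per language.

```python
import math
from collections import Counter

def train(samples: dict[str, str]) -> dict[str, Counter]:
    """Count word frequencies per language from labeled sample text."""
    return {lang: Counter(text.lower().split()) for lang, text in samples.items()}

def classify(model: dict[str, Counter], text: str) -> str:
    """Return the language with the highest naive-Bayes log-likelihood.

    Add-one (Laplace) smoothing keeps unseen words from zeroing out a
    language; every language gets the same prior, so priors are omitted.
    """
    vocab_size = len(set().union(*model.values())) + 1  # +1 slot for unseen words
    words = text.lower().split()
    scores = {}
    for lang, counts in model.items():
        denom = sum(counts.values()) + vocab_size
        scores[lang] = sum(math.log((counts[w] + 1) / denom) for w in words)
    return max(scores, key=scores.get)
```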

-- 
Karl Wiberg, kha@treskal.com
   subrabbit.wordpress.com
   www.treskal.com/kalle


* Re: Notmuch's search view sucks
From: Olly Betts @ 2009-12-04 10:36 UTC
  To: notmuch

Karl Wiberg writes:
> On Fri, Dec 4, 2009 at 1:29 AM, Carl Worth wrote:
> > And a step beyond that would support different languages for
> > different emails, but that sounds like something "hard" to identify.
> 
> But probably not as hard as identifying spam. It could probably be
> done with a simple Bayesian filter counting word frequencies---but
> it'd be much better if somebody else had already solved the problem,
> since this smells suspiciously like something that ought to be a
> separate project and put in a library ... does anyone know if such a
> project already exists?

There's TextCat:

http://www.let.rug.nl/vannoord/TextCat/

It looks at n-gram frequencies, and can guess pretty reliably from
even a fairly small amount of text.

TextCat is in Perl.  I don't know if there's a C or C++ implementation
but it isn't a huge piece of code - finding a good technique was the
clever part of it.
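TextCat implements Cavnar and Trenkle's n-gram text categorization: build a ranked profile of the most frequent character n-grams for each language, do the same for the text to classify, and pick the language whose profile is closest under the "out-of-place" measure (the sum of rank differences, with a fixed penalty for n-grams missing from the language profile). A compact Python sketch of that idea; the original uses n-grams up to length 5 and a few hundred n-grams per profile, while this sketch stops at trigrams, and all names are hypothetical:

```python
from collections import Counter

def ngram_profile(text: str, max_n: int = 3, size: int = 300) -> dict[str, int]:
    """Map each of the `size` most frequent n-grams to its rank (0 = most frequent)."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"            # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count; break ties alphabetically for determinism.
    ranked = sorted(counts, key=lambda gram: (-counts[gram], gram))[:size]
    return {gram: rank for rank, gram in enumerate(ranked)}

def out_of_place(doc: dict[str, int], lang: dict[str, int]) -> int:
    """Sum of rank differences; n-grams missing from `lang` cost the maximum."""
    penalty = len(lang)
    return sum(abs(rank - lang[gram]) if gram in lang else penalty
               for gram, rank in doc.items())

def guess_language(text: str, profiles: dict[str, dict[str, int]]) -> str:
    """Return the language whose profile is closest to the text's profile."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))
```

Language profiles would be built once from sample text per language, e.g. `profiles = {"english": ngram_profile(english_sample), ...}`, and reused for every message.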

Cheers,
    Olly


* Re: Notmuch's search view sucks
From: Aaron Ecay @ 2009-12-04 11:52 UTC
  To: notmuch

--- On 4 December 2009, Olly Betts wrote:

[...]

> TextCat is in Perl.  I don't know if there's a C or C++ implementation but
> it isn't a huge piece of code - finding a good technique was the clever part
> of it.

The same algorithm is implemented in C here:
http://www.mnogosearch.org/guesser/

Licensed under the GPL and includes presets for ~50 languages.  A potential
drawback is that it doesn't handle raw HTML very well, according to the
documentation.

Aaron


* Re: Notmuch's search view sucks
From: Baruch Even @ 2009-12-04 17:58 UTC
  To: Karl Wiberg; +Cc: notmuch

Karl Wiberg wrote:
> On Fri, Dec 4, 2009 at 1:29 AM, Carl Worth <cworth@cworth.org> wrote:
>> And a step beyond that would support different languages for
>> different emails, but that sounds like something "hard" to identify.
> 
> But probably not as hard as identifying spam. It could probably be
> done with a simple Bayesian filter counting word frequencies---but
> it'd be much better if somebody else had already solved the problem,
> since this smells suspiciously like something that ought to be a
> separate project and put in a library ... does anyone know if such a
> project already exists? I know Google can do it ...
> 
> It'd be very cool to have notmuch automatically tag messages according
> to what language they're in.

What we should have is an interface for running an external program
that classifies a message when it is first added, and another hook that
runs when tags are changed, so that a machine-learning classifier can
learn from the tags the user changes.
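A minimal Python sketch of the two hooks described above, under an assumed (hypothetical) protocol: the classifier reads a message on stdin and prints one tag per line, and the learning hook receives a message id plus the tag changes as +tag/-tag arguments, in the style of 'notmuch tag'. The function names and the demo classifier are illustrative only:

```python
import subprocess
import sys

def run_classify_hook(command: list[str], message: str,
                      timeout: float = 10.0) -> list[str]:
    """Run an external classifier on a newly added message; return its tags.

    A non-zero exit status raises subprocess.CalledProcessError.
    """
    result = subprocess.run(command, input=message, capture_output=True,
                            text=True, timeout=timeout, check=True)
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]

def run_tags_changed_hook(command: list[str], message_id: str,
                          added: list[str], removed: list[str],
                          timeout: float = 10.0) -> None:
    """Tell an external learner which tags the user added or removed."""
    args = ([message_id] + [f"+{tag}" for tag in added]
            + [f"-{tag}" for tag in removed])
    subprocess.run(command + args, timeout=timeout, check=True)

# Toy classifier for demonstration: tags text containing the word "the" as English.
DEMO_CLASSIFIER = [sys.executable, "-c",
                   "import sys; words = sys.stdin.read().split(); "
                   "print('lang-en' if 'the' in words else 'lang-unknown')"]
```

Keeping the protocol to stdin/stdout and plain arguments means any language (or an existing spam filter) could sit behind either hook.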

Baruch


* Re: Notmuch's search view sucks
From: Carl Worth @ 2009-12-04 18:01 UTC
  To: Aaron Ecay, notmuch

[-- Attachment #1: Type: text/plain, Size: 701 bytes --]

On Fri, 04 Dec 2009 06:52:38 -0500, Aaron Ecay <aaronecay@gmail.com> wrote:
> The same algorithm is implemented in C here:
> http://www.mnogosearch.org/guesser/
> 
> Licensed under the GPL and includes presets for ~50 languages.

That does indeed look very interesting (at least what I can see in
Google's cache of the website, as the server seems to be down just
now). Oh, but I can just run "apt-get source mnogosearch" and find
src/mguesser.c and src/guesser.c, at least.

> A potential drawback is that it doesn't handle raw HTML very well,
> according to the documentation.

That shouldn't really be an issue. Notmuch will want to strip HTML tags
before indexing anyway.
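As an illustration of stripping HTML before indexing, a minimal Python sketch using the standard library's html.parser: it keeps text content, decodes entities, and drops the bodies of script and style elements. The helper names are hypothetical:

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect text content, skipping <script> and <style> bodies."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &amp; and friends
        self.parts: list[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def html_to_text(html: str) -> str:
    """Return the indexable text of an HTML body."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Join chunks with spaces and collapse whitespace; a word split by an
    # inline tag (rare) ends up split in two, a deliberate simplification.
    return " ".join(" ".join(parser.parts).split())
```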

-Carl

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

Thread overview: 7+ messages
2009-12-03 13:33 Notmuch's search view sucks Gregor Hoffleit
2009-12-04  0:29 ` Carl Worth
2009-12-04  7:14   ` Karl Wiberg
2009-12-04 10:36     ` Olly Betts
2009-12-04 11:52       ` Aaron Ecay
2009-12-04 18:01         ` Carl Worth
2009-12-04 17:58     ` Baruch Even
