unofficial mirror of notmuch@notmuchmail.org
* interesting project!
@ 2009-11-21  9:01 Dirk-Jan C. Binnema
  2009-11-21 12:10 ` Carl Worth
  0 siblings, 1 reply; 8+ messages in thread
From: Dirk-Jan C. Binnema @ 2009-11-21  9:01 UTC (permalink / raw)
  To: notmuch

Hi all,

Wow, 'notmuch' looks like a very interesting project. In 2008, I wrote an
e-mail (Maildir) search tool called 'mu'[1], also using Xapian and GMime; my
plan was at some point to turn it into a mail reader (use
offlineimap/fetchmail etc. for getting the mail, and something else for
sending it), but I never got that far. Search works pretty well
though. Anyhow, it seems notmuch is getting there quickly.

Anyhow, I'll study the notmuch code and see if there are some useful bits in
my code that might make sense there, e.g., various dir scanning optimizations,
see [2].

Good luck!
Dirk.


    [1] http://www.djcbsoftware.nl/code/mu/
    [2] http://djcbflux.blogspot.com/2008/10/seek-destroy.html

-- 
Dirk-Jan C. Binnema                  Helsinki, Finland
e:djcb@djcbsoftware.nl           w:www.djcbsoftware.nl
pgp: D09C E664 897D 7D39 5047 A178 E96A C7A1 017D DA3C

* Re: interesting project!
  2009-11-21  9:01 interesting project! Dirk-Jan C. Binnema
@ 2009-11-21 12:10 ` Carl Worth
  2009-11-21 16:43   ` Jameson Greaf Rollins
  2009-11-22 12:23   ` Dirk-Jan C. Binnema
  0 siblings, 2 replies; 8+ messages in thread
From: Carl Worth @ 2009-11-21 12:10 UTC (permalink / raw)
  To: djcb, notmuch

On Sat, 21 Nov 2009 11:01:46 +0200, Dirk-Jan C. Binnema <djcb.bulk@gmail.com> wrote:
> Hi all,

Hi, Dirk. Welcome to notmuch!

> Wow, 'notmuch' looks like a very interesting project. In 2008, I wrote an
> e-mail (Maildir) search tool called 'mu'[1], also using Xapian and GMime; my
> plan was at some point to turn it into a mail reader (use
> offlineimap/fetchmail etc. for getting the mail, and something else for
> sending it), but I never got that far. Search works pretty well
> though. Anyhow, it seems notmuch is getting there quickly.

Ah, how ignorant I was. I probably could have saved myself a bunch of
work if I had just started with mu. Oh, well.

> Anyhow, I'll study the notmuch code and see if there are some useful bits in
> my code that might make sense there, e.g., various dir scanning optimizations,
> see [2].

That sounds great. It's also good to have people with experience in this
area join and help out. I'll look forward to any ideas or other
contributions you will have.

>     [2] http://djcbflux.blogspot.com/2008/10/seek-destroy.html

Thanks. Stewart Smith contributed a patch to notmuch a couple of days
ago that added inode sorting, (which I was totally unaware of as an
optimization idea):

Read mail directory in inode number order
http://git.notmuchmail.org/git/notmuch?a=commitdiff;h=a45ff8c36112a2f17c1ad5c20a16c30a47759797
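
A minimal sketch of the idea, not the patch itself: it assumes POSIX
scandir(3) and sorts the entries by d_ino before visiting them, so that
later per-file reads tend to follow on-disk order. The names
compare_inode and visit_in_inode_order are made up for illustration.

```c
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>

/* Comparator for scandir(3): order directory entries by inode number. */
static int
compare_inode(const struct dirent **a, const struct dirent **b)
{
    if ((*a)->d_ino < (*b)->d_ino)
        return -1;
    return (*a)->d_ino > (*b)->d_ino;
}

/* Visit every entry of `path` in inode order. `visit` may be NULL.
 * Returns the number of entries seen, or -1 if the directory cannot
 * be read. */
static int
visit_in_inode_order(const char *path, void (*visit)(const struct dirent *))
{
    struct dirent **entries;
    int n;

    assert(path != NULL);
    n = scandir(path, &entries, NULL, compare_inode);
    if (n < 0)
        return -1;
    for (int i = 0; i < n; i++) {
        if (visit != NULL)
            visit(entries[i]);
        free(entries[i]);       /* scandir allocates each entry */
    }
    free(entries);
    return n;
}
```

Whether this helps depends on the filesystem; the benefit is usually
described for cold-cache scans of large directories.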

-Carl

* Re: interesting project!
  2009-11-21 12:10 ` Carl Worth
@ 2009-11-21 16:43   ` Jameson Greaf Rollins
  2009-11-22 12:23   ` Dirk-Jan C. Binnema
  1 sibling, 0 replies; 8+ messages in thread
From: Jameson Greaf Rollins @ 2009-11-21 16:43 UTC (permalink / raw)
  To: Carl Worth; +Cc: notmuch, djcb

On Sat, Nov 21, 2009 at 01:10:42PM +0100, Carl Worth wrote:
> On Sat, 21 Nov 2009 11:01:46 +0200, Dirk-Jan C. Binnema <djcb.bulk@gmail.com> wrote:
> > Anyhow, I'll study the notmuch code and see if there are some useful bits in
> > my code that might make sense there, e.g., various dir scanning optimizations,
> > see [2].
> 
> That sounds great. It's also good to have people with experience in this
> area join and help out. I'll look forward to any ideas or other
> contributions you will have.

I've been using mu for a while now and have found it incredibly
useful.  I just heard about notmuch and it seems like the mail
processing system I've been waiting for, so I'm incredibly excited.
The idea of the mu and notmuch folks working together sounds
incredibly awesome.  I am really encouraged.

jamie.

* Re: interesting project!
  2009-11-21 12:10 ` Carl Worth
  2009-11-21 16:43   ` Jameson Greaf Rollins
@ 2009-11-22 12:23   ` Dirk-Jan C. Binnema
  2009-11-22 22:52     ` Carl Worth
  1 sibling, 1 reply; 8+ messages in thread
From: Dirk-Jan C. Binnema @ 2009-11-22 12:23 UTC (permalink / raw)
  To: Carl Worth; +Cc: notmuch@notmuchmail.org

Hi Carl,

>>>>> "Carl" == Carl Worth <cworth@cworth.org> writes:

    >> Anyhow, I'll study the notmuch code and see if there are some useful
    >> bits in my code that might make sense there, e.g., various dir scanning
    >> optimizations, see [2].

    Carl> That sounds great. It's also good to have people with experience in
    Carl> this area join and help out. I'll look forward to any ideas or other
    Carl> contributions you will have.

Thanks for the nice words!

A small question: it seems that notmuch is avoiding the use of GLib directly
(of course, it depends on it anyway through GMime); is this because of
OOM-handling? It'd be nice if GLib could be used, it would make some things
quite a bit easier.

Best wishes,
Dirk.

-- 
Dirk-Jan C. Binnema                  Helsinki, Finland
e:djcb@djcbsoftware.nl           w:www.djcbsoftware.nl
pgp: D09C E664 897D 7D39 5047 A178 E96A C7A1 017D DA3C

* Re: interesting project!
  2009-11-22 12:23   ` Dirk-Jan C. Binnema
@ 2009-11-22 22:52     ` Carl Worth
  2009-11-23  7:08       ` Dirk-Jan C. Binnema
  0 siblings, 1 reply; 8+ messages in thread
From: Carl Worth @ 2009-11-22 22:52 UTC (permalink / raw)
  To: djcb; +Cc: notmuch@notmuchmail.org

On Sun, 22 Nov 2009 14:23:10 +0200, Dirk-Jan C. Binnema <djcb.bulk@gmail.com> wrote:
> A small question: it seems that notmuch is avoiding the use of GLib directly
> (of course, it depends on it anyway through GMime); is this because of
> OOM-handling? It'd be nice if GLib could be used, it would make some things
> quite a bit easier.

It's true that I don't like the OOM handling in glib. I also think that
glib tries to be too many different things at the same time. And
finally, having some talloc-friendly data structures (like a hash-table)
would be really nice.

In the meantime, as you say, we're already linking with glib because of
GMime, so there's really no reason not to call functions that are there
and that do what we want. What kinds of things were you thinking of that
would be easier with glib?

-Carl

* Re: interesting project!
  2009-11-22 22:52     ` Carl Worth
@ 2009-11-23  7:08       ` Dirk-Jan C. Binnema
  2009-11-24  2:57         ` Carl Worth
  0 siblings, 1 reply; 8+ messages in thread
From: Dirk-Jan C. Binnema @ 2009-11-23  7:08 UTC (permalink / raw)
  To: Carl Worth; +Cc: notmuch@notmuchmail.org

Hi Carl,

>>>>> "Carl" == Carl Worth <cworth@cworth.org> writes:

    Carl> On Sun, 22 Nov 2009 14:23:10 +0200, Dirk-Jan C. Binnema
    Carl> <djcb.bulk@gmail.com> wrote:
    >> A small question: it seems that notmuch is avoiding the use of GLib directly
    >> (of course, it depends on it anyway through GMime); is this because of
    >> OOM-handling? It'd be nice if GLib could be used, it would make some things
    >> quite a bit easier.

    Carl> It's true that I don't like the OOM handling in glib. I also think that
    Carl> glib tries to be too many different things at the same time. And
    Carl> finally, having some talloc-friendly data structures (like a hash-table)
    Carl> would be really nice.

Well, the counterpoint to the OOM problems is that in many programs,
the 'malloc returns NULL'-case is often not very well tested (because it's
rather hard to test), and that at least on Linux, it's unlikely that malloc
ever does return NULL. Lennart Poettering wrote this up in some more
detail[1]. Of course, the requirements for notmuch may be a bit different and
I definitely don't want to suggest any radical change here after only finding
out about notmuch a few days ago :)

(BTW, there is a hashtable implementation in libc, (hcreate(3) etc.). Is that
one not sufficiently 'talloc-friendly'? It's not very user-friendly, but
that's another matter)
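
For reference, a small sketch of what using hcreate(3)/hsearch(3) as a
tag set looks like. The helper names are hypothetical. It also hints at
why the interface is awkward: there is one process-wide table, its
capacity is fixed at creation, keys are stored by pointer rather than
copied, and the table allocates its own storage internally, so it
presumably cannot be hung off a talloc context.

```c
#include <assert.h>
#include <search.h>
#include <stddef.h>

/* Fill the single global table managed by hcreate(3) with `count` tags.
 * The key strings must outlive the table. Returns 0 on success, -1 on
 * failure. */
static int
tag_set_create(const char *const *tags, size_t count)
{
    assert(tags != NULL);
    /* Capacity is chosen once, here, and cannot grow afterwards. */
    if (hcreate(count ? count * 2 : 1) == 0)
        return -1;
    for (size_t i = 0; i < count; i++) {
        ENTRY item = { .key = (char *) tags[i], .data = NULL };
        if (hsearch(item, ENTER) == NULL) {
            hdestroy();
            return -1;
        }
    }
    return 0;
}

/* hsearch returns a pointer to the stored entry, so "present with a
 * NULL value" and "absent" can be told apart. */
static int
tag_set_contains(const char *tag)
{
    ENTRY probe = { .key = (char *) tag, .data = NULL };
    return hsearch(probe, FIND) != NULL;
}
```

The caller releases the table with hdestroy(); there is no call to
remove a single entry.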

    Carl> In the meantime, as you say, we're already linking with glib because of
    Carl> GMime, so there's really no reason not to call functions that are there
    Carl> and that do what we want. What kinds of things were you thinking of that
    Carl> would be easier with glib?

I could imagine the string functions could replace the ones in talloc. There
are many more string functions, e.g., for handling file names / paths, which
are quite useful. Then there are wrappers for gcc'isms (G_UNLIKELY etc.) that
would make the ones in notmuch unneeded, and a lot of compatibility things
like G_DIR_SEPARATOR. And the data structures (GSList/GList/GHashTable) are
nice. The UTF8 functionality might come in handy.
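
As an illustration of the "gcc'isms" point, here is roughly what a
hand-rolled equivalent of G_LIKELY/G_UNLIKELY looks like. This is a
sketch: the macro names likely/unlikely and the helper sum_or_zero are
assumptions for the example, not taken from the notmuch source.

```c
#include <stddef.h>

/* Branch-prediction hints. On GCC-compatible compilers they expand to
 * __builtin_expect; elsewhere they are plain pass-throughs. */
#if defined(__GNUC__)
# define likely(expr)   __builtin_expect(!!(expr), 1)
# define unlikely(expr) __builtin_expect(!!(expr), 0)
#else
# define likely(expr)   (expr)
# define unlikely(expr) (expr)
#endif

/* Sum an array, marking the NULL-array case as the rare path. */
static long
sum_or_zero(const int *values, size_t count)
{
    long total = 0;

    if (unlikely(values == NULL))
        return 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

The hint only affects code layout, never the result, which is why a
no-op fallback is safe.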

Anyway, I was just curious, people have survived without GLib before, and if
you dislike the OOM-strategy, it's a bit of a no-no of course.

Best wishes,
Dirk.


[1] http://article.gmane.org/gmane.comp.audio.jackit/19998

-- 
Dirk-Jan C. Binnema                  Helsinki, Finland
e:djcb@djcbsoftware.nl           w:www.djcbsoftware.nl
pgp: D09C E664 897D 7D39 5047 A178 E96A C7A1 017D DA3C

* Re: interesting project!
  2009-11-23  7:08       ` Dirk-Jan C. Binnema
@ 2009-11-24  2:57         ` Carl Worth
  2009-11-24 14:16           ` Dirk-Jan C. Binnema
  0 siblings, 1 reply; 8+ messages in thread
From: Carl Worth @ 2009-11-24  2:57 UTC (permalink / raw)
  To: djcb; +Cc: notmuch@notmuchmail.org

On Mon, 23 Nov 2009 09:08:34 +0200, Dirk-Jan C. Binnema <djcb.bulk@gmail.com> wrote:
> Well, the counterpoint to the OOM problems is that in many programs,
> the 'malloc returns NULL'-case is often not very well tested (because it's
> rather hard to test), and that at least on Linux, it's unlikely that malloc
> ever does return NULL. Lennart Poettering wrote this up in some more
> detail[1]. Of course, the requirements for notmuch may be a bit different and
> I definitely don't want to suggest any radical change here after only finding
> out about notmuch a few days ago :)

No problem. I'm glad to discuss things. That's how I learn and find out
whether my decisions are sound or not. :-)

I agree that trying to support OOM doesn't make sense without
testing. But that's why I want to test notmuch with memory-fault
injection. We've been doing this with the cairo library with good
success for a while.
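
To make the idea concrete, here is a minimal sketch of memory-fault
injection. It is not how cairo or notmuch do it; test_malloc, fail_at,
dup_pair and exercise_dup_pair are all hypothetical names. The wrapper
fails the Nth allocation on request, and the driver reruns the operation
with N = 1, 2, ... until it succeeds, so every failure path gets
executed once.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Test hook: the Nth allocation from now fails (1 = the very next one);
 * 0 disables injection. */
static unsigned long fail_at = 0;

static void *
test_malloc(size_t size)
{
    if (fail_at != 0 && --fail_at == 0)
        return NULL;            /* simulated out-of-memory */
    return malloc(size);
}

/* Code under test: duplicate two strings, cleaning up on failure. */
static int
dup_pair(const char *a, const char *b, char **out_a, char **out_b)
{
    assert(a && b && out_a && out_b);
    *out_a = test_malloc(strlen(a) + 1);
    if (*out_a == NULL)
        return -1;
    *out_b = test_malloc(strlen(b) + 1);
    if (*out_b == NULL) {
        free(*out_a);
        *out_a = NULL;
        return -1;
    }
    strcpy(*out_a, a);
    strcpy(*out_b, b);
    return 0;
}

/* Fail each allocation in turn until the operation succeeds. Returns
 * the number of injected failures the code survived. */
static unsigned long
exercise_dup_pair(void)
{
    for (unsigned long n = 1; ; n++) {
        char *a, *b;

        fail_at = n;
        if (dup_pair("to", "from", &a, &b) == 0) {
            free(a);
            free(b);
            fail_at = 0;
            return n - 1;
        }
    }
}
```

Running such a driver under a leak checker is what turns "we handle
NULL" from a claim into something tested.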

As for "unlikely that malloc ever returns NULL", that's simply a
system-configuration away (just turn off overcommit). And I can imagine
notmuch being used in lots of places, (netbooks, web servers, etc.), so
I do want to make it as robust as possible.

> (BTW, there is a hashtable implementation in libc, (hcreate(3) etc.). Is that
> one not sufficiently 'talloc-friendly'? It's not very user-friendly, but
> that's another matter)

Thanks for mentioning the hash table. The hash table is one of the few
things that I *am* using from glib right now in notmuch. It's got a
couple of bizarre things about it:

	1. The simpler-appearing g_hash_table_new function is useless
	   for common cases like hashing strings. It will just leak
	   memory. So g_hash_table_new_full is the only one worth using.

	2. There are two lookup functions, g_hash_table_lookup, and
	   g_hash_table_lookup_extended.

	   And a program like notmuch really does use the hash table in
	   two ways. In the simpler case, we're using the hash to simply
	   implement a set, (such as avoiding duplicates in a set of
	   tags). In the more complex case, we're associating actual
	   objects with the keys, (such as when linking messages
	   together into a tree for the thread).

	   So, it might make sense if a hash-table interface supported
	   these two modes well. What's bizarre about GHashTable though,
	   is that in the "just a set" case, we only use NULL as the
	   value when inserting. And distinguishing "previously inserted
	   with NULL" from "never inserted" is the one thing that
	   g_hash_table_lookup can't do. So I've only found that I could
	   ever use g_hash_table_lookup_extended, (and pass a pair of
	   NULLs for the return arguments I don't need).

Fortunately, Eric Anholt spent *his* flight home coding up a nice
implementation of an open-addressed hash designed specifically to be a
tiny little implementation suitable for copying directly into a
project. He's testing it with Mesa now, and I might pull it into notmuch
later.
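
For readers unfamiliar with the term, a rough sketch of an open-addressed
string set follows. It is emphatically not Eric's code: the fixed
capacity, the FNV-1a hash, and all names (str_set, find_slot, ...) are
assumptions chosen to keep the example short. It does show the "just a
set" mode directly, with a membership test that returns a boolean
instead of a value that might legitimately be NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SET_CAPACITY 64         /* fixed power of two; no resizing here */

struct str_set {
    const char *slots[SET_CAPACITY];    /* NULL marks an empty slot */
    size_t count;
};

/* 32-bit FNV-1a string hash. */
static uint32_t
hash_string(const char *s)
{
    uint32_t h = 2166136261u;

    for (; *s; s++) {
        h ^= (unsigned char) *s;
        h *= 16777619u;
    }
    return h;
}

/* Linear probing: return the slot holding `key`, or the empty slot where
 * it would go. Terminates because one slot is always kept empty. */
static size_t
find_slot(const struct str_set *set, const char *key)
{
    size_t i = hash_string(key) & (SET_CAPACITY - 1);

    while (set->slots[i] != NULL && strcmp(set->slots[i], key) != 0)
        i = (i + 1) & (SET_CAPACITY - 1);
    return i;
}

static bool
str_set_contains(const struct str_set *set, const char *key)
{
    return set->slots[find_slot(set, key)] != NULL;
}

/* Returns false if the set is full. The key is borrowed, not copied. */
static bool
str_set_add(struct str_set *set, const char *key)
{
    size_t i;

    assert(set != NULL && key != NULL);
    i = find_slot(set, key);
    if (set->slots[i] != NULL)
        return true;            /* already present */
    if (set->count + 1 >= SET_CAPACITY)
        return false;           /* keep at least one slot empty */
    set->slots[i] = key;
    set->count++;
    return true;
}
```

A zero-initialized struct str_set is an empty set. Deletion and growth
are left out; a real implementation would need tombstones or rehashing
for those.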

> I could imagine the string functions could replace the ones in talloc. There
> are many more string functions, e.g., for handling file names / paths, which
> are quite useful. Then there are wrappers for gcc'isms (G_UNLIKELY etc.) that
> would make the ones in notmuch unneeded, and a lot of compatibility things
> like G_DIR_SEPARATOR. And the data structures (GSList/GList/GHashTable) are
> nice. The UTF8 functionality might come in handy.

Yes. The portability stuff I think is actually interesting. I've thought
it really might make sense to have something that gave you *just* that,
(without a main loop, an object system, several memory allocators or
pieces for making your own memory allocators, etc). I haven't had a
chance to look into gnulib yet, but I'd like to.

As for a list, I almost always find it cleaner to be able to just have
my own list data structures, (to avoid casts, etc.).
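
A short sketch of what that looks like in practice, with hypothetical
names (tag_node and friends are not from notmuch). Because the node is
typed for its payload, callers never cast from a generic pointer the way
they would with a void-pointer list.

```c
#include <assert.h>
#include <stdlib.h>

/* A list node typed for its payload: no void pointers, no casts. */
struct tag_node {
    const char *tag;            /* borrowed, not copied */
    struct tag_node *next;
};

/* Prepend `tag`. Returns the new head, or NULL on allocation failure,
 * in which case the old list is left untouched. */
static struct tag_node *
tag_list_prepend(struct tag_node *head, const char *tag)
{
    struct tag_node *node;

    assert(tag != NULL);
    node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->tag = tag;
    node->next = head;
    return node;
}

static size_t
tag_list_length(const struct tag_node *head)
{
    size_t n = 0;

    for (; head != NULL; head = head->next)
        n++;
    return n;
}

static void
tag_list_free(struct tag_node *head)
{
    while (head != NULL) {
        struct tag_node *next = head->next;
        free(head);
        head = next;
    }
}
```

The cost is one small list type per payload type, which is the trade
being weighed against a generic container.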

And for a hash table, I'm interested in what Eric's doing.

I'm really not prejudiced against using code that's already been
written, (in spite of how it might appear, I don't feel the need to
re-solve every problem that's already been solved). But I have long
thought that we could have better support for a "C programmer's toolkit"
of commonly needed things than we have had so far.

I definitely like the idea of having tiny, focused libraries that do one
thing and do it well, (and maybe even some things so tiny that they are
actually designed to be copied into the application---like with gnulib
and with Eric's new hash table).

> Anyway, I was just curious, people have survived without GLib before, and if
> you dislike the OOM-strategy, it's a bit of a no-no of course.

Thanks for understanding. :-)

And I enjoy the conversation,

-Carl

* Re: interesting project!
  2009-11-24  2:57         ` Carl Worth
@ 2009-11-24 14:16           ` Dirk-Jan C. Binnema
  0 siblings, 0 replies; 8+ messages in thread
From: Dirk-Jan C. Binnema @ 2009-11-24 14:16 UTC (permalink / raw)
  To: Carl Worth; +Cc: notmuch@notmuchmail.org

Hi Carl,

>>>>> "Carl" == Carl Worth <cworth@cworth.org> writes:

    Carl> I agree that trying to support OOM doesn't make sense without
    Carl> testing. But that's why I want to test notmuch with memory-fault
    Carl> injection. We've been doing this with the cairo library with good
    Carl> success for a while.

    Carl> As for "unlikely that malloc ever returns NULL", that's simply a
    Carl> system-configuration away (just turn off overcommit). And I can imagine
    Carl> notmuch being used in lots of places, (netbooks, web servers, etc.), so
    Carl> I do want to make it as robust as possible.

That is a very laudable goal! But it's also quite hard to achieve, considering
that both GMime and Xapian may have some different ideas about that. And at
least in the current code, I see fprintfs in 'malloc-returns-NULL'-cases --
but fprintf itself will probably allocate memory too. Also, at least now, the
bad alloc exceptions for C++ are not caught. Of course, that can be changed,
but it's just to show that these things are hard to get right.
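
One common way around the fprintf concern, sketched under the assumption
of a POSIX system: emit a fixed message with write(2), which involves no
stdio buffering or formatting and therefore has no reason to allocate.
The function name report_oom is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Report out-of-memory on `fd` without going through stdio. The message
 * is a static string, so nothing here needs to allocate. Returns 0 on
 * success, -1 on error. */
static int
report_oom(int fd)
{
    static const char msg[] = "Out of memory\n";
    const char *p = msg;
    size_t left = sizeof msg - 1;       /* exclude the trailing NUL */

    assert(fd >= 0);
    while (left > 0) {
        ssize_t n = write(fd, p, left);

        if (n < 0 && errno == EINTR)
            continue;                   /* interrupted; try again */
        if (n <= 0)
            return -1;
        p += n;
        left -= (size_t) n;
    }
    return 0;
}
```

A caller would typically use report_oom(STDERR_FILENO) and then unwind
or exit. This only covers the C side; it does nothing about exceptions
thrown from C++ libraries.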

    Carl> Thanks for mentioning the hash table. The hash table is one of the few
    Carl> things that I *am* using from glib right now in notmuch. It's got a
    Carl> couple of bizarre things about it:

    Carl> 	1. The simpler-appearing g_hash_table_new function is useless
    Carl> 	   for common cases like hashing strings. It will just leak
    Carl> 	   memory. So g_hash_table_new_full is the only one worth using.

Hmmm, I never noticed that behavior. If you are using dynamically allocated
strings, GHashTable won't free them for you -- but I can't really see how it
could (given that it takes generic pointers), so you have to free those
yourself. But any memleaks beyond that?

    Carl> 	2. There are two lookup functions, g_hash_table_lookup, and
    Carl> 	   g_hash_table_lookup_extended.

    Carl> 	   So, it might make sense if a hash-table interface supported
    Carl> 	   these two modes well. What's bizarre about GHashTable though,
    Carl> 	   is that in the "just a set" case, we only use NULL as the
    Carl> 	   value when inserting. And distinguishing "previously inserted
    Carl> 	   with NULL" from "never inserted" is the one thing that
    Carl> 	   g_hash_table_lookup can't do. So I've only found that I could
    Carl> 	   ever use g_hash_table_lookup_extended, (and pass a pair of
    Carl> 	   NULLs for the return arguments I don't need).

Hmm, well, I found that returning NULL for 'not set' works in many cases,
and it keeps those cases quite simple. If you need to distinguish between
NULL and 'not set', you can use either the _extended version as you mention,
or use some special NOT_SET static ptr you can compare with (and handle it
appropriately in the destructor).

    Carl> I definitely like the idea of having tiny, focused libraries that do
    Carl> one thing and do it well, (and maybe even some things so tiny that
    Carl> they are actually designed to be copied into the application---like
    Carl> with gnulib and with Eric's new hash table).

Ok; glib fills the role pretty well for me, and I don't really pay for the
parts that I don't use. But tastes differ, no problem ;-)

    Carl> Thanks for understanding. :-)
    Carl> And I enjoy the conversation,

Same here :) 

Best wishes,
Dirk.

-- 
Dirk-Jan C. Binnema                  Helsinki, Finland
e:djcb@djcbsoftware.nl           w:www.djcbsoftware.nl
pgp: D09C E664 897D 7D39 5047 A178 E96A C7A1 017D DA3C
