* interesting project! @ 2009-11-21 9:01 Dirk-Jan C. Binnema 2009-11-21 12:10 ` Carl Worth 0 siblings, 1 reply; 8+ messages in thread From: Dirk-Jan C. Binnema @ 2009-11-21 9:01 UTC (permalink / raw) To: notmuch Hi all, Wow, 'notmuch' looks like a very interesting project. In 2008, I wrote an e-mail (Maildir) search tool called 'mu'[1], also using Xapian and GMime; my plan was at some point to turn it into a mail reader (use offlineimap/fetchmail etc. for getting the mail, and something else for sending it), but I never got that far. Search works pretty well though. Anyhow, it seems notmuch is getting there quickly. Anyhow, I'll study the notmuch code and see if there are some useful bits in my code that might make sense there, e.g., various dir scanning optimizations, see [2]. Good luck! Dirk. [1] http://www.djcbsoftware.nl/code/mu/ [2] http://djcbflux.blogspot.com/2008/10/seek-destroy.html -- Dirk-Jan C. Binnema Helsinki, Finland e:djcb@djcbsoftware.nl w:www.djcbsoftware.nl pgp: D09C E664 897D 7D39 5047 A178 E96A C7A1 017D DA3C ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: interesting project! 2009-11-21 9:01 interesting project! Dirk-Jan C. Binnema @ 2009-11-21 12:10 ` Carl Worth 2009-11-21 16:43 ` Jameson Greaf Rollins 2009-11-22 12:23 ` Dirk-Jan C. Binnema 0 siblings, 2 replies; 8+ messages in thread From: Carl Worth @ 2009-11-21 12:10 UTC (permalink / raw) To: djcb, notmuch On Sat, 21 Nov 2009 11:01:46 +0200, Dirk-Jan C. Binnema <djcb.bulk@gmail.com> wrote: > Hi all, Hi, Dirk. Welcome to notmuch! > Wow, 'notmuch' looks like a very interesting project. In 2008, I wrote an > e-mail (Maildir) search tool called 'mu'[1], also using Xapian and GMime; my > plan was at some point to turn it into a mail reader (use > offlineimap/fetchmail etc. for getting the mail, and something else for > sending it), but I never got that far. Search works pretty well > though. Anyhow, it seems notmuch is getting there quickly. Ah, how ignorant I was. I probably could have saved myself a bunch of work if I had just started with mu. Oh, well. > Anyhow, I'll study the notmuch code and see if there are some useful bits in > my code that might make sense there, e.g., various dir scanning optimizations, > see [2]. That sounds great. It's also good to have people with experience in this area join and help out. I'll look forward to any ideas or other contributions you will have. > [2] http://djcbflux.blogspot.com/2008/10/seek-destroy.html Thanks. Stewart Smith contributed a patch to notmuch a couple of days ago that added inode sorting, (which I was totally unaware of as an optimization idea): Read mail directory in inode number order http://git.notmuchmail.org/git/notmuch?a=commitdiff;h=a45ff8c36112a2f17c1ad5c20a16c30a47759797 -Carl ^ permalink raw reply [flat|nested] 8+ messages in thread
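[Editor's note] The inode-ordering idea mentioned above can be sketched with plain POSIX calls. This is only an illustration of the technique, not the patch Stewart Smith contributed; the names list_dir_by_inode and by_inode are made up for this sketch. The assumption is that reading files in ascending inode order tends to reduce disk seeks on common filesystems.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for scandir() in strict ISO C modes */
#endif
#include <assert.h>
#include <dirent.h>
#include <stddef.h>

/* Comparator for scandir(): order directory entries by inode number. */
static int by_inode(const struct dirent **a, const struct dirent **b)
{
    if ((*a)->d_ino < (*b)->d_ino) return -1;
    if ((*a)->d_ino > (*b)->d_ino) return 1;
    return 0;
}

/* Hypothetical helper: read all entries of 'path', sorted by inode.
 * Returns the number of entries, or -1 on error. The caller must
 * free() every entry and then the array itself. */
int list_dir_by_inode(const char *path, struct dirent ***entries)
{
    assert(path != NULL && entries != NULL);
    return scandir(path, entries, NULL, by_inode);
}
```

A caller would then open and index the files in the order returned, instead of in the arbitrary order readdir(3) produces.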
* Re: interesting project! 2009-11-21 12:10 ` Carl Worth @ 2009-11-21 16:43 ` Jameson Greaf Rollins 2009-11-22 12:23 ` Dirk-Jan C. Binnema 1 sibling, 0 replies; 8+ messages in thread From: Jameson Greaf Rollins @ 2009-11-21 16:43 UTC (permalink / raw) To: Carl Worth; +Cc: notmuch, djcb [-- Attachment #1: Type: text/plain, Size: 826 bytes --] On Sat, Nov 21, 2009 at 01:10:42PM +0100, Carl Worth wrote: > On Sat, 21 Nov 2009 11:01:46 +0200, Dirk-Jan C. Binnema <djcb.bulk@gmail.com> wrote: > > Anyhow, I'll study the notmuch code and see if there are some useful bits in > > my code that might make sense there, e.g., various dir scanning optimizations, > > see [2]. > > That sounds great. It's also good to have people with experience in this > area join and help out. I'll look forward to any ideas or other > contributions you will have. I've been using mu for a while now and have found it incredibly useful. I just heard about notmuch and it seems like the mail processing system I've been waiting for, so I'm incredibly excited. The idea of the mu and notmuch folks working together sounds incredibly awesome. I am really encouraged. jamie. [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: interesting project! 2009-11-21 12:10 ` Carl Worth 2009-11-21 16:43 ` Jameson Greaf Rollins @ 2009-11-22 12:23 ` Dirk-Jan C. Binnema 2009-11-22 22:52 ` Carl Worth 1 sibling, 1 reply; 8+ messages in thread From: Dirk-Jan C. Binnema @ 2009-11-22 12:23 UTC (permalink / raw) To: Carl Worth; +Cc: notmuch@notmuchmail.org Hi Carl, >>>>> "Carl" == Carl Worth <cworth@cworth.org> writes: >> Anyhow, I'll study the notmuch code and see if there are some useful >> bits in my code that might make sense there, e.g., various dir scanning >> optimizations, see [2]. Carl> That sounds great. It's also good to have people with experience in Carl> this area join and help out. I'll look forward to any ideas or other Carl> contributions you will have. Thanks for the nice words! A small question: it seems that notmuch is avoiding the use of GLib directly (of course, it depends on it anyway through GMime); is this because of OOM-handling? It'd be nice if GLib could be used; it would make some things quite a bit easier. Best wishes, Dirk. -- Dirk-Jan C. Binnema Helsinki, Finland e:djcb@djcbsoftware.nl w:www.djcbsoftware.nl pgp: D09C E664 897D 7D39 5047 A178 E96A C7A1 017D DA3C ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: interesting project! 2009-11-22 12:23 ` Dirk-Jan C. Binnema @ 2009-11-22 22:52 ` Carl Worth 2009-11-23 7:08 ` Dirk-Jan C. Binnema 0 siblings, 1 reply; 8+ messages in thread From: Carl Worth @ 2009-11-22 22:52 UTC (permalink / raw) To: djcb; +Cc: notmuch@notmuchmail.org On Sun, 22 Nov 2009 14:23:10 +0200, Dirk-Jan C. Binnema <djcb.bulk@gmail.com> wrote: > A small question: it seems that notmuch is avoiding the use of GLib directly > (of course, it depend on it anyway through GMime); is this because of > OOM-handling? It'd be nice if GLib could be used, it would make some things > quite a bit easier. It's true that I don't like the OOM handling in glib. I also think that glib tries to be too many different things at the same time. And finally, having some talloc-friendly data structures (like a hash-table) would be really nice. In the meantime, as you say, we're already linking with glib because of GMime, so there's really no reason not to call functions that are there and that do what we want. What kinds of things were you thinking of that would be easier with glib? -Carl ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: interesting project! 2009-11-22 22:52 ` Carl Worth @ 2009-11-23 7:08 ` Dirk-Jan C. Binnema 2009-11-24 2:57 ` Carl Worth 0 siblings, 1 reply; 8+ messages in thread From: Dirk-Jan C. Binnema @ 2009-11-23 7:08 UTC (permalink / raw) To: Carl Worth; +Cc: notmuch@notmuchmail.org Hi Carl, >>>>> "Carl" == Carl Worth <cworth@cworth.org> writes: Carl> On Sun, 22 Nov 2009 14:23:10 +0200, Dirk-Jan C. Binnema Carl> <djcb.bulk@gmail.com> wrote: >> A small question: it seems that notmuch is avoiding the use of GLib directly >> (of course, it depend on it anyway through GMime); is this because of >> OOM-handling? It'd be nice if GLib could be used, it would make some things >> quite a bit easier. Carl> It's true that I don't like the OOM handling in glib. I also think that Carl> glib tries to be too many different things at the same time. And Carl> finally, having some talloc-friendly data structures (like a hash-table) Carl> would be really nice. Well, the counterpoint to the OOM problems is that in many programs, the 'malloc returns NULL' case is often not very well tested (because it's rather hard to test), and that at least on Linux, it's unlikely that malloc ever does return NULL. Lennart Poettering wrote this up in some more detail[1]. Of course, the requirements for notmuch may be a bit different and I definitely don't want to suggest any radical change here after only finding out about notmuch a few days ago :) (BTW, there is a hash table implementation in libc (hcreate(3) etc.). Is that one not sufficiently 'talloc-friendly'? It's not very user-friendly, but that's another matter.) Carl> In the meantime, as you say, we're already linking with glib because of Carl> GMime, so there's really no reason not to call functions that are there Carl> and that do what we want. What kinds of things were you thinking of that Carl> would be easier with glib? I could imagine the string functions could replace the ones in talloc. 
There are many more string functions, e.g., for handling file names / paths, which are quite useful. Then there are wrappers for gcc'isms (G_UNLIKELY etc.) that would make the ones in notmuch unneeded, and a lot of compatibility things like G_DIR_SEPARATOR. And the datastructures (GSlice/GList/GHashtable) are nice. The UTF8 functionality might come in handy. Anyway, I was just curious, people have survived without GLib before, and if you dislike the OOM-strategy, it's a bit of a no-no of course. Best wishes, Dirk. [1] http://article.gmane.org/gmane.comp.audio.jackit/19998 -- Dirk-Jan C. Binnema Helsinki, Finland e:djcb@djcbsoftware.nl w:www.djcbsoftware.nl pgp: D09C E664 897D 7D39 5047 A178 E96A C7A1 017D DA3C ^ permalink raw reply [flat|nested] 8+ messages in thread
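[Editor's note] For readers unfamiliar with the libc hash table Dirk mentions, here is a minimal sketch of using hcreate(3)/hsearch(3) as a set of tag names. The helpers set_add and set_contains are hypothetical names for this illustration. Note what the sketch exposes about the interface: there is a single global table of fixed size, and the table stores the key pointer rather than a copy, so the caller keeps ownership of the strings. Because hsearch returns a pointer to the entry itself, a key stored with NULL data is still distinguishable from a key that was never inserted.

```c
#include <assert.h>
#include <search.h>
#include <stddef.h>

/* Hypothetical helper: add 'key' to the global hsearch table.
 * Returns 1 if newly added, 0 if already present, -1 if the
 * table is full or memory ran out. The key string must outlive
 * the table, because only the pointer is stored. */
int set_add(const char *key)
{
    ENTRY probe = { .key = (char *) key, .data = NULL };

    assert(key != NULL);
    if (hsearch(probe, FIND) != NULL)
        return 0;
    return hsearch(probe, ENTER) != NULL ? 1 : -1;
}

/* Hypothetical helper: non-zero if 'key' is in the table. */
int set_contains(const char *key)
{
    ENTRY probe = { .key = (char *) key, .data = NULL };

    assert(key != NULL);
    return hsearch(probe, FIND) != NULL;
}
```

The table has to be created first with hcreate(n), where n is the maximum number of entries, and released with hdestroy(); neither call frees the keys.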
* Re: interesting project! 2009-11-23 7:08 ` Dirk-Jan C. Binnema @ 2009-11-24 2:57 ` Carl Worth 2009-11-24 14:16 ` Dirk-Jan C. Binnema 0 siblings, 1 reply; 8+ messages in thread From: Carl Worth @ 2009-11-24 2:57 UTC (permalink / raw) To: djcb; +Cc: notmuch@notmuchmail.org On Mon, 23 Nov 2009 09:08:34 +0200, Dirk-Jan C. Binnema <djcb.bulk@gmail.com> wrote: > Well, the counter point to the OOM-problems is that is that in many programs, > the 'malloc returns NULL'-case is often not very well tested (because it's > rather hard to test), and that at least on Linux, it's unlikely that malloc > ever does return NULL. Lennart Poettering wrote this up in some more > detail[1]. Of course, the requirements for notmuch may be a bit different and > I definitely don't want to suggest any radical change here after only finding > out about notmuch a few days ago :) No problem. I'm glad to discuss things. That's how I learn and find out whether my decisions are sound or not. :-) I agree that trying to support OOM doesn't make sense without testing. But that's why I want to test notmuch with memory-fault injection. We've been doing this with the cairo library with good success for a while. As for "unlikely that malloc ever returns NULL", that's simply a system-configuration away (just turn off overcommit). And I can imagine notmuch being used in lots of places, (netbooks, web servers, etc.), so I do want to make it as robust as possible. > (BTW, there is a hashtable implementation in libc, (hcreate(3) etc.). Is that > one not sufficiently 'talloc-friendly'? It's not very user-friendly, but > that's another matter) Thanks for mentioning the hash table. The hash table is one of the few things that I *am* using from glib right now in notmuch. It's got a couple of bizarre things about it: 1. The simpler-appearing g_hash_table_new function is useless for common cases like hashing strings. It will just leak memory. So g_hash_table_new_full is the only one worth using. 2. 
There are two lookup functions, g_hash_table_lookup, and g_hash_table_lookup_extended. And a program like notmuch really does use the hash table in two ways. In the simpler case, we're using the hash to simply implement a set, (such as avoiding duplicates in a set of tags). In the more complex case, we're associating actual objects with the keys, (such as when linking messages together into a tree for the thread). So, it might make sense if a hash-table interface supported these two modes well. What's bizarre about GHashTable though, is that in the "just a set" case, we only use NULL as the value when inserting. And distinguishing "previously inserted with NULL" from "never inserted" is the one thing that g_hash_table_lookup can't do. So I've only found that I could ever use g_hash_table_lookup_extended, (and pass a pair of NULLs for the return arguments I don't need). Fortunately, Eric Anholt spent *his* flight home coding up a nice implementation of an open-addressed hash designed specifically to be a tiny little implementation suitable for copying directly into a project. He's testing it with Mesa now, and I might pull it into notmuch later. > I could imagine the string functions could replace the ones in talloc. There > are many more string functions, e.g., for handling file names / paths, which > are quite useful. Then there are wrappers for gcc'isms (G_UNLIKELY etc.) that > would make the ones in notmuch unneeded, and a lot of compatibility things > like G_DIR_SEPARATOR. And the datastructures (GSlice/GList/GHashtable) are > nice. The UTF8 functionality might come in handy. Yes. The portability stuff I think is actually interesting. I've thought it really might make sense to have something that gave you *just* that, (without a main loop, an object system, several memory allocators or pieces for making your own memory allocators, etc). I haven't had a chance to look into gnulib yet, but I'd like to. 
As for a list, I almost always find it cleaner to be able to just have my own list data structures, (to avoid casts, etc.). And for a hash table, I'm interested in what Eric's doing. I'm really not prejudiced against using code that's already been written, (in spite of how it might appear, I don't feel the need to re-solve every problem that's already been solved). But I have long thought that we could have better support for a "C programmer's toolkit" of commonly needed things than we have had before. I definitely like the idea of having tiny, focused libraries that do one thing and do it well, (and maybe even some things so tiny that they are actually designed to be copied into the application---like with gnulib and with Eric's new hash table). > Anyway, I was just curious, people have survived without GLib before, and if > you dislike the OOM-strategy, it's a bit of a no-no of course. Thanks for understanding. :-) And I enjoy the conversation, -Carl ^ permalink raw reply [flat|nested] 8+ messages in thread
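[Editor's note] The memory-fault injection Carl refers to can be sketched roughly as follows. This is not how cairo or notmuch implement it; xmalloc_inject, dup_pair and run_with_injection are hypothetical names, and a real setup would more likely intercept malloc itself. The idea is to fail the first allocation, then the second, and so on, checking each time that the code under test fails cleanly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static long fail_at = -1;     /* index of the allocation to fail; -1 disables */
static long alloc_count = 0;  /* allocations seen in the current run */

/* Allocation wrapper that returns NULL for exactly one chosen call. */
void *xmalloc_inject(size_t size)
{
    if (fail_at >= 0 && alloc_count++ == fail_at)
        return NULL;
    return malloc(size);
}

/* Hypothetical code under test: copies two strings and leaves
 * nothing allocated if either allocation fails. */
int dup_pair(const char *a, const char *b, char **out_a, char **out_b)
{
    char *pa = xmalloc_inject(strlen(a) + 1);
    if (pa == NULL)
        return -1;
    char *pb = xmalloc_inject(strlen(b) + 1);
    if (pb == NULL) {
        free(pa);
        return -1;
    }
    strcpy(pa, a);
    strcpy(pb, b);
    *out_a = pa;
    *out_b = pb;
    return 0;
}

/* Fail allocation 0, then 1, ... until one run completes.
 * Returns the number of runs in which a failure was injected. */
int run_with_injection(void)
{
    int injected = 0;

    for (long n = 0; n < 100; n++) {
        char *a = NULL, *b = NULL;

        fail_at = n;
        alloc_count = 0;
        if (dup_pair("inbox", "unread", &a, &b) == 0) {
            free(a);
            free(b);
            break;
        }
        /* On failure the outputs must be untouched. */
        assert(a == NULL && b == NULL);
        injected++;
    }
    fail_at = -1;
    return injected;
}
```

Running such a loop under a leak checker would additionally show whether each failure path frees what it had already allocated.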
* Re: interesting project! 2009-11-24 2:57 ` Carl Worth @ 2009-11-24 14:16 ` Dirk-Jan C. Binnema 0 siblings, 0 replies; 8+ messages in thread From: Dirk-Jan C. Binnema @ 2009-11-24 14:16 UTC (permalink / raw) To: Carl Worth; +Cc: notmuch@notmuchmail.org Hi Carl, >>>>> "Carl" == Carl Worth <cworth@cworth.org> writes: Carl> I agree that trying to support OOM doesn't make sense without Carl> testing. But that's why I want to test notmuch with memory-fault Carl> injection. We've been doing this with the cairo library with good Carl> success for a while. Carl> As for "unlikely that malloc ever returns NULL", that's simply a Carl> system-configuration away (just turn off overcommit). And I can imagine Carl> notmuch being used in lots of places, (netbooks, web servers, etc.), so Carl> I do want to make it as robust as possible. That is a very laudable goal! But it's also quite hard to achieve, considering that both GMime and Xapian may have some different ideas about that. And at least in the current code, I see fprintfs in 'malloc-returns-NULL'-cases -- but fprintf itself will probably allocate memory too. Also, at least now, the bad alloc exceptions for C++ are not caught. Of course, that can be changed, but it's just to show that these things are hard to get right. Carl> Thanks for mentioning the hash table. The hash table is one of the few Carl> things that I *am* using from glib right now in notmuch. It's got a Carl> couple of bizarre things about it: Carl> 1. The simpler-appearing g_hash_table_new function is useless Carl> for common cases like hashing strings. It will just leak Carl> memory. So g_hash_table_new_full is the only one worth using. Hmmm, I never noticed that behavior. If you are using dynamically allocated strings, GHashTable won't free them for you -- but I can't really see how it could (given that it takes generic pointers), so you have to free those yourself. But any memleaks beyond that? Carl> 2. 
There are two lookup functions, g_hash_table_lookup, and Carl> g_hash_table_lookup_extended. Carl> So, it might make sense if a hash-table interface supported Carl> these two modes well. What's bizarre about GHashTable though, Carl> is that in the "just a set" case, we only use NULL as the Carl> value when inserting. And distinguish "previously inserted Carl> with NULL" from "never inserted" is the one thing that Carl> g_hash_table_lookup can't do. So I've only found that I could Carl> ever use g_hash_table_lookup_extended, (and pass a pair of Carl> NULLs for the return arguments I don't need). Hmm, well, I found that returning NULL for 'not set' works in many cases, and it keeps those cases quite simple. If you need to distinguish between NULL and 'not set', you can use either the _extended version as you mention, or use some special NOT_SET static ptr you can compare with (and handle it appropriately in the destructor). Carl> I definitely like the idea of having tiny, focused libraries that do Carl> one thing and do it well, (and maybe even some things so tiny that Carl> they are actually designed to be copied into the application---like Carl> with gnulib and with Eric's new hash table). Ok; glib fills the role pretty well for me, and I don't really pay for the parts that I don't use. But tastes differ, no problem ;-) Carl> Thanks for understanding. :-) Carl> And I enjoy the conversation, Same here :) Best wishes, Dirk. -- Dirk-Jan C. Binnema Helsinki, Finland e:djcb@djcbsoftware.nl w:www.djcbsoftware.nl pgp: D09C E664 897D 7D39 5047 A178 E96A C7A1 017D DA3C ^ permalink raw reply [flat|nested] 8+ messages in thread
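[Editor's note] On Dirk's point that fprintf in a 'malloc-returns-NULL' path may itself need memory: one common way to sidestep the question is to report the error with write(2) and a static message, which involves no stdio buffering or formatting. Whether fprintf actually allocates depends on the C library and the stream's buffering, so treat this as a sketch under that assumption; report_oom and the message text are made up for the illustration.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: report an out-of-memory condition on 'fd'
 * without using stdio. Returns the number of bytes written, or -1
 * on error, exactly as write(2) does. */
ssize_t report_oom(int fd)
{
    /* Static storage: nothing is allocated or formatted at run time. */
    static const char msg[] = "notmuch: out of memory\n";

    assert(fd >= 0);
    return write(fd, msg, sizeof msg - 1);
}
```

A caller in an allocation-failure path would use report_oom(STDERR_FILENO) and then unwind or exit.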