* Memory management issue in notmuch-haskell bindings
@ 2011-08-16 16:32 Ben Gamari
2011-08-28 21:27 ` Bug in GC's ordering of ForeignPtr finalization? Ben Gamari
0 siblings, 1 reply; 16+ messages in thread
From: Ben Gamari @ 2011-08-16 16:32 UTC (permalink / raw)
To: Bart Massey, haskell-cafe, notmuch
It seems that the notmuch-haskell bindings (version 0.2.2 built against
notmuch from git master; passes notmuch-test) aren't dealing with memory
management properly. In particular, the attached test code[1] causes
talloc to abort. Unfortunately, while the issue is consistently
reproducible, it only occurs with some queries (see source[1]). I have
been unable to establish the exact criterion for failure.
It seems that the crash is caused by an invalid access to a freed Query
object while freeing a Messages object (see Valgrind trace[3]). I've
taken a brief look at the bindings themselves but, being only minimally
familiar with the FFI, there's nothing obviously wrong (the finalizers
passed to newForeignPtr look sane). I was under the impression that
talloc was reference counted, so the Query object shouldn't have been
freed unless if there was still a Messages object holding a
reference. Any idea what might have gone wrong here? Thanks!
Cheers,
- Ben
[1] Test case,
import Data.List
import Control.Monad
import System.Environment
import Foreign.Notmuch
dbpath = "/home/ben/.mail"
getAddresses :: Database -> String -> IO [String]
getAddresses db q = do
query <- queryCreate db q
msgs <- queryMessages query
addrs <- mapM (flip messageGetHeader $ "From") msgs
return addrs
main = do
db <- databaseOpen dbpath DatabaseModeReadOnly
--addrs2 <- getAddresses db "tag:haskell" -- This succeeds
addrs3 <- getAddresses db "to:dietz" -- This fails
--print addrs2
--print addrs3
databaseClose db
[2] Crashed session and backtrace,
[1217 ben@ben-laptop ~] $ ghc test.hs -auto-all -rtsopts -prof && gdb ./test
GNU gdb (Ubuntu/Linaro 7.2-1ubuntu11) 7.2
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/ben/test...(no debugging symbols found)...done.
(gdb) run
Starting program: /home/ben/test
[Thread debugging using libthread_db enabled]
Program received signal SIGABRT, Aborted.
0x00007ffff5979d05 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
in ../nptl/sysdeps/unix/sysv/linux/raise.c
(gdb) bt
#0 0x00007ffff5979d05 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007ffff597dab6 in abort () at abort.c:92
#2 0x00007ffff6de038c in talloc_abort (reason=0x7ffff6de56e8 "Bad talloc magic value - access after free") at ../talloc.c:210
#3 0x00007ffff6de0271 in talloc_abort_access_after_free (ptr=0x769190, location=0x7ffff7bd4e04 "lib/messages.c:142") at ../talloc.c:229
#4 talloc_chunk_from_ptr (ptr=0x769190, location=0x7ffff7bd4e04 "lib/messages.c:142") at ../talloc.c:250
#5 _talloc_free (ptr=0x769190, location=0x7ffff7bd4e04 "lib/messages.c:142") at ../talloc.c:1164
#6 0x00007ffff7bc7e65 in notmuch_messages_destroy (messages=0x769190) at lib/messages.c:142
#7 0x00000000004de1c9 in scheduleFinalizers ()
#8 0x00000000004e013d in GarbageCollect ()
#9 0x00000000004d9e40 in scheduleDoGC.clone.18 ()
#10 0x00000000004db0e0 in exitScheduler ()
#11 0x00000000004d9066 in hs_exit_ ()
#12 0x00000000004d940a in shutdownHaskellAndExit ()
#13 0x00000000004d8a91 in real_main ()
#14 0x00000000004d8ade in hs_main ()
#15 0x00007ffff5964eff in __libc_start_main (main=0x408ed0 <main>, argc=1, ubp_av=0x7fffffffe4f8, init=<value optimized out>, fini=<value optimized out>,
rtld_fini=<value optimized out>, stack_end=0x7fffffffe4e8) at libc-start.c:226
#16 0x0000000000407791 in _start ()
(gdb)
[3] Valgrind output,
==25241== Memcheck, a memory error detector
==25241== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==25241== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==25241== Command: ./test
==25241==
==25241== Conditional jump or move depends on uninitialised value(s)
==25241== at 0x52BB510: inflateReset2 (in /lib/x86_64-linux-gnu/libz.so.1.2.3.4)
==25241== by 0x52BB605: inflateInit2_ (in /lib/x86_64-linux-gnu/libz.so.1.2.3.4)
==25241== by 0x5F211BE: ChertTable::lazy_alloc_inflate_zstream() const (chert_table.cc:1672)
==25241== by 0x5F23B06: ChertTable::read_tag(Cursor*, std::string*, bool) const (chert_table.cc:1264)
==25241== by 0x5F260F9: ChertTable::get_exact_entry(std::string const&, std::string&) const (chert_table.cc:1210)
==25241== by 0x5F26DE2: ChertTermList::ChertTermList(Xapian::Internal::RefCntPtr<ChertDatabase const>, unsigned int) (chert_termlist.cc:44)
==25241== by 0x5EFF2E5: ChertDatabase::open_term_list(unsigned int) const (chert_database.cc:891)
==25241== by 0x5E7E7FB: Xapian::Document::termlist_begin() const (omdocument.cc:176)
==25241== by 0x4E41822: _notmuch_message_ensure_metadata(_notmuch_message*) (message.cc:309)
==25241== by 0x4E429DC: _notmuch_message_ensure_filename_list(_notmuch_message*) (message.cc:632)
==25241== by 0x4E42C37: notmuch_message_get_filename (message.cc:698)
==25241== by 0x4E41CA7: _notmuch_message_ensure_message_file(_notmuch_message*) (message.cc:403)
==25241==
==25241== Invalid read of size 4
==25241== at 0x5C25DED: _talloc_free (talloc.c:242)
==25241== by 0x4E39E64: notmuch_messages_destroy (messages.c:142)
==25241== by 0x4DE1C8: scheduleFinalizers (in /home/ben/test)
==25241== by 0x4E013C: GarbageCollect (in /home/ben/test)
==25241== by 0x4D9E3F: scheduleDoGC.clone.18 (in /home/ben/test)
==25241== by 0x4DB0DF: exitScheduler (in /home/ben/test)
==25241== by 0x4D9065: hs_exit_ (in /home/ben/test)
==25241== by 0x4D9409: shutdownHaskellAndExit (in /home/ben/test)
==25241== by 0x4D8A90: real_main (in /home/ben/test)
==25241== by 0x4D8ADD: hs_main (in /home/ben/test)
==25241== by 0x6F4FEFE: (below main) (libc-start.c:226)
==25241== Address 0x8324e90 is 64 bytes inside a block of size 136 free'd
==25241== at 0x4C282ED: free (vg_replace_malloc.c:366)
==25241== by 0x5C29884: _talloc_free_internal (talloc.c:714)
==25241== by 0x5C25F22: _talloc_free (talloc.c:667)
==25241== by 0x4E452BF: notmuch_query_destroy (query.cc:358)
==25241== by 0x4DE1C8: scheduleFinalizers (in /home/ben/test)
==25241== by 0x4E013C: GarbageCollect (in /home/ben/test)
==25241== by 0x4D9E3F: scheduleDoGC.clone.18 (in /home/ben/test)
==25241== by 0x4DB0DF: exitScheduler (in /home/ben/test)
==25241== by 0x4D9065: hs_exit_ (in /home/ben/test)
==25241== by 0x4D9409: shutdownHaskellAndExit (in /home/ben/test)
==25241== by 0x4D8A90: real_main (in /home/ben/test)
==25241== by 0x4D8ADD: hs_main (in /home/ben/test)
==25241==
==25241== Invalid read of size 8
==25241== at 0x5C26251: _talloc_free (talloc.c:249)
==25241== by 0x4E39E64: notmuch_messages_destroy (messages.c:142)
==25241== by 0x4DE1C8: scheduleFinalizers (in /home/ben/test)
==25241== by 0x4E013C: GarbageCollect (in /home/ben/test)
==25241== by 0x4D9E3F: scheduleDoGC.clone.18 (in /home/ben/test)
==25241== by 0x4DB0DF: exitScheduler (in /home/ben/test)
==25241== by 0x4D9065: hs_exit_ (in /home/ben/test)
==25241== by 0x4D9409: shutdownHaskellAndExit (in /home/ben/test)
==25241== by 0x4D8A90: real_main (in /home/ben/test)
==25241== by 0x4D8ADD: hs_main (in /home/ben/test)
==25241== by 0x6F4FEFE: (below main) (libc-start.c:226)
==25241== Address 0x8324e80 is 48 bytes inside a block of size 136 free'd
==25241== at 0x4C282ED: free (vg_replace_malloc.c:366)
==25241== by 0x5C29884: _talloc_free_internal (talloc.c:714)
==25241== by 0x5C25F22: _talloc_free (talloc.c:667)
==25241== by 0x4E452BF: notmuch_query_destroy (query.cc:358)
==25241== by 0x4DE1C8: scheduleFinalizers (in /home/ben/test)
==25241== by 0x4E013C: GarbageCollect (in /home/ben/test)
==25241== by 0x4D9E3F: scheduleDoGC.clone.18 (in /home/ben/test)
==25241== by 0x4DB0DF: exitScheduler (in /home/ben/test)
==25241== by 0x4D9065: hs_exit_ (in /home/ben/test)
==25241== by 0x4D9409: shutdownHaskellAndExit (in /home/ben/test)
==25241== by 0x4D8A90: real_main (in /home/ben/test)
==25241== by 0x4D8ADD: hs_main (in /home/ben/test)
==25241==
==25241==
==25241== HEAP SUMMARY:
==25241== in use at exit: 192,328 bytes in 555 blocks
==25241== total heap usage: 19,852 allocs, 19,297 frees, 35,375,726 bytes allocated
==25241==
==25241== LEAK SUMMARY:
==25241== definitely lost: 0 bytes in 0 blocks
==25241== indirectly lost: 0 bytes in 0 blocks
==25241== possibly lost: 27,799 bytes in 292 blocks
==25241== still reachable: 164,529 bytes in 263 blocks
==25241== suppressed: 0 bytes in 0 blocks
==25241== Rerun with --leak-check=full to see details of leaked memory
==25241==
==25241== For counts of detected and suppressed errors, rerun with: -v
==25241== Use --track-origins=yes to see where uninitialised values come from
==25241== ERROR SUMMARY: 4 errors from 3 contexts (suppressed: 4 from 4)
^ permalink raw reply [flat|nested] 16+ messages in thread
* Bug in GC's ordering of ForeignPtr finalization?
2011-08-16 16:32 Memory management issue in notmuch-haskell bindings Ben Gamari
@ 2011-08-28 21:27 ` Ben Gamari
2011-08-29 3:26 ` Antoine Latter
[not found] ` <20110829183010.GA2605@24f89f8c-e6a1-4e75-85ee-bb8a3743bb9f>
0 siblings, 2 replies; 16+ messages in thread
From: Ben Gamari @ 2011-08-28 21:27 UTC (permalink / raw)
To: Bart Massey, haskell-cafe, glasgow-haskell-users; +Cc: notmuch
On Tue, 16 Aug 2011 12:32:13 -0400, Ben Gamari <bgamari.foss@gmail.com> wrote:
> It seems that the notmuch-haskell bindings (version 0.2.2 built against
> notmuch from git master; passes notmuch-test) aren't dealing with memory
> management properly. In particular, the attached test code[1] causes
> talloc to abort. Unfortunately, while the issue is consistently
> reproducible, it only occurs with some queries (see source[1]). I have
> been unable to establish the exact criterion for failure.
>
> It seems that the crash is caused by an invalid access to a freed Query
> object while freeing a Messages object (see Valgrind trace[3]). I've
> taken a brief look at the bindings themselves but, being only minimally
> familiar with the FFI, there's nothing obviously wrong (the finalizers
> passed to newForeignPtr look sane). I was under the impression that
> talloc was reference counted, so the Query object shouldn't have been
> freed unless if there was still a Messages object holding a
> reference. Any idea what might have gone wrong here? Thanks!
>
After looking into this issue in a bit more depth, I'm even more
confused. In fact, I would not be surprised if I have stumbled into a
bug in the GC. It seems that the notmuch-haskell bindings follow the
example of the python bindings in that child objects keep references to
their parents to prevent the garbage collector from releasing the
parent, which would in turn cause talloc to free the child objects,
resulting in odd behavior when the child objects were next accessed. For
instance, the Query and Messages objects are defined as follows,
type MessagesPtr = ForeignPtr S__notmuch_messages
type MessagePtr = ForeignPtr S__notmuch_message
newtype Query = Query (ForeignPtr S__notmuch_query)
data MessagesRef = QueryMessages { qmpp :: Query, msp :: MessagesPtr }
| ThreadMessages { tmpp :: Thread, msp :: MessagesPtr }
| MessageMessages { mmspp :: Message, msp :: MessagesPtr }
data Message = MessagesMessage { msmpp :: MessagesRef, mp :: MessagePtr }
| Message { mp :: MessagePtr }
type Messages = [Message]
As seen in the Valgrind dump given in my previous message, it seems that
the Query object is being freed before the Messages object. Since the
Messages object is a child of the Query object, this fails.
In my case, I'm calling queryMessages which begins by issuing a given
notmuch Query, resulting in a MessagesPtr. This is then packaged into a
QueryMessages object which is then passed off to
unpackMessages. unpackMessages iterates over this collection, creating
MessagesMessage objects which themselves refer to the QueryMessages
object. Finally, these MessagesMessage objects are packed into a list,
resulting in a Messages object. Thus we have the following chain of
references,
MessagesMessage
|
| msmpp
\/
QueryMessages
|
| qmpp
\/
Query
As we can see, each MessagesMessage object in the Messages list
resulting from queryMessages holds a reference to the Query object from
which it originated. For this reason, I fail to see how it is possible
that the RTS would attempt to free the Query before freeing the
MessagesPtr. Did I miss something in my analysis? Are there tools for
debugging issues such as this? Perhaps this is a bug in the GC?
Any help at all would be greatly appreciated.
Cheers,
- Ben
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Bug in GC's ordering of ForeignPtr finalization?
2011-08-28 21:27 ` Bug in GC's ordering of ForeignPtr finalization? Ben Gamari
@ 2011-08-29 3:26 ` Antoine Latter
2011-08-29 3:47 ` Ben Gamari
[not found] ` <20110829183010.GA2605@24f89f8c-e6a1-4e75-85ee-bb8a3743bb9f>
1 sibling, 1 reply; 16+ messages in thread
From: Antoine Latter @ 2011-08-29 3:26 UTC (permalink / raw)
To: Ben Gamari; +Cc: Bart Massey, glasgow-haskell-users, notmuch, haskell-cafe
On Sun, Aug 28, 2011 at 4:27 PM, Ben Gamari <bgamari.foss@gmail.com> wrote:
> On Tue, 16 Aug 2011 12:32:13 -0400, Ben Gamari <bgamari.foss@gmail.com> wrote:
>> It seems that the notmuch-haskell bindings (version 0.2.2 built against
>> notmuch from git master; passes notmuch-test) aren't dealing with memory
>> management properly. In particular, the attached test code[1] causes
>> talloc to abort. Unfortunately, while the issue is consistently
>> reproducible, it only occurs with some queries (see source[1]). I have
>> been unable to establish the exact criterion for failure.
>>
>> It seems that the crash is caused by an invalid access to a freed Query
>> object while freeing a Messages object (see Valgrind trace[3]). I've
>> taken a brief look at the bindings themselves but, being only minimally
>> familiar with the FFI, there's nothing obviously wrong (the finalizers
>> passed to newForeignPtr look sane). I was under the impression that
>> talloc was reference counted, so the Query object shouldn't have been
>> freed unless if there was still a Messages object holding a
>> reference. Any idea what might have gone wrong here? Thanks!
>>
> After looking into this issue in a bit more depth, I'm even more
> confused. In fact, I would not be surprised if I have stumbled into a
> bug in the GC. It seems that the notmuch-haskell bindings follow the
> example of the python bindings in that child objects keep references to
> their parents to prevent the garbage collector from releasing the
> parent, which would in turn cause talloc to free the child objects,
> resulting in odd behavior when the child objects were next accessed. For
> instance, the Query and Messages objects are defined as follows,
>
> type MessagesPtr = ForeignPtr S__notmuch_messages
> type MessagePtr = ForeignPtr S__notmuch_message
> newtype Query = Query (ForeignPtr S__notmuch_query)
> data MessagesRef = QueryMessages { qmpp :: Query, msp :: MessagesPtr }
> | ThreadMessages { tmpp :: Thread, msp :: MessagesPtr }
> | MessageMessages { mmspp :: Message, msp :: MessagesPtr }
> data Message = MessagesMessage { msmpp :: MessagesRef, mp :: MessagePtr }
> | Message { mp :: MessagePtr }
> type Messages = [Message]
>
One problem you might be running in to is that the optimization passes
can notice that a function isn't using all of its arguments, and then
it won't pass them. These even applies if the arguments are bound
together in a record type.
So if you have a record type:
> data QueryResult = QR {qrQueryPtr :: ForeignPtr (), qrResultPointer :: Ptr ()}
and a function:
> processQueryResult :: QueryResult -> IO (...)
If the function doesn't use the 'qrQueryPointer' part of the record,
the compiler may not even pass it in. This might run the finalizer for
the foreign pointer earlier than you expect. If the result pointer is
a part of the query foreign pointer, you're in trouble.
I'm not sure if this is what's happening, but it sounds like it could be.
If this is the case you might want to build some helper functions
using the function 'touchForeignPtr', which does nothing other than
make it look like the foreign pointer is still in use. In my example
it might be something like:
> withQueryResultPtr :: QueryResult -> (Ptr QueryResult -> IO a) -> IO a
> withQueryResultPtr qr k = do
> x <- k (qrQueryPtr qr)
> touchForeignPtr (qrResultPointer qr)
> return x
Antoine
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Bug in GC's ordering of ForeignPtr finalization?
2011-08-29 3:26 ` Antoine Latter
@ 2011-08-29 3:47 ` Ben Gamari
2011-08-29 4:03 ` Antoine Latter
0 siblings, 1 reply; 16+ messages in thread
From: Ben Gamari @ 2011-08-29 3:47 UTC (permalink / raw)
To: Antoine Latter; +Cc: Bart Massey, glasgow-haskell-users, notmuch, haskell-cafe
On Sun, 28 Aug 2011 22:26:05 -0500, Antoine Latter <aslatter@gmail.com> wrote:
> One problem you might be running in to is that the optimization passes
> can notice that a function isn't using all of its arguments, and then
> it won't pass them. These even applies if the arguments are bound
> together in a record type.
>
In this case I wouldn't be able to reproduce the problem with
optimization disabled, no? Unfortunately, this is not the case; the
problem persists even with -O0.
- Ben
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Bug in GC's ordering of ForeignPtr finalization?
2011-08-29 3:47 ` Ben Gamari
@ 2011-08-29 4:03 ` Antoine Latter
0 siblings, 0 replies; 16+ messages in thread
From: Antoine Latter @ 2011-08-29 4:03 UTC (permalink / raw)
To: Ben Gamari; +Cc: Bart Massey, glasgow-haskell-users, notmuch, haskell-cafe
On Sun, Aug 28, 2011 at 10:47 PM, Ben Gamari <bgamari.foss@gmail.com> wrote:
> On Sun, 28 Aug 2011 22:26:05 -0500, Antoine Latter <aslatter@gmail.com> wrote:
>> One problem you might be running in to is that the optimization passes
>> can notice that a function isn't using all of its arguments, and then
>> it won't pass them. These even applies if the arguments are bound
>> together in a record type.
>>
> In this case I wouldn't be able to reproduce the problem with
> optimization disabled, no? Unfortunately, this is not the case; the
> problem persists even with -O0.
>
Perhaps? I don't know the details about how the GC decides when
something is reachable. The scenario I described (which sounds similar
to yours?) is only safe in Haskell when using functions like
touchForeignPtr.
Antoine
^ permalink raw reply [flat|nested] 16+ messages in thread
* Memory management practices
[not found] ` <20110829183010.GA2605@24f89f8c-e6a1-4e75-85ee-bb8a3743bb9f>
@ 2011-08-29 20:30 ` Ben Gamari
2011-09-07 20:36 ` Ben Gamari
0 siblings, 1 reply; 16+ messages in thread
From: Ben Gamari @ 2011-08-29 20:30 UTC (permalink / raw)
To: notmuch; +Cc: Bertram Felgenhauer, Bart Massey
Hey all,
Over the last few weeks I've been trying to fix some brokeness in the
notmuch-haskell bindings with respect to memory management.
In discussion included below, I describe the issue as I approached it.
In short, it appears that GHC's garbage collector is quite liberal with
the order in which it frees resources (which is apparently permitted by
Haskell's FFI specification), allowing, for instance, a Messages object to be
freed before a Query object despite my attempts to hold the proper references in
the Haskell wrapper objects to keep the Query reachable.
In general, it seems to me that memory management in notmuch bindings is
a little bit harder than it needs to me due to the decision not to
talloc_ref parent objects when a new child object is created. This means
that a bindings author needs to recreate the ownership tree in their
binding, a task which is fairly easily done (except in the case of
Haskell due to the weak GC finalization guarantees) but seems quite
unnecessary. Is there a reason this decision was made? Would a patch be
accepted adding talloc_ref'ing parents in those functions creating
children and talloc_frees in *_destroys?
Cheers,
- Ben
On Mon, 29 Aug 2011 20:30:10 +0200, Bertram Felgenhauer <bertram.felgenhauer@googlemail.com> wrote:
> Dear Ben,
>
> Ben Gamari wrote:
> > After looking into this issue in a bit more depth, I'm even more
> > confused. In fact, I would not be surprised if I have stumbled into a
> > bug in the GC.
> [...]
> > MessagesMessage
> > |
> > | msmpp
> > \/
> > QueryMessages
> > |
> > | qmpp
> > \/
> > Query
> >
> > As we can see, each MessagesMessage object in the Messages list
> > resulting from queryMessages holds a reference to the Query object from
> > which it originated. For this reason, I fail to see how it is possible
> > that the RTS would attempt to free the Query before freeing the
> > MessagesPtr.
>
> When a garbage collection is performed, the RTS determines which heap
> objects are still reachable. The rest is then freed _simultaneously_,
> and the corresponding finalizers are run in some random order.
>
> So assuming the application holds a reference to the MessagesMessage
> object for a while and then drops it, the GC will detect unreachability
> of all the three objects at the same time and in the end, the finalizer
> for MessagesMessage may be run before that of Query.
>
> So I think this is not a bug.
>
> To solve this problem properly, libnotmuch should stop imposing order
> constraints on when objects are freed - this would mean tracking
> references using talloc_ref and talloc_unlink instead of
> talloc_free inside the library.
>
> For a bindings author who does not want to touch the library, the best
> idea I have is to add a layer with the sole purpose of tracking those
> implicit references.
>
> Best regards,
>
> Bertram
>
> _______________________________________________
> Glasgow-haskell-users mailing list
> Glasgow-haskell-users@haskell.org
> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Memory management practices
2011-08-29 20:30 ` Memory management practices Ben Gamari
@ 2011-09-07 20:36 ` Ben Gamari
2011-09-08 2:48 ` Austin Clements
0 siblings, 1 reply; 16+ messages in thread
From: Ben Gamari @ 2011-09-07 20:36 UTC (permalink / raw)
To: notmuch, Carl Worth; +Cc: Bertram Felgenhauer, Bart Massey
On Mon, 29 Aug 2011 16:30:57 -0400, Ben Gamari <bgamari.foss@gmail.com> wrote:
> [SNIP]
>
> In general, it seems to me that memory management in notmuch bindings is
> a little bit harder than it needs to me due to the decision not to
> talloc_ref parent objects when a new child object is created. This means
> that a bindings author needs to recreate the ownership tree in their
> binding, a task which is fairly easily done (except in the case of
> Haskell due to the weak GC finalization guarantees) but seems quite
> unnecessary. Is there a reason this decision was made? Would a patch be
> accepted adding talloc_ref'ing parents in those functions creating
> children and talloc_frees in *_destroys?
>
Any opinions concerning whether this is an acceptable idea? I wouldn't
mind putting together a patch-set, but I'd rather not waste my time if
the set would ultimately be rejected due to some technical objection I
have yet to think of.
Cheers,
- Ben
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Memory management practices
2011-09-07 20:36 ` Ben Gamari
@ 2011-09-08 2:48 ` Austin Clements
2011-09-08 3:05 ` Austin Clements
0 siblings, 1 reply; 16+ messages in thread
From: Austin Clements @ 2011-09-08 2:48 UTC (permalink / raw)
To: Ben Gamari; +Cc: Bertram Felgenhauer, notmuch, Bart Massey
On Wed, Sep 7, 2011 at 4:36 PM, Ben Gamari <bgamari.foss@gmail.com> wrote:
> On Mon, 29 Aug 2011 16:30:57 -0400, Ben Gamari <bgamari.foss@gmail.com> wrote:
>> [SNIP]
>>
>> In general, it seems to me that memory management in notmuch bindings is
>> a little bit harder than it needs to me due to the decision not to
>> talloc_ref parent objects when a new child object is created. This means
>> that a bindings author needs to recreate the ownership tree in their
>> binding, a task which is fairly easily done (except in the case of
>> Haskell due to the weak GC finalization guarantees) but seems quite
>> unnecessary. Is there a reason this decision was made? Would a patch be
>> accepted adding talloc_ref'ing parents in those functions creating
>> children and talloc_frees in *_destroys?
>>
> Any opinions concerning whether this is an acceptable idea? I wouldn't
> mind putting together a patch-set, but I'd rather not waste my time if
> the set would ultimately be rejected due to some technical objection I
> have yet to think of.
>
> Cheers,
I've been meaning to look in to this in depth. (I still haven't, but
wanted to give you some reply.)
In general (though perhaps not always?), libnotmuch uses talloc() to
allocate children objects, which already implicitly creates a talloc
reference from the parent object to the child object. You've
certainly thought about this harder than I have, but it seems like the
bindings should simply create an additional talloc reference and
unlink that reference in the GC finalizer, so that the library-created
references would maintain the integrity of the data structures, while
the binding-created references would maintain their extent. Hence, I
don't see why simultaneous GC would cause problems with talloc, or why
the bindings would have to recreate the reference tree.
I'm a bit confused by the reference tree you drew. The references in
the underlying libnotmuch objects are the other way around.
notmuch_query_t holds a talloc reference to every notmuch_messages_t
it produces, not the other way around. (Though, in reality, these
objects are completely independent of each other. This reference
exists purely as a convenience for C programmers to make it easy to
clean up all notmuch_messages_t objects when you destroy the
notmuch_query_t. This is probably a poor interface; it may be better
to take an explicit talloc context, which could be the query object,
or could be something else. In fact, I would expect this to cause
memory *leaks* in bindings if it were not handled carefully, rather
than premature GC.)
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Memory management practices
2011-09-08 2:48 ` Austin Clements
@ 2011-09-08 3:05 ` Austin Clements
2011-09-08 13:50 ` Sebastian Spaeth
0 siblings, 1 reply; 16+ messages in thread
From: Austin Clements @ 2011-09-08 3:05 UTC (permalink / raw)
To: Ben Gamari; +Cc: Bertram Felgenhauer, notmuch, Bart Massey
On Wed, Sep 7, 2011 at 10:48 PM, Austin Clements <amdragon@mit.edu> wrote:
> *snip*
>
> I'm a bit confused by the reference tree you drew. The references in
> the underlying libnotmuch objects are the other way around.
> notmuch_query_t holds a talloc reference to every notmuch_messages_t
> it produces, not the other way around.
Sorry, I went back and re-read your earlier messages and now I see why
your references were the way they were. I stand by the rest of my
previous message though. I think the technique used in the Python
bindings only works because Python's GC happens to finalize in a
particular order (though I doubt that's guaranteed, and could easily
not be the case if you stray into the realm of its cycle collector).
In general, it seems like approach is trying to recreate C-like memory
management and is fragile as a result, whereas talloc should, I think,
allow bindings to express their runtime's memory management rather
naturally.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Memory management practices
2011-09-08 3:05 ` Austin Clements
@ 2011-09-08 13:50 ` Sebastian Spaeth
2011-09-08 15:15 ` Austin Clements
0 siblings, 1 reply; 16+ messages in thread
From: Sebastian Spaeth @ 2011-09-08 13:50 UTC (permalink / raw)
To: Austin Clements, Ben Gamari; +Cc: Bertram Felgenhauer, notmuch, Bart Massey
[-- Attachment #1: Type: text/plain, Size: 1011 bytes --]
On Wed, 7 Sep 2011 23:05:19 -0400, Austin Clements <amdragon@mit.edu> wrote:
> Sorry, I went back and re-read your earlier messages and now I see why
> your references were the way they were. I stand by the rest of my
> previous message though. I think the technique used in the Python
> bindings only works because Python's GC happens to finalize in a
> particular order (though I doubt that's guaranteed, and could easily
> not be the case if you stray into the realm of its cycle collector).
> In general, it seems like approach is trying to recreate C-like memory
> management and is fragile as a result, whereas talloc should, I think,
> allow bindings to express their runtime's memory management rather
> naturally.
Mmmh? Why would the method in python be fragile? Each message object
holds a reference to its parent query object to keep it alive. Are you
saying cycle collectors could kill off the query object nonetheless?
(Assume that I know nothing of GCs which comes close to reality)
Sebastian
[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Memory management practices
2011-09-08 13:50 ` Sebastian Spaeth
@ 2011-09-08 15:15 ` Austin Clements
2011-09-09 9:27 ` Sebastian Spaeth
0 siblings, 1 reply; 16+ messages in thread
From: Austin Clements @ 2011-09-08 15:15 UTC (permalink / raw)
To: Sebastian Spaeth; +Cc: Bertram Felgenhauer, Bart Massey, notmuch
Quoth Sebastian Spaeth on Sep 08 at 3:50 pm:
> On Wed, 7 Sep 2011 23:05:19 -0400, Austin Clements <amdragon@mit.edu> wrote:
> > Sorry, I went back and re-read your earlier messages and now I see why
> > your references were the way they were. I stand by the rest of my
> > previous message though. I think the technique used in the Python
> > bindings only works because Python's GC happens to finalize in a
> > particular order (though I doubt that's guaranteed, and could easily
> > not be the case if you stray into the realm of its cycle collector).
> > In general, it seems like approach is trying to recreate C-like memory
> > management and is fragile as a result, whereas talloc should, I think,
> > allow bindings to express their runtime's memory management rather
> > naturally.
>
> Mmmh? Why would the method in python be fragile? Each message object
> holds a reference to its parent query object to keep it alive. Are you
> saying cycle collectors could kill off the query object nonetheless?
> (Assume that I know nothing of GCs which comes close to reality)
In general, a garbage collector can't make any guarantees about
finalization order. When a collection of objects all become
unreachable simultaneously (for example, the last reference to any
Messages object is dropped, causing the Query object and the Message
object to both become unreachable), the garbage collector *could*
finalize the Query first (causing talloc to free the
notmuch_messages_t) and then the Messages object (causing it to
crash). There's no guarantee in general because, in the presence of
cycles, there is no meaningful finalization order.
That being said, this approach might be (probably is) fine in Python
because Python has an unusual hybrid garbage collector. Long ago,
Python had only a reference-count based garbage collector. It now has
a cycle detector layered on top of that [1], but that only kicks in if
there are reference cycles. Assuming there aren't cycles in the
objects created by the Python bindings, you should get the
deterministic behavior of the reference counted collector. This isn't
the case in Haskell, which has a generational collector that makes no
guarantees about finalization order (guarantees it couldn't always
keep).
[1] One day a student came to Moon and said: "I understand how to make
a better garbage collector. We must keep a reference count of the
pointers to each cons."
Moon patiently told the student the following story:
"One day a student came to Moon and said: 'I understand how to
make a better garbage collector...
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Memory management practices
2011-09-08 15:15 ` Austin Clements
@ 2011-09-09 9:27 ` Sebastian Spaeth
2011-09-09 17:53 ` Austin Clements
0 siblings, 1 reply; 16+ messages in thread
From: Sebastian Spaeth @ 2011-09-09 9:27 UTC (permalink / raw)
To: Austin Clements; +Cc: Bertram Felgenhauer, Bart Massey, notmuch
[-- Attachment #1: Type: text/plain, Size: 1304 bytes --]
On Thu, 8 Sep 2011 11:15:57 -0400, Austin Clements <amdragon@MIT.EDU> wrote:
> In general, a garbage collector can't make any guarantees about
> finalization order. When a collection of objects all become
> unreachable simultaneously (for example, the last reference to any
> Messages object is dropped, causing the Query object and the Message
> object to both become unreachable), the garbage collector *could*
> finalize the Query first (causing talloc to free the
> notmuch_messages_t) and then the Messages object (causing it to
> crash). There's no guarantee in general because, in the presence of
> cycles, there is no meaningful finalization order.
Right, but that should not pose a problem for python. If e.g. both a
Query and derived Message objects become unreachable, the python objects
would not care which object is ditched and deleted first. Currently, it
seems that we finalize the Messages first, and the Query second. But we
would not fail if the Query were finalized first. Granted, the
underlying libnotmuch Message objects were torn away while the python
Message objects were still around. But they would ultimately also be
sweeped away, and that would not cause any problems.
But I am sure that I am missing out something. I'll leave this
discussion to the pros :-).
Sebastian
[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Memory management practices
2011-09-09 9:27 ` Sebastian Spaeth
@ 2011-09-09 17:53 ` Austin Clements
2011-09-11 21:47 ` Ben Gamari
2011-09-12 12:30 ` Sebastian Spaeth
0 siblings, 2 replies; 16+ messages in thread
From: Austin Clements @ 2011-09-09 17:53 UTC (permalink / raw)
To: Sebastian Spaeth; +Cc: Bertram Felgenhauer, Bart Massey, notmuch
Quoth Sebastian Spaeth on Sep 09 at 11:27 am:
> On Thu, 8 Sep 2011 11:15:57 -0400, Austin Clements <amdragon@MIT.EDU> wrote:
> > In general, a garbage collector can't make any guarantees about
> > finalization order. When a collection of objects all become
> > unreachable simultaneously (for example, the last reference to any
> > Messages object is dropped, causing the Query object and the Message
> > object to both become unreachable), the garbage collector *could*
> > finalize the Query first (causing talloc to free the
> > notmuch_messages_t) and then the Messages object (causing it to
> > crash). There's no guarantee in general because, in the presence of
> > cycles, there is no meaningful finalization order.
>
> Right, but that should not pose a problem for python. If e.g. both a
> Query and derived Message objects become unreachable, the python objects
> would not care which object is ditched and deleted first. Currently, it
> seems that we finalize the Messages first, and the Query second. But we
> would not fail if the Query were finalized first. Granted, the
> underlying libnotmuch Message objects were torn away while the python
> Message objects were still around. But they would ultimately also be
> sweeped away, and that would not cause any problems.
>
> But I am sure that I am missing out something. I'll leave this
> discussion to the pros :-).
Ah, the *Python* objects don't care, but the underlying C objects do.
Suppose the Query were finalized first. Python calls Query.__del__,
which calls notmuch_query_destroy, which releases the underlying
talloc references to the C notmuch_messages_t objects, causing talloc
to free the notmuch_messages_t. Messages._msgs now points to freed
memory, so when Python then finalizes the Messages object,
Messages.__del__ will pass this dangling pointer to
notmuch_messages_destroy, which will crash.
Hence my suggestion that, rather than trying to emulate C-style memory
management in bindings, bindings should create an additional talloc
reference to the underlying objects and rather than calling
notmuch_*_destroy during finalization, they should simply unlink this
additional reference. Any remaining library-created references will
keep the object alive as long as it's still needed by the library.
Then there's also no need to replicate the library's reference
structure in the bindings (though there is a danger of needlessly
delaying free's when the library creates convenience references like
the one from notmuch_query_t to notmuch_messages_t; for these I'd
recommend that the bindings undo such references, which requires a
little knowledge of the library's reference structure, but nothing
beyond what should be documented).
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Memory management practices
2011-09-09 17:53 ` Austin Clements
@ 2011-09-11 21:47 ` Ben Gamari
2011-09-12 2:18 ` Austin Clements
2011-09-12 12:30 ` Sebastian Spaeth
1 sibling, 1 reply; 16+ messages in thread
From: Ben Gamari @ 2011-09-11 21:47 UTC (permalink / raw)
To: Austin Clements, Sebastian Spaeth, Carl Worth
Cc: Bertram Felgenhauer, notmuch, Bart Massey
Sorry I've been so quiet on this recently. I've been a little under the
weather.
On Fri, 9 Sep 2011 13:53:28 -0400, Austin Clements <amdragon@MIT.EDU> wrote:
> Ah, the *Python* objects don't care, but the underlying C objects do.
> Suppose the Query were finalized first. Python calls Query.__del__,
> which calls notmuch_query_destroy, which releases the underlying
> talloc references to the C notmuch_messages_t objects, causing talloc
> to free the notmuch_messages_t. Messages._msgs now points to freed
> memory, so when Python then finalizes the Messages object,
> Messages.__del__ will pass this dangling pointer to
> notmuch_messages_destroy, which will crash.
Exactly. This is exactly what I suspect is happening in my case.
>
> Hence my suggestion that, rather than trying to emulate C-style memory
> management in bindings, bindings should create an additional talloc
> reference to the underlying objects and rather than calling
> notmuch_*_destroy during finalization, they should simply unlink this
> additional reference.
Currently talloc's reference counting interface is hidden behind
_destroy. While this might be a fairly intrusive change, perhaps notmuch
wants to juse expose a pair of reference counting *_ref/unref functions
instead of the *_destroy. Most users would simply need to change
existing *_destroy()s to _unref()s. Furthermore, this would allow
bindings authors to easily ensure non-broken GC behavior.
Does this sound completely insane, somewhat insane, or reasonable?
Cheers,
- Ben
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Memory management practices
2011-09-11 21:47 ` Ben Gamari
@ 2011-09-12 2:18 ` Austin Clements
0 siblings, 0 replies; 16+ messages in thread
From: Austin Clements @ 2011-09-12 2:18 UTC (permalink / raw)
To: Ben Gamari; +Cc: Bertram Felgenhauer, Bart Massey, notmuch
Quoth Ben Gamari on Sep 11 at 5:47 pm:
> Sorry I've been so quiet on this recently. I've been a little under the
> weather.
No worries.
> On Fri, 9 Sep 2011 13:53:28 -0400, Austin Clements <amdragon@MIT.EDU> wrote:
> > Hence my suggestion that, rather than trying to emulate C-style memory
> > management in bindings, bindings should create an additional talloc
> > reference to the underlying objects and rather than calling
> > notmuch_*_destroy during finalization, they should simply unlink this
> > additional reference.
>
> Currently talloc's reference counting interface is hidden behind
> _destroy. While this might be a fairly intrusive change, perhaps notmuch
> wants to juse expose a pair of reference counting *_ref/unref functions
> instead of the *_destroy. Most users would simply need to change
> existing *_destroy()s to _unref()s. Furthermore, this would allow
> bindings authors to easily ensure non-broken GC behavior.
I think the _destroy functions are silly. They all just call
talloc_free and, indeed, it would arguably be incorrect for them to do
anything more (any additional cleanup should be in a talloc
destructor). talloc is never explicitly mentioned in lib/notmuch.h
(intentionally, I would assume) but talloc-style notions of
"ownership" pervade the library documentation. IMO, the library
should just admit to using talloc, rather than try to wrap all of the
not-insubstantial talloc functionality a caller may need.
In the language of talloc, it's very natural to express the needs of
bindings in terms talloc_reference and talloc_unlink. The bindings
could maintain a per-Database context and track their own ownership by
adding a talloc reference from this context to each object returned
from the bindings; the finalizer would simply unlink the finalized
object from this context. Bindings could also use a global context
(though that would obviously be awkward in Haskell without biting the
unsafePerformIO bullet). Alternatively, bindings could use the NULL
context, which has the advantage of not actually tracking ownership in
talloc, but the disadvantage of making it harder to track down bugs
(since any code can reference or unlink from NULL).
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Memory management practices
2011-09-09 17:53 ` Austin Clements
2011-09-11 21:47 ` Ben Gamari
@ 2011-09-12 12:30 ` Sebastian Spaeth
1 sibling, 0 replies; 16+ messages in thread
From: Sebastian Spaeth @ 2011-09-12 12:30 UTC (permalink / raw)
To: Austin Clements; +Cc: Bertram Felgenhauer, Bart Massey, notmuch
[-- Attachment #1: Type: text/plain, Size: 1447 bytes --]
On Fri, 9 Sep 2011 13:53:28 -0400, Austin Clements <amdragon@MIT.EDU> wrote:
> Ah, the *Python* objects don't care, but the underlying C objects do.
[...]
Thanks for the elaboration. I understand now and agree with the analysis..
> Hence my suggestion that, rather than trying to emulate C-style memory
> management in bindings, bindings should create an additional talloc
> reference to the underlying objects and rather than calling
> notmuch_*_destroy during finalization, they should simply unlink this
> additional reference.
Agreed, that sounds like a much better option, although it would keep a
(underlying C object) for Query and all derived Messages around, even
when I explicitely "del query" in python, as long as the python GC keeps
any of those Message() objects alive and around, wouldn't it? (which
would probably be an ok behavior).
But the talloc ref/unref is not exposed through the lib currently, of course.
> Then there's also no need to replicate the library's reference
> structure in the bindings (though there is a danger of needlessly
> delaying free's when the library creates convenience references like
> the one from notmuch_query_t to notmuch_messages_t; for these I'd
> recommend that the bindings undo such references, which requires a
> little knowledge of the library's reference structure, but nothing
> beyond what should be documented).
Right, that would of course solve the above 'problem'.
Sebastian
[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2011-09-12 12:30 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-08-16 16:32 Memory management issue in notmuch-haskell bindings Ben Gamari
2011-08-28 21:27 ` Bug in GC's ordering of ForeignPtr finalization? Ben Gamari
2011-08-29 3:26 ` Antoine Latter
2011-08-29 3:47 ` Ben Gamari
2011-08-29 4:03 ` Antoine Latter
[not found] ` <20110829183010.GA2605@24f89f8c-e6a1-4e75-85ee-bb8a3743bb9f>
2011-08-29 20:30 ` Memory management practices Ben Gamari
2011-09-07 20:36 ` Ben Gamari
2011-09-08 2:48 ` Austin Clements
2011-09-08 3:05 ` Austin Clements
2011-09-08 13:50 ` Sebastian Spaeth
2011-09-08 15:15 ` Austin Clements
2011-09-09 9:27 ` Sebastian Spaeth
2011-09-09 17:53 ` Austin Clements
2011-09-11 21:47 ` Ben Gamari
2011-09-12 2:18 ` Austin Clements
2011-09-12 12:30 ` Sebastian Spaeth
Code repositories for project(s) associated with this public inbox
https://yhetil.org/notmuch.git/
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).