unofficial mirror of notmuch@notmuchmail.org
 help / color / mirror / code / Atom feed
* [PATCH 0/2] ruby: enable garbage collection
@ 2021-05-17 19:39 Felipe Contreras
  2021-05-17 19:39 ` [PATCH 1/2] ruby: create an actual wrapper struct Felipe Contreras
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Felipe Contreras @ 2021-05-17 19:39 UTC (permalink / raw)
  To: notmuch; +Cc: Tomi Ollila

Ruby is a gc language, we shouldn't be doing workaround to free memory when Ruby is perfectly
capable of doing so.

The problem is that talloc wants to be smart, and Ruby and talloc both fight to free memory. We can
let Ruby win by stealing all the objects from talloc control.

Thanks to the previous cleanup patches it's now possible to easily do this.

In order to test this series I've used the following script:

  require 'notmuch'

  $db = Notmuch::Database.new(ENV['HOME'] + '/mail')
  $do_destroy = true

  while true
    query = $db.query('')
    threads = query.search_threads
    threads.each do |thread|
      puts '%s: %s' % [thread.thread_id, thread.subject]
      thread.destroy! if $do_destroy
    end
    threads.destroy! if $do_destroy
    query.destroy! if $do_destroy
  end

  $db.close

All threads from the database are fetched over and over with no significant increase in memory.

The old method of destroying objects with destroy! still works, but now it's not necessary.

I tried other methods, like increasing the reference counter and adding a second parent to talloc
objects, but for some reason they didn't work. I investiged why but couldn't reach any conclussion.
On the other hand the talloc_steal approach works perfectly fine.

Felipe Contreras (2):
  ruby: create an actual wrapper struct
  ruby: enable garbage collection using talloc

 bindings/ruby/database.c |  2 +-
 bindings/ruby/defs.h     | 42 +++++++++++++++++++++++++++++++++++-----
 bindings/ruby/extconf.rb |  1 +
 bindings/ruby/init.c     |  6 ++++++
 4 files changed, 45 insertions(+), 6 deletions(-)

-- 
2.31.1

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] ruby: create an actual wrapper struct
  2021-05-17 19:39 [PATCH 0/2] ruby: enable garbage collection Felipe Contreras
@ 2021-05-17 19:39 ` Felipe Contreras
  2021-05-17 19:39 ` [PATCH 2/2] ruby: enable garbage collection using talloc Felipe Contreras
  2021-05-22 10:49 ` [PATCH 0/2] ruby: enable garbage collection David Bremner
  2 siblings, 0 replies; 7+ messages in thread
From: Felipe Contreras @ 2021-05-17 19:39 UTC (permalink / raw)
  To: notmuch; +Cc: Tomi Ollila

Currently Ruby data points directly to a notmuch object (e.g.
notmuch_database_t), since we don't need any extra data that is fine.

However, in the next commit we will need extra data, therefore we create
a new struct notmuch_rb_object_t wrapper which contains nothing but a
pointer to the current pointer (e.g. notmuch_database_t).

This struct is tied to the Ruby object, and is freed when the Ruby
object is freed by the garbage collector.

We do nothing with this wrapper, so no functionality should be changed.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
---
 bindings/ruby/database.c |  2 +-
 bindings/ruby/defs.h     | 39 ++++++++++++++++++++++++++++++++++-----
 bindings/ruby/init.c     |  6 ++++++
 3 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/bindings/ruby/database.c b/bindings/ruby/database.c
index bb993d86..66100de2 100644
--- a/bindings/ruby/database.c
+++ b/bindings/ruby/database.c
@@ -81,7 +81,7 @@ notmuch_rb_database_initialize (int argc, VALUE *argv, VALUE self)
 	ret = notmuch_database_open (path, mode, &database);
     notmuch_rb_status_raise (ret);
 
-    DATA_PTR (self) = database;
+    DATA_PTR (self) = notmuch_rb_object_create (database);
 
     return self;
 }
diff --git a/bindings/ruby/defs.h b/bindings/ruby/defs.h
index 9860ee17..1413eb72 100644
--- a/bindings/ruby/defs.h
+++ b/bindings/ruby/defs.h
@@ -66,7 +66,7 @@ extern const rb_data_type_t notmuch_rb_messages_type;
 extern const rb_data_type_t notmuch_rb_message_type;
 extern const rb_data_type_t notmuch_rb_tags_type;
 
-#define Data_Get_Notmuch_Object(obj, type, ptr)					    \
+#define Data_Get_Notmuch_Rb_Object(obj, type, ptr)		    		    \
     do {									    \
 	(ptr) = rb_check_typeddata ((obj), (type));				    \
 	if (RB_UNLIKELY (!(ptr))) {						    \
@@ -75,8 +75,15 @@ extern const rb_data_type_t notmuch_rb_tags_type;
 	}									    \
     } while (0)
 
+#define Data_Get_Notmuch_Object(obj, type, ptr)			\
+    do {							\
+	notmuch_rb_object_t *rb_wrapper;			\
+	Data_Get_Notmuch_Rb_Object ((obj), (type), rb_wrapper);	\
+	(ptr) = rb_wrapper->nm_object;				\
+    } while (0)
+
 #define Data_Wrap_Notmuch_Object(klass, type, ptr) \
-    TypedData_Wrap_Struct ((klass), (type), (ptr))
+    TypedData_Wrap_Struct ((klass), (type), notmuch_rb_object_create ((ptr)))
 
 #define Data_Get_Notmuch_Database(obj, ptr) \
     Data_Get_Notmuch_Object ((obj), &notmuch_rb_database_type, (ptr))
@@ -105,16 +112,38 @@ extern const rb_data_type_t notmuch_rb_tags_type;
 #define Data_Get_Notmuch_Tags(obj, ptr) \
     Data_Get_Notmuch_Object ((obj), &notmuch_rb_tags_type, (ptr))
 
+typedef struct {
+    void *nm_object;
+} notmuch_rb_object_t;
+
+static inline void *
+notmuch_rb_object_create (void *nm_object)
+{
+    notmuch_rb_object_t *rb_wrapper = malloc (sizeof (*rb_wrapper));
+    if (RB_UNLIKELY (!rb_wrapper))
+	return NULL;
+
+    rb_wrapper->nm_object = nm_object;
+    return rb_wrapper;
+}
+
+static inline void
+notmuch_rb_object_free (void *rb_wrapper)
+{
+    free (rb_wrapper);
+}
+
 static inline notmuch_status_t
 notmuch_rb_object_destroy (VALUE rb_object, const rb_data_type_t *type)
 {
-    void *nm_object;
+    notmuch_rb_object_t *rb_wrapper;
     notmuch_status_t ret;
 
-    Data_Get_Notmuch_Object (rb_object, type, nm_object);
+    Data_Get_Notmuch_Rb_Object (rb_object, type, rb_wrapper);
 
     /* Call the corresponding notmuch_*_destroy function */
-    ret = ((notmuch_status_t (*)(void *)) type->data) (nm_object);
+    ret = ((notmuch_status_t (*)(void *)) type->data) (rb_wrapper->nm_object);
+    notmuch_rb_object_free (rb_wrapper);
     DATA_PTR (rb_object) = NULL;
 
     return ret;
diff --git a/bindings/ruby/init.c b/bindings/ruby/init.c
index 62515eca..831f7695 100644
--- a/bindings/ruby/init.c
+++ b/bindings/ruby/init.c
@@ -48,6 +48,9 @@ ID ID_db_mode;
 
 const rb_data_type_t notmuch_rb_object_type = {
     .wrap_struct_name = "notmuch_object",
+    .function = {
+	.dfree = notmuch_rb_object_free,
+    },
 };
 
 #define define_type(id) \
@@ -55,6 +58,9 @@ const rb_data_type_t notmuch_rb_object_type = {
 	.wrap_struct_name = "notmuch_" #id, \
 	.parent = &notmuch_rb_object_type, \
 	.data = &notmuch_ ## id ## _destroy, \
+	.function = { \
+	    .dfree = notmuch_rb_object_free, \
+	}, \
     }
 
 define_type (database);
-- 
2.31.1

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 2/2] ruby: enable garbage collection using talloc
  2021-05-17 19:39 [PATCH 0/2] ruby: enable garbage collection Felipe Contreras
  2021-05-17 19:39 ` [PATCH 1/2] ruby: create an actual wrapper struct Felipe Contreras
@ 2021-05-17 19:39 ` Felipe Contreras
  2021-06-11  0:46   ` David Bremner
  2021-05-22 10:49 ` [PATCH 0/2] ruby: enable garbage collection David Bremner
  2 siblings, 1 reply; 7+ messages in thread
From: Felipe Contreras @ 2021-05-17 19:39 UTC (permalink / raw)
  To: notmuch; +Cc: Tomi Ollila

We basically steal all the objects from their notmuch parents, therefore
they are completely under Ruby's gc control.

The order at which these objects are freed does not matter any more,
because destroying the database does not destroy all the children
objects, since they belong to Ruby now.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
---
 bindings/ruby/database.c |  2 +-
 bindings/ruby/defs.h     | 11 +++++++----
 bindings/ruby/extconf.rb |  1 +
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/bindings/ruby/database.c b/bindings/ruby/database.c
index 66100de2..3737be17 100644
--- a/bindings/ruby/database.c
+++ b/bindings/ruby/database.c
@@ -81,7 +81,7 @@ notmuch_rb_database_initialize (int argc, VALUE *argv, VALUE self)
 	ret = notmuch_database_open (path, mode, &database);
     notmuch_rb_status_raise (ret);
 
-    DATA_PTR (self) = notmuch_rb_object_create (database);
+    DATA_PTR (self) = notmuch_rb_object_create (database, "notmuch_rb_database");
 
     return self;
 }
diff --git a/bindings/ruby/defs.h b/bindings/ruby/defs.h
index 1413eb72..5cebd5fa 100644
--- a/bindings/ruby/defs.h
+++ b/bindings/ruby/defs.h
@@ -23,6 +23,7 @@
 
 #include <notmuch.h>
 #include <ruby.h>
+#include <talloc.h>
 
 extern VALUE notmuch_rb_cDatabase;
 extern VALUE notmuch_rb_cDirectory;
@@ -83,7 +84,7 @@ extern const rb_data_type_t notmuch_rb_tags_type;
     } while (0)
 
 #define Data_Wrap_Notmuch_Object(klass, type, ptr) \
-    TypedData_Wrap_Struct ((klass), (type), notmuch_rb_object_create ((ptr)))
+    TypedData_Wrap_Struct ((klass), (type), notmuch_rb_object_create ((ptr), "notmuch_rb_object: " __location__))
 
 #define Data_Get_Notmuch_Database(obj, ptr) \
     Data_Get_Notmuch_Object ((obj), &notmuch_rb_database_type, (ptr))
@@ -117,20 +118,22 @@ typedef struct {
 } notmuch_rb_object_t;
 
 static inline void *
-notmuch_rb_object_create (void *nm_object)
+notmuch_rb_object_create (void *nm_object, const char *name)
 {
-    notmuch_rb_object_t *rb_wrapper = malloc (sizeof (*rb_wrapper));
+    notmuch_rb_object_t *rb_wrapper = talloc_named_const (NULL, sizeof (*rb_wrapper), name);
+
     if (RB_UNLIKELY (!rb_wrapper))
 	return NULL;
 
     rb_wrapper->nm_object = nm_object;
+    talloc_steal (rb_wrapper, nm_object);
     return rb_wrapper;
 }
 
 static inline void
 notmuch_rb_object_free (void *rb_wrapper)
 {
-    free (rb_wrapper);
+    talloc_free (rb_wrapper);
 }
 
 static inline notmuch_status_t
diff --git a/bindings/ruby/extconf.rb b/bindings/ruby/extconf.rb
index 161de5a2..d914537c 100644
--- a/bindings/ruby/extconf.rb
+++ b/bindings/ruby/extconf.rb
@@ -19,6 +19,7 @@ if not ENV['LIBNOTMUCH']
 end
 
 $LOCAL_LIBS += ENV['LIBNOTMUCH']
+$LIBS += " -ltalloc"
 
 # Create Makefile
 dir_config('notmuch')
-- 
2.31.1

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/2] ruby: enable garbage collection
  2021-05-17 19:39 [PATCH 0/2] ruby: enable garbage collection Felipe Contreras
  2021-05-17 19:39 ` [PATCH 1/2] ruby: create an actual wrapper struct Felipe Contreras
  2021-05-17 19:39 ` [PATCH 2/2] ruby: enable garbage collection using talloc Felipe Contreras
@ 2021-05-22 10:49 ` David Bremner
  2021-05-28  3:10   ` Felipe Contreras
  2 siblings, 1 reply; 7+ messages in thread
From: David Bremner @ 2021-05-22 10:49 UTC (permalink / raw)
  To: Felipe Contreras, notmuch; +Cc: Tomi Ollila

Felipe Contreras <felipe.contreras@gmail.com> writes:

> Ruby is a gc language, we shouldn't be doing workaround to free memory when Ruby is perfectly
> capable of doing so.
>
> The problem is that talloc wants to be smart, and Ruby and talloc both fight to free memory. We can
> let Ruby win by stealing all the objects from talloc control.
>
> Thanks to the previous cleanup patches it's now possible to easily do this.
>
> In order to test this series I've used the following script:

I still haven't had a chance to look at the series, but how about making
this script either a test or a performance-test, as appropriate?

d

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/2] ruby: enable garbage collection
  2021-05-22 10:49 ` [PATCH 0/2] ruby: enable garbage collection David Bremner
@ 2021-05-28  3:10   ` Felipe Contreras
  0 siblings, 0 replies; 7+ messages in thread
From: Felipe Contreras @ 2021-05-28  3:10 UTC (permalink / raw)
  To: David Bremner; +Cc: notmuch, Tomi Ollila

On Sat, May 22, 2021 at 5:49 AM David Bremner <david@tethera.net> wrote:
>
> Felipe Contreras <felipe.contreras@gmail.com> writes:
>
> > Ruby is a gc language, we shouldn't be doing workaround to free memory when Ruby is perfectly
> > capable of doing so.
> >
> > The problem is that talloc wants to be smart, and Ruby and talloc both fight to free memory. We can
> > let Ruby win by stealing all the objects from talloc control.
> >
> > Thanks to the previous cleanup patches it's now possible to easily do this.
> >
> > In order to test this series I've used the following script:
>
> I still haven't had a chance to look at the series, but how about making
> this script either a test or a performance-test, as appropriate?

I gave this a try, and even compiled Ruby with --with-valgrind
(apparently Arch Linux doesn't do that), but doesn't seem to work
right:

% valgrind /opt/ruby/bin/ruby -e 'p true'

  HEAP SUMMARY:
      in use at exit: 37,193,788 bytes in 21,951 blocks
    total heap usage: 61,729 allocs, 39,778 frees, 51,503,615 bytes allocated

  LEAK SUMMARY:
     definitely lost: 510,491 bytes in 4,617 blocks
     indirectly lost: 795,436 bytes in 9,411 blocks
       possibly lost: 2,128,254 bytes in 7,100 blocks
     still reachable: 33,759,607 bytes in 823 blocks
          suppressed: 0 bytes in 0 blocks
  Rerun with --leak-check=full to see details of leaked memory

Trying to run the simplest of Ruby commands throws a log 93,000 lines long.

I did try to search online resources to use valgrind with Ruby to no
avail. Apparently everyone is using valgrind with a baseline (if
valgrind shows 500 KiB lost as a start, how much does it change after
my changes?). Sure, we could try to massage a valgrind suppression
file, but is it worth the effort if the Ruby project itself hasn't
even tried to do that?

If we wanted to measure the memory performance of such a command
(which I think would be very nice) a different strategy is needed.

I also didn't see any talloc output.

For now I don't think the series should be blocked by this setback.

Just try to run the command yourself with top. You can see it works.

Cheers.

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] ruby: enable garbage collection using talloc
  2021-05-17 19:39 ` [PATCH 2/2] ruby: enable garbage collection using talloc Felipe Contreras
@ 2021-06-11  0:46   ` David Bremner
  2021-06-11  8:54     ` David Bremner
  0 siblings, 1 reply; 7+ messages in thread
From: David Bremner @ 2021-06-11  0:46 UTC (permalink / raw)
  To: Felipe Contreras, notmuch; +Cc: Tomi Ollila

Felipe Contreras <felipe.contreras@gmail.com> writes:

> We basically steal all the objects from their notmuch parents, therefore
> they are completely under Ruby's gc control.
>
> The order at which these objects are freed does not matter any more,
> because destroying the database does not destroy all the children
> objects, since they belong to Ruby now.
>

I guess from a purist point of view this is a kind of layering
violation, since the use of talloc is purportedly an internal
implementation detail of the library. Still, I think it's a reasonable
approach given that the ruby bindings are maintained as part of notmuch,
and we are not very likely to abandon talloc.

I am OK with applying the series as is. The only sensible thing I can
think of at the moment for testing is to run something like the script
of id:20210517193915.1220114-1-felipe.contreras@gmail.com as a "time
test", so not attempt to get valgrind working, but just run it on some
decent size corpuses and see that it it does not crash or leak too much
memory to complete.

d

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] ruby: enable garbage collection using talloc
  2021-06-11  0:46   ` David Bremner
@ 2021-06-11  8:54     ` David Bremner
  0 siblings, 0 replies; 7+ messages in thread
From: David Bremner @ 2021-06-11  8:54 UTC (permalink / raw)
  To: Felipe Contreras, notmuch; +Cc: Tomi Ollila

David Bremner <david@tethera.net> writes:

> Felipe Contreras <felipe.contreras@gmail.com> writes:
>
>> We basically steal all the objects from their notmuch parents, therefore
>> they are completely under Ruby's gc control.
>>
>> The order at which these objects are freed does not matter any more,
>> because destroying the database does not destroy all the children
>> objects, since they belong to Ruby now.
>>
>
> I guess from a purist point of view this is a kind of layering
> violation, since the use of talloc is purportedly an internal
> implementation detail of the library. Still, I think it's a reasonable
> approach given that the ruby bindings are maintained as part of notmuch,
> and we are not very likely to abandon talloc.
>

One issue to double check: in a few places we explicitely _don't_ use
talloc. What happens when those objects are passed to talloc_steal?

d

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-06-11  8:54 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-17 19:39 [PATCH 0/2] ruby: enable garbage collection Felipe Contreras
2021-05-17 19:39 ` [PATCH 1/2] ruby: create an actual wrapper struct Felipe Contreras
2021-05-17 19:39 ` [PATCH 2/2] ruby: enable garbage collection using talloc Felipe Contreras
2021-06-11  0:46   ` David Bremner
2021-06-11  8:54     ` David Bremner
2021-05-22 10:49 ` [PATCH 0/2] ruby: enable garbage collection David Bremner
2021-05-28  3:10   ` Felipe Contreras

unofficial mirror of notmuch@notmuchmail.org

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://yhetil.org/notmuch/0 notmuch/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 notmuch notmuch/ https://yhetil.org/notmuch \
		notmuch@notmuchmail.org
	public-inbox-index notmuch

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.yhetil.org/yhetil.mail.notmuch.general
	nntp://news.gmane.io/gmane.mail.notmuch.general


code repositories for project(s) associated with this inbox:

	notmuch.git.git (no URL configured)

AGPL code for this site: git clone http://ou63pmih66umazou.onion/public-inbox.git