unofficial mirror of guile-devel@gnu.org 
* About shared substrings
       [not found] ` <rmillo7u2pv.fsf@fnord.ir.bbn.com>
@ 2004-01-16 20:35   ` Roland Orre
  2004-01-16 21:12     ` Neil Jerram
                       ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Roland Orre @ 2004-01-16 20:35 UTC (permalink / raw)


Today I was really impressed.
Shared substrings have been an essential feature of guile for me. They
let me dramatically speed up pure-Scheme reading of fixed-width text
files: I only needed to declare a buffer and allocate from it a set of
substrings corresponding to the fields of the table. With this scheme I
didn't need any further memory allocations for substrings, since every
field was immediately accessible to the standard Scheme string
conversion routines as soon as the buffer was read.
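
The buffer-plus-fields idea described above can be sketched in plain C,
independent of Guile. This is only an illustration under stated
assumptions: the names (field_view, init_fields, load_record,
field_equals) and the field widths 4/8/4 are hypothetical and are not
part of any Guile API. Each field is a view into one reused record
buffer, so loading a new record allocates nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A view into the shared record buffer; it owns no storage itself. */
struct field_view {
    const char *start;
    size_t      len;
};

enum { RECORD_WIDTH = 16, NUM_FIELDS = 3 };

static char record[RECORD_WIDTH];             /* reused for every record */
static struct field_view fields[NUM_FIELDS];  /* set up once, never reallocated */

/* Point each view at its slice of the buffer (hypothetical widths). */
static void init_fields(void)
{
    static const size_t widths[NUM_FIELDS] = { 4, 8, 4 };
    size_t offset = 0;
    for (int i = 0; i < NUM_FIELDS; i++) {
        fields[i].start = record + offset;
        fields[i].len   = widths[i];
        offset += widths[i];
    }
    assert(offset == RECORD_WIDTH);
}

/* "Read" one record by overwriting the buffer in place. */
static void load_record(const char *line)
{
    assert(strlen(line) >= RECORD_WIDTH);
    memcpy(record, line, RECORD_WIDTH);
}

/* Compare field i with a C string without copying the field. */
static int field_equals(int i, const char *s)
{
    return strlen(s) == fields[i].len
        && memcmp(fields[i].start, s, fields[i].len) == 0;
}
```

After init_fields(), every call to load_record() makes all three fields
reflect the new record at once, which is the property the shared
substrings provided.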

When I found that shared substrings had been removed from guile 1.7,
I first came up with a complicated scheme using scm_gc_protect_object,
in which I would have had to deallocate the shared strings explicitly.

I discussed this issue with Mikael Djurfeldt today and then he came up
with the following solution:

SCM substring_table;

SCM scm_make_shared_substring (SCM parent, SCM start, SCM end)
{
  SCM substring;
  char *mem;
  int c_start, c_end;
  SCM_VALIDATE_SUBSTRING_SPEC_COPY (1, parent, mem,
                                    2, start, c_start,
                                    3, end, c_end);
  substring = scm_cell (SCM_MAKE_STRING_TAG (c_end - c_start),
                        (scm_t_bits) (mem + c_start));
  scm_hash_set_x (substring_table, substring, parent);
  return substring;
}

where the following is put in the main:
substring_table
= scm_permanent_object (scm_make_weak_key_hash_table (SCM_UNDEFINED));

This is almost magical :) It works perfectly well and I don't need to
bother with any explicit deallocation. This is also the first time I
have really understood the purpose of weak hash tables. In a weak-key
hash table, an entry is garbage collected only once its key is seen as
garbage. With this scheme we keep what is, from my perspective, the
essential functionality of shared substrings, but we no longer need an
explicit tag for them.
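
The weak-key behaviour relied on here can be illustrated with a toy
model in plain C. It is a sketch only, not Guile's collector: toy_obj,
weak_entry, weak_table_scan and parent_survives are hypothetical names,
and a real collector would repeat the scan until no new objects get
marked. The point it shows is that the value (the parent string) stays
alive exactly as long as the key (the substring) is reachable.

```c
#include <assert.h>
#include <stddef.h>

/* Toy heap object: just a mark bit. */
struct toy_obj { int marked; };

struct weak_entry {
    struct toy_obj *key;    /* held weakly: the table does not keep it alive */
    struct toy_obj *value;  /* kept alive only through a live key */
};

/* Scan the table once after the roots have been marked:
   a live key marks its value, a dead key drops the entry.
   Returns the number of entries kept. */
static size_t weak_table_scan(struct weak_entry *tab, size_t n)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (tab[i].key->marked) {
            tab[i].value->marked = 1;
            tab[kept++] = tab[i];
        }
    }
    return kept;
}

/* Returns 1 if the parent survives one toy collection. */
static int parent_survives(int substring_is_reachable)
{
    struct toy_obj parent = { 0 }, substring = { 0 };
    struct weak_entry table[1] = { { &substring, &parent } };

    substring.marked = substring_is_reachable;  /* result of root marking */
    size_t kept = weak_table_scan(table, 1);
    assert(kept == (size_t) (substring_is_reachable ? 1 : 0));
    return parent.marked;
}
```

Here parent_survives(1) reports that the parent was marked through the
table entry, while parent_survives(0) reports that the entry was dropped
and the parent left unmarked.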

Mikael Djurfeldt is really the most clever programmer I know.

	Many thanks!
	Roland Orre




_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-devel



* Re: About shared substrings
  2004-01-16 20:35   ` About shared substrings Roland Orre
@ 2004-01-16 21:12     ` Neil Jerram
  2004-01-17 22:34       ` Keith Wright
  2004-01-16 21:54     ` Mikael Djurfeldt
  2004-01-16 22:00     ` Tom Lord
  2 siblings, 1 reply; 5+ messages in thread
From: Neil Jerram @ 2004-01-16 21:12 UTC (permalink / raw)
  Cc: guile-user, guile-devel

>>>>> "Roland" == Roland Orre <orre@nada.kth.se> writes:

    Roland> This is almost magical :)

Indeed, but there's one thing I don't quite see.  When a shared
substring cell is GC'd, what prevents it from trying to free the
substring chars that it is pointing to?

        Neil




* Re: About shared substrings
  2004-01-16 20:35   ` About shared substrings Roland Orre
  2004-01-16 21:12     ` Neil Jerram
@ 2004-01-16 21:54     ` Mikael Djurfeldt
  2004-01-16 22:00     ` Tom Lord
  2 siblings, 0 replies; 5+ messages in thread
From: Mikael Djurfeldt @ 2004-01-16 21:54 UTC (permalink / raw)
  Cc: guile-user, djurfeldt, guile-devel

Roland Orre <orre@nada.kth.se> writes:

> I discussed this issue with Mikael Djurfeldt today and then he came up
> with the following solution:
>
> SCM substring_table;
>
> SCM scm_make_shared_substring (SCM parent, SCM start, SCM end)
> {
>   SCM substring;
>   char *mem;
>   int c_start, c_end;
>   SCM_VALIDATE_SUBSTRING_SPEC_COPY (1, parent, mem,
>                                     2, start, c_start,
>                                     3, end, c_end);
>   substring = scm_cell (SCM_MAKE_STRING_TAG (c_end - c_start),
>                         (scm_t_bits) (mem + c_start));
>   scm_hash_set_x (substring_table, substring, parent);
>   return substring;
> }
>
> where the following is put in the main:
> substring_table
> = scm_permanent_object (scm_make_weak_key_hash_table (SCM_UNDEFINED));
>
> This is almost magical :)

Unfortunately, it *is* magical.  In addition, it is necessary to hand
the newly created substring to a guardian.  If we then add a hook
function to some suitable GC hook which loops through all unreferenced
substrings and sets their address pointer to NULL, everything should
work.

If we *don't* do that, the same area of memory will be freed
repeatedly, which is not optimal.
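
The hazard and the proposed fix can be modelled in a few lines of plain
C. This is a toy under stated assumptions, not Guile's GC: toy_string,
toy_release, toy_sweep and run_scenario are hypothetical names, and
toy_release stands in for free() so that the bad release is counted
instead of actually corrupting the heap. Clearing the substring's
pointer before the sweep plays the role of the GC hook described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy string cell; NOT Guile's real cell layout. */
struct toy_string {
    char  *chars;  /* what the toy sweeper passes to toy_release() */
    size_t len;
};

static char *live_block  = NULL;  /* the single allocation in this toy heap */
static int   bogus_frees = 0;     /* releases of memory a cell does not own */

/* Stand-in for free(): only the start of the live block may be released. */
static void toy_release(char *p)
{
    if (p == NULL)
        return;                   /* cleared pointer: nothing to do */
    if (p == live_block) {
        free(p);
        live_block = NULL;
        return;
    }
    bogus_frees++;                /* interior or stale pointer */
}

/* Toy sweep of a dead string cell. */
static void toy_sweep(struct toy_string *cell)
{
    toy_release(cell->chars);
    cell->chars = NULL;
}

/* Collect a substring, then its parent. Returns the bogus-free count. */
static int run_scenario(int clear_before_sweep)
{
    struct toy_string parent, sub;

    bogus_frees = 0;
    live_block = malloc(16);
    assert(live_block != NULL);
    memcpy(live_block, "0123456789ABCDEF", 16);

    parent.chars = live_block;     parent.len = 16;
    sub.chars    = live_block + 4; sub.len    = 6;  /* shares parent's bytes */

    if (clear_before_sweep)
        sub.chars = NULL;          /* what the proposed GC hook would do */

    toy_sweep(&sub);               /* substring dies first */
    toy_sweep(&parent);            /* parent dies later */
    return bogus_frees;
}
```

Without the clearing step the sweep of the substring tries to release
memory that belongs to the parent; with it, only the parent's own sweep
releases the block.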

> Mikael Djurfeldt is really the most clever programmer I know.

Many thanks for the kind words!  Yes, I am usually a blazing genius.
My mistake above must certainly be due to something in my coffee or
so...  :)

M



* Re: About shared substrings
  2004-01-16 20:35   ` About shared substrings Roland Orre
  2004-01-16 21:12     ` Neil Jerram
  2004-01-16 21:54     ` Mikael Djurfeldt
@ 2004-01-16 22:00     ` Tom Lord
  2 siblings, 0 replies; 5+ messages in thread
From: Tom Lord @ 2004-01-16 22:00 UTC (permalink / raw)
  Cc: guile-user, guile-devel



    > From: Roland Orre <orre@nada.kth.se>

    > I discussed this issue with Mikael Djurfeldt today and then he came up
    > with the following solution:

    > SCM substring_table;

    > SCM scm_make_shared_substring (SCM parent, SCM start, SCM end)
    > {
    >   SCM substring;
    >   char *mem;
    >   int c_start, c_end;
    >   SCM_VALIDATE_SUBSTRING_SPEC_COPY (1, parent, mem,
    >                                     2, start, c_start,
    >                                     3, end, c_end);
    >   substring = scm_cell (SCM_MAKE_STRING_TAG (c_end - c_start),
    >                         (scm_t_bits) (mem + c_start));
    >   scm_hash_set_x (substring_table, substring, parent);
    >   return substring;
    > }

    > where the following is put in the main:
    > substring_table
    > = scm_permanent_object (scm_make_weak_key_hash_table (SCM_UNDEFINED));

    > This is almost magical :) It works perfectly well and I don't need to
    > bother with any explicit deallocation. This is also the first time I
    > have really understood the purpose of weak hash tables. In a weak-key
    > hash table, an entry is garbage collected only once its key is seen as
    > garbage. With this scheme we keep what is, from my perspective, the
    > essential functionality of shared substrings, but we no longer need an
    > explicit tag for them.

How can this actually work?

It ensures that SUBSTRING, while live, protects PARENT.  I see that
and it's _roughly_ what's wanted.

But SUBSTRING is tagged as a string, no?  When that key (the
substring) is collected -- won't that lead to a bogus free of the
substring data?

(I'm looking at the 1.6.4 GC implementation.  Apologies if things have
changed in some significant way in 1.7)

-t




* Re: About shared substrings
  2004-01-16 21:12     ` Neil Jerram
@ 2004-01-17 22:34       ` Keith Wright
  0 siblings, 0 replies; 5+ messages in thread
From: Keith Wright @ 2004-01-17 22:34 UTC (permalink / raw)


> From: Neil Jerram <neil@ossau.uklinux.net>
> 
>     Roland> This is almost magical :)
> 
> When a shared substring cell is GC'd, what prevents it from trying
> to free the substring chars that it is pointing to?

   *   Any sufficiently advanced technology  *
   *   is no more reliable than magic.       *

(Sorry, I'm reading e-mail and listening to Arthur C. Clarke on NPR.)

-- 
     -- Keith Wright  <kwright@free-comp-shop.com>

Programmer in Chief, Free Computer Shop <http://www.free-comp-shop.com>
         ---  Food, Shelter, Source code.  ---


_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-user


