unofficial mirror of guile-user@gnu.org 
 help / color / mirror / Atom feed
* cmod-play 1 available + modsup.h additions
@ 2003-11-13 21:55 Thien-Thi Nguyen
  2003-11-14  8:26 ` Ludovic Courtès
  2003-11-14 14:17 ` Marius Vollmer
  0 siblings, 2 replies; 21+ messages in thread
From: Thien-Thi Nguyen @ 2003-11-13 21:55 UTC (permalink / raw)
  Cc: guile-user

folks,

libtool modules cannot depend on other libtool modules (according to the
documentation on libltdl), so we can't co-opt that strategy if we want
to design compiled modules that depend on other modules.  sure, we can
throw up our hands and relegate everything to the system linker/loader,
but why pass up a thorny question for others to play with?  i mean, to
do system programming you need to grab control of the system, and to do
module system programming you need to grab control of the module system.

so, this message actually presents two pieces of source code: (1) a
pointer to the documented exploratory process:

  http://www.glug.org/people/ttn/software/cmod-play/

and (2) a request for comments on the tentative conclusions of the above
exploration, as expressed by the additional <guile/modsup.h> interface
elements excerpted below.  if All Goes Well, they will appear in guile
1.4.1.97.

in other news, i will be using this new support immediately for
guile-sdl compiled modules (the motivation for all this, you see), so to
spare everyone the agony guile-sdl will not be released until 1.4.1.97
is out.  (however, everything works swimmingly from cvs, if you are
feeling adventurous. :-)

thi

_____________________________________________________________
/*:Return the @var{obj} given, but marked as "permanent".
   This means that it can never be garbage collected.
*/
#define GHSTONED(obj) \
  scm_permanent_object (obj)

/*:Declare and later arrange for @var{cvar} (type SCM) to hold a resolved
   module object for @var{fullname}, a C string such as "(ice-9 q)".  The
   string is saved in a C variable named by prefixing "s_" to @var{cvar}.
   You must use @var{cvar} as the second arg to @code{MUSEMODULEVAR}.
*/
#define MUSEMODULE(cvar,fullname) \
SCM_SNARF_HERE (static char * s_ ## cvar = fullname; static SCM cvar) \
SCM_SNARF_INIT (cvar = GHSTONED (gh_resolve_module (s_ ## cvar));)

/*:Declare and later arrange for @var{cvar} (type SCM) to have the
   same value as the imported module @var{m_cvar} variable @var{s_name}.
   @var{m_cvar} is the SCM object declared with @code{MUSEMODULE}, and
   @var{s_name} is a string such as "q-empty?".  If the imported value
   is a procedure, you can use @code{gh_apply} or @code{gh_call0} through
   @code{gh_call3} on it.
*/
#define MUSEMODULEVAR(cvar,m_cvar,s_name) \
SCM_SNARF_HERE (static SCM cvar) \
SCM_SNARF_INIT (cvar = GHSTONED (gh_module_lookup (m_cvar, s_name));)

/*:Declare and define a procedure @var{cvar} that takes 0 (zero) args,
   which returns the result of calling @code{gh_call0} on @var{proc_cvar}.
   @var{proc_cvar} is the SCM object declared with @code{MUSEMODULEVAR}.
*/
#define MUSEMODULEPROC0(cvar,proc_cvar) \
  static SCM cvar (void) \
  { return gh_call0 (proc_cvar); }

/*:Declare and define a procedure @var{cvar} that takes 1 (one) SCM arg,
   which returns the result of calling @code{gh_call1} on @var{proc_cvar}
   and this arg.  @var{proc_var} is the SCM object declared with
   @{MUSEMODULEVAR}.
*/
#define MUSEMODULEPROC1(cvar,proc_cvar) \
  static SCM cvar (SCM a1) \
  { return gh_call1 (proc_cvar, a1); }

/*:Declare and define a procedure @var{cvar} that takes 2 (two) SCM args,
   which returns the result of calling @code{gh_call2} on @var{proc_cvar}
   and the args.  @var{proc_var} is the SCM object declared with
   @{MUSEMODULEVAR}.
*/
#define MUSEMODULEPROC2(cvar,proc_cvar) \
  static SCM cvar (SCM a1, SCM a2) \
  { return gh_call2 (proc_cvar, a1, a2); }

/*:Declare and define a procedure @var{cvar} that takes 3 (three) SCM args,
   which returns the result of calling @code{gh_call3} on @var{proc_cvar}
   and the args.  @var{proc_var} is the SCM object declared with
   @{MUSEMODULEVAR}.
*/
#define MUSEMODULEPROC3(cvar,proc_cvar) \
  static SCM cvar (SCM a1, SCM a2, SCM a3) \
  { return gh_call3 (proc_cvar, a1, a2, a3); }

[modsup.h excerpt ends here]


_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-user


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: cmod-play 1 available + modsup.h additions
  2003-11-13 21:55 cmod-play 1 available + modsup.h additions Thien-Thi Nguyen
@ 2003-11-14  8:26 ` Ludovic Courtès
  2003-11-14 13:10   ` Thien-Thi Nguyen
  2003-11-14 14:29   ` Marius Vollmer
  2003-11-14 14:17 ` Marius Vollmer
  1 sibling, 2 replies; 21+ messages in thread
From: Ludovic Courtès @ 2003-11-14  8:26 UTC (permalink / raw)
  Cc: guile-user, guile-sources

Hi,

The naming style of the macros in modsup.h doesn't seem to be very
consistent with the rest of the API.  In particular, macros are not
defined in the SCM_ namespace and their name are pretty concise
(abbreviations, no underscores in between words, etc.).  Am I missing
something?

Also, will this be part of Guile 1.6.x too?

Thanks,
Ludovic.

Yesterday, 10 hours, 25 minutes, 3 seconds ago, Thien-Thi Nguyen wrote:
> folks,
> 
> libtool modules cannot depend on other libtool modules (according to the
> documentation on libltdl), so we can't co-opt that strategy if we want
> to design compiled modules that depend on other modules.  sure, we can
> throw up our hands and relegate everything to the system linker/loader,
> but why pass up a thorny question for others to play with?  i mean, to
> do system programming you need to grab control of the system, and to do
> module system programming you need to grab control of the module system.
> 
> so, this message actually presents two pieces of source code: (1) a
> pointer to the documented exploratory process:
> 
>   http://www.glug.org/people/ttn/software/cmod-play/
> 
> and (2) a request for comments on the tentative conclusions of the above
> exploration, as expressed by the additional <guile/modsup.h> interface
> elements excerpted below.  if All Goes Well, they will appear in guile
> 1.4.1.97.
> 
> in other news, i will be using this new support immediately for
> guile-sdl compiled modules (the motivation for all this, you see), so to
> spare everyone the agony guile-sdl will not be released until 1.4.1.97
> is out.  (however, everything works swimmingly from cvs, if you are
> feeling adventurous. :-)
> 
> thi
> 
> _____________________________________________________________
> /*:Return the @var{obj} given, but marked as "permanent".
>    This means that it can never be garbage collected.
> */
> #define GHSTONED(obj) \
>   scm_permanent_object (obj)
> 
> /*:Declare and later arrange for @var{cvar} (type SCM) to hold a resolved
>    module object for @var{fullname}, a C string such as "(ice-9 q)".  The
>    string is saved in a C variable named by prefixing "s_" to @var{cvar}.
>    You must use @var{cvar} as the second arg to @code{MUSEMODULEVAR}.
> */
> #define MUSEMODULE(cvar,fullname) \
> SCM_SNARF_HERE (static char * s_ ## cvar = fullname; static SCM cvar) \
> SCM_SNARF_INIT (cvar = GHSTONED (gh_resolve_module (s_ ## cvar));)
> 
> /*:Declare and later arrange for @var{cvar} (type SCM) to have the
>    same value as the imported module @var{m_cvar} variable @var{s_name}.
>    @var{m_cvar} is the SCM object declared with @code{MUSEMODULE}, and
>    @var{s_name} is a string such as "q-empty?".  If the imported value
>    is a procedure, you can use @code{gh_apply} or @code{gh_call0} through
>    @code{gh_call3} on it.
> */
> #define MUSEMODULEVAR(cvar,m_cvar,s_name) \
> SCM_SNARF_HERE (static SCM cvar) \
> SCM_SNARF_INIT (cvar = GHSTONED (gh_module_lookup (m_cvar, s_name));)
> 
> /*:Declare and define a procedure @var{cvar} that takes 0 (zero) args,
>    which returns the result of calling @code{gh_call0} on @var{proc_cvar}.
>    @var{proc_cvar} is the SCM object declared with @code{MUSEMODULEVAR}.
> */
> #define MUSEMODULEPROC0(cvar,proc_cvar) \
>   static SCM cvar (void) \
>   { return gh_call0 (proc_cvar); }
> 
> /*:Declare and define a procedure @var{cvar} that takes 1 (one) SCM arg,
>    which returns the result of calling @code{gh_call1} on @var{proc_cvar}
>    and this arg.  @var{proc_var} is the SCM object declared with
>    @{MUSEMODULEVAR}.
> */
> #define MUSEMODULEPROC1(cvar,proc_cvar) \
>   static SCM cvar (SCM a1) \
>   { return gh_call1 (proc_cvar, a1); }
> 
> /*:Declare and define a procedure @var{cvar} that takes 2 (two) SCM args,
>    which returns the result of calling @code{gh_call2} on @var{proc_cvar}
>    and the args.  @var{proc_var} is the SCM object declared with
>    @{MUSEMODULEVAR}.
> */
> #define MUSEMODULEPROC2(cvar,proc_cvar) \
>   static SCM cvar (SCM a1, SCM a2) \
>   { return gh_call2 (proc_cvar, a1, a2); }
> 
> /*:Declare and define a procedure @var{cvar} that takes 3 (three) SCM args,
>    which returns the result of calling @code{gh_call3} on @var{proc_cvar}
>    and the args.  @var{proc_var} is the SCM object declared with
>    @{MUSEMODULEVAR}.
> */
> #define MUSEMODULEPROC3(cvar,proc_cvar) \
>   static SCM cvar (SCM a1, SCM a2, SCM a3) \
>   { return gh_call3 (proc_cvar, a1, a2, a3); }
> 
> [modsup.h excerpt ends here]
> 
> 
> _______________________________________________
> Guile-user mailing list
> Guile-user@gnu.org
> http://mail.gnu.org/mailman/listinfo/guile-user


_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-user


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: cmod-play 1 available + modsup.h additions
  2003-11-14  8:26 ` Ludovic Courtès
@ 2003-11-14 13:10   ` Thien-Thi Nguyen
  2003-11-14 13:37     ` Ludovic Courtès
  2003-11-14 14:29   ` Marius Vollmer
  1 sibling, 1 reply; 21+ messages in thread
From: Thien-Thi Nguyen @ 2003-11-14 13:10 UTC (permalink / raw)
  Cc: guile-user

   From: Ludovic =?iso-8859-1?Q?Court=E8s?= <ludovic.courtes@laas.fr>
   Date: Fri, 14 Nov 2003 09:26:58 +0100

   The naming style of the macros in modsup.h doesn't seem to be very
   consistent with the rest of the API.  In particular, macros are not
   defined in the SCM_ namespace and their name are pretty concise
   (abbreviations, no underscores in between words, etc.).

i'd like to keep them out of SCM_ namespace, actually, and tried for the
"M" namespace, which is admittedly easy to stumble on inadvertantly by
other code.  guile 1.4.x has SCM_ and GH_ already, maybe GHM_ (for guile
high-level "module support") is better.  how does that sound?

   Also, will this be part of Guile 1.6.x too?

since it's an additive change, introducing these macros does not raise
compatibility issues, but of course there may be other criterion by
which they will be rejected.  there's a lot of stuff in 1.4.x[1] that
could easily be added to other guile branches but the will to do so has
not yet materialized (and might never).  that's not something i can
control so i don't worry about it, generally.

thi


________________________________________________________
[1] for example, machine assisted doc maintenance:
    http://www.glug.org/docbits/guile/1.4.x/Doc-Maintenance.html


_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-user


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: cmod-play 1 available + modsup.h additions
  2003-11-14 13:10   ` Thien-Thi Nguyen
@ 2003-11-14 13:37     ` Ludovic Courtès
  2003-11-14 17:38       ` Thien-Thi Nguyen
  0 siblings, 1 reply; 21+ messages in thread
From: Ludovic Courtès @ 2003-11-14 13:37 UTC (permalink / raw)
  Cc: guile-user

Hi,

Today, 24 minutes, 28 seconds ago, Thien-Thi Nguyen wrote:
> i'd like to keep them out of SCM_ namespace, actually, and tried for the
> "M" namespace, which is admittedly easy to stumble on inadvertantly by
> other code.  guile 1.4.x has SCM_ and GH_ already, maybe GHM_ (for guile
> high-level "module support") is better.  how does that sound?

I actually don't see the point in having different namespaces for Guile
things. :)

> since it's an additive change, introducing these macros does not raise
> compatibility issues, but of course there may be other criterion by
> which they will be rejected.  there's a lot of stuff in 1.4.x[1] that
> could easily be added to other guile branches but the will to do so has
> not yet materialized (and might never).  that's not something i can
> control so i don't worry about it, generally.

The doc snarfing example you gave is a nice thing.  Unfortunately, there
is no such thing in Guile 1.6.  Is there any reason for continuing
development for 1.4.x rather than working on 1.6.x?

Thanks,
Ludovic.


_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-user


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: cmod-play 1 available + modsup.h additions
  2003-11-13 21:55 cmod-play 1 available + modsup.h additions Thien-Thi Nguyen
  2003-11-14  8:26 ` Ludovic Courtès
@ 2003-11-14 14:17 ` Marius Vollmer
  2003-11-14 15:28   ` Does anyone have a better scm_string_hash ? Roland Orre
  2003-11-14 17:40   ` cmod-play 1 available + modsup.h additions Thien-Thi Nguyen
  1 sibling, 2 replies; 21+ messages in thread
From: Marius Vollmer @ 2003-11-14 14:17 UTC (permalink / raw)
  Cc: guile-user

Thien-Thi Nguyen <ttn@surf.glug.org> writes:

> libtool modules cannot depend on other libtool modules (according to the
> documentation on libltdl),

The libtool documentation seems to be out-of-date with respect to
this.  libltdl from libtool 1.5 (and probably earlier versions) does
load the 'dependency_libs' listed in a .la file.  (We have been
relying on this in guile-gtk for quite some time now.)

-- 
GPG: D5D4E405 - 2F9B BCCC 8527 692A 04E3  331E FAF8 226A D5D4 E405


_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-user


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: cmod-play 1 available + modsup.h additions
  2003-11-14  8:26 ` Ludovic Courtès
  2003-11-14 13:10   ` Thien-Thi Nguyen
@ 2003-11-14 14:29   ` Marius Vollmer
  1 sibling, 0 replies; 21+ messages in thread
From: Marius Vollmer @ 2003-11-14 14:29 UTC (permalink / raw)


Ludovic Courtès <ludovic.courtes@laas.fr> writes:

> Also, will this be part of Guile 1.6.x too?

In 1.6 and going forward, 'compiled modules' are done differently.  A
'use-modules' statement, say, does not directly load a shared library.
Instead, it loads a small Scheme wrapper file that then in turn loads
the shared library via 'load-extension'.  That way, modules can depend
on one another and load each other in the same way used for Scheme
code.

A second problem is that a shared library might depend on a shared
library that is not asssociated with a module.  For example, the
module (gtk-1.2 gtk) loads libguile-gtk-1.2.so (via load-extension)
and libguile-gtk-1.2.so depends on libgtk.so which in turn depends on
libgdk.so, etc.  The (gtk-1.2 gtk) module only loads
libguile-gtk-1.2.so and the remaining libs need to be loaded
automatically.  This does indeed happen with libltdl (which is used by
Guile).

As far as I understand it, Thien's proposal does only solve the first
problem, that of modules depending on each other, but not the second
one, that of a shared library depending on another shared library that
is not used by/as a module.

The first problem does not exist in 1.6 since shared libraries are not
modules anylonger, the second one does not exist since libltdl handles
it already correctly.

( 1.6 still supports loading shared libraries as modules, but
  deprecates it. )

-- 
GPG: D5D4E405 - 2F9B BCCC 8527 692A 04E3  331E FAF8 226A D5D4 E405


_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-user


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Does anyone have a better scm_string_hash ?
  2003-11-14 14:17 ` Marius Vollmer
@ 2003-11-14 15:28   ` Roland Orre
  2003-11-14 15:51     ` Ludovic Courtès
  2003-11-14 17:40   ` cmod-play 1 available + modsup.h additions Thien-Thi Nguyen
  1 sibling, 1 reply; 21+ messages in thread
From: Roland Orre @ 2003-11-14 15:28 UTC (permalink / raw)


I noticed that our data mining software run very very slow with a
new data base. I localized the problem to scm_string_hash.

A hash table in this case was loaded with 14166 strings. I have
a function which creates a reasonable sized hash table, in this
case the hash table size was 8209.

13856 of these strings were hashed to the same index= 1067.
303 strings got index = 8061.
2 strings got the index = 754.
8201 entries were empty.

We are running guile 1.6 but I checked the scm_string_hash from a recent
1.7 CVS also and the function in hash.c there is identical.

I added a few of the symbols hashing to 1067 below. One can of course
argue that the symbols in this case should be hashed as numbers.
Anyway, does anyone have any hint or have a better string hash function?

	Best regards
	Roland Orre


A few strings hashing to entry 1067 for hash table length 8209:
"01632001" "01627301" "01626801" "01626601" "01626501" "01626401"
"01626301" "01626101" "01626001" "01625901" "01625801" "01625701"
"01625401" "01625301" "01625101" "01625001" "01624801" "01624601"
"01624501" "01624401" "01624301" "01624101" "01624001" "01623901"
"01623801" "01623701" "01623601" "01623501" "01623401" "01623201"
"01623101" "01622901" "01622801" "01622701" "01622601" "01622401"




_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-user


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Does anyone have a better scm_string_hash ?
  2003-11-14 15:28   ` Does anyone have a better scm_string_hash ? Roland Orre
@ 2003-11-14 15:51     ` Ludovic Courtès
  2003-11-17  8:33       ` Roland Orre
  0 siblings, 1 reply; 21+ messages in thread
From: Ludovic Courtès @ 2003-11-14 15:51 UTC (permalink / raw)
  Cc: guile-user

You might want to look at
http://srfi.schemers.org/srfi-13/mail-archive/msg00112.html .

Basically, the idea there is that a hash key for string cn..c0 is:
     h = (c0 + c1*37 + c2*37^2 + ...) % hash_size.
I used that and it *seems* to work quite well.

This can be written as follows:

  k = 0;
  while (*string)
    {
      k  = (k * 37) + (*string);
      k %= hash_size;
      string++;
    }
  return k;

Cheers,
Ludovic.

Today, 15 minutes, 40 seconds ago, Roland Orre wrote:
> I noticed that our data mining software run very very slow with a
> new data base. I localized the problem to scm_string_hash.
> 
> A hash table in this case was loaded with 14166 strings. I have
> a function which creates a reasonable sized hash table, in this
> case the hash table size was 8209.
> 
> 13856 of these strings were hashed to the same index= 1067.
> 303 strings got index = 8061.
> 2 strings got the index = 754.
> 8201 entries were empty.
> 
> We are running guile 1.6 but I checked the scm_string_hash from a recent
> 1.7 CVS also and the function in hash.c there is identical.
> 
> I added a few of the symbols hashing to 1067 below. One can of course
> argue that the symbols in this case should be hashed as numbers.
> Anyway, does anyone have any hint or have a better string hash function?
> 
> 	Best regards
> 	Roland Orre
> 
> 
> A few strings hashing to entry 1067 for hash table length 8209:
> "01632001" "01627301" "01626801" "01626601" "01626501" "01626401"
> "01626301" "01626101" "01626001" "01625901" "01625801" "01625701"
> "01625401" "01625301" "01625101" "01625001" "01624801" "01624601"
> "01624501" "01624401" "01624301" "01624101" "01624001" "01623901"
> "01623801" "01623701" "01623601" "01623501" "01623401" "01623201"
> "01623101" "01622901" "01622801" "01622701" "01622601" "01622401"
> 
> 
> 
> 
> _______________________________________________
> Guile-user mailing list
> Guile-user@gnu.org
> http://mail.gnu.org/mailman/listinfo/guile-user


_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-user


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: cmod-play 1 available + modsup.h additions
  2003-11-14 13:37     ` Ludovic Courtès
@ 2003-11-14 17:38       ` Thien-Thi Nguyen
  0 siblings, 0 replies; 21+ messages in thread
From: Thien-Thi Nguyen @ 2003-11-14 17:38 UTC (permalink / raw)
  Cc: guile-user

   From: Ludovic =?iso-8859-1?Q?Court=E8s?= <ludovic.courtes@laas.fr>
   Date: Fri, 14 Nov 2003 14:37:38 +0100

   I actually don't see the point in having different namespaces for
   Guile things. :)

what you don't see i can't make you see, but i'm happy to understand
your not seeing and leave it at that.  i'll probably change "M" to
"GH_M", that is, add to GH_ rather than appropriate another namespace
entirely.  thanks for the feedback.

   The doc snarfing example you gave is a nice thing.  Unfortunately,
   there is no such thing in Guile 1.6.  Is there any reason for
   continuing development for 1.4.x rather than working on 1.6.x?

there are practical reasons having to do w/ lack of write access to the
repository, primarily.  a secondary reason is that 1.6.x introduces
onerous "deprecation" machinery, and insists on installing lots of
un-neighborly shared object libraries in ${libdir}, which would be
difficult to reconcile w/ 1.4.x's compatibility and encapsulation goals.
1.7.x has lots of internal improvements (that i will snarf for 1.4.x
once they settle down) but its packaging still suffers the same problems
as 1.6.x as far as i can tell (from watching the cvs commits m.l.).

re doc snarfing, you might try to simply overlay the 1.4.x scripts/
subdir over that of the other branches, or add its parent dir to
GUILE_LOAD_PATH, or explore guile-tools options `--scriptsdir' and
`--guileversion' (see "guile-tools --help").

i've been toying w/ the idea of excerpting and releasing scripts/ as a
separate package entirely, for the benefit of other guile branches, but
haven't found the time/inclination.  feel free to complete this program
and put your name on it, etc.:

  #!/bin/sh
  # $1 is an optional tag spec, like "-D DATE" or "-r TAG"
  set -ex
  cvs -d :pserver:anonymous@cvs.glug.org:/home/ttn/cvsroot \
      co -d guile-scripts $1 guile-core/scripts
  cd guile-scripts
  cp -p ${someplace}/{autogen.sh,configure.ac,Makefile.am,...} .
  sh -x autogen.sh
  ./configure -C --enable-maintainer-mode --prefix /tmp/a/b/c
  make
  make check
  make install
  make installcheck
  make uninstall
  make dist
  mv guile-scripts-*.tar.gz ..
  cd ..
  rm -rf guile-scripts

you just need to define ${someplace} and put the correct files there.
if you publish this package you would help other users invest in their
freedom (should they have the desire to do so).

thi


_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-user


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: cmod-play 1 available + modsup.h additions
  2003-11-14 14:17 ` Marius Vollmer
  2003-11-14 15:28   ` Does anyone have a better scm_string_hash ? Roland Orre
@ 2003-11-14 17:40   ` Thien-Thi Nguyen
  1 sibling, 0 replies; 21+ messages in thread
From: Thien-Thi Nguyen @ 2003-11-14 17:40 UTC (permalink / raw)
  Cc: guile-user

   From: Marius Vollmer <mvo@zagadka.de>
   Date: Fri, 14 Nov 2003 15:17:43 +0100

   libltdl from libtool 1.5 (and probably earlier versions) does
   load the 'dependency_libs' listed in a .la file.

cool, will check it out.  thanks for the tip.

thi


_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-user


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Does anyone have a better scm_string_hash ?
  2003-11-14 15:51     ` Ludovic Courtès
@ 2003-11-17  8:33       ` Roland Orre
  2003-11-17 13:01         ` Ludovic Courtès
  2003-11-17 15:42         ` Marius Vollmer
  0 siblings, 2 replies; 21+ messages in thread
From: Roland Orre @ 2003-11-17  8:33 UTC (permalink / raw)
  Cc: guile-user

Thanks,
I replaced the scm_string_hash with the one below, reminding about what
Ludovic Courtès suggested. Of course it could have been preferrably to
use it as a private hash function only, but I prefer to use the
standard functions at all places, especially as I hadn't designed
around using a private hash function (it was used in many places...).

The performance increase we obtained with this hash function were
tremendous. With the problematic symbol table the average lookup
time went down from 10 ms to 9 us on this specific machine (1 GHz PIII).
The max number of collisions went down from 13856 to 7 (on a total
of 14166 strings in a table of length 8209). The number of empty
entries went down from 8201 to 909. The run time for a data mining
scan on a database went down from 5 hours to 15 minutes. The reason
I added i in the hash function was to decrease the risk that two
strings with same but permuted contents would hash to the same value.
I have not done any thorough tests on this hash function yet.

	Best regards
	Roland Orre

/* replaced by Roland Orre orre@nada.kth.se */
#define hash_mask 0x00FFFFFF
unsigned long 
scm_string_hash (const unsigned char *str, size_t len)
{
  /* from suggestion at: */
  /* http://srfi.schemers.org/srfi-13/mail-archive/msg00112.html */
  int i;
  /* originally h=0 was suggested. orre@nada.kth.se */
  unsigned long h=177;

  for (i = len-1; i >= 0; i--)
    {
      /* h = (str[i] +i+ h*37) & hash_mask; */
      h = (str[i] + i + (h<<5) + (h<<2) + h) & hash_mask;
    }
  return h;
} /* scm_string_hash */




On Fri, 2003-11-14 at 16:51, Ludovic Courtès wrote:
> You might want to look at
> http://srfi.schemers.org/srfi-13/mail-archive/msg00112.html .
> 
> Basically, the idea there is that a hash key for string cn..c0 is:
>      h = (c0 + c1*37 + c2*37^2 + ...) % hash_size.
> I used that and it *seems* to work quite well.
> 
> This can be written as follows:
> 
>   k = 0;
>   while (*string)
>     {
>       k  = (k * 37) + (*string);
>       k %= hash_size;
>       string++;
>     }
>   return k;
> 
> Cheers,
> Ludovic.
> 
> Today, 15 minutes, 40 seconds ago, Roland Orre wrote:
> > I noticed that our data mining software run very very slow with a
> > new data base. I localized the problem to scm_string_hash.
> > 
> > A hash table in this case was loaded with 14166 strings. I have
> > a function which creates a reasonable sized hash table, in this
> > case the hash table size was 8209.
> > 
> > 13856 of these strings were hashed to the same index= 1067.
> > 303 strings got index = 8061.
> > 2 strings got the index = 754.
> > 8201 entries were empty.
> > 
> > We are running guile 1.6 but I checked the scm_string_hash from a recent
> > 1.7 CVS also and the function in hash.c there is identical.
> > 
> > I added a few of the symbols hashing to 1067 below. One can of course
> > argue that the symbols in this case should be hashed as numbers.
> > Anyway, does anyone have any hint or have a better string hash function?
> > 
> > 	Best regards
> > 	Roland Orre
> > 
> > 
> > A few strings hashing to entry 1067 for hash table length 8209:
> > "01632001" "01627301" "01626801" "01626601" "01626501" "01626401"
> > "01626301" "01626101" "01626001" "01625901" "01625801" "01625701"
> > "01625401" "01625301" "01625101" "01625001" "01624801" "01624601"
> > "01624501" "01624401" "01624301" "01624101" "01624001" "01623901"
> > "01623801" "01623701" "01623601" "01623501" "01623401" "01623201"
> > "01623101" "01622901" "01622801" "01622701" "01622601" "01622401"
> > 
> > 
> > 
> > 
> > _______________________________________________
> > Guile-user mailing list
> > Guile-user@gnu.org
> > http://mail.gnu.org/mailman/listinfo/guile-user



_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-user


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Does anyone have a better scm_string_hash ?
  2003-11-17  8:33       ` Roland Orre
@ 2003-11-17 13:01         ` Ludovic Courtès
  2003-11-17 15:42         ` Marius Vollmer
  1 sibling, 0 replies; 21+ messages in thread
From: Ludovic Courtès @ 2003-11-17 13:01 UTC (permalink / raw)
  Cc: guile-user

Hi,

Today, 3 hours, 58 minutes, 19 seconds ago, Roland Orre wrote:
> The performance increase we obtained with this hash function were
> tremendous. With the problematic symbol table the average lookup
> time went down from 10 ms to 9 us on this specific machine (1 GHz PIII).
> The max number of collisions went down from 13856 to 7 (on a total
> of 14166 strings in a table of length 8209). The number of empty
> entries went down from 8201 to 909. The run time for a data mining
> scan on a database went down from 5 hours to 15 minutes. The reason
> I added i in the hash function was to decrease the risk that two
> strings with same but permuted contents would hash to the same value.
> I have not done any thorough tests on this hash function yet.

Looks cool.

> /* replaced by Roland Orre orre@nada.kth.se */
> #define hash_mask 0x00FFFFFF
> unsigned long 
> scm_string_hash (const unsigned char *str, size_t len)
> {
>   /* from suggestion at: */
>   /* http://srfi.schemers.org/srfi-13/mail-archive/msg00112.html */
>   int i;
>   /* originally h=0 was suggested. orre@nada.kth.se */
>   unsigned long h=177;
> 
>   for (i = len-1; i >= 0; i--)
>     {
>       /* h = (str[i] +i+ h*37) & hash_mask; */
>       h = (str[i] + i + (h<<5) + (h<<2) + h) & hash_mask;
>     }
>   return h;
> } /* scm_string_hash */

I don't think you need to compute a modulo (logical and with HASH_MASK)
of the hash key, because scm_hasher () returns scm_string_hash () % n.

Thanks,
Ludovic.


_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-user


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Does anyone have a better scm_string_hash ?
  2003-11-17  8:33       ` Roland Orre
  2003-11-17 13:01         ` Ludovic Courtès
@ 2003-11-17 15:42         ` Marius Vollmer
  2003-11-17 16:02           ` Marius Vollmer
  1 sibling, 1 reply; 21+ messages in thread
From: Marius Vollmer @ 2003-11-17 15:42 UTC (permalink / raw)
  Cc: guile-user

Roland Orre <orre@nada.kth.se> writes:

> /* replaced by Roland Orre orre@nada.kth.se */
> #define hash_mask 0x00FFFFFF
> unsigned long 
> scm_string_hash (const unsigned char *str, size_t len)
> {
>   /* from suggestion at: */
>   /* http://srfi.schemers.org/srfi-13/mail-archive/msg00112.html */
>   int i;
>   /* originally h=0 was suggested. orre@nada.kth.se */
>   unsigned long h=177;
>
>   for (i = len-1; i >= 0; i--)
>     {
>       /* h = (str[i] +i+ h*37) & hash_mask; */
>       h = (str[i] + i + (h<<5) + (h<<2) + h) & hash_mask;
>     }
>   return h;
> } /* scm_string_hash */

g_string_hash from glib uses the same approach, but with 31 instead of
37 and with a h=0 seed:

    /* 31 bit hash function */
    guint
    g_str_hash (gconstpointer key)
    {
      const char *p = key;
      guint h = *p;

      if (h)
        for (p += 1; *p != '\0'; p++)
          h = (h << 5) - h + *p;

      return h;
    }

So which one is it? :)

Both should be much better than the one we have right now and I'm
going to install the one from Roland without the hash_mask.

OK?

-- 
GPG: D5D4E405 - 2F9B BCCC 8527 692A 04E3  331E FAF8 226A D5D4 E405


_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-user


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Does anyone have a better scm_string_hash ?
  2003-11-17 15:42         ` Marius Vollmer
@ 2003-11-17 16:02           ` Marius Vollmer
  2003-11-17 16:29             ` Marius Vollmer
  2003-11-19  9:04             ` Does anyone have a better scm_string_hash ? Ludovic Courtès
  0 siblings, 2 replies; 21+ messages in thread
From: Marius Vollmer @ 2003-11-17 16:02 UTC (permalink / raw)
  Cc: guile-user

Marius Vollmer <mvo@zagadka.de> writes:

> Both should be much better than the one we have right now and I'm
> going to install the one from Roland without the hash_mask.

Ok, I have installed the following hash function into 1.7.  As a nice
side effect, guile seems to start a bit faster now.  (Although the new
function does more work than the old.  Talk about being too clever.)

    unsigned long 
    scm_string_hash (const unsigned char *str, size_t len)
    {
      /* from suggestion at: */
      /* http://srfi.schemers.org/srfi-13/mail-archive/msg00112.html */

      unsigned long h = 0;
      while (len-- > 0)
        {
          /* h = (str[i] + h*37) % 2^bits_per_long; */
          h = *str++ + (h<<5) + (h<<2) + h;
        }
      return h;
    }

I removed 'i' from the calculation since neither the glib function nor
the one from Java uses it and I'm wary of heuristical magic when it
comes to random number generation and hash functions.  Permutations of
the input will still lead to different hash values since, for example
str[0]+str[1]*37 is different from str[1]+str[0]*37.

Just for kicks, I'm now going to see what kind of code GCC generates
for h*37 as compared to (h<<5) + (h<<2) + h...

-- 
GPG: D5D4E405 - 2F9B BCCC 8527 692A 04E3  331E FAF8 226A D5D4 E405


_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-devel


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Does anyone have a better scm_string_hash ?
  2003-11-17 16:02           ` Marius Vollmer
@ 2003-11-17 16:29             ` Marius Vollmer
  2003-11-17 16:48               ` Allister MacLeod
  2003-11-19  9:04             ` Does anyone have a better scm_string_hash ? Ludovic Courtès
  1 sibling, 1 reply; 21+ messages in thread
From: Marius Vollmer @ 2003-11-17 16:29 UTC (permalink / raw)
  Cc: guile-user

Marius Vollmer <mvo@zagadka.de> writes:

> Just for kicks, I'm now going to see what kind of code GCC generates
> for h*37 as compared to (h<<5) + (h<<2) + h...

Interesting.  For h = a + (h<<5) + (h<<2) + h we get this sequence
(one line is one machine instruction):

    x = h
    x = x << 5
    a = a + x
    a = a + h*4
    h = a + h

and for h = a + h*37 we get

    x = h + h*8
    x = h + x*4
    h = x + a*1

which is nearly twice as clever...

-- 
GPG: D5D4E405 - 2F9B BCCC 8527 692A 04E3  331E FAF8 226A D5D4 E405


_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-devel


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Does anyone have a better scm_string_hash ?
  2003-11-17 16:29             ` Marius Vollmer
@ 2003-11-17 16:48               ` Allister MacLeod
  2003-11-17 17:57                 ` Marius Vollmer
  0 siblings, 1 reply; 21+ messages in thread
From: Allister MacLeod @ 2003-11-17 16:48 UTC (permalink / raw)


On Mon, Nov 17, 2003 at 05:29:48PM +0100, Marius Vollmer wrote:
> Marius Vollmer <mvo@zagadka.de> writes:
> > Just for kicks, I'm now going to see what kind of code GCC generates
> > for h*37 as compared to (h<<5) + (h<<2) + h...
> Interesting.  For h = a + (h<<5) + (h<<2) + h we get this sequence
> (one line is one machine instruction):
>     x = h
>     x = x << 5
>     a = a + x
>     a = a + h*4
>     h = a + h
> and for h = a + h*37 we get
>     x = h + h*8
>     x = h + x*4
>     h = x + a*1
> which is nearly twice as clever...

Just for reference, what are the timings on a Pentium for these
instructions?  I assume from your pleased comments that
assign-from-plus-and-multiply is sufficiently fast that two of them
are less costly than a pair of assign-from-plus, an assign-from-shift,
and an assignment.  I'm curious as to why the last line of the second
way ends up being h=x+a*1, instead of h=x+a.  Are a series of
same-width instructions faster than ones with different widths?
(gawd, CISC rox, bleh)

Cheers,
 Allister

-- 
Allister MacLeod <amacleod@mv3d.com> | http://amacleod.is-a-geek.org/
 Elen síla lúmenn'omentielvo.


_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-user


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Does anyone have a better scm_string_hash ?
  2003-11-17 16:48               ` Allister MacLeod
@ 2003-11-17 17:57                 ` Marius Vollmer
  2003-11-17 19:17                   ` OT: x86 assembly timings/size (was Re: Does anyone have a better scm_string_hash ?) Allister MacLeod
  0 siblings, 1 reply; 21+ messages in thread
From: Marius Vollmer @ 2003-11-17 17:57 UTC (permalink / raw)


Allister MacLeod <amacleod@mv3d.com> writes:

> Just for reference, what are the timings on a Pentium for these
> instructions?

I don't know...

> I assume from your pleased comments that
> assign-from-plus-and-multiply is sufficiently fast

"x = h + h*8" is in reality "lea (%edi,%edi,8),%edx" which is not a
general multiply-add.

> I'm curious as to why the last line of the second
> way ends up being h=x+a*1, instead of h=x+a.

It's "lea (%edx,%eax,1),%edi".  Maybe the x86 doesn't have a 3-op add
instruction?

-- 
GPG: D5D4E405 - 2F9B BCCC 8527 692A 04E3  331E FAF8 226A D5D4 E405


_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-user


^ permalink raw reply	[flat|nested] 21+ messages in thread

* OT: x86 assembly timings/size (was Re: Does anyone have a better scm_string_hash ?)
  2003-11-17 17:57                 ` Marius Vollmer
@ 2003-11-17 19:17                   ` Allister MacLeod
  2003-11-17 21:27                     ` OT: x86 assembly timings/size Marius Vollmer
  0 siblings, 1 reply; 21+ messages in thread
From: Allister MacLeod @ 2003-11-17 19:17 UTC (permalink / raw)


On Mon, Nov 17, 2003 at 06:57:58PM +0100, Marius Vollmer wrote:
> Allister MacLeod <amacleod@mv3d.com> writes:
> > I assume from your pleased comments that
> > assign-from-plus-and-multiply is sufficiently fast
> "x = h + h*8" is in reality "lea (%edi,%edi,8),%edx" which is not a
> general multiply-add.

Ok.. judging from that, I'd guess that x86 has a set of instructions
of the form: "lea (R1,R2,N),D" where R1, R2, and D are registers, and
N is a power of 2.  Perhaps a better made-up name for it would be
"assign from plus and multiply by power of two."  I forget the
original expansion of lea.. maybe load extended address?

I really ought to go look at an assembly reference myself.  A brief
google for "x86 assembly timings lea" didn't turn up anything
immediately and eminently useful in the first few hits.  It did list a
PDF about Pentium-optimized hashing, though.

> > I'm curious as to why the last line of the second
> > way ends up being h=x+a*1, instead of h=x+a.
> It's "lea (%edx,%eax,1),%edi".  Maybe the x86 doesn't have a 3-op add
> instruction?

Well, I think your transliteration of the first method showed some
assignments from an addition.  Looking back at it, though, it looks
like in each case one of the operands to the addition is also the
destination for the result.  More like += than =...+...

Anyway, probably the one with just lea's is slightly faster.
Certainly unless the D=R1+R2*N instruction is more than twice as big
as D=D+R1 or D=D<<N, the resulting binary will be smaller :^)

Cheers,
 Allister

-- 
Allister MacLeod <amacleod@mv3d.com> | http://amacleod.is-a-geek.org/
 Elen síla lúmenn'omentielvo.


_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-user


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: OT: x86 assembly timings/size
  2003-11-17 19:17                   ` OT: x86 assembly timings/size (was Re: Does anyone have a better scm_string_hash ?) Allister MacLeod
@ 2003-11-17 21:27                     ` Marius Vollmer
  0 siblings, 0 replies; 21+ messages in thread
From: Marius Vollmer @ 2003-11-17 21:27 UTC (permalink / raw)


Allister MacLeod <amacleod@mv3d.com> writes:

> I forget the original expansion of lea.. maybe load extended
> address?

My guess would be "load effective address"... but it's not _really_
relevant, right? :-)

> I really ought to go look at an assembly reference myself.  A brief
> google for "x86 assembly timings lea" didn't turn up anything
> immediately and eminently useful in the first few hits.

I doubt that there are any simple and reliable timing diagrams for
IA-32 instructions in general.  I would imagine tho that lea uses the
addressing unit while shl goes thru the regular integer unit, on chips
that make the distinction.  It's all pretty idle speculation on my
part, tho.  Assembler used to be fun on the 68k but I've never written
more than three consecutive lines of x86 assembler.

> Anyway, probably the one with just lea's is slightly faster.
> Certainly unless the D=R1+R2*N instruction is more than twice as big
> as D=D+R1 or D=D<<N, the resulting binary will be smaller :^)

The main point is that it doesn't pay to try to outsmart the compiler.
Isn't that a comfortable insight?  I wouldn't want to worry whether
n-- > 0 is faster than --n >= 0.  Especially when n is unsigned...

-- 
GPG: D5D4E405 - 2F9B BCCC 8527 692A 04E3  331E FAF8 226A D5D4 E405


_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-user


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Does anyone have a better scm_string_hash ?
  2003-11-17 16:02           ` Marius Vollmer
  2003-11-17 16:29             ` Marius Vollmer
@ 2003-11-19  9:04             ` Ludovic Courtès
  2003-11-19 15:02               ` Marius Vollmer
  1 sibling, 1 reply; 21+ messages in thread
From: Ludovic Courtès @ 2003-11-19  9:04 UTC (permalink / raw)
  Cc: guile-user, guile-devel

Hi,

One day, 17 hours, 34 seconds ago, 
Marius Vollmer wrote:
> Ok, I have installed the following hash function into 1.7.  As a nice
> side effect, guile seems to start a bit faster now.  (Although the new
> function does more work than the old.  Talk about being too clever.)

BTW, is this function also used for hashing symbol names?  If so, then
it must slightly improve performance!

Thanks,
Ludovic.


_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-user


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Does anyone have a better scm_string_hash ?
  2003-11-19  9:04             ` Does anyone have a better scm_string_hash ? Ludovic Courtès
@ 2003-11-19 15:02               ` Marius Vollmer
  0 siblings, 0 replies; 21+ messages in thread
From: Marius Vollmer @ 2003-11-19 15:02 UTC (permalink / raw)
  Cc: guile-user

Ludovic Courtès <ludovic.courtes@laas.fr> writes:

> Hi,
>
> BTW, is this function also used for hashing symbol names?

Yes.

-- 
GPG: D5D4E405 - 2F9B BCCC 8527 692A 04E3  331E FAF8 226A D5D4 E405


_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-user


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2003-11-19 15:02 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-11-13 21:55 cmod-play 1 available + modsup.h additions Thien-Thi Nguyen
2003-11-14  8:26 ` Ludovic Courtès
2003-11-14 13:10   ` Thien-Thi Nguyen
2003-11-14 13:37     ` Ludovic Courtès
2003-11-14 17:38       ` Thien-Thi Nguyen
2003-11-14 14:29   ` Marius Vollmer
2003-11-14 14:17 ` Marius Vollmer
2003-11-14 15:28   ` Does anyone have a better scm_string_hash ? Roland Orre
2003-11-14 15:51     ` Ludovic Courtès
2003-11-17  8:33       ` Roland Orre
2003-11-17 13:01         ` Ludovic Courtès
2003-11-17 15:42         ` Marius Vollmer
2003-11-17 16:02           ` Marius Vollmer
2003-11-17 16:29             ` Marius Vollmer
2003-11-17 16:48               ` Allister MacLeod
2003-11-17 17:57                 ` Marius Vollmer
2003-11-17 19:17                   ` OT: x86 assembly timings/size (was Re: Does anyone have a better scm_string_hash ?) Allister MacLeod
2003-11-17 21:27                     ` OT: x86 assembly timings/size Marius Vollmer
2003-11-19  9:04             ` Does anyone have a better scm_string_hash ? Ludovic Courtès
2003-11-19 15:02               ` Marius Vollmer
2003-11-14 17:40   ` cmod-play 1 available + modsup.h additions Thien-Thi Nguyen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).