unofficial mirror of guile-user@gnu.org 
* argz SMOB
@ 2004-01-05 17:40 Brian S McQueen
  2004-01-06 19:54 ` Daniel Skarda
  0 siblings, 1 reply; 19+ messages in thread
From: Brian S McQueen @ 2004-01-05 17:40 UTC (permalink / raw)


I have the beginnings of a SMOB library for GNU libc's argz.  I will also
be adding the envz functionality.

Brian McQueen


_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-user



* Re: argz SMOB
  2004-01-05 17:40 argz SMOB Brian S McQueen
@ 2004-01-06 19:54 ` Daniel Skarda
  2004-01-08 16:44   ` Brian S McQueen
  2004-01-15 18:43   ` Brian S McQueen
  0 siblings, 2 replies; 19+ messages in thread
From: Daniel Skarda @ 2004-01-06 19:54 UTC (permalink / raw)
  Cc: guile-user

Hello,

Brian S McQueen <bqueen@nas.nasa.gov> writes:
> I have the beginnings of a SMOB library for GNU libc's argz.  I will also
> be adding the envz functionality.

 I am sorry, I do not see what it is good for. Could you please explain what
itch you are trying to scratch with this SMOB library?

0.



* Re: argz SMOB
  2004-01-06 19:54 ` Daniel Skarda
@ 2004-01-08 16:44   ` Brian S McQueen
  2004-01-09 14:08     ` Daniel Skarda
  2004-01-15 18:43   ` Brian S McQueen
  1 sibling, 1 reply; 19+ messages in thread
From: Brian S McQueen @ 2004-01-08 16:44 UTC (permalink / raw)
  Cc: guile-user

I have several libraries in use here which make database queries,
returning all the results in an argz (actually as an envz).  I wanted to
be able to tweak anything via guile. So I added guile to the project,
giving me the ability to tweak from a script file without recompiling.

Brian

On Tue, 6 Jan 2004, Daniel Skarda wrote:

> Hello,
>
> Brian S McQueen <bqueen@nas.nasa.gov> writes:
> > I have the beginnings of a SMOB library for GNU libc's argz.  I will also
> > be adding the envz functionality.
>
>  I am sorry I do not see what is it good for. Could you please explain what is
> your itch you are trying to scratch with this SMOB library?
>
> 0.
>



* Re: argz SMOB
  2004-01-08 16:44   ` Brian S McQueen
@ 2004-01-09 14:08     ` Daniel Skarda
  2004-01-12 16:08       ` Brian S McQueen
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel Skarda @ 2004-01-09 14:08 UTC (permalink / raw)
  Cc: guile-user


> I have several libraries in use here which make database queries,
> returning all the results in an argz (actually as an envz).  I wanted to
> be able to tweak anything via guile. So I added guile to the project,
> giving me the ability to tweak from a script file without recompiling.

  Before I sent you my message, I read the argz/envz description in the libc
reference manual and wondered why anybody would want to use such a strange data
structure (which I still do not understand :)

  IMHO it would be better to convert the output of your database queries to
lists of strings and symbols (or alists in the case of envz). They are a more
"natural" and "convenient" way to represent data in Scheme/Lisp, and for
processing results you can use tools already present in Guile (for-each, map,
fold, ...) instead of reinventing your own for argz/envz SMOBs.

0.



* Re: argz SMOB
  2004-01-09 14:08     ` Daniel Skarda
@ 2004-01-12 16:08       ` Brian S McQueen
  0 siblings, 0 replies; 19+ messages in thread
From: Brian S McQueen @ 2004-01-12 16:08 UTC (permalink / raw)
  Cc: guile-user

The db libraries are a fact I must live with here.  But I made the SMOB
interface, started fiddling with it, and noticed that, since the null
character is allowed in a Scheme string, the entire argz can be viewed as a
single string, and the Guile string interface can be used.  I will try it
today.

BTW - GNU C's argz and envz are quite simple to use.  You can bust up a
query string into a hash-like structure with near-Perl-like ease.  You do
not have to allocate memory yourself; the argz functions do it for you.
Another very cool feature of GNU C is the obstack.
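
As a rough illustration of busting up a query string, here is a minimal
sketch using glibc's argz_create_sep and envz_get.  The helper names
parse_query and query_value, and the sample query string in the usage
note, are hypothetical and only for illustration.  glibc allocates the
vector with malloc, so the caller frees it once when done.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <argz.h>
#include <assert.h>
#include <envz.h>
#include <stdlib.h>
#include <string.h>

/* Split "name=value&name=value" on '&' into an argz vector that can be
   searched as an envz.  Returns 0 on success, -1 on allocation failure.
   On success the caller owns *envz and must free() it. */
int parse_query(const char *query, char **envz, size_t *envz_len)
{
  assert(query != NULL && envz != NULL && envz_len != NULL);
  return argz_create_sep(query, '&', envz, envz_len) == 0 ? 0 : -1;
}

/* Look up NAME.  Returns a pointer into the vector (not a copy), or
   NULL if there is no entry for NAME. */
const char *query_value(const char *envz, size_t envz_len, const char *name)
{
  assert(name != NULL);
  return envz_get(envz, envz_len, name);
}
```

For example, after parse_query("login=alice&host=db1&shell=bash", &envz,
&len), query_value(envz, len, "host") points at "db1" inside the vector,
and a single free(envz) releases everything.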

I will post again when I test out the SCM string to C argz.

Brian

On Fri, 9 Jan 2004, Daniel Skarda wrote:

>
> > I have several libraries in use here which make database queries,
> > returning all the results in an argz (actually as an envz).  I wanted to
> > be able to tweak anything via guile. So I added guile to the project,
> > giving me the ability to tweak from a script file without recompiling.
>
>   Before I sent you my message, I read argz/envz description in libc reference
> manual and wondered why anybody would like to use such strange data structure.
> (which I still do not understand :)
>
>   IMHO it would be better to convert output from your database queries to lists
> of strings and symbols (or alists in case of envz). They are more "natural" and
> "convenient" way for data representation in Scheme/Lisp and for processing
> results you can use tools already present in Guile (for-each, map, fold, ...)
> instead of reinventing your own for argz/envz SMOBS.
>
> 0.
>



* Re: argz SMOB
  2004-01-06 19:54 ` Daniel Skarda
  2004-01-08 16:44   ` Brian S McQueen
@ 2004-01-15 18:43   ` Brian S McQueen
  2004-01-16  0:21     ` Paul Jarc
  1 sibling, 1 reply; 19+ messages in thread
From: Brian S McQueen @ 2004-01-15 18:43 UTC (permalink / raw)
  Cc: guile-user

Since Scheme strings can contain the null character, I noticed there was
no need for an argz SMOB.  I removed it and used Scheme strings instead,
and it works excellently.  The C functions that produce argzs are all
callable from Scheme, since an argz is defined by a pointer and a length,
just as Scheme strings are.  The C functions are wrapped in simple
functions which use the Guile API, and the trailing null char in each argz
is dropped.
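
To make the "pointer and a length" point concrete, here is a minimal
sketch in plain C, without Guile.  The two helper names are hypothetical.
It shows the length to use when the final terminator should be dropped
(guarding the empty case), and that glibc's argz_next can walk the
entries given nothing but the base pointer and the total length.

```c
#include <argz.h>
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length to hand to a string constructor when the final terminator
   should not become part of the string.  An empty argz has length 0,
   so the subtraction is guarded. */
size_t argz_len_without_final_nul(const char *argz, size_t argz_len)
{
  assert(argz != NULL || argz_len == 0);
  return argz_len > 0 ? argz_len - 1 : 0;
}

/* Sum the lengths of all entries by walking the vector with argz_next,
   which needs only the base pointer and the total length. */
size_t argz_total_chars(const char *argz, size_t argz_len)
{
  size_t total = 0;
  const char *entry = NULL;

  while ((entry = argz_next(argz, argz_len, entry)) != NULL)
    total += strlen(entry);
  return total;
}
```

The embedded null characters stay in the buffer as entry separators; only
the last one is left out of the length passed on.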

I am sure that nobody is particularly interested in argzs, but some
readers may be interested in a real-life example of the Guile API.  Below
are some examples of how I gained access to some C functions from Guile.
If any of you veterans have any constructive advice, I would be glad
to hear it, but don't ask me to get rid of the argz.  They are here to
stay!  In particular, I wonder about the best way to produce a
null-terminated C string from a Scheme string.  I used scm_must_malloc,
memcpy, and memset.  I expected a ready-made Guile call for this purpose,
but I did not find one.

A simple call to a function which is expecting an argz and returns
nothing:

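static SCM printer_hostile_printer(SCM scm_out_buff) {

  struct argz_holder out_buff;

  SCM_ASSERT (SCM_STRINGP (scm_out_buff), scm_out_buff, SCM_ARG1,
              "printer_hostile_printer");

  /* An argz is just a pointer plus a length, as is a Scheme string. */
  out_buff.argz = SCM_STRING_CHARS(scm_out_buff);
  out_buff.argz_len = SCM_STRING_LENGTH(scm_out_buff);

  output_printer(&out_buff);

  /* Nothing useful to return.  SCM_UNSPECIFIED is the conventional
     value for that; SCM_UNDEFINED is reserved for unbound values. */
  return SCM_UNSPECIFIED;

}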

A call to a database query function which returns an argz full of query
results:

static SCM get_from_db(SCM scm_login) {

  char *login_chrs;
  size_t login_len;
  char *login;

  struct db_parm_holder login_parm = { 0 };
  struct db_parm_holder argz_parm = { 0 };

  SCM ret_val;

  SCM_ASSERT (SCM_STRINGP (scm_login), scm_login, SCM_ARG1,
              "get_from_db");

  login_chrs = SCM_STRING_CHARS(scm_login);
  login_len = SCM_STRING_LENGTH(scm_login);

  /* Make a null-terminated C copy of the Scheme string. */
  login = (char *)scm_must_malloc(login_len + 1, "get_from_db");
  memcpy(login, login_chrs, login_len);
  login[login_len] = '\0';

  set_db_in_parm_str(&login_parm, "@login", USER_ID_LEN, login);
  set_db_ret_parm_envz(&argz_parm, NULL);
  db_query_va("get_from_db", &login_parm, &argz_parm, NULL);

  /* Don't take the last null terminator on the argz.  An empty result
     has length 0, so guard the subtraction. */
  ret_val = scm_mem2string(argz_parm.data,
                           argz_parm.data_len > 0
                           ? argz_parm.data_len - 1 : 0);

  free(argz_parm.data);

  scm_must_free(login);

  return ret_val;

}






* Re: argz SMOB
  2004-01-15 18:43   ` Brian S McQueen
@ 2004-01-16  0:21     ` Paul Jarc
  2004-01-16  9:10       ` null terminated strings (was: argz SMOB) Andreas Voegele
  0 siblings, 1 reply; 19+ messages in thread
From: Paul Jarc @ 2004-01-16  0:21 UTC (permalink / raw)
  Cc: guile-user, Daniel Skarda

Brian S McQueen <bqueen@nas.nasa.gov> wrote:
> Particularly, I wonder about the best way to produce a null
> terminated C string from a scheme string.

SCM_STRING_COERCE_0TERMINATION_X(scheme_string);
char* s=SCM_STRING_CHARS(scheme_string);

Or do you need a separate copy?

> static SCM printer_hostile_printer(SCM scm_out_buff) {
>
>   struct argz_holder out_buff;
>
>   SCM_ASSERT (SCM_STRINGP (scm_out_buff), scm_out_buff, SCM_ARG1,
> "printer_hostile_printer");

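You could also write that as:
#define FUNC_NAME s_printer_hostile_printer
SCM_DEFINE(printer_hostile_printer, "printer_hostile_printer",
           1, 0, 0, (SCM scm_out_buff),
           "Pass the argz held in the string to output_printer.")
{
  struct argz_holder out_buff;
  SCM_VALIDATE_STRING(SCM_ARG1, scm_out_buff);
  /* ... rest of the body as in the original function ... */
}
#undef FUNC_NAME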


paul



* null terminated strings (was: argz SMOB)
  2004-01-16  0:21     ` Paul Jarc
@ 2004-01-16  9:10       ` Andreas Voegele
       [not found]         ` <1074245327.6733.9.camel@localhost>
  0 siblings, 1 reply; 19+ messages in thread
From: Andreas Voegele @ 2004-01-16  9:10 UTC (permalink / raw)


Paul Jarc writes:

> Brian S McQueen <bqueen@nas.nasa.gov> wrote:
>> Particularly, I wonder about the best way to produce a null
>> terminated C string from a scheme string.
>
> SCM_STRING_COERCE_0TERMINATION_X(scheme_string);
> char* s=SCM_STRING_CHARS(scheme_string);

Is it necessary or desirable to use the macro
SCM_STRING_COERCE_0TERMINATION_X if one uses Guile 1.6?  I've found
the following statement in the file libguile/strings.c that comes with
Guile 1.6.4:

"[...] we promise that strings are null-terminated."

And in guile-readline/Changelog:

"Remove calls to SCM_STRING_COERCE_0TERMINATION_X.  Since the
substring type is gone, all strings are 0-terminated anyway."



* Re: null terminated strings
       [not found]         ` <1074245327.6733.9.camel@localhost>
@ 2004-01-16 10:17           ` Andreas Voegele
  2004-01-16 11:02             ` Roland Orre
  0 siblings, 1 reply; 19+ messages in thread
From: Andreas Voegele @ 2004-01-16 10:17 UTC (permalink / raw)


Roland Orre writes:

> On Fri, 2004-01-16 at 10:10, Andreas Voegele wrote:
>
>> "Remove calls to SCM_STRING_COERCE_0TERMINATION_X.  Since the
>> substring type is gone, all strings are 0-terminated anyway."
>
> Such a statement is very worrying. What happened to the promises
> that all substrings will be shared strings?

I'm new to Guile.  I don't know why shared strings were removed from
Guile, but things were probably simplified greatly by this decision.
I think that it is very useful to zero terminate all strings since
embedding and extending Guile becomes easier.

It seems that you'll have to stick to Thien-Thi Nguyen's Guile
version, available at http://www.glug.org/, if you need shared
strings.



* Re: null terminated strings
  2004-01-16 10:17           ` null terminated strings Andreas Voegele
@ 2004-01-16 11:02             ` Roland Orre
  2004-01-16 12:24               ` Andreas Voegele
  0 siblings, 1 reply; 19+ messages in thread
From: Roland Orre @ 2004-01-16 11:02 UTC (permalink / raw)
  Cc: guile-user

On Fri, 2004-01-16 at 11:17, Andreas Voegele wrote:
> Roland Orre writes:
> 
> > On Fri, 2004-01-16 at 10:10, Andreas Voegele wrote:
> >
> >> "Remove calls to SCM_STRING_COERCE_0TERMINATION_X.  Since the
> >> substring type is gone, all strings are 0-terminated anyway."
> >
> > Such a statement is very worrying. What happened to the promises
> > that all substrings will be shared strings?
> 
> I'm new to Guile.  I don't know why shared strings were removed from
> Guile, but things were probably simplified greatly by this decision.
> I think that it is very useful to zero terminate all strings since
> embedding and extending Guile becomes easier.

In all the years I've used scm and later guile (14 years now, I think)
I have very rarely had any need for null-terminated strings. There
is a function in guile to convert a null-terminated string to a guile
string, which I've used a few times. In C programming I often don't
rely on null-terminated strings either, as there are many C library
functions that work with lengths, which I consider more elegant. The
absence of shared substrings, on the other hand, means a lot of special
code to do something similar without having to copy and
create strings all the time. For instance, in a conversion routine for
fixed database tables that I wrote some years ago, I had first used
substrings. The program took 15 hours to run on a specific table. When
I changed it to use shared substrings, it took about 1 hour on the same
table.

I think it is quite bad to require that guile rely
on null-terminated strings, as it is often quite easy to work around
this, from my point of view. Guile 1.6.4 also states that
all substrings are intended to become internally shared, which I don't
see has happened yet.

> It seems that you'll have to stick to Thien-Thi Nguyen's Guile
> version, available at http://www.glug.org/, if you need shared
> strings.

OK, I haven't followed his development very carefully, but
maybe I should take a closer look. I don't know if he has caught
up on functionality such as goops, which I need for
matrix calculations.

	Best regards
	Roland Orre





* Re: null terminated strings
  2004-01-16 11:02             ` Roland Orre
@ 2004-01-16 12:24               ` Andreas Voegele
  2004-01-16 18:20                 ` Brian S McQueen
  0 siblings, 1 reply; 19+ messages in thread
From: Andreas Voegele @ 2004-01-16 12:24 UTC (permalink / raw)


Roland Orre writes:

> On Fri, 2004-01-16 at 11:17, Andreas Voegele wrote:
>> Roland Orre writes:
>> 
>> > On Fri, 2004-01-16 at 10:10, Andreas Voegele wrote:
>> >
>> >> "Remove calls to SCM_STRING_COERCE_0TERMINATION_X.  Since the
>> >> substring type is gone, all strings are 0-terminated anyway."
>> >
>> > Such a statement is very worrying. What happened to the promises
>> > that all substrings will be shared strings?
>> 
>> I'm new to Guile.  I don't know why shared strings were removed from
>> Guile, but things were probably simplified greatly by this decision.
>> I think that it is very useful to zero terminate all strings since
>> embedding and extending Guile becomes easier.
>
> During all years I've used scm and later guile (I think 14 years now)
> I have very very rarely had any need for null terminated strings. There
> is a function in guile to convert a null terminated string to a guile
> string, this I've used a few times. In C programming I often don't
> rely on null terminated strings either as there are many C library
> functions that works with length, which I consider more elegant.

It probably depends on what you're doing with Guile.  I intend to use
Guile to glue together code from a lot of existing C libraries.  I'm
currently writing wrappers for several C libraries and a lot of
functions provided by these libraries require null terminated strings.

On the other hand, I agree that a stable API is important.  Because of
this I also thought about using Guile 1.4 but I came to the conclusion
that Guile 1.6 is better when you need to wrap a lot of C libraries.
(I know SWIG but I don't like it very much.)

> The absence of shared substrings on the other hand means a lot of
> special code to be able to do something similar without having to
> copy and create strings all the time. For instance in a conversion
> routine for fixed data base tables I made some years ago I had first
> used substrings. The program took 15 hours to run on a specific
> table. When I changed to use shared substrings it took about 1 hour
> on the same table.

That's indeed a huge difference.

> I think it is quite bad to have the requirement that guile should
> rely on null terminated strings as it is often quite easy to come
> around this from my point of view. In guile 1.6.4 it is also
> expressed so that all substrings are intended to become internally
> shared, which I don't see have happened yet.

I'm wondering how substrings could be shared when there's also a
promise that all strings are null terminated.



* Re: null terminated strings
  2004-01-16 12:24               ` Andreas Voegele
@ 2004-01-16 18:20                 ` Brian S McQueen
  2004-01-16 20:36                   ` Paul Jarc
  0 siblings, 1 reply; 19+ messages in thread
From: Brian S McQueen @ 2004-01-16 18:20 UTC (permalink / raw)


I know you folks are discussing an important topic regarding the design
and future of libguile, but I have a simpler, very basic question
on this subject.  I am wondering about the most practical way for a
programmer to get a C string from a Scheme string.  So what are some
common techniques?

The approach I used was tedious, so I expect you folks have a better way.
I did this:

  login_chrs = SCM_STRING_CHARS(login_scm);
  login_len = SCM_STRING_LENGTH(login_scm);

  login  = (char *)scm_must_malloc(login_len + 1, "func");
  memcpy(login, login_chrs, login_len);
  memset(login + login_len, '\0', 1);

  If the scheme string is truly null terminated, this can be greatly
simplified.
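
Independent of whether the termination guarantee holds, the manual copy
above can be written once as a helper.  This is a minimal sketch in plain
C with a hypothetical name, copy_with_nul, using malloc rather than
scm_must_malloc so that it stands alone.  In the Guile case the two
arguments would come from SCM_STRING_CHARS and SCM_STRING_LENGTH.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a freshly malloc'ed, null-terminated copy of the LEN bytes at
   CHARS, or NULL if allocation fails.  The caller frees the result. */
char *copy_with_nul(const char *chars, size_t len)
{
  char *copy;

  assert(chars != NULL || len == 0);
  copy = malloc(len + 1);
  if (copy == NULL)
    return NULL;
  if (len > 0)
    memcpy(copy, chars, len);
  copy[len] = '\0';
  return copy;
}
```

Note that if the source bytes contain embedded null characters, as an
argz does, C string functions will see only the part before the first
one, even though the copy holds all LEN bytes.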

Brian

On Fri, 16 Jan 2004, Andreas Voegele wrote:

> Roland Orre writes:
>
> > On Fri, 2004-01-16 at 11:17, Andreas Voegele wrote:
> >> Roland Orre writes:
> >>
> >> > On Fri, 2004-01-16 at 10:10, Andreas Voegele wrote:
> >> >
> >> >> "Remove calls to SCM_STRING_COERCE_0TERMINATION_X.  Since the
> >> >> substring type is gone, all strings are 0-terminated anyway."
> >> >
> >> > Such a statement is very worrying. What happened to the promises
> >> > that all substrings will be shared strings?
> >>
> >> I'm new to Guile.  I don't know why shared strings were removed from
> >> Guile, but things were probably simplified greatly by this decision.
> >> I think that it is very useful to zero terminate all strings since
> >> embedding and extending Guile becomes easier.
> >
> > During all years I've used scm and later guile (I think 14 years now)
> > I have very very rarely had any need for null terminated strings. There
> > is a function in guile to convert a null terminated string to a guile
> > string, this I've used a few times. In C programming I often don't
> > rely on null terminated strings either as there are many C library
> > functions that works with length, which I consider more elegant.
>
> It probably depends on what you're doing with Guile.  I intend to use
> Guile to glue together code from a lot of existing C libraries.  I'm
> currently writing wrappers for several C libraries and a lot of
> functions provided by these libraries require null terminated strings.
>
> On the other hand, I agree that a stable API is important.  Because of
> this I also thought about using Guile 1.4 but I came to the conclusion
> that Guile 1.6 is better when you need to wrap a lot of C libraries.
> (I know SWIG but I don't like it very much.)
>
> > The absence of shared substrings on the other hand means a lot of
> > special code to be able to do something similar without having to
> > copy and create strings all the time. For instance in a conversion
> > routine for fixed data base tables I made some years ago I had first
> > used substrings. The program took 15 hours to run on a specific
> > table. When I changed to use shared substrings it took about 1 hour
> > on the same table.
>
> That's indeed a huge difference.
>
> > I think it is quite bad to have the requirement that guile should
> > rely on null terminated strings as it is often quite easy to come
> > around this from my point of view. In guile 1.6.4 it is also
> > expressed so that all substrings are intended to become internally
> > shared, which I don't see have happened yet.
>
> I'm wondering how substrings could be shared when there's also a
> promise that all strings are null terminated.
>
>



* Re: null terminated strings
  2004-01-16 18:20                 ` Brian S McQueen
@ 2004-01-16 20:36                   ` Paul Jarc
  2004-01-16 21:06                     ` Tom Lord
  0 siblings, 1 reply; 19+ messages in thread
From: Paul Jarc @ 2004-01-16 20:36 UTC (permalink / raw)
  Cc: guile-user

Brian S McQueen <bqueen@nas.nasa.gov> wrote:
>   If the scheme string is truly null terminated, this can be greatly
> simplified.

Apparently, whether it is guaranteed to be terminated depends on what
version of Guile you're using.  With 1.6.4, it might not be terminated
(judging by the fact that make-shared-substring exists in that
version).  But regardless of whether the guarantee is there in
general, you can always make it so for a particular string using
SCM_STRING_COERCE_0TERMINATION_X, as I showed before.  Versions of
Guile that guarantee termination can define that macro as a no-op to
support existing code.

FWIW, I think that (preferably copy-on-write) shared substrings are
valuable enough for performance (not to mention backward
compatibility) that Guile should not remove them for the sake of
guaranteeing termination.  With shared substrings, we can still get
termination when we need it with SCM_STRING_COERCE_0TERMINATION_X, but
without shared substrings, we cannot get performance when we need it.
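
To illustrate why a shared substring is not terminated in general, and
what a coercion step has to do about it, here is a minimal sketch in
plain C.  The struct substr layout and both function names are
hypothetical; this is not how Guile represents strings internally.  It
assumes the parent string itself is null-terminated and the view lies
within it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared-substring view: it owns no characters itself. */
struct substr {
  const char *base;   /* start of the parent string's characters */
  size_t start;       /* offset of the view within the parent */
  size_t len;         /* number of characters in the view */
};

/* A view can be used in place as a C string only if the byte just past
   its end happens to be the parent's terminator. */
int substr_is_terminated(const struct substr *s)
{
  assert(s != NULL && s->base != NULL);
  return s->base[s->start + s->len] == '\0';
}

/* Coercion by copying: always yields a terminated, caller-owned string
   (or NULL on allocation failure) and leaves the parent untouched. */
char *substr_to_cstring(const struct substr *s)
{
  char *copy;

  assert(s != NULL && s->base != NULL);
  copy = malloc(s->len + 1);
  if (copy == NULL)
    return NULL;
  memcpy(copy, s->base + s->start, s->len);
  copy[s->len] = '\0';
  return copy;
}
```

With a parent of "hello world", a view of the first five characters is
followed by a space, so it must be copied before being passed to C, while
a view of the last five characters already ends at the terminator.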


paul



* Re: null terminated strings
  2004-01-16 21:06                     ` Tom Lord
@ 2004-01-16 21:02                       ` Paul Jarc
  2004-01-16 21:27                         ` Roland Orre
  2004-01-19 17:28                       ` Ken Anderson
  1 sibling, 1 reply; 19+ messages in thread
From: Paul Jarc @ 2004-01-16 21:02 UTC (permalink / raw)
  Cc: guile-user

Tom Lord <lord@emf.net> wrote:
>     From: prj@po.cwru.edu (Paul Jarc)
>
>     > FWIW, I think that (preferably copy-on-write) shared substrings are
>     > valuable enough for performance
>
> mutation-effects-both shared substrings are valuable as a feature of
> (extended) Scheme.

I agree... now. :)

> mutation-effects-both shared substrings are an improvement to Scheme.

Unless you're a fan of pure functional languages, of course.


paul



* Re: null terminated strings
  2004-01-16 20:36                   ` Paul Jarc
@ 2004-01-16 21:06                     ` Tom Lord
  2004-01-16 21:02                       ` Paul Jarc
  2004-01-19 17:28                       ` Ken Anderson
  0 siblings, 2 replies; 19+ messages in thread
From: Tom Lord @ 2004-01-16 21:06 UTC (permalink / raw)
  Cc: guile-user


    > From: prj@po.cwru.edu (Paul Jarc)

    > FWIW, I think that (preferably copy-on-write) shared substrings are
    > valuable enough for performance 

mutation-effects-both shared substrings are valuable as a feature of
(extended) Scheme.

A common idiom in string-manipulating programs is to manipulate and
pass around triples (STRING START END).

It's well worthwhile to make such triples a first-class type -- the
alternative is to have to write lots of string manipulation functions
in a style where they take three actual parameters (ideally with two
of those being optional) to represent a single conceptual parameter.

And if you have that abstract data type, since it is compatible with
the RnRS requirements for the STRING? type, it may as well be a subset
of the STRING? type.

copy-on-write shared substrings are a performance feature --
mutation-effects-both shared substrings are an improvement to Scheme.
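
As a rough sketch of the (STRING START END) idea in C terms, the triple
can be bundled into one value so that a function takes a single
conceptual parameter.  The struct slice type and function names are
hypothetical.  Because the slice points into the parent's storage rather
than copying it, a write through the slice is visible in the parent,
which is the sharing behaviour described above as opposed to
copy-on-write.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical first-class (STRING START END) triple over a mutable
   buffer.  START is inclusive, END is exclusive. */
struct slice {
  char *string;
  size_t start;
  size_t end;
};

/* Number of characters covered by the slice. */
size_t slice_length(struct slice s)
{
  assert(s.start <= s.end);
  return s.end - s.start;
}

/* Upcase the slice in place.  The change is visible through the parent
   buffer as well, since no characters are copied. */
void slice_upcase(struct slice s)
{
  size_t i;

  assert(s.string != NULL && s.start <= s.end);
  for (i = s.start; i < s.end; i++)
    s.string[i] = (char)toupper((unsigned char)s.string[i]);
}
```

For a buffer holding "foo bar", upcasing the slice covering offsets 4 to
7 leaves the buffer holding "foo BAR".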

-t





* Re: null terminated strings
  2004-01-16 21:02                       ` Paul Jarc
@ 2004-01-16 21:27                         ` Roland Orre
  0 siblings, 0 replies; 19+ messages in thread
From: Roland Orre @ 2004-01-16 21:27 UTC (permalink / raw)
  Cc: guile-user

On Fri, 2004-01-16 at 22:02, Paul Jarc wrote:
>> Tom Lord <lord@emf.net> wrote:
>> mutation-effects-both shared substrings are an improvement to Scheme.
>>
> Unless you're a fan of pure functional languages, of course.

From my point of view, fans of pure functional languages do not
usually see much of reality. At least not as long as we are
speaking of (semi-)interpreted languages such as guile, even though I
myself, being a quite imperative Scheme programmer, often lose
in contests with a clever functional programmer...

	Roland Orre





* Re: null terminated strings
  2004-01-16 21:06                     ` Tom Lord
  2004-01-16 21:02                       ` Paul Jarc
@ 2004-01-19 17:28                       ` Ken Anderson
  2004-01-19 18:46                         ` Per Bothner
  1 sibling, 1 reply; 19+ messages in thread
From: Ken Anderson @ 2004-01-19 17:28 UTC (permalink / raw)
  Cc: guile-user, prj

At 01:06 PM 1/16/2004 -0800, Tom Lord wrote:

>    > From: prj@po.cwru.edu (Paul Jarc)
>
>    > FWIW, I think that (preferably copy-on-write) shared substrings are
>    > valuable enough for performance 
>
>mutation-effects-both shared substrings are valuable as a feature of
>(extended) Scheme.
>
>A common idiom in string-manipulating programs is to manipulate and
>pass around triples (STRING START END).
>
>It's well worthwhile to make such triples a first-class type -- the
>alternative is to have to write lots of string manipulation functions
>in a style where they take three actual parameters (ideally with two
>of those being optional) to represent a single conceptual parameter.
>
>And if you have that abstract data type, since it is compatible with
>the RnRS requirements for the STRING? type, it may as well be a subset
>of the STRING? type.
>
>copy-on-write shared substrings are a performance feature --
>mutation-effects-both shared substrings are an improvement to Scheme.

Can you describe how this is an improvement to Scheme?  I don't think i've ever wanted to do this.   In Java, which does copy-on-write i often find myself carefully copying the substrings so they don't share structure.  This is because of things like:

- i don't know how long the underlying string (char array actually) is.  It can be much longer than the last line i've read.
- many fields are a relatively small set of values so interning them helps.
- some fields become something besides strings, such as numbers.
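
The first two points can be sketched together in C: copying a field out
of the line buffer means the field no longer depends on that buffer, and
interning means equal fields share one copy.  This is a minimal sketch
with a hypothetical name, intern_field, a fixed-size table, and a linear
search; it assumes fields contain no embedded null characters, and a real
table would use hashing.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INTERN_MAX 64            /* arbitrary capacity for this sketch */

static char *intern_table[INTERN_MAX];
static size_t intern_count;

/* Return the canonical copy of the LEN bytes at CHARS, adding one to
   the table if the value is new.  The result never points into the
   caller's buffer.  Returns NULL if the table is full or allocation
   fails. */
const char *intern_field(const char *chars, size_t len)
{
  size_t i;
  char *copy;

  assert(chars != NULL);
  for (i = 0; i < intern_count; i++)
    if (strlen(intern_table[i]) == len
        && memcmp(intern_table[i], chars, len) == 0)
      return intern_table[i];

  if (intern_count == INTERN_MAX)
    return NULL;
  copy = malloc(len + 1);
  if (copy == NULL)
    return NULL;
  memcpy(copy, chars, len);
  copy[len] = '\0';
  intern_table[intern_count++] = copy;
  return copy;
}
```

Two occurrences of the same field value in a line then yield the same
pointer, so they can be compared with == and stored once.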

Java only has one kind of string, which is fairly heavy weight.  For example, the string "" takes 36 bytes:
> (describe "")

 is an instance of java.lang.String

  // from java.lang.String
  value: [C@d42d08
  offset: 0
  count: 0
  hash: 0

So, it might help to have several representations for strings.  In Guile, of course, you can play even more games because you have C underneath.

 From the applications i've seen, interning strings has been more useful  




* Re: null terminated strings
  2004-01-19 17:28                       ` Ken Anderson
@ 2004-01-19 18:46                         ` Per Bothner
  2004-01-19 19:16                           ` Ken Anderson
  0 siblings, 1 reply; 19+ messages in thread
From: Per Bothner @ 2004-01-19 18:46 UTC (permalink / raw)
  Cc: guile-user

Ken Anderson wrote:

 >    In Java, which does copy-on-write

Strings (including substrings) are immutable, so they cannot be written.
The implementation of the StringBuffer class does do copy-on-write, but
that doesn't affect substrings.

> i often find myself  carefully copying the substrings so they don't share structure.

Why?  The only reason I can think of is garbage collection:  A shared
substring prevents the base from being collected.

> This is because of things like:
> - i don't know how long the underlying string (char array actually) is.

So?

> Java only has one kind of string, which is fairly heavy weight.  For example, the string "" takes 36 bytes:
> 
>>(describe "")
>  is an instance of java.lang.String
> 
>   // from java.lang.String
>   value: [C@d42d08
>   offset: 0
>   count: 0
>   hash: 0

This depends on the implementation, and the version of the
implementation.

GCJ uses for "":
   object header (4 bytes on 32-bit systems)
   private Object data; /* points to itself in this case */
   private int boffset; /* offset of first char within data */
   int count; /* number of characters */
   private int cachedHashCode;
   /* chars follow if data==this */
(The data and boffset fields are only accessed by native C++ code.)

Total 20 bytes.
-- 
	--Per Bothner
per@bothner.com   http://per.bothner.com/




* Re: null terminated strings
  2004-01-19 18:46                         ` Per Bothner
@ 2004-01-19 19:16                           ` Ken Anderson
  0 siblings, 0 replies; 19+ messages in thread
From: Ken Anderson @ 2004-01-19 19:16 UTC (permalink / raw)
  Cc: guile-user

At 10:46 AM 1/19/2004 -0800, Per Bothner wrote:
>Ken Anderson wrote:
>
>>    In Java, which does copy-on-write
>
>Strings (including substrings) are immutable, so they cannot be written.
>The implementation of the StringBuffer class does do copy-on-write, but
>that doesn't affect substrings.
>
>>i often find myself  carefully copying the substrings so they don't share structure.
>
>Why?  The only reason I can think of is garbage collection:  A shared
>substring prevents the base from being collected.

Yes.  Say you do something like (this is JScheme):
> (define text "foo bar")
"foo bar"
> (define r (StringReader. text))
java.io.StringReader@9945ce
> (define b (BufferedReader. r))
java.io.BufferedReader@2d96f2
> (define line (.readLine b))
"foo bar"
> (define a (.substring line 0 3))
"foo"
> (define b (.substring line 4))
"bar"
> (describe a)
foo
 is an instance of java.lang.String

  // from java.lang.String
  value: [C@79e304
  offset: 0
  count: 3
  hash: 0
()
> (describe b)
bar
 is an instance of java.lang.String

  // from java.lang.String
  value: [C@79e304
  offset: 4
  count: 3
  hash: 0
()
> (vector-length (.value$# a))
80

a and b share the same char[] of size 80, which wastes a lot of space in this case. (80 is the default string buffer size in BufferedReader).


>>This is because of things like:
>>- i don't know how long the underlying string (char array actually) is.
>
>So?

So you don't know how much space your line is taking up.

>>Java only has one kind of string, which is fairly heavy weight.  For example, the string "" takes 36 bytes:
>>
>>>(describe "")
>> is an instance of java.lang.String
>>  // from java.lang.String
>>  value: [C@d42d08
>>  offset: 0
>>  count: 0
>>  hash: 0
>
>This depends on the implementation, and the version of the
>implementation.
>
>GCJ uses for "":
>  object header (4 bytes on 32-bit systems)
>  private Object data; /* points to itself in this case */
>  private int boffset; /* offset of first char within data */
>  int count; /* number of characters */
>  private int cachedHashCode;
>  /* chars follow if data==this */
>(The data and boffset fields are only accessed by native C++ code.)
>
>Total 20 bytes.

Much better. 



