From: Ken Anderson <kanderson@bbn.com>
Cc: guile-user@gnu.org
Subject: Re: null terminated strings
Date: Mon, 19 Jan 2004 14:16:44 -0500
Message-ID: <5.2.0.9.2.20040119135621.0a570940@zima.bbn.com>
In-Reply-To: <400C260B.8050306@bothner.com>

At 10:46 AM 1/19/2004 -0800, Per Bothner wrote:
>Ken Anderson wrote:
>
>>    In Java, which does copy-on-write
>
>Strings (including substrings) are immutable, so they cannot be written.
>The implementation of the StringBuffer class does do copy-on-write, but
>that doesn't affect substrings.
>
>>I often find myself carefully copying the substrings so they don't share structure.
>
>Why?  The only reason I can think of is garbage collection:  A shared
>substring prevents the base from being collected.

Yes.  Say you do something like (this is JScheme):
> (define text "foo bar")
"foo bar"
> (define r (StringReader. text))
java.io.StringReader@9945ce
> (define b (BufferedReader. r))
java.io.BufferedReader@2d96f2
> (define line (.readLine b))
"foo bar"
> (define a (.substring line 0 3))
"foo"
> (define b (.substring line 4))
"bar"
> (describe a)
foo
 is an instance of java.lang.String

  // from java.lang.String
  value: [C@79e304
  offset: 0
  count: 3
  hash: 0
()
> (describe b)
bar
 is an instance of java.lang.String

  // from java.lang.String
  value: [C@79e304
  offset: 4
  count: 3
  hash: 0
()
> (vector-length (.value$# a))
80

a and b share the same char[] of size 80, although together they use only 6 of its characters, so most of the array is wasted in this case. (80 is the default size of the string buffer that BufferedReader allocates in readLine.)
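
A minimal sketch of the defensive copy discussed above, written in plain Java rather than JScheme. The class name DetachedSubstring and the helper detach are hypothetical, made up for this illustration. It assumes a JDK whose substring() shares the parent's char[], as in the session above; on a JDK whose substring() already copies, the extra copy is harmless but redundant.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class DetachedSubstring {
    // Hypothetical helper (not part of any library): returns the characters
    // s[begin, end) in a String built from a freshly allocated array, so the
    // result does not keep a larger parent buffer reachable.
    public static String detach(String s, int begin, int end) {
        // toCharArray() allocates a new array of exactly (end - begin) chars,
        // and new String(char[]) copies that array into the new String.
        return new String(s.substring(begin, end).toCharArray());
    }

    public static void main(String[] args) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader("foo bar"));
        String line = reader.readLine();
        String a = detach(line, 0, 3);              // "foo"
        String b = detach(line, 4, line.length());  // "bar"
        System.out.println(a + " " + b);            // prints "foo bar"
    }
}
```

The cost is an extra allocation and copy per substring, which is the trade being made: a little work now so that the 80-char buffer can be collected once line itself is dropped.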


>>This is because of things like:
>>- I don't know how long the underlying string (char array, actually) is.
>
>So?

So you don't know how much space your line is taking up.
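
To put a rough number on it, here is a small sketch of the arithmetic for the example above. The class BackingArrayWaste and the method wastedBytes are hypothetical names, and the calculation counts only char payload at 2 bytes per char; object and array headers are ignored because they vary by implementation. The backing-array length has to be supplied by hand, since portable Java code cannot ask a String for it, which is the point being made.

```java
public class BackingArrayWaste {
    // A Java char is a 16-bit value, so each array element takes 2 bytes.
    static final int BYTES_PER_CHAR = 2;

    // Hypothetical helper: bytes of char payload in the backing array that a
    // string of `count` characters does not use. Headers are not counted.
    public static int wastedBytes(int backingLength, int count) {
        return (backingLength - count) * BYTES_PER_CHAR;
    }

    public static void main(String[] args) {
        // Values taken from the session above: an 80-char array, count 3.
        System.out.println(wastedBytes(80, 3)); // prints 154
    }
}
```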

>>Java only has one kind of string, which is fairly heavyweight.  For example, the string "" takes 36 bytes:
>>
>>>(describe "")
>> is an instance of java.lang.String
>>  // from java.lang.String
>>  value: [C@d42d08
>>  offset: 0
>>  count: 0
>>  hash: 0
>
>This depends on the implementation, and the version of the
>implementation.
>
>GCJ uses for "":
>  object header (4 bytes on 32-bit systems)
>  private Object data; /* points to itself in this case */
>  private int boffset; /* offset of first char within data */
>  int count; /* number of characters */
>  private int cachedHashCode;
>  /* chars follow if data==this */
>(The data and boffset fields are only accessed by native C++ code.)
>
>Total 20 bytes.

Much better. 