* shared-substrings missing in 1.7
From: Roland Orre @ 2004-01-16 8:49 UTC

I'm still trying to adapt to guile 1.7. It is always annoying when a
function one makes heavy use of disappears. make-shared-substring is
such a function; it is very handy for conversions and for reading
fields from fixed-width database tables. Because I have relied on
shared substrings, it is also non-trivial to change the code.

In guile 1.6 it was said that explicit shared substrings would
disappear and be replaced by shared strings internally, which I
interpreted to mean that e.g. (substring ...) would return a shared
substring, which would preserve the functionality. But substrings are
still made by copying in 1.7, and now the tag for shared_substring has
also been removed.

To be able to continue adapting to guile 1.7 I now have to do
something quickly. In about half of my code I can easily replace
make-shared-substring with normal substring, as I have used them there
for efficiency reasons only, but in the rest of the code the
functionality of shared substrings is essential, so I need to
reimplement them.

The obvious quick and dirty solution is to implement shared substrings
as scm_tc7_string. (Of course I then have to keep track of the shared
strings to garbage-collect them explicitly.)

Does anyone have a better idea?

	Best regards
	Roland Orre

_______________________________________________
Guile-user mailing list
Guile-user@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-user
* Re: shared-substrings missing in 1.7
From: Marco Parrone @ 2004-01-16 12:52 UTC
Cc: guile-user

Roland Orre on Fri, 16 Jan 2004 09:49:11 +0100 writes:

> In about half of my code I can easily replace make-shared-substring
> with normal substring, as I have used them there for efficiency
> reasons only, but in the rest of the code the functionality of shared
> substrings is essential so I need to reimplement them.

Why don't you copy the make-shared-substring code into a new Guile
module? That way you can use it without needing to patch Guile (and so
without needing to synchronize your modifications with the main tree).

Then all you have to change is adding a :use-module to the modules
that need it, right?

-- 
Marco Parrone - marc0@autistici.org - 0x45070AD6
* Re: shared-substrings missing in 1.7
From: Roland Orre @ 2004-01-16 13:49 UTC
Cc: guile-user

On Fri, 2004-01-16 at 13:52, Marco Parrone wrote:
> Roland Orre on Fri, 16 Jan 2004 09:49:11 +0100 writes:
>
> > In about half of my code I can easily replace make-shared-substring
> > with normal substring, as I have used them there for efficiency
> > reasons only, but in the rest of the code the functionality of shared
> > substrings is essential so I need to reimplement them.
>
> Why don't you copy the make-shared-substring code into a new Guile
> module? That way you can use it without needing to patch Guile (and so
> without needing to synchronize your modifications with the main tree).
>
> Then all you have to change is adding a :use-module to the modules
> that need it, right?

It is not that easy, because the substring tag type scm_tc7_substring
doesn't exist at the moment. Furthermore, the garbage collector will
not be aware of substrings, so I need to protect them, as well as the
original string, so that I can deallocate them in the right order. I
just tested this idea and got a segmentation fault when I deallocated
just the original string. I'll probably have to implement a
"deallocate all shared substrings" operation as well.

Anyway, this is a simple solution to keep me going. My old code will
still work, but it becomes somewhat messier, so it is not a long-term
solution.

I'm slightly worried about what will happen with substrings in the
future. Nor do I understand the reason for removing explicit shared
substrings and then later adding implicit shared substrings, as was
the intention. With implicit internal substrings, I see no reason not
to have explicit substrings as well.

Even though make-shared-substring would become unnecessary in the
future if all substrings are shared, I consider it wrong to remove a
facility before the replacement implementation is done.

	Best regards
	Roland Orre
* Re: shared-substrings missing in 1.7
From: Tom Lord @ 2004-01-16 20:29 UTC
Cc: guile-user, orre

> From: Marco Parrone <marc0@autistici.org>

> Why don't you copy the make-shared-substring code into a new Guile
> module? That way you can use it without needing to patch Guile (and so
> without needing to synchronize your modifications with the main tree).

> Then all you have to change is adding a :use-module to the modules
> that need it, right?

If someone makes a dynamically loadable module to "add back" shared
substrings, will other dynamically loaded modules and the core of
guile interoperate with the new string type as a string type? Or will
it be disjoint from STRING?

If it's disjoint from STRING?, would it be difficult to compile a new
version of the core and the other modules in which it wouldn't be
disjoint?

-t
* Re: shared-substrings missing in 1.7
From: Greg Troxel @ 2004-01-16 19:13 UTC
Cc: guile-user

This is a guile-specific non-R5RS feature, AFAICT. It seems a bit
non-Schemely, too, for a mutation of one object to cause another to
change (exercise for the reader: write denotational semantics for this
:-).

But I see the point of not withdrawing stuff that people depend on.
And I note that make-shared-substring is not flagged as deprecated in
1.6.3, even though Marius says it is:

http://mail.gnu.org/archive/html/guile-devel/2003-10/msg00027.html

    gdt 775 ~ > GUILE_WARN_DEPRECATED=detailed guile
    guile> (version)
    "1.6.3"
    guile> (activate-readline)
    guile> (define a "foobar")
    guile> (define b (make-shared-substring a 0 3))
    guile> (string-upcase! b)
    "FOO"
    guile> b
    "FOO"
    guile> a
    "FOObar"
    guile> (exit)
    gdt 776 ~ >

(I know GUILE_WARN_DEPRECATED=detailed works since I just hit some
deprecated stuff elsewhere with this same guile binary, such as export
vs re-export and C calls to define functions.)

But guile.info does indeed say it is deprecated:

    Guile Scheme provides the concept of the "shared substring" to
    improve performance of many substring-related operations. A shared
    substring is an object that mostly behaves just like an ordinary
    substring, except that it actually shares storage space with its
    parent string.

     - Deprecated Scheme Procedure: make-shared-substring str [start [end]]

But shared substrings are supposed to be immutable (even though they
are not):

    Because creating a shared substring does not require allocating new
    storage from the heap, it is a very fast operation. However, because
    it shares memory with its parent string, a change to the contents of
    the parent string will implicitly change the contents of its shared
    substrings.

         (string-set! foo 7 #\r)
         bar
         => "quirk"

    Guile considers shared substrings to be immutable. This is because
    programmers might not always be aware that a given string is really
    a shared substring, and might innocently try to mutate it without
    realizing that the change would affect its parent string. (We are
    currently considering a "copy-on-write" strategy that would permit
    modifying shared substrings without affecting the parent string.)

    In general, shared substrings are useful in circumstances where it
    is important to divide a string into smaller portions, but you do
    not expect to change the contents of any of the strings involved.

So even though I really don't like backwards-incompatible changes, it
seems that removing them is reasonable, and that you are having
trouble because you are relying on undefined behavior:

    As I've used shared substrings it is also non trivial to
    change the code.

So it sounds like you have code that depends on the cross-mutability
behavior. It seems that would also break if the COW semantics were
implemented as threatened (with hidden shared substrings).

Does your code break if you (define make-shared-substring substring)?

-- 
Greg Troxel <gdt@ir.bbn.com>
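The difference between the cross-mutating behavior in the transcript above and what a plain (define make-shared-substring substring) would give can be sketched in standard C. This is only an illustration of the aliasing question, not Guile code: struct str_view, shared_substring, copied_substring and view_upcase are hypothetical names invented for the sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical view type: a pointer into the parent's bytes plus a length.
   It owns nothing, so the parent must outlive it. */
struct str_view {
    char  *data;
    size_t len;
};

/* Analogue of make-shared-substring: no allocation, storage is shared. */
struct str_view shared_substring(char *parent, size_t start, size_t end)
{
    struct str_view v = { parent + start, end - start };
    return v;
}

/* Analogue of substring: a fresh NUL-terminated copy the caller must free.
   Returns NULL if allocation fails. */
char *copied_substring(const char *parent, size_t start, size_t end)
{
    size_t len = end - start;
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, parent + start, len);
    copy[len] = '\0';
    return copy;
}

/* Analogue of string-upcase!: mutate the viewed bytes in place. */
void view_upcase(struct str_view v)
{
    for (size_t i = 0; i < v.len; i++)
        v.data[i] = (char) toupper((unsigned char) v.data[i]);
}
```

With a parent buffer holding "foobar", upcasing the view over [0, 3) leaves the parent as "FOObar", as in the transcript, whereas a mutation of the result of copied_substring never reaches the parent. Code that relies on the first behavior would silently change meaning under the second.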
* Re: shared-substrings missing in 1.7
From: Paul Jarc @ 2004-01-16 20:24 UTC
Cc: guile-user, Roland Orre

Greg Troxel <gdt@ir.bbn.com> wrote:
> This is a guile-specific non-R5RS feature, AFAICT. It seems a bit
> non-Schemely, too, for a mutation of one object to cause another to
> change

Implicit copy-on-write substrings would be nice, though. Not that I'm
volunteering. :)

>     As I've used shared substrings it is also non trivial to
>     change the code.
>
> So it sounds like you have code that depends on the cross-mutability
> behavior.

No, I think his code simply runs extremely slowly without shared
substrings.


paul
* About shared substrings
From: Roland Orre @ 2004-01-16 20:35 UTC

Today I became really impressed.

For me shared substrings were an essential feature of guile, as they
let me speed up pure Scheme reading of fixed-width text files
dramatically: I only needed to declare a buffer and, from this buffer,
allocate a set of substrings corresponding to the fields of the table.
With this scheme I didn't need to do any more memory allocation for
substrings, as every field was immediately accessible to the standard
Scheme string conversion routines once the buffer was read.

When I found that shared substrings had been removed from guile 1.7, I
first came up with a complicated scheme using scm_gc_protect_object,
in which I would have to deallocate shared strings explicitly.

I discussed this issue with Mikael Djurfeldt today and he came up with
the following solution:

    SCM substring_table;

    SCM scm_make_shared_substring (SCM parent, SCM start, SCM end)
    {
      SCM substring;
      char *mem;
      int c_start, c_end;
      SCM_VALIDATE_SUBSTRING_SPEC_COPY (1, parent, mem,
                                        2, start, c_start,
                                        3, end, c_end);
      substring = scm_cell (SCM_MAKE_STRING_TAG (c_end - c_start),
                            (scm_t_bits) (mem + c_start));
      scm_hash_set_x (substring_table, substring, parent);
      return substring;
    }

where the following is put in main:

    substring_table
      = scm_permanent_object (scm_make_weak_key_hash_table (SCM_UNDEFINED));

This is almost magical :) It works perfectly well and I don't need to
bother about any explicit deallocation. This is also the first time I
really understand the purpose of these weak hash tables. In a weak-key
hash table, an entry is garbage collected only once its key is seen as
garbage. With this scheme we still have the same essential
functionality of shared substrings, from my perspective, but we no
longer need an explicit tag for them.

Mikael Djurfeldt is really the most clever programmer I know.

	Many thanks!
	Roland Orre

_______________________________________________
Guile-devel mailing list
Guile-devel@gnu.org
http://mail.gnu.org/mailman/listinfo/guile-devel
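The fixed-width reading pattern described in this message (one buffer, a set of field substrings created once, no allocation per record) might look roughly like this in plain C. It is a sketch under assumptions, not the author's code: the 16-column layout, struct record_reader and the helper names are all invented for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define RECORD_WIDTH 16   /* hypothetical record layout: 6 + 6 + 4 columns */

/* Hypothetical field descriptor: a fixed window into the record buffer. */
struct field {
    const char *data;   /* points into the shared buffer, not owned */
    size_t      len;
};

/* One reusable buffer; the field views are set up once and then reused.
   The views point into buf, so the struct must not be copied or moved. */
struct record_reader {
    char         buf[RECORD_WIDTH];
    struct field id;      /* columns 0..5   */
    struct field name;    /* columns 6..11  */
    struct field qty;     /* columns 12..15 */
};

void reader_init(struct record_reader *r)
{
    memset(r->buf, ' ', RECORD_WIDTH);
    r->id   = (struct field){ r->buf + 0,  6 };
    r->name = (struct field){ r->buf + 6,  6 };
    r->qty  = (struct field){ r->buf + 12, 4 };
}

/* Overwrite the buffer with the next record; every field view sees the new
   data without further allocation. Returns 0 on success, -1 if too short. */
int reader_load(struct record_reader *r, const char *line)
{
    if (strlen(line) < RECORD_WIDTH)
        return -1;
    memcpy(r->buf, line, RECORD_WIDTH);
    return 0;
}

/* Convert a space-padded decimal field to a long via a small stack copy. */
long field_to_long(struct field f)
{
    char tmp[32];
    size_t n = f.len < sizeof tmp - 1 ? f.len : sizeof tmp - 1;
    memcpy(tmp, f.data, n);
    tmp[n] = '\0';
    return strtol(tmp, NULL, 10);
}
```

After reader_init, each call to reader_load replaces the record, and r.id, r.name and r.qty can be converted straight away. This works only because the views alias the buffer, which is the property that replacing make-shared-substring with a copying substring would lose.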
* Re: About shared substrings
From: Neil Jerram @ 2004-01-16 21:12 UTC
Cc: guile-user, guile-devel

>>>>> "Roland" == Roland Orre <orre@nada.kth.se> writes:

    Roland> This is almost magical :)

Indeed, but there's one thing I don't quite see. When a shared
substring cell is GC'd, what prevents it from trying to free the
substring chars that it is pointing to?

        Neil
* Re: About shared substrings
From: Keith Wright @ 2004-01-17 22:34 UTC

> From: Neil Jerram <neil@ossau.uklinux.net>
>
>     Roland> This is almost magical :)
>
> When a shared substring cell is GC'd, what prevents it from trying
> to free the substring chars that it is pointing to?

     * Any sufficiently advanced technology *
     * is no more reliable than magic.      *

(Sorry, I'm reading e-mail and listening to Arthur C. Clarke on NPR.)

-- 
     -- Keith Wright  <kwright@free-comp-shop.com>

Programmer in Chief, Free Computer Shop <http://www.free-comp-shop.com>
         ---  Food, Shelter, Source code.  ---
* Re: About shared substrings
From: Mikael Djurfeldt @ 2004-01-16 21:54 UTC
Cc: guile-user, djurfeldt, guile-devel

Roland Orre <orre@nada.kth.se> writes:

> I discussed this issue with Mikael Djurfeldt today and he came up
> with the following solution:
>
> SCM substring_table;
>
> SCM scm_make_shared_substring (SCM parent, SCM start, SCM end)
> {
>   SCM substring;
>   char *mem;
>   int c_start, c_end;
>   SCM_VALIDATE_SUBSTRING_SPEC_COPY (1, parent, mem,
>                                     2, start, c_start,
>                                     3, end, c_end);
>   substring = scm_cell (SCM_MAKE_STRING_TAG (c_end - c_start),
>                         (scm_t_bits) (mem + c_start));
>   scm_hash_set_x (substring_table, substring, parent);
>   return substring;
> }
>
> where the following is put in main:
>   substring_table
>     = scm_permanent_object (scm_make_weak_key_hash_table (SCM_UNDEFINED));
>
> This is almost magical :)

Unfortunately, it *is* magical.

In addition, it is necessary to hand the newly created substring to a
guardian. If we then add a hook function to some suitable GC hook
which loops through all unreferenced substrings and sets their address
pointer to NULL, everything should work.

If we *don't* do that, the same area of memory will be freed
repeatedly, which is not optimal.

> Mikael Djurfeldt is really the most clever programmer I know.

Many thanks for the kind words! Yes, I am usually a blazing genius.
My mistake above must certainly be due to something in my coffee or
so... :)

M
* Re: About shared substrings
From: Tom Lord @ 2004-01-16 22:00 UTC
Cc: guile-user, guile-devel

> From: Roland Orre <orre@nada.kth.se>

> I discussed this issue with Mikael Djurfeldt today and he came up
> with the following solution:

> SCM substring_table;

> SCM scm_make_shared_substring (SCM parent, SCM start, SCM end)
> {
>   SCM substring;
>   char *mem;
>   int c_start, c_end;
>   SCM_VALIDATE_SUBSTRING_SPEC_COPY (1, parent, mem,
>                                     2, start, c_start,
>                                     3, end, c_end);
>   substring = scm_cell (SCM_MAKE_STRING_TAG (c_end - c_start),
>                         (scm_t_bits) (mem + c_start));
>   scm_hash_set_x (substring_table, substring, parent);
>   return substring;
> }

> where the following is put in main:
>   substring_table
>     = scm_permanent_object (scm_make_weak_key_hash_table (SCM_UNDEFINED));

> This is almost magical :) It works perfectly well and I don't need to
> bother about any explicit deallocation. This is also the first time I
> really understand the purpose of these weak hash tables. In a weak-key
> hash table, an entry is garbage collected only once its key is seen as
> garbage. With this scheme we still have the same essential
> functionality of shared substrings, from my perspective, but we no
> longer need an explicit tag for them.

How can this actually work?

It ensures that SUBSTRING, while live, protects PARENT. I see that
and it's _roughly_ what's wanted.

But SUBSTRING is tagged as a string, no? When that key (the
substring) is collected, won't that lead to a bogus free of the
substring data?

(I'm looking at the 1.6.4 GC implementation. Apologies if things have
changed in some significant way in 1.7.)

-t
* Re: shared-substrings missing in 1.7
From: Tom Lord @ 2004-01-16 20:45 UTC
Cc: guile-user, orre

> From: Greg Troxel <gdt@ir.bbn.com>

> This is a guile-specific non-R5RS feature, AFAICT. It seems a bit
> non-Schemely, too, for a mutation of one object to cause another to
> change (exercise for the reader: write denotational semantics for this
> :-).

Why is that significantly different from a mutation of one list
changing another?

    (define x (list 'b 'c 'd))
    (define y (cons 'a x))
    y => (a b c d)
    (set-car! x 'semantics-schmemantics)
    y => (a semantics-schmemantics c d)

Specifying the semantics of mutation-effects-both shared substrings is
as easy as specifying the semantics of cons pairs.

> But, shared substrings are supposed to be immutable

Ideally there would be two kinds of shared substring: copy-on-write
and mutation-effects-both.

MAKE-SHARED-SUBSTRING would create a mutation-effects-both shared
substring; SUBSTRING could create a copy-on-write shared substring.
MAKE-COW-SUBSTRING could have the same meaning as SUBSTRING but
provide a distinct hint to the implementation about how the string is
expected to be used.

> Guile considers shared substrings to be immutable. This is because
> programmers might not always be aware that a given string is really a
> shared substring, and might innocently try to mutate it without
> realizing that the change would affect its parent string. (We are
> currently considering a "copy-on-write" strategy that would permit
> modifying shared substrings without affecting the parent string.)

That's all fine, but when a programmer knows what she is doing, a
mutation-effects-both string is potentially exactly what is wanted.
How about something like:

    (string-downcase!
      (make-shared-substring from:-line hostname-start hostname-end))

to turn:

    "Greg Troxel" <gdt@Ir.Bbn.Com>

into

    "Greg Troxel" <gdt@ir.bbn.com>

The equivalent-performance alternative there is to make
STRING-DOWNCASE! and all similar functions accept optional START/END
parameters, a solution that quickly gets quite out of hand.

> In general, shared substrings are useful in circumstances where it is
> important to divide a string into smaller portions, but you do not
> expect to change the contents of any of the strings involved.

Not "In general" but "In some cases".

> So even though I really don't like backwards-incompat changes, it
> seems that removing them is reasonable, and that you are having
> trouble because you are relying on undefined behavior:

It didn't start out as undefined. When first added,
MAKE-SHARED-SUBSTRING returned a mutation-effects-both substring. It
was a feature.

-t
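The START/END alternative mentioned in this message can be made concrete with a small C sketch. downcase_range is a hypothetical helper written for illustration, not a Guile function; it shows the extra range parameters and validation that every in-place string operation would have to carry if shared substrings were not available.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Lowercase s[start, end) in place. Hypothetical helper, not a Guile API.
   Returns 0 on success, -1 if the range does not fit inside the string. */
int downcase_range(char *s, size_t start, size_t end)
{
    size_t len = strlen(s);
    if (start > end || end > len)
        return -1;
    for (size_t i = start; i < end; i++)
        s[i] = (char) tolower((unsigned char) s[i]);
    return 0;
}
```

Applied to the From: line in the example, with start just past the '@' and end at the closing '>', this lowercases only the hostname. The range checks are what a mutating shared substring would otherwise encapsulate once, at creation time.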
* Re: shared-substrings missing in 1.7
From: Greg Troxel @ 2004-01-19 16:54 UTC
Cc: guile-user, orre

Tom's arguments and Roland's explanation of what he is doing are very
persuasive, so I withdraw my comment that removing
make-shared-substring is a reasonable change.

I think Tom's comments about having explicit cross-mutation semantics
for make-shared-substring, and having 'make-substring' share just the
implementation but not the semantics (hence COW), make sense.

While I am awed by the whole guardian and weak ref scheme, it might
make sense to just have strings carry a refcount (of other Scheme
objects that point to their storage), possibly splitting 'string
storage' and 'string object'; the former doesn't even need to be a
Scheme object, just length, pointer, and refcount.

It would be cool to enable shortening up the storage if the really big
string is gone but some smaller ones remain. But this gets into
searching for references and computing what can go, or having
references be per-character with some sparse/range encoding, and that
sounds too hairy.

-- 
Greg Troxel <gdt@ir.bbn.com>
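The storage/object split suggested in this message could be sketched as follows in standard C. This is an assumption-laden illustration rather than anything from Guile: struct str_storage, struct str_object and the str_new, str_share and str_release functions are hypothetical names, and a real implementation would have to cooperate with the garbage collector instead of relying on explicit release calls.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Shared storage: not a Scheme object, just length, pointer and refcount. */
struct str_storage {
    char  *bytes;
    size_t len;
    size_t refcount;
};

/* A string object: a window [offset, offset + len) onto some storage. */
struct str_object {
    struct str_storage *storage;
    size_t offset;
    size_t len;
};

/* Create a string that owns fresh storage holding a copy of text.
   Returns 0 on success, -1 if allocation fails. */
int str_new(struct str_object *out, const char *text)
{
    struct str_storage *st = malloc(sizeof *st);
    if (st == NULL)
        return -1;
    st->len = strlen(text);
    st->bytes = malloc(st->len + 1);
    if (st->bytes == NULL) {
        free(st);
        return -1;
    }
    memcpy(st->bytes, text, st->len + 1);
    st->refcount = 1;
    out->storage = st;
    out->offset = 0;
    out->len = st->len;
    return 0;
}

/* Create a substring sharing the parent's storage; bumps the refcount.
   Returns 0 on success, -1 if the range is invalid. */
int str_share(struct str_object *out, const struct str_object *parent,
              size_t start, size_t end)
{
    if (start > end || end > parent->len)
        return -1;
    parent->storage->refcount++;
    out->storage = parent->storage;
    out->offset = parent->offset + start;
    out->len = end - start;
    return 0;
}

/* Drop one reference; the bytes are freed exactly once, by the last holder.
   Returns 1 if the storage was freed, 0 otherwise. */
int str_release(struct str_object *s)
{
    struct str_storage *st = s->storage;
    s->storage = NULL;
    if (--st->refcount > 0)
        return 0;
    free(st->bytes);
    free(st);
    return 1;
}
```

In this sketch the parent can be released before its substring without invalidating the substring's bytes, and the storage is freed once rather than repeatedly. It does not attempt the "shortening" idea from the last paragraph, which would need per-range bookkeeping.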
* Re: shared-substrings missing in 1.7
From: tomas @ 2004-01-20 9:02 UTC
Cc: guile-user, orre

On Mon, Jan 19, 2004 at 11:54:21AM -0500, Greg Troxel wrote:

[...]

> While I am awed by the whole guardian and weak ref scheme, it might
> make sense to just have strings have a refcount (of other scheme

[...]

Ah, but refcount is just an instance of memory management (and almost
always an inferior strategy). Letting the garbage collector cope with
it is far more promising (and that's what all this guardian stuff is
for, as far as I understood).

> It would be cool to enable shortening up the storage if the really big
> string is gone, but some smaller ones remain. [...]

This would be really cool, yes. Maybe even at the price of double
indirections?

Regards
-- tomas