unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* Request to add *-resize! functions for contiguous mutable data structures.
@ 2021-08-06 14:33 Vijay Marupudi
  2021-08-07 10:31 ` Taylan Kammer
  2021-08-07 11:09 ` Maxime Devos
  0 siblings, 2 replies; 9+ messages in thread
From: Vijay Marupudi @ 2021-08-06 14:33 UTC (permalink / raw)
  To: guile-devel

Hello!

I was curious if Guile would be willing to provide a series of
new procedures for resizing contiguous memory regions.

(bytevector-resize! <bytevector> new-size [fill])
(vector-resize! <vector> new-size [fill])

The [fill] parameter could be used if the new-size is bigger than
the current size.

This would make writing imperative code easier and more
performant. I acknowledge that it is not idiomatic Scheme to use
mutable data structures, however this is useful to me for
dealing with large amounts of text data, in which I need random
access and flexible data storage. It would allow me to move off
my custom C extension vector and allow me to use other
vector-* functions.

Ideally, this would use libc's `realloc` to make the resize
quick, so that it can avoid data copying whenever possible.

Regards

Vijay Marupudi
PhD Student in Human Centered-Computing
Georgia Tech



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Request to add *-resize! functions for contiguous mutable data structures.
  2021-08-06 14:33 Request to add *-resize! functions for contiguous mutable data structures Vijay Marupudi
@ 2021-08-07 10:31 ` Taylan Kammer
  2021-08-07 21:19   ` tomas
  2021-08-07 11:09 ` Maxime Devos
  1 sibling, 1 reply; 9+ messages in thread
From: Taylan Kammer @ 2021-08-07 10:31 UTC (permalink / raw)
  To: Vijay Marupudi, guile-devel

On 06.08.2021 16:33, Vijay Marupudi wrote:
> Hello!
> 
> I was curious if Guile would be willing to provide a series of
> new procedures for resizing contiguous memory regions.
> 
> (bytevector-resize! <bytevector> new-size [fill])
> (vector-resize! <vector> new-size [fill])
> 
> The [fill] parameter could be used if the new-size is bigger than
> the current size.
> 
> This would make writing imperative code easier and more
> performant. I acknowledge that it is not idiomatic Scheme to use
> mutable data structures, however this is useful to me for
> dealing with large amounts of text data, in which I need random
> access and flexible data storage. It would allow me to move off
> my custom C extension vector and allow me to use other
> vector-* functions.
> 
> Ideally, this would use libc's `realloc` to make the resize
> quick, so that it can avoid data copying whenever possible.
> 
> Regards
> 
> Vijay Marupudi
> PhD Student in Human Centered-Computing
> Georgia Tech
> 
Sounds like a good idea to me.  I didn't know realloc() was a
thing in C (I don't write much C) and I suppose it's not possible
to implement equivalent functionality with equivalent performance
purely in Scheme.

I'm on vacation for the next three weeks and will try to write a
patch to implement this if no one beats me to it. :-)

One consideration is how this should behave in the case of
bytevectors that were created from an FFI pointer.  In the FFI,
you provide a pointer and specify how long the bytevector should
be, which means it's a memory-unsafe operation.  I think it would
be ideal to offer a way of forcing an in-place resize, basically
overriding the formerly provided size value.  That means it's
also memory-unsafe, but in some cases that's what you want.

-- 
Taylan



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Request to add *-resize! functions for contiguous mutable data structures.
  2021-08-06 14:33 Request to add *-resize! functions for contiguous mutable data structures Vijay Marupudi
  2021-08-07 10:31 ` Taylan Kammer
@ 2021-08-07 11:09 ` Maxime Devos
  2021-08-07 17:46   ` Taylan Kammer
  1 sibling, 1 reply; 9+ messages in thread
From: Maxime Devos @ 2021-08-07 11:09 UTC (permalink / raw)
  To: Vijay Marupudi, guile-devel

[-- Attachment #1: Type: text/plain, Size: 2481 bytes --]

Vijay Marupudi schreef op vr 06-08-2021 om 09:33 [-0500]:
> Hello!
> 
> I was curious if Guile would be willing to provide a series of
> new procedures for resizing contiguous memory regions.
> 
> (bytevector-resize! <bytevector> new-size [fill])
> (vector-resize! <vector> new-size [fill])
> 
> The [fill] parameter could be used if the new-size is bigger than
> the current size.
>
> This would make writing imperative code easier and more
> performant.

A problem is that this prevents optimisations and can currently
introduce bugs in concurrent code.  Consider the following code:

b.scm:
(use-modules (rnrs bytevectors))

(define (bv-first-two bv)
  (unless (bytevector? bv)
    (error "not a bv"))
  (unless (>= (bytevector-length bv) 2) ; L6
    (error "too small"))
  (values (bytevector-u8-ref bv 0)   ; L8
          (bytevector-u8-ref bv 1))) ; L9
bv-first-two


Compile it with optimisations enabled:

  guild compile b.scm -o b.go -O3 && guild disassemble b.go

(Unfortunately, guile cannot yet compile the bounds check at L8 and L9 away
even though we performed a bounds check at L6 away.)

I can't say I understand the disassembled code very well, but I do note
that the bounds checks (search for (jl ...), (jnl ...) and imm-u64<?,
s64-imm<? and u64<?) are separate from the read of the 'length' of 
bytevector (maybe 'word-ref/immediate' or pointer-ref/immediate) and
the reading of the first and second byte of the bytevector (maybe (u8-ref 5 3 0),
(u8-ref 2 3 2)).

Now suppose some concurrent thread resizes the bytevector between the bounds
check and the actual reading, then there will be an out-of-bounds access ...

> I acknowledge that it is not idiomatic Scheme to use
> mutable data structures, however this is useful to me for
> dealing with large amounts of text data, in which I need random
> access and flexible data storage. It would allow me to move off
> my custom C extension vector and allow me to use other
> vector-* functions.
> 
> Ideally, this would use libc's `realloc` to make the resize
> quick, so that it can avoid data copying whenever possible.

If you're very careful, you can use 'bytevector->pointer', 'pointer->bytevector'
and (foreign-library-function ... "malloc" ...),
(foreign-library-function ... "realloc" ...),
(foreign-library-function ... "free" ...).

(ice-9 vlist) and <https://github.com/ijp/fectors> might be interesting as well.

Greetings,
Maxime.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 260 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Request to add *-resize! functions for contiguous mutable data structures.
  2021-08-07 11:09 ` Maxime Devos
@ 2021-08-07 17:46   ` Taylan Kammer
  2021-08-09  4:02     ` Vijay Marupudi
  0 siblings, 1 reply; 9+ messages in thread
From: Taylan Kammer @ 2021-08-07 17:46 UTC (permalink / raw)
  To: Maxime Devos, Vijay Marupudi, guile-devel

On 07.08.2021 13:09, Maxime Devos wrote:

> 
> A problem is that this prevents optimisations and can currently
> introduce bugs in concurrent code.  Consider the following code:
> 
> [... snip ... ]
> 
> Greetings,
> Maxime.
> 

Couldn't we just state that resizing a vector/bytevector is a
thread-unsafe operation?

-- 
Taylan



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Request to add *-resize! functions for contiguous mutable data structures.
  2021-08-07 10:31 ` Taylan Kammer
@ 2021-08-07 21:19   ` tomas
  2021-08-08 12:17     ` Taylan Kammer
  0 siblings, 1 reply; 9+ messages in thread
From: tomas @ 2021-08-07 21:19 UTC (permalink / raw)
  To: guile-devel

[-- Attachment #1: Type: text/plain, Size: 1310 bytes --]

On Sat, Aug 07, 2021 at 12:31:09PM +0200, Taylan Kammer wrote:
> On 06.08.2021 16:33, Vijay Marupudi wrote:
> > Hello!
> > 
> > I was curious if Guile would be willing to provide a series of
> > new procedures for resizing contiguous memory regions.

[...]

> Sounds like a good idea to me.  I didn't know realloc() was a
> thing in C (I don't write much C) and I suppose it's not possible
> to implement equivalent functionality with equivalent performance
> purely in Scheme.
> 
> I'm on vacation for the next three weeks and will try to write a
> patch to implement this if no one beats me to it. :-)
> 
> One consideration is how this should behave in the case of
> bytevectors that were created from an FFI pointer [...]

Hm. I don't understand. Realloc /may/ return a different pointer
from the one it receives, for example if there isn't enough
room "beyond" the currently allocated. It will copy over the
contents, but if someone is holding a pointer to the old area
(as I understand you, this will be the case with an FFI pointer),
this isn't going to end well...

And then there is the constraint that the (original) pointer
passed to realloc /must/ be one returned by one of the malloc
family (how would the allocator know the original size otherwise?)

Cheers
 - t

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Request to add *-resize! functions for contiguous mutable data structures.
  2021-08-07 21:19   ` tomas
@ 2021-08-08 12:17     ` Taylan Kammer
  0 siblings, 0 replies; 9+ messages in thread
From: Taylan Kammer @ 2021-08-08 12:17 UTC (permalink / raw)
  To: tomas, guile-devel

On 07.08.2021 23:19, tomas@tuxteam.de wrote:
> On Sat, Aug 07, 2021 at 12:31:09PM +0200, Taylan Kammer wrote:
>> One consideration is how this should behave in the case of
>> bytevectors that were created from an FFI pointer [...]
> 
> Hm. I don't understand. Realloc /may/ return a different pointer
> from the one it receives, for example if there isn't enough
> room "beyond" the currently allocated. It will copy over the
> contents, but if someone is holding a pointer to the old area
> (as I understand you, this will be the case with an FFI pointer),
> this isn't going to end well...
> 
> And then there is the constraint that the (original) pointer
> passed to realloc /must/ be one returned by one of the malloc
> family (how would the allocator know the original size otherwise?)

Bytevectors created from FFI pointers definitely have to be specially
handled so as not to call realloc() on the original pointer.  There are
two ways bytevector-resize! could work in this case:

1. If the requested size is less than or equal to the current size,
   merely change the size information of the bytevector.  Otherwise,
   return an entirely new bytevector that doesn't point to the original
   location anymore.  This is the "safe" variant.

2. Always just change the size information of the bytevector.  This is
   unsafe because providing a larger size might mean that the bytevector
   starts allowing access to unallocated memory.  But it's not any less
   safe than the original pointer->bytevector operation where you already
   provided the size information yourself.

The second behavior would help with this issue:

https://github.com/TaylanUB/scheme-bytestructures/issues/41

-- 
Taylan



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Request to add *-resize! functions for contiguous mutable data structures.
  2021-08-07 17:46   ` Taylan Kammer
@ 2021-08-09  4:02     ` Vijay Marupudi
  2021-08-09 18:24       ` Maxime Devos
  0 siblings, 1 reply; 9+ messages in thread
From: Vijay Marupudi @ 2021-08-09  4:02 UTC (permalink / raw)
  To: Taylan Kammer, Maxime Devos, guile-devel

Thank you for your responses Taylan and Maxime!

My initial reaction to the concern about multithreaded code is similar
to Taylan. I'm not sure if Guile has multithreading concepts built into
the compiler. If so, one can only check the length again after a mutex.

Appreciate the malloc, realloc, and free FFI solution. Ideally I
wouldn't have to do that, but it does work. I have to manually free it
though.


~ Vijay


On 8/7/21 12:46 PM, Taylan Kammer wrote:
> On 07.08.2021 13:09, Maxime Devos wrote:
> 
>>
>> A problem is that this prevents optimisations and can currently
>> introduce bugs in concurrent code.  Consider the following code:
>>
>> [... snip ... ]
>>
>> Greetings,
>> Maxime.
>>
> 
> Couldn't we just state that resizing a vector/bytevector is a
> thread-unsafe operation?
> 



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Request to add *-resize! functions for contiguous mutable data structures.
  2021-08-09  4:02     ` Vijay Marupudi
@ 2021-08-09 18:24       ` Maxime Devos
  2021-08-09 18:35         ` Vijay Marupudi
  0 siblings, 1 reply; 9+ messages in thread
From: Maxime Devos @ 2021-08-09 18:24 UTC (permalink / raw)
  To: Vijay Marupudi, Taylan Kammer, guile-devel

[-- Attachment #1: Type: text/plain, Size: 731 bytes --]

Vijay Marupudi schreef op zo 08-08-2021 om 23:02 [-0500]:
> Thank you for your responses Taylan and Maxime!
> 
> My initial reaction to the concern about multithreaded code is similar
> to Taylan. I'm not sure if Guile has multithreading concepts built into
> the compiler. If so, one can only check the length again after a mutex.
> 
> Appreciate the malloc, realloc, and free FFI solution. Ideally I
> wouldn't have to do that, but it does work. I have to manually free it
> though.

You can avoid explicit free by using GC_MALLOC_ATOMIC and GC_REALLOC from
bdw-gc (the C library Guile uses for garbage collection) instead of malloc
and realloc, see <https://hboehm.info/gc/gcinterface.html>.

Greetings,
Maxme.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 260 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Request to add *-resize! functions for contiguous mutable data structures.
  2021-08-09 18:24       ` Maxime Devos
@ 2021-08-09 18:35         ` Vijay Marupudi
  0 siblings, 0 replies; 9+ messages in thread
From: Vijay Marupudi @ 2021-08-09 18:35 UTC (permalink / raw)
  To: Maxime Devos, Taylan Kammer, guile-devel

That does help with the freeing requirement, thank you Maxime!

The request for this feature, if possible, still stands, because of
ergonomic reasons, and also I want to store Scheme strings in a vector
that can be resized.

Until then, I will make do with this and the make-vector/vector-copy
strategy.

~ Vijay

On 8/9/21 1:24 PM, Maxime Devos wrote:
> Vijay Marupudi schreef op zo 08-08-2021 om 23:02 [-0500]:
>> Thank you for your responses Taylan and Maxime!
>>
>> My initial reaction to the concern about multithreaded code is similar
>> to Taylan. I'm not sure if Guile has multithreading concepts built into
>> the compiler. If so, one can only check the length again after a mutex.
>>
>> Appreciate the malloc, realloc, and free FFI solution. Ideally I
>> wouldn't have to do that, but it does work. I have to manually free it
>> though.
> 
> You can avoid explicit free by using GC_MALLOC_ATOMIC and GC_REALLOC from
> bdw-gc (the C library Guile uses for garbage collection) instead of malloc
> and realloc, see <https://hboehm.info/gc/gcinterface.html>.
> 
> Greetings,
> Maxme.
> 



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-08-09 18:35 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-06 14:33 Request to add *-resize! functions for contiguous mutable data structures Vijay Marupudi
2021-08-07 10:31 ` Taylan Kammer
2021-08-07 21:19   ` tomas
2021-08-08 12:17     ` Taylan Kammer
2021-08-07 11:09 ` Maxime Devos
2021-08-07 17:46   ` Taylan Kammer
2021-08-09  4:02     ` Vijay Marupudi
2021-08-09 18:24       ` Maxime Devos
2021-08-09 18:35         ` Vijay Marupudi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).