unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* mmap for guile
@ 2022-06-26 15:37 Matt Wette
  2022-06-26 16:21 ` Matt Wette
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Matt Wette @ 2022-06-26 15:37 UTC (permalink / raw)
  To: guile-devel, Guile User

I have a candidate mmap implementation for guile (3.0.8) in

	github.com/mwette/guile-contrib/patch/3.0.8/mmap-api.patch

It probably needs some review and update.  Maybe windows mmap-file would be nice.
I'm not sure I updated the configure correctly but --disable-mmap-api should remove.
It exposes features 'mmap-api and 'mmap-file.

scheme@(guile-user)> ,d mmap
- Scheme Procedure: mmap addr len [prot [flags [fd [offset]]]]
      See the man page.  Returns a bytevector.  Note that the region
      returned by mmap will NOT be searched by the garbage collector for
      pointers.  See also mmap/search.  Defaults are:

      prot
           (logior PROT_READ PROT_WRITE)

      flags
           (logior MAP_ANON MAP_PRIVATE)

      fd
           -1

      offset
           0

scheme@(guile-user)> ,d mmap/search
- Scheme Procedure: mmap/search addr len [prot [flags [fd [offset]]]]
      See the unix man page for mmap.  Returns a bytevector.  Note that
      the region allocated will be searched by the garbage collector for
      pointers.  Defaults:

      prot
           (logior PROT_READ PROT_WRITE)

      flags
           (logior MAP_ANON MAP_PRIVATE)

      fd
           -1

      offset
           0

scheme@(guile-user)> ,d mmap-file
- Scheme Procedure: mmap-file file [prot]
      This procedure accepts a file in the form of filename, file-port or
      fd.  It returns a bytevector.  It must not contain scheme allocated
      objects as it will not be searched for pointers.  Default PROT is
      `"r"'.




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: mmap for guile
  2022-06-26 15:37 mmap for guile Matt Wette
@ 2022-06-26 16:21 ` Matt Wette
  2022-06-26 17:06 ` Olivier Dion via Developers list for Guile, the GNU extensibility library
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Matt Wette @ 2022-06-26 16:21 UTC (permalink / raw)
  To: guile-devel, Guile User

Appologies: link is 
https://github.com/mwette/guile-contrib/blob/main/patch/3.0.8/mmap-api.patch

On 6/26/22 8:37 AM, Matt Wette wrote:
> I have a candidate mmap implementation for guile (3.0.8) in
>
>     github.com/mwette/guile-contrib/patch/3.0.8/mmap-api.patch
>
> It probably needs some review and update.  Maybe windows mmap-file 
> would be nice.
> I'm not sure I updated the configure correctly but --disable-mmap-api 
> should remove.
> It exposes features 'mmap-api and 'mmap-file.
>
> scheme@(guile-user)> ,d mmap
> - Scheme Procedure: mmap addr len [prot [flags [fd [offset]]]]
>      See the man page.  Returns a bytevector.  Note that the region
>      returned by mmap will NOT be searched by the garbage collector for
>      pointers.  See also mmap/search.  Defaults are:
>
>      prot
>           (logior PROT_READ PROT_WRITE)
>
>      flags
>           (logior MAP_ANON MAP_PRIVATE)
>
>      fd
>           -1
>
>      offset
>           0
>
> scheme@(guile-user)> ,d mmap/search
> - Scheme Procedure: mmap/search addr len [prot [flags [fd [offset]]]]
>      See the unix man page for mmap.  Returns a bytevector.  Note that
>      the region allocated will be searched by the garbage collector for
>      pointers.  Defaults:
>
>      prot
>           (logior PROT_READ PROT_WRITE)
>
>      flags
>           (logior MAP_ANON MAP_PRIVATE)
>
>      fd
>           -1
>
>      offset
>           0
>
> scheme@(guile-user)> ,d mmap-file
> - Scheme Procedure: mmap-file file [prot]
>      This procedure accepts a file in the form of filename, file-port or
>      fd.  It returns a bytevector.  It must not contain scheme allocated
>      objects as it will not be searched for pointers.  Default PROT is
>      `"r"'.
>
>




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: mmap for guile
  2022-06-26 15:37 mmap for guile Matt Wette
  2022-06-26 16:21 ` Matt Wette
@ 2022-06-26 17:06 ` Olivier Dion via Developers list for Guile, the GNU extensibility library
  2022-06-26 18:11 ` Maxime Devos
  2022-06-26 18:21 ` Maxime Devos
  3 siblings, 0 replies; 13+ messages in thread
From: Olivier Dion via Developers list for Guile, the GNU extensibility library @ 2022-06-26 17:06 UTC (permalink / raw)
  To: Matt Wette, guile-devel, Guile User

On Sun, 26 Jun 2022, Matt Wette <matt.wette@gmail.com> wrote:
>       flags
>            (logior MAP_ANON MAP_PRIVATE)

Why MAP_ANON instead of MAP_ANONYMOUS.  The latter is more clear and you
don't have to care about compatibility like C does.

Also, does MAP_FIXED or MAP_FIXED_NOREPLACE is passed to `flags' if
`(not (eq? addr %null-pointer))'?

> scheme@(guile-user)> ,d mmap/search

What's the difference with `mmap'?

> scheme@(guile-user)> ,d mmap-file
> - Scheme Procedure: mmap-file file [prot]
>       This procedure accepts a file in the form of filename, file-port or
>       fd.  It returns a bytevector.  It must not contain scheme allocated
>       objects as it will not be searched for pointers.  Default PROT is
>       `"r"'.

I assume that this map the entire file at offset 0 with PROT_READ?  In
that case, is it with MAP_SHARED or MAP_PRIVATE?  I think it's important
to mentioned this point.

Regards,
old

-- 
Olivier Dion
oldiob.dev



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: mmap for guile
  2022-06-26 15:37 mmap for guile Matt Wette
  2022-06-26 16:21 ` Matt Wette
  2022-06-26 17:06 ` Olivier Dion via Developers list for Guile, the GNU extensibility library
@ 2022-06-26 18:11 ` Maxime Devos
  2022-07-04 10:09   ` Ludovic Courtès
  2022-06-26 18:21 ` Maxime Devos
  3 siblings, 1 reply; 13+ messages in thread
From: Maxime Devos @ 2022-06-26 18:11 UTC (permalink / raw)
  To: Matt Wette, guile-devel, Guile User

[-- Attachment #1: Type: text/plain, Size: 3966 bytes --]

Some old mmap things that might be useful:

* https://lists.nongnu.org/archive/html/guile-devel/2013-04/msg00235.html
* https://lists.gnu.org/archive/html/bug-guile/2017-11/msg00033.html
* https://lists.gnu.org/archive/html/guile-user/2006-11/msg00013.html

+SCM_DEFINE (scm_mmap_search, "mmap/search", 2, 4, 0, 
+            (SCM addr, SCM len, SCM prot, SCM flags, SCM fd, SCM
offset),
+	    "See the unix man page for mmap.  Returns a bytevector.\n"
+	    "Note that the region allocated will be searched by the
garbage\n"
+	    "collector for pointers.  Defaults:\n"

I think it would be a good idea to document it will be automatically
unmapped during GC, as this is a rather low-leel interface

Also, what if you mmap a region, use bytevector->pointer and pass it to
some C thing, which saves the pointer somewhere where boehm-gc can find
it and boehm-gc considers it to be live, is there something that
prevents boehm-gc from improperly calling the finalizer & unmapping the
region, causing a dangling pointer?

Also, WDYT of using ports instead of raw fds in the API?  That would
play nicer with move->fdes etc.

> +  GC_exclude_static_roots(ptr, (char*)ptr + len);

After an unmap, will the GC properly forget the roots information?

>+  /* Invalidate further work on this bytevector. */
>+  SCM_BYTEVECTOR_SET_LENGTH (bvec, 0);
>+  SCM_BYTEVECTOR_SET_CONTENTS (bvec, NULL);

Possibly Guile's optimiser assumes that bytevectors never change in
length (needs to be checked).  So unless the relevant optimiser code is
changed, and it is documented that bytevectors can change in length, I
think it would be safer to not have an unmapping procedure in Scheme
(though a procedure for remapping it as /dev/zero should be safe).

> +  bvec = scm_c_take_typed_bytevector((signed char *) c_mem + c_offset, c_len,
> +				     SCM_ARRAY_ELEMENT_TYPE_VU8, pointer);

Would scm_pointer_to_bytevector fit here?  Also, scm_c_make_typed_bytevector
looks like something that can cause an out-of-memory-exception but the
finaliser hasn't been set yet.


>+// call fstat to get file size
>+SCM_DEFINE (scm_mmap_file, "mmap-file", 1, 1, 0, 
>+            (SCM file, SCM prot),
>+	    "This procedure accepts a file in the form of filename,\n"
>+            " file-port or fd.  It returns a bytevector.  It must
>not\n"
>+            " contain scheme allocated objects as it will not be\n"
>+            " searched for pointers. Default @var{prot} is @code{\"r\"}.")

I would restrict the C code to only ports and file descriptors, and leave file
names to a Scheme wrapper.  That way, you automatically get appropriate E...
errors in case open-file fails (and maybe &i/o-filename etc. if the core Guile
FS functions are later changed to R6RS), less chance on accidentally forgetting
to close a fd (*) ...

(*) One possible problem: if the file is opened, and mmap fails, then you still
need to close the file port (and rethrow), so some exception handling is still
required, though no C-style exception handling ...

> +/* The following copied from bytevectors.c. Kludge? */
> +#define SCM_BYTEVECTOR_SET_LENGTH(_bv, _len)            \
> +  SCM_SET_CELL_WORD_1 ((_bv), (scm_t_bits) (_len))
> +#define SCM_BYTEVECTOR_SET_CONTENTS(_bv, _contents)	\
> +  SCM_SET_CELL_WORD_2 ((_bv), (scm_t_bits) (_contents))

To avoid accidental problems if bytevectors.c is modified later, I'd add add a comment
in bytevectors.c referring to this file, to add a reminder that mmunmap
makes assumptions about the layout.

> +  scm_c_define ("PAGE_SIZE", scm_from_int (getpagesize()));

From local man page:

       SVr4,  4.4BSD,  SUSv2.   In SUSv2 the getpagesize() call is labeled LEGACY, and in
       POSIX.1-2001 it has been dropped; HP-UX does not have this call.

NOTES
       Portable applications should  employ  sysconf(_SC_PAGESIZE)  instead  of  getpage‐
       size():

Greetings,
Maxime.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 260 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: mmap for guile
  2022-06-26 15:37 mmap for guile Matt Wette
                   ` (2 preceding siblings ...)
  2022-06-26 18:11 ` Maxime Devos
@ 2022-06-26 18:21 ` Maxime Devos
  3 siblings, 0 replies; 13+ messages in thread
From: Maxime Devos @ 2022-06-26 18:21 UTC (permalink / raw)
  To: Matt Wette, guile-devel, Guile User

[-- Attachment #1: Type: text/plain, Size: 1239 bytes --]

Matt Wette schreef op zo 26-06-2022 om 08:37 [-0700]:
> scheme@(guile-user)> ,d mmap/search
> - Scheme Procedure: mmap/search addr len [prot [flags [fd [offset]]]]
>       See the unix man page for mmap.  Returns a bytevector.  Note that
>       the region allocated will be searched by the garbage collector for
>       pointers.  Defaults:

The types aren't documented and I don't think just referring to the man
page is sufficient, because C and Scheme have actually different APIs
and semantics -- e.g., if I don't care about the address, can I set it
to #false?  Is it only fds or also ports?  Why does the abbreviation
'addr' mean (why not in full: address, likewise for 'len')?  C has its
own style of error reporting, but how does this map to Scheme? Do I get
a ENOENT return value, or a ENOENT system-error, an &i/o-file-is-read-
only, an &i/o-filename, an &i/o-invalid-position? Is the address a raw
number or a bytevector (made with pointer->bytevector) or a pointer? 
Is the offset inside the file, or w.r.t. the current position?

I think it's fine to not document all the MAP_ flags, but I think we
need at least the basics (MAP_ANONYMOUS, PROT_READ, PROT_WRITE).

Greetings,
Maxime.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 260 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: mmap for guile
  2022-06-26 18:11 ` Maxime Devos
@ 2022-07-04 10:09   ` Ludovic Courtès
  2022-07-04 13:14     ` Greg Troxel
                       ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Ludovic Courtès @ 2022-07-04 10:09 UTC (permalink / raw)
  To: guile-user; +Cc: guile-devel

Hello,

Having ‘mmap’ bindings would be much welcome.

Maxime Devos <maximedevos@telenet.be> skribis:

> +SCM_DEFINE (scm_mmap_search, "mmap/search", 2, 4, 0, 
> +            (SCM addr, SCM len, SCM prot, SCM flags, SCM fd, SCM
> offset),
> +	    "See the unix man page for mmap.  Returns a bytevector.\n"
> +	    "Note that the region allocated will be searched by the
> garbage\n"
> +	    "collector for pointers.  Defaults:\n"
>
> I think it would be a good idea to document it will be automatically
> unmapped during GC, as this is a rather low-leel interface

Agreed.

I was wondering about offering an explicit ‘munmap’ interface: it would
be useful for fine-grain OS resource management, just like ‘close-fdes’.

Doing that naively would mean that one can trivially get a pure Scheme
program to segfault, which is contrary to what we do.

But we could provide special semantics: the bytevector would become
zero-length (possible, but weird, as Maxime points out), or it would be
turned into a /dev/zero mapping (weird as well).

Thoughts?

> Also, what if you mmap a region, use bytevector->pointer and pass it to
> some C thing, which saves the pointer somewhere where boehm-gc can find
> it and boehm-gc considers it to be live, is there something that
> prevents boehm-gc from improperly calling the finalizer & unmapping the
> region, causing a dangling pointer?

There’s a risk, but I don’t think it’s specific to mmap.

> Also, WDYT of using ports instead of raw fds in the API?  That would
> play nicer with move->fdes etc.

Agreed.

>>+  /* Invalidate further work on this bytevector. */
>>+  SCM_BYTEVECTOR_SET_LENGTH (bvec, 0);
>>+  SCM_BYTEVECTOR_SET_CONTENTS (bvec, NULL);
>
> Possibly Guile's optimiser assumes that bytevectors never change in
> length (needs to be checked).  So unless the relevant optimiser code is
> changed, and it is documented that bytevectors can change in length, I
> think it would be safer to not have an unmapping procedure in Scheme
> (though a procedure for remapping it as /dev/zero should be safe).

I don’t think the optimizer makes any such assumption, except for
literal bytevectors.

Besides what Maxime points out, some more superficial issues:

  • In documentation, please refer to the relevant glibc section instead
    of “See man page” (info "(libc) Memory-mapped I/O").

  • Please update doc/ref with a section on memory-mapped I/O.

  • Make sure to follow the GNU coding in C: space before opening paren,
    braces on a line of their own, etc.

Since you already have a copyright assignment on file, there won’t be
administrative delays, which is a good thing.  :-)

I hope we can have those ‘mmap’ bindings soonish!

Thanks,
Ludo’.




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: mmap for guile
  2022-07-04 10:09   ` Ludovic Courtès
@ 2022-07-04 13:14     ` Greg Troxel
  2022-07-04 20:03       ` Ludovic Courtès
  2022-07-19 13:20     ` Maxime Devos
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Greg Troxel @ 2022-07-04 13:14 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-user, guile-devel

[-- Attachment #1: Type: text/plain, Size: 945 bytes --]


Ludovic Courtès <ludo@gnu.org> writes:

> Besides what Maxime points out, some more superficial issues:
>
>   • In documentation, please refer to the relevant glibc section instead
>     of “See man page” (info "(libc) Memory-mapped I/O").
>
>   • Please update doc/ref with a section on memory-mapped I/O.
>
>   • Make sure to follow the GNU coding in C: space before opening paren,
>     braces on a line of their own, etc.

I have been meaning to try to build this under NetBSD, to check
portability. I think the mmap code should by default rely only on what
POSIX guarantees:
  https://pubs.opengroup.org/onlinepubs/9699919799/functions/mmap.html

As for referring to glibc, that reference only resolves on GNU/Linux
systems, whereas any POSIX system ought to have an mmap man page, so it
would be nice not to drop the man page ref, esp. as it grounds the
implementation as being about the POSIX interface.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 194 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: mmap for guile
  2022-07-04 13:14     ` Greg Troxel
@ 2022-07-04 20:03       ` Ludovic Courtès
  2022-07-05 12:49         ` Greg Troxel
  0 siblings, 1 reply; 13+ messages in thread
From: Ludovic Courtès @ 2022-07-04 20:03 UTC (permalink / raw)
  To: Greg Troxel; +Cc: guile-user, guile-devel

Hi,

Greg Troxel <gdt@lexort.com> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> Besides what Maxime points out, some more superficial issues:
>>
>>   • In documentation, please refer to the relevant glibc section instead
>>     of “See man page” (info "(libc) Memory-mapped I/O").
>>
>>   • Please update doc/ref with a section on memory-mapped I/O.
>>
>>   • Make sure to follow the GNU coding in C: space before opening paren,
>>     braces on a line of their own, etc.
>
> I have been meaning to try to build this under NetBSD, to check
> portability. I think the mmap code should by default rely only on what
> POSIX guarantees:
>   https://pubs.opengroup.org/onlinepubs/9699919799/functions/mmap.html

Agreed.

> As for referring to glibc, that reference only resolves on GNU/Linux
> systems, whereas any POSIX system ought to have an mmap man page, so it
> would be nice not to drop the man page ref, esp. as it grounds the
> implementation as being about the POSIX interface.

The manual can mention the man page too, sure, but note that the manual
already refers to glibc’s for all things libc.

Ludo’.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: mmap for guile
  2022-07-04 20:03       ` Ludovic Courtès
@ 2022-07-05 12:49         ` Greg Troxel
  0 siblings, 0 replies; 13+ messages in thread
From: Greg Troxel @ 2022-07-05 12:49 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

[-- Attachment #1: Type: text/plain, Size: 1589 bytes --]


Ludovic Courtès <ludo@gnu.org> writes:

> Greg Troxel <gdt@lexort.com> skribis:
>
>> I have been meaning to try to build this under NetBSD, to check
>> portability. I think the mmap code should by default rely only on what
>> POSIX guarantees:
>>   https://pubs.opengroup.org/onlinepubs/9699919799/functions/mmap.html
>
> Agreed.

Great to hear that.  There are a bunch of guile fans in BSD-land and we
try to keep up :-) After I wrote, I actually did get to testing on
netbsd-9 amd64, which was easier than I thought it might be.  With the
patch Matt sent, guile 3.0.8 built (and packaged under pkgsrc), and I
was able to see the mmap functions docs in the repl.

I tried to run the tests but got an undefined symbol, not apparently
related.  My machine is in the process of rebuilding most packages and I
will revist when that's done, starting with tests on unmodified 3.0.8.

>> As for referring to glibc, that reference only resolves on GNU/Linux
>> systems, whereas any POSIX system ought to have an mmap man page, so it
>> would be nice not to drop the man page ref, esp. as it grounds the
>> implementation as being about the POSIX interface.
>
> The manual can mention the man page too, sure, but note that the manual
> already refers to glibc’s for all things libc.

I see -- I guess there is a fine line between glibc the implementation
of what C99/POSIX defines as "what must be in libc" the spec, but that's
a big can of worms and if this documentation ends up like the existing
one, then it is not reasonable for me to worry about it :-)

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 194 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: mmap for guile
  2022-07-04 10:09   ` Ludovic Courtès
  2022-07-04 13:14     ` Greg Troxel
@ 2022-07-19 13:20     ` Maxime Devos
  2022-07-21  9:14       ` Ludovic Courtès
  2022-07-19 13:30     ` Maxime Devos
  2022-07-19 13:34     ` Maxime Devos
  3 siblings, 1 reply; 13+ messages in thread
From: Maxime Devos @ 2022-07-19 13:20 UTC (permalink / raw)
  To: Ludovic Courtès, guile-user; +Cc: guile-devel

Ludovic Courtès schreef op ma 04-07-2022 om 12:09 [+0200]:
> I don’t think the optimizer makes any such assumption, except for
> literal bytevectors.

It _does_ assume that bytevector lengths don't change:

,use (rnrs bytevectors)
,compile (bytevector-u8-ref some-variable 999)

;;; <unknown-location>: warning: possibly unbound variable `some-variable'
Disassembly of <unnamed function> at #xe8:

   0    (instrument-entry 254)                                at (unknown file):4:9
   2    (assert-nargs-ee/locals 1 3)    ;; 4 slots (0 args)
   3    (call-scm<-thread 3 62)                               at (unknown file):4:28
   5    (static-ref 2 187)              ;; some-variable
   7    (call-scm<-scm-scm 3 3 2 111)   
   9    (scm-ref/immediate 3 3 1)       
  10    (make-immediate 2 3998)         ;; 999                at (unknown file):4:42
  11    (immediate-tag=? 3 7 0)         ;; heap-object?       at (unknown file):4:9
  13    (jne 21)                        ;; -> L2
  14    (heap-tag=? 3 127 77)           ;; bytevector?
  16    (jne 18)                        ;; -> L2
  17    (word-ref/immediate 1 3 1)      
  18    (load-s64 0 0 999)              
  21    (imm-u64<? 1 0)                 
  22    (jnl 10)                        ;; -> L1
  23    (usub/immediate 1 1 0)          
  24    (u64<? 0 1)                     
  25    (jnl 7)                         ;; -> L1
  26    (pointer-ref/immediate 2 3 2)   
  27    (u8-ref 3 2 0)                  
  28    (tag-fixnum 3 3)                
  29    (reset-frame 1)                 ;; 1 slot
  30    (handle-interrupts)             
  31    (return-values)                 
L1:
  32    (throw/value+data 2 188)        ;; #(out-of-range "bytevector-u8-ref" "Argument 2 out of range: ~S")
L2:
  34    (throw/value+data 3 212)        ;; #(wrong-type-arg "bytevector-u8-ref" "Wrong type argument in position 1 (ex…")

As can be seen in the above output, it first determines
the length of the bytevector and then compares it again the index
(for the index check) and then actually reads th byte
so it assumes that the length doesn't change in-between -- it's not an
atomic operation!

Even if it didn't, being able to assume bytevector lengths don't change
is important for optimising code like (begin (bytevector-ref bv 9000) (bytevector-ref 8999) ...)),
to optimise out many range checks, though I don't know if Guile currently
does that.

Greetings,
Maxime.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: mmap for guile
  2022-07-04 10:09   ` Ludovic Courtès
  2022-07-04 13:14     ` Greg Troxel
  2022-07-19 13:20     ` Maxime Devos
@ 2022-07-19 13:30     ` Maxime Devos
  2022-07-19 13:34     ` Maxime Devos
  3 siblings, 0 replies; 13+ messages in thread
From: Maxime Devos @ 2022-07-19 13:30 UTC (permalink / raw)
  To: Ludovic Courtès, guile-user; +Cc: guile-devel

Ludovic Courtès schreef op ma 04-07-2022 om 12:09 [+0200]:
> But we could provide special semantics: the bytevector would become
> zero-length (possible, but weird, as Maxime points out), or it would
> be turned into a /dev/zero mapping (weird as well).
> 
> Thoughts?

The former is weird and can currently cause out-of-bound reads/writes
(see the TOCTTOU issue I pointed out).  It also would also prevent
optimisations (for things like (bytevector-ref bv 9999) (bytevector-ref
bv 1), to eliminate some range checks).

The latter is weird too, but it's easy to reason about -- the compiler
can assume the bytevector length is fixed, so no TOCTTOU issue and
possibility of optimisations, user code also doesn't have to worry
about changing lengths, no bad interactions with the
bytevector->pointer + pointer->bytevector trick to reduce the range of
a bytevector ... (except potential GC issues, to which I haven't yet
received a response to test for them or resolve them or document
limitations).

As such, I would recommend the latter.  Though if we go for the former,
we might as well implement a bytevector-free! (why is there a munmap
but not a bytevector-free!).

Greetings,
Maxime.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: mmap for guile
  2022-07-04 10:09   ` Ludovic Courtès
                       ` (2 preceding siblings ...)
  2022-07-19 13:30     ` Maxime Devos
@ 2022-07-19 13:34     ` Maxime Devos
  3 siblings, 0 replies; 13+ messages in thread
From: Maxime Devos @ 2022-07-19 13:34 UTC (permalink / raw)
  To: Ludovic Courtès, guile-user; +Cc: guile-devel

Ludovic Courtès schreef op ma 04-07-2022 om 12:09 [+0200]:
> > Also, what if you mmap a region, use bytevector->pointer and pass
> > it to
> > some C thing, which saves the pointer somewhere where boehm-gc can
> > find
> > it and boehm-gc considers it to be live, is there something that
> > prevents boehm-gc from improperly calling the finalizer & unmapping
> > the
> > region, causing a dangling pointer?
> 
> There’s a risk, but I don’t think it’s specific to mmap.

AFAIK, make-bytevector doesn't have an issue here, so by default I
would assume mmap to not have it either, unless documented otherwise.
However, I am not seeing tests or analysis that verify there's no issue
here even though plausibly there might be an issue and neither am I
seeing documentation that this doesn't work (for mmap in specific, or
more generally).

Greetings,
Maxime.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: mmap for guile
  2022-07-19 13:20     ` Maxime Devos
@ 2022-07-21  9:14       ` Ludovic Courtès
  0 siblings, 0 replies; 13+ messages in thread
From: Ludovic Courtès @ 2022-07-21  9:14 UTC (permalink / raw)
  To: Maxime Devos; +Cc: guile-user, guile-devel

Hi,

Maxime Devos <maximedevos@telenet.be> skribis:

> Ludovic Courtès schreef op ma 04-07-2022 om 12:09 [+0200]:
>> I don’t think the optimizer makes any such assumption, except for
>> literal bytevectors.
>
> It _does_ assume that bytevector lengths don't change:

[...]

> As can be seen in the above output, it first determines
> the length of the bytevector and then compares it again the index
> (for the index check) and then actually reads th byte
> so it assumes that the length doesn't change in-between -- it's not an
> atomic operation!

Oh right.

> Even if it didn't, being able to assume bytevector lengths don't change
> is important for optimising code like (begin (bytevector-ref bv 9000) (bytevector-ref 8999) ...)),
> to optimise out many range checks, though I don't know if Guile currently
> does that.

Indeed.

Ludo’.



^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-07-21  9:14 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-26 15:37 mmap for guile Matt Wette
2022-06-26 16:21 ` Matt Wette
2022-06-26 17:06 ` Olivier Dion via Developers list for Guile, the GNU extensibility library
2022-06-26 18:11 ` Maxime Devos
2022-07-04 10:09   ` Ludovic Courtès
2022-07-04 13:14     ` Greg Troxel
2022-07-04 20:03       ` Ludovic Courtès
2022-07-05 12:49         ` Greg Troxel
2022-07-19 13:20     ` Maxime Devos
2022-07-21  9:14       ` Ludovic Courtès
2022-07-19 13:30     ` Maxime Devos
2022-07-19 13:34     ` Maxime Devos
2022-06-26 18:21 ` Maxime Devos

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).