unofficial mirror of guile-devel@gnu.org 
* [Feature Request] Some ideas on 'mmap'
@ 2013-04-30  9:27 Nala Ginrut
  2013-04-30 10:12 ` Daniel Hartwig
  2013-05-01  0:08 ` Ian Price
  0 siblings, 2 replies; 12+ messages in thread
From: Nala Ginrut @ 2013-04-30  9:27 UTC (permalink / raw)
  To: guile-devel

Hi guys!
A discussion about adding 'mmap' came up on IRC, and I'd like to share
some ideas on this topic:

1. Complex one or simple one?
The simple one is just a thin wrapper taking advantage of (system
foreign).
The complex one is more like Python's mmap module: it contains a special
<mmap> type and a bunch of functions to handle it.

2. If we go with the simple one, just ignore the rest of this mail.
But if the complex one is chosen, we need (ice-9 mmap).

3. I'd choose a Python-like interface for mmap:
(mmap port #:optional length prot flags offset)
And if port is #f, it works as an anonymous mapping.

4. mmap returns a new type <mmap>, and it may work like 'array':
(define m (mmap port))
(mmap-ref m 10 20)
(mmap-set! m 0 0 1024)
The interfaces might be:
(mmap-ref <mmap> from to) ==> returns a u8 bytevector
(mmap-set! <mmap> byte from to) ==> returns unspecified

5. use munmap to release it
(munmap <mmap>)

6. other helper functions also available:
(mmap-find <mmap> string #:optional start end)
(mmap-flush <mmap> #:optional offset size)
(mmap-move <mmap> dest src count)
(mmap-read <mmap> num)
(mmap-readbyte <mmap>)
(mmap-readline <mmap>)
(mmap-resize <mmap> newsize)
(mmap-rfind <mmap> string #:optional start end)
(mmap-seek <mmap> pos #:optional whence)
(mmap-size <mmap>)
(mmap-tell <mmap>) ; Returns the current position of the file pointer.
(mmap-write <mmap> str/bv)
(mmap-writebyte <mmap> byte)

Comments?
Thanks!






* Re: [Feature Request] Some ideas on 'mmap'
  2013-04-30  9:27 [Feature Request] Some ideas on 'mmap' Nala Ginrut
@ 2013-04-30 10:12 ` Daniel Hartwig
  2013-04-30 13:49   ` Nala Ginrut
  2013-05-01  0:08 ` Ian Price
  1 sibling, 1 reply; 12+ messages in thread
From: Daniel Hartwig @ 2013-04-30 10:12 UTC (permalink / raw)
  To: Nala Ginrut; +Cc: guile-devel

On 30 April 2013 17:27, Nala Ginrut <nalaginrut@gmail.com> wrote:
> 4. mmap returns a new type <mmap>, and it may work like 'array':
> (define m (mmap port))
> (mmap-ref m 10 20)
> (mmap-set! m 0 0 1024)
> The interfaces might be:
> (mmap-ref <mmap> from to) ==> returns a u8 bytevector
> (mmap-set! <mmap> byte from to) ==> returns unspecified
>

You may as well just have mmap return a bytevector or pointer to the
start, and forget about these redundant procedures.  mmaped data is a
bytevector, so no need to reimplement that interface with different
names.

> 5. use munmap to release it
> (munmap <mmap>)
>

> 6. other helper functions also available:

If you want a port, use a port.  Likewise for strings, bytevectors.


* Re: [Feature Request] Some ideas on 'mmap'
  2013-04-30 10:12 ` Daniel Hartwig
@ 2013-04-30 13:49   ` Nala Ginrut
  2013-04-30 13:57     ` Daniel Hartwig
  2013-04-30 14:13     ` Daniel Hartwig
  0 siblings, 2 replies; 12+ messages in thread
From: Nala Ginrut @ 2013-04-30 13:49 UTC (permalink / raw)
  To: Daniel Hartwig; +Cc: guile-devel

On Tue, 2013-04-30 at 18:12 +0800, Daniel Hartwig wrote:
> You may as well just have mmap return a bytevector or pointer to the
> start, and forget about these redundant procedures.  mmaped data is a
> bytevector, so no need to reimplement that interface with different
> names.
> 

<mmap> is just a structure like this (maybe more?):
(define-record-type <mmap>
  ...
  (pointer mmap-pointer)
  (flag mmap-flag)
  (size mmap-size))
So it's actually a pointer, and by storing 'size' we can release it
without having to remember the size we requested.
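
The record above could be written out as follows. This is only a sketch
using SRFI-9: the constructor name make-mmap and the predicate mmap? are
assumptions, since the original definition elides them, and the values
stored below are placeholders rather than the result of a real mapping.

```scheme
(use-modules (srfi srfi-9)
             (system foreign))

;; MAKE-MMAP and MMAP? are assumed names; the original sketch elides them.
(define-record-type <mmap>
  (make-mmap pointer flag size)
  mmap?
  (pointer mmap-pointer)   ; start address of the mapping
  (flag    mmap-flag)      ; flags the mapping was created with
  (size    mmap-size))     ; length in bytes, needed later by munmap

;; Placeholder values only; a real record would hold what mmap returned.
(define m (make-mmap %null-pointer 0 4096))
(mmap-size m)   ; 4096
```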

If I use a bytevector instead, it means I have to read the whole file
first. I don't think that's the same as mmap in POSIX.
mmap is used for I/O on very large data; if we read it all up front, we
lose the game.
mmap does lazy disk I/O on the file automatically.

> > 5. use munmap to release it
> > (munmap <mmap>)
> >
> 
> > 6. other helper functions also available:
> 
> If you want a port, use a port.  Likewise for strings, bytevectors.
> 

For instance, in a multi-threaded program, if we use a port and need to
move the cursor, we have to save and restore the cursor for the other
threads. With mmap we don't have to do that: each thread keeps its own
pointer/index.
And why not read everything into a bytevector? Yes, that would help, but
as I explained, not when the file is very big.


* Re: [Feature Request] Some ideas on 'mmap'
  2013-04-30 13:49   ` Nala Ginrut
@ 2013-04-30 13:57     ` Daniel Hartwig
  2013-04-30 14:23       ` Nala Ginrut
  2013-04-30 14:13     ` Daniel Hartwig
  1 sibling, 1 reply; 12+ messages in thread
From: Daniel Hartwig @ 2013-04-30 13:57 UTC (permalink / raw)
  To: Nala Ginrut; +Cc: guile-devel

On 30 April 2013 21:49, Nala Ginrut <nalaginrut@gmail.com> wrote:
> If I use a bytevector instead, it means I have to read the whole file
> first. I don't think that's the same as mmap in POSIX.
> mmap is used for I/O on very large data; if we read it all up front, we
> lose the game.
> mmap does lazy disk I/O on the file automatically.
>

With the pointer that mmap returns you can pointer->bytevector.  This
will not read any of the file.
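
The aliasing behaviour can be checked without an actual mapping. Below is
a minimal sketch that uses an ordinary bytevector as a stand-in for the
mapped memory; the names backing, ptr and view are illustrative only. It
shows that pointer->bytevector wraps the memory behind the pointer rather
than copying it, so a write through the wrapper is visible in the
underlying memory.

```scheme
(use-modules (system foreign)
             (rnrs bytevectors))

;; Stand-in for a mapped region: ordinary memory reached through a pointer.
(define backing (make-bytevector 16 0))
(define ptr (bytevector->pointer backing))

;; pointer->bytevector aliases the 16 bytes at PTR; nothing is copied.
(define view (pointer->bytevector ptr 16))

(bytevector-u8-set! view 3 42)
(bytevector-u8-ref backing 3)   ; 42, the write is visible through BACKING
```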




* Re: [Feature Request] Some ideas on 'mmap'
  2013-04-30 13:49   ` Nala Ginrut
  2013-04-30 13:57     ` Daniel Hartwig
@ 2013-04-30 14:13     ` Daniel Hartwig
  1 sibling, 0 replies; 12+ messages in thread
From: Daniel Hartwig @ 2013-04-30 14:13 UTC (permalink / raw)
  To: Nala Ginrut; +Cc: guile-devel

On 30 April 2013 21:49, Nala Ginrut <nalaginrut@gmail.com> wrote:
>> > 6. other helper functions also available:
>>
>> If you want a port, use a port.  Likewise for strings, bytevectors.
>>
>
> For instance, in a multi-threaded program, if we use a port and need to
> move the cursor, we have to save and restore the cursor for the other
> threads. With mmap we don't have to do that: each thread keeps its own
> pointer/index.
> And why not read everything into a bytevector? Yes, that would help, but
> as I explained, not when the file is very big.
>

After mmap you have a pointer.  You can make any number of ports from
that pointer, each thread can just have its own port if that behaviour
is desired.

The useful information from mmap is a pointer.  There are already
procedures in place to wrap other datatypes/interface around pointers,
for example:

(define ptr (mmap arg ...))
(define port (open-bytevector-input-port (pointer->bytevector ptr len)))

No need to duplicate any of port, string, etc. interfaces.
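
The point about each thread having its own port can be illustrated with a
small sketch. It assumes a plain bytevector (data) in place of the one
obtained from the mapping; port-a and port-b are illustrative names. Each
port over the same bytes keeps its own read position.

```scheme
(use-modules (rnrs bytevectors)
             (rnrs io ports))

;; Stand-in for the bytevector obtained from a mapping.
(define data (u8-list->bytevector '(10 20 30 40)))

;; Two ports over the same bytes; each keeps its own read position.
(define port-a (open-bytevector-input-port data))
(define port-b (open-bytevector-input-port data))

(define a1 (get-u8 port-a))   ; 10
(define a2 (get-u8 port-a))   ; 20
(define b1 (get-u8 port-b))   ; 10, port-b is unaffected by reads on port-a
```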

Regards




* Re: [Feature Request] Some ideas on 'mmap'
  2013-04-30 13:57     ` Daniel Hartwig
@ 2013-04-30 14:23       ` Nala Ginrut
  2013-04-30 15:45         ` Noah Lavine
                           ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Nala Ginrut @ 2013-04-30 14:23 UTC (permalink / raw)
  To: Daniel Hartwig; +Cc: guile-devel

On Tue, 2013-04-30 at 21:57 +0800, Daniel Hartwig wrote:
> On 30 April 2013 21:49, Nala Ginrut <nalaginrut@gmail.com> wrote:
> > If I use a bytevector instead, it means I have to read the whole file
> > first. I don't think that's the same as mmap in POSIX.
> > mmap is used for I/O on very large data; if we read it all up front, we
> > lose the game.
> > mmap does lazy disk I/O on the file automatically.
> >
> 
> With the pointer that mmap returns you can pointer->bytevector.  This
> will not read any of the file.

Ah, nice! That's the critical hint for reducing the work.
Yes, once we have mmap, we don't need anything else.
But I'd still recommend storing 'size' & 'flags', which needs a new
record type and a few helper functions, but very little code.

What do others think?

And I'm amazed by how cool Guile is, again. ;-P
Thanks!






* Re: [Feature Request] Some ideas on 'mmap'
  2013-04-30 14:23       ` Nala Ginrut
@ 2013-04-30 15:45         ` Noah Lavine
  2013-05-02  9:44           ` Nala Ginrut
  2013-04-30 22:47         ` Daniel Hartwig
  2013-05-01 10:54         ` Chaos Eternal
  2 siblings, 1 reply; 12+ messages in thread
From: Noah Lavine @ 2013-04-30 15:45 UTC (permalink / raw)
  To: Nala Ginrut; +Cc: guile-devel


Hello,

Apologies if this is well-known and I just forgot it, but can bytevectors
be read-only? I think we'd need that to handle read-only mmap'ed memory.
(If not, I hope we could allow read-only bytevectors.)

Bytevectors include size, so there's no need to put that in a struct, but
I'm not sure what I think about the flags.

Noah



* Re: [Feature Request] Some ideas on 'mmap'
  2013-04-30 14:23       ` Nala Ginrut
  2013-04-30 15:45         ` Noah Lavine
@ 2013-04-30 22:47         ` Daniel Hartwig
  2013-05-01 10:54         ` Chaos Eternal
  2 siblings, 0 replies; 12+ messages in thread
From: Daniel Hartwig @ 2013-04-30 22:47 UTC (permalink / raw)
  To: Nala Ginrut; +Cc: guile-devel

On 30 April 2013 22:23, Nala Ginrut <nalaginrut@gmail.com> wrote:
> But I'd still recommend storing 'size' & 'flags', which needs a new
> record type and a few helper functions, but very little code.

And have you considered that you can munmap only some sections, how is
that going to work with your proposed struct and flags?

It is a low-level, posix interface.  The most sensible thing to do is
just return the pointer, let the user handle and track it however is
appropriate for them.

> Apologies if this is well-known and I just forgot it, but can
> bytevectors be read-only? I think we'd need that to handle
> read-only mmap'ed memory. (If not, I hope we could allow
> read-only bytevectors.)

No need, and it would complicate the implementation anyway.  Attempting
to write to read-only memory is simply “an error”, which may or may not
generate a posix error.  Again, this is low-level stuff, it is enough
to leave the responsibility to the user who was the one to specify the
constraints in the first place.  Think of all the interfaces that can
possibly wrap the mmaped region: bytevectors, ports, strings, arrays …
it becomes impossible to handle with assurance, except that if a write
(or other invalid operation) makes its way to a system call, it may
produce a posix error which is the catch all.

There is also the performance consideration, where potentially
multiple guile-level error checks (due to nested data types) will
occur on every access.  At that point, you are now losing some of the
benefits of having the mmap in the first place.

Low-level stuff, if you hurt yourself, that's just too bad.




* Re: [Feature Request] Some ideas on 'mmap'
  2013-04-30  9:27 [Feature Request] Some ideas on 'mmap' Nala Ginrut
  2013-04-30 10:12 ` Daniel Hartwig
@ 2013-05-01  0:08 ` Ian Price
  1 sibling, 0 replies; 12+ messages in thread
From: Ian Price @ 2013-05-01  0:08 UTC (permalink / raw)
  To: Nala Ginrut; +Cc: guile-devel


There has been a lot of talk about the interface, but here's a simple
idea: Post some examples of how you want to use it.

We can all sit around and argue forever and a day about the API, but if
you want it, you probably have some real-world examples in mind and if
you don't, then that's a good reason not to add it.

(As an aside, exposing the mmap'ed memory as a bytevector was my first
instinct, but I worry about this complicating matters if we ever wanted
to change gc)
-- 
Ian Price -- shift-reset.com

"Programming is like pinball. The reward for doing it well is
the opportunity to do it again" - from "The Wizardry Compiled"




* Re: [Feature Request] Some ideas on 'mmap'
  2013-04-30 14:23       ` Nala Ginrut
  2013-04-30 15:45         ` Noah Lavine
  2013-04-30 22:47         ` Daniel Hartwig
@ 2013-05-01 10:54         ` Chaos Eternal
  2013-05-01 15:25           ` Nala Ginrut
  2 siblings, 1 reply; 12+ messages in thread
From: Chaos Eternal @ 2013-05-01 10:54 UTC (permalink / raw)
  To: Nala Ginrut; +Cc: guile-devel

But how do you protect your pointer?

Also, mmap-ed regions can hardly be GCed, which will introduce extra complexity.

I still see no extra need for it compared to ports.


* Re: [Feature Request] Some ideas on 'mmap'
  2013-05-01 10:54         ` Chaos Eternal
@ 2013-05-01 15:25           ` Nala Ginrut
  0 siblings, 0 replies; 12+ messages in thread
From: Nala Ginrut @ 2013-05-01 15:25 UTC (permalink / raw)
  To: Chaos Eternal; +Cc: guile-devel

On Wed, 2013-05-01 at 18:54 +0800, Chaos Eternal wrote:
> But how do you protect your pointer?
> 

There's no bare pointer once it has gone through pointer->bytevector or
been wrapped in a record type. Anyway, since Guile can't operate on
pointers directly, there's no need to compare it with C when it comes to
'protecting pointers'.

> Also, mmap-ed regions can hardly be GCed, which will introduce extra complexity.
> 

It's not 'hardly GCed': it has to be closed/munmapped explicitly, just
as with Python's mmap.


* Re: [Feature Request] Some ideas on 'mmap'
  2013-04-30 15:45         ` Noah Lavine
@ 2013-05-02  9:44           ` Nala Ginrut
  0 siblings, 0 replies; 12+ messages in thread
From: Nala Ginrut @ 2013-05-02  9:44 UTC (permalink / raw)
  To: Noah Lavine; +Cc: guile-devel

Here's the implementation in Artanis; the definitions of constants such
as PROT_READ are omitted:

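------------------ code -----------------
(use-modules (system foreign))

;; Handle for the symbols already loaded into the running process (libc).
(define *libc-ffi* (dynamic-link))

;; void *mmap(void *addr, size_t length, int prot, int flags,
;;            int fd, off_t offset);
;; off_t is passed as `long', which assumes both have the same width.
(define %mmap
  (pointer->procedure '*
                      (dynamic-func "mmap" *libc-ffi*)
                      (list '* size_t int int int long)))

;; int munmap(void *addr, size_t length);
(define %munmap
  (pointer->procedure int
                      (dynamic-func "munmap" *libc-ffi*)
                      (list '* size_t)))

;; Map SIZE bytes and expose them as a bytevector.  With the default
;; fd of -1, FLAGS must also include MAP_ANONYMOUS.
(define* (mmap size #:key (addr %null-pointer) (fd -1)
              (prot PROT_READ) (flags MAP_SHARED) (offset 0))
  (pointer->bytevector (%mmap addr size prot flags fd offset) size))

;; Unmap SIZE bytes starting at the beginning of BV.
(define (munmap bv size)
  (%munmap (bytevector->pointer bv) size))
-------------------------end-----------------------------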
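
As a usage illustration, here is a self-contained sketch that creates an
anonymous mapping, writes to it through a bytevector, and releases it. It
is not part of the Artanis code. The numeric constants are assumed Linux
values (x86-64 and most other architectures) and should be checked
against <sys/mman.h> on other systems; the names libc, len, ptr, bv,
first-byte and status are illustrative, and passing off_t as long assumes
both have the same width.

```scheme
(use-modules (system foreign)
             (rnrs bytevectors))

;; Assumed Linux values; check <sys/mman.h> on your system.
(define PROT_READ     #x1)
(define PROT_WRITE    #x2)
(define MAP_PRIVATE   #x02)
(define MAP_ANONYMOUS #x20)

(define libc (dynamic-link))

(define %mmap
  (pointer->procedure '* (dynamic-func "mmap" libc)
                      (list '* size_t int int int long)))
(define %munmap
  (pointer->procedure int (dynamic-func "munmap" libc)
                      (list '* size_t)))

(define len 4096)

;; Anonymous private mapping: no file behind it, so fd is -1.
(define ptr
  (%mmap %null-pointer len
         (logior PROT_READ PROT_WRITE)
         (logior MAP_PRIVATE MAP_ANONYMOUS)
         -1 0))

;; mmap reports failure by returning MAP_FAILED, i.e. (void *) -1.
(when (= (pointer-address ptr) (- (expt 2 (* 8 (sizeof '*))) 1))
  (error "mmap failed"))

;; Wrap the mapping; reads and writes go straight to the mapped pages.
(define bv (pointer->bytevector ptr len))
(bytevector-u8-set! bv 0 65)
(define first-byte (bytevector-u8-ref bv 0))

;; After this call BV must not be touched again.
(define status (%munmap ptr len))   ; munmap returns 0 on success
```

Keeping the raw pointer and the length around, as here, is what lets the
caller unmap the region later; the bytevector alone only remembers the
length it was created with.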


Comments?
Thanks!




