unofficial mirror of guile-devel@gnu.org 
* thinking out loud: wip-rtl, ELF, pages, and mmap
@ 2013-04-24 20:23 Andy Wingo
  2013-04-24 22:20 ` dsmich
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Andy Wingo @ 2013-04-24 20:23 UTC (permalink / raw)
  To: guile-devel

Hi,

I've been working on wip-rtl recently.  The goal is to implement good
debugging.  I'll give a bit of background and then get to my problem.

In master, ".go" files are written in the ELF format.  ELF is nice
because it embodies common wisdom on how to structure object files, and
this wisdom applies to Guile fairly directly.  To simplify, ELF files
are cheap to load and useful to introspect.  The former is achieved with
"segments", which basically correspond to mmap'd blocks of memory.  The
latter is achieved by "sections", which describe parts of the file.  The
table of segments is usually written at the beginning of the file, to
make loading easier, and the table of sections is usually at the end, as
it's not usually needed at runtime.  There are usually fewer segments
than sections.  You can have segments in the file that are marked as not
being loaded into memory at runtime.  Usually this is the case for
debugging information.
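
To make the segment/section distinction concrete, here is a rough sketch
in Python (not Guile's loader) that reads a 64-bit little-endian ELF
header and reports where the two tables live.  The field layout follows
the ELF64 specification; the names parse_elf64_header and
make_example_header are made up for illustration, and the example header
is synthetic rather than taken from a real .go file.

```python
import struct
from collections import namedtuple

# ELF64 file header, little-endian, 64 bytes in total.
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")

ElfHeader = namedtuple(
    "ElfHeader",
    "ident type machine version entry phoff shoff flags "
    "ehsize phentsize phnum shentsize shnum shstrndx",
)


def parse_elf64_header(data):
    """Parse the ELF header found at the start of `data`."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch handles only 64-bit little-endian ELF")
    return ElfHeader._make(ELF64_HEADER.unpack_from(data, 0))


def make_example_header(phnum, shnum, shoff):
    """Build a synthetic header: segment table right after the header,
    section table at file offset `shoff` (typically near the end)."""
    ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
    return ELF64_HEADER.pack(
        ident, 0, 0, 1, 0,   # type, machine, version, entry
        64, shoff, 0,        # phoff, shoff, flags
        64, 56, phnum,       # ehsize, phentsize, phnum
        64, shnum, 0,        # shentsize, shnum, shstrndx
    )


header = parse_elf64_header(make_example_header(phnum=2, shnum=7, shoff=4096))
print("segment table at offset", header.phoff, "with", header.phnum, "entries")
print("section table at offset", header.shoff, "with", header.shnum, "entries")
```

A loader only needs e_phoff and e_phnum to find the segments; e_shoff
and e_shnum matter only when something wants to introspect the file.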

OK, so that's ELF.  The conventional debugging format to use with ELF is
DWARF, and it's pretty well thought out.  In Guile we'll probably use
DWARF, along with some more basic metadata in .symtab sections.

I should mention that in master, the ELF files are simple wrappers over
2.0-style objcode.  The wip-rtl branch takes more advantage of ELF --
for example, to allocate some constants in read-only shareable memory,
and to statically allocate any constants that need initialization or
relocation at runtime.  ELF also has advantages when we start to do
native compilation: native code can go in another section, for example.

                            *   *   *

OK, so that's the thing.  I recently added support for writing .symtab
sections, and have been looking at how to load that up at runtime, for
example when disassembling functions.  To be complete, there are a few
other common operations that would require loading debug information:

  * Procedure names.
  * Line/column information, for example in backtraces.
  * Arity information and argument names.
  * Local variable names and live ranges (the ,locals REPL command).
  * Generic procedure metadata.

Anyway!  How do you avoid loading this information at runtime?

The original solution I had in mind was to put them in ELF segments that
don't get loaded.  Then at runtime you would somehow map from an IP to
an ELF object, and at that point you would lazily load the unloaded ELF
sections.

But that has a few disadvantages.  One is that it's difficult to ensure
that the lazily-loaded object is the same as the one that you originally
loaded.  We don't keep .go file descriptors open currently, and
debugging would be a bad reason to do so.

Another, more serious, is that this is a lot of work, actually.  There's
a constant overhead of the data about what is loaded and how to load
what isn't, and the cross-references from the debug info to the loaded
info are tricky.

Then I realized: why am I doing all of this if the kernel has a virtual
memory system already that does all this for me?

So I have a new plan, I think.  I'll change the linker to always emit
sections and segments that correspond exactly in their on-disk layout
and in their in-memory layout.  (In ELF terms: segments are contiguous,
with p_memsz == p_filesz.)  I'll put commonly needed things at the
beginning, and debugging info and the section table at the end.  Then
I'll just map the whole thing with PROT_READ, and set PROT_WRITE on
those page-aligned segments that need it.  (Obviously in the future,
PROT_EXEC as well.)
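
As a sketch of the kind of check the loader could make under this plan,
the Python below validates a list of segments and works out which
page-aligned ranges would need PROT_WRITE.  This is an illustration
under assumptions, not the wip-rtl code: it reads "correspond exactly"
as p_vaddr == p_offset plus p_memsz == p_filesz with no overlapping
segments, it assumes 4096-byte pages, and the Segment class and both
function names are hypothetical.  The mmap and mprotect calls themselves
are left out.

```python
from dataclasses import dataclass

PF_W = 0x2        # ELF program header flag: segment is writable
PAGE_SIZE = 4096  # assumed page size for this sketch


@dataclass
class Segment:
    offset: int  # p_offset: position in the file
    vaddr: int   # p_vaddr: position in the mapped image
    filesz: int  # p_filesz: bytes present in the file
    memsz: int   # p_memsz: bytes occupied in memory
    flags: int   # p_flags: PF_R / PF_W / PF_X bits


def check_layout(segments):
    """Raise ValueError unless the on-disk and in-memory layouts coincide."""
    end = 0
    for seg in sorted(segments, key=lambda s: s.offset):
        if seg.memsz != seg.filesz:
            raise ValueError("p_memsz != p_filesz")
        if seg.vaddr != seg.offset:
            raise ValueError("p_vaddr != p_offset")
        if seg.offset < end:
            raise ValueError("segments overlap")
        end = seg.offset + seg.filesz


def writable_ranges(segments):
    """Return (offset, length) ranges, rounded up to whole pages, that
    would need PROT_WRITE after mapping the whole file PROT_READ."""
    ranges = []
    for seg in segments:
        if seg.flags & PF_W:
            if seg.offset % PAGE_SIZE:
                raise ValueError("writable segment is not page-aligned")
            pages = (seg.memsz + PAGE_SIZE - 1) // PAGE_SIZE
            ranges.append((seg.offset, pages * PAGE_SIZE))
    return ranges
```

With the layouts identical, an offset in the file is also an offset into
the mapping, so nothing needs to be translated between the two.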

Then I'll just record a list of ELF objects that have been loaded.
Simple bisection will map IP -> ELF, and from there we have the section
table in memory (lazily paged in by the virtual memory system) and can
find the symtab and other debug info.
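
The bisection step could look roughly like this Python sketch.  It keeps
the loaded images sorted by start address and finds the one containing a
given IP with a binary search.  The LoadedObjects class, its method
names, and the addresses in the usage lines are all invented for the
example; a real registry would store whatever handle the loader has for
each image rather than a file name.

```python
import bisect


class LoadedObjects:
    """Registry of loaded images, kept sorted by start address."""

    def __init__(self):
        self._starts = []   # start addresses, sorted
        self._objects = []  # (start, end, name), same order as _starts

    def register(self, start, size, name):
        """Record that `name` is mapped at [start, start + size)."""
        i = bisect.bisect_left(self._starts, start)
        self._starts.insert(i, start)
        self._objects.insert(i, (start, start + size, name))

    def find(self, ip):
        """Return the name of the image containing `ip`, or None."""
        # Last image whose start address is <= ip, if there is one.
        i = bisect.bisect_right(self._starts, ip) - 1
        if i < 0:
            return None
        start, end, name = self._objects[i]
        return name if ip < end else None


registry = LoadedObjects()
registry.register(0x5000, 0x1000, "b.go")
registry.register(0x1000, 0x2000, "a.go")
print(registry.find(0x1800))  # inside a.go
print(registry.find(0x4000))  # in the gap between the two images
```

Lookups are logarithmic in the number of loaded images, and registering
happens only once per load.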

So that's the plan.  It's a significant change, and I wondered if folks
had some experience or reactions.

Note that we have a read()-based fallback if mmap is not available.
This strategy also makes the read-based fallback easier.
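
A sketch of why the fallback gets easier, in Python rather than in
Guile's C loader: when the file image and the memory image are the same
bytes, the fallback is simply "read the whole file".  The load_image
name and the use_mmap switch are assumptions for the example.

```python
import mmap


def load_image(path, use_mmap=True):
    """Return the file's contents as a read-only mapping when possible,
    otherwise as a bytes object filled by plain read()."""
    with open(path, "rb") as f:
        if use_mmap:
            try:
                # Length 0 maps the whole file; pages are faulted in lazily.
                return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
            except (OSError, ValueError):
                pass  # mmap unavailable or refused: fall through to read()
        return f.read()
```

Either way the caller gets a buffer that can be indexed by file offset,
so the code that walks the segment and section tables does not need to
know which path was taken.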

Thoughts?

Andy
-- 
http://wingolog.org/




* Re: thinking out loud: wip-rtl, ELF, pages, and mmap
  2013-04-24 20:23 thinking out loud: wip-rtl, ELF, pages, and mmap Andy Wingo
@ 2013-04-24 22:20 ` dsmich
  2013-04-28 12:45 ` Andy Wingo
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: dsmich @ 2013-04-24 22:20 UTC (permalink / raw)
  To: guile-devel, Andy Wingo

---- Andy Wingo <wingo@pobox.com> wrote: 

...

> Then I realized: why am I doing all of this if the kernel has a virtual
> memory system already that does all this for me?
> 
> So I have a new plan, I think.  I'll change the linker to always emit
> sections and segments that correspond exactly in their on-disk layout
> and in their in-memory layout.  (In ELF terms: segments are contiguous,
> with p_memsz == p_filesz.)  I'll put commonly needed things at the
> beginning, and debugging info and the section table at the end.  Then
> I'll just map the whole thing with PROT_READ, and set PROT_WRITE on
> those page-aligned segments that need it.  (Obviously in the future,
> PROT_EXEC as well.)
> 
> Then I'll just record a list of ELF objects that have been loaded.
> Simple bisection will map IP -> ELF, and from there we have the section
> table in memory (lazily paged in by the virtual memory system) and can
> find the symtab and other debug info.
> 
> So that's the plan.  It's a significant change, and I wondered if folks
> had some experience or reactions.
> 
> Note that we have a read()-based fallback if mmap is not available.
> This strategy also makes the read-based fallback easier.
> 
> Thoughts?

Sounds like a great idea!  I think it's good to use the facilities
that already exist instead of reinventing the wheel.

A thought was that this would artificially bloat the size of the process
when the extra info isn't being used.  But so what?  The *real* memory
won't be consumed.  And it's not like we are running out of addressable
memory space anyway.  It will be a bit ickier for systems without
virtual memory.

What about non-ELF systems?  You are using your own ELF reader and mmap, not
relying on the system to load the files, right?  So that's not an issue?

-Dale





* Re: thinking out loud: wip-rtl, ELF, pages, and mmap
  2013-04-24 20:23 thinking out loud: wip-rtl, ELF, pages, and mmap Andy Wingo
  2013-04-24 22:20 ` dsmich
@ 2013-04-28 12:45 ` Andy Wingo
  2013-04-28 15:23 ` Ludovic Courtès
  2013-04-29  5:47 ` Nala Ginrut
  3 siblings, 0 replies; 5+ messages in thread
From: Andy Wingo @ 2013-04-28 12:45 UTC (permalink / raw)
  To: guile-devel

Hi,

On Wed 24 Apr 2013 22:23, Andy Wingo <wingo@pobox.com> writes:

> I'll change the linker to always emit sections and segments that
> correspond exactly in their on-disk layout and in their in-memory
> layout.  (In ELF terms: segments are contiguous, with p_memsz ==
> p_filesz.)  I'll put commonly needed things at the beginning, and
> debugging info and the section table at the end.  Then I'll just map
> the whole thing with PROT_READ, and set PROT_WRITE on those
> page-aligned segments that need it.  (Obviously in the future,
> PROT_EXEC as well.)
>
> Then I'll just record a list of ELF objects that have been loaded.

I've done this now.  If you are on wip-rtl, you will need to recompile
all of your .go files, as the loader now checks for these constraints.

> Simple bisection will map IP -> ELF, and from there we have the section
> table in memory (lazily paged in by the virtual memory system) and can
> find the symtab and other debug info.

I haven't done this yet, but at this point we can do it in Scheme I
think.

Regards,

Andy
-- 
http://wingolog.org/




* Re: thinking out loud: wip-rtl, ELF, pages, and mmap
  2013-04-24 20:23 thinking out loud: wip-rtl, ELF, pages, and mmap Andy Wingo
  2013-04-24 22:20 ` dsmich
  2013-04-28 12:45 ` Andy Wingo
@ 2013-04-28 15:23 ` Ludovic Courtès
  2013-04-29  5:47 ` Nala Ginrut
  3 siblings, 0 replies; 5+ messages in thread
From: Ludovic Courtès @ 2013-04-28 15:23 UTC (permalink / raw)
  To: guile-devel

Hello!

From my high-level POV, this all makes sense, and sounds very cool.  :-)

Ludo’.





* Re: thinking out loud: wip-rtl, ELF, pages, and mmap
  2013-04-24 20:23 thinking out loud: wip-rtl, ELF, pages, and mmap Andy Wingo
                   ` (2 preceding siblings ...)
  2013-04-28 15:23 ` Ludovic Courtès
@ 2013-04-29  5:47 ` Nala Ginrut
  3 siblings, 0 replies; 5+ messages in thread
From: Nala Ginrut @ 2013-04-29  5:47 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-devel

On Wed, 2013-04-24 at 22:23 +0200, Andy Wingo wrote:
> Hi,
> 
> I've been working on wip-rtl recently.  The goal is to implement good
> debugging.  I'll give a bit of background and then get to my problem.
> 
> In master, ".go" files are written in the ELF format.  ELF is nice
> because it embodies common wisdom on how to structure object files, and
> this wisdom applies to Guile fairly directly.  To simplify, ELF files
> are cheap to load and useful to introspect.  The former is achieved with
> "segments", which basically correspond to mmap'd blocks of memory.  The
> latter is achieved by "sections", which describe parts of the file.  The
> table of segments is usually written at the beginning of the file, to
> make loading easier, and the table of sections is usually at the end, as
> it's not usually needed at runtime.  There are usually fewer segments
> than sections.  You can have segments in the file that are marked as not
> being loaded into memory at runtime.  Usually this is the case for
> debugging information.
> 
> OK, so that's ELF.  The conventional debugging format to use with ELF is
> DWARF, and it's pretty well thought out.  In Guile we'll probably use
> DWARF, along with some more basic metadata in .symtab sections.
> 

I'm very glad to see that ;-)
And it would make it possible to debug .go files with GDB.

> I should mention that in master, the ELF files are simple wrappers over
> 2.0-style objcode.  The wip-rtl branch takes more advantage of ELF --
> for example, to allocate some constants in read-only shareable memory,
> and to statically allocate any constants that need initialization or
> relocation at runtime.  ELF also has advantages when we start to do
> native compilation: native code can go in another section, for example.
> 

It seems RTL compilation is faster, at least for boot-9.scm, but I
haven't tested it myself.

It would also be possible to have one or more external AOT compilers in
addition to the official built-in one, though maybe that's unnecessary.

>                             *   *   *
> 
> OK, so that's the thing.  I recently added support for writing .symtab
> sections, and have been looking at how to load that up at runtime, for
> example when disassembling functions.  To be complete, there are a few
> other common operations that would require loading debug information:
> 
>   * Procedure names.
>   * Line/column information, for example in backtraces.
>   * Arity information and argument names.
>   * Local variable names and live ranges (the ,locals REPL command).
>   * Generic procedure metadata.
> 

I also hope the first and last line numbers of each procedure get
recorded.  That's easy to do at compile time.  Otherwise I have to parse
the source file to find them before I can print a procedure's source
code in the REPL/debugger.

> Anyway!  How do you avoid loading this information at runtime?
> 

IMO, we should add a strip command to guild or, conversely, a --debug
option to the compiler, and let users decide whether to keep the debug
info.

> The original solution I had in mind was to put them in ELF segments that
> don't get loaded.  Then at runtime you would somehow map from an IP to
> an ELF object, and at that point you would lazily load the unloaded ELF
> sections.
> 
> But that has a few disadvantages.  One is that it's difficult to ensure
> that the lazily-loaded object is the same as the one that you originally
> loaded.  We don't keep .go file descriptors open currently, and
> debugging would be a bad reason to do so.
> 
> Another, more serious, is that this is a lot of work, actually.  There's
> a constant overhead of the data about what is loaded and how to load
> what isn't, and the cross-references from the debug info to the loaded
> info are tricky.
> 
> Then I realized: why am I doing all of this if the kernel has a virtual
> memory system already that does all this for me?
> 
> So I have a new plan, I think.  I'll change the linker to always emit
> sections and segments that correspond exactly in their on-disk layout
> and in their in-memory layout.  (In ELF terms: segments are contiguous,
> with p_memsz == p_filesz.)  I'll put commonly needed things at the
> beginning, and debugging info and the section table at the end.  Then
> I'll just map the whole thing with PROT_READ, and set PROT_WRITE on
> those page-aligned segments that need it.  (Obviously in the future,
> PROT_EXEC as well.)
> 

Yeah, when we have AOT ;-P

> Then I'll just record a list of ELF objects that have been loaded.
> Simple bisection will map IP -> ELF, and from there we have the section
> table in memory (lazily paged in by the virtual memory system) and can
> find the symtab and other debug info.
> 
> So that's the plan.  It's a significant change, and I wondered if folks
> had some experience or reactions.
> 
> Note that we have a read()-based fallback if mmap is not available.
> This strategy also makes the read-based fallback easier.
> 
> Thoughts?
> 
> Andy





