From: Andy Wingo <wingo@pobox.com>
To: guile-devel <guile-devel@gnu.org>
Subject: thinking out loud: wip-rtl, ELF, pages, and mmap
Date: Wed, 24 Apr 2013 22:23:33 +0200
Message-ID: <874nevhe4a.fsf@pobox.com>

Hi,

I've been working on wip-rtl recently.  The goal is to implement good
debugging.  I'll give a bit of background and then get to my problem.

In master, ".go" files are written in the ELF format.  ELF is nice
because it embodies common wisdom on how to structure object files, and
this wisdom applies to Guile fairly directly.  To simplify, ELF files
are cheap to load and useful to introspect.  The former is achieved with
"segments", which basically correspond to mmap'd blocks of memory.  The
latter is achieved by "sections", which describe parts of the file.  The
table of segments (the program header table) is usually written at the
beginning of the file, to make loading easier, and the table of sections
(the section header table) at the end, as it's rarely needed at runtime.
There are usually fewer segments than sections.  Parts of the file can
also be left out of every loadable segment, so that they are never mapped
into memory at runtime.  Usually this is the case for debugging
information.
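To make the segment/section distinction concrete, here is a minimal C sketch (not Guile code) that answers "will this byte of the file be in memory after loading?" by walking the program header table.  It assumes a 64-bit ELF file already read into a buffer and the Linux/glibc <elf.h> header; the function name offset_is_loaded is hypothetical.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper.  Returns 1 if file offset OFF of the ELF64 image in
   BUF (LEN bytes) lies inside some PT_LOAD segment, i.e. it would be mapped
   into memory at load time; 0 if it stays on disk (as debugging sections
   usually do); -1 if the header or segment table is malformed. */
static int offset_is_loaded(const unsigned char *buf, size_t len,
                            Elf64_Off off)
{
  Elf64_Ehdr eh;
  if (len < sizeof eh)
    return -1;
  memcpy(&eh, buf, sizeof eh);   /* memcpy avoids alignment/aliasing issues */
  if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0
      || eh.e_ident[EI_CLASS] != ELFCLASS64
      || eh.e_phentsize < sizeof(Elf64_Phdr)
      || eh.e_phoff > len
      || (size_t) eh.e_phnum * eh.e_phentsize > len - eh.e_phoff)
    return -1;

  /* The segment table normally sits near the start of the file. */
  for (size_t i = 0; i < eh.e_phnum; i++) {
    Elf64_Phdr ph;
    memcpy(&ph, buf + eh.e_phoff + i * eh.e_phentsize, sizeof ph);
    if (ph.p_type == PT_LOAD
        && off >= ph.p_offset && off - ph.p_offset < ph.p_filesz)
      return 1;
  }
  return 0;
}
```

Only PT_LOAD entries count: other segment types (PT_NOTE, for instance) describe file ranges without asking the loader to map them.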

OK, so that's ELF.  The conventional debugging format to use with ELF is
DWARF, and it's pretty well thought out.  In Guile we'll probably use
DWARF, along with some more basic metadata in .symtab sections.

I should mention that in master, the ELF files are simple wrappers over
2.0-style objcode.  The wip-rtl branch takes more advantage of ELF --
for example, to allocate some constants in read-only shareable memory,
and to statically allocate any constants that need initialization or
relocation at runtime.  ELF also has advantages when we start to do
native compilation: native code can go in another section, for example.

                            *   *   *

OK, so that's the thing.  I recently added support for writing .symtab
sections, and have been looking at how to load them at runtime, for
example when disassembling functions.  For completeness, here are the
other common operations that would require loading debug information:

  * Procedure names.
  * Line/column information, for example in backtraces.
  * Arity information and argument names.
  * Local variable names and live ranges (the ,locals REPL command).
  * Generic procedure metadata.
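For the first item, procedure names, a lookup in .symtab can be quite small once the whole image is addressable in memory.  The following is a minimal C sketch, not Guile's implementation: it assumes a 64-bit ELF image in a buffer, the Linux/glibc <elf.h> header, a NUL-terminated string table, and (an assumption) that symbol values are expressed in the same address space as the queried address, e.g. offsets from the image base.  The names symtab_lookup and get_shdr are hypothetical.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the I-th section header of IMG into OUT.
   Returns 0 on success, -1 if the index or section table is out of range. */
static int get_shdr(const unsigned char *img, size_t len, size_t i,
                    Elf64_Shdr *out)
{
  Elf64_Ehdr eh;
  memcpy(&eh, img, sizeof eh);
  if (i >= eh.e_shnum || eh.e_shentsize < sizeof *out || eh.e_shoff > len
      || (size_t) eh.e_shnum * eh.e_shentsize > len - eh.e_shoff)
    return -1;
  memcpy(out, img + eh.e_shoff + i * eh.e_shentsize, sizeof *out);
  return 0;
}

/* Hypothetical lookup: return the name of the STT_FUNC symbol whose range
   [st_value, st_value + st_size) contains ADDR, or NULL if none does. */
static const char *symtab_lookup(const unsigned char *img, size_t len,
                                 Elf64_Addr addr)
{
  Elf64_Ehdr eh;
  if (len < sizeof eh)
    return NULL;
  memcpy(&eh, img, sizeof eh);
  if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0
      || eh.e_ident[EI_CLASS] != ELFCLASS64)
    return NULL;

  for (size_t i = 0; i < eh.e_shnum; i++) {
    Elf64_Shdr sym, str;
    if (get_shdr(img, len, i, &sym) != 0 || sym.sh_type != SHT_SYMTAB)
      continue;
    /* sh_link of a symbol table names its string table section. */
    if (get_shdr(img, len, sym.sh_link, &str) != 0
        || sym.sh_entsize < sizeof(Elf64_Sym)
        || sym.sh_offset > len || sym.sh_size > len - sym.sh_offset
        || str.sh_offset > len || str.sh_size > len - str.sh_offset)
      return NULL;
    for (size_t j = 0; j < sym.sh_size / sym.sh_entsize; j++) {
      Elf64_Sym s;
      memcpy(&s, img + sym.sh_offset + j * sym.sh_entsize, sizeof s);
      if (ELF64_ST_TYPE(s.st_info) == STT_FUNC
          && addr >= s.st_value && addr - s.st_value < s.st_size
          && s.st_name < str.sh_size)
        return (const char *) img + str.sh_offset + s.st_name;
    }
  }
  return NULL;
}
```

A linear scan is fine for a sketch; a runtime doing this often would more likely sort the function symbols once and bisect.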

Anyway!  How do you avoid loading all of this information at runtime
until it is actually needed?

The original solution I had in mind was to put this information in ELF
segments that don't get loaded.  Then at runtime you would somehow map
from an IP to an ELF object, and at that point you would lazily load the
unloaded ELF sections.

But that has a few disadvantages.  One is that it's difficult to ensure
that the lazily-loaded object is the same as the one that you originally
loaded.  We don't keep .go file descriptors open currently, and
debugging would be a bad reason to do so.

Another, more serious one is that this is actually a lot of work.
There's a constant overhead for the data about what is loaded and how to
load what isn't, and the cross-references from the debug info to the
loaded info are tricky.

Then I realized: why am I doing all of this if the kernel has a virtual
memory system already that does all this for me?

So I have a new plan, I think.  I'll change the linker to always emit
sections and segments that correspond exactly in their on-disk layout
and in their in-memory layout.  (In ELF terms: segments are contiguous,
with p_memsz == p_filesz.)  I'll put commonly needed things at the
beginning, and debugging info and the section table at the end.  Then
I'll just map the whole thing with PROT_READ, and set PROT_WRITE on
those page-aligned segments that need it.  (Obviously in the future,
PROT_EXEC as well.)
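A minimal C sketch of this loading strategy, assuming 64-bit ELF, the Linux/glibc <elf.h> header, and a linker that already guarantees the invariants just described.  The name map_elf_image is hypothetical; this is an illustration of the idea, not Guile's loader.

```c
#define _DEFAULT_SOURCE
#include <elf.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical loader: map the whole .go file read-only, then add
   PROT_WRITE only to PT_LOAD segments flagged PF_W.  Returns the base
   address and stores the mapping length in *LEN_OUT, or returns NULL. */
static void *map_elf_image(int fd, size_t *len_out)
{
  struct stat st;
  size_t page = (size_t) sysconf(_SC_PAGESIZE);
  if (fstat(fd, &st) != 0 || st.st_size < (off_t) sizeof(Elf64_Ehdr))
    return NULL;
  size_t len = (size_t) st.st_size;

  /* One mapping for everything; the kernel pages it in lazily, so the
     debug info and section table at the end cost nothing until touched. */
  unsigned char *base = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
  if (base == MAP_FAILED)
    return NULL;

  Elf64_Ehdr eh;
  memcpy(&eh, base, sizeof eh);
  if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0
      || eh.e_ident[EI_CLASS] != ELFCLASS64
      || eh.e_phentsize < sizeof(Elf64_Phdr)
      || eh.e_phoff > len
      || (size_t) eh.e_phnum * eh.e_phentsize > len - eh.e_phoff)
    goto fail;

  for (size_t i = 0; i < eh.e_phnum; i++) {
    Elf64_Phdr ph;
    memcpy(&ph, base + eh.e_phoff + i * eh.e_phentsize, sizeof ph);
    if (ph.p_type != PT_LOAD)
      continue;
    /* The linker invariant: memory size == file size, inside the file. */
    if (ph.p_memsz != ph.p_filesz
        || ph.p_offset > len || ph.p_filesz > len - ph.p_offset)
      goto fail;
    /* mprotect works on whole pages, so writable segments must start
       on a page boundary. */
    if (ph.p_flags & PF_W) {
      if (ph.p_offset % page != 0
          || mprotect(base + ph.p_offset, ph.p_memsz,
                      PROT_READ | PROT_WRITE) != 0)
        goto fail;
    }
  }
  *len_out = len;
  return base;

fail:
  munmap(base, len);
  return NULL;
}
```

Because the mapping is MAP_PRIVATE, adding PROT_WRITE does not require the file to be opened for writing: writes land in private copy-on-write pages and never reach the .go file on disk.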

Then I'll just record a list of ELF objects that have been loaded.
Simple bisection will map IP -> ELF, and from there we have the section
table in memory (lazily paged in by the virtual memory system) and can
find the symtab and other debug info.
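The registry and the IP -> ELF bisection might look like the following minimal C sketch.  The struct, the fixed-capacity array and both function names are hypothetical, and a real runtime would also need locking if images can be loaded from several threads.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical registry entry: one per loaded ELF image. */
struct loaded_elf {
  uintptr_t start;     /* base address of the mapping */
  uintptr_t end;       /* one past the last mapped byte */
  const char *name;    /* e.g. the .go file it came from */
};

#define MAX_IMAGES 64
static struct loaded_elf registry[MAX_IMAGES];
static size_t registry_len;

/* Insert an image, keeping the array sorted by start address. */
static int register_elf(uintptr_t start, uintptr_t end, const char *name)
{
  if (registry_len == MAX_IMAGES || start >= end)
    return -1;
  size_t i = registry_len;
  while (i > 0 && registry[i - 1].start > start)
    i--;
  memmove(&registry[i + 1], &registry[i],
          (registry_len - i) * sizeof registry[0]);
  registry[i] = (struct loaded_elf){ start, end, name };
  registry_len++;
  return 0;
}

/* Bisect for the image whose [start, end) contains IP; NULL if none. */
static const struct loaded_elf *
find_elf_for_ip(const struct loaded_elf *images, size_t n, uintptr_t ip)
{
  size_t lo = 0, hi = n;          /* search in [lo, hi) */
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (ip < images[mid].start)
      hi = mid;
    else if (ip >= images[mid].end)
      lo = mid + 1;
    else
      return &images[mid];
  }
  return NULL;
}
```

Since the kernel does not hand out overlapping mappings, the registered ranges stay disjoint, which is what makes plain bisection sufficient.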

So that's the plan.  It's a significant change, and I wondered if folks
had any relevant experience or reactions.

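Note that we have a read()-based fallback if mmap is not available.
This strategy also makes the read-based fallback easier: since the
in-memory layout matches the on-disk layout, it can just read the whole
file into one buffer.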
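With identical layouts, the fallback reduces to reading the file into one aligned buffer, with no per-segment copying or zero-filling.  A minimal C sketch follows; read_elf_image is a hypothetical name, and the 4096-byte alignment is an assumption that it covers the largest segment alignment the linker emits.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical read()-based fallback: load the whole file into a single
   4096-byte-aligned buffer.  Returns the buffer (free it with free()) and
   stores its length in *LEN_OUT, or returns NULL on failure. */
static void *read_elf_image(int fd, size_t *len_out)
{
  struct stat st;
  void *buf = NULL;
  size_t done = 0;
  if (fstat(fd, &st) != 0 || st.st_size <= 0)
    return NULL;
  size_t len = (size_t) st.st_size;

  /* Assumption: 4096 >= every p_align in the file, so segment
     alignments recorded in the file still hold in memory. */
  if (posix_memalign(&buf, 4096, len) != 0)
    return NULL;
  if (lseek(fd, 0, SEEK_SET) != 0)
    goto fail;

  while (done < len) {
    ssize_t n = read(fd, (char *) buf + done, len - done);
    if (n < 0 && errno == EINTR)
      continue;                 /* interrupted by a signal: retry */
    if (n <= 0)
      goto fail;                /* error, or the file shrank under us */
    done += (size_t) n;
  }
  *len_out = len;
  return buf;

fail:
  free(buf);
  return NULL;
}
```

Without mmap there is no lazy paging, so the debug info at the end of the file is read eagerly; the layout invariant only removes the bookkeeping, not the I/O.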

Thoughts?

Andy
-- 
http://wingolog.org/

Thread overview: 5+ messages
2013-04-24 20:23 Andy Wingo [this message]
2013-04-24 22:20 ` thinking out loud: wip-rtl, ELF, pages, and mmap dsmich
2013-04-28 12:45 ` Andy Wingo
2013-04-28 15:23 ` Ludovic Courtès
2013-04-29  5:47 ` Nala Ginrut
