unofficial mirror of guile-devel@gnu.org 
* rtl metadata musings
@ 2013-05-10  5:07 Andy Wingo
  2013-05-11  4:48 ` Mark H Weaver
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Andy Wingo @ 2013-05-10  5:07 UTC
  To: guile-devel

Hi,

For many days I have been hemming and hawing about how to serialize
debugging information into the new toolchain.  Here's a braindump of a
new plan.

To recap, the new toolchain has the new RTL assembly embedded in ELF.
There are 6 things we need to put in the ELF file somehow:

  (1) Procedure names and bounds.
  (2) Docstrings.
  (3) Generic procedure metadata (for procedure-properties).
  (4) Arity information (see docs for program-arities).
  (5) Information about local variables for the debugger.
  (6) Line numbers.

None of these things are "on the main path" -- loading a module
shouldn't even page any of this information into memory.  But it is all
useful to have, and sometimes you need to be able to access it
efficiently if it is there.

All of this data should be strippable from the .go files (which I guess
we should rename to .so files).  This constraint means there should be
no link from the "main" data out to the "debugging" data -- only the
other way around.  Otherwise stripping debug data could corrupt your
main program.

So those are the design constraints.

For (1) we use the standard ELF .symtab / .strtab mechanism.

For the rest I had considered encoding it all into DWARF, but I think it
can make sense to leave DWARF to handle the things that it knows best
like (5) and (6) and to provide special support for (2), (3), and (4).
You should be able to strip these different pieces separately.

(2): For docstrings, my idea is to make a .guile.docstr section with
entries like this:

  struct guile_docstring
  {
    Elf_Addr pc;
    Elf_Off str;
  }

The "pc" is the rtl-program-code, and the "str" is an offset into the
linked (via the section's sh_link member) .guile.docstrtab section.
Searching for a docstring does a bisection over the .guile.docstr for a
(rtl-program-code prog) and then loads the string from the table.
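
The lookup described above can be sketched roughly as follows.  This is
only an illustration: the fixed-width typedefs stand in for Elf_Addr and
Elf_Off, the function name find_docstring is made up, and the sketch
assumes the entries are sorted by ascending pc and that the string table
holds NUL-terminated strings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for Elf_Addr / Elf_Off; real widths depend on the ELF class. */
typedef uint64_t elf_addr_t;
typedef uint64_t elf_off_t;

struct guile_docstring
{
  elf_addr_t pc;   /* image-relative address of the procedure's code */
  elf_off_t str;   /* offset into the linked .guile.docstrtab section */
};

/* Bisect ENTRIES (assumed sorted by ascending pc) for PC.  Returns a
   pointer into STRTAB, or NULL if there is no entry for PC.  */
const char *
find_docstring (const struct guile_docstring *entries, size_t n,
                const char *strtab, elf_addr_t pc)
{
  size_t lo = 0, hi = n;   /* search the half-open range [lo, hi) */

  assert (entries != NULL || n == 0);

  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (entries[mid].pc < pc)
        lo = mid + 1;
      else if (entries[mid].pc > pc)
        hi = mid;
      else
        return strtab + entries[mid].str;
    }
  return NULL;
}
```

A procedure without a docstring simply has no entry, so a miss returns
NULL without touching the string table.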

(3): Of course it's possible for a procedure's "documentation" property
to not be a string, and procedures can have any number of other
properties:

  (lambda ()
    #((foo . qux)
      (bar . "hi")
      ...)
    10)

Procedures with extended metadata get an entry in .guile.procprops:

  struct guile_procprops
  {
    Elf_Addr pc;
    Elf_Addr data;
  }

Here "data" points to an "absolute" address of the property alist, which
is part of the .data section along with any other program literal data.
(The address is absolute relative to the ELF image; at runtime you have
to add the base address the image is loaded at.)

As you might know, literals like conses are statically allocated in the
ELF memory image, but if they contain links to non-immediates like
symbols or other conses, those links need to be patched up when the ELF
is loaded.  In this way, generic procedure metadata does contribute to
runtime cost, because it needs relocation.  But it's not that common,
not too much work, and you don't need a guile_procprops entry if you
don't have extended metadata.
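
To make the "add the base address" step concrete, here is a minimal
sketch.  The relocation list format is an assumption made purely for
illustration (nothing above says how the loader records which words
need patching), and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical loader step.  WORDS is the loaded image viewed as an
   array of pointer-sized words, and RELOCS lists the indices of the
   words that hold image-relative addresses.  Adding BASE turns each
   of them into a runtime address.  */
void
relocate_links (uintptr_t *words, const size_t *relocs, size_t n_relocs,
                uintptr_t base)
{
  assert (words != NULL || n_relocs == 0);

  for (size_t i = 0; i < n_relocs; i++)
    words[relocs[i]] += base;
}

/* Resolve the "data" field of a guile_procprops entry the same way:
   it is relative to the image, so add the load address.  */
uintptr_t
procprops_data_address (uintptr_t base, uintptr_t image_relative_data)
{
  return base + image_relative_data;
}
```

Only the words named in the relocation list are touched; immediates
stored in the image are left alone.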

(4) Arity information describes the arities of the various case-lambda
clauses that a function has.  This information is used when printing a
function, to show the formals, and also when compiling, to check
arities.  It would be cleaner to have the compiler emit separate
functions for the different clauses, but that's not what happens now.
Anyway the plan is for another section, .guile.arities:

  struct guile_arity {
    Elf_Addr pc;
    Elf_Off size;
    nreq; // encodings for these not determined yet
    nopt;
    flags; // has-keyword-args, has-rest, is-case-lambda
    Elf_Off offset;
  }

An entry describes how many required, optional, keyword, and rest
arguments a function has.  The .guile.arities section is prefixed by a
length indicating how many entries there are, then all the arity
structures, sorted by pc.  Note that one arity may contain another!  In
particular for case-lambda clauses you can have one arity for the whole
function, then a number of other ones for the cases.

After the arities, you have a block of offsets to another string table
to give the names and to give more information on keywords.  So all in
all it looks like this:

  Elf_Off n_arity_entries;
  struct guile_arity foo_arity = { PC, SIZE, 1, 2, 0, OFFSET }
  ...
OFFSET:
  X -> offset into associated .guile.arities_strtab for first req. arg
  Y -> offset into associated .guile.arities_strtab for first opt. arg
  Z -> offset into associated .guile.arities_strtab for second opt. arg
  offsets for next function...

Like metadata, keyword arguments would have an absolute address to the
.data section to link to the keywords literal associated with this
clause.

In this way we can share storage for formal parameters, have easy access
to arities without too much searching or consing, and also be able to
strip the arities section if needed without affecting anything else.

(5) and (6): Local variable information and line numbers can go into
.debug_info / .debug_line / .debug_str as usual with DWARF.  DWARF does
well for this.  Not sure if I want to try to encode arity information
into DWARF; at least in the beginning it won't be necessary, so I'll
avoid it.

OK this thought was burning my neuron this morning and I wanted to get
it out.  I'll start working on it shortly.

Andy
-- 
http://wingolog.org/




* Re: rtl metadata musings
  2013-05-10  5:07 rtl metadata musings Andy Wingo
@ 2013-05-11  4:48 ` Mark H Weaver
  2013-05-12 21:20   ` Andy Wingo
  2013-05-16 21:42 ` Andy Wingo
  2013-05-19 21:52 ` Ludovic Courtès
  2 siblings, 1 reply; 11+ messages in thread
From: Mark H Weaver @ 2013-05-11  4:48 UTC
  To: Andy Wingo; +Cc: guile-devel

Hi Andy,

This all sounds great! :)

I have only one comment for now, which regards arity information.  The
required/optional/rest representation is not sufficiently general.  Not
only is it unable to handle empty case-lambdas, but it's also unable to
properly represent a case-lambda that can accept 1 or 3 arguments, but
not 2.

One possibility would be for each procedure to have a (possibly empty) list
of supported arities, where each arity corresponds to a case-lambda
clause.

What do you think?

      Mark




* Re: rtl metadata musings
  2013-05-11  4:48 ` Mark H Weaver
@ 2013-05-12 21:20   ` Andy Wingo
  0 siblings, 0 replies; 11+ messages in thread
From: Andy Wingo @ 2013-05-12 21:20 UTC
  To: Mark H Weaver; +Cc: guile-devel

On Sat 11 May 2013 06:48, Mark H Weaver <mhw@netris.org> writes:

> I have only one comment for now, which regards arity information.  The
> required/optional/rest representation is not sufficiently general.  Not
> only is it unable to handle empty case-lambdas, but it's also unable to
> properly represent a case-lambda that can accept 1 or 3 arguments, but
> not 2.

I did think about this and I think I just didn't express myself well.  I
said:

      struct guile_arity {
        Elf_Addr pc;
        Elf_Off size;
        nreq; // encodings for these not determined yet
        nopt;
        flags; // has-keyword-args, has-rest, is-case-lambda
        Elf_Off offset;
      }

    An entry describes how many required, optional, keyword, and rest
    arguments a function has.  The .guile.arities section is prefixed by a
    length indicating how many entries there are, then all the arity
    structures, sorted by pc.  Note that one arity may contain another!  In
    particular for case-lambda clauses you can have one arity for the whole
    function, then a number of other ones for the cases.

The case-lambda as a whole would get the is-case-lambda flag.  The
arities of the clauses would follow and have their [pc*, pc*+size*]
within the [pc, pc+size] of the case-lambda entry.
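
As a rough sketch of how nested entries could answer the "1 or 3
arguments, but not 2" question: the field widths and flag bits below
are assumptions (their encoding is still undecided), keyword arguments
are ignored, and the sketch assumes the outer case-lambda entry sorts
before its clauses when they share a pc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t elf_addr_t;
typedef uint64_t elf_off_t;

/* Hypothetical flag bits. */
enum
{
  ARITY_HAS_KEYWORD_ARGS = 1 << 0,
  ARITY_HAS_REST         = 1 << 1,
  ARITY_IS_CASE_LAMBDA   = 1 << 2
};

struct guile_arity
{
  elf_addr_t pc;
  elf_off_t size;
  uint32_t nreq;
  uint32_t nopt;
  uint32_t flags;
  elf_off_t offset;
};

/* True if INNER's [pc, pc+size] range lies within OUTER's. */
bool
arity_contains (const struct guile_arity *outer,
                const struct guile_arity *inner)
{
  return inner->pc >= outer->pc
    && inner->pc + inner->size <= outer->pc + outer->size;
}

/* True if one clause accepts NARGS positional arguments. */
bool
clause_accepts (const struct guile_arity *a, uint32_t nargs)
{
  if (nargs < a->nreq)
    return false;
  if (a->flags & ARITY_HAS_REST)
    return true;
  return nargs <= a->nreq + a->nopt;
}

/* ARITIES[outer] is a case-lambda entry; its clauses follow it in pc
   order.  An empty case-lambda has no clauses and accepts nothing.  */
bool
case_lambda_accepts (const struct guile_arity *arities, size_t n,
                     size_t outer, uint32_t nargs)
{
  assert (outer < n);
  assert (arities[outer].flags & ARITY_IS_CASE_LAMBDA);

  for (size_t i = outer + 1;
       i < n && arity_contains (&arities[outer], &arities[i]);
       i++)
    if (clause_accepts (&arities[i], nargs))
      return true;
  return false;
}
```

With clauses of one and three required arguments, a call with two
arguments matches neither clause, and the outer entry's own nreq/nopt
are never consulted.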

I don't know whether to use the nreq/nopt/flags of the "outer" arity for
any purpose or not.

Cheers,

Andy
-- 
http://wingolog.org/




* Re: rtl metadata musings
  2013-05-10  5:07 rtl metadata musings Andy Wingo
  2013-05-11  4:48 ` Mark H Weaver
@ 2013-05-16 21:42 ` Andy Wingo
  2013-05-19 21:52 ` Ludovic Courtès
  2 siblings, 0 replies; 11+ messages in thread
From: Andy Wingo @ 2013-05-16 21:42 UTC
  To: guile-devel

Hi,

On Fri 10 May 2013 07:07, Andy Wingo <wingo@pobox.com> writes:

> To recap, the new toolchain has the new RTL assembly embedded in ELF.
> There are 6 things we need to put in the ELF file somehow:
>
>   (1) Procedure names and bounds.
>   (2) Docstrings.
>   (3) Generic procedure metadata (for procedure-properties).
>   (4) Arity information (see docs for program-arities).
>   (5) Information about local variables for the debugger.
>   (6) Line numbers.

A status update: (1), (2), and (4) are done.  The arities code doesn't
use quite the same runtime interface as 2.0; not sure what can be done
there.  I'll take a look later.  I'll do (3) soon.  (5) and (6) require
a bit of time and mental clarity; maybe in a few weeks.

Cheers,

Andy
-- 
http://wingolog.org/




* Re: rtl metadata musings
  2013-05-10  5:07 rtl metadata musings Andy Wingo
  2013-05-11  4:48 ` Mark H Weaver
  2013-05-16 21:42 ` Andy Wingo
@ 2013-05-19 21:52 ` Ludovic Courtès
  2013-05-20 14:23   ` Andy Wingo
  2 siblings, 1 reply; 11+ messages in thread
From: Ludovic Courtès @ 2013-05-19 21:52 UTC
  To: guile-devel

Hi!

Andy Wingo <wingo@pobox.com> skribis:

> To recap, the new toolchain has the new RTL assembly embedded in ELF.
> There are 6 things we need to put in the ELF file somehow:
>
>   (1) Procedure names and bounds.
>   (2) Docstrings.
>   (3) Generic procedure metadata (for procedure-properties).
>   (4) Arity information (see docs for program-arities).
>   (5) Information about local variables for the debugger.
>   (6) Line numbers.

This sounds great!

I guess literal strings would go out as per ‘SCM_IMMUTABLE_STRING’
(which needs relocation), right?

Perhaps the .guile.docstr section could eventually be used to contain
stexi, but that seems to already fit into the plan anyway.

Ludo’.





* Re: rtl metadata musings
  2013-05-19 21:52 ` Ludovic Courtès
@ 2013-05-20 14:23   ` Andy Wingo
  2013-05-20 16:37     ` Ludovic Courtès
  0 siblings, 1 reply; 11+ messages in thread
From: Andy Wingo @ 2013-05-20 14:23 UTC
  To: Ludovic Courtès; +Cc: guile-devel

Hi,

On Sun 19 May 2013 23:52, ludo@gnu.org (Ludovic Courtès) writes:

> I guess literal strings would go out as per ‘SCM_IMMUTABLE_STRING’
> (which needs relocation), right?

Yep.  Right now the stringbuf goes into read-only memory, but the string
itself goes in writable memory as it needs its link to the stringbuf
fixed up (relocated) at runtime.

> Perhaps the .guile.docstr section could eventually be used to contain
> stexi, but that seems to already fit into the plan anyway.

That can happen already, but I think if we do texinfo we should
serialize the string as texinfo -- that way no relocs are needed if
docstrings aren't used, because if we use the .docstr string table, it's
just an offset into the image of a NUL-terminated UTF-8 byte sequence.
(I suppose we should be careful about embedded NUL characters; perhaps
we should use some other format for the string tables.)

Andy
-- 
http://wingolog.org/




* Re: rtl metadata musings
  2013-05-20 14:23   ` Andy Wingo
@ 2013-05-20 16:37     ` Ludovic Courtès
  2013-05-20 16:48       ` Mike Gran
  2013-05-20 18:29       ` Andy Wingo
  0 siblings, 2 replies; 11+ messages in thread
From: Ludovic Courtès @ 2013-05-20 16:37 UTC
  To: Andy Wingo; +Cc: guile-devel

Andy Wingo <wingo@pobox.com> skribis:

> On Sun 19 May 2013 23:52, ludo@gnu.org (Ludovic Courtès) writes:
>
>> I guess literal strings would go out as per ‘SCM_IMMUTABLE_STRING’
>> (which needs relocation), right?
>
> Yep.  Right now the stringbuf goes into read-only memory, but the string
> itself goes in writable memory as it needs its link to the stringbuf
> fixed up (relocated) at runtime.

OK.  It could be in a PT_GNU_RELRO segment, which the loader (well, the
other one, from glibc ;-)) remaps read-only after relocation.

[A moment of enlightenment when one realizes what it means to have our
own ELF toolchain.  :-)]

>> Perhaps the .guile.docstr section could eventually be used to contain
>> stexi, but that seems to already fit into the plan anyway.
>
> That can happen already, but I think if we do texinfo we should
> serialize the string as texinfo -- that way no relocs are needed if
> docstrings aren't used, because if we use the .docstr string table, it's
> just an offset into the image of a NUL-terminated UTF-8 byte sequence.

Right, makes sense.

> (I suppose we should be careful about embedded NUL characters; perhaps
> we should use some other format for the string tables.)

NULs in string contents should not be a problem, as long as there’s
info somewhere about the string length, no?

UTF-8-encoded ELF symbols may be more of a problem.  How could NULs in
symbols be handled?

Thanks,
Ludo’.




* Re: rtl metadata musings
  2013-05-20 16:37     ` Ludovic Courtès
@ 2013-05-20 16:48       ` Mike Gran
  2013-05-20 18:29       ` Andy Wingo
  1 sibling, 0 replies; 11+ messages in thread
From: Mike Gran @ 2013-05-20 16:48 UTC
  To: Ludovic Courtès, Andy Wingo; +Cc: guile-devel@gnu.org



> UTF-8-encoded ELF symbols may be more of a problem.  How could NULs in
> symbols be handled?

You could re-map NUL to one of the PUA characters, perhaps.  It seems unlikely
that ELF symbols should ever contain private use characters.
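
A minimal sketch of that remapping, assuming U+E000 is the private-use
code point chosen (any PUA character would do) and using a made-up
function name.  Note that the mapping is only reversible if the
original symbol never contains U+E000 itself.

```c
#include <assert.h>
#include <stddef.h>

/* Copy SRC (LEN bytes of UTF-8, possibly containing NUL bytes) into
   DST, replacing each NUL with the UTF-8 encoding of U+E000
   (EE 80 80), and NUL-terminate the result.  Returns the number of
   bytes written, not counting the terminator, or (size_t) -1 if DST
   is too small.  */
size_t
escape_nuls (const unsigned char *src, size_t len,
             unsigned char *dst, size_t dst_size)
{
  size_t out = 0;

  assert (src != NULL || len == 0);

  for (size_t i = 0; i < len; i++)
    {
      size_t need = src[i] == 0 ? 3 : 1;
      if (out + need + 1 > dst_size)   /* keep room for the terminator */
        return (size_t) -1;
      if (src[i] == 0)
        {
          dst[out++] = 0xEE;
          dst[out++] = 0x80;
          dst[out++] = 0x80;
        }
      else
        dst[out++] = src[i];
    }

  if (out + 1 > dst_size)
    return (size_t) -1;
  dst[out] = 0;
  return out;
}
```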

-Mike




* Re: rtl metadata musings
  2013-05-20 16:37     ` Ludovic Courtès
  2013-05-20 16:48       ` Mike Gran
@ 2013-05-20 18:29       ` Andy Wingo
  2013-05-20 19:28         ` Ludovic Courtès
  1 sibling, 1 reply; 11+ messages in thread
From: Andy Wingo @ 2013-05-20 18:29 UTC
  To: Ludovic Courtès; +Cc: guile-devel

On Mon 20 May 2013 18:37, ludo@gnu.org (Ludovic Courtès) writes:

> Andy Wingo <wingo@pobox.com> skribis:
>
>> On Sun 19 May 2013 23:52, ludo@gnu.org (Ludovic Courtès) writes:
>>
>>> I guess literal strings would go out as per ‘SCM_IMMUTABLE_STRING’
>>> (which needs relocation), right?
>>
>> Yep.  Right now the stringbuf goes into read-only memory, but the string
>> itself goes in writable memory as it needs its link to the stringbuf
>> fixed up (relocated) at runtime.
>
> OK.  It could be in a PT_GNU_RELRO segment, which the loader (well, the
> other one, from glibc ;-)) remaps read-only after relocation.
>
> [A moment of enlightenment when one realizes what it means to have our
> own ELF toolchain.  :-)]

Right :) We don't need to rely on the loader, and in fact should not in
general do so.  Some "relocations" are actually more complicated than
what glibc does; for example, for symbols or keywords.

>> (I suppose we should be careful about embedded NUL characters; perhaps
>> we should use some other format for the string tables.)
>
> NULs in string contents should not be a problem, as long as there’s
> info somewhere about the string length, no?

There isn't -- not in ELF string tables.  They're NUL-terminated.

> UTF-8-encoded ELF symbols may be more of a problem.  How could NULs in
> symbols be handled?

Well we can just use some other data structure that's not a standard ELF
string table; since we have the linker and loader and we are defining
custom sections (.guile.docstr for example) we can do what we like.
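
One possible shape for such a structure, purely as a sketch: each entry
is a 4-byte little-endian length followed by that many bytes, so
embedded NULs are harmless.  The layout, the struct and the function
name are all assumptions, not a decided format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct counted_string
{
  const unsigned char *bytes;   /* not NUL-terminated */
  uint32_t len;
};

/* Read the entry at OFFSET in TABLE: a 4-byte little-endian length
   followed by that many bytes.  Returns 0 on success, or -1 if the
   entry would run past the end of the table.  */
int
read_counted_string (const unsigned char *table, size_t table_size,
                     size_t offset, struct counted_string *out)
{
  assert (table != NULL && out != NULL);

  if (offset > table_size || table_size - offset < 4)
    return -1;

  uint32_t len = (uint32_t) table[offset]
    | (uint32_t) table[offset + 1] << 8
    | (uint32_t) table[offset + 2] << 16
    | (uint32_t) table[offset + 3] << 24;

  if (table_size - offset - 4 < len)
    return -1;

  out->bytes = table + offset + 4;
  out->len = len;
  return 0;
}
```

The bounds checks matter more here than with a NUL-terminated table,
since a corrupt length could otherwise point the reader well outside
the section.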

A
-- 
http://wingolog.org/




* Re: rtl metadata musings
  2013-05-20 18:29       ` Andy Wingo
@ 2013-05-20 19:28         ` Ludovic Courtès
  2013-05-20 20:24           ` Andy Wingo
  0 siblings, 1 reply; 11+ messages in thread
From: Ludovic Courtès @ 2013-05-20 19:28 UTC
  To: Andy Wingo; +Cc: guile-devel

Andy Wingo <wingo@pobox.com> skribis:

> On Mon 20 May 2013 18:37, ludo@gnu.org (Ludovic Courtès) writes:
>
>> Andy Wingo <wingo@pobox.com> skribis:
>>
>>> On Sun 19 May 2013 23:52, ludo@gnu.org (Ludovic Courtès) writes:
>>>
>>>> I guess literal strings would go out as per ‘SCM_IMMUTABLE_STRING’
>>>> (which needs relocation), right?
>>>
>>> Yep.  Right now the stringbuf goes into read-only memory, but the string
>>> itself goes in writable memory as it needs its link to the stringbuf
>>> fixed up (relocated) at runtime.
>>
>> OK.  It could be in a PT_GNU_RELRO segment, which the loader (well, the
>> other one, from glibc ;-)) remaps read-only after relocation.
>>
>> [A moment of enlightenment when one realizes what it means to have our
>> own ELF toolchain.  :-)]
>
> Right :) We don't need to rely on the loader, and in fact should not in
> general do so.  Some "relocations" are actually more complicated than
> what glibc does; for example, for symbols or keywords.

Yes.  I meant, there are things Guile’s loader could remap read-only
once the relocations are done, as glibc’s loader does for PT_GNU_RELRO.
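
For illustration, the relocate-then-protect sequence looks roughly like
this with POSIX mmap and mprotect.  The function name and the single
fake "relocation" are hypothetical; a real loader would protect the
relevant part of the mapped image rather than a fresh anonymous page.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map one writable page, apply a single "relocation" by storing
   BASE + ADDEND in its first word, then remap the page read-only.
   Returns the page, or NULL on failure.  The caller is responsible
   for munmap'ing it.  */
uintptr_t *
load_then_protect (uintptr_t base, uintptr_t addend)
{
  long page_size = sysconf (_SC_PAGESIZE);
  if (page_size <= 0)
    return NULL;
  assert ((size_t) page_size >= sizeof (uintptr_t));

  void *page = mmap (NULL, (size_t) page_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (page == MAP_FAILED)
    return NULL;

  uintptr_t *words = page;
  words[0] = base + addend;        /* relocate while still writable */

  if (mprotect (page, (size_t) page_size, PROT_READ) != 0)
    {
      munmap (page, (size_t) page_size);
      return NULL;
    }
  return words;                    /* reads work; writes now fault */
}
```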

>>> (I suppose we should be careful about embedded NUL characters; perhaps
>>> we should use some other format for the string tables.)
>>
>> NULs in string contents should not be a problem, as long as there’s
>> info somewhere about the string length, no?
>
> There isn't -- not in ELF string tables.  They're NUL-terminated.
>
>> UTF-8-encoded ELF symbols may be more of a problem.  How could NULs in
>> symbols be handled?
>
> Well we can just use some other data structure that's not a standard ELF
> string table; since we have the linker and loader and we are defining
> custom sections (.guile.docstr for example) we can do what we like.

OK.

Ludo’.




* Re: rtl metadata musings
  2013-05-20 19:28         ` Ludovic Courtès
@ 2013-05-20 20:24           ` Andy Wingo
  0 siblings, 0 replies; 11+ messages in thread
From: Andy Wingo @ 2013-05-20 20:24 UTC
  To: guile-devel

On Mon 20 May 2013 21:28, ludo@gnu.org (Ludovic Courtès) writes:

> I meant, there are things Guile’s loader could remap read-only once
> the relocations are done, as glibc’s loader does for PT_GNU_RELRO.

Yes, the ELF toolchain on GNU systems has a lot of stuff to teach us!

Andy
-- 
http://wingolog.org/


