unofficial mirror of guile-devel@gnu.org 
* The case for more macro-instructions?
@ 2024-04-17 13:26 Ludovic Courtès
  2024-04-17 20:34 ` Maxime Devos
From: Ludovic Courtès @ 2024-04-17 13:26 UTC
  To: guile-devel, Andy Wingo

Hi Andy and all,

Looking at the disassembly of -O1 code in a quest for more concise
bytecode¹ (a quest that’s not necessarily always relevant but probably
is at -O1), I noticed a few things:

  1. Code for free variable lookup, emitted by
     ‘emit-cached-toplevel-box’, is too large (~7 instructions per
     variable) for little in return.

  2. The ‘.data’ section is surprisingly large: for each symbol in the
     source, we end up in that section with a string, a stringbuf
     (pointing to contents in the ‘.rodata’ section), and a symbol.
     More on that below.

  3. ‘*lcm-page-size*’ is set to 64 KiB for the purposes of reducing the
     number of .go variants needed under prebuilt/.

     Should we default to sysconf(_SC_PAGESIZE) and use that common
     denominator only when building .go files under prebuilt/ (this
     requires adding a compiler flag to choose a different alignment)?

     (In the meantime, I changed the linker to create sparse files in
     commit 112b617f5921c67b4b2c45aae39f54cccd34d7ef.)
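
To make the cost of the 64 KiB alignment in item 3 concrete, here is a minimal Python sketch of the padding arithmetic. It is not Guile's linker code: `align_up` is a hypothetical helper, and it assumes that ‘.data’ starts at the next page boundary after ‘.rodata’. The 0x158 and 0x14 figures are the ‘.rodata’ address and size from the readelf output in this message.

```python
def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


# End of '.rodata' in the example: address 0x158 plus size 0x14.
end_of_rodata = 0x158 + 0x14

for page_size in (4096, 65536):
    start_of_data = align_up(end_of_rodata, page_size)
    padding = start_of_data - end_of_rodata
    print(f"page size {page_size:>5}: .data at {start_of_data:#x}, "
          f"{padding} bytes of padding")
```

With a 64 KiB alignment the computed start is 0x10000, which matches the ‘.data’ offset readelf reports; under the same assumption a 4 KiB alignment would place it at 0x1000.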

Regarding ‘.data’, look:

--8<---------------cut here---------------start------------->8---
$ echo sym > /tmp/t.scm
$ ./meta/uninstalled-env guild compile /tmp/t.scm -o /tmp/t.go
wrote `/tmp/t.go'
$ readelf -a /tmp/t.go |grep -A10 "^Section Headers"
readelf: Warning: [ 5]: Link field (0) should index a string section.
readelf: Warning: local symbol 0 found at index >= .symtab's sh_info value of 0
readelf: Warning: local symbol 1 found at index >= .symtab's sh_info value of 0
Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .guile.procprops  PROGBITS         0000000000000000  00010588
       0000000000000000  0000000000000000           0     0     8
  [ 2] .rodata           PROGBITS         0000000000000158  00000158
       0000000000000014  0000000000000000   A       0     0     8
  [ 3] .data             PROGBITS         0000000000010000  00010000
       0000000000000058  0000000000000000  WA       0     0     8
--8<---------------cut here---------------end--------------->8---

That’s 88 bytes for ‘.data’.
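
The 88-byte figure is the hexadecimal size 0x58 from the second line of the ‘.data’ entry. A small Python sketch for pulling sizes out of this two-line readelf layout follows; the `section_sizes` helper is made up for illustration and only handles entries shaped like the ones shown here.

```python
import re

READELF_SNIPPET = """\
  [ 2] .rodata           PROGBITS         0000000000000158  00000158
       0000000000000014  0000000000000000   A       0     0     8
  [ 3] .data             PROGBITS         0000000000010000  00010000
       0000000000000058  0000000000000000  WA       0     0     8
"""

# Header line: "[Nr] Name Type Address Offset"
HEADER = re.compile(r"\s*\[\s*\d+\]\s+(\S+)\s+\S+\s+[0-9a-f]+\s+[0-9a-f]+\s*$")


def section_sizes(text: str) -> dict:
    """Map section name to size in bytes from two-line 'readelf' entries."""
    sizes = {}
    lines = text.splitlines()
    for header, detail in zip(lines, lines[1:]):
        match = HEADER.match(header)
        if match:
            # The size is the first hexadecimal field on the following line.
            sizes[match.group(1)] = int(detail.split()[0], 16)
    return sizes


print(section_sizes(READELF_SNIPPET))
```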

If ‘lookup-bound’ took a string (instead of a symbol), as
‘lookup-bound-private’ and ‘lookup-bound-public’ do, we’d save
relocations and space.

Perhaps these instructions (or rather variants thereof) could even take
a raw UTF-8 buffer instead?

As for #1, I’m not sure what the best option is.  I initially thought
about adding a new macro-instruction, but then we’d lose on cache-hit
path, which is not good.
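
The trade-off can be illustrated with a conceptual Python model of a cached free-variable lookup. This is only a sketch of the general caching pattern, not Guile's VM or the code that ‘emit-cached-toplevel-box’ emits: `Box`, `Module`, and `CallSite` are hypothetical names, and `lookup_bound` merely echoes the name of the instruction mentioned above.

```python
class Box:
    """A mutable cell holding a top-level variable's value."""

    def __init__(self, value):
        self.value = value


class Module:
    def __init__(self, bindings):
        self.boxes = {name: Box(value) for name, value in bindings.items()}
        self.lookups = 0  # counts slow-path lookups, for illustration only

    def lookup_bound(self, name):
        self.lookups += 1
        return self.boxes[name]


class CallSite:
    """One free-variable reference with its own cache slot."""

    def __init__(self, module, name):
        self.module = module
        self.name = name
        self.cached_box = None

    def ref(self):
        box = self.cached_box
        if box is None:
            # Cache miss: resolve the variable once and remember its box.
            box = self.module.lookup_bound(self.name)
            self.cached_box = box
        # Cache hit: only the load and the check above are executed.
        return box.value
```

In this model the hit path is a load and a comparison, which is the part that a single larger instruction would have to match in speed.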

Thoughts?

Ludo’.

¹ https://issues.guix.gnu.org/70398




* RE: The case for more macro-instructions?
  2024-04-17 13:26 The case for more macro-instructions? Ludovic Courtès
@ 2024-04-17 20:34 ` Maxime Devos
From: Maxime Devos @ 2024-04-17 20:34 UTC
  To: Ludovic Courtès, guile-devel@gnu.org, Andy Wingo


>Looking at the disassembly of -O1 code in a quest for more concise
>bytecode¹ (a quest that’s not necessarily always relevant but probably
>is at -O1), I noticed a few things:

>  1. Code for free variable lookup, emitted by
>    ‘emit-cached-toplevel-box’, is too large (~7 instructions per
>    variable) for little in return.
> [...]
> As for #1, I’m not sure what the best option is.  I initially thought
> about adding a new macro-instruction, but then we’d lose on cache-hit
> path, which is not good.

Is this (Guile) Scheme indirection useful? IIRC/IIUC, ELF doesn’t need many special instructions and instead has what it calls ‘relocations’, which as I understand it have fairly minimal overhead, so I wouldn’t expect ELF to benefit much from caching(*). Perhaps something akin to ELF relocations could be both performant and compact.

(*) besides the lazy relocation when not doing early binding (not sure if I got the right terminology, has been a while)
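
As a rough picture of the relocation idea, here is a toy Python model of eager relocation: the loader patches each listed slot once, and later code reads the slot directly with no cache check. It assumes absolute 64-bit little-endian slots and leaves out relocation types and addends, which real ELF relocations have; `apply_relocations` and the sample addresses are invented for illustration.

```python
import struct


def apply_relocations(image: bytearray, relocations, symbol_addresses):
    """Patch each 8-byte slot named in `relocations` with its symbol's address."""
    for offset, symbol in relocations:
        struct.pack_into("<Q", image, offset, symbol_addresses[symbol])
    return image


# Three 8-byte slots; the first and third refer to external symbols.
image = bytearray(24)
relocs = [(0, "display"), (16, "newline")]
addresses = {"display": 0x1000, "newline": 0x2000}

apply_relocations(image, relocs, addresses)
print([hex(v) for v in struct.unpack("<3Q", image)])
```

The table of `(offset, symbol)` pairs is the compact part: one entry per reference rather than several instructions per reference.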

>  2. The ‘.data’ section is surprisingly large: for each symbol in the
>    source, we end up in that section with a string, a stringbuf
>     (pointing to contents in the ‘.rodata’ section), and a symbol.
>     More on that below.

I have heard that ELF is quite flexible. Perhaps it would be possible to let ‘stringbuf’ (I’m not familiar with that word) point to the string in the symbol table (where “string” = “insert-procedure-name-here” and ‘symbol table’ = ELF’s mapping from strings to procedures/variable values), eliminating the duplicate that’s (IIUC) currently in .rodata?
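
For reference, an ELF string table is a sequence of NUL-terminated names addressed by byte offset, so several records can share one copy of a name. A minimal Python sketch of that sharing follows; `build_strtab` and `read_name` are hypothetical helpers, and whether Guile's stringbufs could point into such a table is exactly the open question above.

```python
def build_strtab(names):
    """Return (table bytes, {name: offset}); each distinct name is stored once."""
    table = bytearray(b"\0")  # ELF string tables start with a NUL byte
    offsets = {}
    for name in names:
        if name not in offsets:
            offsets[name] = len(table)
            table += name.encode("utf-8") + b"\0"
    return bytes(table), offsets


def read_name(table: bytes, offset: int) -> str:
    """Read the NUL-terminated name that starts at `offset`."""
    end = table.index(b"\0", offset)
    return table[offset:end].decode("utf-8")


table, offsets = build_strtab(["sym", "display", "sym"])
print(table, offsets)
```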

>  3. ‘*lcm-page-size*’ is set to 64 KiB for the purposes of reducing the
>     number of .go variants needed under prebuilt/.
>
>     Should we default to sysconf(_SC_PAGESIZE) and use that common
>     denominator only when building .go files under prebuilt/ (this
>     requires adding a compiler flag to choose a different alignment)?

On using _SC_PAGESIZE: that would be non-deterministic IIUC. Some architectures support multiple page sizes and as such the page size can depend on kernel configuration (I don’t know if sysconf(_SC_PAGESIZE) reports the current page size or a common divisor of possible page sizes).  (I recall reading something like that on lwn.net somewhere, but I don’t know if sysconf(_SC_PAGESIZE) itself is non-deterministic(*).)

(*) in the reproducible builds sense
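
For anyone who wants to check what their own build host reports, here is a small Python sketch that queries the same value through `os.sysconf` (Unix-only). The `host_page_size` and `is_power_of_two` helpers are illustrative names; the 64 KiB constant is the ‘*lcm-page-size*’ value from the quoted text.

```python
import os


def host_page_size() -> int:
    """Page size reported by the running kernel, in bytes."""
    return os.sysconf("SC_PAGESIZE")


def is_power_of_two(n: int) -> bool:
    return n > 0 and n & (n - 1) == 0


LCM_PAGE_SIZE = 64 * 1024  # the 64 KiB value discussed in this thread

size = host_page_size()
print(f"host page size: {size} bytes")
print(f"64 KiB is a multiple of it: {LCM_PAGE_SIZE % size == 0}")
```

Since the result comes from the running system, two build hosts for the same architecture may print different values, which is the reproducibility concern raised above.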

Given that ‘--target’ exists, no compiler flag for choosing a different alignment is necessary. It could perhaps be useful, but I don’t see a necessity (supposedly larger page sizes can be more performant, at least if all of the page is actually utilized, which doesn’t appear to be the case here).

>     (In the meantime, I changed the linker to create sparse files in
>     commit 112b617f5921c67b4b2c45aae39f54cccd34d7ef.)

For reproducibility of produced tar files, if it hasn’t been done already, I recommend adding whatever the tar option is for sparsifying files (and also recording them as sparse), or alternatively recording them non-sparse, since compression can easily take care of the many zeroes.
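
To see why sparseness matters for archiving, here is a minimal Python sketch that creates a file with a 64 KiB hole and compares its apparent size with the space actually allocated. It assumes a Unix filesystem that supports holes; `make_sparse_file` is a made-up helper and the 88-byte payload merely stands in for a ‘.data’ section. With GNU tar, the option in question is presumably `--sparse` (`-S`).

```python
import os
import tempfile


def make_sparse_file(path: str, hole_size: int, payload: bytes) -> None:
    """Write `payload` after a hole of `hole_size` bytes without writing zeroes."""
    with open(path, "wb") as f:
        f.seek(hole_size)  # skipping ahead leaves a hole where supported
        f.write(payload)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "t.go")
    make_sparse_file(path, 64 * 1024, b"x" * 88)
    st = os.stat(path)
    apparent = st.st_size
    allocated = st.st_blocks * 512  # st_blocks counts 512-byte units
    print(f"apparent size: {apparent} bytes, allocated: {allocated} bytes")
```

An archiver that reads such a file naively sees the apparent size, zeroes included, unless it is told to detect holes.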

Also, a fourth option: many .go/modules come in groups – if you use one module from the group, then you (possibly indirectly) likely use most of the others in the group as well. As such, it may be worthwhile to stuff multiple modules into a single .go. I imagine that would cut down on some duplication of strings (and perhaps also give the optimiser more opportunities for deduplication and inlining?).  Perhaps it would be worthwhile to stuff all the web stuff together in a group, the compiler stuff (minus esoteric things like brainfuck) together, ...

Doesn’t even need any compiler changes if you are willing to do things a little manually, just compile
(begin (include "module0.scm") (include "module1.scm") ...)
to “module0.go” and let “module1.go”, “module2.go”, ... be symlinks to “module0.go”.

Some care required for targets not supporting symlinks, but making fake symlinks as regular files recognised by module loading code (or on lower level, whatever) should be straightforward.
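
The manual grouping described above could be scripted along these lines. This is a Python sketch under stated assumptions: `bundle_form` and `link_aliases` are hypothetical helpers, the actual compilation step is left out, and `os.symlink` needs a platform and filesystem that allow symlinks.

```python
import os
import tempfile


def bundle_form(sources):
    """Return the Scheme form that includes every source file in one unit."""
    includes = " ".join(f'(include "{src}")' for src in sources)
    return f"(begin {includes})"


def link_aliases(directory, primary, aliases):
    """Make each alias a relative symlink to the primary compiled file."""
    for alias in aliases:
        os.symlink(primary, os.path.join(directory, alias))


print(bundle_form(["module0.scm", "module1.scm"]))

with tempfile.TemporaryDirectory() as tmp:
    # Empty placeholder standing in for the compiled bundle.
    open(os.path.join(tmp, "module0.go"), "wb").close()
    link_aliases(tmp, "module0.go", ["module1.go", "module2.go"])
    resolved = os.path.realpath(os.path.join(tmp, "module1.go"))
    print(os.path.basename(resolved))
```

Relative link targets keep the group relocatable as long as all the files stay in the same directory.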

Best regards,
Maxime Devos.
