From: Andy Wingo <wingo@igalia.com>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: guix-devel <guix-devel@gnu.org>, Guile Devel <guile-devel@gnu.org>
Subject: Re: The size of ‘.go’ files
Date: Mon, 08 Jun 2020 10:07:56 +0200
Message-ID: <877dwirndv.fsf@igalia.com>
In-Reply-To: <875zc5z18d.fsf@gnu.org> ("Ludovic Courtès"'s message of "Fri, 05 Jun 2020 22:50:10 +0200")

Hi :)

A few points of information :)

On Fri 05 Jun 2020 22:50, Ludovic Courtès <ludo@gnu.org> writes:

> [Sorting] the ELF sections of a .go file by size; for ‘python-xyz.go’,
> I get this:
>
> $13 = ((".rtl-text" . 3417108)
>  (".guile.arities" . 1358536)
>  (".data" . 586912)
>  (".rodata" . 361599)
>  (".symtab" . 117000)
>  (".debug_line" . 97342)
>  (".debug_info" . 54519)
>  (".guile.frame-maps" . 47114)
>  ("" . 1344)
>  (".guile.arities.strtab" . 681)
>  ("" . 232)
>  (".shstrtab" . 229)
>  (".dynamic" . 112)
>  (".debug_str" . 87)
>  (".strtab" . 75)
>  (".debug_abbrev" . 65)
>  (".guile.docstrs.strtab" . 1)
>  ("" . 0)
>  (".guile.procprops" . 0)
>  (".guile.docstrs" . 0)
>  (".debug_loc" . 0))
>
> More than half of those 6 MiB is code, and more than 1 MiB is
> “.guile.arities” (info "(guile) Object File Format"), which is
> surprisingly large; presumably the file only contains thunks (the
> ‘thunked’ fields of <package>).

The .guile.arities section starts with a sorted array of fixed-size
headers, followed by a sequence of ULEB128 references to local
variable names, including non-arguments.  The size is a bit perplexing,
I agree.  I can think of a number of ways to encode that section
differently, but first we'd need to understand more about it, and why
the baseline compiler's output is so different.
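
For readers unfamiliar with ULEB128, here is a minimal Python sketch of
the variable-length encoding itself.  It only illustrates the generic
scheme (7 payload bits per byte, high bit set on every byte except the
last); the function names are made up for this example, and it makes no
claim about the exact layout of the .guile.arities section.

```python
from typing import List, Tuple


def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as ULEB128."""
    if value < 0:
        raise ValueError("ULEB128 encodes unsigned integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F          # low 7 bits of what remains
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)         # last byte: high bit clear
            return bytes(out)


def decode_uleb128(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one ULEB128 value; return (value, offset after it)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, offset
        shift += 7


def decode_all(data: bytes) -> List[int]:
    """Decode a back-to-back sequence of ULEB128 values."""
    values = []
    offset = 0
    while offset < len(data):
        value, offset = decode_uleb128(data, offset)
        values.append(value)
    return values
```

With this encoding any value below 128 takes one byte and any value
below 16384 takes two, so the cost of each reference depends on how
large the referenced offsets are.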

> Stripping the .debug_* sections (if that works) clearly wouldn’t help.

I believe that it should eventually be possible to strip .guile.arities,
fwiw.

> So I guess we could generate less code (reduce ‘.rtl-text’), perhaps by
> tweaking ‘define-record-type*’, but I have little hope there.

Hehe :)  As you mention later:

> With 3.0.3-to-be and -O1, python-xyz.go weighs in at 3.4 MiB instead of
> 5.9 MiB!  Here’s the section size distribution:
>
> $4 = ((".rtl-text" . 2101168)
>  (".data" . 586392)
>  (".rodata" . 360703)
>  (".guile.arities" . 193106)
>  (".symtab" . 117000)
>  (".debug_line" . 76685)
>  (".debug_info" . 53513)
>  ("" . 1280)
>  (".guile.arities.strtab" . 517)
>  ("" . 232)
>  (".shstrtab" . 211)
>  (".dynamic" . 96)
>  (".debug_str" . 87)
>  (".strtab" . 75)
>  (".debug_abbrev" . 56)
>  (".guile.docstrs.strtab" . 1)
>  ("" . 0)
>  (".guile.procprops" . 0)
>  (".guile.docstrs" . 0)
>  (".debug_loc" . 0))
> scheme@(guile-user)> (stat:size (stat go))
> $5 = 3519323
>
> “.rtl-text” is 38% smaller and “.guile.arities” is almost a tenth of
> what it was.

The difference in the text section comes from the new baseline
intrinsics, e.g. $vector-ref.  They go in the opposite direction from
instruction explosion, which sought to (1) make the JIT compiler simpler
by decomposing compound operations into their atomic parts, (2) let the
optimizer learn more from flow rather than from type-checking side
effects, and (3) allow the optimizer to eliminate / hoist / move the
component pieces of macro-operations.

However, in the baseline compiler (2) and (3) aren't possible because
there is no optimizer at that level, so the result is actually a
loss -- 10 micro-ops cost more than 1 macro-op because of stack traffic
overhead, which the JIT (1) doesn't currently mitigate.

So instruction explosion means residual code explosion, which should pay
off in theory, but not for the baseline compiler.  That's why I added
new intrinsics, e.g. $vector-ref et al.; hence the smaller code size.

I am not sure what causes the significantly different .guile.arities
size!
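
To put numbers on the two quoted listings, here is a small Python
sketch that computes the per-section savings.  The byte counts are
copied from the listings above; the dictionary and function names are
made up for this example.

```python
# Sizes in bytes of the four largest sections, copied from the two
# listings quoted above (first listing vs. the 3.0.3-to-be -O1 build).
BEFORE = {
    ".rtl-text": 3417108,
    ".guile.arities": 1358536,
    ".data": 586912,
    ".rodata": 361599,
}
AFTER_O1 = {
    ".rtl-text": 2101168,
    ".guile.arities": 193106,
    ".data": 586392,
    ".rodata": 360703,
}


def shrink_report(before, after):
    """Map each section name to (bytes saved, percent saved)."""
    report = {}
    for name, old in before.items():
        saved = old - after[name]
        report[name] = (saved, 100.0 * saved / old)
    return report


if __name__ == "__main__":
    for name, (saved, percent) in shrink_report(BEFORE, AFTER_O1).items():
        print(f"{name:16} saved {saved:8d} bytes ({percent:5.1f}%)")
```

.rtl-text and .guile.arities together account for nearly all of the
saving; .data and .rodata barely change.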

> Something’s going on here!  Thoughts?

There are more possibilities for making code size smaller, e.g. having
two equivalent encodings for bytecode, where one is smaller:

  https://webkit.org/blog/9329/a-new-bytecode-format-for-javascriptcore/

Or, if we could do register allocation for a target-dependent fixed set
of registers already at the bytecode level, that could decrease the
minimum instruction size, so that more instructions fit into a single
32-bit word.  It would be nice if the JIT could rely on the bytecode
compiler to have already done register allocation, and to reify the
corresponding debug information.  Just a thought though, and not really
appropriate to the baseline compiler.
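
The first idea, two equivalent encodings where one is smaller, can be
illustrated with a toy Python sketch: each instruction has a narrow form
with one-byte operands and a wide form, introduced by a prefix byte,
with 32-bit operands.  Everything here (the WIDE_PREFIX value, the
opcode numbers, the layout) is hypothetical and chosen for illustration
only; it is neither Guile's bytecode format nor JavaScriptCore's.

```python
import struct
from typing import List, Tuple

WIDE_PREFIX = 0xFF  # hypothetical marker byte announcing the wide form


def encode(opcode: int, operands: List[int]) -> bytes:
    """Pick the narrow form when every operand fits in one byte."""
    if not 0 <= opcode < WIDE_PREFIX:
        raise ValueError("opcode out of range")
    if any(x < 0 or x > 0xFFFFFFFF for x in operands):
        raise ValueError("operand does not fit in 32 bits")
    if all(x <= 0xFF for x in operands):
        # Narrow form: opcode byte, then one byte per operand.
        return bytes([opcode, *operands])
    # Wide form: prefix, opcode, then 4 little-endian bytes per operand.
    packed = b"".join(struct.pack("<I", x) for x in operands)
    return bytes([WIDE_PREFIX, opcode]) + packed


def decode(code: bytes, offset: int, arity: int) -> Tuple[int, List[int], int]:
    """Decode one instruction; return (opcode, operands, next offset)."""
    if code[offset] == WIDE_PREFIX:
        opcode = code[offset + 1]
        start = offset + 2
        operands = list(struct.unpack_from("<%dI" % arity, code, start))
        return opcode, operands, start + 4 * arity
    opcode = code[offset]
    operands = list(code[offset + 1 : offset + 1 + arity])
    return opcode, operands, offset + 1 + arity
```

In this toy, an instruction with two small operands takes 3 bytes, while
the same instruction with one large operand takes 10.  The scheme only
pays off if most operands are small, which would have to be measured on
real code.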

Cheers,

Andy

