guile-devel

Re: The size of ‘.go’ files


From: Andy Wingo
Subject: Re: The size of ‘.go’ files
Date: Mon, 08 Jun 2020 10:07:56 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)

Hi :)

A few points of information :)

On Fri 05 Jun 2020 22:50, Ludovic Courtès <ludo@gnu.org> writes:

> [Sorting] the ELF sections of a .go file by size; for ‘python-xyz.go’,
> I get this:
>
> $13 = ((".rtl-text" . 3417108)
>  (".guile.arities" . 1358536)
>  (".data" . 586912)
>  (".rodata" . 361599)
>  (".symtab" . 117000)
>  (".debug_line" . 97342)
>  (".debug_info" . 54519)
>  (".guile.frame-maps" . 47114)
>  ("" . 1344)
>  (".guile.arities.strtab" . 681)
>  ("" . 232)
>  (".shstrtab" . 229)
>  (".dynamic" . 112)
>  (".debug_str" . 87)
>  (".strtab" . 75)
>  (".debug_abbrev" . 65)
>  (".guile.docstrs.strtab" . 1)
>  ("" . 0)
>  (".guile.procprops" . 0)
>  (".guile.docstrs" . 0)
>  (".debug_loc" . 0))
>
> More than half of those 6 MiB is code, and more than 1 MiB is
> “.guile.arities” (info "(guile) Object File Format"), which is
> surprisingly large; presumably the file only contains thunks (the
> ‘thunked’ fields of <package>).

The .guile.arities section starts with a sorted array of fixed-size
headers, followed by a sequence of ULEB128 references to local
variable names, including non-arguments.  The size is a bit perplexing,
I agree.  I can think of a number of ways to encode that section
differently, but we'd need to understand it a bit better first,
including why the baseline compiler's output is so different.
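
For readers unfamiliar with the encoding: ULEB128 (unsigned little-endian
base 128, as used by DWARF) stores an integer in 7-bit groups, low bits
first, with the high bit of each byte set when more bytes follow.  The
Python sketch below shows only the generic encoding.  It does not
reproduce Guile's actual .guile.arities layout, and the reference
values in it are hypothetical.  The point is that small references cost
one byte and larger ones grow by a byte per extra 7 bits, so the
section's size depends on how many names each arity references and how
large those references are.

```python
def uleb128_encode(value: int) -> bytes:
    """Encode a non-negative integer as ULEB128 (7 bits per byte, low bits first)."""
    if value < 0:
        raise ValueError("ULEB128 encodes non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F  # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)  # high bit clear: this is the last byte
            return bytes(out)


def uleb128_decode(data: bytes, pos: int = 0) -> tuple:
    """Decode one ULEB128 value starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


# Hypothetical reference values for one arity's local-variable names.
refs = [3, 120, 300, 70000]
encoded = b"".join(uleb128_encode(r) for r in refs)
print(len(encoded))  # 1 + 1 + 2 + 3 bytes = 7
```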

> Stripping the .debug_* sections (if that works) clearly wouldn’t help.

I believe it should eventually be possible to strip .guile.arities,
fwiw.

> So I guess we could generate less code (reduce ‘.rtl-text’), perhaps by
> tweaking ‘define-record-type*’, but I have little hope there.

Hehe :)  As you mention later:

> With 3.0.3-to-be and -O1, python-xyz.go weighs in at 3.4 MiB instead of
> 5.9 MiB!  Here’s the section size distribution:
>
> $4 = ((".rtl-text" . 2101168)
>  (".data" . 586392)
>  (".rodata" . 360703)
>  (".guile.arities" . 193106)
>  (".symtab" . 117000)
>  (".debug_line" . 76685)
>  (".debug_info" . 53513)
>  ("" . 1280)
>  (".guile.arities.strtab" . 517)
>  ("" . 232)
>  (".shstrtab" . 211)
>  (".dynamic" . 96)
>  (".debug_str" . 87)
>  (".strtab" . 75)
>  (".debug_abbrev" . 56)
>  (".guile.docstrs.strtab" . 1)
>  ("" . 0)
>  (".guile.procprops" . 0)
>  (".guile.docstrs" . 0)
>  (".debug_loc" . 0))
> scheme@(guile-user)> (stat:size (stat go))
> $5 = 3519323
>
> “.rtl-text” is 38% smaller and “.guile.arities” is almost a tenth of
> what it was.
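
The percentages can be checked directly against the two listings.  The
Python calculation below uses only the sizes quoted above (all nonzero
entries of the first listing, including the unnamed ones, for the
total).  It gives about 56.5% of the first file's section bytes in
.rtl-text, a 38.5% reduction in .rtl-text, and a .guile.arities section
at about 14% of its former size, i.e. closer to a seventh than a tenth.

```python
# Sizes in bytes, copied from the two listings quoted above
# (default optimization level vs. -O1 with Guile 3.0.3-to-be).
before = {".rtl-text": 3417108, ".guile.arities": 1358536}
after = {".rtl-text": 2101168, ".guile.arities": 193106}

# Sum of every nonzero section in the first listing, unnamed ones included.
first_listing_total = sum([
    3417108, 1358536, 586912, 361599, 117000, 97342, 54519, 47114,
    1344, 681, 232, 229, 112, 87, 75, 65, 1,
])


def reduction(name: str) -> float:
    """Fractional size reduction of a section between the two builds."""
    return 1 - after[name] / before[name]


code_share = before[".rtl-text"] / first_listing_total
arities_ratio = after[".guile.arities"] / before[".guile.arities"]
print(f"code share before:        {code_share:.1%}")
print(f".rtl-text reduction:      {reduction('.rtl-text'):.1%}")
print(f".guile.arities remaining: {arities_ratio:.1%}")
```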

The difference in .rtl-text comes from the new baseline intrinsics,
e.g. for $vector-ref.  This goes in the opposite direction from
instruction explosion, which sought to (1) make the JIT compiler
simpler by decomposing compound operations into their atomic parts,
(2) let the optimizer learn more from control flow rather than from
the side effects of type checks, and (3) allow the optimizer to
eliminate, hoist, or move the component pieces of macro-operations.

However, in the baseline compiler (2) and (3) aren't possible because
there is no optimizer at that level, so the result is actually a loss:
10 micro-ops cost more than 1 macro-op because of stack traffic
overhead, which the JIT (1) doesn't currently mitigate.

So instruction explosion also means an explosion of the residual code,
which should pay off in theory, but not for the baseline compiler.
That's why I added new intrinsics for $vector-ref et al., hence the
smaller code size.
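
A toy cost model may make the stack-traffic argument concrete.  In the
Python sketch below, every instruction loads its operands from frame
slots and stores any result back, as a baseline compiler without
register allocation would.  The micro-op names and the decomposition
are hypothetical, chosen only to show the shape of an exploded, checked
vector reference; they are not Guile's actual instruction set, and the
counts are illustrative, not measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    """A toy instruction: loads `reads` frame slots, stores `writes` slots."""
    name: str
    reads: int
    writes: int


# Hypothetical decomposition of a checked vector-ref into micro-ops.
# Names are illustrative only, not Guile's real instructions.
EXPLODED_VECTOR_REF = [
    Insn("heap-object?", 1, 0),     # predicate: result feeds a branch, not a slot
    Insn("has-vector-tag?", 1, 0),
    Insn("vector-length", 1, 1),
    Insn("fixnum?", 1, 0),
    Insn("untag-fixnum", 1, 1),
    Insn("index-in-range?", 2, 0),
    Insn("load-element", 2, 1),
]

# The same operation as one macro-op (a single intrinsic call).
MACRO_VECTOR_REF = [Insn("vector-ref", 2, 1)]


def slot_traffic(insns) -> int:
    """Total frame-slot loads plus stores, which a baseline compiler pays in full."""
    return sum(i.reads + i.writes for i in insns)


print(len(EXPLODED_VECTOR_REF), slot_traffic(EXPLODED_VECTOR_REF))  # 7 12
print(len(MACRO_VECTOR_REF), slot_traffic(MACRO_VECTOR_REF))        # 1 3
```

With an optimizer, several of the exploded checks could be eliminated
or hoisted out of loops, which is where explosion pays off; without
one, every step is dispatched and every intermediate goes through the
stack.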

I am not sure what causes the significantly different .guile.arities
size!

> Something’s going on here!  Thoughts?

There are other ways to make code smaller, e.g. having two equivalent
bytecode encodings, one of which is more compact:

  https://webkit.org/blog/9329/a-new-bytecode-format-for-javascriptcore/
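
The idea can be sketched as follows, loosely modeled on the narrow/wide
scheme described in the linked post.  The escape opcode value and the
operand widths (1 byte narrow, 4 bytes wide) are assumptions for
illustration, not JavaScriptCore's or Guile's actual format.  An
instruction uses the narrow form when every operand fits in a byte and
falls back to a prefixed wide form otherwise; both decode to the same
instruction.

```python
import struct
from typing import List, Tuple

WIDE_PREFIX = 0xFF  # hypothetical escape opcode marking a wide instruction


def encode(opcode: int, operands: List[int]) -> bytes:
    """Encode one instruction, choosing the narrow form when every operand fits.

    Narrow: opcode byte, then 1 unsigned byte per operand.
    Wide:   prefix byte, opcode byte, then 4 little-endian bytes per operand.
    """
    if not 0 <= opcode < WIDE_PREFIX:
        raise ValueError("opcode out of range")
    if all(0 <= op <= 0xFF for op in operands):
        return bytes([opcode, *operands])
    return bytes([WIDE_PREFIX, opcode]) + b"".join(
        struct.pack("<I", op) for op in operands
    )


def decode(data: bytes, arity: int) -> Tuple[int, List[int]]:
    """Decode one instruction whose operand count is known (as it would be from the opcode)."""
    if data[0] == WIDE_PREFIX:
        opcode = data[1]
        operands = [struct.unpack_from("<I", data, 2 + 4 * i)[0] for i in range(arity)]
    else:
        opcode = data[0]
        operands = list(data[1:1 + arity])
    return opcode, operands


narrow = encode(0x10, [1, 2, 3])    # 4 bytes
wide = encode(0x10, [1, 2, 300])    # 2 + 3 * 4 = 14 bytes
print(len(narrow), len(wide))
```

The common case pays one byte per operand, and only the rare
instruction with a large operand pays for the wide form.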

Or, if the bytecode compiler already did register allocation for a
target-dependent, fixed set of registers, that could decrease the
minimum instruction size, letting more instructions fit into a single
32-bit word.  It would be nice if the JIT could rely on the bytecode
compiler having already done register allocation, and reify the
corresponding debug information.  Just a thought, though, and not
really appropriate for the baseline compiler.
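
The size argument can be illustrated with a fixed 32-bit instruction
word: the bits available per operand shrink as the number of operands
grows.  The Python sketch below assumes a hypothetical layout (8-bit
opcode in the low bits, remaining 24 bits split evenly among operands),
which is not Guile's actual instruction encoding.  Three operands get 8
bits each, so any slot index above 255 forces a longer form, whereas
operands drawn from a small register set (e.g. 16 registers, 4 bits
each) would always fit.

```python
from typing import List, Optional

OPCODE_BITS = 8
WORD_BITS = 32


def pack_word(opcode: int, operands: List[int]) -> Optional[int]:
    """Pack an opcode and operands into one 32-bit word, or return None.

    Hypothetical layout: opcode in the low 8 bits, remaining 24 bits
    split into equal-width fields, one per operand.
    """
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode out of range")
    word = opcode
    if not operands:
        return word
    width = (WORD_BITS - OPCODE_BITS) // len(operands)
    for i, op in enumerate(operands):
        if not 0 <= op < (1 << width):
            return None  # operand too large: would need a multi-word form
        word |= op << (OPCODE_BITS + i * width)
    return word


print(hex(pack_word(0x20, [3, 7, 12])))   # three 8-bit fields: fits
print(pack_word(0x20, [3, 7, 300]))       # 300 needs 9 bits: None
print(hex(pack_word(0x20, [300, 4000]))) # two 12-bit fields: fits
```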

Cheers,

Andy


