Re: Proposal: block-based vector allocator
From: Stefan Monnier
Subject: Re: Proposal: block-based vector allocator
Date: Mon, 12 Dec 2011 11:24:13 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.92 (gnu/linux)
>> Let us know how it turns out.
> Results for the byte-compile benchmark, an average of 16 runs:
>
> CPU time spent in user mode, seconds
> ----< configuration >----< 32bit >----< 64bit >----
> default, stack mark        74.07        84.87
> default, GCPROs            72.90        81.37
> patched, stack mark        71.35        81.57
> patched, GCPROs            70.16        82.18
>
> Peak heap utilization, KBytes
> ----< configuration >----< 32bit >----< 64bit >----
> default, stack mark        41499        73651
> default, GCPROs            37918        65648
> patched, stack mark        38310        67169
> patched, GCPROs            38052        65730
>
> Total time spent in GC, seconds
> ----< configuration >----< 32bit >----< 64bit >----
> default, stack mark        23.58        32.32
> default, GCPROs            21.94        30.43
> patched, stack mark        21.64        29.89
> patched, GCPROs            21.13        29.22
>
> Average time per GC, milliseconds
> ----< configuration >----< 32bit >----< 64bit >----
> default, stack mark        27.62        36.03
> default, GCPROs            25.57        33.93
> patched, stack mark        25.22        33.34
> patched, GCPROs            24.63        32.57
Since most small differences are difficult to separate from noise (and
are not important anyway), the summary from where I stand is that
there's no substantial difference (CPU- and memory-wise) between your new
code (with or without GCPROs) and the current code with GCPROs, whereas
the current code with conservative stack scanning is a tiny bit slower
and uses a non-negligible amount of extra space in peak usage.
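This confirms that the main issue is the mem_nodes (the red-black tree
nodes that conservative stack scanning uses to map addresses back to
allocated blocks).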
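To put a number on "non-negligible", the peak-heap rows of the quoted table can be expressed relative to the current code with GCPROs. The following Python sketch does only that arithmetic; the data layout and the name overhead_pct are illustrative, and the numbers are copied from the table.

```python
# Peak heap utilization in KBytes, copied from the quoted benchmark table.
peak_kb = {
    ("default", "stack mark"): {"32bit": 41499, "64bit": 73651},
    ("default", "GCPROs"):     {"32bit": 37918, "64bit": 65648},
    ("patched", "stack mark"): {"32bit": 38310, "64bit": 67169},
    ("patched", "GCPROs"):     {"32bit": 38052, "64bit": 65730},
}


def overhead_pct(config, baseline=("default", "GCPROs")):
    """Percent extra peak heap of `config` over `baseline`, per word size."""
    return {
        arch: 100.0 * (peak_kb[config][arch] - peak_kb[baseline][arch])
        / peak_kb[baseline][arch]
        for arch in ("32bit", "64bit")
    }


# Print one line per configuration, relative to default + GCPROs.
for config in peak_kb:
    pct = overhead_pct(config)
    print(f"{config[0]:8} {config[1]:11} "
          f"32bit {pct['32bit']:+5.1f}%  64bit {pct['64bit']:+5.1f}%")
```

Run as-is, it reports about +9.4% (32bit) and +12.2% (64bit) extra peak heap for the current code with stack marking, versus about +1.0% and +2.3% for the patched code with stack marking, and under 0.5% for the patched code with GCPROs.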
It would also be interesting to see how your new code performs in terms
of fragmentation (i.e. average memory use for a long-running interactive
session), but it's very difficult to measure and I doubt we'd see much
difference (other than the impact of mem_nodes of course).
> on 64-bit. In terms of CPU usage, results are more interesting: 1%
> worse for the 64-bit case, but 3.8% better for 32-bit. The only
> explanation I have for this effect is that the arithmetic used in
> splitting/coalescing operations creates some pressure on the CPU in
> 64-bit mode, but the 32-bit version of the same code may be implicitly
> executed in parallel by the 64-bit core.
64bit cores don't implicitly parallelize 32bit code to take advantage of
the 64bit datapath, so that can't be the explanation. And register
pressure is worse in the x86 architecture (8 general-purpose registers)
than in the amd64 architecture (16), so if anything that would favor
the 64bit build.
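The 1% and 3.8% figures in the quoted paragraph can be reproduced from the CPU-time table, assuming they compare the patched and default builds that both use GCPROs (that pairing is inferred, since it matches both numbers). A small Python sketch of the arithmetic, with illustrative names:

```python
# CPU time in user mode, seconds, copied from the quoted byte-compile benchmark.
cpu_s = {
    ("default", "stack mark"): {"32bit": 74.07, "64bit": 84.87},
    ("default", "GCPROs"):     {"32bit": 72.90, "64bit": 81.37},
    ("patched", "stack mark"): {"32bit": 71.35, "64bit": 81.57},
    ("patched", "GCPROs"):     {"32bit": 70.16, "64bit": 82.18},
}


def relative_change_pct(marking, arch):
    """Percent change of patched vs. default for the same root-marking mode.

    Negative means the patched allocator used less CPU time.
    """
    old = cpu_s[("default", marking)][arch]
    new = cpu_s[("patched", marking)][arch]
    return 100.0 * (new - old) / old


for marking in ("GCPROs", "stack mark"):
    for arch in ("32bit", "64bit"):
        print(f"{marking:11} {arch}: {relative_change_pct(marking, arch):+.1f}%")
```

For the GCPRO builds this gives about -3.8% on 32bit and +1.0% on 64bit; with stack marking, the patched code is about 3.7% and 3.9% faster on 32bit and 64bit respectively.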
> Due to this, I don't consider my 32-bit benchmark fairly
> representative - it should be done on a real 32-bit core and not in
> 'compatibility mode' on a 64-bit one.
You misunderstand what the "compatibility mode" of amd64 processors is:
it runs 32bit code natively on the same core, not through emulation.
Stefan