[Qemu-commits] [qemu/qemu] 75e8b9: tcg: Merge opcode arguments into TCGOp


From: GitHub
Subject: [Qemu-commits] [qemu/qemu] 75e8b9: tcg: Merge opcode arguments into TCGOp
Date: Wed, 25 Oct 2017 12:02:57 -0700

  Branch: refs/heads/master
  Home:   https://github.com/qemu/qemu
  Commit: 75e8b9b7aa0b95a761b9add7e2f09248b101a392
      
https://github.com/qemu/qemu/commit/75e8b9b7aa0b95a761b9add7e2f09248b101a392
  Author: Richard Henderson <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M tcg/optimize.c
    M tcg/tcg-op.c
    M tcg/tcg.c
    M tcg/tcg.h

  Log Message:
  -----------
  tcg: Merge opcode arguments into TCGOp

Rather than have a separate buffer of 10*max_ops entries,
give each opcode 10 entries.  The result is actually a bit
smaller and should have slightly more cache locality.
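
As a rough illustration of the layout change described above, the sketch
below contrasts an op that indexes into a shared argument buffer with an
op that carries its arguments inline. OldOp, NewOp, Arg and the accessor
names are hypothetical stand-ins, not the real definitions in tcg/tcg.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OP_ARGS 10          /* the "10 entries" from the log message */

typedef uintptr_t Arg;          /* hypothetical stand-in for TCGArg */

/* Assumed old shape: the op only records where its arguments start
 * inside a separate buffer of 10 * max_ops entries. */
typedef struct OldOp {
    unsigned opc;
    unsigned args_index;        /* index into the shared argument buffer */
} OldOp;

/* New shape: every op carries its own arguments inline. */
typedef struct NewOp {
    unsigned opc;
    Arg args[MAX_OP_ARGS];
} NewOp;

static Arg old_get_arg(const OldOp *op, const Arg *buf, int i)
{
    assert(i >= 0 && i < MAX_OP_ARGS);
    return buf[op->args_index + i];     /* touches a second memory region */
}

static Arg new_get_arg(const NewOp *op, int i)
{
    assert(i >= 0 && i < MAX_OP_ARGS);
    return op->args[i];                 /* same object as the opcode */
}
```

With the inline form, reading an opcode and its arguments touches one
object instead of two, which is the locality effect the message refers to.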

Reviewed-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: acd937019bdaf933fcf1a7b57679ba07119c89b7
      
https://github.com/qemu/qemu/commit/acd937019bdaf933fcf1a7b57679ba07119c89b7
  Author: Richard Henderson <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M tcg/optimize.c

  Log Message:
  -----------
  tcg: Propagate args to op->args in optimizer

Reviewed-by: Emilio G. Cota <address@hidden>
Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: efee3746fa471852daba7674b0d34f8c88be7559
      
https://github.com/qemu/qemu/commit/efee3746fa471852daba7674b0d34f8c88be7559
  Author: Richard Henderson <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M tcg/tcg.c

  Log Message:
  -----------
  tcg: Propagate args to op->args in tcg.c

Reviewed-by: Emilio G. Cota <address@hidden>
Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: dd186292017641d5b31fc13225a420677e1d20d3
      
https://github.com/qemu/qemu/commit/dd186292017641d5b31fc13225a420677e1d20d3
  Author: Richard Henderson <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M tcg/tcg.c

  Log Message:
  -----------
  tcg: Propagate TCGOp down to allocators

Reviewed-by: Emilio G. Cota <address@hidden>
Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 434391390ba99996af1591b427a73b3f5c05065e
      
https://github.com/qemu/qemu/commit/434391390ba99996af1591b427a73b3f5c05065e
  Author: Richard Henderson <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M tcg/optimize.c
    M tcg/tcg.c
    M tcg/tcg.h

  Log Message:
  -----------
  tcg: Introduce arg_temp

Reviewed-by: Emilio G. Cota <address@hidden>
Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: fa477d25470187030614288d35bc734edffa41ee
      
https://github.com/qemu/qemu/commit/fa477d25470187030614288d35bc734edffa41ee
  Author: Richard Henderson <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M tcg/optimize.c
    M tcg/tcg.c
    M tcg/tcg.h

  Log Message:
  -----------
  tcg: Add temp_global bit to TCGTemp

This avoids needing to test the index of a temp against nb_globals.
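
A minimal sketch of the difference, assuming simplified stand-in types
(Temp and Ctx here are hypothetical, not TCGTemp and TCGContext): the
index-based test needs the context and pointer arithmetic, while the
flag-based test needs only the temp itself.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct Temp {
    unsigned int temp_global : 1;   /* flag named in the commit subject */
} Temp;

typedef struct Ctx {
    int nb_globals;
    int nb_temps;
    Temp temps[8];
} Ctx;

/* Index-based test: needs the context and the temp's position in it. */
static bool is_global_by_index(const Ctx *s, const Temp *ts)
{
    int idx = (int)(ts - s->temps);
    assert(idx >= 0 && idx < s->nb_temps);
    return idx < s->nb_globals;
}

/* Flag-based test: needs only the temp itself. */
static bool is_global_by_flag(const Temp *ts)
{
    return ts->temp_global;
}
```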

Reviewed-by: Philippe Mathieu-Daudé <address@hidden>
Reviewed-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: c6c7d84df8889b9d6298466999b88a8a42e5f976
      
https://github.com/qemu/qemu/commit/c6c7d84df8889b9d6298466999b88a8a42e5f976
  Author: Richard Henderson <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M tcg/tcg.h

  Log Message:
  -----------
  tcg: Return NULL temp for TCG_CALL_DUMMY_ARG

Reviewed-by: Philippe Mathieu-Daudé <address@hidden>
Reviewed-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 1807f4c40098070008eb84b2032e25b7ac42569e
      
https://github.com/qemu/qemu/commit/1807f4c40098070008eb84b2032e25b7ac42569e
  Author: Richard Henderson <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M tcg/tcg.c
    M tcg/tcg.h

  Log Message:
  -----------
  tcg: Introduce temp_arg, export temp_idx

At the same time, drop the TCGContext argument and use tcg_ctx instead.

Reviewed-by: Philippe Mathieu-Daudé <address@hidden>
Reviewed-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: b83eabeac06e38706738bd5e92b1ba117a1b554d
      
https://github.com/qemu/qemu/commit/b83eabeac06e38706738bd5e92b1ba117a1b554d
  Author: Richard Henderson <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M tcg/tcg.c
    M tcg/tcg.h

  Log Message:
  -----------
  tcg: Use per-temp state data in liveness

This avoids having to allocate external memory for each temporary.

Reviewed-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: ac3b88911ebc6fc841f28898ee8aed40839debe2
      
https://github.com/qemu/qemu/commit/ac3b88911ebc6fc841f28898ee8aed40839debe2
  Author: Richard Henderson <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M tcg/tcg.c

  Log Message:
  -----------
  tcg: Avoid loops against variable bounds

Copy s->nb_globals or s->nb_temps to a local variable for the purposes
of iteration.  This should allow the compiler to use low-overhead
looping constructs on some hosts.
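
The pattern can be sketched as follows, using a hypothetical Ctx type
rather than TCGContext. In the first loop the store through 'out' may
alias the bound as far as the compiler can tell, so the bound may have to
be reloaded each iteration; in the second the trip count is fixed before
the loop starts.

```c
#include <assert.h>

typedef struct Ctx {
    int nb_temps;
    int vals[16];
} Ctx;

/* Bound read from memory in the loop condition: a store to out[i]
 * could, in principle, modify s->nb_temps. */
static void copy_reload_bound(Ctx *s, int *out)
{
    for (int i = 0; i < s->nb_temps; i++) {
        out[i] = s->vals[i];
    }
}

/* Bound copied to a local first: it cannot change inside the loop. */
static void copy_local_bound(Ctx *s, int *out)
{
    int n = s->nb_temps;

    assert(n >= 0 && n <= 16);
    for (int i = 0; i < n; i++) {
        out[i] = s->vals[i];
    }
}
```

Whether a given compiler actually emits a hardware loop or counted-loop
instruction for the second form depends on the host and optimization level.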

Reviewed-by: Philippe Mathieu-Daudé <address@hidden>
Reviewed-by: Emilio G. Cota <address@hidden>
Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 2272e4a791b7e1a01ffac143616ba4ece9a5762d
      
https://github.com/qemu/qemu/commit/2272e4a791b7e1a01ffac143616ba4ece9a5762d
  Author: Richard Henderson <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M tcg/tcg.c

  Log Message:
  -----------
  tcg: Change temp_allocate_frame arg to TCGTemp

Reviewed-by: Philippe Mathieu-Daudé <address@hidden>
Reviewed-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 54534d7cfd3bdff1aa1f6c9472d94243d2303656
      
https://github.com/qemu/qemu/commit/54534d7cfd3bdff1aa1f6c9472d94243d2303656
  Author: Richard Henderson <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M tcg/tcg.h

  Log Message:
  -----------
  tcg: Remove unused TCG_CALL_DUMMY_TCGV

Reviewed-by: Emilio G. Cota <address@hidden>
Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 6349039d0b06eda59820629b934944246b14a1c1
      
https://github.com/qemu/qemu/commit/6349039d0b06eda59820629b934944246b14a1c1
  Author: Richard Henderson <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M tcg/optimize.c
    M tcg/tcg.h

  Log Message:
  -----------
  tcg: Use per-temp state data in optimize

While we're touching many of the lines anyway, adjust the naming
of the functions to better distinguish when "TCGArg" vs "TCGTemp"
should be used.

Reviewed-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: b7e8b17a77b94c33e1554fd5e1c1812ce05724be
      
https://github.com/qemu/qemu/commit/b7e8b17a77b94c33e1554fd5e1c1812ce05724be
  Author: Richard Henderson <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M tcg/tcg-op.c
    M tcg/tcg-op.h

  Log Message:
  -----------
  tcg: Push tcg_ctx into generator functions

Reviewed-by: Philippe Mathieu-Daudé <address@hidden>
Reviewed-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 960c50e07746048a5c74f4dd29bb04763fc80eba
      
https://github.com/qemu/qemu/commit/960c50e07746048a5c74f4dd29bb04763fc80eba
  Author: Richard Henderson <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M include/exec/helper-gen.h
    M tcg/tcg.c
    M tcg/tcg.h

  Log Message:
  -----------
  tcg: Push tcg_ctx into tcg_gen_callN

Reviewed-by: Philippe Mathieu-Daudé <address@hidden>
Reviewed-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: ae8b75dc6ec808378487064922f25f1e7ea7a9be
      
https://github.com/qemu/qemu/commit/ae8b75dc6ec808378487064922f25f1e7ea7a9be
  Author: Richard Henderson <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M include/exec/helper-gen.h
    M include/exec/helper-head.h
    M tcg/tcg-op.c
    M tcg/tcg-op.h
    M tcg/tcg.c
    M tcg/tcg.h

  Log Message:
  -----------
  tcg: Introduce tcgv_{i32,i64,ptr}_{arg,temp}

Transform TCGv_* to an "argument" or a temporary.
For now, an argument is simply the temporary index.

Reviewed-by: Philippe Mathieu-Daudé <address@hidden>
Reviewed-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 085272b35e0644fea373c33b5265c1818b7a978c
      
https://github.com/qemu/qemu/commit/085272b35e0644fea373c33b5265c1818b7a978c
  Author: Richard Henderson <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M tcg/tcg.c
    M tcg/tcg.h

  Log Message:
  -----------
  tcg: Introduce temp_tcgv_{i32,i64,ptr}

Reviewed-by: Philippe Mathieu-Daudé <address@hidden>
Reviewed-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: dc41aa7d34989b552efe712ffe184236216f960b
      
https://github.com/qemu/qemu/commit/dc41aa7d34989b552efe712ffe184236216f960b
  Author: Richard Henderson <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M include/exec/helper-head.h
    M target/sparc/translate.c
    M tcg/tcg-op.c
    M tcg/tcg.c
    M tcg/tcg.h

  Log Message:
  -----------
  tcg: Remove GET_TCGV_* and MAKE_TCGV_*

The GET and MAKE functions weren't really specific enough.
We now have a full complement of functions that convert exactly
between temporaries, arguments, tcgv pointers, and indices.

The target/sparc change is also a bug fix, which would have affected
a host that defines TCG_TARGET_HAS_extr[lh]_i64_i32, i.e. MIPS64.

Reviewed-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 11f4e8f8bfaa2caaab24bef6bbbb8a0205015119
      
https://github.com/qemu/qemu/commit/11f4e8f8bfaa2caaab24bef6bbbb8a0205015119
  Author: Richard Henderson <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M target/cris/translate.c
    M target/i386/translate.c
    M target/m68k/translate.c
    M target/ppc/translate.c
    M tcg/tcg-op.h
    M tcg/tcg.h

  Log Message:
  -----------
  tcg: Remove TCGV_EQUAL*

When we used structures for TCGv_*, we needed a macro in order to
perform a comparison.  Now that we use pointers, this is just clutter.

Reviewed-by: Philippe Mathieu-Daudé <address@hidden>
Reviewed-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 55c3ceef61fcf06fc98ddc752b7cce788ce7680b
      
https://github.com/qemu/qemu/commit/55c3ceef61fcf06fc98ddc752b7cce788ce7680b
  Author: Richard Henderson <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M exec.c
    M include/qom/cpu.h
    M target/alpha/cpu.c
    M target/alpha/translate.c
    M target/arm/cpu.c
    M target/cris/cpu.c
    M target/hppa/cpu.c
    M target/hppa/translate.c
    M target/i386/cpu.c
    M target/i386/translate.c
    M target/lm32/cpu.c
    M target/m68k/cpu.c
    M target/microblaze/cpu.c
    M target/mips/cpu.c
    M target/mips/translate.c
    M target/moxie/cpu.c
    M target/moxie/translate.c
    M target/nios2/cpu.c
    M target/openrisc/cpu.c
    M target/ppc/translate.c
    M target/ppc/translate_init.c
    M target/s390x/cpu.c
    M target/sh4/cpu.c
    M target/sh4/translate.c
    M target/sparc/cpu.c
    M target/sparc/cpu.h
    M target/sparc/translate.c
    M target/tilegx/cpu.c
    M target/tricore/cpu.c
    M target/tricore/translate.c
    M target/unicore32/cpu.c
    M target/xtensa/cpu.c

  Log Message:
  -----------
  qom: Introduce CPUClass.tcg_initialize

Move target cpu tcg initialization to common code,
called from cpu_exec_realizefn.

Acked-by: Andreas Färber <address@hidden>
Reviewed-by: Emilio G. Cota <address@hidden>
Reviewed-by: Philippe Mathieu-Daudé <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: e89b28a63501c0ad6d2501fe851d0c5202055e70
      
https://github.com/qemu/qemu/commit/e89b28a63501c0ad6d2501fe851d0c5202055e70
  Author: Richard Henderson <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M tcg/tcg.h

  Log Message:
  -----------
  tcg: Use offsets not indices for TCGv_*

Using the offset of a temporary, relative to TCGContext, rather than
its index means that we don't use 0.  That leaves offset 0 free for
a NULL representation without having to leave index 0 unused.
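
To illustrate the idea, here is a small sketch with hypothetical types
(Ctx, Temp, Handle); it is not the code from tcg/tcg.h. Because the array
of temps sits after other fields inside the context, the byte offset of
any temp from the start of the context is non-zero, so a handle built
from that offset can never collide with NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Temp { int val; } Temp;

typedef struct Ctx {
    int nb_temps;               /* fields before temps[] keep offsets > 0 */
    Temp temps[8];
} Ctx;

typedef struct Handle_d *Handle;    /* opaque; never dereferenced */

/* Encode a temp as its byte offset from the start of the context. */
static Handle temp_to_handle(const Ctx *s, const Temp *ts)
{
    uintptr_t off = (uintptr_t)((const char *)ts - (const char *)s);

    assert(off != 0);
    return (Handle)off;
}

/* Decode a handle back into a pointer to the temp. */
static Temp *handle_to_temp(Ctx *s, Handle h)
{
    return (Temp *)((char *)s + (uintptr_t)h);
}
```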

Reviewed-by: Philippe Mathieu-Daudé <address@hidden>
Reviewed-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 4e2ca83e71b51577b06b1468e836556912bd5b6e
      
https://github.com/qemu/qemu/commit/4e2ca83e71b51577b06b1468e836556912bd5b6e
  Author: Emilio G. Cota <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M accel/tcg/cpu-exec.c
    M accel/tcg/tcg-runtime.c
    M accel/tcg/translate-all.c
    M exec.c
    M include/exec/exec-all.h
    M include/exec/tb-hash-xx.h
    M include/exec/tb-hash.h
    M include/exec/tb-lookup.h
    M tcg/tcg.h
    M tests/qht-bench.c

  Log Message:
  -----------
  tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK

This will enable us to decouple code translation from the value
of parallel_cpus at any given time. It will also help us minimize
TB flushes when generating code via EXCP_ATOMIC.

Note that the declaration of parallel_cpus is brought to exec-all.h
to be able to define there the "curr_cflags" inline.
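
A sketch of the mechanism, under stated assumptions: the bit values
below are illustrative, SKETCH_HASH_MASK, TBKey and tb_key_equal are
hypothetical names, and curr_cflags is reduced to the one flag discussed
here. The point is only that the flag becomes part of the TB's identity,
so a block translated for parallel execution is never confused with one
translated for serial execution.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit assignments; not taken from exec-all.h. */
#define CF_COUNT_MASK    0x00007fffu
#define CF_PARALLEL      0x00080000u
#define SKETCH_HASH_MASK (CF_COUNT_MASK | CF_PARALLEL)

static_assert((CF_COUNT_MASK & CF_PARALLEL) == 0, "flag bits must not overlap");

static bool parallel_cpus;      /* global system state */

/* Snapshot the current system state as compile flags. */
static inline uint32_t curr_cflags(void)
{
    return parallel_cpus ? CF_PARALLEL : 0;
}

typedef struct TBKey {
    uint64_t pc;
    uint32_t cflags;
} TBKey;

/* Two keys name the same TB only if the hashed flag bits also agree. */
static bool tb_key_equal(const TBKey *a, const TBKey *b)
{
    return a->pc == b->pc &&
           (a->cflags & SKETCH_HASH_MASK) == (b->cflags & SKETCH_HASH_MASK);
}
```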

Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 9b990ee5a3cc6aa38f81266fb0c6ef37a36c45b9
      
https://github.com/qemu/qemu/commit/9b990ee5a3cc6aa38f81266fb0c6ef37a36c45b9
  Author: Richard Henderson <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M accel/tcg/cpu-exec.c
    M accel/tcg/translate-all.c
    M exec.c
    M include/qom/cpu.h
    M qom/cpu.c

  Log Message:
  -----------
  tcg: Add CPUState cflags_next_tb

We were generating code during tb_invalidate_phys_page_range,
check_watchpoint, cpu_io_recompile, and (seemingly) discarding
the TB, assuming that it would magically be picked up during
the next iteration through the cpu_exec loop.

Instead, record the desired cflags in CPUState, so that the next lookup
explicitly requests the proper TB and no magic is involved.
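
The hand-off can be sketched like this. The CPU type, the helper names
and the use of an all-ones sentinel for "no request pending" are
assumptions made for illustration; only the field name cflags_next_tb
comes from the commit subject.

```c
#include <assert.h>
#include <stdint.h>

#define CFLAGS_NONE ((uint32_t)-1)  /* assumed "nothing requested" marker */

typedef struct CPU {
    uint32_t cflags_next_tb;
} CPU;

/* Producer side: ask for specific flags on the next TB lookup. */
static void request_next_tb(CPU *cpu, uint32_t cflags)
{
    assert(cflags != CFLAGS_NONE);
    cpu->cflags_next_tb = cflags;
}

/* Consumer side: use the pending request once, else fall back. */
static uint32_t take_next_cflags(CPU *cpu, uint32_t dflt)
{
    uint32_t cf = cpu->cflags_next_tb;

    if (cf == CFLAGS_NONE) {
        return dflt;
    }
    cpu->cflags_next_tb = CFLAGS_NONE;
    return cf;
}
```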

Reviewed-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: cdfef1715c779eb528d633e8b76cbc8a10e71ac8
      
https://github.com/qemu/qemu/commit/cdfef1715c779eb528d633e8b76cbc8a10e71ac8
  Author: Richard Henderson <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M include/exec/exec-all.h

  Log Message:
  -----------
  tcg: Include CF_COUNT_MASK in CF_HASH_MASK

Reviewed-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: c5a49c63fa26e8825ad101dfe86339ae4c216539
      
https://github.com/qemu/qemu/commit/c5a49c63fa26e8825ad101dfe86339ae4c216539
  Author: Emilio G. Cota <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M accel/tcg/translator.c
    M include/exec/gen-icount.h
    M target/alpha/translate.c
    M target/arm/translate-a64.c
    M target/arm/translate.c
    M target/cris/translate.c
    M target/hppa/translate.c
    M target/i386/translate.c
    M target/lm32/translate.c
    M target/m68k/translate.c
    M target/microblaze/translate.c
    M target/mips/translate.c
    M target/moxie/translate.c
    M target/nios2/translate.c
    M target/openrisc/translate.c
    M target/ppc/translate.c
    M target/ppc/translate_init.c
    M target/s390x/translate.c
    M target/sh4/translate.c
    M target/sparc/translate.c
    M target/tilegx/translate.c
    M target/tricore/translate.c
    M target/unicore32/translate.c
    M target/xtensa/translate.c

  Log Message:
  -----------
  tcg: convert tb->cflags reads to tb_cflags(tb)

Convert all existing readers of tb->cflags to tb_cflags, so that we
use atomic_read and therefore avoid undefined behaviour in C11.

Note that the remaining setters/getters of the field are protected
by tb_lock, and therefore do not need conversion.

Luckily all readers access the field via 'tb->cflags' (so no foo.cflags,
bar->cflags in the code base), which makes the conversion easily
scriptable:

FILES=$(git grep 'tb->cflags' target include/exec/gen-icount.h \
         accel/tcg/translator.c | cut -f1 -d':' | sort | uniq)

perl -pi -e 's/([^.>])tb->cflags/$1tb_cflags(tb)/g' $FILES
perl -pi -e 's/([a-z->.]*)(->|\.)tb->cflags/tb_cflags($1$2tb)/g' $FILES

Then manually fixed the few errors that checkpatch reported.

Compile-tested for all targets.
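
For readers unfamiliar with the accessor pattern, a minimal C11 analogue
is sketched below. It uses <stdatomic.h> with a relaxed load as a
stand-in for QEMU's atomic_read; the TB type here is hypothetical and
reduced to the one field of interest.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef struct TB {
    _Atomic uint32_t cflags;
} TB;

/* All readers go through this accessor instead of touching the field,
 * so a concurrent writer does not make the read a data race. */
static inline uint32_t tb_cflags(TB *tb)
{
    assert(tb != NULL);
    return atomic_load_explicit(&tb->cflags, memory_order_relaxed);
}
```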

Suggested-by: Richard Henderson <address@hidden>
Reviewed-by: Richard Henderson <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 2399d4e7cec22ecf1c51062d2ebfd45220dbaace
      
https://github.com/qemu/qemu/commit/2399d4e7cec22ecf1c51062d2ebfd45220dbaace
  Author: Emilio G. Cota <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M target/arm/helper-a64.c
    M target/arm/helper-a64.h
    M target/arm/op_helper.c
    M target/arm/translate-a64.c
    M target/arm/translate.c

  Log Message:
  -----------
  target/arm: check CF_PARALLEL instead of parallel_cpus

Thereby decoupling the resulting translated code from the current state
of the system.

Reviewed-by: Richard Henderson <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: f9f46db444a2dfc2ebf1f9f7d4b42163ab33187d
      
https://github.com/qemu/qemu/commit/f9f46db444a2dfc2ebf1f9f7d4b42163ab33187d
  Author: Emilio G. Cota <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M target/hppa/helper.h
    M target/hppa/op_helper.c
    M target/hppa/translate.c

  Log Message:
  -----------
  target/hppa: check CF_PARALLEL instead of parallel_cpus

Thereby decoupling the resulting translated code from the current state
of the system.

Reviewed-by: Richard Henderson <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: b5e3b4c2aca8eb5a9cfeedfb273af623f17c3731
      
https://github.com/qemu/qemu/commit/b5e3b4c2aca8eb5a9cfeedfb273af623f17c3731
  Author: Emilio G. Cota <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M target/i386/translate.c

  Log Message:
  -----------
  target/i386: check CF_PARALLEL instead of parallel_cpus

Thereby decoupling the resulting translated code from the current state
of the system.

Reviewed-by: Richard Henderson <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: f0ddf11b23260f0af84fb529486a8f9ba2d19401
      
https://github.com/qemu/qemu/commit/f0ddf11b23260f0af84fb529486a8f9ba2d19401
  Author: Emilio G. Cota <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M target/m68k/helper.h
    M target/m68k/op_helper.c
    M target/m68k/translate.c

  Log Message:
  -----------
  target/m68k: check CF_PARALLEL instead of parallel_cpus

Thereby decoupling the resulting translated code from the current state
of the system.

Reviewed-by: Richard Henderson <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 6476615d385eb249105b25873ef30ba4b9c808dc
      
https://github.com/qemu/qemu/commit/6476615d385eb249105b25873ef30ba4b9c808dc
  Author: Emilio G. Cota <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M target/s390x/helper.h
    M target/s390x/mem_helper.c
    M target/s390x/translate.c

  Log Message:
  -----------
  target/s390x: check CF_PARALLEL instead of parallel_cpus

Thereby decoupling the resulting translated code from the current state
of the system.

Reviewed-by: Richard Henderson <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 671f9a85d85ea7523707f88dffa9428ed4a19f75
      
https://github.com/qemu/qemu/commit/671f9a85d85ea7523707f88dffa9428ed4a19f75
  Author: Emilio G. Cota <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M target/sh4/translate.c

  Log Message:
  -----------
  target/sh4: check CF_PARALLEL instead of parallel_cpus

Thereby decoupling the resulting translated code from the current state
of the system.

Reviewed-by: Richard Henderson <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 87d757d60d66d5ee1608460b0f1e07e2b758db9c
      
https://github.com/qemu/qemu/commit/87d757d60d66d5ee1608460b0f1e07e2b758db9c
  Author: Emilio G. Cota <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M target/sparc/translate.c

  Log Message:
  -----------
  target/sparc: check CF_PARALLEL instead of parallel_cpus

Thereby decoupling the resulting translated code from the current state
of the system.

Reviewed-by: Richard Henderson <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: e82d5a2460b0e176128027651ff9b104e4bdf5cc
      
https://github.com/qemu/qemu/commit/e82d5a2460b0e176128027651ff9b104e4bdf5cc
  Author: Emilio G. Cota <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M accel/tcg/translate-all.c
    M tcg/tcg-op.c
    M tcg/tcg.h

  Log Message:
  -----------
  tcg: check CF_PARALLEL instead of parallel_cpus

Thereby decoupling the resulting translated code from the current state
of the system.

The tb->cflags field is not passed to tcg generation functions. So
we add a field to TCGContext, storing there a copy of tb->cflags.

Most architectures have <= 32 registers, which results in a 4-byte hole
in TCGContext. Use this hole for the new field.

Reviewed-by: Richard Henderson <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: ac03ee5331612e44beb393df2b578c951d27dc0d
      
https://github.com/qemu/qemu/commit/ac03ee5331612e44beb393df2b578c951d27dc0d
  Author: Emilio G. Cota <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M accel/tcg/cpu-exec.c

  Log Message:
  -----------
  cpu-exec: lookup/generate TB outside exclusive region during step_atomic

Now that all code generation has been converted to check CF_PARALLEL, we can
generate !CF_PARALLEL code without having yet set !parallel_cpus --
and therefore without having to be in the exclusive region during
cpu_exec_step_atomic.

While at it, merge cpu_exec_step into cpu_exec_step_atomic.

Reviewed-by: Richard Henderson <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 0cf8a44c2f56ba884c2f6db47d27fbb24975daa3
      
https://github.com/qemu/qemu/commit/0cf8a44c2f56ba884c2f6db47d27fbb24975daa3
  Author: Richard Henderson <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M include/exec/exec-all.h

  Log Message:
  -----------
  tcg: Add CF_LAST_IO + CF_USE_ICOUNT to CF_HASH_MASK

These flags are used by target/*/translate.c,
and affect code generation.

Reviewed-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 416986d3f97329655e30da7271a2d11c6d707b06
      
https://github.com/qemu/qemu/commit/416986d3f97329655e30da7271a2d11c6d707b06
  Author: Richard Henderson <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M accel/tcg/cpu-exec.c
    M accel/tcg/translate-all.c
    M include/exec/exec-all.h

  Log Message:
  -----------
  tcg: Remove CF_IGNORE_ICOUNT

Now that we have curr_cflags, we can include CF_USE_ICOUNT
early and then remove it as necessary.

Reviewed-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 2ac01d6dafabd4a726254eea98824c798d416ee4
      
https://github.com/qemu/qemu/commit/2ac01d6dafabd4a726254eea98824c798d416ee4
  Author: Emilio G. Cota <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M accel/tcg/translate-all.c
    M include/exec/exec-all.h
    M include/exec/tb-context.h

  Log Message:
  -----------
  translate-all: use a binary search tree to track TBs in TBContext

This is a prerequisite for supporting multiple TCG contexts, since
we will have threads generating code in separate regions of
code_gen_buffer.

For this we need a new field (.size) in struct tb_tc to keep
track of the size of the translated code. This field uses a size_t
to avoid adding a hole to the struct, although really an unsigned
int would have been enough.

The comparison function we use is optimized for the common case:
insertions. Profiling shows that upon booting debian-arm, 98%
of comparisons are between existing tb's (i.e. a->size and b->size
are both !0), which happens during insertions (and removals, but
those are rare). The remaining cases are lookups. From reading the glib
sources we see that the first key is always the lookup key. The code
does not assume this to always be the case, because this behaviour is
not guaranteed in the glib docs, but it does embed this knowledge as a
branch hint for the compiler.
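
The comparison described above can be sketched without glib as follows.
Region, region_cmp and addr_cmp_region are hypothetical names, and
addresses are held as integers to keep the example portable; a key with
size 0 plays the role of the lookup key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__)
#define LIKELY(x) __builtin_expect(!!(x), 1)
#else
#define LIKELY(x) (x)
#endif

typedef struct Region {
    uintptr_t ptr;      /* start of translated code */
    size_t size;        /* 0 marks a lookup key */
} Region;

/* Where does addr fall relative to [r->ptr, r->ptr + r->size)? */
static int addr_cmp_region(uintptr_t addr, const Region *r)
{
    if (addr >= r->ptr + r->size) {
        return 1;
    }
    if (addr < r->ptr) {
        return -1;
    }
    return 0;
}

static int region_cmp(const Region *a, const Region *b)
{
    /* Common case: both are existing entries (insertion or removal). */
    if (LIKELY(a->size && b->size)) {
        if (a->ptr > b->ptr) {
            return 1;
        }
        if (a->ptr < b->ptr) {
            return -1;
        }
        assert(a->size == b->size);
        return 0;
    }
    /* Lookup: the key is expected first, but either order is handled. */
    if (LIKELY(a->size == 0)) {
        return addr_cmp_region(a->ptr, b);
    }
    return -addr_cmp_region(b->ptr, a);
}
```

A lookup key compares equal to any entry whose code range contains the
address, which is what lets a search by host PC find the enclosing TB.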

Note that tb_free does not free space in the code_gen_buffer anymore,
since we cannot easily know whether the tb is the last one inserted
in code_gen_buffer. The next patch in this series renames tb_free
to tb_remove to reflect this.

Performance-wise, lookups in tb_find_pc are the same as before:
O(log n). However, insertions are O(log n) instead of O(1), which
results in a small slowdown when booting debian-arm:

Performance counter stats for 'build/arm-softmmu/qemu-system-arm \
        -machine type=virt -nographic -smp 1 -m 4096 \
        -netdev user,id=unet,hostfwd=tcp::2222-:22 \
        -device virtio-net-device,netdev=unet \
        -drive file=img/arm/jessie-arm32.qcow2,id=myblock,index=0,if=none \
        -device virtio-blk-device,drive=myblock \
        -kernel img/arm/aarch32-current-linux-kernel-only.img \
        -append console=ttyAMA0 root=/dev/vda1 \
        -name arm,debug-threads=on -smp 1' (10 runs):

- Before:
       8048.598422      task-clock (msec)         #    0.931 CPUs utilized            ( +-  0.28% )
            16,974      context-switches          #    0.002 M/sec                    ( +-  0.12% )
                 0      cpu-migrations            #    0.000 K/sec
            10,125      page-faults               #    0.001 M/sec                    ( +-  1.23% )
    35,144,901,879      cycles                    #    4.367 GHz                      ( +-  0.14% )
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    65,758,252,643      instructions              #    1.87  insns per cycle          ( +-  0.33% )
    10,871,298,668      branches                  # 1350.707 M/sec                    ( +-  0.41% )
       192,322,212      branch-misses             #    1.77% of all branches          ( +-  0.32% )

       8.640869419 seconds time elapsed                                               ( +-  0.57% )

- After:
       8146.242027      task-clock (msec)         #    0.923 CPUs utilized            ( +-  1.23% )
            17,016      context-switches          #    0.002 M/sec                    ( +-  0.40% )
                 0      cpu-migrations            #    0.000 K/sec
            18,769      page-faults               #    0.002 M/sec                    ( +-  0.45% )
    35,660,956,120      cycles                    #    4.378 GHz                      ( +-  1.22% )
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    65,095,366,607      instructions              #    1.83  insns per cycle          ( +-  1.73% )
    10,803,480,261      branches                  # 1326.192 M/sec                    ( +-  1.95% )
       195,601,289      branch-misses             #    1.81% of all branches          ( +-  0.39% )

       8.828660235 seconds time elapsed                                               ( +-  0.38% )

Reviewed-by: Richard Henderson <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: be1e01171b556807198c84feac7cf4bca0d904c2
      
https://github.com/qemu/qemu/commit/be1e01171b556807198c84feac7cf4bca0d904c2
  Author: Emilio G. Cota <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M accel/tcg/cpu-exec.c
    M accel/tcg/translate-all.c
    M include/exec/exec-all.h

  Log Message:
  -----------
  exec-all: rename tb_free to tb_remove

We don't really free anything in this function anymore; we just remove
the TB from the binary search tree.

Suggested-by: Alex Bennée <address@hidden>
Reviewed-by: Richard Henderson <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: f19c6cc6fc356dab7a766b471ec5eb3058f0afc1
      
https://github.com/qemu/qemu/commit/f19c6cc6fc356dab7a766b471ec5eb3058f0afc1
  Author: Emilio G. Cota <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M accel/tcg/translate-all.c

  Log Message:
  -----------
  translate-all: report correct avg host TB size

Since commit 6e3b2bfd6 ("tcg: allocate TB structs before the
corresponding translated code") we are not fully utilizing
code_gen_buffer for translated code, and therefore are
incorrectly reporting the amount of translated code as well as
the average host TB size. Address this by:

- Making the conscious choice of misreporting the total translated code;
  doing otherwise would mislead users into thinking "-tb-size" is not
  honoured.

- Expanding tb_tree_stats to accurately count the bytes of translated code on
  the host, and using this for reporting the average tb host size,
  as well as the expansion ratio.

In the future we might want to consider reporting the accurate numbers for
the total translated code, together with a "bookkeeping/overhead" field to
account for the TB structs.

Reviewed-by: Richard Henderson <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 44ded3d04821bec57407cc26a8b4db620da2be04
      
https://github.com/qemu/qemu/commit/44ded3d04821bec57407cc26a8b4db620da2be04
  Author: Emilio G. Cota <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M accel/tcg/cpu-exec.c
    M accel/tcg/translate-all.c
    M include/exec/tb-context.h
    M linux-user/main.c
    M tcg/tcg.h

  Log Message:
  -----------
  tcg: take tb_ctx out of TCGContext

Groundwork for supporting multiple TCG contexts.

Reviewed-by: Richard Henderson <address@hidden>
Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: b1311c4acf503dc9c1a310cc40b64f05b08833dc
      
https://github.com/qemu/qemu/commit/b1311c4acf503dc9c1a310cc40b64f05b08833dc
  Author: Emilio G. Cota <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M accel/tcg/tcg-runtime.c
    M accel/tcg/translate-all.c
    M bsd-user/main.c
    M include/exec/gen-icount.h
    M linux-user/main.c
    M target/alpha/translate.c
    M target/arm/translate.c
    M target/cris/translate.c
    M target/cris/translate_v10.c
    M target/hppa/translate.c
    M target/i386/translate.c
    M target/lm32/translate.c
    M target/m68k/translate.c
    M target/microblaze/translate.c
    M target/mips/translate.c
    M target/moxie/translate.c
    M target/nios2/translate.c
    M target/openrisc/translate.c
    M target/ppc/translate.c
    M target/s390x/translate.c
    M target/sh4/translate.c
    M target/sparc/translate.c
    M target/tilegx/translate.c
    M target/tricore/translate.c
    M target/unicore32/translate.c
    M target/xtensa/translate.c
    M tcg/tcg-op.c
    M tcg/tcg.c
    M tcg/tcg.h

  Log Message:
  -----------
  tcg: define tcg_init_ctx and make tcg_ctx a pointer

Groundwork for supporting multiple TCG contexts.

The core of this patch is this change to tcg/tcg.h:

> -extern TCGContext tcg_ctx;
> +extern TCGContext tcg_init_ctx;
> +extern TCGContext *tcg_ctx;

Note that for now we set tcg_ctx to point to whatever TCGContext is passed
to tcg_context_init -- in this case &tcg_init_ctx.
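
A minimal sketch of what this amounts to, assuming a drastically simplified
TCGContext (the real struct in tcg/tcg.h has many more fields, and the real
tcg_context_init does far more than shown here):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the real TCGContext. */
typedef struct TCGContext {
    int nb_temps;
} TCGContext;

TCGContext tcg_init_ctx;   /* the context that is set up at init time */
TCGContext *tcg_ctx;       /* what the rest of the code dereferences */

/* Sketch of the init step: remember which context was passed in. */
void tcg_context_init(TCGContext *s)
{
    assert(s != NULL);
    s->nb_temps = 0;
    tcg_ctx = s;
}
```

Callers that used to write tcg_ctx.field now write tcg_ctx->field, which is
why the patch touches every target's translate.c.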

Reviewed-by: Richard Henderson <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 26689780f838f6be13c3878b973ad4a83c0e8071
      
https://github.com/qemu/qemu/commit/26689780f838f6be13c3878b973ad4a83c0e8071
  Author: Emilio G. Cota <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M include/exec/gen-icount.h
    M tcg/tcg.h

  Log Message:
  -----------
  gen-icount: fold exitreq_label into TCGContext

Groundwork for supporting multiple TCG contexts.

Reviewed-by: Richard Henderson <address@hidden>
Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: df2cce2968069526553d82331ce9817eaca6b03a
      
https://github.com/qemu/qemu/commit/df2cce2968069526553d82331ce9817eaca6b03a
  Author: Emilio G. Cota <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M tcg/tcg.c

  Log Message:
  -----------
  tcg: introduce **tcg_ctxs to keep track of all TCGContext's

Groundwork for supporting multiple TCG contexts.

Note that having n_tcg_ctxs is unnecessary. However, it is
convenient to have it, since it will simplify iterating over the
array: we'll have just a for loop instead of having to iterate
over a NULL-terminated array (which would require n+1 elems)
or having to check with ifdef's for usermode/softmmu.
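
For illustration, this is the kind of loop that an explicit count makes
possible. The sketch assumes a toy TCGContext with a single made-up field
(code_bytes); the helper name is hypothetical as well.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in; the code_bytes field is invented for the example. */
typedef struct TCGContext {
    size_t code_bytes;
} TCGContext;

/* Sum a per-context value over a counted array of context pointers. */
size_t total_code_bytes(TCGContext *const *ctxs, unsigned int n_ctxs)
{
    size_t total = 0;

    assert(ctxs != NULL || n_ctxs == 0);
    for (unsigned int i = 0; i < n_ctxs; i++) {
        total += ctxs[i]->code_bytes;
    }
    return total;
}
```

No sentinel element is needed, and the same loop works whether there is one
context or many.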

Reviewed-by: Richard Henderson <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: c3fac1138e13f8074168ee32a46afd6f3ff49059
      
https://github.com/qemu/qemu/commit/c3fac1138e13f8074168ee32a46afd6f3ff49059
  Author: Emilio G. Cota <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M accel/tcg/translate-all.c
    M tcg/tcg.c
    M tcg/tcg.h

  Log Message:
  -----------
  tcg: distribute profiling counters across TCGContext's

This is groundwork for supporting multiple TCG contexts.

To avoid scalability issues when profiling info is enabled, this patch
makes the profiling info counters distributed via the following changes:

1) Consolidate profile info into its own struct, TCGProfile, which
   TCGContext also includes. Note that tcg_table_op_count is brought
   into TCGProfile after dropping the tcg_ prefix.
2) Iterate over the TCG contexts in the system to obtain the total counts.

This change also requires updating the accessors to TCGProfile fields to
use atomic_read/set whenever there may be conflicting accesses (as defined
in C11) to them.

Reviewed-by: Richard Henderson <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 34184b071817b4f9edbfd1aa2225c196f05a0947
      
https://github.com/qemu/qemu/commit/34184b071817b4f9edbfd1aa2225c196f05a0947
  Author: Emilio G. Cota <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M tcg/optimize.c

  Log Message:
  -----------
  tcg: allocate optimizer temps with tcg_malloc

Groundwork for supporting multiple TCG contexts.

While at it, also allocate temps_used directly as a bitmap of the
required size, instead of using a bitmap of TCG_MAX_TEMPS via
TCGTempSet.
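
A sketch of sizing the bitmap by the number of temps actually in use. The
helper names are hypothetical, and calloc stands in for tcg_malloc only so
that the example is self-contained; tcg_malloc is a pool allocator, so the
real code has no matching free.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define BITS_PER_LONG    (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Allocate a zeroed bitmap with room for nb_temps bits, rounded up to
 * whole longs. calloc is a stand-in for tcg_malloc in this sketch. */
unsigned long *temps_used_new(size_t nb_temps)
{
    return calloc(BITS_TO_LONGS(nb_temps), sizeof(unsigned long));
}

/* Mark temp idx as used. */
void temps_used_set(unsigned long *map, size_t idx)
{
    assert(map != NULL);
    map[idx / BITS_PER_LONG] |= 1UL << (idx % BITS_PER_LONG);
}

/* Return 1 if temp idx is marked as used, 0 otherwise. */
int temps_used_test(const unsigned long *map, size_t idx)
{
    assert(map != NULL);
    return (int)((map[idx / BITS_PER_LONG] >> (idx % BITS_PER_LONG)) & 1UL);
}
```

With only a few dozen temps in a typical TB, this clears a handful of longs
per tcg_optimize call instead of a bitmap sized for TCG_MAX_TEMPS.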

Performance-wise we lose about 1.12% in a translation-heavy workload
such as booting+shutting down debian-arm:

Performance counter stats for 'taskset -c 0 arm-softmmu/qemu-system-arm \
        -machine type=virt -nographic -smp 1 -m 4096 \
        -netdev user,id=unet,hostfwd=tcp::2222-:22 \
        -device virtio-net-device,netdev=unet \
        -drive file=die-on-boot.qcow2,id=myblock,index=0,if=none \
        -device virtio-blk-device,drive=myblock \
        -kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \
        -name arm,debug-threads=on -smp 1' (10 runs):
 variant      exec time (s)   slowdown wrt original (%)
---------------------------------------------------------------
 original     20.213321616    0
 tcg_malloc   20.441130078    1.1270214
 TCGContext   20.477846517    1.3086662
 g_malloc     20.780527895    2.8061013

The other two alternatives shown in the table are:
- TCGContext: embed temps[TCG_MAX_TEMPS] and TCGTempSet temps_used
  in TCGContext. This is simple enough but it isn't faster than using
  tcg_malloc; moreover, it wastes memory.
- g_malloc: allocate/deallocate both temps and temps_used every time
  tcg_optimize is executed.

Suggested-by: Richard Henderson <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 5fa64b3130af9a45e7e2a904bde1f8cfb72be5c9
      
https://github.com/qemu/qemu/commit/5fa64b3130af9a45e7e2a904bde1f8cfb72be5c9
  Author: Emilio G. Cota <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M include/qemu/osdep.h
    M util/osdep.c

  Log Message:
  -----------
  osdep: introduce qemu_mprotect_rwx/none

Reviewed-by: Richard Henderson <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: f51f315a676ec913a55ac27be4ef857f9f7ddc5c
      
https://github.com/qemu/qemu/commit/f51f315a676ec913a55ac27be4ef857f9f7ddc5c
  Author: Emilio G. Cota <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M accel/tcg/translate-all.c

  Log Message:
  -----------
  translate-all: use qemu_mprotect_rwx/none helpers

The helpers require the address and size to be page-aligned, so
do that before calling them.
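
A sketch of the alignment step, under the assumption that the range should
shrink inward so that no byte outside the original buffer is affected; whether
a given caller rounds inward or outward is not stated above. The function name
and its out-parameters are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shrink [addr, addr + size) to the largest page-aligned range it contains.
 * page_size must be a power of two. *len is 0 if no whole page fits. */
void page_align_inward(uintptr_t addr, size_t size, size_t page_size,
                       uintptr_t *start, size_t *len)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    uintptr_t mask = ~((uintptr_t)page_size - 1);
    uintptr_t lo = (addr + page_size - 1) & mask;   /* round start up */
    uintptr_t hi = (addr + size) & mask;            /* round end down */

    *start = lo;
    *len = hi > lo ? (size_t)(hi - lo) : 0;
}
```

The resulting start and len can then be handed to a helper that requires
page-aligned arguments.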

Reviewed-by: Richard Henderson <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: e8feb96fcc6c16eab8923332e86ff4ef0e2ac276
      
https://github.com/qemu/qemu/commit/e8feb96fcc6c16eab8923332e86ff4ef0e2ac276
  Author: Emilio G. Cota <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M accel/tcg/translate-all.c
    M bsd-user/main.c
    M cpus.c
    M linux-user/main.c
    M tcg/tcg.c
    M tcg/tcg.h

  Log Message:
  -----------
  tcg: introduce regions to split code_gen_buffer

This is groundwork for supporting multiple TCG contexts.

The naive solution here is to split code_gen_buffer statically
among the TCG threads; this however results in poor utilization
if translation needs are different across TCG threads.

What we do here is to add an extra layer of indirection, assigning
regions that act just like pages do in virtual memory allocation.
(BTW if you are wondering about the chosen naming, I did not want
to use blocks or pages because those are already heavily used in QEMU).

We use a global lock to serialize allocations as well as statistics
reporting (we now export the size of the used code_gen_buffer with
tcg_code_size()). Note that for the allocator we could just use
a counter and atomic_inc; however, that would complicate the gathering
of tcg_code_size()-like stats. So given that the region operations are
not a fast path, a lock seems the most reasonable choice.
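
To make the idea concrete, here is a much simplified sketch of a lock-protected
region allocator. It is not the code added to tcg/tcg.c: the struct, the
function names and the "bytes assigned" statistic are assumptions for the
example, and a plain pthread mutex is used.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct region_pool {
    pthread_mutex_t lock;
    char *start;          /* start of the shared code buffer */
    size_t region_size;   /* bytes per region */
    size_t n;             /* total number of regions */
    size_t current;       /* index of the next region to hand out */
};

/* Split a buffer of 'size' bytes into n equally sized regions. */
void region_pool_init(struct region_pool *p, char *buf, size_t size, size_t n)
{
    assert(p != NULL && n > 0 && size >= n);
    pthread_mutex_init(&p->lock, NULL);
    p->start = buf;
    p->region_size = size / n;
    p->n = n;
    p->current = 0;
}

/* Hand out the next free region. Returns false once all regions are taken,
 * which is the point at which the caller would have to flush. */
bool region_alloc(struct region_pool *p, char **begin, char **end)
{
    bool ok = false;

    pthread_mutex_lock(&p->lock);
    if (p->current < p->n) {
        *begin = p->start + p->current * p->region_size;
        *end = *begin + p->region_size;
        p->current++;
        ok = true;
    }
    pthread_mutex_unlock(&p->lock);
    return ok;
}

/* Statistics take the same lock, so they see a consistent 'current'. */
size_t region_bytes_assigned(struct region_pool *p)
{
    size_t bytes;

    pthread_mutex_lock(&p->lock);
    bytes = p->current * p->region_size;
    pthread_mutex_unlock(&p->lock);
    return bytes;
}
```

A thread that fills its region simply asks for another one, so a thread that
translates a lot ends up owning more of the buffer than an idle one.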

The effectiveness of this approach is clear after seeing some numbers.
I used the bootup+shutdown of debian-arm with '-tb-size 80' as a benchmark.
Note that I'm evaluating this after enabling per-thread TCG (which
is done by a subsequent commit).

* -smp 1, 1 region (entire buffer):
    qemu: flush code_size=83885014 nb_tbs=154739 avg_tb_size=357
    qemu: flush code_size=83884902 nb_tbs=153136 avg_tb_size=363
    qemu: flush code_size=83885014 nb_tbs=152777 avg_tb_size=364
    qemu: flush code_size=83884950 nb_tbs=150057 avg_tb_size=373
    qemu: flush code_size=83884998 nb_tbs=150234 avg_tb_size=373
    qemu: flush code_size=83885014 nb_tbs=154009 avg_tb_size=360
    qemu: flush code_size=83885014 nb_tbs=151007 avg_tb_size=370
    qemu: flush code_size=83885014 nb_tbs=151816 avg_tb_size=367

That is, 8 flushes.

* -smp 8, 32 regions (80/32 MB per region) [i.e. this patch]:

    qemu: flush code_size=76328008 nb_tbs=141040 avg_tb_size=356
    qemu: flush code_size=75366534 nb_tbs=138000 avg_tb_size=361
    qemu: flush code_size=76864546 nb_tbs=140653 avg_tb_size=361
    qemu: flush code_size=76309084 nb_tbs=135945 avg_tb_size=375
    qemu: flush code_size=74581856 nb_tbs=132909 avg_tb_size=375
    qemu: flush code_size=73927256 nb_tbs=135616 avg_tb_size=360
    qemu: flush code_size=78629426 nb_tbs=142896 avg_tb_size=365
    qemu: flush code_size=76667052 nb_tbs=138508 avg_tb_size=368

Again, 8 flushes. Note how buffer utilization is not 100%, but it
is close. Smaller region sizes would yield higher utilization,
but we want region allocation to be rare (it acquires a lock), so
we do not want to go too small.

* -smp 8, static partitioning of 8 regions (10 MB per region):
    qemu: flush code_size=21936504 nb_tbs=40570 avg_tb_size=354
    qemu: flush code_size=11472174 nb_tbs=20633 avg_tb_size=370
    qemu: flush code_size=11603976 nb_tbs=21059 avg_tb_size=365
    qemu: flush code_size=23254872 nb_tbs=41243 avg_tb_size=377
    qemu: flush code_size=28289496 nb_tbs=52057 avg_tb_size=358
    qemu: flush code_size=43605160 nb_tbs=78896 avg_tb_size=367
    qemu: flush code_size=45166552 nb_tbs=82158 avg_tb_size=364
    qemu: flush code_size=63289640 nb_tbs=116494 avg_tb_size=358
    qemu: flush code_size=51389960 nb_tbs=93937 avg_tb_size=362
    qemu: flush code_size=59665928 nb_tbs=107063 avg_tb_size=372
    qemu: flush code_size=38380824 nb_tbs=68597 avg_tb_size=374
    qemu: flush code_size=44884568 nb_tbs=79901 avg_tb_size=376
    qemu: flush code_size=50782632 nb_tbs=90681 avg_tb_size=374
    qemu: flush code_size=39848888 nb_tbs=71433 avg_tb_size=372
    qemu: flush code_size=64708840 nb_tbs=119052 avg_tb_size=359
    qemu: flush code_size=49830008 nb_tbs=90992 avg_tb_size=362
    qemu: flush code_size=68372408 nb_tbs=123442 avg_tb_size=368
    qemu: flush code_size=33555560 nb_tbs=59514 avg_tb_size=378
    qemu: flush code_size=44748344 nb_tbs=80974 avg_tb_size=367
    qemu: flush code_size=37104248 nb_tbs=67609 avg_tb_size=364

That is, 20 flushes. Note how a static partitioning approach uses
the code buffer poorly, leading to many unnecessary flushes.
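
The utilization behind each flush line can be worked out from code_size and
the buffer size. The helper below is a sketch with a made-up name, and it
assumes the '-tb-size 80' argument is in MiB, which matches the code_size
values of the single-region run.

```c
#include <assert.h>
#include <stddef.h>

/* Fraction of the code buffer in use at flush time.
 * Assumption: the -tb-size argument is given in MiB. */
double buffer_utilization(size_t code_size, size_t tb_size_mib)
{
    assert(tb_size_mib > 0);
    return (double)code_size / ((double)tb_size_mib * 1024.0 * 1024.0);
}
```

Under that assumption, the first flush of each run above corresponds to a
utilization of almost 1.00 with a single region, about 0.91 with 32 regions,
and about 0.26 with static partitioning.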

Reviewed-by: Richard Henderson <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 3468b59e18b179bc63c7ce934de912dfa9596122
      
https://github.com/qemu/qemu/commit/3468b59e18b179bc63c7ce934de912dfa9596122
  Author: Emilio G. Cota <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M accel/tcg/translate-all.c
    M cpus.c
    M linux-user/syscall.c
    M tcg/tcg.c
    M tcg/tcg.h

  Log Message:
  -----------
  tcg: enable multiple TCG contexts in softmmu

This enables parallel TCG code generation. However, we do not take
advantage of it yet since tb_lock is still held during tb_gen_code.

In user-mode we use a single TCG context; see the documentation
added to tcg_region_init for the rationale.

Note that targets do not need any conversion: targets initialize a
TCGContext (e.g. defining TCG globals), and after this initialization
has finished, the context is cloned by the vCPU threads, each of
them keeping a separate copy.

TCG threads claim one entry in tcg_ctxs[] by atomically increasing
n_tcg_ctxs. Do not be too annoyed by the subsequent atomic_read's
of that variable and tcg_ctxs; they are there just to play nice with
analysis tools such as thread sanitizer.
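
Claiming a slot this way can be sketched with C11 atomics as below. QEMU uses
its own atomic macros rather than <stdatomic.h>, and MAX_CTXS, ctx_register and
the fixed-size array are assumptions for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_CTXS 8   /* arbitrary bound for the sketch */

typedef struct TCGContext {
    int id;
} TCGContext;

TCGContext *_Atomic tcg_ctxs[MAX_CTXS];
atomic_uint n_tcg_ctxs;

/* Called once per thread: reserve a unique slot, then publish the
 * thread's context in it. Returns the index of the claimed slot. */
unsigned int ctx_register(TCGContext *s)
{
    /* atomic_fetch_add returns the previous value, so concurrent
     * callers each get a distinct index. */
    unsigned int n = atomic_fetch_add(&n_tcg_ctxs, 1);

    assert(n < MAX_CTXS);
    atomic_store(&tcg_ctxs[n], s);
    return n;
}
```

No lock is needed for registration, since the fetch-and-add alone decides who
owns which entry.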

Note that we do not allocate an array of contexts (we allocate
an array of pointers instead) because when tcg_context_init
is called, we do not know yet how many contexts we'll use since
the bool behind qemu_tcg_mttcg_enabled() isn't set yet.

Previous patches folded some TCG globals into TCGContext. The non-const
globals remaining are only set at init time, i.e. before the TCG
threads are spawned. Here is a list of these set-at-init-time globals
under tcg/:

Only written by tcg_context_init:
- indirect_reg_alloc_order
- tcg_op_defs
Only written by tcg_target_init (called from tcg_context_init):
- tcg_target_available_regs
- tcg_target_call_clobber_regs
- arm: arm_arch, use_idiv_instructions
- i386: have_cmov, have_bmi1, have_bmi2, have_lzcnt,
  have_movbe, have_popcnt
- mips: use_movnz_instructions, use_mips32_instructions,
  use_mips32r2_instructions, got_sigill (tcg_target_detect_isa)
- ppc: have_isa_2_06, have_isa_3_00, tb_ret_addr
- s390: tb_ret_addr, s390_facilities
- sparc: qemu_ld_trampoline, qemu_st_trampoline (build_trampolines),
   use_vis3_instructions

Only written by tcg_prologue_init:
- 'struct jit_code_entry one_entry'
- aarch64: tb_ret_addr
- arm: tb_ret_addr
- i386: tb_ret_addr, guest_base_flags
- ia64: tb_ret_addr
- mips: tb_ret_addr, bswap32_addr, bswap32u_addr, bswap64_addr

Reviewed-by: Richard Henderson <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 1c2adb958fc07e5b3e81ed21b801c04a15f41f4f
      
https://github.com/qemu/qemu/commit/1c2adb958fc07e5b3e81ed21b801c04a15f41f4f
  Author: Richard Henderson <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M include/exec/gen-icount.h
    M target/alpha/translate.c
    M target/arm/translate.c
    M target/arm/translate.h
    M target/cris/translate.c
    M target/cris/translate_v10.c
    M target/hppa/translate.c
    M target/i386/translate.c
    M target/lm32/translate.c
    M target/m68k/translate.c
    M target/microblaze/translate.c
    M target/mips/translate.c
    M target/moxie/translate.c
    M target/nios2/translate.c
    M target/openrisc/translate.c
    M target/ppc/translate.c
    M target/s390x/translate.c
    M target/sh4/translate.c
    M target/sparc/translate.c
    M target/tilegx/translate.c
    M target/tricore/translate.c
    M target/unicore32/translate.c
    M target/xtensa/translate.c
    M tcg/tcg-op.c
    M tcg/tcg.c
    M tcg/tcg.h

  Log Message:
  -----------
  tcg: Initialize cpu_env generically

This is identical for each target.  So, move the initialization to
common code.  Move the variable itself out of tcg_ctx and name it
cpu_env to minimize changes within targets.

This also means we can remove tcg_global_reg_new_{ptr,i32,i64},
since there are no longer global-register temps created by targets.

Reviewed-by: Emilio G. Cota <address@hidden>
Reviewed-by: Philippe Mathieu-Daudé <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: cc689485ee3e9dca05765326ee8fd619a6ec48f0
      
https://github.com/qemu/qemu/commit/cc689485ee3e9dca05765326ee8fd619a6ec48f0
  Author: Emilio G. Cota <address@hidden>
  Date:   2017-10-24 (Tue, 24 Oct 2017)

  Changed paths:
    M accel/tcg/translate-all.c

  Log Message:
  -----------
  translate-all: exit from tb_phys_invalidate if qht_remove fails

Two or more threads might race while invalidating the same TB. We currently
do not check for this at all despite taking tb_lock, which means we would
wrongly invalidate the same TB more than once. This bug has actually been
hit by users: I recently saw a report on IRC, although I have yet to see
the corresponding test case.

Fix this by using qht_remove as the synchronization point; if it fails,
that means the TB has already been invalidated, and therefore there
is nothing left to do in tb_phys_invalidate.
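
The pattern can be sketched as follows. This is not the QEMU code: table_remove
is a stand-in for qht_remove that models hash-table membership with a single
atomic flag, and the struct and counter exist only to make the effect visible.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct tb_sketch {
    atomic_bool in_table;   /* stands in for membership in the hash table */
    int teardown_count;     /* how many times the cleanup below has run */
};

/* Stand-in for qht_remove: returns true only for the one caller that
 * actually removed the entry. */
bool table_remove(struct tb_sketch *tb)
{
    bool expected = true;

    return atomic_compare_exchange_strong(&tb->in_table, &expected, false);
}

void tb_invalidate_sketch(struct tb_sketch *tb)
{
    assert(tb != NULL);
    if (!table_remove(tb)) {
        return;             /* already invalidated by someone else */
    }
    /* The rest of the invalidation goes here; it runs exactly once. */
    tb->teardown_count++;
}
```

Because the removal itself picks the winner, the check does not depend on
which lock, if any, the callers hold.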

Note that this solution works now that we still have tb_lock, and will
continue working once we remove tb_lock.

Reviewed-by: Richard Henderson <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Message-Id: <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: ae49fbbcd8e4e9d8bf7131add34773f579e1aff7
      
https://github.com/qemu/qemu/commit/ae49fbbcd8e4e9d8bf7131add34773f579e1aff7
  Author: Peter Maydell <address@hidden>
  Date:   2017-10-25 (Wed, 25 Oct 2017)

  Changed paths:
    M accel/tcg/cpu-exec.c
    M accel/tcg/tcg-runtime.c
    M accel/tcg/translate-all.c
    M accel/tcg/translator.c
    M bsd-user/main.c
    M cpus.c
    M exec.c
    M include/exec/exec-all.h
    M include/exec/gen-icount.h
    M include/exec/helper-gen.h
    M include/exec/helper-head.h
    M include/exec/tb-context.h
    M include/exec/tb-hash-xx.h
    M include/exec/tb-hash.h
    M include/exec/tb-lookup.h
    M include/qemu/osdep.h
    M include/qom/cpu.h
    M linux-user/main.c
    M linux-user/syscall.c
    M qom/cpu.c
    M target/alpha/cpu.c
    M target/alpha/translate.c
    M target/arm/cpu.c
    M target/arm/helper-a64.c
    M target/arm/helper-a64.h
    M target/arm/op_helper.c
    M target/arm/translate-a64.c
    M target/arm/translate.c
    M target/arm/translate.h
    M target/cris/cpu.c
    M target/cris/translate.c
    M target/cris/translate_v10.c
    M target/hppa/cpu.c
    M target/hppa/helper.h
    M target/hppa/op_helper.c
    M target/hppa/translate.c
    M target/i386/cpu.c
    M target/i386/translate.c
    M target/lm32/cpu.c
    M target/lm32/translate.c
    M target/m68k/cpu.c
    M target/m68k/helper.h
    M target/m68k/op_helper.c
    M target/m68k/translate.c
    M target/microblaze/cpu.c
    M target/microblaze/translate.c
    M target/mips/cpu.c
    M target/mips/translate.c
    M target/moxie/cpu.c
    M target/moxie/translate.c
    M target/nios2/cpu.c
    M target/nios2/translate.c
    M target/openrisc/cpu.c
    M target/openrisc/translate.c
    M target/ppc/translate.c
    M target/ppc/translate_init.c
    M target/s390x/cpu.c
    M target/s390x/helper.h
    M target/s390x/mem_helper.c
    M target/s390x/translate.c
    M target/sh4/cpu.c
    M target/sh4/translate.c
    M target/sparc/cpu.c
    M target/sparc/cpu.h
    M target/sparc/translate.c
    M target/tilegx/cpu.c
    M target/tilegx/translate.c
    M target/tricore/cpu.c
    M target/tricore/translate.c
    M target/unicore32/cpu.c
    M target/unicore32/translate.c
    M target/xtensa/cpu.c
    M target/xtensa/translate.c
    M tcg/optimize.c
    M tcg/tcg-op.c
    M tcg/tcg-op.h
    M tcg/tcg.c
    M tcg/tcg.h
    M tests/qht-bench.c
    M util/osdep.c

  Log Message:
  -----------
  Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20171025' into staging

TCG patch queue

# gpg: Signature made Wed 25 Oct 2017 10:30:18 BST
# gpg:                using RSA key 0x64DF38E8AF7E215F
# gpg: Good signature from "Richard Henderson <address@hidden>"
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A  05C0 64DF 38E8 AF7E 215F

* remotes/rth/tags/pull-tcg-20171025: (51 commits)
  translate-all: exit from tb_phys_invalidate if qht_remove fails
  tcg: Initialize cpu_env generically
  tcg: enable multiple TCG contexts in softmmu
  tcg: introduce regions to split code_gen_buffer
  translate-all: use qemu_mprotect_rwx/none helpers
  osdep: introduce qemu_mprotect_rwx/none
  tcg: allocate optimizer temps with tcg_malloc
  tcg: distribute profiling counters across TCGContext's
  tcg: introduce **tcg_ctxs to keep track of all TCGContext's
  gen-icount: fold exitreq_label into TCGContext
  tcg: define tcg_init_ctx and make tcg_ctx a pointer
  tcg: take tb_ctx out of TCGContext
  translate-all: report correct avg host TB size
  exec-all: rename tb_free to tb_remove
  translate-all: use a binary search tree to track TBs in TBContext
  tcg: Remove CF_IGNORE_ICOUNT
  tcg: Add CF_LAST_IO + CF_USE_ICOUNT to CF_HASH_MASK
  cpu-exec: lookup/generate TB outside exclusive region during step_atomic
  tcg: check CF_PARALLEL instead of parallel_cpus
  target/sparc: check CF_PARALLEL instead of parallel_cpus
  ...

Signed-off-by: Peter Maydell <address@hidden>


Compare: https://github.com/qemu/qemu/compare/4e1b31dba8f6...ae49fbbcd8e4
