[Qemu-commits] [qemu/qemu] d1a9f2: atomics: Add parameters to macros


From: GitHub
Subject: [Qemu-commits] [qemu/qemu] d1a9f2: atomics: Add parameters to macros
Date: Thu, 27 Oct 2016 07:30:09 -0700

  Branch: refs/heads/master
  Home:   https://github.com/qemu/qemu
  Commit: d1a9f2d12fcfc942924956fbe321aedf4226ccb7
      
https://github.com/qemu/qemu/commit/d1a9f2d12fcfc942924956fbe321aedf4226ccb7
  Author: Richard Henderson <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M include/qemu/atomic.h

  Log Message:
  -----------
  atomics: Add parameters to macros

Making these function-like rather than object-like macros will
prevent later problems with complex macro expansion.
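
As a generic illustration of the difference (the names below are
hypothetical and are not taken from include/qemu/atomic.h): an object-like
alias is rewritten wherever the bare token appears, while a function-like
macro is only expanded when an argument list follows it.

```c
#include <assert.h>

static int twice_impl(int x) { return 2 * x; }

/* Object-like alias: the bare token is rewritten wherever it appears. */
#define twice_obj twice_impl

/* Function-like form: expands only when followed by an argument list. */
#define twice_fn(x) twice_impl(x)

struct counters {
    int twice_fn;   /* not followed by '(' so the name is left untouched */
    int twice_obj;  /* silently becomes a member named twice_impl */
};

static int demo(void)
{
    struct counters c = { 0, 0 };

    c.twice_fn = twice_fn(3);     /* member twice_fn = twice_impl(3) */
    c.twice_impl = twice_obj(4);  /* the second member really is twice_impl */
    assert(c.twice_fn == 6);
    return c.twice_fn + c.twice_impl;
}
```

The renamed struct member is the kind of surprise that only shows up once
the alias is used inside other macros.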

Reviewed-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 61696ddbdc74263ddb6869856772cfe355a5d3bd
      
https://github.com/qemu/qemu/commit/61696ddbdc74263ddb6869856772cfe355a5d3bd
  Author: Emilio G. Cota <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M include/qemu/atomic.h

  Log Message:
  -----------
  atomics: add atomic_xor

This paves the way for upcoming work.

Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>
Message-Id: <address@hidden>


  Commit: 83d0c719f837724d9e3963b078211b2242bdd2a5
      
https://github.com/qemu/qemu/commit/83d0c719f837724d9e3963b078211b2242bdd2a5
  Author: Emilio G. Cota <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M include/qemu/atomic.h

  Log Message:
  -----------
  atomics: add atomic_op_fetch variants

This paves the way for upcoming work.
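
For readers unfamiliar with the naming, a minimal sketch of the two
flavours using xor. It assumes the GCC/Clang __atomic builtins; the wrapper
names are illustrative and are not claimed to match the macros in
include/qemu/atomic.h.

```c
#include <assert.h>
#include <stdint.h>

/* fetch_op flavour: returns the value held BEFORE the update. */
static uint32_t sketch_fetch_xor(uint32_t *p, uint32_t v)
{
    return __atomic_fetch_xor(p, v, __ATOMIC_SEQ_CST);
}

/* op_fetch flavour: returns the value held AFTER the update. */
static uint32_t sketch_xor_fetch(uint32_t *p, uint32_t v)
{
    return __atomic_xor_fetch(p, v, __ATOMIC_SEQ_CST);
}
```

Both perform the same memory update; they differ only in which value the
caller gets back.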

Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>
Message-Id: <address@hidden>


  Commit: 84bca3927b36fb1d9a2ca85cbbdf9023d2b84678
      
https://github.com/qemu/qemu/commit/84bca3927b36fb1d9a2ca85cbbdf9023d2b84678
  Author: Richard Henderson <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M include/qemu/atomic.h

  Log Message:
  -----------
  atomics: Add __nocheck atomic operations

While the check against sizeof(void *) is appropriate for
normal usage within qemu, there are places in which we want
wider operations and have checked for their existence.
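
A rough sketch of the idea, with hypothetical macro names and assuming
GCC/Clang extensions (statement expressions, __atomic builtins): the
checked form rejects operands wider than a host pointer at compile time,
while the __nocheck form forwards straight to the builtin and leaves the
caller responsible for knowing the host supports that width.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical unchecked form: no size restriction at all. */
#define sketch_atomic_read__nocheck(ptr) \
    __atomic_load_n(ptr, __ATOMIC_RELAXED)

/* Hypothetical checked form: refuses operands wider than a pointer. */
#define sketch_atomic_read(ptr)                                  \
    ({                                                           \
        _Static_assert(sizeof(*(ptr)) <= sizeof(void *),         \
                       "operand wider than a host pointer");     \
        sketch_atomic_read__nocheck(ptr);                        \
    })
```

On a 32-bit host, sketch_atomic_read() on a uint64_t would fail to
compile, whereas the __nocheck form would still be accepted.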

Reviewed-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 258dfaaad05a5fbe32a142b794e1df3e16501d0e
      
https://github.com/qemu/qemu/commit/258dfaaad05a5fbe32a142b794e1df3e16501d0e
  Author: Richard Henderson <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M exec.c
    M include/qemu/int128.h

  Log Message:
  -----------
  exec: Avoid direct references to Int128 parts

Reviewed-by: Emilio G. Cota <address@hidden>
Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 0846beb36641e8f0c3ee55a5bb84d468b653c852
      
https://github.com/qemu/qemu/commit/0846beb36641e8f0c3ee55a5bb84d468b653c852
  Author: Richard Henderson <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M include/qemu/int128.h
    M tests/test-int128.c

  Log Message:
  -----------
  int128: Use __int128 if available

Reviewed-by: Emilio G. Cota <address@hidden>
Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 1edaeee0955fba7d834b7c8f4e372e7eae030745
      
https://github.com/qemu/qemu/commit/1edaeee0955fba7d834b7c8f4e372e7eae030745
  Author: Richard Henderson <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M include/qemu/int128.h

  Log Message:
  -----------
  int128: Add int128_make128

Allows Int128 to be used more generally, rather than having to
begin with 64-bit inputs and accumulate.
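
A minimal sketch of what such a constructor looks like for the
two-halves representation. The type and function names are hypothetical
stand-ins, and the argument order (low half first) is an assumption of
this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value stored as two 64-bit halves. */
typedef struct {
    uint64_t lo;
    int64_t hi;
} I128;

/* Starting from a single 64-bit input: the high half is zero. */
static I128 i128_make64(uint64_t a)
{
    return (I128){ .lo = a, .hi = 0 };
}

/* Building the full value directly from both halves. */
static I128 i128_make128(uint64_t lo, uint64_t hi)
{
    return (I128){ .lo = lo, .hi = (int64_t)hi };
}
```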

Reviewed-by: Emilio G. Cota <address@hidden>
Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: fdbc2b5722f6092e47181a947c90fd4bdcc1c121
      
https://github.com/qemu/qemu/commit/fdbc2b5722f6092e47181a947c90fd4bdcc1c121
  Author: Richard Henderson <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M cpu-exec-common.c
    M cpu-exec.c
    M cpus.c
    M include/exec/cpu-all.h
    M include/exec/exec-all.h
    M include/qemu-common.h
    M linux-user/main.c
    M tcg/tcg.h
    M translate-all.c

  Log Message:
  -----------
  tcg: Add EXCP_ATOMIC

When we cannot emulate an atomic operation within a parallel
context, this exception allows us to stop the world and try
again in a serial context.

Reviewed-by: Emilio G. Cota <address@hidden>
Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: b67cb68ba59fd36076e5961139cb3c953c69bed0
      
https://github.com/qemu/qemu/commit/b67cb68ba59fd36076e5961139cb3c953c69bed0
  Author: Alex Bennée <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M linux-user/syscall.c

  Log Message:
  -----------
  linux-user: enable parallel code generation on clone

The variable parallel_cpus controls the generation of thread aware
atomic code.  We only need to set it once we clone our first thread.
At this point any existing translations need to be thrown away.

Reviewed-by: Emilio G. Cota <address@hidden>
Signed-off-by: Alex Bennée <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: dea2198201b3e0151d75b42774c51cf2ffe2ca4b
      
https://github.com/qemu/qemu/commit/dea2198201b3e0151d75b42774c51cf2ffe2ca4b
  Author: Richard Henderson <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M cputlb.c
    M softmmu_template.h

  Log Message:
  -----------
  cputlb: Replace SHIFT with DATA_SIZE

Reviewed-by: Emilio G. Cota <address@hidden>
Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 3b08f0a92545ba06fbdeaae929a5172480300c33
      
https://github.com/qemu/qemu/commit/3b08f0a92545ba06fbdeaae929a5172480300c33
  Author: Richard Henderson <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M cputlb.c
    M softmmu_template.h

  Log Message:
  -----------
  cputlb: Move probe_write out of softmmu_template.h

Reviewed-by: Emilio G. Cota <address@hidden>
Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 40978428853e2f7b4597ab2a9ffeb187333802dc
      
https://github.com/qemu/qemu/commit/40978428853e2f7b4597ab2a9ffeb187333802dc
  Author: Richard Henderson <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M softmmu_template.h

  Log Message:
  -----------
  cputlb: Remove includes from softmmu_template.h

We already include exec/address-spaces.h and exec/memory.h in
cputlb.c; the include of qemu/timer.h appears to be a fossil.

Reviewed-by: Emilio G. Cota <address@hidden>
Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 82a45b96a203a7403427183f1afd3d295222ff7d
      
https://github.com/qemu/qemu/commit/82a45b96a203a7403427183f1afd3d295222ff7d
  Author: Richard Henderson <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M cputlb.c
    M softmmu_template.h

  Log Message:
  -----------
  cputlb: Move most of iotlb code out of line

Saves 2k code size off of a cold path.

Reviewed-by: Emilio G. Cota <address@hidden>
Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: c86c6e4c80fee4d9423bedb10ba9e9c4aa68f861
      
https://github.com/qemu/qemu/commit/c86c6e4c80fee4d9423bedb10ba9e9c4aa68f861
  Author: Richard Henderson <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M cputlb.c
    M softmmu_template.h

  Log Message:
  -----------
  cputlb: Tidy some macros

TGT_LE and TGT_BE are not size dependent and do not need to be
redefined.  The others are no longer used at all.

Reviewed-by: Emilio G. Cota <address@hidden>
Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: c482cb117cc418115ca9c6d21a7a2315414c0a40
      
https://github.com/qemu/qemu/commit/c482cb117cc418115ca9c6d21a7a2315414c0a40
  Author: Richard Henderson <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M Makefile.objs
    M Makefile.target
    A atomic_template.h
    M cputlb.c
    M tcg-runtime.c
    M tcg/tcg-op.c
    M tcg/tcg-op.h
    M tcg/tcg-runtime.h
    M tcg/tcg.h

  Log Message:
  -----------
  tcg: Add atomic helpers

Add all of cmpxchg, op_fetch, fetch_op, and xchg.
Handle both endian-ness, and sizes up to 8.
Handle expanding non-atomically, when emulating in serial.
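
To make the last point concrete, a simplified sketch of cmpxchg and xchg
with an atomic and a non-atomic path. It assumes the GCC/Clang __atomic
builtins; the runtime flag is a hypothetical stand-in, since in the
translator this choice is made when code is generated rather than on each
call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag: true when other threads may touch guest memory. */
static bool sketch_parallel;

/* Store newv if *p equals cmpv; always return the value that was there. */
static uint32_t sketch_cmpxchg(uint32_t *p, uint32_t cmpv, uint32_t newv)
{
    if (sketch_parallel) {
        /* On failure the builtin writes the observed value into cmpv;
         * on success cmpv already equals the old value. */
        __atomic_compare_exchange_n(p, &cmpv, newv, false,
                                    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
        return cmpv;
    }
    /* Serial context: a plain load/compare/store is sufficient. */
    uint32_t old = *p;
    if (old == cmpv) {
        *p = newv;
    }
    return old;
}

/* Store newv unconditionally and return the previous value. */
static uint32_t sketch_xchg(uint32_t *p, uint32_t newv)
{
    if (sketch_parallel) {
        return __atomic_exchange_n(p, newv, __ATOMIC_SEQ_CST);
    }
    uint32_t old = *p;
    *p = newv;
    return old;
}
```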

Reviewed-by: Emilio G. Cota <address@hidden>
Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 7ebee43ee3e2fcd7b5063058b7ef74bc43216733
      
https://github.com/qemu/qemu/commit/7ebee43ee3e2fcd7b5063058b7ef74bc43216733
  Author: Richard Henderson <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M atomic_template.h
    M configure
    M cputlb.c
    M include/qemu/int128.h
    M tcg-runtime.c
    M tcg/tcg.h

  Log Message:
  -----------
  tcg: Add atomic128 helpers

Force the use of cmpxchg16b on x86_64.

Wikipedia suggests that only very old AMD64 (circa 2004) did not have
this instruction.  Further, it's required by Windows 8 so no new cpus
will ever omit it.

If we truly care about these, then we could check this at startup time
and then avoid executing paths that use it.

Reviewed-by: Emilio G. Cota <address@hidden>
Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: df79b996a7b21c6ea7847f7927a2e1a294b86c72
      
https://github.com/qemu/qemu/commit/df79b996a7b21c6ea7847f7927a2e1a294b86c72
  Author: Richard Henderson <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M configure
    M cputlb.c
    M tcg-runtime.c
    M tcg/tcg-op.c
    M tcg/tcg-runtime.h
    M tcg/tcg.h

  Log Message:
  -----------
  tcg: Add CONFIG_ATOMIC64

Allow qemu to build on 32-bit hosts without 64-bit atomic ops.

Even if we only allow 32-bit hosts to multi-thread emulate 32-bit
guests, we still need some way to handle the 32-bit guest using a
64-bit atomic operation.  Do so by dropping back to single-step.

Reviewed-by: Emilio G. Cota <address@hidden>
Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 91682118aa330aff7e8ef0cc685c32d101f49940
      
https://github.com/qemu/qemu/commit/91682118aa330aff7e8ef0cc685c32d101f49940
  Author: Richard Henderson <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M tcg/tcg-op.c

  Log Message:
  -----------
  tcg: Emit barriers with parallel_cpus

Reviewed-by: Emilio G. Cota <address@hidden>
Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: ae03f8de45427042ecd10b0941a005f21ecc064c
      
https://github.com/qemu/qemu/commit/ae03f8de45427042ecd10b0941a005f21ecc064c
  Author: Emilio G. Cota <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M target-i386/helper.h
    M target-i386/mem_helper.c
    M target-i386/translate.c

  Log Message:
  -----------
  target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers

The diff here is uglier than necessary. All this does is to turn

FOO

into:

if (s->prefix & PREFIX_LOCK) {
  BAR
} else {
  FOO
}

where FOO is the original implementation of an unlocked cmpxchg.

[rth: Adjust unlocked cmpxchg to use movcond instead of branches.
Adjust helpers to use atomic helpers.]

Signed-off-by: Emilio G. Cota <address@hidden>
Message-Id: <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: a7cee522f3529c2fc85379237b391ea98823271e
      
https://github.com/qemu/qemu/commit/a7cee522f3529c2fc85379237b391ea98823271e
  Author: Emilio G. Cota <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M target-i386/translate.c

  Log Message:
  -----------
  target-i386: emulate LOCK'ed OP instructions using atomic helpers

[rth: Eliminate some unnecessary temporaries.]

Signed-off-by: Emilio G. Cota <address@hidden>
Message-Id: <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 60e573462fcdb83aa1a41e66a9f31dc8a4364399
      
https://github.com/qemu/qemu/commit/60e573462fcdb83aa1a41e66a9f31dc8a4364399
  Author: Emilio G. Cota <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M target-i386/translate.c

  Log Message:
  -----------
  target-i386: emulate LOCK'ed INC using atomic helper

[rth: Merge gen_inc_locked back into gen_inc to share cc update.]

Signed-off-by: Emilio G. Cota <address@hidden>
Message-Id: <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 2a5fe8ae145ef7a3ab480922116d27efcc97b85d
      
https://github.com/qemu/qemu/commit/2a5fe8ae145ef7a3ab480922116d27efcc97b85d
  Author: Emilio G. Cota <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M target-i386/translate.c

  Log Message:
  -----------
  target-i386: emulate LOCK'ed NOT using atomic helper

[rth: Avoid qemu_load that's redundant with the atomic op.]
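
One way to see why no separate load is needed: NOT is the same as xor
with all-ones, so a single atomic read-modify-write covers it. The sketch
below assumes the GCC/Clang __atomic builtins and is an illustration of
that identity, not a transcription of the translator code.

```c
#include <assert.h>
#include <stdint.h>

/* Atomically complement *p and return the new value. */
static uint32_t sketch_locked_not(uint32_t *p)
{
    return __atomic_xor_fetch(p, ~(uint32_t)0, __ATOMIC_SEQ_CST);
}
```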

Signed-off-by: Emilio G. Cota <address@hidden>
Message-Id: <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 8eb8c7385608b99bed6055a22d897ff727a6cb8e
      
https://github.com/qemu/qemu/commit/8eb8c7385608b99bed6055a22d897ff727a6cb8e
  Author: Emilio G. Cota <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M target-i386/translate.c

  Log Message:
  -----------
  target-i386: emulate LOCK'ed NEG using cmpxchg helper

[rth: Move redundant qemu_load out of cmpxchg loop.]
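
The shape of such a loop, sketched in plain C under the assumption of the
GCC/Clang __atomic builtins (the function name is hypothetical): the value
is loaded once, and a failed compare-and-swap already hands back the
current contents, so the loop body never reloads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Atomically negate *p and return the value it held beforehand. */
static uint32_t sketch_locked_neg(uint32_t *p)
{
    /* Single load, outside the loop. */
    uint32_t old = __atomic_load_n(p, __ATOMIC_RELAXED);

    /* On failure the builtin refreshes 'old' with the observed value,
     * so the next attempt negates the up-to-date contents. */
    while (!__atomic_compare_exchange_n(p, &old, -old, false,
                                        __ATOMIC_SEQ_CST,
                                        __ATOMIC_SEQ_CST)) {
        /* retry */
    }
    return old;
}
```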

Signed-off-by: Emilio G. Cota <address@hidden>
Message-Id: <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: f53b01817f95781d2bcc8a82e057d1416601e13b
      
https://github.com/qemu/qemu/commit/f53b01817f95781d2bcc8a82e057d1416601e13b
  Author: Emilio G. Cota <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M target-i386/translate.c

  Log Message:
  -----------
  target-i386: emulate LOCK'ed XADD using atomic helper

[rth: Move load of reg value to common location.]

Signed-off-by: Emilio G. Cota <address@hidden>
Message-Id: <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: cfe819d309d472f75fd129faf1d1064a2498326c
      
https://github.com/qemu/qemu/commit/cfe819d309d472f75fd129faf1d1064a2498326c
  Author: Emilio G. Cota <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M target-i386/translate.c

  Log Message:
  -----------
  target-i386: emulate LOCK'ed BTX ops using atomic helpers

[rth: Avoid redundant qemu_ld in locked case.  Fix previously unnoticed
incorrect zero-extension of address in register-offset case.]
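
A sketch of how the three read-modify-write bit-test operations map onto
atomic fetch-and-op primitives, assuming the GCC/Clang __atomic builtins.
The function names are hypothetical, and only the bit-within-one-word case
is covered; the address adjustment for register offsets is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Each helper returns the previous state of the bit (the x86 carry flag). */

/* BTS: set the bit. */
static bool sketch_locked_bts(uint32_t *p, unsigned bit)
{
    uint32_t mask = UINT32_C(1) << (bit & 31);
    return (__atomic_fetch_or(p, mask, __ATOMIC_SEQ_CST) & mask) != 0;
}

/* BTR: clear the bit. */
static bool sketch_locked_btr(uint32_t *p, unsigned bit)
{
    uint32_t mask = UINT32_C(1) << (bit & 31);
    return (__atomic_fetch_and(p, ~mask, __ATOMIC_SEQ_CST) & mask) != 0;
}

/* BTC: complement the bit. */
static bool sketch_locked_btc(uint32_t *p, unsigned bit)
{
    uint32_t mask = UINT32_C(1) << (bit & 31);
    return (__atomic_fetch_xor(p, mask, __ATOMIC_SEQ_CST) & mask) != 0;
}
```

Because the fetch-and-op already returns the old word, the tested bit
comes for free and no separate load is required.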

Signed-off-by: Emilio G. Cota <address@hidden>
Message-Id: <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: ea97ebe89f7a879ea9aba90140e40c29b5cbd653
      
https://github.com/qemu/qemu/commit/ea97ebe89f7a879ea9aba90140e40c29b5cbd653
  Author: Emilio G. Cota <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M target-i386/translate.c

  Log Message:
  -----------
  target-i386: emulate XCHG using atomic helper

Signed-off-by: Emilio G. Cota <address@hidden>
Message-Id: <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 37b995f6e7a1cb6fa378c5cd4217b9dd9e1fc98b
      
https://github.com/qemu/qemu/commit/37b995f6e7a1cb6fa378c5cd4217b9dd9e1fc98b
  Author: Emilio G. Cota <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M target-i386/helper.h
    M target-i386/mem_helper.c
    M target-i386/translate.c

  Log Message:
  -----------
  target-i386: remove helper_lock()

It's been superseded by the atomic helpers.

The use of the atomic helpers provides a significant performance and scalability
improvement. Below is the result of running the atomic_add-bench microbenchmark
with:
 $ x86_64-linux-user/qemu-x86_64 tests/atomic_add-bench -o 5000000 -r $r -n $n
where $n is the number of threads and $r is the allowed range for the
additions.

The scenarios measured are:
- atomic: implements x86's ADDL with the atomic_add helper (i.e. this patchset)
- cmpxchg: implements x86's ADDL with a TCG loop using the cmpxchg helper
- master: before this patchset

Results are sorted in ascending range, i.e. descending degree of contention.
The Y axis is throughput in Mops/s. Tests were run on an AMD machine with 64
Opteron 6376 cores.
           atomic_add-bench: 5000000 ops/thread, [0,1] range

  25 ++---------+----------+---------+----------+----------+----------+---++
     + atomic +-E--+       +         +          +          +          +    |
     |cmpxchg +-H--+                                                       |
  20 +Emaster +-N--+                                                      ++
     ||                                                                    |
     |++                                                                   |
     ||                                                                    |
  15 +++                                                                  ++
     |N|                                                                   |
     |+|                                                                   |
  10 ++|                                                                  ++
     |+|+                                                                  |
     | |    -+E+------        +++  ---+E+------+E+------+E+-----+E+------+E|
     |+E+E+- +++     +E+------+E+--                                        |
   5 ++|+                                                                 ++
     |+N+H+---                                 +++                         |
     ++++N+--+H++----+++   +  +++  --++H+------+H+------+H++----+H+---+--- |
   0 ++---------+-----H----+---H-----+----------+----------+----------+---H+
     0          10         20        30         40         50         60
                          Number of threads
           atomic_add-bench: 5000000 ops/thread, [0,2] range

  25 ++---------+----------+---------+----------+----------+----------+---++
     ++atomic +-E--+       +         +          +          +          +    |
     |cmpxchg +-H--+                                                       |
  20 ++master +-N--+                                                      ++
     |E|                                                                   |
     |++                                                                   |
     ||E                                                                   |
  15 ++|                                                                  ++
     |N||                                                                  |
     |+||                                   ---+E+------+E+-----+E+------+E|
  10 ++| |        ---+E+------+E+-----+E+---                    +++      +++
     ||H+E+--+E+--                                                         |
     |+++++                                                                |
     | ||                                                                  |
   5 ++|+H+--                                  +++                        ++
     |+N+    -                              ---+H+------+H+------          |
     +  +N+--+H++----+H+---+--+H+----++H+---    +          +    +H+---+--+H|
   0 ++---------+----------+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                          Number of threads
           atomic_add-bench: 5000000 ops/thread, [0,8] range

  40 ++---------+----------+---------+----------+----------+----------+---++
     ++atomic +-E--+       +         +          +          +          +    |
  35 +cmpxchg +-H--+                                                      ++
     | master +-N--+               ---+E+------+E+------+E+-----+E+------+E|
  30 ++|                   ---+E+--   +++                                 ++
     | |            -+E+---                                                |
  25 ++E        ---- +++                                                  ++
     |+++++ -+E+                                                           |
  20 +E+ E-- +++                                                          ++
     |H|+++                                                                |
     |+|                                       +H+-------                  |
  15 ++H+                                   ---+++      +H+------         ++
     |N++H+--                         +++---                    +H+------++|
  10 ++ +++  -       +++           ---+H+                       +++      +H+
     | |     +H+-----+H+------+H+--                                        |
   5 ++|                      +++                                         ++
     ++N+N+--+N++          +         +          +          +          +    |
   0 ++---------+----------+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                          Number of threads
          atomic_add-bench: 5000000 ops/thread, [0,128] range

  160 ++---------+---------+----------+---------+----------+----------+---++
      + atomic +-E--+      +          +         +          +          +    |
  140 +cmpxchg +-H--+                          +++      +++               ++
      | master +-N--+                           E--------E------+E+------++|
  120 ++                                      --|        |      +++       E+
      |                                     -- +++      +++              ++|
  100 ++                                   -                              ++
      |                                +++-                     +++      ++|
   80 ++                              -+E+    -+H+------+H+------H--------++
      |                           ----    ----                  +++       H|
      |            ---+E+-----+E+-  ---+H+                               ++|
   60 ++     +E+---   +++  ---+H+---                                      ++
      |    --+++   ---+H+--                                                |
   40 ++ +E+-+H+---                                                       ++
      |  +H+                                                               |
   20 +EE+                                                                ++
      +N+        +         +          +         +          +          +    |
    0 ++N-N---N--+---------+----------+---------+----------+----------+---++
      0          10        20         30        40         50         60
                          Number of threads
         atomic_add-bench: 5000000 ops/thread, [0,1024] range

  350 ++---------+---------+----------+---------+----------+----------+---++
      + atomic +-E--+      +          +         +          +          +    |
  300 +cmpxchg +-H--+                                                    +++
      | master +-N--+                                           +++       ||
      |                                                 +++      |    ----E|
  250 ++                                                 |   ----E----    ++
      |                                              ----E---    |    ---+H|
  200 ++                                      -+E+---   +++  ---+H+---    ++
      |                                   ----         -+H+--              |
      |                                +E+     +++ ---- +++                |
  150 ++                            ---+++  ---+H+-                       ++
      |                          ---  -+H+--                               |
  100 ++                   ---+E+ ---- +++                                ++
      |      +++   ---+E+-----+H+-                                         |
      |     -+E+------+H+--                                                |
   50 ++ +E+                                                              ++
      +EE+       +         +          +         +          +          +    |
    0 ++N-N---N--+---------+----------+---------+----------+----------+---++
      0          10        20         30        40         50         60
                          Number of threads

  hi-res: http://imgur.com/a/fMRmq

I stopped measuring master after 8 threads, because there is little
point in measuring the well-known performance collapse of a contended lock.

Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Message-Id: <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 070e3edceaa023109bfa7a1c5c259342e0b6b625
      
https://github.com/qemu/qemu/commit/070e3edceaa023109bfa7a1c5c259342e0b6b625
  Author: Emilio G. Cota <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M tests/.gitignore
    M tests/Makefile.include
    A tests/atomic_add-bench.c

  Log Message:
  -----------
  tests: add atomic_add-bench

With this microbenchmark we can measure the overhead of emulating atomic
instructions with a configurable degree of contention.

The benchmark spawns $n threads, each performing $o atomic ops (additions)
in a loop. Each atomic operation is performed on a different cache line
(assuming lines are 64 bytes long) that is randomly selected from a range [0, $r).

[ Note: each $foo corresponds to a -foo flag ]
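
The structure described above can be sketched roughly as follows. This is
not the contents of tests/atomic_add-bench.c: the names, the small
per-thread random generator, and the use of pthreads plus the GCC/Clang
__atomic builtins are all assumptions of this illustration, and timing and
option parsing are left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_LINE 64  /* assumed cache-line size in bytes */

/* One counter per cache line, padded so neighbours never share a line. */
struct slot {
    uint64_t val;
    char pad[SKETCH_LINE - sizeof(uint64_t)];
};

struct worker {
    struct slot *slots;
    unsigned range;     /* slots are picked from [0, range) */
    unsigned long ops;  /* additions to perform */
    uint32_t seed;      /* per-thread generator state */
};

static void *worker_fn(void *arg)
{
    struct worker *w = arg;

    for (unsigned long i = 0; i < w->ops; i++) {
        /* Simple linear congruential step; any per-thread PRNG would do. */
        w->seed = w->seed * 1103515245u + 12345u;
        unsigned idx = (w->seed >> 16) % w->range;
        __atomic_fetch_add(&w->slots[idx].val, 1, __ATOMIC_SEQ_CST);
    }
    return NULL;
}

/* Run n threads of 'ops' additions each; return the sum of all counters. */
static uint64_t run_bench(unsigned n, unsigned long ops, unsigned range)
{
    if (range == 0) {
        return 0;
    }
    struct slot *slots = aligned_alloc(SKETCH_LINE,
                                       (size_t)range * sizeof(*slots));
    struct worker *w = calloc(n, sizeof(*w));
    pthread_t *tid = calloc(n, sizeof(*tid));
    uint64_t total = 0;
    unsigned started = 0;

    if (slots && w && tid) {
        memset(slots, 0, (size_t)range * sizeof(*slots));
        for (; started < n; started++) {
            w[started] = (struct worker){ slots, range, ops, started + 1 };
            if (pthread_create(&tid[started], NULL, worker_fn, &w[started])) {
                break;
            }
        }
        for (unsigned i = 0; i < started; i++) {
            pthread_join(tid[i], NULL);
        }
        for (unsigned i = 0; i < range; i++) {
            total += slots[i].val;
        }
    }
    free(slots);
    free(w);
    free(tid);
    return total;
}
```

A small range concentrates all threads on a few lines (high contention);
a large range spreads them out. If every addition is truly atomic, the
returned total equals the number of threads times the ops per thread.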

Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>
Message-Id: <address@hidden>


  Commit: 7f5616f53896a4e08ad37de3ac50d3a4cc8eff7a
      
https://github.com/qemu/qemu/commit/7f5616f53896a4e08ad37de3ac50d3a4cc8eff7a
  Author: Richard Henderson <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M target-arm/translate.c

  Log Message:
  -----------
  target-arm: Rearrange aa32 load and store functions

Stop specializing on TARGET_LONG_BITS == 32; unconditionally allocate
a temp and expand with tcg_gen_extu_i32_tl.  Split out gen_aa32_addr,
gen_aa32_frob64, gen_aa32_ld_i32 and gen_aa32_st_i32 as separate interfaces.

Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 354161b37c6465a32073eac5f16fa35939af2bb4
      
https://github.com/qemu/qemu/commit/354161b37c6465a32073eac5f16fa35939af2bb4
  Author: Emilio G. Cota <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M target-arm/translate.c

  Log Message:
  -----------
  target-arm: emulate LL/SC using cmpxchg helpers

Emulating LL/SC with cmpxchg is not correct, since it can
suffer from the ABA problem. Portable parallel code, however,
is written assuming only cmpxchg--and not LL/SC--is available.
This means that in practice emulating LL/SC with cmpxchg is
a viable alternative.
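
A simplified model of the scheme and of the ABA caveat, assuming the
GCC/Clang __atomic builtins. The struct and function names are
hypothetical; the translator expresses the same idea in TCG ops rather
than C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU record of the last load-exclusive. */
struct excl {
    uint32_t *addr;
    uint32_t val;
};

/* Load-exclusive: remember where we loaded from and what we saw. */
static uint32_t sketch_ll(struct excl *m, uint32_t *addr)
{
    m->addr = addr;
    m->val = __atomic_load_n(addr, __ATOMIC_ACQUIRE);
    return m->val;
}

/* Store-exclusive: succeed only if memory still holds the remembered
 * value. Returns 0 on success and 1 on failure, as ARM STREX does. */
static int sketch_sc(struct excl *m, uint32_t *addr, uint32_t newv)
{
    uint32_t expected = m->val;
    bool ok = m->addr == addr &&
              __atomic_compare_exchange_n(addr, &expected, newv, false,
                                          __ATOMIC_SEQ_CST,
                                          __ATOMIC_SEQ_CST);

    m->addr = NULL;  /* the reservation is consumed either way */
    return ok ? 0 : 1;
}
```

If another thread changes the location from A to B and back to A between
sketch_ll() and sketch_sc(), the store still succeeds, whereas real LL/SC
hardware would reject it. That is the ABA case the message refers to.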

The appended patch emulates LL/SC pairs in ARM with cmpxchg helpers.
This works in both user and system mode. In user mode, it avoids
pausing all other CPUs to perform the LL/SC pair. The resulting
performance and scalability improvement is significant, as the
plots below show. They plot the throughput of atomic_add-bench
compiled for ARM and executed on a 64-core x86 machine.

Hi-res plots: http://imgur.com/a/aNQpB
          atomic_add-bench: 1000000 ops/thread, [0,1] range

  9 ++---------+----------+----------+----------+----------+----------+---++
    +cmpxchg +-E--+       +          +          +          +          +    |
  8 +Emaster +-H--+                                                       ++
    | |                                                                    |
  7 ++E                                                                   ++
    | |                                                                    |
  6 ++++                                                                  ++
    |  |                                                                   |
  5 ++ |                                                                  ++
  4 ++ |                                                                  ++
    |  |                                                                   |
  3 ++ |                                                                  ++
    |   |                                                                  |
  2 ++  |                                                                 ++
    |H++E+---                                  +++  ---+E+------+E+------+E|
  1 +++     +E+-----+E+------+E+------+E+------+E+--   +++      +++       ++
    ++H+       +    +++   +  +++     ++++       +          +          +    |
  0 ++--H----H-+-----H----+----------+----------+----------+----------+---++
    0          10         20         30         40         50         60
                         Number of threads
           atomic_add-bench: 1000000 ops/thread, [0,2] range

  16 ++---------+----------+---------+----------+----------+----------+---++
     +cmpxchg +-E--+       +         +          +          +          +    |
  14 ++master +-H--+                                                      ++
     | |                                                                   |
  12 ++|                                                                  ++
     | E                                                                   |
  10 ++|                                                                  ++
     | |                                                                   |
   8 ++++                                                                 ++
     |E+|                                                                  |
     |  |                                                                  |
   6 ++ |                                                                 ++
     |   |                                                                 |
   4 ++  |                                                                ++
     |  +E+---       +++      +++              +++           ---+E+------+E|
   2 +H+     +E+------E-------+E+-----+E+------+E+------+E+--            +++
     + |        +    +++   +         ++++       +          +          +    |
   0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                          Number of threads
          atomic_add-bench: 1000000 ops/thread, [0,128] range

  70 ++---------+----------+---------+----------+----------+----------+---++
     +cmpxchg +-E--+       +         +          +       ++++          +    |
  60 ++master +-H--+                                 ----E------+E+-------++
     |                                        -+E+---   +++     +++      +E|
     |                                +++ ---- +++                       ++|
  50 ++                       +++  ---+E+-                                ++
     |                        -E---                                        |
  40 ++                    ---+++                                         ++
     |               +++---                                                |
     |              -+E+                                                   |
  30 ++      +++----                                                      ++
     |       +E+                                                           |
  20 ++ +++--                                                             ++
     |  +E+                                                                |
     |+E+                                                                  |
  10 +E+                                                                  ++
     +          +          +         +          +          +          +    |
   0 +HH-H----H-+-----H----+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                          Number of threads
         atomic_add-bench: 1000000 ops/thread, [0,1024] range

  120 ++---------+---------+----------+---------+----------+----------+---++
      +cmpxchg +-E--+      +          +         +          +          +    |
      | master +-H--+                                                    ++|
  100 ++                                                              ----E+
      |                                                 +++  ---+E+---   ++|
      |                                                --E---   +++        |
   80 ++                                           ---- +++               ++
      |                                     ---+E+-                        |
   60 ++                              -+E+--                              ++
      |                       +++ ---- +++                                 |
      |                      -+E+-                                         |
   40 ++              +++----                                             ++
      |      +++   ---+E+                                                  |
      |     -+E+---                                                        |
   20 ++ +E+                                                              ++
      |+E+++                                                               |
      +E+        +         +          +         +          +          +    |
    0 +HH-H---H--+-----H---+----------+---------+----------+----------+---++
      0          10        20         30        40         50         60
                          Number of threads

[rth: Enforce alignment for ldrexd.]

Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Message-Id: <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: cf12bce088f22b92bf62ffa0d7f6a3e951e355a9
      
https://github.com/qemu/qemu/commit/cf12bce088f22b92bf62ffa0d7f6a3e951e355a9
  Author: Emilio G. Cota <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M target-arm/translate.c

  Log Message:
  -----------
  target-arm: emulate SWP with atomic_xchg helper
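
For illustration only: SWP reads a word from memory and writes a register
value to the same address as one indivisible operation, which is the
semantics of an atomic exchange. The sketch below uses C11 atomics rather
than QEMU's TCG helpers; emu_swp and emu_swpb are hypothetical names that
do not appear in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* SWP Rt, Rt2, [Rn]: store rt2 to *mem and return the previous word,
 * with no window in which another thread can observe a partial update.
 * (Hypothetical helper name, not QEMU's.) */
static uint32_t emu_swp(_Atomic uint32_t *mem, uint32_t rt2)
{
    return atomic_exchange(mem, rt2);
}

/* SWPB: the same operation on a single byte. */
static uint8_t emu_swpb(_Atomic uint8_t *mem, uint8_t rt2)
{
    return atomic_exchange(mem, rt2);
}

/* Small self-check: the old value comes back, the new value is stored. */
static void emu_swp_selfcheck(void)
{
    _Atomic uint32_t word = 0x11111111u;
    _Atomic uint8_t byte = 0x22;

    assert(emu_swp(&word, 0x33333333u) == 0x11111111u);
    assert(atomic_load(&word) == 0x33333333u);
    assert(emu_swpb(&byte, 0x44) == 0x22);
    assert(atomic_load(&byte) == 0x44);
}
```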

Signed-off-by: Emilio G. Cota <address@hidden>
Message-Id: <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: 1dd089d0eec060dcd8478735114d98421d414805
      
https://github.com/qemu/qemu/commit/1dd089d0eec060dcd8478735114d98421d414805
  Author: Emilio G. Cota <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M target-arm/helper-a64.c
    M target-arm/helper-a64.h
    M target-arm/translate-a64.c

  Log Message:
  -----------
  target-arm: emulate aarch64's LL/SC using cmpxchg helpers

Emulating LL/SC with cmpxchg is not correct, since it can
suffer from the ABA problem. Portable parallel code, however,
is written assuming only cmpxchg--and not LL/SC--is available.
This means that in practice emulating LL/SC with cmpxchg is
a viable alternative.
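
The trade-off described above can be sketched in C11 atomics. This is an
illustration under stated assumptions, not the code in this patch: the
ExclusiveMonitor type and the emu_* function names are hypothetical. The
load-exclusive records the address and the value it read; the
store-exclusive succeeds if a cmpxchg still finds that value in memory.
aba_demo() shows the case a real exclusive monitor would reject but this
emulation accepts.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU state: what the last load-exclusive saw. */
typedef struct {
    _Atomic uint32_t *excl_addr;
    uint32_t excl_val;
} ExclusiveMonitor;

/* Emulated LL: remember the address and the value read from it. */
static uint32_t emu_load_exclusive(ExclusiveMonitor *m, _Atomic uint32_t *addr)
{
    m->excl_addr = addr;
    m->excl_val = atomic_load(addr);
    return m->excl_val;
}

/* Emulated SC: succeed only if memory still holds the value seen by LL.
 * Returns true on success. The monitor is cleared either way. */
static bool emu_store_exclusive(ExclusiveMonitor *m, _Atomic uint32_t *addr,
                                uint32_t newval)
{
    bool ok = false;

    if (m->excl_addr == addr) {
        uint32_t expected = m->excl_val;
        ok = atomic_compare_exchange_strong(addr, &expected, newval);
    }
    m->excl_addr = NULL;
    return ok;
}

/* A guest-style atomic add built from the LL/SC pair; returns the old value. */
static uint32_t emu_atomic_add(_Atomic uint32_t *addr, uint32_t inc)
{
    ExclusiveMonitor m;
    uint32_t old;

    do {
        old = emu_load_exclusive(&m, addr);
    } while (!emu_store_exclusive(&m, addr, old + inc));
    return old;
}

/* ABA: the word changes A -> B -> A between LL and SC. Hardware LL/SC
 * would fail the store; the cmpxchg emulation lets it succeed. */
static bool aba_demo(void)
{
    _Atomic uint32_t word = 5;
    ExclusiveMonitor m = { NULL, 0 };

    emu_load_exclusive(&m, &word);
    atomic_store(&word, 7);   /* another CPU writes B */
    atomic_store(&word, 5);   /* ... and then writes A back */
    return emu_store_exclusive(&m, &word, 6);
}
```

Code that only relies on compare-and-swap semantics, such as the
emu_atomic_add() loop, cannot tell the difference, which is the sense in
which the emulation is viable in practice.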

The appended patch emulates LL/SC pairs in aarch64 with cmpxchg helpers.
This works in both user and system mode. In user mode, it avoids
pausing all other CPUs to perform the LL/SC pair. The subsequent
performance and scalability improvement is significant, as the
plots below show. They plot the throughput of atomic_add-bench
compiled for ARM and executed on a 64-core x86 machine.
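
As a rough guide to reading the plots, the following is a sketch of the
kind of workload the plot titles suggest: each thread performs a fixed
number of atomic increments on counters picked at random from [0, range],
so a small range means heavy contention on a few locations. This is an
assumption about the benchmark's shape, not a copy of
tests/atomic_add-bench.c; all names below are hypothetical, and timing
(throughput) measurement is left out.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    _Atomic uint64_t *counters;  /* shared array of range + 1 counters */
    unsigned range;              /* indices are drawn from [0, range] */
    unsigned long n_ops;         /* atomic adds performed by this thread */
    uint32_t seed;               /* per-thread PRNG state, must be nonzero */
} BenchThread;

/* Small per-thread PRNG so threads do not share rand() state. */
static uint32_t xorshift32(uint32_t *s)
{
    uint32_t x = *s;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *s = x;
}

static void *bench_worker(void *arg)
{
    BenchThread *t = arg;

    for (unsigned long i = 0; i < t->n_ops; i++) {
        unsigned idx = xorshift32(&t->seed) % (t->range + 1);
        atomic_fetch_add(&t->counters[idx], 1);
    }
    return NULL;
}

/* Runs the workload and returns the sum of all counters, which equals
 * n_threads * n_ops when every thread was started and no add was lost. */
static uint64_t run_bench(unsigned n_threads, unsigned long n_ops,
                          unsigned range)
{
    _Atomic uint64_t *counters = calloc(range + 1, sizeof(*counters));
    pthread_t *tids = calloc(n_threads, sizeof(*tids));
    BenchThread *args = calloc(n_threads, sizeof(*args));
    uint64_t total = 0;
    unsigned started = 0;

    if (counters && tids && args) {
        for (; started < n_threads; started++) {
            args[started] = (BenchThread){ counters, range, n_ops,
                                           started + 1 };
            if (pthread_create(&tids[started], NULL, bench_worker,
                               &args[started]) != 0) {
                break;
            }
        }
        for (unsigned i = 0; i < started; i++) {
            pthread_join(tids[i], NULL);
        }
        for (unsigned i = 0; i <= range; i++) {
            total += atomic_load(&counters[i]);
        }
    }
    free(counters);
    free(tids);
    free(args);
    return total;
}
```

For example, run_bench(4, 1000000, 1) would correspond to the 4-thread
point of a "[0,1] range" run under these assumptions.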

Hi-res plots: http://imgur.com/a/JVc8Y
           atomic_add-bench: 1000000 ops/thread, [0,1] range

  18 ++---------+----------+---------+----------+----------+----------+---++
     +cmpxchg +-E--+       +         +          +          +          +    |
  16 ++master +-H--+                                                      ++
     ||                                                                    |
  14 ++                                                                   ++
     | |                                                                   |
  12 ++|                                                                  ++
     | |                                                                   |
  10 ++++                                                                 ++
   8 ++E                                                                  ++
     |+++                                                                  |
   6 ++ |                                                                 ++
     |  |                                                                  |
   4 ++ |                                                                 ++
     |   |                                                                 |
   2 +H++E+---                                                            ++
     + |     +E++----+E+---+--+E+----++E+------+E+------+E++----+E+---+--+E|
   0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                          Number of threads
           atomic_add-bench: 1000000 ops/thread, [0,2] range

  18 ++---------+----------+---------+----------+----------+----------+---++
     +cmpxchg +-E--+       +         +          +          +          +    |
  16 ++master +-H--+                                                      ++
     | |                                                                   |
  14 ++E                                                                  ++
     | |                                                                   |
  12 ++|                                                                  ++
     |+++                                                                  |
  10 ++ |                                                                 ++
   8 ++ |                                                                 ++
     |  |                                                                  |
   6 ++ |                                                                 ++
     |   |                                                                 |
   4 ++  |                                                                ++
     |  +E+---                                                             |
   2 +H+     +E+-----+++              +++      +++   ---+E+-----+E+------+++
     +++        +    +E+---+--+E+----++E+------+E+---   ++++    +++   +  +E|
   0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                          Number of threads
          atomic_add-bench: 1000000 ops/thread, [0,128] range

  70 ++---------+----------+---------+----------+----------+----------+---++
     +cmpxchg +-E--+       +         +          +          +          +    |
  60 ++master +-H--+                  +++            ---+E+-----+E+------+E+
     |                        +E+------E-------+E+---                      |
     |                     ---        +++                                  |
  50 ++              +++---                                               ++
     |              -+E+                                                   |
  40 ++      +++----                                                      ++
     |        E-                                                           |
     |      --|                                                            |
  30 ++   -- +++                                                          ++
     |  +E+                                                                |
  20 ++E+                                                                 ++
     |E+                                                                   |
     |                                                                     |
  10 ++                                                                   ++
     +          +          +         +          +          +          +    |
   0 +HH-H----H-+-----H----+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                          Number of threads
         atomic_add-bench: 1000000 ops/thread, [0,1024] range

  160 ++---------+---------+----------+---------+----------+----------+---++
      +cmpxchg +-E--+      +          +         +          +          +    |
  140 ++master +-H--+                                           +++      +++
      |                                                -+E+-----+E+-------E|
  120 ++                                       +++ ----                  +++
      |                                +++  ----E--                        |
  100 ++                              --E---   +++                        ++
      |                       +++ ---- +++                                 |
   80 ++                     --E--                                        ++
      |                  ---- +++                                          |
      |              -+E+                                                  |
   60 ++         ---- +++                                                 ++
      |      +E+-                                                          |
   40 ++   --                                                             ++
      |  +E+                                                               |
   20 +EE+                                                                ++
      +++        +         +          +         +          +          +    |
    0 +HH-H---H--+-----H---+----------+---------+----------+----------+---++
      0          10        20         30        40         50         60
                          Number of threads

[rth: Rearrange 128-bit cmpxchg helper.  Enforce alignment on LL.]

Signed-off-by: Emilio G. Cota <address@hidden>
Message-Id: <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>


  Commit: b50b82fc486a227c27481d60c75c5ee7cc282028
      
https://github.com/qemu/qemu/commit/b50b82fc486a227c27481d60c75c5ee7cc282028
  Author: Emilio G. Cota <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M linux-user/main.c

  Log Message:
  -----------
  linux-user: remove handling of ARM's EXCP_STREX

The exception is not emitted anymore.

Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>
Message-Id: <address@hidden>


  Commit: f4e6eb7ffeefb3f2e9fff0bbe5eb7c9962c31dcd
      
https://github.com/qemu/qemu/commit/f4e6eb7ffeefb3f2e9fff0bbe5eb7c9962c31dcd
  Author: Emilio G. Cota <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M linux-user/main.c

  Log Message:
  -----------
  linux-user: remove handling of aarch64's EXCP_STREX

The exception is not emitted anymore.

Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>
Message-Id: <address@hidden>


  Commit: 05188cc72f0399e99c92f608a8e7ca4c8e552c4b
      
https://github.com/qemu/qemu/commit/05188cc72f0399e99c92f608a8e7ca4c8e552c4b
  Author: Emilio G. Cota <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M target-arm/cpu.h
    M target-arm/internals.h
    M target-arm/translate.c
    M target-arm/translate.h

  Log Message:
  -----------
  target-arm: remove EXCP_STREX + cpu_exclusive_{test, info}

The exception is not emitted anymore; remove it and the associated
TCG variables.

Reviewed-by: Alex Bennée <address@hidden>
Signed-off-by: Emilio G. Cota <address@hidden>
Signed-off-by: Richard Henderson <address@hidden>
Message-Id: <address@hidden>


  Commit: 6a73ecf5cfcd39b7afb5d6a24174730eac49d4b5
      
https://github.com/qemu/qemu/commit/6a73ecf5cfcd39b7afb5d6a24174730eac49d4b5
  Author: Richard Henderson <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M target-alpha/cpu.h
    M target-alpha/helper.c
    M target-alpha/helper.h
    M target-alpha/mem_helper.c
    M target-alpha/translate.c

  Log Message:
  -----------
  target-alpha: Introduce MMU_PHYS_IDX

Rather than using helpers for physical accesses, use a mmu index.
The primary cleanup is with store-conditional on physical addresses.

Signed-off-by: Richard Henderson <address@hidden>


  Commit: ed2839166c21e001d15868f4d9591a21aaebd547
      
https://github.com/qemu/qemu/commit/ed2839166c21e001d15868f4d9591a21aaebd547
  Author: Richard Henderson <address@hidden>
  Date:   2016-10-26 (Wed, 26 Oct 2016)

  Changed paths:
    M linux-user/main.c
    M target-alpha/cpu.h
    M target-alpha/helper.c
    M target-alpha/machine.c
    M target-alpha/translate.c

  Log Message:
  -----------
  target-alpha: Emulate LL/SC using cmpxchg helpers

Emulating LL/SC with cmpxchg is not correct, since it can
suffer from the ABA problem.  However, portable parallel
code is written assuming only cmpxchg which means that in
practice this is a viable alternative.

Signed-off-by: Richard Henderson <address@hidden>


  Commit: 5929d7e8a0e1f43333bc3528b50397ae8dd0fd6b
      
https://github.com/qemu/qemu/commit/5929d7e8a0e1f43333bc3528b50397ae8dd0fd6b
  Author: Peter Maydell <address@hidden>
  Date:   2016-10-27 (Thu, 27 Oct 2016)

  Changed paths:
    M Makefile.objs
    M Makefile.target
    A atomic_template.h
    M configure
    M cpu-exec-common.c
    M cpu-exec.c
    M cpus.c
    M cputlb.c
    M exec.c
    M include/exec/cpu-all.h
    M include/exec/exec-all.h
    M include/qemu-common.h
    M include/qemu/atomic.h
    M include/qemu/int128.h
    M linux-user/main.c
    M linux-user/syscall.c
    M softmmu_template.h
    M target-alpha/cpu.h
    M target-alpha/helper.c
    M target-alpha/helper.h
    M target-alpha/machine.c
    M target-alpha/mem_helper.c
    M target-alpha/translate.c
    M target-arm/cpu.h
    M target-arm/helper-a64.c
    M target-arm/helper-a64.h
    M target-arm/internals.h
    M target-arm/translate-a64.c
    M target-arm/translate.c
    M target-arm/translate.h
    M target-i386/helper.h
    M target-i386/mem_helper.c
    M target-i386/translate.c
    M tcg-runtime.c
    M tcg/tcg-op.c
    M tcg/tcg-op.h
    M tcg/tcg-runtime.h
    M tcg/tcg.h
    M tests/.gitignore
    M tests/Makefile.include
    A tests/atomic_add-bench.c
    M tests/test-int128.c
    M translate-all.c

  Log Message:
  -----------
  Merge remote-tracking branch 'remotes/rth/tags/pull-atomic-20161026' into staging

cmpxchg emulation of atomics, v8

# gpg: Signature made Wed 26 Oct 2016 16:30:03 BST
# gpg:                using RSA key 0xAD1270CC4DD0279B
# gpg: Good signature from "Richard Henderson <address@hidden>"
# gpg:                 aka "Richard Henderson <address@hidden>"
# gpg:                 aka "Richard Henderson <address@hidden>"
# Primary key fingerprint: 9CB1 8DDA F8E8 49AD 2AFC  16A4 AD12 70CC 4DD0 279B

* remotes/rth/tags/pull-atomic-20161026: (37 commits)
  target-alpha: Emulate LL/SC using cmpxchg helpers
  target-alpha: Introduce MMU_PHYS_IDX
  target-arm: remove EXCP_STREX + cpu_exclusive_{test, info}
  linux-user: remove handling of aarch64's EXCP_STREX
  linux-user: remove handling of ARM's EXCP_STREX
  target-arm: emulate aarch64's LL/SC using cmpxchg helpers
  target-arm: emulate SWP with atomic_xchg helper
  target-arm: emulate LL/SC using cmpxchg helpers
  target-arm: Rearrange aa32 load and store functions
  tests: add atomic_add-bench
  target-i386: remove helper_lock()
  target-i386: emulate XCHG using atomic helper
  target-i386: emulate LOCK'ed BTX ops using atomic helpers
  target-i386: emulate LOCK'ed XADD using atomic helper
  target-i386: emulate LOCK'ed NEG using cmpxchg helper
  target-i386: emulate LOCK'ed NOT using atomic helper
  target-i386: emulate LOCK'ed INC using atomic helper
  target-i386: emulate LOCK'ed OP instructions using atomic helpers
  target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers
  tcg: Emit barriers with parallel_cpus
  ...

Signed-off-by: Peter Maydell <address@hidden>


Compare: https://github.com/qemu/qemu/compare/8f9d84df97a3...5929d7e8a0e1
