[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[PATCH 07/10] target/arm: Implement v8.1M low-overhead-loop instructions
From: |
Peter Maydell |
Subject: |
[PATCH 07/10] target/arm: Implement v8.1M low-overhead-loop instructions |
Date: |
Mon, 12 Oct 2020 16:37:43 +0100 |
v8.1M's "low-overhead-loop" extension has three instructions
for looping:
* DLS (start of a do-loop)
* WLS (start of a while-loop)
* LE (end of a loop)
The loop-start instructions are both simple operations to start a
loop whose iteration count (if any) is in LR. The loop-end
instruction handles "decrement iteration count and jump back to loop
start"; it also caches the information about the branch back to the
start of the loop to improve performance of the branch on subsequent
iterations.
As with the branch-future instructions, the architecture permits an
implementation to discard the LO_BRANCH_INFO cache at any time, and
QEMU takes the IMPDEF option to never set it in the first place
(equivalent to discarding it immediately), because for us a "real"
implementation would be unnecessary complexity.
(This implementation only provides the simple looping constructs; the
vector extension MVE (Helium) adds some extra variants to handle
looping across vectors. We'll add those later when we implement
MVE.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/t32.decode | 8 +++++
target/arm/translate.c | 74 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 82 insertions(+)
diff --git a/target/arm/t32.decode b/target/arm/t32.decode
index 3015731a8d0..8152739b52b 100644
--- a/target/arm/t32.decode
+++ b/target/arm/t32.decode
@@ -659,4 +659,12 @@ BL 1111 0. .......... 11.1 ............
@branch24
BF 1111 0 boff:4 10 ----- 1110 - ---------- 1 # BF
BF 1111 0 boff:4 11 ----- 1110 0 0000000000 1 # BFX, BFLX
]
+ [
+ # LE and WLS immediate
+ %lob_imm 1:10 11:1 !function=times_2
+
+ DLS 1111 0 0000 100 rn:4 1110 0000 0000 0001
+ WLS 1111 0 0000 100 rn:4 1100 . .......... 1 imm=%lob_imm
+ LE 1111 0 0000 0 f:1 0 1111 1100 . .......... 1 imm=%lob_imm
+ ]
}
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 9e72d719c6f..742c219c071 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -7953,6 +7953,80 @@ static bool trans_BF(DisasContext *s, arg_BF *a)
return true;
}
+static bool trans_DLS(DisasContext *s, arg_DLS *a)
+{
+ /* M-profile low-overhead loop start */
+ TCGv_i32 tmp;
+
+ if (!dc_isar_feature(aa32_lob, s)) {
+ return false;
+ }
+ if (a->rn == 13 || a->rn == 15) {
+ /* CONSTRAINED UNPREDICTABLE: we choose to UNDEF */
+ return false;
+ }
+
+ /* Not a while loop, no tail predication: just set LR to the count */
+ tmp = load_reg(s, a->rn);
+ store_reg(s, 14, tmp);
+ return true;
+}
+
+static bool trans_WLS(DisasContext *s, arg_WLS *a)
+{
+ /* M-profile low-overhead while-loop start */
+ TCGv_i32 tmp;
+ TCGLabel *nextlabel;
+
+ if (!dc_isar_feature(aa32_lob, s)) {
+ return false;
+ }
+ if (a->rn == 13 || a->rn == 15) {
+ /* CONSTRAINED UNPREDICTABLE: we choose to UNDEF */
+ return false;
+ }
+
+ nextlabel = gen_new_label();
+ tcg_gen_brcondi_i32(TCG_COND_NE, cpu_R[a->rn], 0, nextlabel);
+ gen_jmp(s, read_pc(s) + a->imm);
+
+ gen_set_label(nextlabel);
+ tmp = load_reg(s, a->rn);
+ store_reg(s, 14, tmp);
+ gen_jmp(s, s->base.pc_next);
+ return true;
+}
+
+static bool trans_LE(DisasContext *s, arg_LE *a)
+{
+ /*
+ * M-profile low-overhead loop end. The architecture permits an
+ * implementation to discard the LO_BRANCH_INFO cache at any time,
+ * and we take the IMPDEF option to never set it in the first place
+ * (equivalent to always discarding it immediately), because for QEMU
+ * a "real" implementation would be complicated and wouldn't execute
+ * any faster.
+ */
+ TCGv_i32 tmp;
+
+ if (!dc_isar_feature(aa32_lob, s)) {
+ return false;
+ }
+
+ if (!a->f) {
+ /* Not loop-forever. If LR <= 1 this is the last loop: do nothing. */
+ arm_gen_condlabel(s);
+ tcg_gen_brcondi_i32(TCG_COND_LEU, cpu_R[14], 1, s->condlabel);
+ /* Decrement LR */
+ tmp = load_reg(s, 14);
+ tcg_gen_addi_i32(tmp, tmp, -1);
+ store_reg(s, 14, tmp);
+ }
+ /* Jump back to the loop start */
+ gen_jmp(s, read_pc(s) - a->imm);
+ return true;
+}
+
static bool op_tbranch(DisasContext *s, arg_tbranch *a, bool half)
{
TCGv_i32 addr, tmp;
--
2.20.1
- Re: [PATCH 04/10] target/arm: Make the t32 insn[25:23]=111 group non-overlapping, (continued)
[PATCH 10/10] target/arm: Fix writing to FPSCR.FZ16 on M-profile, Peter Maydell, 2020/10/12
[PATCH 05/10] target/arm: Don't allow BLX imm for M-profile, Peter Maydell, 2020/10/12
[PATCH 06/10] target/arm: Implement v8.1M branch-future insns (as NOPs), Peter Maydell, 2020/10/12
[PATCH 07/10] target/arm: Implement v8.1M low-overhead-loop instructions,
Peter Maydell <=
- Re: [PATCH 07/10] target/arm: Implement v8.1M low-overhead-loop instructions, Peter Maydell, 2020/10/12
- Re: [PATCH 07/10] target/arm: Implement v8.1M low-overhead-loop instructions, Richard Henderson, 2020/10/13
- Re: [PATCH 07/10] target/arm: Implement v8.1M low-overhead-loop instructions, Peter Maydell, 2020/10/13
- Re: [PATCH 07/10] target/arm: Implement v8.1M low-overhead-loop instructions, Richard Henderson, 2020/10/13
- Re: [PATCH 07/10] target/arm: Implement v8.1M low-overhead-loop instructions, Peter Maydell, 2020/10/13
Re: [PATCH 07/10] target/arm: Implement v8.1M low-overhead-loop instructions, Richard Henderson, 2020/10/13
[PATCH 08/10] target/arm: Fix has_vfp/has_neon ID reg squashing for M-profile, Peter Maydell, 2020/10/12