qemu-ppc
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] target/ppc: Fix for optimized vsl/vsr instructions


From: Aleksandar Markovic
Subject: Re: [PATCH] target/ppc: Fix for optimized vsl/vsr instructions
Date: Thu, 03 Oct 2019 16:31:59 +0200
User-agent: SOGoMail 2.3.10

Hello, Stefan,
 
In previous implementation, invocation of TCG shift function could request
shift of TCG variable by 64 bits when variable "sh" is 0, which is not
supported in TCG (values can be shifted by 0 to 63 bits). This patch fixes
this by using two separate invocation of TCG shift functions, with maximum
shift amount of 32.

Also, name of variable "shifted" is changed to "carry" so variable naming
is similar to old helper implementation.

Signed-off-by: Stefan Brankovic <address@hidden>
---
target/ppc/translate/vmx-impl.inc.c | 40 ++++++++++++++++++-------------------
1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index 2472a52..f5006b1 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -593,37 +593,37 @@ static void trans_vsl(DisasContext *ctx)
TCGv_i64 avrA = tcg_temp_new_i64();
TCGv_i64 avrB = tcg_temp_new_i64();
TCGv_i64 sh = tcg_temp_new_i64();
- TCGv_i64 shifted = tcg_temp_new_i64();
+ TCGv_i64 carry = tcg_temp_new_i64();
TCGv_i64 tmp = tcg_temp_new_i64();

- /* Place bits 125-127 of vB in sh. */
+ /* Place bits 125-127 of vB in "sh". */
get_avr64(avrB, VB, false);
tcg_gen_andi_i64(sh, avrB, 0x07ULL);

/*
- * Save highest sh bits of lower doubleword element of vA in variable
- * shifted and perform shift on lower doubleword.
+ * Save highest "sh" bits of lower doubleword element of vA in variable
+ * "carry" and perform shift on lower doubleword.
*/
get_avr64(avrA, VA, false);
- tcg_gen_subfi_i64(tmp, 64, sh);
- tcg_gen_shr_i64(shifted, avrA, tmp);
- tcg_gen_andi_i64(shifted, shifted, 0x7fULL);
+ tcg_gen_subfi_i64(tmp, 32, sh);
+ tcg_gen_shri_i64(carry, avrA, 32);
+ tcg_gen_shr_i64(carry, carry, tmp);
tcg_gen_shl_i64(avrA, avrA, sh);
set_avr64(VT, avrA, false);

/*
* Perform shift on higher doubleword element of vA and replace lowest
- * sh bits with shifted.
+ * "sh" bits with "carry".
*/
get_avr64(avrA, VA, true);
tcg_gen_shl_i64(avrA, avrA, sh);
- tcg_gen_or_i64(avrA, avrA, shifted);
+ tcg_gen_or_i64(avrA, avrA, carry);
set_avr64(VT, avrA, true);

tcg_temp_free_i64(avrA);
tcg_temp_free_i64(avrB);
tcg_temp_free_i64(sh);
- tcg_temp_free_i64(shifted);
+ tcg_temp_free_i64(carry);
tcg_temp_free_i64(tmp);
}

@@ -642,36 +642,36 @@ static void trans_vsr(DisasContext *ctx)
TCGv_i64 avrA = tcg_temp_new_i64();
TCGv_i64 avrB = tcg_temp_new_i64();
TCGv_i64 sh = tcg_temp_new_i64();
- TCGv_i64 shifted = tcg_temp_new_i64();
+ TCGv_i64 carry = tcg_temp_new_i64();
TCGv_i64 tmp = tcg_temp_new_i64();

- /* Place bits 125-127 of vB in sh. */
+ /* Place bits 125-127 of vB in "sh". */
get_avr64(avrB, VB, false);
tcg_gen_andi_i64(sh, avrB, 0x07ULL);

/*
- * Save lowest sh bits of higher doubleword element of vA in variable
- * shifted and perform shift on higher doubleword.
+ * Save lowest "sh" bits of higher doubleword element of vA in variable
+ * "carry" and perform shift on higher doubleword.
*/
get_avr64(avrA, VA, true);
- tcg_gen_subfi_i64(tmp, 64, sh);
- tcg_gen_shl_i64(shifted, avrA, tmp);
- tcg_gen_andi_i64(shifted, shifted, 0xfe00000000000000ULL);
+ tcg_gen_subfi_i64(tmp, 32, sh);
+ tcg_gen_shli_i64(carry, avrA, 32);
+ tcg_gen_shl_i64(carry, carry, tmp);
tcg_gen_shr_i64(avrA, avrA, sh);
set_avr64(VT, avrA, true);
/*
* Perform shift on lower doubleword element of vA and replace highest
- * sh bits with shifted.
+ * "sh" bits with "carry".
*/
get_avr64(avrA, VA, false);
tcg_gen_shr_i64(avrA, avrA, sh);
- tcg_gen_or_i64(avrA, avrA, shifted);
+ tcg_gen_or_i64(avrA, avrA, carry);
set_avr64(VT, avrA, false);

tcg_temp_free_i64(avrA);
tcg_temp_free_i64(avrB);
tcg_temp_free_i64(sh);
- tcg_temp_free_i64(shifted);
+ tcg_temp_free_i64(carry);
tcg_temp_free_i64(tmp);
}

--
2.7.4
 

It looks to me that you do not need two variables 'avrA' and 'avrB' for manipulating source registers. That is because 'avrB' is used in the first stage of the handler, and 'avrA' in the second (non-overlapping) stage. So, one variable called let's say, 'avr' would be sufficient.

For similar reasons, variable 'tmp' is not needed, afaict. The code segment:

tcg_gen_subfi_i64(tmp, 32, sh);
tcg_gen_shli_i64(carry, avrA, 32);
tcg_gen_shl_i64(carry, carry, tmp);

could become:

tcg_gen_subfi_i64(sh, 32, sh);
tcg_gen_shli_i64(carry, avr, 32);
tcg_gen_shl_i64(carry, carry, sh);

and variable 'tmp' deleted.

The two variable eliminations decribed above should produce a small, but measurable performance gain.

Also, I see you used double quatation (") around instruction mnemonics (for example, "sh"). I think it is more common to use single quoutation mark (') in these cases.

Yours,
Aleksandar







 
reply via email to

[Prev in Thread] Current Thread [Next in Thread]