From: Hans-Peter Nilsson
Subject: Re: [Rapp-dev] How-to vectorize functions in compute/generic (like rc_type_bin_to_u8 using RC_VEC_SETMASKV)
Date: Wed, 21 Mar 2012 07:13:59 +0100
> From: Hans-Peter Nilsson <address@hidden>
> Date: Mon, 19 Mar 2012 07:39:03 +0100
> There were no comments on the suggested new vector abstraction
> layer macros, so I took the liberty to change my mind.
> ...
> I also no
> longer think there should be separate binary-to- 16-bit-vector
> and 32-bit-vector conversion macros, but instead just
> sign-extension macros, extending from 8 to 16 and from 16 to 32.
> But that's for another day.
That day was today, or rather yesterday. The following backend
macros are now implemented on the git branch
topic/macros-for-integral, with support in the SSE2 and AltiVec
backends and, for the RC_VEC_{ADD,SUB}{16,32} macros only, in SWAR.
This is not final: whatever backend macros turn out to be needed for
integral images will be added, and those not needed will be removed.
diff --git a/compute/backend/rc_vec_api.h b/compute/backend/rc_vec_api.h
index c91da0d..597c551 100644
--- a/compute/backend/rc_vec_api.h
+++ b/compute/backend/rc_vec_api.h
@@ -530,6 +530,54 @@ typedef arch_vector_t rc_vec_t;
#define RC_VEC_ADDS(dstv, srcv1, srcv2)
/**
+ * Non-saturating addition, uint16_t elements.
+ * Computes dstv = srcv1 + srcv2 for each 16-bit field in two's
+ * complement truncating arithmetic (wrapping around at zero without
+ * exceptions).
+ *
+ * @param dstv The output vector.
+ * @param srcv1 The first input vector.
+ * @param srcv2 The second input vector.
+ */
+#define RC_VEC_ADD16(dstv, srcv1, srcv2)
+
+/**
+ * Non-saturating subtraction, uint16_t elements.
+ * Computes dstv = srcv1 - srcv2 for each 16-bit field in two's
+ * complement truncating arithmetic (wrapping around at zero without
+ * exceptions).
+ *
+ * @param dstv The output vector.
+ * @param srcv1 The first input vector.
+ * @param srcv2 The second input vector.
+ */
+#define RC_VEC_SUB16(dstv, srcv1, srcv2)
+
+/**
+ * Non-saturating addition, uint32_t elements.
+ * Computes dstv = srcv1 + srcv2 for each 32-bit field in two's
+ * complement truncating arithmetic (wrapping around at zero without
+ * exceptions).
+ *
+ * @param dstv The output vector.
+ * @param srcv1 The first input vector.
+ * @param srcv2 The second input vector.
+ */
+#define RC_VEC_ADD32(dstv, srcv1, srcv2)
+
+/**
+ * Non-saturating subtraction, uint32_t elements.
+ * Computes dstv = srcv1 - srcv2 for each 32-bit field in two's
+ * complement truncating arithmetic (wrapping around at zero without
+ * exceptions).
+ *
+ * @param dstv The output vector.
+ * @param srcv1 The first input vector.
+ * @param srcv2 The second input vector.
+ */
+#define RC_VEC_SUB32(dstv, srcv1, srcv2)
+
+/**
* Average value, truncated.
* Computes dstv = (srcv1 + srcv2) >> 1 for each 8-bit field.
*
@@ -769,6 +817,71 @@ typedef arch_vector_t rc_vec_t;
/*
* -------------------------------------------------------------
+ * Type conversions
+ * -------------------------------------------------------------
+ */
+
+/**
+ * @name Type conversions
+ * @{
+ */
+
+/**
+ * Sign-extend 8-bit vector fields into 16-bit vector fields.
+ * The most significant bit of each 8-bit vector field is replicated
+ * into eight more significant bits and the result is stored into the
+ * corresponding 16-bit field in a vector pair ldstv and rdstv.
+ * N.B. the type of the vectors is still rc_vec_t.
+ *
+ * @param ldstv The left-most part of the output vector.
+ * @param rdstv The right-most part of the output vector.
+ * @param srcv The input vector.
+ */
+#define RC_VEC_8S16(ldstv, rdstv, srcv)
+
+/**
+ * Sign-extend 16-bit vector fields into 32-bit vector fields.
+ * The most significant bit of each 16-bit vector field is replicated
+ * into 16 more significant bits and the result is stored into the
+ * corresponding 32-bit field in a vector pair ldstv and rdstv.
+ * N.B. the type of the vectors is still rc_vec_t.
+ *
+ * @param ldstv The left-most part of the output vector.
+ * @param rdstv The right-most part of the output vector.
+ * @param srcv The input vector.
+ */
+#define RC_VEC_16S32(ldstv, rdstv, srcv)
+
+/**
+ * Zero-extend 8-bit vector fields into 16-bit vector fields.
+ * Each 8-bit vector field is extended by eight zero bits and the
+ * result is stored into the corresponding 16-bit field in a vector
+ * pair ldstv and rdstv. N.B. the data type of the vectors is still
+ * rc_vec_t.
+ *
+ * @param ldstv The left-most part of the output vector.
+ * @param rdstv The right-most part of the output vector.
+ * @param srcv The input vector.
+ */
+#define RC_VEC_8U16(ldstv, rdstv, srcv)
+
+/**
+ * Zero-extend 16-bit vector fields into 32-bit vector fields.
+ * Each 16-bit vector field is extended by 16 zero bits and the result
+ * is stored into the corresponding 32-bit field in a vector pair ldstv
+ * and rdstv. N.B. the data type of the vectors is still rc_vec_t.
+ *
+ * @param ldstv The left-most part of the output vector.
+ * @param rdstv The right-most part of the output vector.
+ * @param srcv The input vector.
+ */
+#define RC_VEC_16U32(ldstv, rdstv, srcv)
+
+/* @} */
+
+
+/*
+ * -------------------------------------------------------------
* Reductions
* -------------------------------------------------------------
*/
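
To make the intended semantics of the extension macros concrete, here
is a scalar reference model of RC_VEC_8S16 and RC_VEC_8U16. It is a
sketch under assumptions: the vector is modelled as eight bytes in
memory order, "left-most" is taken to mean the fields at the lowest
addresses, and the names REF_VEC_BYTES, ref_8s16 and ref_8u16 are made
up for the illustration. The real macros operate on rc_vec_t and may
order the fields differently depending on the backend.

```c
#include <stdint.h>

/* Assumed vector size for the illustration: eight 8-bit fields. */
enum { REF_VEC_BYTES = 8, REF_HALF = REF_VEC_BYTES / 2 };

/* Sign-extend one 8-bit field: replicate bit 7 into bits 8..15. */
static uint16_t ref_sext8(uint8_t x)
{
    return (uint16_t)(x | ((x & 0x80) ? 0xFF00u : 0u));
}

/* Model of RC_VEC_8S16: the first half of src is widened into ldst,
 * the second half into rdst. */
static void ref_8s16(uint16_t ldst[REF_HALF], uint16_t rdst[REF_HALF],
                     const uint8_t src[REF_VEC_BYTES])
{
    for (int i = 0; i < REF_HALF; i++) {
        ldst[i] = ref_sext8(src[i]);
        rdst[i] = ref_sext8(src[REF_HALF + i]);
    }
}

/* Model of RC_VEC_8U16: same layout, but the upper eight bits of
 * every 16-bit field are zero. */
static void ref_8u16(uint16_t ldst[REF_HALF], uint16_t rdst[REF_HALF],
                     const uint8_t src[REF_VEC_BYTES])
{
    for (int i = 0; i < REF_HALF; i++) {
        ldst[i] = src[i];
        rdst[i] = src[REF_HALF + i];
    }
}
```

With the input bytes 0x00 0x7F 0x80 0xFF in the first half, ref_8s16
yields 0x0000 0x007F 0xFF80 0xFFFF in ldst, while ref_8u16 yields
0x0000 0x007F 0x0080 0x00FF. The 16-to-32-bit macros would be modelled
the same way with wider fields.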