


From: Hans-Peter Nilsson
Subject: Re: [Rapp-dev] How-to vectorize functions in compute/generic (like rc_type_bin_to_u8 using RC_VEC_SETMASKV)
Date: Wed, 21 Mar 2012 07:13:59 +0100

> From: Hans-Peter Nilsson <address@hidden>
> Date: Mon, 19 Mar 2012 07:39:03 +0100

> There were no comments on the suggested new vector abstraction
> layer macros, so I took the liberty to change my mind.
> ...
> I also no
> longer think there should be separate binary-to- 16-bit-vector
> and 32-bit-vector conversion macros, but instead just
> sign-extension macros, extending from 8 to 16 and from 16 to 32.
> But that's for another day.

That day was today, or rather yesterday.  The following backend
macros are now implemented on the git branch
topic/macros-for-integral, with support in SSE2, Altivec and
(for the RC_VEC_{ADD,SUB}{16,32} macros only) SWAR.  This is not
final: whatever backend macros the integral images need will be
added, and those that turn out not to be needed will be removed.
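
As an illustration of what a SWAR backend has to do for the
non-saturating 16-bit macros, here is a minimal sketch.  It is not
the code on the branch: the names swar_add16, swar_sub16 and
SWAR_H16 are hypothetical, and a 64-bit word holding four 16-bit
fields is assumed.  The point is only that carries and borrows must
not cross field boundaries, so the top bit of each field is handled
separately.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, not taken from the RAPP sources. */
#define SWAR_H16 UINT64_C(0x8000800080008000) /* top bit of each field */

/* Add four packed 16-bit fields, each wrapping modulo 2^16. */
static uint64_t swar_add16(uint64_t a, uint64_t b)
{
    /* Add the low 15 bits of each field; a carry stops at bit 15. */
    uint64_t low = (a & ~SWAR_H16) + (b & ~SWAR_H16);
    /* XOR in the top bits, dropping the carry out of each field. */
    return low ^ ((a ^ b) & SWAR_H16);
}

/* Subtract four packed 16-bit fields, each wrapping modulo 2^16. */
static uint64_t swar_sub16(uint64_t a, uint64_t b)
{
    /* Set the top bit of each minuend field so a borrow stops there. */
    uint64_t low = (a | SWAR_H16) - (b & ~SWAR_H16);
    /* The top bit now holds "no borrow"; turn it into a ^ b ^ borrow. */
    return low ^ ((a ^ ~b) & SWAR_H16);
}

/* Field-by-field reference used to cross-check the SWAR addition. */
static uint64_t scalar_add16(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint16_t x = (uint16_t)(a >> (16 * i));
        uint16_t y = (uint16_t)(b >> (16 * i));
        r |= (uint64_t)(uint16_t)(x + y) << (16 * i);
    }
    return r;
}

/* Small self-check: fields wrap independently of their neighbours. */
static void swar16_selfcheck(void)
{
    uint64_t a = UINT64_C(0xFFFF000180007FFF);
    uint64_t b = UINT64_C(0x0001000180000001);
    assert(swar_add16(a, b) == scalar_add16(a, b));
    assert(swar_sub16(0, 1) == UINT64_C(0x000000000000FFFF));
}
```

The 32-bit variants would follow the same pattern with a mask of
0x8000000080000000.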

diff --git a/compute/backend/rc_vec_api.h b/compute/backend/rc_vec_api.h
index c91da0d..597c551 100644
--- a/compute/backend/rc_vec_api.h
+++ b/compute/backend/rc_vec_api.h
@@ -530,6 +530,54 @@ typedef arch_vector_t rc_vec_t;
 #define RC_VEC_ADDS(dstv, srcv1, srcv2)
 
 /**
+ *  Non-saturating addition, uint16_t elements.
+ *  Computes dstv = srcv1 + srcv2 for each 16-bit field in two's
+ *  complement truncating arithmetic (wrapping around at zero without
+ *  exceptions).
+ *
+ *  @param dstv   The output vector.
+ *  @param srcv1  The first input vector.
+ *  @param srcv2  The second input vector.
+ */
+#define RC_VEC_ADD16(dstv, srcv1, srcv2)
+
+/**
+ *  Non-saturating subtraction, uint16_t elements.
+ *  Computes dstv = srcv1 - srcv2 for each 16-bit field in two's
+ *  complement truncating arithmetic (wrapping around at zero without
+ *  exceptions).
+ *
+ *  @param dstv   The output vector.
+ *  @param srcv1  The first input vector.
+ *  @param srcv2  The second input vector.
+ */
+#define RC_VEC_SUB16(dstv, srcv1, srcv2)
+
+/**
+ *  Non-saturating addition, uint32_t elements.
+ *  Computes dstv = srcv1 + srcv2 for each 32-bit field in two's
+ *  complement truncating arithmetic (wrapping around at zero without
+ *  exceptions).
+ *
+ *  @param dstv   The output vector.
+ *  @param srcv1  The first input vector.
+ *  @param srcv2  The second input vector.
+ */
+#define RC_VEC_ADD32(dstv, srcv1, srcv2)
+
+/**
+ *  Non-saturating subtraction, uint32_t elements.
+ *  Computes dstv = srcv1 - srcv2 for each 32-bit field in two's
+ *  complement truncating arithmetic (wrapping around at zero without
+ *  exceptions).
+ *
+ *  @param dstv   The output vector.
+ *  @param srcv1  The first input vector.
+ *  @param srcv2  The second input vector.
+ */
+#define RC_VEC_SUB32(dstv, srcv1, srcv2)
+
+/**
  *  Average value, truncated.
  *  Computes dstv = (srcv1 + srcv2) >> 1 for each 8-bit field.
  *
@@ -769,6 +817,71 @@ typedef arch_vector_t rc_vec_t;
 
 /*
  * -------------------------------------------------------------
+ *  Type conversions
+ * -------------------------------------------------------------
+ */
+
+/**
+ *  @name Type conversions
+ *  @{
+ */
+
+/**
+ *  Sign-extend 8-bit vector fields into 16-bit vector fields.
+ *  The most significant bit of each 8-bit vector field is replicated
+ *  into eight more significant bits and the result is stored into the
+ *  corresponding 16-bit field in a vector pair ldstv and rdstv.
+ *  N.B. the type of the vectors is still rc_vec_t.
+ *
+ *  @param ldstv  The left-most part of the output vector.
+ *  @param rdstv  The right-most part of the output vector.
+ *  @param srcv   The input vector.
+ */
+#define RC_VEC_8S16(ldstv, rdstv, srcv)
+
+/**
+ *  Sign-extend 16-bit vector fields into 32-bit vector fields.
+ *  The most significant bit of each 16-bit vector field is replicated
+ *  into 16 more significant bits and the result is stored into the
+ *  corresponding 32-bit field in a vector pair ldstv and rdstv.
+ *  N.B. the type of the vectors is still rc_vec_t.
+ *
+ *  @param ldstv  The left-most part of the output vector.
+ *  @param rdstv  The right-most part of the output vector.
+ *  @param srcv   The input vector.
+ */
+#define RC_VEC_16S32(ldstv, rdstv, srcv)
+
+/**
+ *  Zero-extend 8-bit vector fields into 16-bit vector fields.
+ *  Each 8-bit vector field is extended by eight zero bits and the
+ *  result is stored into the corresponding 16-bit field in a vector
+ *  pair ldstv and rdstv. N.B. the data type of the vectors is still
+ *  rc_vec_t.
+ *
+ *  @param ldstv  The left-most part of the output vector.
+ *  @param rdstv  The right-most part of the output vector.
+ *  @param srcv   The input vector.
+ */
+#define RC_VEC_8U16(ldstv, rdstv, srcv)
+
+/**
+ *  Zero-extend 16-bit vector fields into 32-bit vector fields.
+ *  Each 16-bit vector field is extended by 16 zero bits and the result
+ *  is stored into the corresponding 32-bit field in a vector pair ldstv
+ *  and rdstv. N.B. the data type of the vectors is still rc_vec_t.
+ *
+ *  @param ldstv  The left-most part of the output vector.
+ *  @param rdstv  The right-most part of the output vector.
+ *  @param srcv   The input vector.
+ */
+#define RC_VEC_16U32(ldstv, rdstv, srcv)
+
+/* @} */
+
+
+/*
+ * -------------------------------------------------------------
  *  Reductions
  * -------------------------------------------------------------
  */
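
To make the intended semantics of the 8-to-16-bit extension macros
concrete, here is a scalar reference sketch.  It is an illustration
only, not backend code: the names ref_8s16, ref_8u16 and NFIELDS
are hypothetical, an 8-field source vector is assumed, and
"left-most" is assumed to mean the fields that come first in array
order, which the documentation above does not spell out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: number of 8-bit fields in one source vector. */
#define NFIELDS 8

/* Sign-extend one 8-bit field: copy bit 7 into bits 8..15. */
static uint16_t sext8(uint8_t v)
{
    return (v & 0x80u) ? (uint16_t)(0xFF00u | v) : (uint16_t)v;
}

/* Scalar model of RC_VEC_8S16: one source vector fills two outputs. */
static void ref_8s16(uint16_t *ldst, uint16_t *rdst, const uint8_t *src)
{
    assert(NFIELDS % 2 == 0);
    for (size_t i = 0; i < NFIELDS / 2; i++) {
        ldst[i] = sext8(src[i]);               /* assumed "left" half  */
        rdst[i] = sext8(src[i + NFIELDS / 2]); /* assumed "right" half */
    }
}

/* Scalar model of RC_VEC_8U16: the upper eight bits are zero. */
static void ref_8u16(uint16_t *ldst, uint16_t *rdst, const uint8_t *src)
{
    assert(NFIELDS % 2 == 0);
    for (size_t i = 0; i < NFIELDS / 2; i++) {
        ldst[i] = src[i];
        rdst[i] = src[i + NFIELDS / 2];
    }
}
```

With these, a field holding 0x80 becomes 0xFF80 under ref_8s16 and
0x0080 under ref_8u16.  The 16-to-32-bit macros would be modelled
the same way with wider types.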


