[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: float matrix multiplication problem
From: |
Jaroslav Hajek |
Subject: |
Re: float matrix multiplication problem |
Date: |
Thu, 14 Jan 2010 09:31:26 +0100 |
On Thu, Jan 14, 2010 at 9:18 AM, John W. Eaton <address@hidden> wrote:
> Using the current sources, running Octave with valgrind, and given
>
> x = single (rand (4, 4) + i*rand (4, 4));
>
> then the expression
>
> x'*x;
>
> generates the following diagnostics:
>
> ==12836== Conditional jump or move depends on uninitialised value(s)
> ==12836== at 0x681224F: bool mx_inline_all_real<float>(unsigned long,
> std::complex<float> const*) (mx-inlines.cc:214)
> ==12836== by 0x6826504: FloatComplexNDArray::all_elements_are_real()
> const (fCNDArray.cc:536)
> ==12836== by 0x567F587:
> octave_float_complex_matrix::try_narrowing_conversion() (ov-flt-cx-mat.cc:83)
> ==12836== by 0x56BF551: octave_value::maybe_mutate() (ov.cc:1148)
> ==12836== by 0x56CD9B3: octave_value::octave_value(FloatComplexMatrix
> const&, MatrixType const&) (ov.cc:689)
> ==12836== by 0x57C7A79: oct_binop_herm_mul(octave_base_value const&,
> octave_base_value const&) (op-fcm-fcm.cc:137)
> ==12836== by 0x56C50E0: do_binary_op(octave_value::compound_binary_op,
> octave_value const&, octave_value const&) (ov.cc:2103)
> ==12836== by 0x574E833: tree_compound_binary_expression::rvalue1(int)
> (pt-cbinop.cc:212)
> ==12836== by 0x5755FB4: tree_evaluator::visit_statement(tree_statement&)
> (pt-eval.cc:689)
> ==12836== by 0x57728F3: tree_statement::accept(tree_walker&)
> (pt-stmt.cc:152)
> ==12836== by 0x5755DBB:
> tree_evaluator::visit_statement_list(tree_statement_list&) (pt-eval.cc:725)
> ==12836== by 0x577291F: tree_statement_list::accept(tree_walker&)
> (pt-stmt.cc:216)
>
> As far as I can tell, this is happening at the point where the result
> from xgemm is being stuffed into an octave_value object and
> maybe_mutate is called to try to narrow the result.
>
> I don't understand why something would be uninitialized here, but the
> following change avoids the problem for me:
>
> diff --git a/liboctave/fCMatrix.cc b/liboctave/fCMatrix.cc
> --- a/liboctave/fCMatrix.cc
> +++ b/liboctave/fCMatrix.cc
> @@ -3769,7 +3769,7 @@
> {
> octave_idx_type lda = a.rows ();
>
> - retval = FloatComplexMatrix (a_nr, b_nc);
> + retval = FloatComplexMatrix (a_nr, b_nc, 0.0);
> FloatComplex *c = retval.fortran_vec ();
>
> const char *ctra = get_blas_trans_arg (tra, cja);
>
> I wouldn't think this should be necessary, because I thought CHERK
> and the following code to fill in the lower part of the symmetric
> matrix would properly initialize everything.
>
> jwe
>
You may have hit the nail on the head. The problem is the following
statement in the ZHERK docs of reference BLAS:
* Note that the imaginary parts of the diagonal elements need
* not be set, they are assumed to be zero, and on exit they
* are set to zero.
Probably ATLAS does not do the latter. I think it's an ATLAS bug,
then. It is not needed to initialize the whole Matrix with zeros, but
what about something like the attached? This won't have a noticeable
performance impact:
diff --git a/liboctave/CMatrix.cc b/liboctave/CMatrix.cc
--- a/liboctave/CMatrix.cc
+++ b/liboctave/CMatrix.cc
@@ -3789,8 +3789,12 @@
F77_CHAR_ARG_LEN (1)
F77_CHAR_ARG_LEN (1)));
for (int j = 0; j < a_nr; j++)
- for (int i = 0; i < j; i++)
- retval.xelem (j,i) = std::conj (retval.xelem (i,j));
+ {
+ for (int i = 0; i < j; i++)
+ retval.xelem (j,i) = std::conj (retval.xelem (i,j));
+ // FIXME: In fact, ZHERK should do this, but some
BLASes miss it.
+ retval.xelem (j,j) = real (retval.xelem (j,j));
+ }
}
else
{
diff --git a/liboctave/fCMatrix.cc b/liboctave/fCMatrix.cc
--- a/liboctave/fCMatrix.cc
+++ b/liboctave/fCMatrix.cc
@@ -3782,8 +3782,12 @@
F77_CHAR_ARG_LEN (1)
F77_CHAR_ARG_LEN (1)));
for (int j = 0; j < a_nr; j++)
- for (int i = 0; i < j; i++)
- retval.xelem (j,i) = std::conj (retval.xelem (i,j));
+ {
+ for (int i = 0; i < j; i++)
+ retval.xelem (j,i) = std::conj (retval.xelem (i,j));
+ // FIXME: In fact, CHERK should do this, but some
BLASes miss it.
+ retval.xelem (j,j) = real (retval.xelem (j,j));
+ }
}
else
{
--
RNDr. Jaroslav Hajek, PhD
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz