Re: [lmi] overview of C++ expression template libraries
From: Greg Chicares
Subject: Re: [lmi] overview of C++ expression template libraries
Date: Tue, 30 Aug 2005 20:17:42 +0000
User-agent: Mozilla Thunderbird 1.0.2 (Windows/20050317)
On 2005-8-27 17:50 UTC, Vadim Zeitlin wrote:
>
[Motivation for using an expression template library: a classic paper says:]
>
> 95-99.5% efficiency of hand-coded C using this technique (for long
> vectors). The speed is 2-15 times that of a conventional C++ vector
> class.
>
> to which I'd like to add that using this technique also results in concise
> and clear source code, which is another important advantage.
For lmi, that's the definitive motivation. Here's an explanation that's
probably redundant for Vadim.
Consider lines [225,252] of this file:
ihs_mortal.cpp,v 1.13 2005/08/30 03:54:43
There are three statements, spanning nineteen lines, that mean
"multiply two vectors, and constrain the result to [0,1]"
and a fourth (nine lines) that applies a functor to each element.
Two different forms of std::transform() are used, and three custom functors,
along with std::bind1st and std::bind2nd. Four calls to transform() mean four
implicit loops, each with its own overhead that contemporary compilers may
not be able to optimize away. That overhead isn't just a matter of cycling
four integer counters instead of one: the real problem is many redundant
loads and stores to and from the floating-point unit, which is costly and
can impair accuracy.
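To make that concrete, here's a hypothetical sketch of the general shape
(invented names, simplified to three passes; it is not the actual
ihs_mortal.cpp code):

#include <algorithm>
#include <functional>
#include <vector>

// Placeholder for the real unary functor.
struct some_unary_functor
{
    double operator()(double x) const {return x;}
};

// Stands in for the custom clamping functors described above.
struct constrain_to_unit_interval
{
    double operator()(double x) const {return std::min(1.0, std::max(0.0, x));}
};

void sketch
    (std::vector<double> const& v0
    ,std::vector<double> const& v1
    ,std::vector<double>&       out
    )
{
    out.resize(v0.size());
    // One implicit loop to multiply the vectors elementwise...
    std::transform(v0.begin(), v0.end(), v1.begin(), out.begin(), std::multiplies<double>());
    // ...another to constrain each element to [0,1]...
    std::transform(out.begin(), out.end(), out.begin(), constrain_to_unit_interval());
    // ...and a third to apply a unary functor: three passes over the same
    // data, each with its own loads and stores.
    std::transform(out.begin(), out.end(), out.begin(), some_unary_functor());
}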
I don't think that code is really an abuse of STL. Sure, we could combine all
four functions into one monolithic Megafunctor, which sounds like something
in a Godzilla movie and may be similarly scary, or risible, depending on your
point of view. This would eliminate the redundant loop overhead. However,
Megafunctor would be so specialized that it wouldn't be reusable elsewhere.
And it would be extra work to unit test it, yet perilous not to.
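Purely for illustration, and building on the hypothetical sketch above, the
fused alternative would look something like this: one pass over the data, but
a one-off component with its own testing burden.

// One pass, but too specialized to reuse anywhere else.
struct megafunctor
{
    double operator()(double x, double y) const
    {
        double z = std::min(1.0, std::max(0.0, x * y));
        return some_unary_functor()(z);
    }
};

// std::transform(v0.begin(), v0.end(), v1.begin(), out.begin(), megafunctor());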
We could construct Megafunctor inline with standard binders and nonstandard
composers, as in the old boost compose library, but that can make code very
hard to read. That library was withdrawn because other boost libraries like
bind and lambda do a better job. They let us write code like
ihs_acctval.cpp,v 1.48 2005/08/28 14:09:02
lines [1299,1308] (I've simplified the variable names):
std::vector<double> vector1;
std::transform
    (vector0.begin()
    ,vector0.end() - year
    ,std::inserter(vector1, vector1.begin())
    ,boost::bind
        (some_unary_functor
        ,boost::bind(std::multiplies<double>(), _1, scalar0)
        )
    );
That eliminates the overhead of multiple transform() calls, but it's kind of
hard to read. All we really want to say is
vector1 = some_unary_functor(scalar0 * vector0);
and, with expression templates, we can potentially write it exactly that way,
and it'll just work.
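To show why that's plausible rather than magic, here's a bare-bones sketch of
the expression-template idea itself (my own toy code, not any library's actual
API): the overloaded operator builds a tiny expression object, and nothing
touches the data until the whole expression is assigned, in a single loop.

#include <cstddef>
#include <vector>

// Expression node for scalar * vector: stores the scalar and a reference to
// the vector, and computes element i only on demand.
struct scaled
{
    double s;
    std::vector<double> const& v;
    scaled(double s_, std::vector<double> const& v_) : s(s_), v(v_) {}
    double operator[](std::size_t i) const {return s * v[i];}
    std::size_t size() const {return v.size();}
};

inline scaled operator*(double s, std::vector<double> const& v)
    {return scaled(s, v);}

// Expression node for applying a unary function to another expression.
template<typename E>
struct applied
{
    double (*f)(double);
    E e;
    applied(double (*f_)(double), E const& e_) : f(f_), e(e_) {}
    double operator[](std::size_t i) const {return f(e[i]);}
    std::size_t size() const {return e.size();}
};

template<typename E>
applied<E> apply(double (*f)(double), E const& e) {return applied<E>(f, e);}

// Assignment is where the single fused loop finally runs.
template<typename E>
void assign(std::vector<double>& out, E const& e)
{
    out.resize(e.size());
    for(std::size_t i = 0; i < e.size(); ++i) {out[i] = e[i];}
}

// Given a free function 'double some_unary_functor(double)', usage is
//   assign(vector1, apply(some_unary_functor, scalar0 * vector0));
// which runs one loop and creates no temporary vectors. A real library adds
// enough operator machinery that it reads exactly as
//   vector1 = some_unary_functor(scalar0 * vector0);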
The hand-coded C alternative
for(int j = 1; i <= vector0.size; ++j)
vector1[j] = some_unary_functor(scalar0[j] * vector0[j]);
is not much harder to write, but it's harder to write correctly. To
demonstrate that, I deliberately introduced five errors. A compiler can
find two of them, or maybe three depending on the context. Yet it can't
find the others: one just gives the wrong answer; but the other causes
a segfault if we're very lucky, or might instead cause a sporadic,
difficult-to-reproduce error in some apparently unrelated calculation.
Those are the problems that any of the other general approaches can
detect and prevent just by enabling an appropriate error-checking mode.
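For reference, here is my reading of that loop with the deliberate errors
removed (one correct rendering, assuming vector1 has already been sized):

for(std::vector<double>::size_type j = 0; j < vector0.size(); ++j)
    vector1[j] = some_unary_functor(scalar0 * vector0[j]);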
> Web search found the following candidate libraries (in no particular
> order):
>
> 1. Blitz++
> 2. MET (Matrix Expression Templates)
> 3. PETE (Portable Expression Template Engine)
> 4. POOMA (Parallel Object-Oriented Methods and Applications)
> 5. GET (Generic expression templates)
> 6. FC++ (Functional Programming in C++)
> 7. uBLAS (? Basic Linear Algebra Subprograms)
> 8. boost::mpl (Meta Programming Library)
> 9. MTL (Matrix Template Library)
>
>
> General information:
>
> Docs Licence URL
> ----------------------------------------------------------------------------
> Blitz++ good+ GPL http://oonumerics.org/blitz/
> MET poor GPL http://met.sourceforge.net/
> PETE poor ??? http://acts.nersc.gov/pete/
> POOMA good- GPL http://www.nongnu.org/freepooma/
> GET poor PD http://www.codeproject.com/cpp/expressiontemplates.asp
> FC++ good BSD(?) http://www.cc.gatech.edu/%7Eyannis/fc++/
> uBLAS good+ Boost http://www.boost.org/libs/numeric/ublas/doc/
> mpl good Boost http://www.boost.org/libs/mpl/doc/
> MTL good+ Art. http://www.osl.iu.edu/research/mtl/
>
>
> Activity/popularity:
>
> Last release Web update ML activity Google Debian
> ----------------------------------------------------------------------------
> Blitz++ Nov 2004 Jun 2005 medium low 33000 Y
> MET Feb 2001 none ? N
> PETE ? none 500 N
> POOMA Dec 2004 Feb 2005 very low 6300 N
> GET Jan 2005 none 1 N
> FC++ Aug 2003 Sep 2003 unaccessible 700 N
> uBLAS Jul 2005 2002(?) 25000 Y
> mpl Nov 2004 90000 Y
> MTL Jul 2005 Jul 2005 unaccessible 9260 N
>
>
> We can forget "GET" which is just an example of the code which could be
> used as a base for our own implementation, nothing more. We can also
> discard MET (unmaintained, doesn't compile under Windows)
Agreed.
> and PETE (doesn't
> exist on its own any more, part of POOMA). Next I excluded POOMA because it
> seems to be much, much more than what we need and is not particularly easy
> to use. I also have concerns about its activity.
It appears that PETE might be easy to pluck out of POOMA. It has its own
directory in their cvs at nongnu.org:
[Sources] / freepooma / freepooma / src / PETE
which looks very similar to the pete-2.1.0 archive I downloaded five years
ago. Maybe I'm just getting sentimental because it looked so nice to me then,
but it's only about a hundred kilobytes, and simplicity might be an advantage.
> Further, both FC++ and
> boost::mpl do many interesting things but nothing directly relevant to the
> problem at hand.
Yes, and maintenance of FC++ was undecided last time I looked, and I must say
its code needs a serious overhaul. I agree that we should rule those out.
> Finally, all of Blitz++, uBLAS and MTL are clearly geared
> towards scientific computing exclusively. On its own this is not a problem,
> of course, but I'd expect them to be optimized for working with huge
> matrices and not (relatively) small vectors as is the goal here. In
IIRC, Blitz++'s TinyVector is recommended for fixed-size arrays with about ten
elements. Typically, lmi uses fifty or a hundred elements, but the number of
elements is not fixed. It probably falls into the zone optimized least well
by Blitz++, and other libraries too. I think that Blitz++'s TinyVector
approach may be a poor fit: it unrolls everything, IIRC, and
a[ 0] = b[ 0] - c[ 0] * d[ 0];
a[ 1] = b[ 1] - c[ 1] * d[ 1];
...
a[99] = b[99] - c[99] * d[99];
might choke the compiler; but we could test that.
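For completeness: lmi's runtime-sized vectors would presumably map to
blitz::Array<double,1> rather than TinyVector. Assuming I remember the API
correctly, the code would read like this (untested):

#include <blitz/array.h>

void sketch()
{
    int const n = 100;
    blitz::Array<double,1> a(n), b(n), c(n), d(n);
    b = 1.0; c = 2.0; d = 3.0;  // scalar assignment fills every element
    // Evaluated elementwise in a single loop via expression templates,
    // with no temporary arrays.
    a = b - c * d;
}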
To choose the best library, we'll need to see how it performs on operations
typical for lmi. I'll put together a little test suite of vector operations
that we can code for each of a small number of libraries (probably about two
based on your analysis below). I've already got a framework that'll measure
how long they take to run. We'll also want to measure compile time.
Furthermore, we should consider what error-detection facilities each final
candidate provides. Trying to add two vectors of different lengths is an
error that I'd like to be detected automatically, even if only in a
nondefault 'debug' mode.
> addition, it turned out finally that MTL doesn't use expression templates
> yet and so would have poor performance.
I agree that we should rule out MTL. Its authors claim high performance,
but I'd guess that doesn't apply to typical lmi usage. Worse, MTL doesn't
seem to use operator overloading. I don't want to write 'add' or 'mult':
'+' and '*' are much better. The most important thing for lmi is that the
code be concise and clear, as you noted above. This
vector1 = some_unary_functor(scalar0 * vector0);
is much more readable and maintainable than
vector1 = some_unary_functor(mult(scalar0, vector0));
(Is it MTL's 'mult', 'mult_', or 'scale'? I don't want to have to think
about things like that.)
> So the choice is ultimately between
> Blitz++ and uBLAS and between them I think I prefer the latter because it's
> more recent and, well, being part of boost doesn't hurt.
I'm not sure Blitz++ is really being actively maintained; I agree that we
should rule it out, because uBLAS seems preferable.
> So our possibilities are:
>
> - use uBLAS (or Blitz++ if this is preferable for some reason)
Let's consider only uBLAS in this category. It's two megabytes of headers, and
that seems like an awful lot. That's why I'm thinking PETE may deserve
reconsideration. But I've looked over the uBLAS documentation and these guys
really seem to know what they're doing; and maybe we would actually use only a
small part of it. We've already got other dependencies on boost. This library
is implemented in headers only, as you note below, which makes our life easier.
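Here's a minimal sketch of how an lmi-style calculation might read with uBLAS
(I haven't benchmarked or profiled this; it's just to show the notation):

#include <boost/numeric/ublas/vector.hpp>

namespace ublas = boost::numeric::ublas;

void sketch()
{
    ublas::vector<double> v0(100), v1(100), out(100);
    // ... fill v0 and v1 ...
    double scalar0 = 1.05;
    // The right-hand side is an expression object; uBLAS evaluates it in a
    // single pass when it is assigned, without temporary vectors.
    out = scalar0 * v0 + v1;
}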
> - write a simple ET library ourselves (possible using GET code)
I'd hesitate to do that.
> - just keep the current code but simplify it by using boost::lambda, e.g.
>
> transform(v1.begin(), v1.end(), v2.begin(), output.begin(),
> min(1., max(0., _1 + _2))
That seems better than the similar lines [225,243] of
ihs_mortal.cpp,v 1.13 2005/08/30 03:54:43
but I'd much rather have
output = min(1., max(0., v1 + v2));
> I honestly don't know what to recommend. uBLAS is nice but it's a big and
> rather complicated library (but it's fully template and hence inline). OTOH
Let's call it a candidate worthy of testing.
> redoing the work ourselves is not particularly appealing neither and while
> it wouldn't take that long it still feels like time lost.
Neither of us is looking for a thesis project, I guess.
We could perhaps fall back on valarray. I have at least two very different
implementations of it that we could test. It's generally regarded as not
the best design, but it's in the standard. I wouldn't bring it up if we
had lots of other candidates.
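For completeness, the same sort of expression in valarray terms, which at
least shows the notation we're after (a sketch only; performance would depend
heavily on the implementation):

#include <cmath>
#include <valarray>

double some_unary_functor(double x) {return std::sqrt(x);}  // placeholder

void sketch()
{
    std::valarray<double> vector0(0.25, 100);  // 100 elements, all 0.25
    double scalar0 = 2.0;
    // operator* is elementwise; apply() maps a unary function over the result.
    std::valarray<double> vector1 = (scalar0 * vector0).apply(some_unary_functor);
}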
> So probably I'd
> just use boost::lambda to make using std algorithms less painful. It should
> have decent performance (I could test it...) but wouldn't result in the
> clearest possible code.
"Less painful" is an improvement, but I'm not yet willing to let go of the
dream: no pain, and "the clearest possible code".
Let me sketch out a test suite. I uploaded 'vector_test.cpp' to cvs the
other day. It's just the beginning of an investigation I started years ago
but never finished. I think I can adapt it to help test typical lmi
operations with various techniques.
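In the meantime, here's the sort of minimal timing harness I have in mind (a
sketch, independent of 'vector_test.cpp'; the names in the usage comment are
hypothetical):

#include <ctime>
#include <iostream>

// Time a nullary operation by running it many times and averaging.
template<typename Operation>
double seconds_per_call(Operation op, int repetitions)
{
    std::clock_t start = std::clock();
    for(int i = 0; i < repetitions; ++i) {op();}
    std::clock_t stop = std::clock();
    return (stop - start) / static_cast<double>(CLOCKS_PER_SEC) / repetitions;
}

// Usage: wrap each candidate library's version of an lmi-typical operation
// (e.g. "vector1 = some_unary_functor(scalar0 * vector0)" on a 100-element
// vector) in a functor, and compare:
//   std::cout << seconds_per_call(stl_version()  , 100000) << '\n';
//   std::cout << seconds_per_call(ublas_version(), 100000) << '\n';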