[lmi] [Fwd: Astonishingly, xmlwrapp has vanished; use libxml++ instead?]
From: Greg Chicares
Subject: [lmi] [Fwd: Astonishingly, xmlwrapp has vanished; use libxml++ instead?]
Date: Tue, 09 Aug 2005 17:00:06 +0000
User-agent: Mozilla Thunderbird 1.0.2 (Windows/20050317)
[We were talking about replacing xmlwrapp, whose author no longer supports
it and has actually taken it off his website, so that over time it will
become difficult to find the version that lmi uses. The libxml++ library
seems like a plausible candidate. Large census files take a long time to
load, so speed is important. In that context, Vadim sent me the following
email, which I quote here with his permission.]
Sorry for a potentially stupid question but why use DOM if speed is really
important? The quality of DOM implementations may vary but I'd be surprised
if even the fastest DOM model could rival a SAX library, especially
for big files. Also, according to everything I've heard, the fastest XML
parser is expat, not libxml2.
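To make the DOM-versus-SAX distinction concrete, here is a minimal sketch in Python, chosen only because its standard library exposes expat directly. The census-like document and its element names (census, cell, name, age) are made up for illustration and are not lmi's actual file format. The DOM version builds the whole tree before answering; the streaming version keeps only a counter.

```python
import xml.dom.minidom
import xml.parsers.expat

# Hypothetical census-like document; the element names are invented.
DOC = "<census>" + "".join(
    f"<cell><name>Insured {i}</name><age>{30 + i % 40}</age></cell>"
    for i in range(1000)
) + "</census>"


def count_cells_dom(text):
    """DOM style: build the whole tree in memory, then query it."""
    tree = xml.dom.minidom.parseString(text)
    return len(tree.getElementsByTagName("cell"))


def count_cells_stream(text):
    """SAX style: react to events as they arrive; no tree is kept."""
    count = 0

    def on_start(name, attrs):
        nonlocal count
        if name == "cell":
            count += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(text, True)  # True marks this as the final chunk
    return count


print(count_cells_dom(DOC), count_cells_stream(DOC))
```

Both functions give the same count; the difference is that the DOM version's memory use grows with the size of the document, which is where the speed concern for large files comes from.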
GC> An original goal for xmlwrapp was to be able to use other C xml libraries
GC> than libxml2, but I thought Peter Jones abandoned that goal after he'd
GC> discussed it on his xmlwrapp mailing list and no one seemed interested.
There is a project called Arabica which seems quite interesting from this
point of view: http://www.jezuk.co.uk/cgi-bin/view/arabica. It builds both
SAX and DOM APIs on top of any of expat, libxml, xerces or, under
Windows only, the MSXML parser (which is, BTW, known to be quite fast).
GC> Years ago, I think I looked into the available C libraries and formed
GC> an impression that nothing would be much faster than libxml2, though
GC> now I can't say how good my analysis was.
There are some benchmarks at http://xmlbench.sourceforge.net/index.php
which seem to support my claim about expat above. I.e. expat is the fastest
one, then libxml and, farther behind, xerces.
GC> But I doubt that a C++ wrapper has much effect on speed.
If we're to believe these benchmarking results (e.g. see
http://xmlbench.sourceforge.net/results/benchmark200402/index.html) it can:
expat is the fastest parser on its own but expat+arabica is by far the
slowest one. I'm quite surprised about this but I didn't want to spend time
on rerunning the benchmarks myself unless you're really interested in this.
[...]
Here is the result of this. I've looked at the 3 "classic" XML parsers:
expat (and a C++ wrapper for it), libxml and xerces, as well as another one
with a good reputation (TinyXML) and the already mentioned Arabica, which
builds on top of the 3 others. I've also included xmlwrapp for reference.
Here are the details of all these projects if you want to check something
yourself:
expat     http://expat.sourceforge.net/
expatpp   http://www.oofile.com.au/xml/expatpp.html
libxml    http://www.xmlsoft.org/
libxml++  http://libxmlplusplus.sourceforge.net/
xerces    http://xml.apache.org/xerces-c/index.html
TinyXML   http://www.grinninglizard.com/tinyxmldocs/index.html
Arabica   http://www.jezuk.co.uk/cgi-bin/view/arabica
Please use fixed width font and 4-space tabs to view the tables below:
[GWC replaced tabs with spaces when quoting this email]
Table 1: overview
Parser    Popularity  Debian  Used by   Last release  Activity
----------------------------------------------------------------------------
expat     very high   Yes     Python,   2005-01-28    moderate
                              Perl,
                              Mozilla
expatpp   very low                      2003-07-26    very low
libxml    high        Yes     GNOME     2005-07-10    high
libxml++  average     Yes               2005-02-13    low
xerces    high        Yes     Apache    2004-09-29    low
TinyXML   low                           2004-05-20    low
Arabica   very low                      2004-02-26    high
xmlwrapp  low                           2004-03-19    frozen
Table 2: technical comparison
Parser    Lang      Performance  Size    Portability  Features
----------------------------------------------------------------------------
expat     C         best         tiny    excellent    basic
expatpp   C++                            good [3]     basic
libxml    C         good         avg[2]  good         kitchen sink included
libxml++  C++       ????         ????    good [3]     same as above
xerces    "C+"[1]   good         big     excellent    extensive
TinyXML   C++       good         tiny    good         very basic
Arabica   C++       poor         avg     good [3]     extensive
xmlwrapp  C++       ????         avg     good         basic
Notes:
[1] "C+" means that they use the so-called portable subset of C++, i.e. no
exceptions, no templates, no STL -- more like "C with classes" than
modern C++
[2] libxml is not big under Unix but it builds into 1.5MB DLL under Windows
by default (the size can be reduced by almost half by omitting unneeded
options)
[3] untested, according to web site information only
My recommendations: if size is important at all (e.g. if there is any chance
of porting lmi to PDA-class devices (not joking, I seriously consider such a
possibility)) or if the speed is *really* important _and_ if only basic XML
parsing is required (and nothing more, like XPath, XML Schema and
validation, XPointer, XInclude &c), then expat would be the best choice.
It's robust, used in many, many high-profile projects, very fast and
extremely small. Unfortunately there is no decent C++ wrapper for it, as
expatpp appears to be abandoned and Arabica appears to perform very poorly.
So if you want to use it we'd need to write our own C++
wrapper, just as we did in wxWidgets for XRC.
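As a rough idea of how thin such a wrapper can be, here is a sketch in Python, whose standard library binds expat directly. The class names ExpatWrapper and AgeCollector and the sample document are hypothetical; a real wrapper for lmi would be C++ around expat's C API, and this is not the wxWidgets XRC code. The point is only the shape: one object owns the parser, forwards expat's callbacks to overridable methods, and turns parse errors into exceptions.

```python
import xml.parsers.expat


class ExpatWrapper:
    """Hypothetical thin wrapper: owns one expat parser and forwards its
    callbacks to overridable methods, so subclasses never touch raw expat."""

    def __init__(self):
        self._parser = xml.parsers.expat.ParserCreate()
        self._parser.buffer_text = True  # coalesce adjacent text chunks
        self._parser.StartElementHandler = self.start_element
        self._parser.EndElementHandler = self.end_element
        self._parser.CharacterDataHandler = self.characters

    # Default handlers do nothing; subclasses override what they need.
    def start_element(self, name, attrs):
        pass

    def end_element(self, name):
        pass

    def characters(self, text):
        pass

    def parse(self, text):
        """Parse a complete document, reporting errors as ValueError."""
        try:
            self._parser.Parse(text, True)
        except xml.parsers.expat.ExpatError as err:
            raise ValueError(f"XML error at line {err.lineno}: {err}") from err


class AgeCollector(ExpatWrapper):
    """Example client: collects the text of every <age> element."""

    def __init__(self):
        super().__init__()
        self.ages = []
        self._in_age = False

    def start_element(self, name, attrs):
        self._in_age = (name == "age")

    def end_element(self, name):
        self._in_age = False

    def characters(self, text):
        if self._in_age:
            self.ages.append(int(text))


collector = AgeCollector()
collector.parse(
    "<census><cell><age>45</age></cell><cell><age>52</age></cell></census>"
)
print(collector.ages)  # [45, 52]
```

A wrapper of this kind adds one indirect call per event, so it should not by itself explain a slowdown of the size reported for Arabica, though only a measurement would confirm that.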
Otherwise and by default, the best choice is libxml2: it's very fast,
widely used, supports just about everything and is very actively developed.
It's not clear how much all this applies to libxml++ but, at the very
least, it seems to be still developed and even though its popularity is
much lower than that of libxml itself (which is itself lower than expat)
it's still a successful project.
None of the other projects has any noticeable advantages. Certainly xerces
seems like a solid library and, being used by Apache, it can be supposed to
be well-engineered but it's really not a "real" C++ library (of course, the
same could be said about wxWidgets but then we luckily have fewer
competitors ;-). TinyXML doesn't seem to be much tinier than expat and I
don't see why we would use it. As for Arabica, it is an example of what I'd
have done myself as it seems the most elegant solution from the engineering
point of view (separate XML parsing itself from the SAX/DOM API, which can be
implemented on top of it) and, indeed, there is a possibility that one day
we do something like this in wxWidgets where we'd definitely support
multiple backends in plugins. But I don't think you're really interested in
being able to switch between XML parsing backends and so it hardly presents
enough advantages to offset the risk of relying on yet another not very
widely deployed library.
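The separation described here, with application code written against one event interface and the parser chosen behind it, can be sketched as follows. This is an illustration in Python, not Arabica's actual API; ElementCounter, parse_with_expat, parse_with_sax and BACKENDS are names invented for the example. Note also that Python's xml.sax uses expat underneath by default, so the two backends show the interface split rather than two genuinely different engines.

```python
import xml.parsers.expat
import xml.sax


class ElementCounter:
    """Application-side handler; it knows nothing about any parser."""

    def __init__(self):
        self.counts = {}

    def start(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1

    def end(self, name):
        pass


def parse_with_expat(text, handler):
    """Backend 1: drive the handler from raw expat callbacks."""
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = handler.start
    parser.EndElementHandler = handler.end
    parser.Parse(text, True)


def parse_with_sax(text, handler):
    """Backend 2: adapt xml.sax's ContentHandler to the same interface."""

    class Adapter(xml.sax.ContentHandler):
        def startElement(self, name, attrs):
            handler.start(name, dict(attrs.items()))

        def endElement(self, name):
            handler.end(name)

    xml.sax.parseString(text.encode("utf-8"), Adapter())


# The application picks a backend by name; the handler code is unchanged.
BACKENDS = {"expat": parse_with_expat, "sax": parse_with_sax}

DOC = "<census><cell/><cell/><cell/></census>"
for backend_name, parse in BACKENDS.items():
    counter = ElementCounter()
    parse(DOC, counter)
    print(backend_name, counter.counts)
```

Each backend reports the same counts for the same document; the cost of the flexibility is the extra adapter layer, which is the kind of overhead the benchmarks above suggest can matter.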
So after spending 2 hours exploring all the alternatives I can only
come up with 2 proposals: (a) use expat with our own C++ wrapper around it
or (b) do what you initially proposed and just go with libxml++.