
[lmi] Which is the best C++ wrapper for libxml2?


From: Greg Chicares
Subject: [lmi] Which is the best C++ wrapper for libxml2?
Date: Wed, 08 Nov 2006 04:27:01 +0000
User-agent: Thunderbird 1.5.0.4 (Windows/20060516)

Should we use the latest version of xmlwrapp instead of libxml++ ?

On 2005-8-9 17:00 UTC, Greg Chicares wrote:
[Was: Astonishingly, xmlwrapp has vanished; use libxml++ instead?]
> We were talking about replacing xmlwrapp, whose author no longer supports
> it and has actually taken it off his website, so that over time it will
> become difficult to find the version that lmi uses.

It hasn't really vanished: it lives on, though dormantly, here:
  http://sourceforge.net/projects/xmlwrapp/
as well as in *nix distros like:
  http://www.mirrorservice.org/sites/ftp.freebsd.org/pub/FreeBSD/distfiles/%5Bpage=159%5D

> The libxml++ library
> seems like a plausible candidate.

Now libxml++ seems to be dormant--see this message from a month ago:
  http://sourceforge.net/mailarchive/message.php?msg_id=37044159
This recent development vitiates our original rationale for replacing
xmlwrapp with libxml++, so now I'd like to reconsider that change.

Today, lmi HEAD builds with any of the following C++ wrappers (in the
lmi makefiles, they're selectable by setting $xml_wrapper as shown):
  libxml++-2.14.0   [make xml_wrapper=libxmlpp]
  xmlwrapp-0.2.0    [make xml_wrapper=xmlwrapp_0_2_0]
  xmlwrapp-0.5.0    [make xml_wrapper=xmlwrapp_0_5_0]
IOW, I've temporarily virtualized the C++ wrapper. Obviously there's
no value in keeping it virtualized, but it does
 - demonstrate that all of these libraries work with lmi
 - let us compare their speed [1]
 - permit API comparison: see files 'xml_lmi.cpp' and 'xmlpp_lmi.cpp'

They all seem to work correctly. One is about as fast as another.
Neither appears to be actively maintained, though xmlwrapp has been
more plainly stalled for a longer time. Both remain available on the
web. I think it comes down to choosing the better API, and I'd say
xmlwrapp is much better, because it's written in a more modern idiom
that's safer. One particular but not atypical example is given below
[2].

Furthermore, xmlwrapp wraps libxslt, whereas libxml++ does not.
We've already begun remedying that libxml++ deficiency in
'gnome-xml-branch', but should that effort continue when a C++
wrapper is already available and we haven't even evaluated it?

Please tell me if I'm wrong, because I'm thinking of switching to
xmlwrapp-0.5.0 (the latest), and I'd like to remove support for two
of these three libraries right away. I assume everyone agrees that
the older version 0.2.0 of xmlwrapp isn't the best choice.

This is not to say that the original change was a mistake. Rather,
the original rationale has been weakened, and it would be a mistake
not to reconsider the decision now. This is also not to say that the
effort has been wasted: we needed to build and use libxslt in any
event; we've migrated to the latest libxml2 (the version we'd been
using was four and a half years old); and the lmi code that uses the
C++ wrapper library has been cleaned up (or is about to be).

---------
[1] Here are some timings from a unit test. They read a '.ill' or
a small '.cns' file and write a duplicate file.

xmlwrapp-0.2.0
  'cns' io: [5.857e-002] 10 iterations took 585 milliseconds
  'ill' io: [1.961e-002] 10 iterations took 196 milliseconds
xmlwrapp-0.5.0
  'cns' io: [6.144e-002] 10 iterations took 614 milliseconds
  'ill' io: [2.063e-002] 10 iterations took 206 milliseconds
libxml++
  'cns' io: [5.223e-002] 10 iterations took 522 milliseconds
  'ill' io: [1.698e-002] 10 iterations took 169 milliseconds

Although libxml++ is about fifteen percent faster, I believe that's
largely because of the change committed 20061107T0451Z, which seems
to penalize xmlwrapp disproportionately. Restoring that one changed
snippet to the earliest "native" idiom for both libraries cuts the
libxml++ speed advantage in half.
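The "about fifteen percent" figure can be checked with a little arithmetic on the xmlwrapp-0.5.0 and libxml++ rows above. This is a small illustrative sketch: the names 'io_timing' and 'percent_time_saved' are invented here, and "faster" is taken to mean the share of xmlwrapp's run time that libxml++ saves.

```cpp
#include <cassert>

// Timings for 10 iterations, in milliseconds, copied from the unit-test
// output above. No new measurements are involved.
struct io_timing
{
    char const* file_type;   // "cns" or "ill"
    double xmlwrapp_ms;      // xmlwrapp-0.5.0
    double libxmlpp_ms;      // libxml++-2.14.0
};

// Percentage of xmlwrapp's run time that libxml++ saves:
//   100 * (xmlwrapp - libxml++) / xmlwrapp
double percent_time_saved(io_timing const& t)
{
    assert(0.0 < t.xmlwrapp_ms);
    return 100.0 * (t.xmlwrapp_ms - t.libxmlpp_ms) / t.xmlwrapp_ms;
}

io_timing const cns_timing = {"cns", 614.0, 522.0};
io_timing const ill_timing = {"ill", 206.0, 169.0};
```

With these numbers, 'percent_time_saved(cns_timing)' comes to about 15.0 and 'percent_time_saved(ill_timing)' to about 18.0, so the estimate fits the 'cns' case and slightly understates the 'ill' difference.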

And if we want to increase speed, we should probably try replacing
DOM with SAX, as Vadim pointed out long ago.

[2] Parse a file and get the DOM root node. Code to set libxml2
options (e.g., validation and entity substitution) suppressed.

xmlwrapp:

  try
    {
    xml::tree_parser parser(filename);
    if(parser)
      xml::node &root = parser.get_document().get_root_node();
    }
  catch(std::exception const&)
    {/* Report the error. */}

libxml++ :

  try
    {
    xmlpp::DomParser parser;
    parser.parse_file(filename);
    if(parser)
      const xmlpp::Node* pNode = parser.get_document()->get_root_node();
    }
  catch(std::exception const&)
    {/* Report the error. */}

In the libxml++ example, I believe that either of these
  parser.get_document()
  parser.get_document()->get_root_node()
can be NULL, so safe use of this library seems to require manual
testing of every pointer it returns--and many of its functions
return pointers. Such manual testing bloats code that uses the
library, and it's difficult to be sure that every case has been
tested unless all library calls are firewalled behind a wrapper.
It just doesn't seem right to have to wrap a C++ wrapper library
to add safety.
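To show what such a firewall might look like, here is a minimal sketch. The 'checked()' helper and the 'mock_*' types are hypothetical, made up for this illustration; they only loosely imitate the shape of the libxml++ calls in the example above and are not its real classes.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of libxml++ or lmi): turn a NULL pointer
// returned by a library call into an exception; otherwise return a
// reference, so callers never dereference an unchecked pointer.
template<typename T>
T& checked(T* p, char const* what)
{
    if(nullptr == p)
        {
        throw std::runtime_error(std::string("Unexpected NULL pointer from ") + what);
        }
    return *p;
}

// Stand-ins for a DOM parser whose accessors may return NULL.
struct mock_node {int id;};

struct mock_document
{
    mock_node* root;
    mock_node* get_root_node() const {return root;}
};

struct mock_parser
{
    mock_document* document;
    mock_document* get_document() const {return document;}
};

// Every pointer-returning call is routed through checked(): this is the
// extra layer that a well-designed C++ wrapper ought to make unnecessary.
mock_node& root_of(mock_parser const& parser)
{
    mock_document& d = checked(parser.get_document(), "get_document()");
    return checked(d.get_root_node(), "get_root_node()");
}
```

Each call site must still remember to route its pointer through 'checked()'; nothing in the types enforces it, which is why it's hard to be sure every case has been covered.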

I conclude from inspecting the code that xmlwrapp really does have
an object when it returns a reference, whereas libxml++'s pointers
really can be NULL. IOW, I did not see xmlwrapp hiding dereference
operations that could lead to segfaults.

Let me clearly distinguish this from cases like

  wxWindow *w = /* whatever */;
  wxSizer const* s = w->GetContainingSizer();
  if(NULL == s)
    {
    // This isn't an error. It isn't a library failure.
    // It isn't a misfeature. It doesn't mean that the
    // wxWindow object is invalid. It just means that
    // there happens to be no containing sizer.

    /* Do something appropriate. */
    }

where pointers make perfect sense in the interface, and also from
libraries like wxWindows that have legitimate design reasons never
to throw exceptions. A DOM parser that contains a NULL document
after parsing, in a library that throws exceptions, is a different
matter:

  if(NULL == parser.get_document())
    {
    // Now what? Parse it again? It's an error, so the
    // best we can do is to throw an exception; but the
    // library should do that, IMO.
    }
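By contrast, a library that reports failure by throwing can hand out references instead of pointers that might be NULL. The classes below are a hypothetical sketch of that interface style, not code from xmlwrapp, libxml++, or lmi, and the "parsing" is faked: the input string is simply taken as the root element's name.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

class sketch_node
{
  public:
    explicit sketch_node(std::string name) : name_(std::move(name)) {}
    std::string const& name() const {return name_;}
  private:
    std::string name_;
};

class sketch_document
{
  public:
    explicit sketch_document(std::string root_name)
        : root_(std::move(root_name))
    {}
    sketch_node const& get_root_node() const {return root_;}
  private:
    sketch_node root_; // A document always has a root node.
};

class sketch_parser
{
  public:
    explicit sketch_parser(std::string const& input)
        : document_(root_name_from(input))
    {}
    // No NULL case: a constructed parser always owns a document.
    sketch_document const& get_document() const {return document_;}
  private:
    // Fake "parse": empty input counts as malformed. Throwing here means
    // no sketch_parser object is ever created without a document.
    static std::string root_name_from(std::string const& input)
    {
        if(input.empty())
            {
            throw std::runtime_error("Parse error: no root element.");
            }
        return input;
    }
    sketch_document document_;
};
```

Because the constructor throws on failure, a 'sketch_parser' object cannot exist without a document, so callers have no NULL case to test.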




