Re: [lmi] Ledger::write() -- adding DOCTYPE support to xmlwrapp?


From: Vaclav Slavik
Subject: Re: [lmi] Ledger::write() -- adding DOCTYPE support to xmlwrapp?
Date: Sat, 13 Dec 2008 16:36:31 +0100

Hi,

On Fri, 2008-12-12 at 02:55 +0000, Greg Chicares wrote:
> This comment in 'input_xml_io.cpp' may suggest a useful enhancement:
> 
> // XMLWRAPP !! The unit test demonstrates that the suppressed code is
> // twenty-five percent slower. What would be really desirable is an
> // (efficient) element-iterator class.

I.e. something like xml::node::find(), but not limited to a single element
name. Yes, I can see how that would be very useful.

>     xml::node::const_iterator child;
>     for(child = x.begin(); child != x.end(); ++child)
>         {
> [...]
>         if(child->is_text())
>             {
>             continue;
>             }
> 
> Skipping an iteration if iterator.is_text() is true seemed to be a
> common idiom in the xmlwrapp sample programs. 

It is incorrect if what you want is to iterate over _elements_, though
-- it skips text nodes, but doesn't skip comments or more exotic nodes
like PIs, CDATA sections, unexpanded entities and so on.
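
To make the difference concrete, here is a minimal sketch. It uses a toy
node model (toy_node and node_kind are hypothetical stand-ins invented
for illustration, not xmlwrapp types) and compares the "skip text
nodes" idiom with a filter that accepts elements only:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an XML node; this is NOT the xmlwrapp API.
enum class node_kind { element, text, comment, cdata, pi };

struct toy_node
{
    node_kind   kind;
    std::string value; // element name, or text/comment/PI content
};

// The idiom from the sample programs: skip text nodes, visit the rest.
std::size_t count_skipping_text(const std::vector<toy_node>& children)
{
    std::size_t n = 0;
    for(const toy_node& c : children)
        {
        if(node_kind::text == c.kind)
            {
            continue;
            }
        ++n; // Comments, PIs, etc. are counted here too.
        }
    return n;
}

// What an element iterator would do: visit elements and nothing else.
std::size_t count_elements(const std::vector<toy_node>& children)
{
    std::size_t n = 0;
    for(const toy_node& c : children)
        {
        if(node_kind::element == c.kind)
            {
            ++n;
            }
        }
    return n;
}
```

The two functions agree only as long as the input contains nothing but
elements and text; a single comment between two elements is enough to
make them differ.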

> I looked into this
> once, years ago, and IIRC it seemed that the iterator was accessing
> data below the xml entity level. For example, again IIRC, given
>   <foo>Hello</foo>
> the iterator would reach "Hello". That's just my recollection, and
> I might misunderstand it completely, but there's *some* reason why
> that odd 'continue' idiom is needed.

If what you mean (as I think you do) is that "Hello" would be reached
when iterating over the children of <root> in this example:

  <root>
    <foo>Hello</foo>
    <bar>...</bar>
  </root>

(as opposed to iterating over <foo>'s children, where reaching its only
child, text node "Hello", is entirely expected), then I believe the
problem was something else:

Unless you configure libxml2's parser to ignore all whitespace-only text
nodes, it exposes them in the iterator, and so the list of <root>'s
children will contain a "\n  " text node, the <foo> element, another
"\n  " text node, the <bar> element and finally a "\n" text node. This is
in accordance with the XML DOM spec, because whitespace is significant in
XML. But it can be quite inconvenient, which is why libxml2 has an option
to filter these out.
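
As a rough illustration of what that filtering amounts to, the sketch
below models <root>'s children from the example above and drops the
whitespace-only text nodes. The types and function names are made up
for this message and are neither xmlwrapp nor libxml2 API. If I recall
correctly, the real switch in libxml2 is xmlKeepBlanksDefault(0) or the
XML_PARSE_NOBLANKS parser option, but treat that as a pointer to check
rather than a recipe.

```cpp
#include <string>
#include <vector>

// Hypothetical toy model; not xmlwrapp or libxml2 types.
enum class node_kind { element, text };

struct toy_node
{
    node_kind   kind;
    std::string value; // element name or text content
};

// True for text nodes that consist of whitespace only.
bool is_blank_text(const toy_node& n)
{
    return node_kind::text == n.kind
        && std::string::npos == n.value.find_first_not_of(" \t\r\n");
}

// Children of <root> as the parser exposes them by default.
std::vector<toy_node> root_children_as_parsed()
{
    return
        {{node_kind::text,    "\n  "}
        ,{node_kind::element, "foo"}
        ,{node_kind::text,    "\n  "}
        ,{node_kind::element, "bar"}
        ,{node_kind::text,    "\n"}
        };
}

// The same list with whitespace-only text nodes filtered out.
std::vector<toy_node> without_blank_text(const std::vector<toy_node>& children)
{
    std::vector<toy_node> kept;
    for(const toy_node& c : children)
        {
        if(!is_blank_text(c))
            {
            kept.push_back(c);
            }
        }
    return kept;
}
```

Note that a text node such as "Hello" survives the filter; only nodes
made up entirely of whitespace are dropped.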

We could enable this option for LMI if you are sure there's no
significant use of whitespace in its XML files, but I usually prefer to
write the parsing code in such a way that it works with either setting
(but then, that was xmlwrapp code used in a library where I couldn't
affect app-wide global settings). And we would _still_ need an
element iterator to skip over comments and other junk.

> Searching for is_text() in 'xml_lmi.cpp' finds some code that IIRC
> was my attempt to encapsulate that idiom. And I think the comment
> block quoted above means that my attempt proved too slow. So I
> thought that perhaps an alternative iterator, which addresses only
> full elements, could be a useful addition to xmlwrapp.

It certainly would. I'm sure I have some older code lying around that
would benefit from it -- I remember being disappointed that xml::node
had only a find(element_name) method and not some find_all_elements().

> enough to explain it clearly. The only thing I know for sure is
> that I measured a 25% performance penalty as noted in that comment.

Do you remember how to run this particular test?

> BTW, 'xml_lmi.?pp' doesn't have to live forever. It might be better
> to kill it soon. 

I'm all for it. I already eliminated a tiny xml::init-related part of
it locally, thanks to this xmlwrapp change:
http://xmlwrapp.svn.sourceforge.net/viewvc/xmlwrapp?view=rev&revision=107

We won't need child_elements() either, as discussed above. On the other
hand, things like get_name() or get_content() seem useful. We may want
to still keep some utility functions in there, or we may migrate some of
these over to xmlwrapp.

Regards,
Vaclav




