[lmi] Reimplementation of various reports

From:	Greg Chicares
Subject:	[lmi] Reimplementation of various reports
Date:	Fri, 14 Nov 2008 14:36:43 +0000
User-agent:	Thunderbird 2.0.0.17 (Windows/20080914)

We need to renovate report generation. It's grown too difficult
to maintain as it stands. We can't respond quickly enough to
requests for changes. Too many defects, known and unknown, lurk
in dark corners beyond the reach of automated testing, and they
can be quite difficult to track down and fix.

Often xslt is the right way to get multiple reports from a single
dataset that's available as xml, but we pushed it too far. Speed
turned out to be a real problem, so we created multiple datasets,
each with its own single stylesheet--and it was still slow. To
avoid slowing it down further, we formatted numbers in C++; but
now we need more flexible control over the formatting, e.g., as
described in the first paragraph of last month's release notes:

  http://lists.nongnu.org/archive/html/lmi/2008-10/msg00038.html

Formatting never achieved the design goals:

  http://lists.gnu.org/archive/html/lmi/2006-09/msg00005.html
|
| Right now, four formats are hardcoded:
|   {[0 | 2] decimals, [true | false] show-as-percentage}
| That's not as flexible as it should be. Any number of decimals
| in [0, DECIMAL_DIG] should be permitted.
[...]
| The titles are a mess: " _____ __They __Look __Like __This"
| because we couldn't figure out how to make apache FOP render them
| in a set of right- and bottom-aligned set boxes.
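
As a rough illustration of the flexibility the quoted design goal asks
for, here is a minimal C++ sketch that accepts any number of decimals in
[0, DECIMAL_DIG] together with a show-as-percentage flag. The function
name format_number() and its signature are hypothetical, not lmi's
actual interface, and the sketch deliberately ignores thousands
separators and locale.

```cpp
#include <cfloat>     // DECIMAL_DIG
#include <iomanip>    // std::setprecision
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of lmi): format 'value' in fixed
// notation with 'decimals' digits after the decimal point. If
// 'as_percentage' is true, scale by 100 and append '%'.
std::string format_number(double value, int decimals, bool as_percentage)
{
    if(decimals < 0 || DECIMAL_DIG < decimals)
        {
        throw std::out_of_range("decimals must be in [0, DECIMAL_DIG]");
        }
    std::ostringstream oss;
    oss << std::fixed << std::setprecision(decimals);
    oss << (as_percentage ? 100.0 * value : value);
    if(as_percentage)
        {
        oss << '%';
        }
    return oss.str();
}
```

With a function of this shape, the four hardcoded formats would be just
four of many (decimals, percentage) pairs, e.g. format_number(0.0, 2, true)
for a two-decimal percentage.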

Today we still have four hardcoded formats. We do have some code
(see TEXT_LINE_WRAPPER in 'fo_common.xsl') to control where lines
break in column titles, but the '_' once used for that purpose
survives in 'ledger_formats.xml' only in xml comments, e.g.:

  <column name="AttainedAge">
      <title>End of Year Age</title><!-- _____________ End of __Year Age -->
      <format>f1</format>
  </column>

and in some stylesheets that should refer to 'ledger_formats.xml'
for titles but don't, as in this hardcoded 'nasd.xsl' example:

  <column composite="0" name="AttainedAge">End of _Year Age</column>

That column title is shown differently in different contexts.
(As of 20081114T0506Z, though, I reverted to original code that
honors '_' line breaks.)

In trying to make sense of pdf generation recently, I've come to
the conclusion that we've got code to write two equivalent but
incompatible xml datasets; and that xsl-fo uses the "new" one,
but only after applying 'xml2to1.xsl' to transform it into the
"old". Apparently the "new" xml was used only for purposes other
than xsl-fo, and only in 'illustration_view.cpp'--where recently
I reverted all such uses to C++ implementations, largely because
that improves runtime performance:

  http://lists.nongnu.org/archive/html/lmi/2008-10/msg00038.html
|
| The normal output screen (calculation summary) now appears much
| faster: for the simplest single-life illustration, it's four
| times as fast as in the July release.

That C++ implementation is well tested, so it's more reliable.
A week or two ago we got a defect report from users: a certain
column appears blank when it should not. With the September 30
release, this occurs for both the calculation summary and the pdf
output. With the October 31 release, it occurs only in the pdf
case. I reverted the calculation summary to the original C++ code
in the last few days of October; presumably that's what "fixed"
the problem there. We're still trying to get a testcase that
fails reproducibly on my machine for the remaining pdf problem.

Anyway, it looks like these files (with 'wc -l' line counts):
  1313 ledger_xml_io2.cpp
   132 ledger_common_tsv.xsl
    42 ledger_excerpt.hpp
   739 ledger_formats.xml
   373 ledger_formatter.cpp
   131 ledger_formatter.hpp
   230 calculation_summary_html.xsl
   120 calculation_summary_tsv.xsl
   166 microcosm_tsv.xsl
   143 xml2to1.xsl
  3389 total
can soon be eliminated as being redundant. At this moment, the
'ledger_test.cpp' unit test creates several types of reports,
using both C++ (which will remain) and xslt (which will soon be
removed). That's a weird test, which probably shouldn't have been
written as a unit test. It serves no other purpose, so it should
probably be removed, too. Already it's not routinely run along
with other unit tests; to run it requires, e.g.:
  make unit_tests unit_test_targets=ledger_test.exe

For the record, here's a sample of the "new" xml format, which
isn't used as such:

  <double_vector name="TotalIMF">
    <duration>0.00%</duration>
    <duration>0.00%</duration>
...
    <duration>0.00%</duration>
  </double_vector>

and here's the same data in the "old" format, which is used:

    <newcolumn>
      <column name="TotalIMF">
        <duration number="0" column_value="0.00%"/>
        <duration number="1" column_value="0.00%"/>
...
        <duration number="54" column_value="0.00%"/>
      </column>
    </newcolumn>

The "new" format seems clearly preferable, but the "old" format
is required by xsl-fo stylesheets such as 'nasd.xsl'; in fact,
today I believe that the "new" format isn't used anywhere. We may
want to change the format someday, but for now we need to remove
duplicative code because it's an obstacle to understanding and
maintenance, and also because it has many shortcomings that have
been noted inline for years and won't ever be addressed. I plan
to clean this up over the next few days.
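
To make the equivalence of the two layouts concrete, here is a small
C++ sketch that writes the same column of already-formatted strings in
both shapes. The function names are hypothetical and this is not the
code in 'ledger_xml_io2.cpp'; it assumes the strings need no xml
escaping, and it reproduces only the elements shown in the samples
above.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical: write one column in the "new" layout, where each
// <duration> element carries its value as text and the duration
// number is implied by position.
std::string write_new_format
    (std::string const&              name
    ,std::vector<std::string> const& values
    )
{
    std::ostringstream oss;
    oss << "<double_vector name=\"" << name << "\">\n";
    for(std::size_t j = 0; j < values.size(); ++j)
        {
        oss << "  <duration>" << values[j] << "</duration>\n";
        }
    oss << "</double_vector>\n";
    return oss.str();
}

// Hypothetical: write the same column in the "old" layout, where the
// duration number and the value are both explicit attributes.
std::string write_old_format
    (std::string const&              name
    ,std::vector<std::string> const& values
    )
{
    std::ostringstream oss;
    oss << "<newcolumn>\n";
    oss << "  <column name=\"" << name << "\">\n";
    for(std::size_t j = 0; j < values.size(); ++j)
        {
        oss
            << "    <duration number=\"" << j
            << "\" column_value=\"" << values[j] << "\"/>\n"
            ;
        }
    oss << "  </column>\n";
    oss << "</newcolumn>\n";
    return oss.str();
}
```

The only information the "old" layout adds is the explicit duration
number, which the "new" layout leaves implicit in element order.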
