Re: [lmi] Embed {{MST}} and <html> in product database


From:	Greg Chicares
Subject:	Re: [lmi] Embed {{MST}} and <html> in product database
Date:	Wed, 24 Jul 2019 18:10:26 +0000
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.1
On 2019-07-23 00:41, Vadim Zeitlin wrote:
> On Mon, 22 Jul 2019 18:19:37 +0000 Greg Chicares <address@hidden> wrote:
> 
> GC> On 2019-07-21 12:36, Vadim Zeitlin wrote:
> GC> [...]
> GC> > On Thu, 18 Jul 2019 18:24:49 +0000 Greg Chicares <address@hidden> wrote:
> GC> [...]
> GC> > GC> > Is the problem you're trying to solve that it is only supposed to contain
> GC> > GC> > text, and not HTML, currently, while you'd like to use HTML inside it?
> GC> > GC> 
> GC> > GC> In part, yes. More generally, I'd also like text substitutions
> GC> > GC> such as mustache offers.
> GC> > 
> GC> >  Trying to understand the reasons for my instinctive dislike of your
> GC> > initial proposal, I've realized that I don't actually have any objections
> GC> > to using text substitutions in the policy files. My main problem is with
> GC> > putting HTML inside them, as this just seems like a completely wrong level
> GC> > for this: policy files are supposed to contain business logic, not
> GC> > presentation-level decorations. I.e. rather than having
> GC> > 
> GC> >           <br><b>Title:</b><br> Contents
> GC> > 
> GC> > in the policy file, I'd much prefer to have "Title" and "Contents" as 2
> GC> > separate policy fields and then have
> GC> > 
> GC> >           <br><b>{{Title}}</b><br>{{Contents}}
> GC> > 
> GC> > in the .mst file. This would preserve the separation between the "model"
> GC> > and the "view" parts and would avoid situations where you'd need to
> GC> > recreate the policy file just to change bold to italic or change the size
> GC> > of a font, instead of simply editing the .mst file directly.
> GC> 
> GC> We don't need all the flexibility of html in the policy files.
> 
>  Unfortunately, chances are that after enough time, we will get it there,
> whether we need it or not. I can't really see why we would forbid using <i>
> if we allow <b>. Or why we would forbid <span style="color: red;"> if we
> allow the other ones. And so on...

That's a good argument against allowing html per se in '.policy' files.

> GC> All we need is <b>, along with either <br> or <p>. If there were some
> GC> easy way to write those in mustache syntax, e.g.:
> GC>   {{??The pilcrow makes this a separate paragraph.}}
> GC> that would be good enough. Or, instead of devising our own weird mustache
> GC> dialect with '{{??' as an atom, we could use special characters to indicate
> GC> markup, e.g.:
> GC>   "????Some technical term??: its narrative definition."
> GC> which would become
> GC>   <p><b>Some technical term</b>: its narrative definition.</p>
> GC> in PDF output.
> 
>  It looks like the real solution would be to allow defining maps in the
> policy files. I.e. instead of a single scalar field, I'd like to have a map
> with several "Some technical term" -> "its narrative definition" pairs of
> keys/values. Would this be something you might consider? It would clearly
> require more changes, but would seem to be much cleaner too.

No. I'm afraid I misled you by suggesting that a "stylesheet" for
definitions would solve our problems. But we have other needs for
paragraphing and boldfacing, which wouldn't be answered by a map, or a
stylesheet. Therefore, a special-case solution (for defined terms only)
would take time and effort, and add complexity, without addressing the
general need.

For example, today we have a <ProductDescription> element whose content
varies from one proprietary product to another. It's neatly split into
paragraphs in the source code that generates the '.policy' files. But
that formatting doesn't come through into PDF output--instead, the
paragraphs all run together into an unattractive giant blob of text.

How can blobs like that be paragraphed as intended? This command:
  grep -r InforceNonGuaranteedFootnote * |less -S
suggests that a similar need has been handled by introducing
  glossed_string InforceNonGuaranteedFootnote0;
  glossed_string InforceNonGuaranteedFootnote1;
  glossed_string InforceNonGuaranteedFootnote2;
  glossed_string InforceNonGuaranteedFootnote3;
where, in order to keep a multiparagraph footnote free of markup,
we split it into four chunks. I'm sure that's the wrong way.

It's as though, in order to write this new paragraph, I had to send
a separate email--that would be wrong. Instead, I just hit Enter
twice--that's simple, obvious, natural, and righteous. (Of course,
HTML treats newline characters as whitespace, in effect removing
them, so we need something like a pilcrow.)

BTW, although 'InforceNonGuaranteedFootnoteN' is defined for all
N in 0..3, only {0,3} are used today AFAICT. Perhaps {1,2} were
actually used at some time in the past, and weren't (manually)
garbage-collected when they later became unused. But to this day
we still have twelve lines of code in five different C++ files to
implement them--so splitting a footnote like this into four pieces
just in order to avoid using pilcrows leads to persistent code
pollution. Thus, embedding pilcrows in '.policy' files does, in a
sense, to some degree, violate an ideal separation of concerns;
but if we must choose one of
  {plutonium, anthrax, dioxin, pilcrows}
then we need to pick the least harmful.

> GC> Motivating use case: each illustration includes a "dictionary", somewhat
> GC> like this:
> GC> 
> GC>   <P><B>Do:</B> A deer. A female deer.</P>
> GC>   <P><B>Re:</B> A drop of golden sun.</P>
> GC>   <P><B>Mi:</B> A name I call myself.</P>
> GC>   <P><B>Fa:</B> A long, long way to run.</P>
> 
>  Just to be clear, all this is to be defined inside a single policy field,
> right?

Looking beyond the narrow case that I was focusing on, it now becomes
clear that we need something like a pilcrow for other cases, beyond
these definitions. Therefore, there's no pressing need to combine all
definitions into a single xml element. But we do need something like:

  "&para;&laquo;Do:&raquo; A deer. A female deer."

[Here, I've written html entities, only because I've noticed that the
pilcrows and guillemets I used earlier in this thread are turned into
'??' in email--but in the policy files, it might be better to use the
actual unicode characters.]

> GC> That will bring us back to:
> GC>   <P><B>Do:</B> A deer. A female deer.</P>
> GC> We really need a simple way to express this. The specification is
> GC> not that "Do" exists, and has a name, and has a definition, and its
> GC> definition is to be formatted in a particular way...and that "Re"
> GC> exists, and..."Mi"... . Rather, it's that we have a sizable set of
> GC> definitions, and they're all to be formatted in a particular style:
> GC>  - name in bold, definition not in bold; and
> GC>  - each as a separate paragraph; and
> GC>  - nothing at all (no empty paragraph) if the definition is empty.
> GC> In that context, a good separation of concerns would express this
> GC> formatting as some sort of stylesheet, and I'd be glad to approach
> GC> it that way if you can think of a way to do it.
> 
>  In the worst case, we could hardcode formatting of such "maps" in the C++
> code generating PDF. This would already have the advantage of restricting
> the HTML features that can be used in the policy files (to none) and of
> centralizing all the formatting logic in a single place

"Maps" or "stylesheets" aren't general enough, as explained above.

But let's try looking at this in a different way--not as two levels
  {content, formatting}
but as three:
  {content, structure, presentation}
much as HTML has introduced "structural" <strong> as distinct from
purely presentational <b>. That would alleviate this objection...

> and even if this
> place is C++ code and not .mst file, this is still advantageous because if
> someone suddenly decides that in 2020 the names should be in bold italic
> and not just in regular bold, this would be trivial to change.

...because the presentation of <strong> could be changed from
<b> to <b><i> (e.g.) in MST, without changing the contents of
'.policy' files, which would have
  <<Do:>> A deer. A female deer. [ersatz ASCII "guillemets" here]
with the same "structural" meaning as HTML <strong>.

Alternatively, maybe we could use '_' and '||': i.e.,
  _strong_
which is in broad general usage already, and
  end of old paragraph||beginning of new paragraph
which is apparently what Sanskrit uses for a pilcrow. And this
"structural" markup could be translated in a distinct C++ function,
so we're never writing html markup in '.policy' files.

> GC> How can we move forward now, without that much labor?
> GC> 
> GC> Of the two ideas presented, for strings in '.policy' files:
> GC>  (1) allow mustache substitutions
> GC>  (2) allow markup for boldface and paragraphing
> GC> it seems that:
> GC> 
> GC>  - You don't strenuously object to (1)...so can we decide now how to
> GC> implement it? E.g., invoke a function twice, as in my experimental
> GC> patch; or rewrite the function so that it doesn't need to be called
> GC> twice?
> 
>  It looks like we should use Mustache partials for this: they're exactly
> what is used for including strings in Mustache syntax from elsewhere. But
> this would make sense only if the string came from the policy file
> directly, we could then (easily) implement something like {{<policy:field}}
> to get it from there and expand.

Such is not the case. Recall that product parameters are embodied
principally in two types of files:
 - '.database': numeric data
 - '.policy': string data
and some narrative text combines both--in 'ill_reg_narr_summary.mst',
for example:

  Loaned amounts of the {{AvName}}
  Value will be credited a rate equal to the loan interest rate less
  a spread, guaranteed not to exceed {{MaxAnnGuarLoanSpread}}.

Might we address that by using different prefixes to indicate the
source of the data, e.g.:
  {{<policy:AvName}}
  {{<database:MaxAnnGuarLoanSpread}}
? I think not, for two reasons. First, '.database' files contain only
doubles, and the formatting for 'MaxAnnGuarLoanSpread' (e.g.) is
specified only in 'ledger_evaluator.cpp':

    // F4: scaled by 100, two decimals, with '%' at end:
    // > Format as percentage "0.00%"
    ,{"MaxAnnGuarLoanSpread"            , f4}

which is the C++ file where all string and numeric ledger data are
assembled together to be served to MST. Second, as this example in
'ill_reg_narr_summary2.mst' shows:

{{#HasSpouseRider}}
    <p>
    The ${{SpouseRiderAmount}} Spouse
    rider provides term life insurance on the spouse
    (issue age {{SpouseIssueAge}})
    for a limited duration, for an extra charge.
    Please refer to your {{ContractName}} for specific provisions
    and a detailed schedule of charges.
    </p>
{{/HasSpouseRider}}

some data used in writing footnotes come from class Input:

  class Input   bool: {{HasSpouseRider}}
  class Input double: {{SpouseRiderAmount}}
  class Input    int: {{SpouseIssueAge}}
  '.policy'   string: {{ContractName}}

and, like '.database' numeric data, are formatted in
'ledger_evaluator.cpp':

// F0: zero decimals
// > Format as a number no thousand separator or decimal point (##0%)
    ,{"SpouseRiderAmount"               , f1}

but encapsulation ensures that the input file is inaccessible when MST
substitution is performed.

Likewise, in 'finra_notes1.mst' and elsewhere:

  The initial 7-pay premium limit is ${{InitSevenPayPrem}}.

this variable:
    ,{"InitSevenPayPrem"                , f2}
comes from no file at all--it's calculated dynamically.
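
To make the point about formatting concrete, here is a minimal sketch of
what two of the formats quoted above might look like: F4 ("scaled by 100,
two decimals, with '%' at end") and F0 ("zero decimals"). The function
names are hypothetical, invented for this illustration; the actual code in
'ledger_evaluator.cpp' is not shown in this message and may well differ.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper (not lmi code): fixed-point text with the given
// number of decimals, using the stream's default (classic) locale.
std::string format_fixed(double d, int decimals)
{
    std::ostringstream oss;
    oss << std::fixed << std::setprecision(decimals) << d;
    return oss.str();
}

// Sketch of "F4: scaled by 100, two decimals, with '%' at end".
std::string format_as_percentage(double d)
{
    return format_fixed(100.0 * d, 2) + '%';
}

// Sketch of "F0: zero decimals".
std::string format_zero_decimals(double d)
{
    return format_fixed(d, 0);
}
```

A raw double such as 0.025 read from a '.database' file would become
"2.50%" only after passing through something like format_as_percentage(),
which is why a prefix naming the source file would not suffice by itself.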

>  However if we decided to use "maps" suggested above, then I'm not sure if
> this would make much sense... So let's first decide if we want to do this
> or not.

No. I really regret that I suggested "stylesheets" for definitions,
which led to the idea of "maps" of definitions, because the scope is
definitely not limited to that single special case.

> GC>  - As for (2), is there some simple-enough way to get the boldface
> GC> and paragraph separation we need, without using html? Would it be
> GC> better to use pilcrows and guillemets as above, or does that just
> GC> amount to reinventing html, poorly? Or is there some way to design
> GC> a "stylesheet" that does what we need, while keeping presentation
> GC> and content separate?
> 
>  The fact is that we need to separate the string into its various parts
> somehow. The cleanest way to do it would be to define these different parts
> in different XML tags, but, again, this would certainly require quite a few
> changes.
> 
>  If we're not prepared to start doing this right now, perhaps some sort of
> a hack with separator characters wouldn't be too bad.

It's not a choice of starting to do this now versus at some time in
the future. The point is that we want to avoid doing this at all.
To explain why, let's step back and start a sidebar discussion here.

Under US law, a life insurance policy can become a so-called MEC, which
subjects it to less favorable taxation. In practice, only an insurance
company is capable of ascertaining a policy's "MEC status". Therefore,
illustrations must include a warning when an input scenario produces a
MEC. The attorneys responsible for compliance with this law have the
authority to dictate the way a MEC is disclosed, and they'll often want
to emphasize certain parts of the disclaimers they devise. For example:

  <p>This policy <b>becomes a MEC</b> in year {{MecYear}}, which makes
  it subject to <b>adverse tax consequences</b>. {{InsCompanyName}}
  cannot give you tax advice; you must <b>consult your own advisor</b>
  before taking any cash disbursement from this policy.</p>

The actual wording will vary from one product to another, and may
change over time, because different attorneys may revise these
disclaimers for various groups of policies at various times, in light
of changing circumstances--or maybe they just change their minds and
alter the wording on what may seem to us to be a whim, but they're
empowered to do that. When they review or re-review an illustration:

 - from our point of view, they're looking at a PDF that is just one
   of many possible realizations of an MST template, in the context
   of a particular '.ill' input file; but

 - from their point of view, they're looking at a word-processing
   document, so adding a word here and boldfacing another word there
   are trivial, zero-cost operations within their easy control; and
   from that POV their changes apply only to illustrations for the
   particular policy they're reviewing at that moment, not to a
   generic template.

Thus, their ever-changing prescriptions are logically a switch on
product type, so we must choose one of these options:

 - Just say no: tell the attorneys that they must live with a single
   hardcoded disclaimer in each of our '.mst' files. We've struggled
   to do this for years, but that battle is now completely lost.

 - Insert a switch on product type into MST files. But lmi rigorously
   separates public from proprietary data, so we can't do so directly.
   In the past, the authors of the old XSL-FO templates resorted to
   tortured workarounds that attempt to deduce the product type from
   some of the strings provided by its proprietary '.policy' file,
   whose echoes are still seen in weird variables synthesized in
   add_variable() calls, such as:
     "GroupCarveout", "SinglePremium",
     "ModifiedSinglePremium", "ModifiedSinglePremium0",
     "ModifiedSinglePremiumOrModifiedSinglePremium0"
   But this is broken. The logic is well-nigh incomprehensible. We
   can't trust that it's correct. We can't maintain it.

 - Enumerate products anonymously, and switch on enumerator in MST:
     {{#Product00000}}[variant text]{{/Product00000}},
     {{#Product00001}}[different text]{{/Product00001}}, etc.
   That doesn't leak proprietary information into the public code,
   and it doesn't require gnarled logic like the preceding option.
   But it is a 140-way switch, today, and more products will be added
   as time goes by. At least some markup is required (we can't tell
   attorneys not to prescribe <strong> for certain words or to break
   their long narratives into logical paragraphs), and this is the
   least awful option that keeps all markup in MST. But it's an
   understatement to say this is inconvenient. Recognizing that
   a MEC disclaimer is only one of, say, one hundred required text
   interpolations, this approach would amount to an inversion of the
   140-way logic: instead of 140 tables of 100 strings each, we'd
   have 100 distinct 140-way switches, and that's not practicable.

 - Allow limited, semantic-only (not presentational) markup in
   '.policy' files. For the 140 products lmi supports today, the
   product database *is* that 140-way switch: we need at least one
   such switch, and we can't live with more than one. Thus, XML file
   'sample2xyz.policy' might contain something like:
     <MecDisclaimer>
       <p>This policy <strong>becomes a MEC</strong> in
       year {{MecYear}}.</p>
     </MecDisclaimer>
   but MST files would say only "{{MecDisclaimer}}". This sacrifices
   the absolute separation between text content and markup, but only
   to the minimal extent necessary. I think this is the only option
   that isn't unworkable.

 - Alternatively, store <MecDisclaimer> without embedded markup, by
   breaking it into pieces that require different presentation:
     <Mec00A_plain> This policy              </Mec00A_plain>
     <Mec01A_bold>  becomes a MEC            </Mec01A_bold>
     <Mec02A_plain> in year {{MecYear}}...   </Mec02A_plain>
     <Mec00B_bold>  {{InsCoName}} cannot give tax advice.
                    Consult your own advisor.</Mec00B_bold>
   and then in MST:
     <p>{{Mec00A_plain}}<b>{{Mec01A_bold}}</b>{{Mec02A_plain}}</p>
     <p><b>{{Mec00B_bold}}</b></p> <!-- separate paragraph demanded here -->
   But then, if a new reviewer insists on setting "cannot" in bold,
   our tears will flow, and they won't understand why, because, from
   their POV, we can just highlight that word and click a "bold" icon.
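
The option in which MST files say only "{{MecDisclaimer}}" implies that
the string taken from the '.policy' file may itself contain "{{MecYear}}",
so interpolation has to happen twice (or be made recursive). A minimal
sketch of that idea follows. It is a greatly simplified stand-in for real
mustache processing, not lmi's implementation: expand_once() and
expand_twice() are hypothetical names, and sections like "{{#Name}}" are
deliberately ignored.

```cpp
#include <map>
#include <string>

typedef std::map<std::string,std::string> variables;

// Hypothetical: replace each "{{Name}}" in 's' with its value in 'vars'.
// Unknown names and unterminated tags are copied through unchanged.
std::string expand_once(std::string const& s, variables const& vars)
{
    std::string z;
    std::string::size_type pos = 0;
    for(;;)
        {
        std::string::size_type const open = s.find("{{", pos);
        if(std::string::npos == open) break;
        std::string::size_type const close = s.find("}}", open + 2);
        if(std::string::npos == close) break;
        z.append(s, pos, open - pos); // text preceding the tag
        variables::const_iterator const i =
            vars.find(s.substr(open + 2, close - open - 2));
        if(vars.end() != i) z += i->second;
        else                z.append(s, open, close + 2 - open);
        pos = close + 2;
        }
    z.append(s, pos, std::string::npos); // text following the last tag
    return z;
}

// The first pass inserts the '.policy' string; the second pass expands
// any variables that string contained.
std::string expand_twice(std::string const& s, variables const& vars)
{
    return expand_once(expand_once(s, vars), vars);
}
```

With "MecDisclaimer" mapped to "This policy becomes a MEC in year
{{MecYear}}." and "MecYear" mapped to "7", a single pass over
"{{MecDisclaimer}}" leaves "{{MecYear}}" unexpanded; the second pass
resolves it.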

We may wish that these specifications were set in stone, but from the
POV of those empowered to set specifications, it's more like deciding
what sort of sandwich they want for lunch...today. If we say "but
yesterday you wanted tuna salad on rye bread", they won't even be able
to parse that as a sane utterance: that was yesterday, this is today,
and any objection belongs in a Monty Python skit.

Returning from that sidebar...

>  The fact is that we need to separate the string into its various parts
> somehow. The cleanest way to do it would be to define these different parts
> in different XML tags, but, again, this would certainly require quite a few
> changes.

Some existing work, like 'InforceNonGuaranteedFootnote[0-3]' above,
effects such a separation of concerns, but that cure is worse than
the disease: the price we pay for that rigidity is fragmenting the
content, and constructing channels to pass an unwieldy number of
fragments individually along a chain
  product files --> ledger classes --> mustache templates --> PDF
with distinct variables for all the fragments. That's just unworkable.

The question is how to relax that rigidity as little as possible in
order to regain simplicity and make this inherited mess manageable.
The gentlest way I can see is to embed structural indicators in the
xml elements in '.policy' files. Thus, <InforceNonGuaranteedFootnote>
is a single entity, which may structurally consist of multiple
paragraphs because paragraph breaks are semantic. Breaking it up into
four pieces destroys its semantic unity: in the problem domain, we
know very well what an inforce non-guaranteed footnote is, but not how
<InforceNonGuaranteedFootnote3> and <InforceNonGuaranteedFootnote2>
differ or what each one might mean.

>  [...] perhaps some sort of
> a hack with separator characters wouldn't be too bad.

Yes, no matter what particular characters we choose, the more I think
about this, the clearer it becomes that pilcrows and guillemets are
sufficient and less offensive than other approaches. I'm thinking
that "||", "<<", and ">>" might be our best option because they're
pure ASCII, and should never arise in descriptions of life insurance
(which never use "||" for logical OR, or "<<" for "much less than").

> Then we could parse
> the string into a map in pdf_command_wx.cpp and apply formatting to this
> map instead.

No map (see above); but I'm glad you wrote this anyway, because it
points out that the extra step of processing whatever guillemets,
pilcrows, and {{partials}} may be contained in '.policy' strings is,
conceptually at least, a distinct step--so should it be a physically
distinct step, e.g.
  left  guillemet --> <strong>  (or  <em>, or  <b>)
  right guillemet --> </strong> (or </em>, or </b>)
applied only to strings taken from '.policy' files?
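
A minimal sketch of such a physically distinct step, assuming the ASCII
markers "||", "<<", and ">>" mentioned above, with "||" treated as a
separator between paragraphs. The function name is hypothetical, and the
choice of <strong> rather than <em> or <b> is exactly the kind of decision
that would live here rather than in '.policy' files.

```cpp
#include <string>

// Hypothetical (not lmi code): translate structural markers found in a
// string taken from a '.policy' file into html.
//   "<<" and ">>" delimit emphasized text
//   "||" separates one paragraph from the next
// Assumes the string contains no other html-special characters.
// An empty string yields no paragraph at all.
std::string structural_to_html(std::string const& s)
{
    if(s.empty()) return std::string();
    std::string z = "<p>";
    for(std::string::size_type i = 0; i < s.size(); )
        {
        // Single left-to-right scan, so that inserted tags are never
        // themselves mistaken for markers.
        if     (0 == s.compare(i, 2, "<<")) {z += "<strong>";  i += 2;}
        else if(0 == s.compare(i, 2, ">>")) {z += "</strong>"; i += 2;}
        else if(0 == s.compare(i, 2, "||")) {z += "</p><p>";   i += 2;}
        else                                {z += s[i];        ++i;}
        }
    return z + "</p>";
}
```

Thus "<<Do:>> A deer. A female deer." would become
"<p><strong>Do:</strong> A deer. A female deer.</p>", and changing the
presentation later would mean editing only this one function.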

> The exact details of separators don't matter, but it could be
> something as simple as Markdown-like section syntax:
> 
>       # First definition
> 
>       Definition text. More text.
>       And even more text.

I think I like '||' better than '#', and '{{#' is already reserved
by mustache.

> GC> If we don't have time to figure out the best solution now, should
> GC> we just forge ahead along the lines of my experimental patch, in
> GC> the doubtful hope that we'll come back later and redesign it?
> 
>  I'm afraid this hope is not just doubtful but forlorn.

Agreed. So let's design it as well as we can now.

> GC> Or perhaps I've now been able to persuade you that html isn't such
> GC> a bad solution to (2) after all, given the scope of the undertaking;
> GC> if so, then should we replace that experimental patch with an
> GC> implementation that does the same thing in a less awful way?
> 
>  I understand and accept your arguments, but embedding HTML like this still
> feels dirty to me and there is just one small step from having only <b> and
> <p> tags in there to having 7 levels of nested <table>s for PDF layout. I'd
> really like to avoid making this step possible.

I therefore change my original proposal accordingly, withdrawing the
<p> and <b> suggestion in favor of non-html, structural-only (i.e.,
non-presentational) markup. To use any html at all would be to take
the first step down a slippery slope, so we'll just rule that out
categorically.

Are we ready to proceed to the implementation details?