lmi

Re: [lmi] Embed {{MST}} and <html> in product database


From: Vadim Zeitlin
Subject: Re: [lmi] Embed {{MST}} and <html> in product database
Date: Sat, 27 Jul 2019 02:29:40 +0200

On Sat, 27 Jul 2019 00:01:11 +0000 Greg Chicares <address@hidden> wrote:

GC> On 2019-07-25 22:53, Vadim Zeitlin wrote:
[...]
GC> > GC> Alternatively, maybe we could use '_' and '||': i.e.,
GC> > GC>   _strong_
GC> > GC> which is in broad general usage already, and
GC> > GC>   end of old paragraph||beginning of new paragraph
GC> > GC> which is apparently what Sanskrit uses for a pilcrow. And this
GC> > GC> "structural" markup could be translated in a distinct C++ function,
GC> > GC> so we're never writing html markup in '.policy' files.
GC> > 
GC> >  You're basically proposing to use Markdown in policy files
GC> 
GC> Yes. But I still have some reservations about using ASCII.
GC> Normal narrative text drafted by attorneys wouldn't include an
GC> underscore, but we might encounter something like
GC>   "Agent's signature: ________________"
GC> which is sixteen underscores:
GC>     ________________
GC> , not fourteen boldface underscores:
GC>   <b>______________</b>
GC> or (if replacement is recursive) an empty septuply-bold string:
GC>   <b><b><b><b><b><b><b></b></b></b></b></b></b></b>

 Right, there are going to be corner cases with underscores and asterisks.
But if we can afford to diagnose everything we don't understand (i.e.
anything that is neither a matched pair of underscores, nor a sequence of 3
or more underscores, nor whatever other legitimate use of underscores there
could be), they shouldn't be a problem in practice.
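
 A minimal sketch of such a diagnosing translation, under assumptions that
are not settled in this thread: a single underscore toggles emphasis, a run
of 3 or more underscores is copied verbatim, and anything else (a run of
exactly two, or an unmatched underscore) is an error. The function name
markup_to_html() is hypothetical and not part of lmi; <b> is used only
because it is the tag in the example quoted above.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of lmi: translate "_text_" to "<b>text</b>",
// keep runs of 3 or more underscores as they are, and diagnose the rest.
std::string markup_to_html(std::string const& s)
{
    std::string out;
    bool open = false;
    for(std::size_t i = 0; i < s.size(); )
    {
        if(s[i] != '_')
        {
            out += s[i++];
            continue;
        }
        // Measure the run of consecutive underscores starting at i.
        std::size_t const end = s.find_first_not_of('_', i);
        std::size_t const len = (end == std::string::npos ? s.size() : end) - i;
        if(len >= 3)
        {
            // Long runs, such as signature lines, are copied verbatim.
            out.append(len, '_');
        }
        else if(len == 1)
        {
            // A single underscore opens or closes emphasis.
            out += open ? "</b>" : "<b>";
            open = !open;
        }
        else
        {
            // A run of exactly two is not understood: diagnose it.
            throw std::runtime_error
                ("Unexpected '__' at offset " + std::to_string(i));
        }
        i += len;
    }
    if(open)
    {
        throw std::runtime_error("Unmatched '_' in: " + s);
    }
    return out;
}
```

 This sketch does not try to handle underscores inside words or asterisks;
those would need rules of their own.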

GC> But USA attorneys are never going to use guillemets,

 The trouble is that neither will anybody else. Learning to write

        _this is emphatic_

is much simpler than learning to write

GC> so
GC>   «this is emphatic»

GC> is unambiguous. And if <strong> is written with guillemets,
GC> it's easier to detect whether it has been closed:
GC>   ««this is emphatic»   Error: unclosed
GC>   __this is emphatic_   Error, but harder to see
GC> 
GC> So now I'm thinking that guillemets and pilcrows might be better
GC> after all. This silly sample does work:

 I still disagree, just because they seem so unnatural to me and I'm pretty
sure the feeling will be shared by anybody needing to write them. In
particular, using quotes for emphasis is really strange; we should choose
something even more exotic if we go this route.

GC> > [markdown] honours the
GC> > explicit line breaks between paragraphs (while still soft-wrapping the
GC> > paragraphs themselves)
GC> 
GC> I'm not sure how we'd represent that in the C++ source code that
GC> generates product files, because
GC>   std::string const s =
GC>     "This is one paragraph."
GC>     " And this is a separate paragraph";
GC> is preprocessed to
GC>     "This is one paragraph. And this is a separate paragraph"
GC> so I think we still need something like ¶.

 No, I'm even more sure we don't need the pilcrows because we can just use
raw C++11 strings instead:

        std::string const s = R"(
                This is one paragraph.

                And this is a separate paragraph.
                )";

(and here the insignificance of whitespace in HTML actually works in our
favour, because we can indent the string in any way we want -- or not).
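
 To illustrate how a blank line inside such a string could serve as the
paragraph separator, here is a rough sketch. The name split_paragraphs() is
hypothetical, and the rule it applies (blank lines separate paragraphs,
indentation is discarded, lines within a paragraph are joined with a space)
is an assumption rather than anything lmi implements.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split text into paragraphs at blank lines, dropping
// indentation and joining the lines of each paragraph with a single space.
std::vector<std::string> split_paragraphs(std::string const& text)
{
    std::vector<std::string> paragraphs;
    std::string current;
    std::istringstream iss(text);
    std::string line;
    while(std::getline(iss, line))
    {
        std::size_t const first = line.find_first_not_of(" \t\r");
        if(first == std::string::npos)
        {
            // A blank (or whitespace-only) line ends the current paragraph.
            if(!current.empty())
            {
                paragraphs.push_back(current);
                current.clear();
            }
            continue;
        }
        std::size_t const last = line.find_last_not_of(" \t\r");
        if(!current.empty())
        {
            current += ' ';
        }
        current += line.substr(first, last - first + 1);
    }
    if(!current.empty())
    {
        paragraphs.push_back(current);
    }
    return paragraphs;
}
```

 Applied to the raw string above, this would yield two paragraphs, however
the literal happens to be indented in the source.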


GC> [...variable substitutions like "This illustrates your {{NameOfPolicy}}"...]
GC> 
GC> > GC> >  It looks like we should use Mustache partials for this: they're
GC> > GC> > exactly what is used for including strings in Mustache syntax from
GC> > GC> > elsewhere. But this would make sense only if the string came from
GC> > GC> > the policy file directly, we could then (easily) implement
GC> > GC> > something like {{<policy:field}} to get it from there and expand.
GC> > GC> 
GC> > GC> Such is not the case. Recall that product parameters are embodied
GC> > GC> principally in two types of files:
GC> > GC>  - '.database': numeric data
GC> > GC>  - '.policy': string data
GC> > 
GC> >  Just to be clear, I only suggested using partials for .policy files
GC> > fields and even then only for those for which it's necessary to do it.
GC> > I.e. for simple fields, not requiring Mustache interpolation of their
GC> > contents, we'd still continue to use just {{field}} syntax and
GC> > {{<policy:field}} would be available in addition to it.
GC> > 
GC> >  Do you still object to doing it even so?
GC> 
GC> Yes, because I see no advantage to doing so.

 One immediate advantage is that we avoid double interpolation that you've
been (understandably) unhappy about. With partials, this interpolation will
be done only when necessary inside interpolate_string() itself.

GC> If 'sample2xyz.policy' contains
GC>   NameOfPolicy = "group insurance certificate";
GC>   Footnote = "Read your {NameOfPolicy} carefully";
GC> then that's already as simple as possible and as powerful as necessary.

 It's confusing though, as you never know whether your string is going to be
interpolated or not, and how many times.
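
 The "how many times" problem can be shown with a toy single-pass
substitution. This is only a sketch: interpolate_once() is a hypothetical
stand-in, not lmi's interpolate_string(), and it assumes the double-brace
{{name}} syntax with unknown names left untouched.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical single-pass substitution of "{{name}}" using a lookup table.
// Text produced by a substitution is not rescanned within the same pass.
std::string interpolate_once
    (std::string const&                        s
    ,std::map<std::string, std::string> const& vars
    )
{
    std::string out;
    std::size_t pos = 0;
    for(;;)
    {
        std::size_t const open = s.find("{{", pos);
        std::size_t const close =
            open == std::string::npos ? open : s.find("}}", open + 2);
        if(close == std::string::npos)
        {
            // No further complete "{{...}}": copy the rest and stop.
            out += s.substr(pos);
            return out;
        }
        out += s.substr(pos, open - pos);
        std::string const name = s.substr(open + 2, close - open - 2);
        auto const it = vars.find(name);
        // Unknown names are copied through unchanged.
        out += it != vars.end() ? it->second : s.substr(open, close + 2 - open);
        pos = close + 2;
    }
}
```

 With NameOfPolicy and Footnote defined as in the quoted example (written
with double braces), one pass over "{{Footnote}}" leaves "{{NameOfPolicy}}"
unexpanded in the result, and only a second pass resolves it; so the caller
has to know how many passes a given string needs.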

GC> This
GC>   Footnote = "Read your {<sample2xyz.policy:NameOfPolicy} carefully";
GC> is verbose and redundant, and we'll absolutely never want anything like
GC>   Footnote = "Read your {<SomeDifferentFile.policy:NameOfPolicy} carefully";

 I don't think we need the name of the file here. In my "{{<policy:field}}"
proposal, "policy" was literal and just denotes a namespace (whereas the
implicit namespace is "file", i.e. in "{{<foo}}", "foo" is interpreted as a
file name currently).
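
 As a sketch of how the namespace could be recognized (the names
partial_ref and parse_partial() are hypothetical, and the input is assumed
to be the text found between "{{<" and "}}"):

```cpp
#include <string>

// Hypothetical result of parsing a partial reference.
struct partial_ref
{
    std::string ns;   // "policy", or "file" when no namespace is given
    std::string name; // field name or file name
};

// Hypothetical parser: "policy:field" selects the literal "policy"
// namespace; anything else falls back to the implicit "file" namespace.
partial_ref parse_partial(std::string const& spec)
{
    std::string const prefix = "policy:";
    if(spec.compare(0, prefix.size(), prefix) == 0)
    {
        return {"policy", spec.substr(prefix.size())};
    }
    return {"file", spec};
}
```

 So "policy:NameOfPolicy" would be looked up as a field of the current
policy file, while "foo" would keep its current meaning of a file name.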

GC> >  I really think we should go with the standard conventions (used by
GC> > Markdown but really predating it for a couple of decades in common
GC> > use) and use "_" and/or "*" for emphasis. And I think even some lawyers
GC> > would be familiar with this use, unlike the use of "«" or "¶".
GC> 
GC> Let me emphasize that only Kim and I would ever see a pilcrow or guillemet,
GC> and then only in '.policy' files and the source code that generates them.
GC> OTOH, I haven't asked her yet, and she may find them inconvenient.

 Unless she uses Vim (and knows about Ctrl-K P I), I suspect she will. But
I'd like to emphasize, again, that I'm quite certain we don't need pilcrows
because using them instead of plain line breaks is just being weird for the
sake of being weird -- this has no advantages whatsoever.

GC> > GC> Are we ready to proceed to the implementation details?
GC> > 
GC> >  There is still the question of whether to use Mustache partials or not.
GC> > This, of course, affects only the use of MST in the .policy files, but not
GC> > HTML/markup, so if this has higher priority (as I guess it might), then
GC> > the answer to the question above is "yes".
GC> 
GC> Okay, we're ready to proceed.

 Note that if we don't use partials, we'll have to interpolate all the data
coming out of the policy files. This is probably not the end of the world,
but it does feel a bit strange to do it.

GC> >  Moreover, I think this discussion could be rather short: let's just
GC> > implement support for very minimal Markdown subset right now. The only
GC> > real question I have is about error checking/reporting: what would be the
GC> > best way to deal with things like "This is _important, even if the closing
GC> > underscore has been forgotten"? Worst would be to generate an unclosed <i>
GC> > (or whatever) tag, but we won't do this. But should we ignore this lone
GC> > underscore (i.e. drop it), preserve it verbatim, warn the user about it
GC> > (but they don't control the .policy files contents, so what good would
GC> > this warning do to them?) or try to fix it automatically (tempting, but
GC> > potentially dangerous)?
GC> 
GC> Alternatively, use '«' and '»', and automatically diagnose anything
GC> other than matched pairs, at the time the '.policy' files are
GC> generated, or immediately thereafter. We don't need multiple degrees
GC> of <strong>, so '«[^»]*«' can be treated as an error.

 Using matching characters does have its advantages, but I'm not sure they
outweigh the familiarity and ease of entering the usual ASCII notation. If
we do use a pair of matching characters, I think it should be something
other than guillemets because they're just quotes and so aren't associated
with emphasis at all. Maybe we could use "「" (U+300C) and "」" (U+300D)
instead: at least they clearly don't mean anything.
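
 Whichever pair is chosen, the check for matched pairs is simple. A rough
sketch, assuming UTF-8 text, distinct non-empty delimiters passed in as
strings, and no nesting allowed (the function name emphasize() is
hypothetical):

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: replace open...close with <strong>...</strong>,
// diagnosing nested, stray, or unclosed delimiters.
std::string emphasize
    (std::string const& s
    ,std::string const& open
    ,std::string const& close
    )
{
    if(open.empty() || close.empty() || open == close)
    {
        throw std::invalid_argument("Delimiters must be distinct, non-empty");
    }
    std::string out;
    bool inside = false;
    std::size_t i = 0;
    while(i < s.size())
    {
        if(s.compare(i, open.size(), open) == 0)
        {
            if(inside)
            {
                throw std::runtime_error
                    ("Nested opening delimiter at offset " + std::to_string(i));
            }
            inside = true;
            out += "<strong>";
            i += open.size();
        }
        else if(s.compare(i, close.size(), close) == 0)
        {
            if(!inside)
            {
                throw std::runtime_error
                    ("Stray closing delimiter at offset " + std::to_string(i));
            }
            inside = false;
            out += "</strong>";
            i += close.size();
        }
        else
        {
            out += s[i++];
        }
    }
    if(inside)
    {
        throw std::runtime_error("Unclosed emphasis in: " + s);
    }
    return out;
}
```

 The same function would work with "«" and "»" or with "「" and "」", since
the delimiters are just byte sequences here.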

 Regards,
VZ

