Re: %(addr{<component>}) and RFC-invalid headers

nmh-workers

From:	Ken Hornstein
Subject:	Re: %(addr{<component>}) and RFC-invalid headers
Date:	Thu, 30 Mar 2023 19:43:14 -0400

>However, ISTM having addr output text that can be something other than
>its documented "contract" (an mbox@host or host!mbox) is perhaps a bit
>dodgy, even when the root issue is invalid (or unsupported) input. I
>don't know how the formatting engine works, but couldn't addr
>conceivably scan its own prospective output and error if it's not in the
>right shape?  (This is idle speculation, not a request; and I've no idea
>whether such a change would be an unwelcome incompatibility for other
>folks. As I mentioned above, I think I can work around what addr appears
>to do.)

_Could_ it?  Weeeellll ... maybe?

The "ap" command does this (as does fmttest in -address mode), but what it
does isn't suitable for general consumption.  What IT does is try parsing
the address directly; if that fails, it sets a special "error" component
to the error message it got from the address parser (oddly enough, this
is one of the few places you DO get an error from the address parser),
and the default format program for ap and fmttest checks whether the
{error} component is set.  Setting a special component for a general
message is a bit tricky, especially since there are multiple headers that
could be parsed and I don't like intruding into that namespace for a
message.

What happens behind the scenes is a little more complicated; let's
break it down a bit.  Here's a very simple format program in its
compiled form:

% fmttest -dump -format '%(addr{text})'
Instruction dump of format string: 
%(addr{text})
        PARSEADDR, c_name "text", c_type 0x1<ADDR>
        LS_ADDR, c_name "text", c_type 0x1<ADDR>
        STR
        DONE

Every time a format function that involves address parsing is invoked,
it runs PARSEADDR first.  Once a component has been parsed, the
CF_PARSED flag is set for that component, so further PARSEADDR calls on
it are no-ops and the address parser is only invoked once per
component.
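
As a rough illustration, the parse-once behavior can be modeled in a few
lines.  This is a toy Python model, not nmh's C code: the Component
class, the toy_parse() stand-in for the address parser, and the flag
value are all assumptions made up for the example.  Only the names
PARSEADDR and CF_PARSED come from the description above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

CF_PARSED = 0x1  # flag name from the description above; the value is an assumption

calls = {"parser": 0}  # counts how often the (toy) address parser runs


def toy_parse(text: str) -> Optional[Tuple[str, str]]:
    """Hypothetical stand-in for the address parser: accepts only mbox@host."""
    calls["parser"] += 1
    mbox, sep, host = text.partition("@")
    if not sep or not mbox or not host or "@" in host or " " in text:
        return None
    return (mbox, host)


@dataclass
class Component:
    """Toy component; the real structure carries more state."""
    name: str
    text: str
    flags: int = 0
    parsed: Optional[Tuple[str, str]] = None


def parseaddr(comp: Component) -> None:
    """Model of PARSEADDR: parse on first use, do nothing on later calls."""
    if comp.flags & CF_PARSED:
        return
    comp.parsed = toy_parse(comp.text)
    comp.flags |= CF_PARSED
```

Calling parseaddr() several times on the same Component bumps
calls["parser"] only once, whether or not the parse succeeded, because
the flag is set in both cases.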

_If_ the address is parsed successfully, a field in the component
structure is set to the value returned by the address parser (the getm()
function), which has all of the various address fields broken out.  If
address parsing FAILS, then the field in the component structure is
set to something called fmt_mnull, which has every address field set to
NULL.

Now what happens next is a little bit interesting.  For the SPECIFIC
cases of the %(addr) and %(friendly) functions (they end up emitting
LS_ADDR and LS_FRIENDLY instructions respectively, in addition to
PARSEADDR), if the address parser has failed (i.e., the component is
using fmt_mnull) then the format engine will output the original text
of the component.  For everything ELSE, it will end up using the value
from fmt_mnull, which is typically a NULL pointer, which means nothing
will be output.  So the decision to always output the component text on
address parsing error for %(addr) and %(friendly) was deliberate.  I
could see that making sense for %(friendly), but it's fair to point out
that %(addr) is a bit of a tougher sell and is inconsistent with its
stated output.  But ... that's long-standing behavior and I am
reluctant to change it.  I'm open to discussion here.
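
The difference between the two code paths can be sketched with a toy
model.  Everything below is an assumption for illustration (the Addr
tuple, the toy_parse() stand-in, and the function names ls_addr() and
mbox()); it only mirrors the behavior described above and is not how
nmh is actually written.

```python
from typing import NamedTuple, Optional


class Addr(NamedTuple):
    """Toy parsed address; the real structure has many more fields."""
    mbox: Optional[str]
    host: Optional[str]


FMT_MNULL = Addr(None, None)  # models fmt_mnull: every field is empty


def toy_parse(text: str) -> Addr:
    """Hypothetical parser stand-in: accepts only mbox@host, else FMT_MNULL."""
    mbox, sep, host = text.strip().partition("@")
    if not sep or not mbox or not host or "@" in host or " " in mbox + host:
        return FMT_MNULL
    return Addr(mbox, host)


def ls_addr(text: str) -> str:
    """Model of %(addr): fall back to the raw component text on failure."""
    parsed = toy_parse(text)
    if parsed is FMT_MNULL:
        return text
    return f"{parsed.mbox}@{parsed.host}"


def mbox(text: str) -> str:
    """Model of %(mbox): a failed parse yields an empty result."""
    return toy_parse(text).mbox or ""
```

With this model, ls_addr("not an address") hands back the input
unchanged, while mbox("not an address") yields an empty string, which
is the asymmetry in question.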

I guess this boils down to a question of (a) Are there any changes we
should make in the long run, and (b) are there things you can do today
to make things better?

For (a), we should handle those addresses better.  But should we
make a format function that could detect a mis-parsed address?  That
seems straightforward to me; it might require a slight internal API
extension, but shouldn't be too bad.  Something like %(addrerror{from})
that would return true if the address failed to parse correctly.
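
To make the idea concrete, here is a toy Python model of what such a
predicate might compute.  Nothing here exists in nmh: %(addrerror) is
only the proposal above, and toy_parse() is a made-up stand-in for the
real address parser that returns None where the real engine would use
fmt_mnull.

```python
from typing import Optional, Tuple


def toy_parse(text: str) -> Optional[Tuple[str, str]]:
    """Made-up stand-in for the address parser: accepts only mbox@host."""
    mbox, sep, host = text.strip().partition("@")
    if not sep or not mbox or not host or "@" in host or " " in mbox + host:
        return None  # plays the role of fmt_mnull
    return (mbox, host)


def addrerror(text: str) -> bool:
    """Hypothetical %(addrerror): true when the component failed to parse."""
    return toy_parse(text) is None
```

A format program could then branch on the predicate directly instead of
inferring failure from an empty address part.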

For (b), it DOES occur to me that you could use the fact that most
address-part functions return NULL for an invalid address to test for a
mis-parsed address.  E.g., "%(mbox)" normally returns "user" for an
email address of "user@host", but returns nothing when parsing fails.
So you could do this:

        %<(mbox{text})%(addr{text})%|Address is borked%>

Which I think would do what you want, today.  Obviously put whatever
you want for "Address is borked" and use the appropriate component for
{text}.
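
For readers less used to the %<...%|...%> conditional, the control flow
of that format string is roughly the following.  This is a toy Python
analogue with made-up helpers (mbox(), addr(), render()) and a
deliberately simplistic notion of a valid address; it is meant to show
the branching, not to reproduce nmh's parser.

```python
from typing import Optional


def mbox(text: str) -> Optional[str]:
    """Toy %(mbox): local part of mbox@host, or None if the toy parse fails."""
    local, sep, host = text.strip().partition("@")
    if not sep or not local or not host or "@" in host or " " in local + host:
        return None
    return local


def addr(text: str) -> str:
    """Toy %(addr): just the stripped text; only reached after mbox() succeeded."""
    return text.strip()


def render(text: str, fallback: str = "Address is borked") -> str:
    """Analogue of %<(mbox{text})%(addr{text})%|Address is borked%>."""
    return addr(text) if mbox(text) else fallback
```

The test on mbox() acts purely as a guard: its value is discarded, and
only its being non-empty decides which branch gets emitted.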

--Ken

Current Thread

%(addr{<component>}) and RFC-invalid headers, Richard M Kreuter, 2023/03/29
- Re: %(addr{<component>}) and RFC-invalid headers, Ken Hornstein, 2023/03/29
  - Re: %(addr{<component>}) and RFC-invalid headers, Richard M Kreuter, 2023/03/30
    - Re: %(addr{<component>}) and RFC-invalid headers, Ken Hornstein <=
