bug-mailutils
Re: RFC 822 address parsing API


From: Sam Roberts
Subject: Re: RFC 822 address parsing API
Date: Thu, 8 Mar 2001 23:18:24 -0500
User-agent: Mutt/1.3.16i

Quoting Alain Magloire <address@hidden>, who wrote:
> : Since the address-list may contain multiple addresses, they are accessed
> : by a @strong{one-based index number}, @var{no}.
> : 
> : Comment: Why is this one-based?
> 
> To be consistent with mailbox_get_message().  Messages are counted from one.
> Most protocols (IMAP, etc.) use a one-based count: the first message is
> msg number 1.  The library takes care of mapping it to whatever form/index
> it uses.

Fair enough. I added a comment to that effect.

> : There needs to be a way of accessing the local-part and the
> : domain of an email address separately, the syntax of local-part
> : is too complex
> : to expect somebody to parse it, in particular strchring for '@@'
> : will fail if there is an '@@' in the local-part, which is valid if
> : it's quoted or escaped.
> 
> Good point.  I did not go that far, what do you suggest?

This:

extern int address_get_local_part
   __P ((address_t, size_t, char *, size_t, size_t *));
extern int address_get_domain
   __P ((address_t, size_t, char *, size_t, size_t *));
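To illustrate why a plain strchr() split is unsafe, here is a minimal
sketch of finding the separating '@' while skipping quoted strings and
backslash escapes. The helper name find_addr_separator() is hypothetical
and not part of mailutils; comments and domain-literals are deliberately
ignored, so this is an illustration rather than a full parser.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of mailutils: return the index of the '@'
   that separates local-part from domain in an addr-spec, or -1 if there
   is none.  An '@' inside a quoted-string or after a backslash is skipped.
   Assumption: comments and domain-literals do not occur in SPEC. */
static long
find_addr_separator (const char *spec)
{
  int in_quotes = 0;
  size_t i;

  assert (spec != NULL);
  for (i = 0; spec[i] != '\0'; i++)
    {
      if (spec[i] == '\\' && spec[i + 1] != '\0')
        i++;                    /* skip the escaped character */
      else if (spec[i] == '"')
        in_quotes = !in_quotes; /* entering or leaving a quoted-string */
      else if (spec[i] == '@' && !in_quotes)
        return (long) i;
    }
  return -1;
}
```

For the addr-spec "a@b"@example.org, strchr() stops at index 2, inside
the quoted local-part, while the helper above skips ahead to index 5.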

> I know there
> is work underway for a new RFC822, maybe it should follow this direction.

Do you mean somebody is reworking mailutils' address.c, or do you
mean drums, the new internet-draft update to rfc822?

> : You can't create an address or an address list.
> 
> Maybe  I'm missing something here, but :
> {
>   address_t add;
>   address_create (&add, "address@hidden, address@hidden");
> }

You made an rfc822 valid address list on the right, knowing what is
and isn't valid. What if I want to compose an email address out
of my address database from the three fields: phrase "sam :-)",
local-part "me", and domain "local.net". If I'm not familiar with
rfc822, I might

sprintf(s, "%s <address@hidden>", "sam :-)", "me", "local.net");

Calling address_create() on that mess results in:

$ ./addr 'sam :-) <address@hidden>'
'sam :-) <address@hidden>' ->
  pcount 1
  personal ''
  comments '< address@hidden >'
  email 'sam:-'
  to_string 'sam :-) <address@hidden>'

Oops! So functionally what's needed is to transform:

rfc822 ->  (personal, localpart, domain) [unquoted]

(personal, localpart, domain) -> rfc822 [personal and localpart
                                             correctly quoted]

The user of the api shouldn't have to know rfc822's quoting rules.

I suggested something like:

address_create(&addr, NULL)

address_append(addr, "sam :-)", "me", "local.net");

address_to_string() -> would result in a string with the personal
part being surrounded with double quotes.
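
A rough sketch of the quoting step, assuming the simplest policy of
always wrapping the phrase in double quotes. quote_phrase() and
format_mailbox() are hypothetical names used only for illustration, and
the local-part is assumed to need no quoting here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write PHRASE as an RFC 822 quoted-string into OUT,
   escaping '"' and '\\'.  Returns 0 on success, -1 if OUT is too small. */
static int
quote_phrase (const char *phrase, char *out, size_t outlen)
{
  size_t n = 0;

  assert (phrase != NULL && out != NULL);
  if (outlen < strlen (phrase) + 3)     /* two quotes plus NUL, at least */
    return -1;
  out[n++] = '"';
  for (; *phrase != '\0'; phrase++)
    {
      if (n + 4 > outlen)       /* escape + char + closing quote + NUL */
        return -1;
      if (*phrase == '"' || *phrase == '\\')
        out[n++] = '\\';
      out[n++] = *phrase;
    }
  out[n++] = '"';
  out[n] = '\0';
  return 0;
}

/* Hypothetical helper: build "phrase" <local@domain>.  Assumption: LOCAL
   and DOMAIN are already valid as they stand; only PHRASE is quoted. */
static int
format_mailbox (const char *phrase, const char *local, const char *domain,
                char *out, size_t outlen)
{
  char quoted[256];
  int n;

  if (quote_phrase (phrase, quoted, sizeof quoted) != 0)
    return -1;
  n = snprintf (out, outlen, "%s <%s@%s>", quoted, local, domain);
  return (n < 0 || (size_t) n >= outlen) ? -1 : 0;
}
```

Given the three database fields from the example above, the caller never
has to think about rfc822 specials; the smiley ends up inside a
quoted-string instead of being taken for a comment.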

> 
> : What about groups? They are easy to parse, but a pain from an api point
> : of view. _address, instead of being a pure linked list, could
> 
> Groups was not "address" in the first API.
> One of the problems is that groups can sometimes be confused with the
> "aliases" that mail clients have.  They use them in address books to
> maintain a list of email addresses.  They probably don't want to have
> the "group:" name as part of the final address.

I don't know what mail clients use in their address books, but how
about this as a discussion of what I see as functionally useful.

The api should allow any valid syntax to be parsed, and any reasonable
syntax to be generated, so that includes parsing groups, and probably
generating groups. IMO, this would be justified just on principle.

Software is out there that will put group notation in headers, mutt
for example:

mutt -shello 'a group: address@hidden, address@hidden;' < /dev/null

--- in my nullmailer queue --
address@hidden
address@hidden
address@hidden

Received: (nullmailer pid 4558898 invoked by uid 100);
        Fri, 09 Mar 2001 02:30:43 -0000
Date: Thu, 8 Mar 2001 21:30:42 -0500
From: Sam Roberts <address@hidden>
To: a group: address@hidden, address@hidden;
Subject: hello
Message-ID: <address@hidden>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.3.16i

----

So either the MUA or the MTA managed to extract the valid email addresses
from that group notation. The mailutils API should be useful for this
kind of thing.

For instance an implementation of 'sendmail -t' wouldn't care whether
the To field contains a group or a list, it just gets the field from
the mail header, pulls out all the email addresses, and tries to
deliver to them.

So I think that an address_t should BE a group, and the default
iteration through it (address_get_email(0, ), ..get_email(1,  ...)),
should just pull email addresses out, no matter the complexity of
the internal structure of the To field.

I'd say that's minimally useful functionality.

It would also be the kind of thing Sieve would want, it would just
treat the group name and semicolon as syntactic noise around an
address list.
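
One way this flattened iteration could look, as a sketch only: the
struct addr_node layout and the nth_email()/get_email() names are
assumptions made up for illustration, not the actual mailutils
internals. The index is one-based, to match the convention discussed
earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type: either a mailbox (email != NULL) or a named
   group whose members hang off `members`. */
struct addr_node
{
  const char *group_name;       /* non-NULL for a group node */
  const char *email;            /* non-NULL for a mailbox node */
  struct addr_node *members;    /* first member of a group, else NULL */
  struct addr_node *next;       /* next sibling */
};

/* Walk LIST in order, descending into groups, and return the mailbox
   reached when *NO counts down to zero; NULL if the list runs out.
   *NO is decremented once for every mailbox passed. */
static const char *
nth_email (const struct addr_node *list, size_t *no)
{
  const struct addr_node *n;

  assert (no != NULL && *no >= 1);
  for (n = list; n != NULL; n = n->next)
    {
      if (n->email != NULL)
        {
          if (--*no == 0)
            return n->email;
        }
      else if (n->members != NULL)
        {
          const char *found = nth_email (n->members, no);
          if (found != NULL)
            return found;
        }
    }
  return NULL;
}

/* One-based lookup that treats group syntax as noise around the list. */
static const char *
get_email (const struct addr_node *list, size_t no)
{
  return no == 0 ? NULL : nth_email (list, &no);
}
```

A 'sendmail -t' style caller would simply call get_email() with 1, 2, 3
and so on until it returns NULL, never learning whether a group was
involved.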

> So are you suggesting to have a special object say :
> address_group_t or address_alias_t
> that would let client regroup in simple aliases mail address?
> 
> : // -1 is the end, anything else is position to be inserted at, for ex.
> : address_create(&group, NULL)
> : address_insert(group, -1, "Sam", "sroberts", "uniserve.com");
> : address_insert(group, -1, "Joe @ \"home\"", "joe", "uniserve.com");
> : address_insert_group(addr, group, "the uniserve boys");
> 
> Again your example is not clear(to me 8), do you mean
> 
> group_t group;
> group_create(&group, NULL)
> group_insert(group, -1, "Sam", "sroberts", "uniserve.com");
> group_insert(group, -1, "Joe @ \"home\"", "joe", "uniserve.com");
> 
> // For an already constructed address_t.
> // Too bad there is no overloading in C.
> address_t addr;
> ....
> group_insert_address (group, addr);
> 
> Note:  for the -1 hack how about a cover function :
> 
> group_append (group_t group, char *s1, char *s2, char *s3)
> {
>   return group_insert (group, -1, s1, s2, s3);
> } 

I think you kind of got what I was fumbling at, with two
clarifications:

1 - an address_t should encapsulate a group; I don't think you need or
    want a separate group_t (that was my point from above).

2 - the reason I had an insert function that took multiple args was
    so it would know how to quote the parts and make them into
    a validly quoted rfc822 mailbox.

> : and utf-8 in the strings, just allow it? The machinery to encode
> : and decode header fields according to the MIME spec doesn't really
> : belong here, it's a layer on top of rfc822.
> 
> I disagree.  We have two objects header_t and address_t.
> header_t is dumb, it just extracts the rfc822 fields:

...
> header_get_value (header, value, ....)
> header_set_value (header, value, ....)
...
> Nothing more is done.

I'm with you so far.

> address_t is more complex, parsing is done.
> 
> The address_t can take one more method encode()/decode (), not sure
> on the API/prototypes.  When set, things like:
> 
> =?ISO-8859-1?Q?Fran=E7ois_Pinard?= <address@hidden>
> 
> will be mapped back properly when retrieving the info:
> 
> address_get_personal (add, pos, buffer, ... )

It's more complicated than that. Fran=E7ois_Pinard is just
quoted-printable encoded; you don't even need to know the
charset to decode it. But you do to display it! What if my
terminal is in utf-8 mode? What would I see if the result
of get_personal() had a non-ascii 8-bit char that happened to
be c-cedilla in iso-8859-1? My MUA *could* display it, but
it needs to do something more like this (pseudo-code):

address_get_personal(add, pos, buf, ...);

rfc2047_translate("utf-8", buf, decodedbuf);

-> decoded buf is Francois, decoded into binary, iconv()ed
   into utf-8

rfc2047_translate("utf-8", "hello, world", decodedbuf);

-> decoded buf is "hello, world", since ascii is same
   in utf-8 (I think), and it isn't rfc2047 encoded.

And what about:

To: =?ISO-8859-1?Q?Fran=E7ois_Pinard?=
  =?japanese?B?DFFGHI=?= <address@hidden>

I made up the japanese thingy, but my nice iso-8859-1
supporting console shouldn't even try to decode that
japanese part of this last example; it'll just leave it
as is, I hope.
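
To make the two steps concrete (undo the encoding, then convert the
charset), here is a small sketch that handles only one case: a "Q"
payload whose charset is ISO-8859-1, converted to UTF-8 by hand. The
function name q_latin1_to_utf8() is made up for this example; a real
rfc2047_translate() would presumably dispatch on the charset label and
use something like iconv() instead.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of a hex digit, or -1 if C is not one. */
static int
hexval (int c)
{
  if (c >= '0' && c <= '9')
    return c - '0';
  c = toupper (c);
  if (c >= 'A' && c <= 'F')
    return c - 'A' + 10;
  return -1;
}

/* Hypothetical helper: decode the payload of an RFC 2047 "Q" encoded-word
   (the text between "?Q?" and "?="), assumed to be ISO-8859-1, and write
   it as UTF-8.  Returns the byte count written (excluding the NUL), or -1
   on malformed input or if OUT is too small. */
static long
q_latin1_to_utf8 (const char *in, char *out, size_t outlen)
{
  size_t n = 0;

  assert (in != NULL && out != NULL);
  while (*in != '\0')
    {
      unsigned char ch;

      if (*in == '_')           /* "_" stands for a space */
        {
          ch = ' ';
          in++;
        }
      else if (*in == '=')      /* "=XX" is a hex-encoded byte */
        {
          int hi = hexval ((unsigned char) in[1]);
          int lo = hi < 0 ? -1 : hexval ((unsigned char) in[2]);
          if (lo < 0)
            return -1;
          ch = (unsigned char) (hi * 16 + lo);
          in += 3;
        }
      else
        ch = (unsigned char) *in++;

      if (n + 3 > outlen)       /* up to two bytes plus the final NUL */
        return -1;
      if (ch < 0x80)
        out[n++] = (char) ch;
      else                      /* Latin-1 0x80..0xFF -> two UTF-8 bytes */
        {
          out[n++] = (char) (0xC0 | (ch >> 6));
          out[n++] = (char) (0x80 | (ch & 0x3F));
        }
    }
  if (n >= outlen)
    return -1;
  out[n] = '\0';
  return (long) n;
}
```

The =E7 byte comes out as the two bytes C3 A7, which is what a utf-8
terminal needs in order to show the c-cedilla.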

> 
> The same should be true for the reverse.
> 
> (where "c," is cedilla i.e. a char over ascii 7 bits.)
> address_set_personal (add, "Franc,ois Pinard")
> --> =?ISO-8859-1?Q?Fran=E7ois_Pinard?=

How does set_personal() know my terminal isn't in koi8?
It needs to know the charset so that it knows what to put
in the charset field of the RFC 2047 encoding.
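
In other words, the charset has to be an explicit input. A minimal
sketch of what that could look like, with the hypothetical name
q_encode_word(); it emits a single encoded-word and does not enforce the
75-character limit RFC 2047 places on one.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: encode TEXT, whose bytes are already in CHARSET,
   as one RFC 2047 "Q" encoded-word.  ASCII letters and digits pass
   through, space becomes '_', everything else becomes "=XX".
   Returns 0 on success, -1 if OUT is too small. */
static int
q_encode_word (const char *charset, const char *text,
               char *out, size_t outlen)
{
  static const char hex[] = "0123456789ABCDEF";
  size_t pos;
  int n;

  assert (charset != NULL && text != NULL && out != NULL);
  n = snprintf (out, outlen, "=?%s?Q?", charset);
  if (n < 0 || (size_t) n >= outlen)
    return -1;
  pos = (size_t) n;

  for (; *text != '\0'; text++)
    {
      unsigned char ch = (unsigned char) *text;

      if (pos + 6 > outlen)     /* worst case "=XX", then "?=" and NUL */
        return -1;
      if (ch == ' ')
        out[pos++] = '_';
      else if ((ch >= '0' && ch <= '9') || (ch >= 'A' && ch <= 'Z')
               || (ch >= 'a' && ch <= 'z'))
        out[pos++] = (char) ch;
      else
        {
          out[pos++] = '=';
          out[pos++] = hex[ch >> 4];
          out[pos++] = hex[ch & 0x0F];
        }
    }
  if (pos + 3 > outlen)
    return -1;
  out[pos++] = '?';
  out[pos++] = '=';
  out[pos] = '\0';
  return 0;
}
```

The same input bytes passed with "KOI8-R" as the charset would produce a
different label around an identical payload, which is exactly why
set_personal() cannot guess it.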

===

Summary of my suggestions:

. address_t should allow group notation to be iterated as if it
was an unadorned address list.

 -> just a matter of having the parser keep on going through a
    group

. there should be an api to create an address that knows to quote
the 3 parts of a mailbox.

 -> address_append_mailbox(address_t, const char* lp, domain, phrase)

(forget the insert, build the thing up in the order you want it)

. there should be an api to get the local-part and domain part
of an address, separately, and unquoted.

 -> get_email() gets an address that could be delivered to, fed
    into an smtp dialog, for example, the localpart and domain
    part would be the unquoted thing. I also think that the
    return of get_personal() should be unquoted.

. there could be an api to create groups, and to append addresses:

 -> address_append_group(address_t, "the group name", address_t);
 -> address_append_list(address_t, address_t);

(also forget the insert, build the thing up in the order you want it)

. there could be an api that allowed you to traverse the address
tree, find group nodes, and walk their contents knowing that
you are in a named group. Perhaps not widely useful, but I can see
neat things that could be done.

. there should be an api to translate a header field from rfc2047
encoded form, into the local charset, and from the local charset
into rfc2047 form.

Sam

p.s. I didn't think you'd actually patch the docs; I just included it
as something to talk about. Now that it's there, I'm going to take an
hour to finish it and chop the comments out.

-- 
Sam Roberts <address@hidden> (http://www.emyr.net/Sam)


