bug-mailutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: The state I'm in - rfc 822 parsing.


From: Sam Roberts
Subject: Re: The state I'm in - rfc 822 parsing.
Date: Wed, 14 Mar 2001 11:29:01 -0500
User-agent: Mutt/1.3.16i

Quoting Alain Magloire <address@hidden>, who wrote:
Allo!

> > If I find the awful comment, I put it where I would have
> > put a phrase.
> 
> Seems you actually remove comments totaly.
> Some clients may want them right.  Especially in the "Received:"

Uh, GetMailBox() grabs the comment and puts in the personal,
doesn't it? I don't have the source in front of me, but
if what I sent you doesn't do that I sent the wrong diff.

The underlying GetComment() collects the contents of the comment,
so if (when) I parse Received fields, I will get the comment
contents and return them.

> > I accept route-addrs without a phrase in front.
> > 
> > Groups just look like more addresses, I discard their names.
> 
> Ok, suggestion; how about to build on top(not now, but much ... much later)
> something that would permit a user to build and save aliases:
> alias_create (&alias, "canadian_hackers");
> alias_add (alias, "address@hidden");
> alias_add (alias, "address@hidden");

Sure, but I'm not a fan of calling it alias. Alias is used as a nickname
feature in mail clients, and the alias can be to a single route-addr,
and usually is. Alias is used in sendmail, I think, to expand to
a list of addresses, #(mailbox), not to a group. In general, I like
to call something by the RFC name, i.e. "group".

> >  (in the attached files, search for "b- ", looks like the
> >  author of qmail has a sense of humour)
> 
> 8-), If I remember djb, he was not easy to deal with.

Careful there, some people say that about RMS! ;-) Actually, his
tone on drums was(is) shocking, which is particularly frustrating
because some (but not all) of his points are valid, which tends
to get lost in his own noise. Maybe he has good reasons to be so
frustrated, I don't know the whole history.

> > - <>, valid in SMTP "rcpt from:", should I parse and ignore it?
> 
> I vote strip. Since I want to reuse the email part to build the "From "
> string separator for Unix Mbox.

By strip, you mean parse and ignore? If I parse and ignore, this
would be a list with two addresses: address@hidden, <>, address@hidden

> > - I don't accept > 127 chars as ascii anywhere, should I allow
> > them in the personal name?
> 
> Technically it should not be permit since rfc822 explicit specify
> that header should be ascii7, maybe substitute them with "?"

I agree, I thought you wouldn't since many european's include
non-ascii in the phrase or comment field of an address, for
their name, with clients that don't mime encode the field.

> Yes, you could use the "poison" mallocs (xmalloc(), xrealloc, xcalloc)
> but I do know if that is the proper way for a library to behave. 

Our test suites at work do amazing things, we do a loop around
our test cases, and before entering the loop set the n'th malloc()
to fail. Then we call the test, it fails with ENOMEM, we then
check that all allocated memory was free. Then we increment n.
It's stunning what exhaustive checking turns up... I'll bet you
a case of beer the same approach to libmailbox would turn up
bugs.

> > feedback on whether the reallocations I'm doing make people
> > cringe. I.e, whether I need to rework the building up of strings.
> > I can think of some other ways to do it, but I don't want to
> > if I don't have to.
> 
> The realloc-by-1 is hard.  Why not construct a "string class" and realloc
> by a fixed size, say 16/32, since it's only short strings.
> 
> struct _string {
>   char *buffer;
>   size_t buflen;
> };
> and pass around the struct _string *;

Ok. How about I use my favorite optimization, double memory on realloc(),
half when usage drops below a quarter of allocated. This leads to O(log n)
performance, or something, rather than O(n/16). I'll do this.

I was also thinking about using
struct _strspan {
        char* begin;
        char* end;
        };

So I could parse through stuff, return what I found, and the caller could
decide whether they wanted to allocate space for it. Have to look and see
if  that's useful.

> Or using GNU obstack object.

How easy would it be to cut GNU obstack code into mailutils? Not
all libraries are GNU...

> > - proper prototypes and naming conventions, maybe even a more
> > abstract api for the parser, C++ comments -> C comments, etc.
> 
> Some serious nitpicking, 8-) :
>
> - (*)Proper prototypes is a good thing.

parse822_ as a function prefix ok? _ as word seperator, with
all lower case function names ok? The current names are a
result of ':%s/MParse822:://g'...

> - (*)C comments, please, some old compiler will chocke.
> - (*)Also not using c++ reserved words, I saw a couple of "char *new;"

Ok.

> - (*)A more consitent use of "const char *" versus "char *"
>   for example StrAppend*() should be :
>   int StrAppend (char **to, const char *from);

I didn't do this initially, because the C spec has a bug:

char* acp[];

void f(const char* const a[]);

f(acp); /* this causes warnings in C, though not C++, C is
  being lazy, its a safe conversion. */

It's a pain in the ass. Since I use double indirection a lot,
I'll have to cast all over the place.

> - (*)"new = (char*) realloc()", cast not needed

realloc() returns void*, same as malloc, doesn't it? I'll check.

> - I saw a mailbox_t can we rename this for consistency, since
>   the name is already taken.
Ok.

> - The return values looks a little inconsistent 0== vs != 0

I have to fix them. In the C++ version exceptions were thrown
for out-of-memory, so the only failure was parse failure. I
have to rewrite the return values so failur for memory
or parsing can be returned.

> - (**)GNU coding style: this not a high priority, I probably will not do 
> nothing
>   now, since lots of othe code have different style, for example, mime.c.
>   It's up to you, some people can not stand it.

The GNU coding style was invented as a demonstration of how cool
emacs is, IMO. There's not enough time in my life to convince my
editor to support GNU conventions. I could run it through indent,
but it would probably make changes all over the place. How do
people who don't use emacs usually produce GNU code? If somebody
has a vim cfg statement that will do it, I'll use it.

> - (***)Some memory leaks, when the parsing failed, since it's a library
>   we should release, (see append patch, ok to commit?).  

Some? I don't recall checking the return value of a single StrAppend()!

I'll look at the patch tonight.

> (*)I can submit a patch to correct those if you want.
> (**) I can submit a patch but ... you may not like it 8-).
> (***) patch in attachement, not commited.

In general, I'm not adverse to fixing bugs in code I wrote! 8-)

I'll integrate your patch with a bunch of the other things.

Btw, my intention was to make the parsing functions public,
not static, that's what I meant by "proper" prototypes.

Should parse822.h be in mailbox/include, mailbox, or
include/mailbox? The component functions can be quickly used
by others, for instance if you want'ed to parse the
route, you would build your own loop, but use the parse822
functions to grab the bits, skip the comments, etc. Or if
you were to use it to parse the header fields, which I was,
you might want to use its functions from somewhere else.

Content-Description: C program text
> --- parse822.c.orig   Tue Mar 13 20:55:12 2001
> +++ parse822.c        Tue Mar 13 23:02:30 2001
> @@ -71,7 +71,7 @@ int StrAppendN(char** to, char* from, si
>               if(!new) {
>                       return 0;
>               }
> -             
> +
>               *to = new;
>       } else {
>               *to = (char*) malloc(n + 1);
> @@ -251,7 +251,7 @@ int GetComment(char** p, char* e, char**
>       if(!GetSpecial(p, e, '(')) {
>               return 0;
>       }
> -     
> +
>       while(*p != e) {
>               char c = **p;
>  
> @@ -398,6 +398,7 @@ int GetPhrase(char** p, char* e, char** 
>                       StrAppend(phrase, word);
>                       *word = 0;
>               }
> +             StrFree (&word);
>       }
>       return 1;
>  }
> @@ -439,7 +440,7 @@ int address_create0 (address_t* a, const
>       if(!a || *a) {
>               return EINVAL;
>       }
> -     
> +
>       status = GetAddressList(a, (char*) s);
>  
>       if(status > 0) {
> @@ -469,7 +470,7 @@ int GetAddressList(mailbox_t** a, char* 
>  
>       /* An address can contain a group, so an entire
>        * list of addresses may have been appended, or no
> -      * addresses at all. Walk to the end. 
> +      * addresses at all. Walk to the end.
>        */
>       while(*an) {
>               ++ok;
> @@ -529,7 +530,7 @@ int GetGroup(char**p, char* e, mailbox_t
>  
>               /* see if there are more */
>               SkipComments(p, e);
> -             
> +
>               while(GetSpecial(p, e, ',')) {
>                       SkipComments(p, e);
>  
> @@ -579,6 +580,7 @@ int GetMailBox(char** p, char* e, mailbo
>  
>               if(!GetRouteAddr(p, e, a)) {
>                       *p = save;
> +                     StrFree (&phrase);
>                       return 0;
>               }
>  
> @@ -622,7 +624,7 @@ int GetRouteAddr(char** p, char* e, mail
>               *p = save;
>  
>               address_destroy(a);
> -             
> +
>               return 0;
>       }
>  
> @@ -632,7 +634,7 @@ int GetRouteAddr(char** p, char* e, mail
>  int GetRoute(char** p, char* e, char** route)
>  {
>       // route = 1#("@" domain ) ":"
> -     // 
> +     //
>       // Note: I don't hav a way of returning the route, so toss it for now.
>  
>       char* accumulator = 0;
> @@ -642,6 +644,7 @@ int GetRoute(char** p, char* e, char** r
>  
>               if(!GetSpecial(p, e, '@')) {
>                       // it's not a route
> +                     StrFree(&accumulator);
>                       return 0;
>               }
>  
> @@ -651,6 +654,7 @@ int GetRoute(char** p, char* e, char** r
>  
>               if(!GetDomain(p, e, &accumulator)) {
>                       // it looked like a route, but there's no domain!
> +                     StrFree(&accumulator);
>                       return 0;
>               }
>  
> @@ -666,6 +670,7 @@ int GetRoute(char** p, char* e, char** r
>       SkipComments(p, e);
>  
>       if(!GetSpecial(p, e, ':')) {
> +             StrFree(&accumulator);
>               return 0;
>       }
>  
> @@ -682,18 +687,25 @@ int GetAddrSpec(char** p, char* e, mailb
>       char* local_part = 0;
>       char* domain = 0;
>  
> -     if(!GetLocalPart(p, e, &local_part))
> -             return 0;
> +     if(!GetLocalPart(p, e, &local_part)) {
> +            StrFree (&local_part);
> +            StrFree (&domain);
> +            return 0;
> +     }
>  
>       SkipComments(p, e);
>  
>       if(!GetSpecial(p, e, '@')) {
>               *p = save;
> +             StrFree (&local_part);
> +             StrFree (&domain);
>               return 0;
>       }
>  
>       if(!GetDomain(p, e, &domain)) {
>               *p = save;
> +             StrFree (&local_part);
> +             StrFree (&domain);
>               return 0;
>       }
>  
> @@ -730,6 +742,7 @@ int GetLocalPart(char** p, char* e, char
>  
>               if(!GetLocalPart(p, e, &more)) {
>                       *p = save;
> +                     StrFree(&more);
>                       return 1;
>               }
>               // append more
> @@ -768,6 +781,7 @@ int GetDomain(char** p, char* e, char** 
>               char* more = 0;
>               if(!GetDomain(p, e, &more)) {
>                       *p = save;
> +                     StrFree(&more);
>                       return 1;
>               }
>  
> @@ -835,6 +849,7 @@ int GetDomainLiteral(char** p, char* e, 
>       if(!GetSpecial(p, e, ']')) {
>               *p = save;
>  
> +             StrFree(&literal);
>               return 0;
>       }
>       StrAppendChar(&literal, ']');


-- 
Sam Roberts <address@hidden>



reply via email to

[Prev in Thread] Current Thread [Next in Thread]