bug-mailutils

sieve, and header_t's api


From: Sam Roberts
Subject: sieve, and header_t's api
Date: Sun, 22 Apr 2001 17:08:23 -0400
User-agent: Mutt/1.3.16i

So I'm working on getting CMU's sieve implementation working with
mailutils. It has a callback structure: it parses and evaluates
scripts, but you provide the callbacks to get headers and message
sizes, and to perform actions on the message (discard, file to
another mailbox, etc.). The getheader() callback looks like:

int (*getheader)(void* msg, const char* name, char*** value);

The idea is a field name can occur multiple times, so this function
fills *value with a pointer to a null-terminated array of pointers
to the values found for that field. Seems fine... but there's
a problem. Who owns the data?

In CMU sieve, the implementor of getheader() owns the data, so it
passes back to the caller pointers to data that it keeps. So what
you do is build a table of the headers that have been asked about,
with an array of all the values for each header that were in the
message. But you can't call the header APIs that get a field by
name: they only return the first value, and we want all of them.
So what you pretty much have to do is go through the headers in
order (1, 2, ...), get each value, and build a data structure that
has a node for every field name that was found, with all the values
for that field name stored in that node.

Then you free this thing after the sieve script has run on the message.
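
A minimal sketch of the table described above, under stated
assumptions: struct field_table, field_table_add() and
field_table_free() are hypothetical names, not part of mailutils or
CMU sieve, and the 0 / -1 return codes are placeholders rather than
sieve's real status values. Only the getheader() signature comes from
the callback shown earlier. The caller would walk the header in order
and call field_table_add() once per field.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* One node per distinct field name; values is a NULL-terminated array. */
struct field_node {
    char *name;
    char **values;
    size_t count;
    struct field_node *next;
};

struct field_table { struct field_node *head; };

static char *dup_str(const char *s) {
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p) memcpy(p, s, n);
    return p;
}

/* Header field names compare case-insensitively. */
static int name_eq(const char *a, const char *b) {
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return 0;
        a++; b++;
    }
    return *a == *b;
}

static struct field_node *find_node(struct field_table *t, const char *name) {
    for (struct field_node *n = t->head; n; n = n->next)
        if (name_eq(n->name, name)) return n;
    return NULL;
}

/* Record one (name, value) pair; the table keeps its own copies. */
int field_table_add(struct field_table *t, const char *name, const char *value) {
    struct field_node *n = find_node(t, name);
    if (!n) {
        n = calloc(1, sizeof *n);
        if (!n) return -1;
        n->name = dup_str(name);
        n->values = malloc(sizeof *n->values);
        if (!n->name || !n->values) {
            free(n->name); free(n->values); free(n);
            return -1;
        }
        n->values[0] = NULL;            /* empty, but still terminated */
        n->next = t->head;
        t->head = n;
    }
    char *copy = dup_str(value);
    if (!copy) return -1;
    char **grown = realloc(n->values, (n->count + 2) * sizeof *grown);
    if (!grown) { free(copy); return -1; }
    grown[n->count++] = copy;
    grown[n->count] = NULL;
    n->values = grown;
    return 0;
}

/* Same shape as the sieve callback: msg is the table, and *value
 * points at data the table owns until field_table_free(). */
int getheader(void *msg, const char *name, char ***value) {
    struct field_node *n = find_node(msg, name);
    if (!n) return -1;
    *value = n->values;
    return 0;
}

/* Call after the script has run on the message. */
void field_table_free(struct field_table *t) {
    struct field_node *n = t->head;
    while (n) {
        struct field_node *next = n->next;
        for (size_t i = 0; i < n->count; i++) free(n->values[i]);
        free(n->values);
        free(n->name);
        free(n);
        n = next;
    }
    t->head = NULL;
}
```

The linked list keeps the sketch short; a hash table would be the
obvious swap if a message has many distinct fields.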

This is all fine. Except that as I was looking through the mailbox
code I saw that the header_t *already* appears to have such a
structure. In other words, it appears that for on-disk mailboxes
the header is parsed completely, and a table of all the values
found is built. For IMAP it looks like it caches field values after
looking them up.

There are two styles of API for getting information that is a
character string: one allocates a string for you and passes it back,
the other takes a pointer to a buffer you allocated and fills it in.

I need a header_t API that passes me back pointers to const data
that has the lifetime of the header_t. Since there isn't one, I
have to pull all the header field values out of header_t, put
them in my own data structure, and then query it for header
values. Basically I build a cache of the header fields and pass
back pointers into that cache.

I wouldn't even mention this, except that it appears that inside
the mailbox API, the mailbox is doing the same thing...

The address_t is doing the same thing. The caller wants the value
of a string. That string *is* already in existence inside the
address_t. It won't have a lifetime past that of the address_t, and
it can't be modified, but a lot of callers might just want to get
a pointer to it so they can print it out, or do a quick test against
it. They are forced to reallocate for no reason.
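
To make the comparison concrete, here is a rough sketch of the two
existing accessor styles next to the borrowed-pointer style, using a
toy type. Everything here (struct toy_address and the toy_get_email_*
functions) is made up for illustration and is not the real address_t
API; the real object stores its strings differently.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an address object that already holds the string. */
struct toy_address { char email[64]; };

/* Style 1: allocate a copy; the caller must free() it. */
int toy_get_email_alloc(const struct toy_address *a, char **out) {
    size_t n = strlen(a->email) + 1;
    *out = malloc(n);
    if (!*out) return -1;
    memcpy(*out, a->email, n);
    return 0;
}

/* Style 2: copy into the caller's buffer, truncating if it is too
 * small; *written gets the number of characters copied. */
int toy_get_email_buf(const struct toy_address *a, char *buf, size_t len,
                      size_t *written) {
    if (len == 0) return -1;
    size_t n = strlen(a->email);
    if (n >= len) n = len - 1;
    memcpy(buf, a->email, n);
    buf[n] = '\0';
    if (written) *written = n;
    return 0;
}

/* Style 3: hand back a const pointer into the object. No copy, but
 * the pointer is only valid while the object is alive. */
int toy_get_email_p(const struct toy_address *a, const char **email) {
    *email = a->email;
    return 0;
}
```

A caller that only wants to print or compare the value pays for an
allocation or a buffer copy in the first two styles and for nothing
in the third; the price is that the lifetime rule lives in the
documentation instead of in the types.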

----

I'm sorry this was so long-winded, it would be fast with a whiteboard!

These are just some observations, but I think it would be
worth considering doing some of the following:

1 - use this inside the header_t code, and add APIs to header
and address that return pointers to internal data:

address_get_email_p(address_t a, size_t no, const char** email);
  // *email will point into the address_t, and they can't change it,
  // or access it after the address is destroyed.

header_get_field_values_p(header_t h, const char* name, const char*** values)
  // similar style to above, but.

2 - write a cache_t class that stores name/value pairs, where there
can be more than one value per name. Sieve had this, and I'm
cleaning it up a bit. Using it I'm implementing an API like -1- suggests.

I guess the fact that I can do this means that the API is OK, but it
seems like such a general-purpose thing that I'd like to maybe clean
it up and sell it as an adapter to wrap around header_t.

create_header_cache(header_cache_t* ch, message_t m);
header_cache_get_field_values_p(header_cache_t ch, const char* name, const char*** values);

----

Quick question because I'm lazy: what does IMAP do when you ask
for the "received" fields?

> Thinking of the IMAP search command, would it be possible to implement
> this in terms of sieve?  I'm thinking of this because it may not
> be necessary to have an interface to the mailbox_t API for search.

If IMAP searches for messages satisfying a boolean condition, and
the server builds a sieve script that contains a conditional check
and runs it against all the messages in a mailbox, it could find
messages. It might be necessary to add some types of checks and
comparisons to sieve as extensions (which is allowed) to support
the kinds of checks IMAP does. Or maybe not. This would be kind of
cool.

> 
> The mailbox_t API is lacking API for :
>   - searching
>   - sorting(threading)
> Since searching(sorting?) can be expressed in sieve...

Not sorting!

Sorting is just a way of looking at messages. Real threading is looking
at the "in-reply-to" field value, and then searching for the message
that has that message-id. But doing soft threading based on subjects
(the way mutt can, if wanted) is useful too. It seems like an API to
get messages by message-id would be all that's needed. Then you
could have a mailbox_thread_t that took a mailbox, went through all
the messages in it, and built the tree, putting in each node only
the id of the message that node corresponds to, and perhaps some
common information. This seems really application-specific. I think
it would be a good chunk of code to have, but it shouldn't
be part of mailbox; it should be a separate type that operates on
a mailbox. Unless IMAP has facilities to "thread"... then I guess
it would have to be a part of the mailbox.
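
A rough sketch of the in-reply-to linking step, assuming the
message-id and in-reply-to values have already been pulled out of
each message. struct msg_info and build_thread_parents() are
hypothetical names, not part of mailutils; the inner loop stands in
for the "get message by message-id" lookup that the mailbox would
have to provide.

```c
#include <stddef.h>
#include <string.h>

/* The per-message information a thread builder needs. */
struct msg_info {
    const char *message_id;   /* NULL if the message has none */
    const char *in_reply_to;  /* NULL if the message is not a reply */
};

/* For each message i, set parent[i] to the index of the message it
 * replies to, or -1 if it starts a thread or its parent is not in
 * this mailbox. Quadratic: every reply scans the whole mailbox. */
void build_thread_parents(const struct msg_info *msgs, size_t n, long *parent) {
    for (size_t i = 0; i < n; i++) {
        parent[i] = -1;
        if (!msgs[i].in_reply_to) continue;
        for (size_t j = 0; j < n; j++) {
            if (j != i && msgs[j].message_id &&
                strcmp(msgs[j].message_id, msgs[i].in_reply_to) == 0) {
                parent[i] = (long)j;
                break;
            }
        }
    }
}
```

The parent array is enough to build the tree afterwards, and each
node carries only an index, in the spirit of keeping just the id of
the message in the node.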

> This is a little confusing  ... I have to think it over 8-)

Indeed!

Enjoy the sun.
Sam

-- 
Sam Roberts <address@hidden> (Vivez sans temps mort!)


