
Re: [nmh-workers] INCing of email archives


From: Bakul Shah
Subject: Re: [nmh-workers] INCing of email archives
Date: Fri, 26 Jul 2019 07:42:02 -0700

On Jul 25, 2019, at 4:25 PM, Ken Hornstein <address@hidden> wrote:
> 
>> Once in a while I download email archives of some mailing list
>> and unpack them using "inc -file <archive-file>". But more
>> than once I have seen that inc gets confused and doesn't
>> unpack the whole thing. The cause seems to be a line starting
>> with From in some message body. Ideally inc should check that
>> a "From ..." line is immediately followed by header lines,
>> and if it is not, assume the line is part of the message body.
> 
> Ralph answered this, but let me expand a bit.
> 
> The job of inc(1) is to incorporate messages from a 'mail drop' into your
> MH mailbox.  Traditionally it handles mbox-style files and POP (it also
> does MMDF, but let us not speak of that).
> 
> As you can see from the Wikipedia entry Ralph linked to, all of the
> various mbox formats use the same scheme: a line beginning with "From
> " is the mailbox delimiter (mboxcl and mboxcl2 use a Content-Length
> header; I believe they are officially dead at this point).  The big
> differences are in quoting rules.  Unfortunately, since we're kind of
> locked into the mbox format in inc(1) at least, changing that would
> have some nasty consequences (Ralph gave you an example of a message
> that it would break on but I am sure there are others).  I think your
> best bet is to preprocess these mailing list archives so they are valid
> mbox files.
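A minimal Python sketch of the kind of preprocessing Ken suggests, assuming the messages have already been separated (for example, fetched one at a time) so each message's text is known. It applies the quoting convention usually called mboxrd: on write, any line matching `^>*From ` gets one extra `>`; on read, one `>` is removed from any line matching `^>+From `. Unlike the older mboxo rule, which escapes only `From ` and therefore cannot be undone unambiguously, this round-trips. The function names and the `(envelope, text)` input shape are hypothetical, not part of nmh.

```python
import re

# mboxrd quoting: a line matching ^>*From  is escaped by prepending ">".
_FROM_LINE = re.compile(r"^>*From ")


def quote_line(line: str) -> str:
    """Escape one message line for writing into an mboxrd file."""
    return ">" + line if _FROM_LINE.match(line) else line


def unquote_line(line: str) -> str:
    """Undo mboxrd escaping: strip one ">" from lines matching ^>+From ."""
    if line.startswith(">") and _FROM_LINE.match(line[1:]):
        return line[1:]
    return line


def write_mboxrd(messages: list[tuple[str, str]]) -> str:
    """Serialize (envelope, text) pairs into a single mboxrd string.

    `envelope` is whatever follows "From " on the separator line, e.g.
    "alice@example.org Fri Jul 26 07:42:02 2019" (hypothetical values).
    """
    out = []
    for envelope, text in messages:
        out.append("From " + envelope)
        out.extend(quote_line(line) for line in text.splitlines())
        out.append("")  # blank line before the next separator
    return "\n".join(out) + "\n"


def read_mboxrd(data: str) -> list[str]:
    """Split an mboxrd string back into message texts (envelopes dropped)."""
    messages: list[list[str]] = []
    for line in data.splitlines():
        if line.startswith("From "):  # only unescaped separators start this way
            messages.append([])
        elif messages:
            messages[-1].append(unquote_line(line))
    # Drop the blank line that write_mboxrd appends after each message.
    return ["\n".join(m[:-1] if m and m[-1] == "" else m) for m in messages]
```

Because every content line starting with `From ` is escaped on write, a reader such as inc can treat any unescaped `From ` line as a separator without guessing.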

Thanks, Ralph & Ken. The site I downloaded the latest email archive
from uses Mailman, so I was a bit surprised. The method I suggested
would let inc handle a larger set of inputs. While there can still be
false positives, the number of messages matching

    From ... [0-9]$
    <mail header>:

(a "From " line ending in a digit, immediately followed by a header
line) is likely to be much, much smaller than the number of random body
lines that start with "From " and end in a digit. Still, I can
understand the reluctance to add this logic to inc.
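A rough Python sketch of the heuristic described above, under two assumptions: an envelope line starts with `From ` and ends in a digit (typically the year of an asctime-style date), and the following line must look like an RFC 5322 header field, i.e. a name made of printable ASCII characters other than `:`, followed by `:`. `split_archive` is a hypothetical helper, not something inc provides, and it will still split on a body line that happens to satisfy both conditions.

```python
import re

# Envelope line: "From " ... ending in a digit, per the heuristic above.
ENVELOPE = re.compile(r"^From .*[0-9]$")
# RFC 5322 field name (printable ASCII except ":") followed by ":".
HEADER = re.compile(r"^[\x21-\x39\x3b-\x7e]+:")


def split_archive(text: str) -> list[str]:
    """Split mbox-like text into messages, treating a "From " line as a
    separator only when it ends in a digit and the next line is a header.
    Envelope lines are dropped; anything before the first one is ignored."""
    lines = text.splitlines()
    starts = [
        i
        for i, line in enumerate(lines)
        if ENVELOPE.match(line)
        and i + 1 < len(lines)
        and HEADER.match(lines[i + 1])
    ]
    messages = []
    for n, start in enumerate(starts):
        end = starts[n + 1] if n + 1 < len(starts) else len(lines)
        messages.append("\n".join(lines[start + 1:end]))
    return messages
```

On an archive where a body contains a line such as `From what I can tell, 2019` followed by ordinary text, a plain split on every `From ` line yields one message too many, while this version keeps that line in the body because the next line is not a header.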




