nmh-workers


Re: Sort and delete duplicate messages


From: Howard Bampton
Subject: Re: Sort and delete duplicate messages
Date: Sun, 3 May 2020 21:28:11 -0400

Is the goal to delete messages with the same subject line (which may have different bodies), or messages that are complete duplicates (same body, subject line, and most other headers)? The second kind of "duplicate" is a lot harder, because you could have messages whose Received: headers differ but which are otherwise identical. To handle that case, I think you'd want to:
1) Use scan or something similar to find messages with the same subject.
2) Use a custom scan template (or resort to grep) to find messages within that set that have duplicated headers (presumably To:, From:, Subject:, and perhaps a few others).
3) For any candidates that pass step 2, use mhstore or the like to extract the bodies, and use md5 or cmp to verify that the bodies are the same too.
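Step 3 could be sketched in shell roughly like this. It is a minimal, hypothetical helper (body_of and same_body are names made up for illustration, not nmh commands), and it swaps in a simple awk header strip in place of mhstore, so it assumes plain single-part messages whose body starts after the first blank line. For MIME messages the boundary strings can differ between otherwise identical copies, so extracting the parts with mhstore would be the safer route there.

```shell
#!/bin/sh
# body_of FILE: print everything after the first blank line, i.e. the
# message body without its header block. Assumes one RFC 822-style
# message per file, as in an nmh folder.
body_of() {
    awk 'in_body { print } /^$/ { in_body = 1 }' "$1"
}

# same_body FILE1 FILE2: exit 0 if the two messages have byte-identical
# bodies, non-zero otherwise. All headers (Received:, Message-ID:, ...)
# are ignored.
same_body() {
    tmp1=$(mktemp) || return 2
    tmp2=$(mktemp) || { rm -f "$tmp1"; return 2; }
    body_of "$1" > "$tmp1"
    body_of "$2" > "$tmp2"
    cmp -s "$tmp1" "$tmp2"     # silent byte-for-byte comparison
    rc=$?
    rm -f "$tmp1" "$tmp2"
    return $rc
}
```

Given two candidate message numbers from steps 1 and 2, something like

    same_body "$(mhpath 1)" "$(mhpath 6)" && echo "1 and 6 have the same body"

would compare them; mhpath prints the full path of a message in the current folder. cmp is used here because it is POSIX, whereas the md5 command name differs between systems (md5 vs. md5sum).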


On Sun, May 3, 2020 at 9:19 PM Ken Hornstein <address@hidden> wrote:
>I know that 'sortm -textfield Subject' will sort messages according to
>the subject field. Having run that command, is there a way to then
>delete the first duplicate of each message in the list such that if 1
>and 2 are duplicates and 6 and 7 are duplicates you would delete messages
>2 and 7 leaving 1 and 6?

I want to say you could do something with piping the output of scan
into "uniq -d -f <num>".  Might require a custom scan format, but that
seems relatively simple.

Hm, a quick test:

% scan -format '%(msg) %{subject}' | uniq -d -f 1

suggests that it prints only the first message of each group of
duplicates, not the later ones, so that isn't exactly what you want.
Might be a good starting point, though?  You could probably do something
with uniq -c and pipe that to an awk script that does what you want.

--Ken
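Along the lines of Ken's awk suggestion, here is a minimal sketch that prints exactly the messages the original question asks to delete: every message whose subject has already appeared earlier in the scan output. It assumes the '%(msg) %{subject}' format from the test above; print_later_dups is a hypothetical name. Because awk remembers every subject it has seen, duplicates don't need to be adjacent, so running sortm first is optional, and the first message in scan order (the lowest number) is the one kept. Subjects must match exactly, so "foo" and "Re: foo" are treated as different.

```shell
#!/bin/sh
# print_later_dups: read "MSGNUM SUBJECT" lines on stdin, as produced by
#   scan -format '%(msg) %{subject}'
# and print the message number of every line whose subject was already
# seen on an earlier line. The first occurrence of each subject is kept.
print_later_dups() {
    awk '{
        num = $1                       # first field: the message number
        subj = $0
        sub(/^ *[0-9]+ ?/, "", subj)   # strip the number and one space
        if (subj in seen)
            print num                  # a later duplicate: report it
        else
            seen[subj] = 1             # first time: remember, keep it
    }'
}
```

For example, to review and then remove the later duplicates in the current folder:

    dups=$(scan -format '%(msg) %{subject}' | print_later_dups)
    echo $dups
    [ -n "$dups" ] && rmm $dups

The -n test matters because rmm with no message arguments removes the current message. Also note that scan may truncate each line to its output width, which could make long subjects that differ only near the end look identical; scan's -width switch can raise that limit.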

