
Re: [Pan-users] Query on REGEX syntax


From: Marc Schwartz
Subject: Re: [Pan-users] Query on REGEX syntax
Date: Tue, 17 Feb 2004 13:10:50 -0600

On Tue, 2004-02-17 at 12:10, Christophe Lambin wrote:
> On Mon, 16 Feb, 2004 at 17:49 +0100, Marc Schwartz wrote:
> 
> > I have been trying to construct a REGEX syntax to find a keyword in post
> > subject lines. 
> 
> How are you implementing this?  As a score? A filter? Or just in the
> filter in the articlelist?  All but the last should work; the latter
> is only a 'phrase' filter (i.e. literally matches the filter), so no RE's.

I have this as a filter, simply called "R", where the line is:

'Article Subject matches regular expression \bR\b'

I then use that filter in a Rule, also called "R".

In that Rule, I have it set up to use the "R" filter in three particular
groups. If that filter is met, the action is set to "Watch thread".

The issue here is that it highlights threads that have an 'r' or an 'R',
whether it is a stand-alone 'word' or in the middle of a word.  Thus,
for example, "contract" seems to cause a match, though not
consistently.

It also seems to pick up the "Re:" at the beginning of follow-up posts,
if I am reading the behavior correctly, thus highlighting the entire
thread even if the main post does not have an 'r' in it.
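
For anyone who wants to check the pattern outside of Pan, here is a
minimal sketch using Python's re module. It is not PCRE, though \b
behaves the same way for plain ASCII subjects like these, and the
plain_match helper is only my stand-in for a non-regex,
case-insensitive "contains R" filter. It shows that \bR\b should not
match "contract" or "Re:", even with case ignored:

```python
import re

# The filter's pattern: a capital R standing alone between word boundaries.
WORD_R = re.compile(r"\bR\b")
# Same pattern with case ignored, in case the filter is applied that way.
WORD_R_NOCASE = re.compile(r"\bR\b", re.IGNORECASE)


def plain_match(subject):
    """Hypothetical non-regex filter: case-insensitive substring test."""
    return "r" in subject.lower()


subjects = [
    "How to add values to an existing vector in R",
    "contract question",
    "Re: Main Effects and Interactions",
]

for s in subjects:
    print(
        f"{s!r}: regex={bool(WORD_R.search(s))} "
        f"nocase={bool(WORD_R_NOCASE.search(s))} "
        f"plain={plain_match(s)}"
    )
```

Only the first subject satisfies either regex, while the plain
substring test accepts all three, which is the over-matching described
above.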

> > I have tried:
> > \bR\b
> 
> This should work: as of 0.14.2.91, Pan uses PCRE for regex, which this
> filter is compliant with (I successfully use a similar regex in my 
> scorefile, albeit a whole word).
> 
> Do you see any messages in the Log Viewer related to regex'es?

Ok...here is something curious....could be on to something.

I just deleted the articles from the 3 groups where I apply this rule
and re-downloaded them.

In the Log Viewer, the counts for the number of articles that meet the
filter criteria are correct. 2 in one group, 6 in another and 8 in the
final one. 

Yet many other threads that do not fit the criteria are highlighted,
though this is not consistent across the three groups. In one group, for
example, the following thread subject is properly highlighted:

"How to add values to an existing vector in R"

So is the following:

"Symmetry, Asymmetry, Conjugation, Complements"

However, the following is not:

"Re: Main Effects and Interactions"


Something else now suggests that there is some confounding issue here:
if I use the "R" filter from the Filter Menu, only the correct articles
remain in the Header Pane in these three groups.

OK....I think that I may have it figured out.  I erased the contents of
my Scorefile and re-saved the empty file.  I then deleted and
re-downloaded the articles in these three groups. The filter now seems
to be highlighting the correct articles. So it looks like there may have
been a problem in the scorefile, though I am not sure what it was.

Now, I believe that I may know what occurred here. When I initially
created the filter to simply match an "R" in the subject, it of course
matched any "R" or "r", since that match is not case sensitive and does
not care whether the 'R' is a word by itself. That is when I began to
use the REGEX approach.

I am guessing that after I deleted the initial "R" Rule, the scorefile
was not altered, so it continued to highlight the threads that had an
'r' in them, even though I was no longer using the older R Rule.

After I created the new R Rule using the REGEX, both the articles that
matched the REGEX and the articles that had matched the now-deleted
older R Rule continued to be highlighted. However, new articles that
merely had an "r" or "R" somewhere in the subject were not highlighted,
since the deleted R Rule no longer ran against them.

Does that make sense?

So basically, once an article is scored to be highlighted in the
scorefile, it does not change, unless you manually edit the scorefile.
It will not change simply by deleting the Rule/Filter that initially
resulted in the score being set to 9999.
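
To make that concrete, here is a toy model in Python of what I think
happened. It is only a sketch of my guess, not how Pan actually stores
or applies scores: stored_scores, apply_rule, old_rule and new_rule are
names I made up for illustration, and the "contract" subject is an
invented example.

```python
import re

# Hypothetical stand-in for the persisted scorefile: subject -> score.
stored_scores = {}


def apply_rule(matches, subjects, score=9999):
    """Record a score for each matching subject; never clears old entries."""
    for s in subjects:
        if matches(s):
            stored_scores[s] = score


def old_rule(subject):
    """The original non-regex rule: any 'r' or 'R' anywhere."""
    return "r" in subject.lower()


def new_rule(subject):
    """The REGEX rule: a stand-alone capital R."""
    return re.search(r"\bR\b", subject) is not None


# Step 1: the old rule scores an article that merely contains an 'r'.
first_batch = ["Symmetry, Asymmetry, Conjugation, Complements"]
apply_rule(old_rule, first_batch)

# Step 2: the old rule is deleted, but stored_scores is left untouched.
# Only the new rule runs from here on, including on new articles.
second_batch = [
    "How to add values to an existing vector in R",
    "Another contract thread",
]
apply_rule(new_rule, first_batch + second_batch)

for subject in sorted(stored_scores):
    print(stored_scores[subject], subject)
```

The stale "Symmetry..." entry is still scored alongside the legitimate
"...vector in R" match, while the newer "contract" subject is not, which
is the pattern I was seeing until I emptied the scorefile.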

> Regards,
> Christophe

Thanks Christophe!  This helped me go through the debugging process and
hopefully this may be of help to others.

Regards,

Marc





