emacs-orgmode

Re: [Orgmode] Re: Boolean word/regexp search problem


From: Carsten Dominik
Subject: Re: [Orgmode] Re: Boolean word/regexp search problem
Date: Tue, 5 Jan 2010 12:17:59 +0100

Hi Matt,

On Nov 27, 2009, at 8:54 PM, Matt Lundin wrote:

Hi Carsten,

Matthew Lundin <address@hidden> writes:

Matt Lundin <address@hidden> writes:

The word/regexp agenda search fails to work with more than one word or
regexp unless the first word or regexp is also preceded by a "+" or "-".

I've investigated this further and beg your permission to offer a few
comments/suggestions.

First, I apologize for missing the change in behavior in the
org-search-view introduced in Org 6.32. Reading the ChangeLog, I now see
the following information:

,----
| Agenda Search view: Search for substrings
|
| The default in search view (C-c a s) is now that the search expression
| is searched for as a substring, i.e. the different words must occur in
| direct sequence, and it may be only part of a word. If you want to
| look for a number of separate keywords with Boolean logic, all words
| must be preceded by + or -.
|
| This was, more-or-less, requested by John Wiegley.
`----

In particular, I see that "all words must be preceded by + or -"

In fact, only the first word needs the "+".  For any additional words the
plus is optional; only a "-" has to be written explicitly.  I have improved
the documentation here.

for a
boolean search. I've also read the manual section 10.3.5 as well as the
docstring for org-search-view and appreciate that this new behavior can
be turned off with the variable
org-agenda-search-view-search-words-only.

A few comments:

1) I'm wondering whether the substring search should be the default. I
search quite often for two or three words or regexps that I know are in
an entry (regardless of order), while I rarely search for a specific
phrase or sequence of words. Of course, others might disagree.

I think the main application is actually not looking for a phrase,
but looking for a partial word - which was impossible before this
change.


2) Many web and database search engines use the following convention: a
space between words becomes an automatic AND,

That is right.

while quotation marks
indicate searches for a phrase/substring (i.e., words in sequence).

Yes. This is a bit of a hassle to implement.  But I agree that this
would be nice to have - if the search is Boolean.  OK, this is now
in as well.

Having missed the description of the new behavior in the ChangeLog, I
found the new default substring search a bit counter-intuitive. My vote
would be for sloppy boolean searches by default, with quotation marks
reserved for substring searches. But of course, this is not a huge
priority for org-mode development, and I have no idea how difficult it
would be to implement!

This is really a matter of taste.  In an email to me, John argues for
something that is consistent with Emacs itself rather than with other
programs:

> I realize that search engines work differently than Emacs in several
> cases.  For example, if you type M-x search-forward, then foo, Emacs
> will do a substring search for foo, not a complete string search.
> In fact, it takes work to get Emacs to do a precise word
> search (you have to re-search, then use \<foo\>), and so it seemed
> odd to me that Org-mode made this its default.

Also, the prompt was really bad, suggesting a Boolean search in any case.
Now the prompt does a better job, I think.

3) The new substring search changes the behavior of regexp searches. A
simple regexp search with brackets (e.g, {Carst}) no longer produces any
results unless the brackets are preceded by a +. This is true even if
one is searching only for a single regexp. In other words, regexp
brackets now *must* always be preceded by a plus or a minus. Is this the
intended behavior?

This is a bug, which I just fixed.  If the first thing is a regexp, this
will turn on Boolean search as well.  Please verify that this is
indeed fixed.


4) Pressing "[" or "]" or "{" or "}" in the agenda buffer adds a "+" or
"-" after the first term in the minibuffer. E.g.,

--8<---------------cut here---------------start------------->8---
[+-]Word/{Regexp} ...: Emacs +
--8<---------------cut here---------------end--------------->8---

But if the user simply adds another term at the cursor (i.e., after the
"+"), the search will fail, since "Emacs" now must also be preceded by a
"+".

I don't think so; see above.  The additional "+" is in fact optional,
a space is enough.

Another improvement I made is that the "+" is only added by "[" if
the last search was Boolean.  If not, you simply get back to edit
the phrase.

Thanks for reading this long email.

Thanks for putting so much time in helping to improve Org-mode!

I have tried to improve the logic of all this a bit, but I am
sticking with phrase search as the default.  It is important
to keep John Wiegley happy :-)  and I quite like it this way.
The prompt is now more explicit about what is expected, and
you can default to Boolean search by setting the variable
`org-agenda-search-view-always-boolean' if you prefer.
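
As a minimal sketch of that last option, assuming the setting goes into
your Emacs init file (where exactly you put it is up to you):

```emacs-lisp
;; Sketch: make search view (C-c a s) treat every search string as a
;; Boolean search, even without a leading "+" or "-".
;; The variable defaults to nil, which means phrase search.
(setq org-agenda-search-view-always-boolean t)
```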

Hope I am also keeping *you* happy this way :-)

Here is the new docstring for `org-search-view', which explains
things a bit better.
--------------------------------------------------------------------------
Show all entries that contain a phrase or words or regular expressions.

With optional prefix argument TODO-ONLY, only consider entries that are
TODO entries.  The argument STRING can be used to pass a default search
string into this function.  If EDIT-AT is non-nil, it means that the
user should get a chance to edit this string, with cursor at position
EDIT-AT.

The search string can be viewed either as a phrase that should be found
as is, or it can be broken into a number of snippets, each of which must
match in a Boolean way to select an entry.  The default depends on the
variable `org-agenda-search-view-always-boolean'.
Even if this is turned off (the default) you can always switch to
Boolean search dynamically by preceding the first word with \"+\" or \"-\".

The default is a direct search of the whole phrase, where each space in
the search string can expand to an arbitrary amount of whitespace,
including newlines.

If using a Boolean search, the search string is split on whitespace and
each snippet is searched separately, with logical AND to select an entry.
Words prefixed with a minus must *not* occur in the entry. Words without
a prefix or prefixed with a plus must occur in the entry.  Matching is
case-insensitive.  Words are enclosed by word delimiters (i.e. they must
match whole words, not parts of a word) if
`org-agenda-search-view-force-full-words' is set (default is nil).

Boolean search snippets enclosed by curly braces are interpreted as
regular expressions that must or (when preceded with \"-\") must not
match in the entry.

- If the search string starts with an asterisk, search only in headlines.
- If (possibly after the leading star) the search string starts with an
  exclamation mark, this also means to look at TODO entries only, an effect
  that can also be achieved with a prefix argument.
- If (possibly after star and exclamation mark) the search string starts
  with a colon, this will mean that the snippets of the boolean search
  must match as full words.

This command searches the agenda files, and in addition the files listed
in `org-agenda-text-search-extra-files'.
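
To make the rules above concrete, here are a few search strings as they
could be passed to the command from Lisp.  This is only a sketch based on
the docstring: the words searched for are made up, and the file name in
the last form is a hypothetical placeholder.

```emacs-lisp
(require 'org-agenda)

;; Phrase search (the default): "emacs lisp" must occur in sequence;
;; the space may match any amount of whitespace, including newlines.
(org-search-view nil "emacs lisp")

;; Boolean search: the leading "+" switches it on.  The entry must
;; contain "emacs" and "lisp" (in any order) and must not contain "vim".
(org-search-view nil "+emacs lisp -vim")

;; A regexp snippet in curly braces, combined with an excluded word.
(org-search-view nil "+{Carst} -draft")

;; Headlines only (leading "*"), TODO entries only ("!"), then Boolean.
(org-search-view nil "*!+emacs lisp")

;; Related settings mentioned in the docstring.
(setq org-agenda-search-view-force-full-words t)
(setq org-agenda-text-search-extra-files
      '("~/org/notes.org"))   ; hypothetical file name
```

Interactively, the same strings can be typed at the prompt after C-c a s.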

- Carsten




