emacs-orgmode@gnu.org archives
* Boolean word/regexp search problem
@ 2009-11-27 17:18 Matt Lundin
  2009-11-27 17:37 ` Matthew Lundin
  0 siblings, 1 reply; 4+ messages in thread
From: Matt Lundin @ 2009-11-27 17:18 UTC (permalink / raw)
  To: Org Mode

The word/regexp agenda search does not seem to work with more than one
word or regexp unless the first word or regexp is also preceded by a "+"
or "-".

Take the following example.

--8<---------------cut here---------------start------------->8---
* Org-mode

Org mode is a major mode for Emacs written by Carsten Dominik.
--8<---------------cut here---------------end--------------->8---

Let's say I search for Emacs with "C-c a s [RET] Emacs". So far, so good:
this item appears in the results. But let's say I want to narrow down
the search. When I press "[" to add a search term, I see the following
prompt in the minibuffer:

[+-]Word/{Regexp} ...: Emacs +

If I complete the prompt as given ("Emacs +Carsten"), there are no
results.

The search only succeeds if I add a "+" in front of Emacs as well, i.e.,
"+Emacs +Carsten". 

The same behavior occurs with exclusion ("-") and with the regexp
search (i.e., curly brackets).
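
The queries above can also be tried non-interactively. This is only a
sketch: it assumes that org-search-view accepts the search string as its
second argument and that the example entry lives in one of the agenda
files.

```elisp
(require 'org-agenda)

;; No "+" or "-" before the first word: this is the query that
;; returned no results in the report above.
(org-search-view nil "Emacs +Carsten")

;; "+" before the first word as well: the entry is found.
(org-search-view nil "+Emacs +Carsten")

;; Exclusion, again with a sign on the first word.
(org-search-view nil "+Emacs -Carsten")
```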

Two questions:

1) Do boolean word/regexp searches require a "+" or "-" symbol before
the first word/regexp? If so, this is a bit confusing, since tag and
property searches do not require an initial symbol. (E.g.,
"emacs+orgmode" works as a tag search.)

2) If boolean word/regexp do require an initial "+" or "-", could the
prompt after pressing "[" or "]" or "{" or "}" in the search results
buffer be amended to add a plus in front of the first search term?

Here is the relevant portion of the manual:

,----[10.5 Commands in the agenda buffer]
| `[ ] { }'
| 
|     in search view
|           add new search words (`[' and `]') or new regular expressions
|           (`{' and `}') to the query string.  The opening bracket/brace
|           will add a positive search term prefixed by `+', indicating
|           that this search term must occur/match in the entry.  The
|           closing bracket/brace will add a negative search term which
|           must not occur/match in the entry for it to be selected.
`----

Thanks,
Matt

* Re: Boolean word/regexp search problem
  2009-11-27 17:18 Boolean word/regexp search problem Matt Lundin
@ 2009-11-27 17:37 ` Matthew Lundin
  2009-11-27 19:54   ` Matt Lundin
  0 siblings, 1 reply; 4+ messages in thread
From: Matthew Lundin @ 2009-11-27 17:37 UTC (permalink / raw)
  To: Matt Lundin; +Cc: Org Mode

Matt Lundin <mdl@imapmail.org> writes:

> The word/regexp agenda search does not seem to work with more than one
> word or regexp unless the first word or regexp is also preceded by a "+"
> or "-".
>
> Take the following example.
>
> * Org-mode
>
> Org mode is a major mode for Emacs written by Carsten Dominik.
>
> Let's say I search for Emacs with "C-c a s [RET] Emacs". So far, so good:
> this item appears in the results. But let's say I want to narrow down
> the search. When I press "[" to add a search term, I see the following
> prompt in the minibuffer:
>
> [+-]Word/{Regexp} ...: Emacs +
>
> If I complete the prompt as given ("Emacs +Carsten"), there are no
> results.
>
> The search only succeeds if I add a "+" in front of Emacs as well, i.e.,
> "+Emacs +Carsten". 
>
> The same behavior occurs with exclusion ("-") and with the regexp
> search (i.e., curly brackets).
>
> Two questions:
>
> 1) Do boolean word/regexp searches require a "+" or "-" symbol before
> the first word/regexp? If so, this is a bit confusing, since tag and
> property searches do not require an initial symbol. (E.g.,
> "emacs+orgmode" works as a tag search.)
>
> 2) If boolean word/regexp do require an initial "+" or "-", could the
> prompt after pressing "[" or "]" or "{" or "}" in the search results
> buffer be amended to add a plus in front of the first search term?
>
> Here is the relevant portion of the manual:
>
> ,----[10.5 Commands in the agenda buffer]
> | `[ ] { }'
> | 
> |     in search view
> |           add new search words (`[' and `]') or new regular expressions
> |           (`{' and `}') to the query string.  The opening bracket/brace
> |           will add a positive search term prefixed by `+', indicating
> |           that this search term must occur/match in the entry.  The
> |           closing bracket/brace will add a negative search term which
> |           must not occur/match in the entry for it to be selected.
> `----

O.K., I just found the variable org-agenda-search-view-search-words-only:

,----
| Non-nil means, the search string is interpreted as individual words
| The search then looks for each word separately in each entry and
| selects entries that have matches for all words.
| When nil, matching as loose words will only take place if the first
| word is preceded by + or -.  If that is not the case, the search
| string will just be matched as a substring in the entry, but with
| each space character allowing for any whitespace, including newlines.
`----

Please disregard question one above, at least insofar as it applies to
word searches (I'm still trying to work out the regexps). But re:
question two, would it be worthwhile to add a "+" to the beginning of
the search string when org-agenda-manipulate-query is invoked in a
search agenda buffer?

Thanks,
Matt

* Re: Boolean word/regexp search problem
  2009-11-27 17:37 ` Matthew Lundin
@ 2009-11-27 19:54   ` Matt Lundin
  2010-01-05 11:17     ` Carsten Dominik
  0 siblings, 1 reply; 4+ messages in thread
From: Matt Lundin @ 2009-11-27 19:54 UTC (permalink / raw)
  Cc: Org Mode

Hi Carsten,

Matthew Lundin <mdl@imapmail.org> writes:

> Matt Lundin <mdl@imapmail.org> writes:
>
>> The word/regexp agenda search does not seem to work with more than one
>> word or regexp unless the first word or regexp is also preceded by a
>> "+" or "-".

I've investigated this further and beg your permission to offer a few
comments/suggestions.

First, I apologize for missing the change in behavior in the
org-search-view introduced in Org 6.32. Reading the ChangeLog, I now see
the following information:

,----
| Agenda Search view: Search for substrings
| 
| The default in search view (C-c a s) is now that the search expression
| is searched for as a substring, i.e. the different words must occur in
| direct sequence, and it may be only part of a word. If you want to
| look for a number of separate keywords with Boolean logic, all words
| must be preceded by + or -.
| 
| This was, more-or-less, requested by John Wiegley.
`----

In particular, I see that "all words must be preceded by + or -" for a
boolean search. I've also read the manual section 10.3.5 as well as the
docstring for org-search-view and appreciate that this new behavior can
be turned off with the variable
org-agenda-search-view-search-words-only.

A few comments:

1) I'm wondering whether the substring search should be the default. I
search quite often for two or three words or regexps that I know are in
an entry (regardless of order), while I rarely search for a specific
phrase or sequence of words. Of course, others might disagree.

2) Many web and database search engines use the following convention: a
space between words becomes an automatic AND, while quotation marks
indicate searches for a phrase/substring (i.e., words in sequence).
Having missed the description of the new behavior in the ChangeLog, I
found the new default substring search a bit counter-intuitive. My vote
would be for sloppy boolean searches by default, with quotation marks
reserved for substring searches. But of course, this is not a huge
priority for org-mode development, and I have no idea how difficult it
would be to implement!

3) The new substring search changes the behavior of regexp searches. A
simple regexp search with brackets (e.g., {Carst}) no longer produces any
results unless the brackets are preceded by a +. This is true even if
one is searching only for a single regexp. In other words, regexp
brackets now *must* always be preceded by a plus or a minus. Is this the
intended behavior?

4) Pressing "[" or "]" or "{" or "}" in the agenda buffer adds a "+" or
"-" after the first term in the minibuffer. E.g.,

--8<---------------cut here---------------start------------->8---
[+-]Word/{Regexp} ...: Emacs +
--8<---------------cut here---------------end--------------->8---

But if the user simply adds another term at the cursor (i.e., after the
"+"), the search will fail, since "Emacs" now must also be preceded by a
"+".

Thanks for reading this long email.

- Matt

* Re: Re: Boolean word/regexp search problem
  2009-11-27 19:54   ` Matt Lundin
@ 2010-01-05 11:17     ` Carsten Dominik
  0 siblings, 0 replies; 4+ messages in thread
From: Carsten Dominik @ 2010-01-05 11:17 UTC (permalink / raw)
  To: Matt Lundin; +Cc: Org Mode

Hi Matt,

On Nov 27, 2009, at 8:54 PM, Matt Lundin wrote:

> Hi Carsten,
>
> Matthew Lundin <mdl@imapmail.org> writes:
>
>> Matt Lundin <mdl@imapmail.org> writes:
>>
>>> The word/regexp agenda search does not seem to work with more than one
>>> word or regexp unless the first word or regexp is also preceded by a
>>> "+" or "-".
>
> I've investigated this further and beg your permission to offer a few
> comments/suggestions.
>
> First, I apologize for missing the change in behavior in the
> org-search-view introduced in Org 6.32. Reading the ChangeLog, I now see
> the following information:
>
> ,----
> | Agenda Search view: Search for substrings
> |
> | The default in search view (C-c a s) is now that the search expression
> | is searched for as a substring, i.e. the different words must occur in
> | direct sequence, and it may be only part of a word. If you want to
> | look for a number of separate keywords with Boolean logic, all words
> | must be preceded by + or -.
> |
> | This was, more-or-less, requested by John Wiegley.
> `----
>
> In particular, I see that "all words must be preceded by + or -"

In fact, only the first word needs the "+".  For any additional words the
plus is optional; only a "-" has to be given explicitly.  I have improved
the documentation here.

> for a
> boolean search. I've also read the manual section 10.3.5 as well as the
> docstring for org-search-view and appreciate that this new behavior can
> be turned off with the variable
> org-agenda-search-view-search-words-only.
>
> A few comments:
>
> 1) I'm wondering whether the substring search should be the default. I
> search quite often for two or three words or regexps that I know are in
> an entry (regardless of order), while I rarely search for a specific
> phrase or sequence of words. Of course, others might disagree.

I think the main application is actually not looking for a phrase,
but looking for a partial word - which was impossible before this
change.

>
> 2) Many web and database search engines use the following convention: a
> space between words becomes an automatic AND,

That is right.

> while quotation marks
> indicate searches for a phrase/substring (i.e., words in sequence).

Yes. This is a bit of a hassle to implement.  But I agree that this
would be nice to have - if the search is Boolean.  OK, this is now
in as well.

> Having missed the description of the new behavior in the ChangeLog, I
> found the new default substring search a bit counter-intuitive. My vote
> would be for sloppy boolean searches by default, with quotation marks
> reserved for substring searches. But of course, this is not a huge
> priority for org-mode development, and I have no idea how difficult it
> would be to implement!

This is really a matter of taste.  In an email to me, John argues for
behavior that is consistent with Emacs itself rather than with other
programs:

 > I realize that search engines work differently than Emacs in several
 > cases.  For example, if you type M-x search-forward, then foo, Emacs
 > will do a substring search for foo, not a complete string search.
 > In fact, it takes work to get Emacs to do a precise word
 > search (you have to re-search, then use \<foo\>), and so it seemed
 > odd to me that Org-mode made this its default.

Also, the prompt was really bad, suggesting a Boolean search in any case.
Now the prompt does a better job, I think.

> 3) The new substring search changes the behavior of regexp searches. A
> simple regexp search with brackets (e.g., {Carst}) no longer produces any
> results unless the brackets are preceded by a +. This is true even if
> one is searching only for a single regexp. In other words, regexp
> brackets now *must* always be preceded by a plus or a minus. Is this the
> intended behavior?

This is a bug, which I just fixed.  If the first thing is a regexp, this
will turn on Boolean search as well.  Please verify that this is
indeed fixed.

>
> 4) Pressing "[" or "]" or "{" or "}" in the agenda buffer adds a "+" or
> "-" after the first term in the minibuffer. E.g.,
>
> --8<---------------cut here---------------start------------->8---
> [+-]Word/{Regexp} ...: Emacs +
> --8<---------------cut here---------------end--------------->8---
>
> But if the user simply adds another term at the cursor (i.e., after the
> "+"), the search will fail, since "Emacs" now must also be preceded by a
> "+".

I don't think so.  As noted above, the additional "+" is in fact
optional; a space is enough.

Another improvement I made is that the "+" is only added by "[" if
the last search was Boolean.  If not, you simply get back to edit
the phrase.

> Thanks for reading this long email.

Thanks for putting so much time in helping to improve Org-mode!

I have tried to improve the logic of all this a bit, but I am
sticking with the default for phrase search.  It is important
to keep John Wiegley happy :-)  and I quite like it this way.
The prompt is now more explicit about what is expected, and
you can default to Boolean search by setting the variable
`org-agenda-search-view-always-boolean' if you prefer.
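
For example, a minimal sketch of that setting in an init file (the
variable name is the one given above; nothing else is assumed):

```elisp
;; Always split the search string into Boolean snippets, even when
;; the first word has no "+" or "-" prefix.
(setq org-agenda-search-view-always-boolean t)
```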

Hope I am also keeping *you* happy this way :-)

Here is the new docstring for org-search-view, which explains
things a bit better.
--------------------------------------------------------------------------
Show all entries that contain a phrase or words or regular expressions.

With optional prefix argument TODO-ONLY, only consider entries that are
TODO entries.  The argument STRING can be used to pass a default search
string into this function.  If EDIT-AT is non-nil, it means that the
user should get a chance to edit this string, with cursor at position
EDIT-AT.

The search string can be viewed either as a phrase that should be found as
is, or it can be broken into a number of snippets, each of which must match
in a Boolean way to select an entry.  The default depends on the variable
`org-agenda-search-view-always-boolean'.
Even if this is turned off (the default) you can always switch to
Boolean search dynamically by preceding the first word with \"+\" or \"-\".

The default is a direct search of the whole phrase, where each space in
the search string can expand to an arbitrary amount of whitespace,
including newlines.
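
The whitespace handling described in the paragraph above can be sketched
as follows.  This is an illustration only, not the code org-search-view
itself uses; `my/phrase-to-regexp' is a hypothetical helper name.

```elisp
(defun my/phrase-to-regexp (phrase)
  "Return a regexp that matches PHRASE with flexible whitespace.
Each run of whitespace in PHRASE matches any amount of whitespace,
including newlines.  Hypothetical helper, for illustration only."
  (mapconcat #'regexp-quote
             (split-string phrase "[ \t\n]+" t)
             "[ \t\n]+"))

;; The phrase still matches when the entry text is wrapped across lines;
;; the call returns a non-nil match position.
(let ((case-fold-search t))
  (string-match-p (my/phrase-to-regexp "major mode")
                  "Org mode is a major\nmode for Emacs"))
```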

If using a Boolean search, the search string is split on whitespace and
each snippet is searched separately, with logical AND to select an entry.
Words prefixed with a minus must *not* occur in the entry.  Words without
a prefix or prefixed with a plus must occur in the entry.  Matching is
case-insensitive.  Words are enclosed by word delimiters (i.e. they must
match whole words, not parts of a word) if
`org-agenda-search-view-force-full-words' is set (default is nil).
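
As a rough sketch of the Boolean rules in the paragraph above (not the
actual implementation): the hypothetical helper `my/boolean-match-p'
handles plain words only, matches them as substrings, and ignores
{regexp} snippets and the full-word option.

```elisp
(require 'cl-lib)

(defun my/boolean-match-p (query text)
  "Return non-nil if TEXT satisfies the Boolean QUERY.
QUERY is split on whitespace.  Snippets prefixed with \"-\" must not
occur in TEXT; all other snippets must occur.  Hypothetical helper."
  (let ((case-fold-search t))           ; matching is case-insensitive
    (cl-every
     (lambda (snippet)
       (let* ((negative (string-prefix-p "-" snippet))
              ;; Strip a leading "+" or "-" to get the bare word.
              (word (if (string-match-p "\\`[-+]" snippet)
                        (substring snippet 1)
                      snippet))
              (found (string-match-p (regexp-quote word) text)))
         (if negative (not found) found)))
     (split-string query "[ \t\n]+" t))))

(defvar my/example-entry
  "Org mode is a major mode for Emacs written by Carsten Dominik."
  "Example entry text, for illustration only.")

(my/boolean-match-p "+Emacs Carsten" my/example-entry)   ; non-nil
(my/boolean-match-p "+Emacs -Carsten" my/example-entry)  ; nil
```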

Boolean search snippets enclosed by curly braces are interpreted as
regular expressions that must or (when preceded with \"-\") must not
match in the entry.

- If the search string starts with an asterisk, search only in headlines.
- If (possibly after the leading star) the search string starts with an
   exclamation mark, this also means to look at TODO entries only, an effect
   that can also be achieved with a prefix argument.
- If (possibly after star and exclamation mark) the search string starts
   with a colon, this will mean that the snippets of the boolean search
   must match as full words.
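
A few example calls for the prefix characters listed above.  This is a
sketch that assumes the prefixes are given in the order of the list and
that the Boolean part of the query follows them; the search words are
placeholders.

```elisp
(require 'org-agenda)

;; "*": search only in headlines.
(org-search-view nil "*Org-mode")

;; "!": look at TODO entries only, then a Boolean query.
(org-search-view nil "!+Emacs +Carsten")

;; "*", "!" and ":" combined: headlines of TODO entries only, with the
;; Boolean snippets matched as full words.
(org-search-view nil "*!:+Emacs -Carsten")
```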

This command searches the agenda files, and in addition the files listed
in `org-agenda-text-search-extra-files'.
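
A minimal configuration sketch for that variable; both file names are
placeholders, not files mentioned anywhere in this thread.

```elisp
;; Search these files in addition to the agenda files.
(setq org-agenda-text-search-extra-files
      '("~/org/notes.org" "~/org/reference.org"))
```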

- Carsten


Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs/org-mode.git