From: Carsten Dominik <carsten.dominik@gmail.com>
To: Matt Lundin <mdl@imapmail.org>
Cc: Org Mode <emacs-orgmode@gnu.org>
Subject: Re: Re: Boolean word/regexp search problem
Date: Tue, 5 Jan 2010 12:17:59 +0100
Message-ID: <242B91B8-9615-49E6-A245-ABFB3E29EDBC@gmail.com>
In-Reply-To: <m2y6lrvj3j.fsf@fastmail.fm>

Hi Matt,

On Nov 27, 2009, at 8:54 PM, Matt Lundin wrote:

> Hi Carsten,
>
> Matthew Lundin <mdl@imapmail.org> writes:
>
>> Matt Lundin <mdl@imapmail.org> writes:
>>
>>> The word/regexp agenda search fails to work with more than one word or
>>> regexp unless the first word or regexp is also preceded by a "+" or "-".
>
> I've investigated this further and beg your permission to offer a few
> comments/suggestions.
>
> First, I apologize for missing the change in behavior in the
> org-search-view introduced in Org 6.32. Reading the ChangeLog, I now see
> the following information:
>
> ,----
> | Agenda Search view: Search for substrings
> |
> | The default in search view (C-c a s) is now that the search expression
> | is searched for as a substring, i.e. the different words must occur in
> | direct sequence, and it may be only part of a word. If you want to
> | look for a number of separate keywords with Boolean logic, all words
> | must be preceded by + or -.
> |
> | This was, more-or-less, requested by John Wiegley.
> `----
>
> In particular, I see that "all words must be preceded by + or -"

In fact, only the first word needs the "+".  For any additional words
the plus is optional; only a "-" has to be given explicitly.  I have
improved the documentation here.

> for a
> boolean search. I've also read the manual section 10.3.5 as well as the
> docstring for org-search-view and appreciate that this new behavior can
> be turned off with the variable
> org-agenda-search-view-search-words-only.
>
> A few comments:
>
> 1) I'm wondering whether the substring search should be the default. I
> search quite often for two or three words or regexps that I know are in
> an entry (regardless of order), while I rarely search for a specific
> phrase or sequence of words. Of course, others might disagree.

I think the main application is actually not looking for a phrase,
but looking for a partial word - which was impossible before this
change.

>
> 2) Many web and database search engines use the following convention: a
> space between words becomes an automatic AND,

That is right.

> while quotation marks
> indicate searches for a phrase/substring (i.e., words in sequence).

Yes. This is a bit of a hassle to implement.  But I agree that this
would be nice to have - if the search is Boolean.  OK, this is now
in as well.

> Having missed the description of the new behavior in the ChangeLog, I
> found the new default substring search a bit counter-intuitive. My vote
> would be for sloppy boolean searches by default, with quotation marks
> reserved for substring searches. But of course, this is not a huge
> priority for org-mode development, and I have no idea how difficult it
> would be to implement!

This is really a matter of taste.  John argues in an email to
me for behavior that is consistent with Emacs itself rather than
with other programs:

 > I realize that search engines work differently than Emacs in several
 > cases.  For example, if you type M-x search-forward, then foo, Emacs
 > will do a substring search for foo, not a complete string search.
 > In fact, it takes work to get Emacs to do a precise word
 > search (you have to re-search, then use \<foo\>), and so it seemed
 > odd to me that Org-mode made this its default.

Also, the prompt was really bad, suggesting a Boolean search in any case.
Now the prompt does a better job, I think.

> 3) The new substring search changes the behavior of regexp searches. A
> simple regexp search with brackets (e.g., {Carst}) no longer produces any
> results unless the brackets are preceded by a +. This is true even if
> one is searching only for a single regexp. In other words, regexp
> brackets now *must* always be preceded by a plus or a minus. Is this the
> intended behavior?

This is a bug, which I just fixed.  If the first element of the search
string is a regexp, this now turns on Boolean search as well.  Please
verify that this is indeed fixed.

>
> 4) Pressing "[" or "]" or "{" or "}" in the agenda buffer adds a "+" or
> "-" after the first term in the minibuffer. E.g.,
>
> --8<---------------cut here---------------start------------->8---
> [+-]Word/{Regexp} ...: Emacs +
> --8<---------------cut here---------------end--------------->8---
>
> But if the user simply adds another term at the cursor (i.e., after the
> "+"), the search will fail, since "Emacs" now must also be preceded by a
> "+".

I don't think so.  As noted above, the additional "+" is in fact
optional; a space is enough.

Another improvement I made is that the "+" is only added by "[" if
the last search was Boolean.  If not, you simply get back to edit
the phrase.

> Thanks for reading this long email.

Thanks for putting so much time into helping to improve Org-mode!

I have tried to improve the logic of all this a bit, but I am
sticking with the default for phrase search.  It is important
to keep John Wiegley happy :-)  and I quite like it this way.
The prompt is now more explicit about what is expected, and
you can default to Boolean search by setting the variable
`org-agenda-search-view-always-boolean' if you prefer.
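
As a minimal sketch of that setting, assuming it goes into your Emacs init
file (the variable name is the one given above; where you put it is up to
you):

```elisp
;; Treat every search-view query as a Boolean search by default,
;; instead of as a single phrase/substring.
(setq org-agenda-search-view-always-boolean t)
```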

Hope I am also keeping *you* happy this way :-)

Here is the new docstring for org-search-view, which explains
things a bit better.
--------------------------------------------------------------------------
Show all entries that contain a phrase or words or regular expressions.

With optional prefix argument TODO-ONLY, only consider entries that are
TODO entries.  The argument STRING can be used to pass a default search
string into this function.  If EDIT-AT is non-nil, it means that the
user should get a chance to edit this string, with cursor at position
EDIT-AT.

The search string can be viewed either as a phrase that should be found as
is, or it can be broken into a number of snippets, each of which must match
in a Boolean way to select an entry.  The default depends on the variable
`org-agenda-search-view-always-boolean'.
Even if this is turned off (the default) you can always switch to
Boolean search dynamically by preceding the first word with \"+\" or \"-\".

The default is a direct search of the whole phrase, where each space in
the search string can expand to an arbitrary amount of whitespace,
including newlines.

If using a Boolean search, the search string is split on whitespace and
each snippet is searched separately, with logical AND to select an entry.
Words prefixed with a minus must *not* occur in the entry.  Words without
a prefix or prefixed with a plus must occur in the entry.  Matching is
case-insensitive.  Words are enclosed by word delimiters (i.e. they must
match whole words, not parts of a word) if
`org-agenda-search-view-force-full-words' is set (default is nil).

Boolean search snippets enclosed by curly braces are interpreted as
regular expressions that must or (when preceded with \"-\") must not
match in the entry.

- If the search string starts with an asterisk, search only in headlines.
- If (possibly after the leading star) the search string starts with an
   exclamation mark, this also means to look at TODO entries only, an effect
   that can also be achieved with a prefix argument.
- If (possibly after star and exclamation mark) the search string starts
   with a colon, this will mean that the snippets of the boolean search
   must match as full words.

This command searches the agenda files, and in addition the files listed
in `org-agenda-text-search-extra-files'.
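
To illustrate the docstring, here are a few calls that should behave as
described there.  This is only a sketch: the search words are made up, and
it assumes Org's agenda code is loaded and your agenda files are set up.

```elisp
(require 'org-agenda)

;; Phrase search (the default): "Emacs" followed by "Lisp", where the
;; space may match any amount of whitespace.
(org-search-view nil "Emacs Lisp")

;; Boolean search, switched on by the leading "+": both words must
;; occur in the entry, in any order.
(org-search-view nil "+Emacs Lisp")

;; Entries that contain "Emacs" but not "Vim".
(org-search-view nil "+Emacs -Vim")

;; A regexp snippet in curly braces, combined with a plain word.
(org-search-view nil "+{Carst} Org")

;; Leading "*": look in headlines only.
(org-search-view nil "*+Emacs Lisp")

;; Non-nil TODO-ONLY argument: consider TODO entries only.
(org-search-view t "+Emacs Lisp")
```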

- Carsten
