* Boolean word/regexp search problem
From: Matt Lundin @ 2009-11-27 17:18 UTC
To: Org Mode

The word/regexp agenda search does not work with more than one word or
regexp unless the first word or regexp is also preceded by a "+" or "-".

Take the following example.

--8<---------------cut here---------------start------------->8---
* Org-mode

Org mode is a major mode for Emacs written by Carsten Dominik.
--8<---------------cut here---------------end--------------->8---

Let's say I search for Emacs with "C-c a s Emacs [RET]". So far, so good:
this item appears in the results. But let's say I want to narrow down
the search. When I press "[" to add a search term, I see the following
prompt in the minibuffer:

[+-]Word/{Regexp} ...: Emacs +

If I complete the prompt as given ("Emacs +Carsten"), there are no
results.

The search only succeeds if I add a "+" in front of Emacs as well, i.e.,
"+Emacs +Carsten".

The same behavior occurs with exclusion ("-") and with the regexp
search (i.e., brackets).

Two questions:

1) Do boolean word/regexp searches require a "+" or "-" symbol before
   the first word/regexp? If so, this is a bit confusing, since tag and
   property searches do not require an initial symbol. (E.g.,
   "emacs+orgmode" works as a tag search.)

2) If boolean word/regexp searches do require an initial "+" or "-",
   could the prompt after pressing "[" or "]" or "{" or "}" in the
   search results buffer be amended to add a plus in front of the first
   search term?

Here is the relevant portion of the manual:

,----[10.5 Commands in the agenda buffer]
| `[ ] { }'
|
|      in search view
|      add new search words (`[' and `]') or new regular expressions
|      (`{' and `}') to the query string. The opening bracket/brace
|      will add a positive search term prefixed by `+', indicating
|      that this search term must occur/match in the entry. The
|      closing bracket/brace will add a negative search term which
|      must not occur/match in the entry for it to be selected.
`----

Thanks,
Matt
* Re: Boolean word/regexp search problem
From: Matthew Lundin @ 2009-11-27 17:37 UTC
To: Matt Lundin; +Cc: Org Mode

Matt Lundin <mdl@imapmail.org> writes:

> The word/regexp agenda search does not work with more than one word or
> regexp unless the first word or regexp is also preceded by a "+" or "-".
>
> Take the following example.
>
> * Org-mode
>
> Org mode is a major mode for Emacs written by Carsten Dominik.
>
> Let's say I search for Emacs with "C-c a s Emacs [RET]". So far, so good:
> this item appears in the results. But let's say I want to narrow down
> the search. When I press "[" to add a search term, I see the following
> prompt in the minibuffer:
>
> [+-]Word/{Regexp} ...: Emacs +
>
> If I complete the prompt as given ("Emacs +Carsten"), there are no
> results.
>
> The search only succeeds if I add a "+" in front of Emacs as well, i.e.,
> "+Emacs +Carsten".
>
> The same behavior occurs with exclusion ("-") and with the regexp
> search (i.e., brackets).
>
> Two questions:
>
> 1) Do boolean word/regexp searches require a "+" or "-" symbol before
>    the first word/regexp? If so, this is a bit confusing, since tag and
>    property searches do not require an initial symbol. (E.g.,
>    "emacs+orgmode" works as a tag search.)
>
> 2) If boolean word/regexp searches do require an initial "+" or "-",
>    could the prompt after pressing "[" or "]" or "{" or "}" in the
>    search results buffer be amended to add a plus in front of the first
>    search term?
>
> Here is the relevant portion of the manual:
>
> ,----[10.5 Commands in the agenda buffer]
> | `[ ] { }'
> |
> |      in search view
> |      add new search words (`[' and `]') or new regular expressions
> |      (`{' and `}') to the query string. The opening bracket/brace
> |      will add a positive search term prefixed by `+', indicating
> |      that this search term must occur/match in the entry. The
> |      closing bracket/brace will add a negative search term which
> |      must not occur/match in the entry for it to be selected.
> `----

O.K., I just found the variable org-agenda-search-view-search-words-only:

,----
| Non-nil means, the search string is interpreted as individual words
| The search then looks for each word separately in each entry and
| selects entries that have matches for all words.
| When nil, matching as loose words will only take place if the first
| word is preceded by + or -. If that is not the case, the search
| string will just be matched as a substring in the entry, but with
| each space character allowing for any whitespace, including newlines.
`----

Please disregard question one above, at least insofar as it applies to
word searches (I'm still trying to work out the regexps).

But re: question two, would it be worthwhile to add a "+" to the
beginning of the search string when org-agenda-manipulate-query is
invoked in a search agenda buffer?

Thanks,
Matt
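The substring behavior described in that docstring (the phrase is matched as is, but each space may stand for any whitespace, including newlines) can be sketched as follows. This is only an illustration in Python, not Org's Emacs Lisp implementation; the function names `phrase_to_regex` and `phrase_matches` are hypothetical, and the case-insensitive matching is an assumption.

```python
import re


def phrase_to_regex(phrase: str) -> re.Pattern:
    """Build a regex for a phrase where each gap between words matches any whitespace.

    Hypothetical helper: each word is escaped so that characters such as "+"
    are matched literally, the way a plain substring search would treat them.
    """
    words = phrase.split()
    pattern = r"\s+".join(re.escape(word) for word in words)
    # Case-insensitive matching is an assumption made for this sketch.
    return re.compile(pattern, re.IGNORECASE)


def phrase_matches(phrase: str, entry: str) -> bool:
    """Return True if the phrase occurs in the entry as a (flexible) substring."""
    return phrase_to_regex(phrase).search(entry) is not None


entry = "* Org-mode\n\nOrg mode is a major mode for Emacs written\nby Carsten Dominik."

print(phrase_matches("Emacs written by", entry))  # the newline counts as a space
print(phrase_matches("Carst", entry))             # part of a word is enough
print(phrase_matches("Emacs +Carsten", entry))    # "+Carsten" is taken literally
```

Under these assumptions, "Emacs +Carsten" finds nothing because, without a leading "+" or "-", the whole string is one phrase and the literal text "+Carsten" does not occur in the entry.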
* Re: Boolean word/regexp search problem
From: Matt Lundin @ 2009-11-27 19:54 UTC
Cc: Org Mode

Hi Carsten,

Matthew Lundin <mdl@imapmail.org> writes:

> Matt Lundin <mdl@imapmail.org> writes:
>
>> The word/regexp agenda search does not work with more than one word or
>> regexp unless the first word or regexp is also preceded by a "+" or "-".

I've investigated this further and beg your permission to offer a few
comments/suggestions.

First, I apologize for missing the change in behavior in the
org-search-view introduced in Org 6.32. Reading the ChangeLog, I now see
the following information:

,----
| Agenda Search view: Search for substrings
|
| The default in search view (C-c a s) is now that the search expression
| is searched for as a substring, i.e. the different words must occur in
| direct sequence, and it may be only part of a word. If you want to
| look for a number of separate keywords with Boolean logic, all words
| must be preceded by + or -.
|
| This was, more-or-less, requested by John Wiegley.
`----

In particular, I see that "all words must be preceded by + or -" for a
boolean search. I've also read the manual section 10.3.5 as well as the
docstring for org-search-view and appreciate that this new behavior can
be turned off with the variable
org-agenda-search-view-search-words-only.

A few comments:

1) I'm wondering whether the substring search should be the default. I
   search quite often for two or three words or regexps that I know are
   in an entry (regardless of order), while I rarely search for a
   specific phrase or sequence of words. Of course, others might
   disagree.

2) Many web and database search engines use the following convention: a
   space between words becomes an automatic AND, while quotation marks
   indicate searches for a phrase/substring (i.e., words in sequence).
   Having missed the description of the new behavior in the ChangeLog, I
   found the new default substring search a bit counter-intuitive. My
   vote would be for sloppy boolean searches by default, with quotation
   marks reserved for substring searches. But of course, this is not a
   huge priority for org-mode development, and I have no idea how
   difficult it would be to implement!

3) The new substring search changes the behavior of regexp searches. A
   simple regexp search with brackets (e.g., {Carst}) no longer produces
   any results unless the brackets are preceded by a +. This is true
   even if one is searching only for a single regexp. In other words,
   regexp brackets now *must* always be preceded by a plus or a minus.
   Is this the intended behavior?

4) Pressing "[" or "]" or "{" or "}" in the agenda buffer adds a "+" or
   "-" after the first term in the minibuffer. E.g.,

--8<---------------cut here---------------start------------->8---
[+-]Word/{Regexp} ...: Emacs +
--8<---------------cut here---------------end--------------->8---

   But if the user simply adds another term at the cursor (i.e., after
   the "+"), the search will fail, since "Emacs" now must also be
   preceded by a "+".

Thanks for reading this long email.

- Matt
* Re: Re: Boolean word/regexp search problem
From: Carsten Dominik @ 2010-01-05 11:17 UTC
To: Matt Lundin; +Cc: Org Mode

Hi Matt,

On Nov 27, 2009, at 8:54 PM, Matt Lundin wrote:

> Hi Carsten,
>
> Matthew Lundin <mdl@imapmail.org> writes:
>
>> Matt Lundin <mdl@imapmail.org> writes:
>>
>>> The word/regexp agenda search does not work with more than one word
>>> or regexp unless the first word or regexp is also preceded by a "+"
>>> or "-".
>
> I've investigated this further and beg your permission to offer a few
> comments/suggestions.
>
> First, I apologize for missing the change in behavior in the
> org-search-view introduced in Org 6.32. Reading the ChangeLog, I now
> see the following information:
>
> ,----
> | Agenda Search view: Search for substrings
> |
> | The default in search view (C-c a s) is now that the search
> | expression is searched for as a substring, i.e. the different words
> | must occur in direct sequence, and it may be only part of a word.
> | If you want to look for a number of separate keywords with Boolean
> | logic, all words must be preceded by + or -.
> |
> | This was, more-or-less, requested by John Wiegley.
> `----
>
> In particular, I see that "all words must be preceded by + or -"

In fact, only the first word needs the "+". For any additional words the
plus is optional; only a "-" has to be given explicitly. I have improved
the documentation here.

> for a boolean search. I've also read the manual section 10.3.5 as well
> as the docstring for org-search-view and appreciate that this new
> behavior can be turned off with the variable
> org-agenda-search-view-search-words-only.
>
> A few comments:
>
> 1) I'm wondering whether the substring search should be the default. I
>    search quite often for two or three words or regexps that I know
>    are in an entry (regardless of order), while I rarely search for a
>    specific phrase or sequence of words. Of course, others might
>    disagree.

I think the main application is actually not looking for a phrase, but
looking for a partial word, which was impossible before this change.

> 2) Many web and database search engines use the following convention:
>    a space between words becomes an automatic AND,

That is right.

>    while quotation marks indicate searches for a phrase/substring
>    (i.e., words in sequence).

Yes. This is a bit of a hassle to implement. But I agree that this
would be nice to have, if the search is Boolean.

OK, this is now in as well.

>    Having missed the description of the new behavior in the ChangeLog,
>    I found the new default substring search a bit counter-intuitive.
>    My vote would be for sloppy boolean searches by default, with
>    quotation marks reserved for substring searches. But of course,
>    this is not a huge priority for org-mode development, and I have no
>    idea how difficult it would be to implement!

This is really a matter of taste. John argues in an email to me for
something that is consistent with Emacs itself rather than with other
programs:

> I realize that search engines work differently than Emacs in several
> cases. For example, if you type M-x search-forward, then foo, Emacs
> will do a substring search for foo, not a complete string search.
> In fact, it takes work to get Emacs to do a precise word
> search (you have to re-search, then use \<foo\>), and so it seemed
> odd to me that Org-mode made this its default.

Also, the prompt was really bad, suggesting a Boolean search in any
case. Now the prompt does a better job, I think.

> 3) The new substring search changes the behavior of regexp searches. A
>    simple regexp search with brackets (e.g., {Carst}) no longer
>    produces any results unless the brackets are preceded by a +. This
>    is true even if one is searching only for a single regexp. In other
>    words, regexp brackets now *must* always be preceded by a plus or a
>    minus. Is this the intended behavior?

This is a bug, which I just fixed. If the first thing is a regexp, this
will turn on Boolean search as well. Please verify that this is indeed
fixed.

> 4) Pressing "[" or "]" or "{" or "}" in the agenda buffer adds a "+"
>    or "-" after the first term in the minibuffer. E.g.,
>
> --8<---------------cut here---------------start------------->8---
> [+-]Word/{Regexp} ...: Emacs +
> --8<---------------cut here---------------end--------------->8---
>
>    But if the user simply adds another term at the cursor (i.e., after
>    the "+"), the search will fail, since "Emacs" now must also be
>    preceded by a "+".

I don't think so. See above: the additional "+" is, in fact, optional,
and a space is enough.

Another improvement I made is that the "+" is only added by "[" if the
last search was Boolean. If not, you simply get back to edit the phrase.

> Thanks for reading this long email.

Thanks for putting so much time into helping to improve Org-mode!

I have tried to improve the logic of all this a bit, but I am sticking
with phrase search as the default. It is important to keep John Wiegley
happy :-) and I quite like it this way. The prompt is now more explicit
about what is expected, and you can default to Boolean search by
setting the variable `org-agenda-search-view-always-boolean' if you
prefer. Hope I am also keeping *you* happy this way :-)

Here is the new docstring for org-search-view, which explains things a
bit better.

--------------------------------------------------------------------------
Show all entries that contain a phrase or words or regular expressions.

With optional prefix argument TODO-ONLY, only consider entries that are
TODO entries. The argument STRING can be used to pass a default search
string into this function. If EDIT-AT is non-nil, it means that the
user should get a chance to edit this string, with cursor at position
EDIT-AT.

The search string can be viewed either as a phrase that should be found as
is, or it can be broken into a number of snippets, each of which must match
in a Boolean way to select an entry. The default depends on the variable
`org-agenda-search-view-always-boolean'.
Even if this is turned off (the default) you can always switch to
Boolean search dynamically by preceding the first word with \"+\" or \"-\".

The default is a direct search of the whole phrase, where each space in
the search string can expand to an arbitrary amount of whitespace,
including newlines.

If using a Boolean search, the search string is split on whitespace and
each snippet is searched separately, with logical AND to select an entry.
Words prefixed with a minus must *not* occur in the entry. Words without
a prefix or prefixed with a plus must occur in the entry. Matching is
case-insensitive. Words are enclosed by word delimiters (i.e. they must
match whole words, not parts of a word) if
`org-agenda-search-view-force-full-words' is set (default is nil).

Boolean search snippets enclosed by curly braces are interpreted as
regular expressions that must or (when preceded with \"-\") must not
match in the entry.

- If the search string starts with an asterisk, search only in headlines.
- If (possibly after the leading star) the search string starts with an
  exclamation mark, this also means to look at TODO entries only, an effect
  that can also be achieved with a prefix argument.
- If (possibly after star and exclamation mark) the search string starts
  with a colon, this will mean that the snippets of the boolean search
  must match as full words.

This command searches the agenda files, and in addition the files listed
in `org-agenda-text-search-extra-files'.

- Carsten
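The Boolean rules in the docstring (split on whitespace, AND all snippets, "-" excludes, "+" or no prefix requires, curly braces mark a regexp, case-insensitive matching) can be sketched roughly as below. This is an illustrative Python model, not Org's Emacs Lisp code; `parse_query` and `boolean_matches` are hypothetical names, Python regex syntax stands in for Emacs regexp syntax, and the leading "*", "!" and ":" modifiers as well as the full-word option are left out.

```python
import re


def parse_query(query: str) -> list[tuple[bool, re.Pattern]]:
    """Split a Boolean query into (must_match, compiled_pattern) snippets.

    Hypothetical helper. Splitting on whitespace means a regexp snippet
    cannot itself contain a space in this simplified model.
    """
    snippets = []
    for token in query.split():
        must_match = True
        if token[0] in "+-":
            must_match = token[0] == "+"
            token = token[1:]
        if not token:
            continue  # a bare "+" or "-" carries no search term
        if len(token) >= 2 and token.startswith("{") and token.endswith("}"):
            pattern = token[1:-1]        # {...} is used as a regexp
        else:
            pattern = re.escape(token)   # plain words are matched literally
        snippets.append((must_match, re.compile(pattern, re.IGNORECASE)))
    return snippets


def boolean_matches(query: str, entry: str) -> bool:
    """Select the entry only if every snippet agrees with its +/- requirement."""
    return all(
        (regex.search(entry) is not None) == must_match
        for must_match, regex in parse_query(query)
    )


entry = "Org mode is a major mode for Emacs written by Carsten Dominik."

print(boolean_matches("+Emacs Carsten", entry))   # the second "+" is optional
print(boolean_matches("+Emacs -Carsten", entry))  # "-" excludes the entry
print(boolean_matches("+{Carst} -vim", entry))    # a regexp snippet plus an exclusion
```

In this model a query such as "+Emacs Carsten" selects the example entry, which mirrors the point made in the reply that only the first word needs the "+".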