* Apropos commands and regexps
@ 2002-05-12 0:57 Kim F. Storm
2002-05-12 5:28 ` Eli Zaretskii
` (3 more replies)
0 siblings, 4 replies; 56+ messages in thread
From: Kim F. Storm @ 2002-05-12 0:57 UTC (permalink / raw)
I was thinking about the use of regexps in connection with the various
apropos commands.
We often advise new users to use e.g. C-h a or M-x apropos (also
accessible through the Help menu).
However, these commands prompt like this:
Apropos command (regexp):
which may be nonsense to some novice users.
Wouldn't it be simpler (for a novice user -- and for advanced users
too) to simply write one or more words (substrings) and then search
for all combinations of those words (substrings) in the relevant list?
E.g. C-h a open file RET would find any matching
open.*file and file.*open
BTW, this obvious example doesn't find `find-file' :-(
Maybe we should have a defalias open-file -> find-file ?
Of course, if the input is already a regexp (e.g. if it does not
contain any spaces), it should be used directly. Below is a patch
to show the concept:
Index: apropos.el
===================================================================
RCS file: /cvs/emacs/lisp/apropos.el,v
retrieving revision 1.84
diff -c -r1.84 apropos.el
*** apropos.el 4 May 2002 14:51:16 -0000 1.84
--- apropos.el 11 May 2002 23:55:10 -0000
***************
*** 122,127 ****
--- 122,130 ----
(defvar apropos-regexp nil
"Regexp used in current apropos run.")
+ (defvar apropos-orig-regexp nil
+ "Regexp as entered by user.")
+
(defvar apropos-files-scanned ()
"List of elc files already scanned in current run of `apropos-documentation'.")
***************
*** 219,224 ****
--- 222,245 ----
(and label button)))
\f
+ (defun apropos-rewrite-regexp (regexp)
+ "Rewrite a list of words to a regexp matching all permutations.
+ If REGEXP is already a regexp, don't modify it."
+ (setq apropos-orig-regexp regexp)
+ (if (and (string-match " " regexp)
+ (string-equal (regexp-quote regexp) regexp))
+ ;; We don't actually make a regexp matching all permutations.
+ ;; Instead, for e.g. "a b c", we make a regexp matching
+ ;; any combination of two or more words like this:
+ ;; (a|b|c).*(a|b|c) which may give some false matches,
+ ;; but as long as it also gives the right ones, that's ok.
+ (let ((words (split-string regexp "[ \t]+"))
+ res)
+ (dolist (w words)
+ (setq res (concat (or res "\\(") (if res "\\|" "") w)))
+ (concat res "\\).*" res "\\)"))
+ regexp))
+
;;;###autoload
(define-derived-mode apropos-mode fundamental-mode "Apropos"
"Major mode for following hyperlinks in output of apropos commands.
***************
*** 262,267 ****
--- 283,289 ----
"or function ")
"(regexp): "))
current-prefix-arg))
+ (setq apropos-regexp (apropos-rewrite-regexp apropos-regexp))
(let ((message
(let ((standard-output (get-buffer-create "*Apropos*")))
(print-help-return-message 'identity))))
***************
*** 304,309 ****
--- 326,332 ----
show unbound symbols and key bindings, which is a little more
time-consuming. Returns list of symbols and documentation found."
(interactive "sApropos symbol (regexp): \nP")
+ (setq apropos-regexp (apropos-rewrite-regexp apropos-regexp))
(setq apropos-accumulator
(apropos-internal apropos-regexp
(and (not do-all)
***************
*** 371,376 ****
--- 394,400 ----
at the function and at the names and values of properties.
Returns list of symbols and values found."
(interactive "sApropos value (regexp): \nP")
+ (setq apropos-regexp (apropos-rewrite-regexp apropos-regexp))
(or do-all (setq do-all apropos-do-all))
(setq apropos-accumulator ())
(let (f v p)
***************
*** 397,402 ****
--- 421,427 ----
bindings.
Returns list of symbols and documentation found."
(interactive "sApropos documentation (regexp): \nP")
+ (setq apropos-regexp (apropos-rewrite-regexp apropos-regexp))
(or do-all (setq do-all apropos-do-all))
(setq apropos-accumulator () apropos-files-scanned ())
(let ((standard-input (get-buffer-create " apropos-temp"))
***************
*** 590,596 ****
If SPACING is non-nil, it should be a string;
separate items with that string."
(if (null apropos-accumulator)
! (message "No apropos matches for `%s'" apropos-regexp)
(setq apropos-accumulator
(sort apropos-accumulator (lambda (a b)
(string-lessp (car a) (car b)))))
--- 615,621 ----
If SPACING is non-nil, it should be a string;
separate items with that string."
(if (null apropos-accumulator)
! (message "No apropos matches for `%s'" apropos-orig-regexp)
(setq apropos-accumulator
(sort apropos-accumulator (lambda (a b)
(string-lessp (car a) (car b)))))
--
Kim F. Storm <storm@cua.dk> http://www.cua.dk
^ permalink raw reply [flat|nested] 56+ messages in thread
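The rewriting step in the patch above can be restated as a standalone helper. This is only a sketch of the same idea, not the code that went into apropos.el: the name `my-apropos-words-to-regexp' is hypothetical, and it assumes an Emacs recent enough to have `string-match-p' and the OMIT-NULLS argument of `split-string'.

```elisp
;; Sketch only: `my-apropos-words-to-regexp' is a hypothetical name,
;; not a function in apropos.el.
(defun my-apropos-words-to-regexp (input)
  "Turn the space-separated words in INPUT into a two-of-these-words regexp.
Return INPUT unchanged if it has no space or already uses regexp syntax."
  (if (and (string-match-p " " input)
           (string-equal (regexp-quote input) input))
      (let ((words (split-string input "[ \t]+" t)))
        (if (cdr words)
            ;; "a b c" becomes \(a\|b\|c\).*\(a\|b\|c\)
            (let ((alt (concat "\\("
                               (mapconcat #'identity words "\\|")
                               "\\)")))
              (concat alt ".*" alt))
          ;; Zero or one word left after trimming: nothing to combine.
          (or (car words) input)))
    input))

(my-apropos-words-to-regexp "open file")
;; => "\\(open\\|file\\).*\\(open\\|file\\)"
(my-apropos-words-to-regexp "^find-.*file")
;; => "^find-.*file"  (no space, so it is used as a regexp directly)
```

As the comment in the patch says, the generated regexp also matches a name in which the same word occurs twice, so it can return some false hits.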
* Re: Apropos commands and regexps
2002-05-12 0:57 Apropos commands and regexps Kim F. Storm
@ 2002-05-12 5:28 ` Eli Zaretskii
2002-05-12 5:38 ` Eli Zaretskii
` (2 more replies)
2002-05-12 10:06 ` Kai Großjohann
` (2 subsequent siblings)
3 siblings, 3 replies; 56+ messages in thread
From: Eli Zaretskii @ 2002-05-12 5:28 UTC (permalink / raw)
Cc: emacs-devel
On 12 May 2002 storm@cua.dk wrote:
> Apropos command (regexp):
>
> which may be nonsense to some novice users.
>
>
> Wouldn't it be simpler (for a novice user -- and for advanced users
> too) to simply write one or more words (substrings) and then search
> for all combinations of those words (substrings) in the relevant list.
>
> E.g. C-h a open file RET would find any matching
>
> open.*file and file.*open
>
> BTW, this obvious example doesn't find `find-file' :-(
> Maybe we should have a defalias open-file -> find-file ?
Perhaps we should have a new command for that, and name it something like
apropos-keywords.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Apropos commands and regexps
2002-05-12 5:28 ` Eli Zaretskii
@ 2002-05-12 5:38 ` Eli Zaretskii
2002-05-13 1:40 ` Miles Bader
2002-05-13 19:11 ` Kim F. Storm
2 siblings, 0 replies; 56+ messages in thread
From: Eli Zaretskii @ 2002-05-12 5:38 UTC (permalink / raw)
On Sun, 12 May 2002, I wrote:
> Perhaps we should have a new command for that, and name it something like
> apropos-keywords.
Btw, where such keyword-based searches will really be a bonus is in the
Info `i' command. Currently, `i' only searches for the string you type
literally; it's probably a good idea to augment that with additionally
searching for the individual words.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Apropos commands and regexps
2002-05-12 5:28 ` Eli Zaretskii
2002-05-12 5:38 ` Eli Zaretskii
@ 2002-05-13 1:40 ` Miles Bader
2002-05-13 19:18 ` Kim F. Storm
2002-05-13 19:11 ` Kim F. Storm
2 siblings, 1 reply; 56+ messages in thread
From: Miles Bader @ 2002-05-13 1:40 UTC (permalink / raw)
Cc: storm, emacs-devel
Eli Zaretskii <eliz@is.elta.co.il> writes:
> > Wouldn't it be simpler (for a novice user -- and for advanced users
> > too) to simply write one or more words (substrings) and then search
> > for all combinations of those words (substrings) in the relevant list.
> >
> > E.g. C-h a open file RET would find any matching
> >
> > open.*file and file.*open
>
> Perhaps we should have a new command for that, and name it something like
> apropos-keywords.
I agree, but I think it shouldn't use the weird hack on regexp syntax,
that's just confusing.
I'd say just separate the keywords by looking for commas or whitespace
or either (each `keyword' could be a regexp though). That would be
both more convenient and also more familiar to people used to using
typical keyword searches (e.g., in web search engines).
-Miles
--
Would you like fries with that?
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Apropos commands and regexps
2002-05-13 1:40 ` Miles Bader
@ 2002-05-13 19:18 ` Kim F. Storm
2002-05-14 5:55 ` Miles Bader
0 siblings, 1 reply; 56+ messages in thread
From: Kim F. Storm @ 2002-05-13 19:18 UTC (permalink / raw)
Cc: Eli Zaretskii, emacs-devel
Miles Bader <miles@lsi.nec.co.jp> writes:
> Eli Zaretskii <eliz@is.elta.co.il> writes:
> > > Wouldn't it be simpler (for a novice user -- and for advanced users
> > > too) to simply write one or more words (substrings) and then search
> > > for all combinations of those words (substrings) in the relevant list.
> > >
> > > E.g. C-h a open file RET would find any matching
> > >
> > > open.*file and file.*open
> >
> > Perhaps we should have a new command for that, and name it something like
> > apropos-keywords.
>
> I agree, but I think it shouldn't use the weird hack on regexp syntax,
> that's just confusing.
I agree that we might find something better than what I suggested; it's
a starting point which can be improved...
>
> I'd say just separate the keywords by looking for commas or whitespace
> or either (each `keyword' could be a regexp though). That would be
> both more convenient and also more familiar to people used to using
> typical keyword searches (e.g., in web search engines).
We could put \b around the words in the regexp if we don't want substring
matching.
The obvious problem restricting this to complete words is how to make
e.g. "list process" match "list-processes".
--
Kim F. Storm <storm@cua.dk> http://www.cua.dk
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Apropos commands and regexps
2002-05-13 19:18 ` Kim F. Storm
@ 2002-05-14 5:55 ` Miles Bader
0 siblings, 0 replies; 56+ messages in thread
From: Miles Bader @ 2002-05-14 5:55 UTC (permalink / raw)
Cc: Eli Zaretskii, emacs-devel
storm@cua.dk (Kim F. Storm) writes:
> We could put \b around the words in the regexp if we don't want substring
> matching.
Hmmm, personally I quite often want substring matching in apropos
searches, though it's usually the _end_ of the word where it matters;
perhaps just putting \< at the beginning would be alright (iff the
search term starts with a word constituent).
-Miles
--
Come now, if we were really planning to harm you, would we be waiting here,
beside the path, in the very darkest part of the forest?
^ permalink raw reply [flat|nested] 56+ messages in thread
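The conditional anchoring suggested in the message above could look roughly like this. It is a sketch under assumptions: `my-apropos-anchor-term' is a hypothetical name, and what counts as a word constituent depends on the syntax table in effect when the match is made.

```elisp
;; Sketch only: hypothetical helper, not part of apropos.el.
(defun my-apropos-anchor-term (term)
  "Anchor TERM at a word start if it begins with a word constituent.
Otherwise return TERM unchanged."
  (if (string-match-p "\\`\\w" term)
      (concat "\\<" term)
    term))

(my-apropos-anchor-term "grep")   ; => "\\<grep"
(my-apropos-anchor-term ".*grep") ; => ".*grep"
```

With the anchored form, "grep" still matches names such as "grep-find" but no longer matches in the middle of a word, as in "igrep".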
* Re: Apropos commands and regexps
2002-05-12 5:28 ` Eli Zaretskii
2002-05-12 5:38 ` Eli Zaretskii
2002-05-13 1:40 ` Miles Bader
@ 2002-05-13 19:11 ` Kim F. Storm
2002-05-14 5:38 ` Miles Bader
2002-05-15 7:00 ` Richard Stallman
2 siblings, 2 replies; 56+ messages in thread
From: Kim F. Storm @ 2002-05-13 19:11 UTC (permalink / raw)
Cc: emacs-devel
Eli Zaretskii <eliz@is.elta.co.il> writes:
> On 12 May 2002 storm@cua.dk wrote:
>
> > Apropos command (regexp):
> >
> > which may be nonsense to some novice users.
> >
> >
> > Wouldn't it be simpler (for a novice user -- and for advanced users
> > too) to simply write one or more words (substrings) and then search
> > for all combinations of those words (substrings) in the relevant list.
> >
> > E.g. C-h a open file RET would find any matching
> >
> > open.*file and file.*open
> >
> > BTW, this obvious example doesn't find `find-file' :-(
> > Maybe we should have a defalias open-file -> find-file ?
>
> Perhaps we should have a new command for that, and name it something like
> apropos-keywords.
I disagree.
IMO, the purpose of the various apropos commands is to find
(let's call it) "interesting information".
To fulfill that purpose, all apropos commands should
a) be easy to use - especially for the novice user
b) accept the same type of "search patterns".
I think giving the apropos commands a keyword based interface is a
good way to accomplish (a), and having a specific apropos-keywords
command breaks (b).
Also, what would apropos-keywords look for? `commands', `variables'
`documentation', or "all of the above" ?
++kfs
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Apropos commands and regexps
2002-05-13 19:11 ` Kim F. Storm
@ 2002-05-14 5:38 ` Miles Bader
2002-05-15 7:00 ` Richard Stallman
1 sibling, 0 replies; 56+ messages in thread
From: Miles Bader @ 2002-05-14 5:38 UTC (permalink / raw)
Cc: Eli Zaretskii, emacs-devel
no-spam@cua.dk (Kim F. Storm) writes:
> > Perhaps we should have a new command for that, and name it something like
> > apropos-keywords.
>
> I disagree.
>
> all apropos commands should
> a) be easy to use - especially for the novice user
> b) accept the same type of "search patterns".
>
> I think giving the apropos commands a keyword based interface is a
> good way to accomplish (a), and having a specific apropos-keywords
> command breaks (b).
I tend to agree with this.
I think that an enhanced `keyword' interface can be made so that it
won't interfere unduly with traditional regexp apropos usage anyway.
For instance, if we use commas & whitespace to separate the `keywords',
I suspect it's very rare for either to occur inside apropos regexps,
simply because the things being searched for contain neither commas nor
whitespace (so such a regexp would always fail). [the exceptions are
things like `apropos-documentation' (and apropos-zippy!), but even then,
it's probably still true in practice]
-Miles
--
Suburbia: where they tear out the trees and then name streets after them.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Apropos commands and regexps
2002-05-13 19:11 ` Kim F. Storm
2002-05-14 5:38 ` Miles Bader
@ 2002-05-15 7:00 ` Richard Stallman
2002-05-15 11:23 ` Miles Bader
2002-05-15 21:55 ` Kim F. Storm
1 sibling, 2 replies; 56+ messages in thread
From: Richard Stallman @ 2002-05-15 7:00 UTC (permalink / raw)
Cc: eliz, emacs-devel
> I think giving the apropos commands a keyword based interface is a
> good way to accomplish (a), and having a specific apropos-keywords
> command breaks (b).
Are you suggesting all apropos commands should work by keywords
instead of by regexps?
> Your "all permutations" seems useful -- but I wonder whether it is
> overkill...
> So my idea of just searching for any entry matching at least two keywords
> will find all the entries found by searching for all combinations - and
> it may find some entries the user didn't think about...
What exactly is the difference between these two alternatives?
That isn't clear to me.
> The obvious problem restricting this to complete words is how to make
> e.g. "list process" match "list-processes".
That is a good point. We want the specified keywords to match
subsets of words in the command name.
> I wonder if the `apropos keyword' command being discussed could maintain
> a list of common `equivalents', and try substituting some if the
> original apropos doesn't return anything useful (or maybe even if it
> returns only a few matches).
That is a natural extension.
Looking for an equivalent in this list should work by substring match
too. And if an equivalent is found, searching for it in command names
or elsewhere should also use substring match.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Apropos commands and regexps
2002-05-15 7:00 ` Richard Stallman
@ 2002-05-15 11:23 ` Miles Bader
2002-05-15 21:59 ` Kim F. Storm
2002-05-16 20:24 ` Richard Stallman
2002-05-15 21:55 ` Kim F. Storm
1 sibling, 2 replies; 56+ messages in thread
From: Miles Bader @ 2002-05-15 11:23 UTC (permalink / raw)
Cc: kfs, eliz, emacs-devel
Richard Stallman <rms@gnu.org> writes:
> Are you suggesting all apropos commands should work by keywords
> instead of by regexps?
The way I envisioned it was all apropos commands taking a _list_ of
regexps (separated by whitespace/commas), and applying them in an `and'
manner. That way, the case where each element is a simple word
would act like a typical keyword search, and the case where there's
only one entry would act like the current single-regexp implementation.
Given the particular nature of apropos usage in emacs, I think there
wouldn't be any conflict in practice from using whitespace/commas as
the list delimiters.
There might also be other things we can do, like if a term begins with
an alphabetic character, anchor it with \<. [anchoring the end is
probably not a good idea, because people often expect searches to match
prefixes, I think.]
-Miles
--
[|nurgle|] ddt- demonic? so quake will have an evil kinda setting? one that
will make every christian in the world foamm at the mouth?
[iddt] nurg, that's the goal
^ permalink raw reply [flat|nested] 56+ messages in thread
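A minimal sketch of the list-of-regexps idea from the message above, applied in an `and' manner. Both function names are hypothetical, and the choice to treat any run of spaces, tabs, or commas as one separator is an assumption made for the example.

```elisp
;; Sketch only: hypothetical helpers, not part of apropos.el.
(defun my-apropos-split-terms (input)
  "Split INPUT on whitespace and commas into a list of regexps."
  (split-string input "[ \t,]+" t))

(defun my-apropos-match-all-p (terms name)
  "Return non-nil if every regexp in TERMS matches NAME."
  (let ((ok t))
    (dolist (re terms ok)
      (unless (string-match-p re name)
        (setq ok nil)))))

(my-apropos-split-terms "find, file")                            ; => ("find" "file")
(my-apropos-match-all-p '("find" "file") "find-file")            ; => t
(my-apropos-match-all-p '("find" "file") "find-dired")           ; => nil
(my-apropos-match-all-p '("^find-.*window") "find-file-other-window") ; => t
```

A single term behaves like the existing single-regexp search, and the order in which the terms are typed does not matter.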
* Re: Apropos commands and regexps
2002-05-15 11:23 ` Miles Bader
@ 2002-05-15 21:59 ` Kim F. Storm
2002-05-16 1:26 ` Miles Bader
2002-05-16 4:54 ` Eli Zaretskii
2002-05-16 20:24 ` Richard Stallman
1 sibling, 2 replies; 56+ messages in thread
From: Kim F. Storm @ 2002-05-15 21:59 UTC (permalink / raw)
Cc: rms, eliz, emacs-devel
Miles Bader <miles@gnu.org> writes:
> Richard Stallman <rms@gnu.org> writes:
> > Are you suggesting all apropos commands should work by keywords
> > instead of by regexps?
>
> The way I envisioned it was all apropos commands taking a _list_ of
> regexps (separated by whitespace/commas), and applying them in an `and'
> manner. That way, the case where each element is a simple word
> would act like a typical keyword search, and the case where there's
> only one entry would act like the current single-regexp implementation.
I don't like the "and" approach -- at least not as the default.
(see my previous posting for the rationale).
>
> Given the particular nature of apropos usage in emacs, I think there
> wouldn't be any conflict in practice from using whitespace/commas as
> the list delimiters.
Comparing to WEB search engines, I don't think that separating
keywords with commas is common practice...
>
> There might also be other things we can do, like if a term begins with
> an alphabetic character, anchor it with \<. [anchoring the end is
> probably not a good idea, because people often expect searches to match
> prefixes, I think.]
Consider searching for "grep" -- shouldn't that return "igrep" ?
--
Kim F. Storm <storm@cua.dk> http://www.cua.dk
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Apropos commands and regexps
2002-05-15 21:59 ` Kim F. Storm
@ 2002-05-16 1:26 ` Miles Bader
2002-05-16 22:26 ` Kim F. Storm
2002-05-16 4:54 ` Eli Zaretskii
1 sibling, 1 reply; 56+ messages in thread
From: Miles Bader @ 2002-05-16 1:26 UTC (permalink / raw)
Cc: rms, eliz, emacs-devel
storm@cua.dk (Kim F. Storm) writes:
> I don't like the "and" approach -- at least not as the default.
> (see my previous posting for the rationale).
`and' is the default for all search engines that I've used, and has the
advantage of being simple and easy to understand.
Perhaps your idea of `2 or more' would work better in practice, but
it's hard to say without some real experience.
> Comparing to WEB search engines, I don't think that separating
> keywords with commas is common practice...
No, but it's common practice in writing lists.
> Consider searching for "grep" -- shouldn't that return "igrep" ?
Good point; I guess maybe it shouldn't do any anchoring at all (bit
annoying, that, since I often get lots of false hits when my apropos
term unexpectedly happens to occur in the middle of a common English
word...).
-Miles
--
Is it true that nothing can be known? If so how do we know this? -Woody Allen
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Apropos commands and regexps
2002-05-16 1:26 ` Miles Bader
@ 2002-05-16 22:26 ` Kim F. Storm
2002-05-16 21:38 ` Stefan Monnier
` (3 more replies)
0 siblings, 4 replies; 56+ messages in thread
From: Kim F. Storm @ 2002-05-16 22:26 UTC (permalink / raw)
Cc: rms, eliz, emacs-devel
Miles Bader <miles@lsi.nec.co.jp> writes:
> storm@cua.dk (Kim F. Storm) writes:
> > I don't like the "and" approach -- at least not as the default.
> > (see my previous posting for the rationale).
>
> `and' is the default for all search engines that I've used, and has the
> advantage of being simple and easy to understand.
A quick test shows that Google, Yahoo, Lycos use AND
while Altavista, Excite, AskJeeves use OR.
That's 50/50...
For WEB search engines, I think AND does make sense -- since there
are SOOOO many pages to match. But for a limited universe like
emacs -- which doesn't always use the most obvious terms --
using AND doesn't make a lot of sense to me.
>
> Perhaps your idea of `2 or more' would work better in practice, but
> it's hard to say without some real experience.
I think it is adequate in practice.
>
> > Comparing to WEB search engines, I don't think that separating
> > keywords with commas is common practice...
>
> No, but it's common practice in writing lists.
I don't see the relevance, sorry. But of course, we could ignore
commas in case people use them...
>
> > Consider searching for "grep" -- shouldn't that return "igrep" ?
>
> Good point; I guess maybe it shouldn't do any anchoring at all (bit
> annoying, that, since I often get lots of false hits when my apropos
> term unexpectedly happens to occur in the middle of a common English
> word...).
We could put a "button bar" at the top of the apropos output with
the following buttons:
[Match all words] [anchored match] [search documentation]
--
Kim F. Storm <storm@cua.dk> http://www.cua.dk
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Apropos commands and regexps
2002-05-16 22:26 ` Kim F. Storm
@ 2002-05-16 21:38 ` Stefan Monnier
2002-05-17 11:59 ` Kai Großjohann
2002-05-18 18:48 ` Richard Stallman
2002-05-16 21:58 ` Miles Bader
` (2 subsequent siblings)
3 siblings, 2 replies; 56+ messages in thread
From: Stefan Monnier @ 2002-05-16 21:38 UTC (permalink / raw)
Cc: Miles Bader, rms, eliz, emacs-devel
This discussion focuses too narrowly on apropos for my taste.
Of course, you may argue "it's because that's what the subject says",
but truly I think that if we want to spiff things up, we had better
do something that searches through var names, fun names, docstrings
and the manuals. For instance all the talk about adding "synonyms"
seems to be unnecessary if you consider that an Info-index search will
already provide that kind of thing.
Stefan
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Apropos commands and regexps
2002-05-16 21:38 ` Stefan Monnier
@ 2002-05-17 11:59 ` Kai Großjohann
2002-05-18 18:48 ` Richard Stallman
1 sibling, 0 replies; 56+ messages in thread
From: Kai Großjohann @ 2002-05-17 11:59 UTC (permalink / raw)
Cc: Kim F. Storm, Miles Bader, rms, eliz, emacs-devel
"Stefan Monnier" <monnier+gnu/emacs@RUM.cs.yale.edu> writes:
> This discussion focuses too narrowly on apropos for my taste.
> Of course, you may argue "it's because that's what the subject says",
> but truly I think that if we want to spiff things up, we had better
> do something that searches through var names, fun names, docstrings
> and the manuals. For instance all the talk about adding "synonyms"
> seems to be unnecessary if you consider that an Info-index search will
> already provide that kind of thing.
I started writing documentation.el. The idea was to search all
documentation that's available to Emacs: symbol names, docstrings,
info files, man pages, ... I wish I had time to pursue this project :-|
kai
--
Silence is foo!
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Apropos commands and regexps
2002-05-16 21:38 ` Stefan Monnier
2002-05-17 11:59 ` Kai Großjohann
@ 2002-05-18 18:48 ` Richard Stallman
2002-05-18 22:24 ` Stefan Monnier
1 sibling, 1 reply; 56+ messages in thread
From: Richard Stallman @ 2002-05-18 18:48 UTC (permalink / raw)
Cc: storm, miles, eliz, emacs-devel
> For instance all the talk about adding "synonyms"
> seems to be unnecessary if you consider that an Info-index search will
> already provide that kind of thing.
Could you please give a concrete explanation of what you mean?
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Apropos commands and regexps
2002-05-18 18:48 ` Richard Stallman
@ 2002-05-18 22:24 ` Stefan Monnier
2002-05-19 12:02 ` Kai Großjohann
2002-05-19 19:40 ` Richard Stallman
0 siblings, 2 replies; 56+ messages in thread
From: Stefan Monnier @ 2002-05-18 22:24 UTC (permalink / raw)
Cc: monnier+gnu/emacs, storm, miles, eliz, emacs-devel
> For instance all the talk about adding "synonyms"
> seems to be unnecessary if you consider that an Info-index search will
> already provide that kind of thing.
> Could you please give a concrete explanation of what you mean?
C-h i m emacs i paste RET
will make you jump to the glossary where it says "see killing and yanking",
so the user should then know to search for `yank'.
Stefan
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Apropos commands and regexps
2002-05-18 22:24 ` Stefan Monnier
@ 2002-05-19 12:02 ` Kai Großjohann
2002-05-19 14:50 ` Eli Zaretskii
2002-05-19 19:40 ` Richard Stallman
1 sibling, 1 reply; 56+ messages in thread
From: Kai Großjohann @ 2002-05-19 12:02 UTC (permalink / raw)
Cc: Richard Stallman, storm, miles, eliz, emacs-devel
"Stefan Monnier" <monnier+gnu/emacs@RUM.cs.yale.edu> writes:
>> For instance all the talk about adding "synonyms"
>> seems to be unnecessary if you consider that an Info-index search will
>> already provide that kind of thing.
>> Could you please give a concrete explanation of what you mean?
>
> C-h i m emacs i paste RET
> will make you jump to the glossary where it says "see killing and yanking",
> so the user should then know to search for `yank'.
This won't work for more complicated queries, only for the
single-word case. What happens when the user searches for "cut
line"? The fact that "cut" is mentioned in the glossary won't help
so much. (I assume that "cut" is mentioned in the glossary.)
kai
--
Silence is foo!
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Apropos commands and regexps
2002-05-19 12:02 ` Kai Großjohann
@ 2002-05-19 14:50 ` Eli Zaretskii
2002-05-19 15:23 ` Kai Großjohann
0 siblings, 1 reply; 56+ messages in thread
From: Eli Zaretskii @ 2002-05-19 14:50 UTC (permalink / raw)
Cc: emacs-devel
> From: Kai.Grossjohann@CS.Uni-Dortmund.DE
> Date: Sun, 19 May 2002 14:02:24 +0200
>
> > C-h i m emacs i paste RET
> > will make you jump to the glossary where it says "see killing and yanking",
> > so the user should then know to search for `yank'.
>
> This won't work for more complicated queries, only for the
> single-word case. What happens when the user searches for "cut
> line"? The fact that "cut" is mentioned in the glossary won't help
> so much.
The same problem will happen if the user searches for "yank line".
So I'm not sure what is your point; it is quite clear that some of
the searches will fail to find all the keywords, and will have to do
something smart about that.
> (I assume that "cut" is mentioned in the glossary.)
The Glossary has "cut and paste".
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Apropos commands and regexps
2002-05-19 14:50 ` Eli Zaretskii
@ 2002-05-19 15:23 ` Kai Großjohann
0 siblings, 0 replies; 56+ messages in thread
From: Kai Großjohann @ 2002-05-19 15:23 UTC (permalink / raw)
Cc: emacs-devel
"Eli Zaretskii" <eliz@is.elta.co.il> writes:
>> From: Kai.Grossjohann@CS.Uni-Dortmund.DE
>> Date: Sun, 19 May 2002 14:02:24 +0200
>>
>> > C-h i m emacs i paste RET
>> > will make you jump to the glossary where it says "see killing and yanking",
>> > so the user should then know to search for `yank'.
>>
>> This won't work for more complicated queries, only for the
>> single-word case. What happens when the user searches for "cut
>> line"? The fact that "cut" is mentioned in the glossary won't help
>> so much.
>
> The same problem will happen if the user searches for "yank line".
> So I'm not sure what is your point; it is quite clear that some of
> the searches will fail to find all the keywords, and will have to do
> something smart about that.
When the user enters a query `foo bar' and Emacs knows that quux is a
synonym for foo, then the system should search for `foo bar quux'.
(The preceding sentence assumes that the system interprets the query
in a best-match fashion. It will have to be changed accordingly for
other query interpretations.)
The fact that the system already finds the glossary entry for quux
when the user enters foo doesn't provide the functionality that's
described in my first sentence in this message.
Does that make it clearer what my point is?
kai
--
Silence is foo!
^ permalink raw reply [flat|nested] 56+ messages in thread
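The query expansion described in the message above (`foo bar' becoming `foo bar quux') might be sketched as follows. The table and function names are hypothetical, and the two table entries are illustrative only; the thread does not specify where such a synonym list would come from.

```elisp
;; Sketch only: a hypothetical synonym table, with illustrative entries.
(defvar my-apropos-synonyms
  '(("paste" "yank")
    ("cut" "kill"))
  "Alist mapping a common term to the terms Emacs uses for it.")

(defun my-apropos-expand-query (words)
  "Return WORDS followed by any synonyms from `my-apropos-synonyms'."
  (let ((result (copy-sequence words)))
    (dolist (w words result)
      (dolist (syn (cdr (assoc w my-apropos-synonyms)))
        (unless (member syn result)
          (setq result (append result (list syn))))))))

(my-apropos-expand-query '("paste" "line"))
;; => ("paste" "line" "yank")
```

The expanded list only makes sense with a best-match interpretation of the query; with a strict `and' of all words, adding synonyms would shrink the result set instead of widening it.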
* Re: Apropos commands and regexps
2002-05-18 22:24 ` Stefan Monnier
2002-05-19 12:02 ` Kai Großjohann
@ 2002-05-19 19:40 ` Richard Stallman
2002-05-19 23:33 ` Kim F. Storm
1 sibling, 1 reply; 56+ messages in thread
From: Richard Stallman @ 2002-05-19 19:40 UTC (permalink / raw)
Cc: monnier+gnu/emacs, storm, miles, eliz, emacs-devel
> C-h i m emacs i paste RET
> will make you jump to the glossary where it says "see killing and yanking",
> so the user should then know to search for `yank'.
Still, it would not hurt if apropos knew this too
and used the knowledge automatically.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Apropos commands and regexps
2002-05-19 19:40 ` Richard Stallman
@ 2002-05-19 23:33 ` Kim F. Storm
2002-05-20 9:50 ` Alex Schroeder
0 siblings, 1 reply; 56+ messages in thread
From: Kim F. Storm @ 2002-05-19 23:33 UTC (permalink / raw)
Cc: monnier+gnu/emacs, miles, eliz, emacs-devel
Richard Stallman <rms@gnu.org> writes:
> C-h i m emacs i paste RET
> will make you jump to the glossary where it says "see killing and yanking",
> so the user should then know to search for `yank'.
>
> Still, it would not hurt if apropos knew this too
> and used the knowledge automatically.
In case the user enters M-x apropos RET paste RET, it would make more
sense if the response is an informative message:
Note: Emacs uses the term "yank" for pasting text into a buffer.
It could then offer to search using "yank" instead of "paste".
This would have the benefit of teaching the user the proper terms.
--
Kim F. Storm <storm@cua.dk> http://www.cua.dk
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Apropos commands and regexps
2002-05-16 22:26 ` Kim F. Storm
2002-05-16 21:38 ` Stefan Monnier
@ 2002-05-16 21:58 ` Miles Bader
2002-05-17 12:01 ` Kai Großjohann
2002-05-17 21:56 ` Kim F. Storm
2002-05-17 6:15 ` Eli Zaretskii
2002-05-17 11:58 ` Kai Großjohann
3 siblings, 2 replies; 56+ messages in thread
From: Miles Bader @ 2002-05-16 21:58 UTC (permalink / raw)
Cc: rms, eliz, emacs-devel
storm@cua.dk (Kim F. Storm) writes:
> > `and' is the default for all search engines that I've used, and has the
> > advantage of being simple and easy to understand.
>
> A quick test shows that Google, Yahoo, Lycos use AND
> while Altavista, Excite, AskJeeves use OR.
>
> For WEB search engines, I think AND does make sense -- since there
> are SOOOO many pages to match. But for a limited universe like
> emacs -- which doesn't always use the most obvious terms --
> using AND doesn't make a lot of sense to me.
`or' is clearly wrong; even in emacs' `limited' universe, it generates
way too many hits.
E.g., (apropos "\\(find.*file\\|file.*find\\)") gets about 50 hits,
whereas (apropos "\\(find\\|file\\)") gets over 700!
Maybe your idea of `at least N matches' is a good compromise.
> > No, but it's common practice in writing lists.
>
> I don't see the relevance, sorry. But of course, we could ignore
> commas in case people use them...
That's the point. I wasn't suggesting that they be required.
> We could put a "button bar" at the top of the apropos output with
> the following buttons:
>
> [Match all words] [anchored match] [search documentation]
That seems like a good idea in general; I'd even like the ability to do
other sorts of apropos searches, e.g., in a command-apropos buffer,
have a button that does a variable-apropos on the same search terms.
-Miles
--
Saa, shall we dance? (from a dance-class advertisement)
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Apropos commands and regexps
2002-05-16 21:58 ` Miles Bader
@ 2002-05-17 12:01 ` Kai Großjohann
2002-05-17 21:56 ` Kim F. Storm
1 sibling, 0 replies; 56+ messages in thread
From: Kai Großjohann @ 2002-05-17 12:01 UTC (permalink / raw)
Cc: Kim F. Storm, rms, eliz, emacs-devel
Miles Bader <miles@gnu.org> writes:
> `or' is clearly wrong; even in emacs' `limited' universe, it generates
> way too many hits.
>
> E.g., (apropos "\\(find.*file\\|file.*find\\)") gets about 50 hits,
> whereas (apropos "\\(find\\|file\\)") gets over 700!
>
> Maybe your idea of `at least N matches' is a good compromise.
Information Retrieval research has shown that weighting and ranking are
what's needed. Just list the "good" matches first. With Boolean
searches, people need to issue a lot of queries to select an
appropriate answer set. If you have ranking, fewer queries will be
sufficient.
But it might be useful to somehow indicate to the user the nature of
each match so that the user can decide what they want. For example:
this group of matches contains all words, the following group of
matches misses the word foo, ...
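The grouping described above could be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, matching is by plain substring, and nothing here is taken from apropos.el.

```elisp
;; Hypothetical helpers, for illustration only.
(defun my-apropos-missing-words (words name)
  "Return the elements of WORDS that are not substrings of NAME."
  (let (missing)
    (dolist (w words (nreverse missing))
      (unless (string-match-p (regexp-quote w) name)
        (push w missing)))))

(defun my-apropos-group-by-missing (words names)
  "Group NAMES by which of WORDS they lack.
Return an alist of (MISSING-WORDS . NAMES), with the groups that miss
the fewest words first.  Names that contain none of WORDS are dropped."
  (let (groups)
    (dolist (name names)
      (let* ((missing (my-apropos-missing-words words name))
             (cell (assoc missing groups)))
        ;; Keep the name only if at least one word matched.
        (when (< (length missing) (length words))
          (if cell
              (setcdr cell (append (cdr cell) (list name)))
            (push (list missing name) groups)))))
    (sort groups (lambda (a b)
                   (< (length (car a)) (length (car b)))))))
```

Each group's key is the list of query words it lacks, so a display layer could print a heading such as "missing: foo" before each group.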
kai
--
Silence is foo!
* Re: Apropos commands and regexps
2002-05-16 21:58 ` Miles Bader
2002-05-17 12:01 ` Kai Großjohann
@ 2002-05-17 21:56 ` Kim F. Storm
2002-05-18 6:31 ` Eli Zaretskii
2002-05-18 22:47 ` Stefan Monnier
1 sibling, 2 replies; 56+ messages in thread
From: Kim F. Storm @ 2002-05-17 21:56 UTC (permalink / raw)
Cc: rms, eliz, emacs-devel
Miles Bader <miles@gnu.org> writes:
> `or' is clearly wrong; even in emacs' `limited' universe, it generates
> way too many hits.
>
> E.g., (apropos "\\(find.*file\\|file.*find\\)") gets about 50 hits,
> whereas (apropos "\\(find\\|file\\)") gets over 700!
Actually your first example finds 67 matches on my emacs.
Your second example is an unfair comparison, as my proposal was that
at least two matching words should be required. This can be illustrated
by (apropos "\\(find\\|file\\).*\\(find\\|file\\)") which finds 74 hits.
So I don't see the big difference... The reason there are a few more
matches with the second pattern is that it also finds entries with
either word occurring twice.
But I never claimed my sample implementation is perfect :-)
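The two-hit pattern above can be built mechanically from a word list. The following is a minimal sketch, not the patch under discussion; the function name is hypothetical.

```elisp
;; Hypothetical helper, for illustration only (not part of apropos.el).
(defun my-apropos-two-of-regexp (words)
  "Return a regexp matching names that contain two hits from WORDS.
The two hits may be the same word occurring twice, which is the
caveat mentioned in the message."
  (let ((alt (concat "\\("
                     (mapconcat #'regexp-quote words "\\|")
                     "\\)")))
    (concat alt ".*" alt)))

;; (my-apropos-two-of-regexp '("find" "file"))
;;   => "\\(find\\|file\\).*\\(find\\|file\\)"
```

Because both halves of the pattern accept any of the words, a name in which a single word occurs twice also matches, which accounts for the extra hits.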
--
Kim F. Storm <storm@cua.dk> http://www.cua.dk
* Re: Apropos commands and regexps
2002-05-16 22:26 ` Kim F. Storm
2002-05-16 21:38 ` Stefan Monnier
2002-05-16 21:58 ` Miles Bader
@ 2002-05-17 6:15 ` Eli Zaretskii
2002-05-17 11:58 ` Kai Großjohann
3 siblings, 0 replies; 56+ messages in thread
From: Eli Zaretskii @ 2002-05-17 6:15 UTC (permalink / raw)
Cc: emacs-devel
> From: storm@cua.dk
> Date: 17 May 2002 00:26:43 +0200
>
> A quick test shows that Google, Yahoo, and Lycos use AND
> while Altavista, Excite, and AskJeeves use OR.
>
> That's 50/50...
Perhaps that's because they want to show off the number of hits they
return. I was always annoyed by ORing, and many times catch myself
forgetting to type the magic that makes it do an AND. Because I
always want the AND method.
* Re: Apropos commands and regexps
2002-05-16 22:26 ` Kim F. Storm
` (2 preceding siblings ...)
2002-05-17 6:15 ` Eli Zaretskii
@ 2002-05-17 11:58 ` Kai Großjohann
3 siblings, 0 replies; 56+ messages in thread
From: Kai Großjohann @ 2002-05-17 11:58 UTC (permalink / raw)
Cc: Miles Bader, rms, eliz, emacs-devel
storm@cua.dk (Kim F. Storm) writes:
> We could put a "button bar" at the top of the apropos output with
> the following buttons:
>
> [Match all words] [anchored match] [search documentation]
Good idea.
kai
--
Silence is foo!
* Re: Apropos commands and regexps
2002-05-15 21:59 ` Kim F. Storm
2002-05-16 1:26 ` Miles Bader
@ 2002-05-16 4:54 ` Eli Zaretskii
2002-05-16 22:10 ` Kim F. Storm
2002-05-18 18:49 ` Richard Stallman
1 sibling, 2 replies; 56+ messages in thread
From: Eli Zaretskii @ 2002-05-16 4:54 UTC (permalink / raw)
Cc: emacs-devel
On 15 May 2002 storm@cua.dk wrote:
> I don't like the "and" approach -- at least not as the default.
I'm afraid anything else will bring too many hits. A docs search tool
that returns gobs of information is not very useful, in my experience.
* Re: Apropos commands and regexps
2002-05-16 4:54 ` Eli Zaretskii
@ 2002-05-16 22:10 ` Kim F. Storm
2002-05-16 21:20 ` Miles Bader
2002-05-18 18:49 ` Richard Stallman
1 sibling, 1 reply; 56+ messages in thread
From: Kim F. Storm @ 2002-05-16 22:10 UTC (permalink / raw)
Cc: emacs-devel
Eli Zaretskii <eliz@is.elta.co.il> writes:
> On 15 May 2002 storm@cua.dk wrote:
>
> > I don't like the "and" approach -- at least not as the default.
>
> I'm afraid anything else will bring too many hits. A docs search tool
> that returns gobs of information is not very useful, in my experience.
Some search engines order hits depending on how many words match.
I guess we could achieve the same in emacs.
Alternatively, if matching only two words gives too many matches
for documentation, require three (or four) matching words.
--
Kim F. Storm <storm@cua.dk> http://www.cua.dk
* Re: Apropos commands and regexps
2002-05-16 22:10 ` Kim F. Storm
@ 2002-05-16 21:20 ` Miles Bader
2002-05-17 6:13 ` Eli Zaretskii
0 siblings, 1 reply; 56+ messages in thread
From: Miles Bader @ 2002-05-16 21:20 UTC (permalink / raw)
Cc: Eli Zaretskii, emacs-devel
storm@cua.dk (Kim F. Storm) writes:
> Some search engines order hits depending on how many words match.
> I guess we could achieve the same in emacs.
>
> Alternatively, if matching only two words gives too many matches
> for documentation, require three (or four) matching words.
I think it's clear that we need a bit of experience with this stuff, so
we can see how well the various alternatives actually work in practice,
rather than sitting around pontificating...
-Miles
--
Would you like fries with that?
* Re: Apropos commands and regexps
2002-05-16 4:54 ` Eli Zaretskii
2002-05-16 22:10 ` Kim F. Storm
@ 2002-05-18 18:49 ` Richard Stallman
2002-05-19 4:51 ` Eli Zaretskii
1 sibling, 1 reply; 56+ messages in thread
From: Richard Stallman @ 2002-05-18 18:49 UTC (permalink / raw)
Cc: storm, emacs-devel
    > I don't like the "and" approach -- at least not as the default.

    I'm afraid anything else will bring too many hits.

The rule that at least two of the keywords must match
should not bring too many hits, I would think.
* Re: Apropos commands and regexps
2002-05-18 18:49 ` Richard Stallman
@ 2002-05-19 4:51 ` Eli Zaretskii
2002-05-19 19:40 ` Richard Stallman
2002-05-19 23:29 ` Kim F. Storm
0 siblings, 2 replies; 56+ messages in thread
From: Eli Zaretskii @ 2002-05-19 4:51 UTC (permalink / raw)
Cc: emacs-devel
On Sat, 18 May 2002, Richard Stallman wrote:
> > I don't like the "and" approach -- at least not as the default.
>
> I'm afraid anything else will bring too many hits.
>
> The rule that at least two of the keywords must match
> should not bring too many hits, I would think.
An example with two words discussed here brought about 70 hits, which
IMHO is too many.
Moreover, I think a rule based on the number of matched keywords is not
good enough, since sometimes even one word is enough to yield a very
accurate result. Try "M-x apropos bell RET", for example.
* Re: Apropos commands and regexps
2002-05-19 4:51 ` Eli Zaretskii
@ 2002-05-19 19:40 ` Richard Stallman
2002-05-19 23:29 ` Kim F. Storm
1 sibling, 0 replies; 56+ messages in thread
From: Richard Stallman @ 2002-05-19 19:40 UTC (permalink / raw)
Cc: emacs-devel
    An example with two words discussed here brought about 70 hits, which
    IMHO is too many.

This is no disaster. The current apropos can easily give more matches
than that. With argument `file' it gives hundreds of matches.
* Re: Apropos commands and regexps
2002-05-19 4:51 ` Eli Zaretskii
2002-05-19 19:40 ` Richard Stallman
@ 2002-05-19 23:29 ` Kim F. Storm
2002-05-20 3:31 ` Eli Zaretskii
1 sibling, 1 reply; 56+ messages in thread
From: Kim F. Storm @ 2002-05-19 23:29 UTC (permalink / raw)
Cc: Richard Stallman, emacs-devel
Eli Zaretskii <eliz@is.elta.co.il> writes:
> On Sat, 18 May 2002, Richard Stallman wrote:
>
> > > I don't like the "and" approach -- at least not as the default.
> >
> > I'm afraid anything else will bring too many hits.
> >
> > The rule that at least two of the keywords must match
> > should not bring too many hits, I would think.
>
> An example with two words discussed here brought about 70 hits, which
> IMHO is too many.
If those two words are find and file, there *are* 70 commands which
contain both words ... how do you suggest emacs should decide which of
those to show (if you insist that 70 is too many)?
>
> Moreover, I think a rule based on the number of matched keywords is not
> good enough, since sometimes even one word is enough to yield a very
> accurate result. Try "M-x apropos bell RET", for example.
So does M-x apropos RET ring bell RET :-)
--
Kim F. Storm <storm@cua.dk> http://www.cua.dk
* Re: Apropos commands and regexps
2002-05-19 23:29 ` Kim F. Storm
@ 2002-05-20 3:31 ` Eli Zaretskii
0 siblings, 0 replies; 56+ messages in thread
From: Eli Zaretskii @ 2002-05-20 3:31 UTC (permalink / raw)
Cc: emacs-devel
> From: storm@cua.dk
> Date: 20 May 2002 01:29:09 +0200
>
> > An example with two words discussed here brought about 70 hits, which
> > IMHO is too many.
>
> If those two words are find and file, there *are* 70 commands which
> contain both words ... how do you suggest emacs should decide which of
> those to show (if you insist that 70 is too many)?
It should suggest to refine the search in some way.
* Re: Apropos commands and regexps
2002-05-15 11:23 ` Miles Bader
2002-05-15 21:59 ` Kim F. Storm
@ 2002-05-16 20:24 ` Richard Stallman
1 sibling, 0 replies; 56+ messages in thread
From: Richard Stallman @ 2002-05-16 20:24 UTC (permalink / raw)
Cc: kfs, eliz, emacs-devel
    The way I envisioned it was all apropos commands taking a _list_ of
    regexps (separated by whitespace/commas), and applying them in an `and'
    manner.

The other suggestion, to look for names that contain at least two of the
specified terms, seems more useful to me.

    Given the particular nature of apropos usage in emacs, I think there
    wouldn't be any conflict in practice from using whitespace/commas as
    the list delimiters.

I agree.

    There might also be other things we can do, like if a term begins with
    an alphabetic character, anchor it with \<.

I disagree; I think it is better to allow prefixes too.
That will tend to find a small number of extra matches,
which will not do much harm, and sometimes it will do good.
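The effect of anchoring a term with \< can be checked directly. This is a small sketch with a hypothetical helper name; it uses the standard syntax table so that the result does not depend on the current buffer's major mode.

```elisp
;; Illustration only: compare an unanchored term with one anchored by \<.
(defun my-term-matches-p (regexp name)
  "Return t if REGEXP matches somewhere in NAME, else nil."
  (with-syntax-table (standard-syntax-table)
    (let ((case-fold-search nil))
      (and (string-match-p regexp name) t))))

;; (my-term-matches-p "grep" "igrep")         ; unanchored: matches
;; (my-term-matches-p "\\<grep" "igrep")      ; anchored: no match
;; (my-term-matches-p "\\<grep" "find-grep")  ; anchored: matches after the hyphen
```

The anchored form loses names in which the term is preceded by other word characters; leaving the term unanchored keeps those matches at the cost of a few extra ones.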
* Re: Apropos commands and regexps
2002-05-15 7:00 ` Richard Stallman
2002-05-15 11:23 ` Miles Bader
@ 2002-05-15 21:55 ` Kim F. Storm
2002-05-16 4:52 ` Eli Zaretskii
1 sibling, 1 reply; 56+ messages in thread
From: Kim F. Storm @ 2002-05-15 21:55 UTC (permalink / raw)
Cc: eliz, emacs-devel
Richard Stallman <rms@gnu.org> writes:
> I think giving the apropos commands a keyword based interface is a
> good way to accomplish (a), and having a specific apropos-keywords
> command breaks (b).
>
> Are you suggesting all apropos commands should work by keywords
> instead of by regexps?
Instead of: no
In addition to: yes
>
> Your "all permutations" seems useful -- but I wonder whether it is
> overkill...
>
> So my idea of just searching for any entry matching at least two keywords
> will find all the entries found by searching for all combinations - and
> it may find some entries the user didn't think about...
>
> What exactly is the difference between these two alternatives?
> That isn't clear to me.
If a user enters keywords "find window mini", the first approach will
only find the entries containing all of find, window, and mini, while
the second approach will find the entries which contain two or more
of the keywords.
Using the second approach has a more "novice" appeal:
if you don't know what a specific function is called, it will be
easier to enter a few more alternatives, and see what turns up,
i.e., specifying more words returns more alternatives.
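The difference between the two approaches comes down to a threshold on the number of matching keywords. A minimal sketch, assuming plain substring matching and hypothetical function names:

```elisp
;; Hypothetical helpers, for illustration only.
(defun my-apropos-count-matches (keywords name)
  "Return how many of KEYWORDS occur as substrings of NAME."
  (let ((count 0))
    (dolist (kw keywords count)
      (when (string-match-p (regexp-quote kw) name)
        (setq count (1+ count))))))

(defun my-apropos-filter (keywords names min-matches)
  "Return the elements of NAMES containing at least MIN-MATCHES of KEYWORDS."
  (let (result)
    (dolist (name names (nreverse result))
      (when (>= (my-apropos-count-matches keywords name) min-matches)
        (push name result)))))

;; With the keywords from the message and a few sample names:
(let ((keywords '("find" "window" "mini"))
      (names '("minibuffer-window" "find-file-other-window"
               "find-file" "minibuffer-complete")))
  (list (my-apropos-filter keywords names 3)    ; first approach: all keywords
        (my-apropos-filter keywords names 2)))  ; second approach: any two
```

With these sample names the first approach returns nothing, while the second returns "minibuffer-window" and "find-file-other-window".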
>
> The obvious problem restricting this to complete words is how to make
> e.g. "list process" match "list-processes".
>
> That is a good point. We want the specified keywords to match
> subsets of words in the command name.
And if the user enters `grep', it should also match `igrep' (if that
command exists).
>
> I wonder if the `apropos keyword' command being discussed could maintain
> a list of common `equivalents', and try substituting some if the
> original apropos doesn't return anything useful (or maybe even if
> it returns only a few matches).
>
> That is a natural extension.
Yes, I like that proposal.
>
> Looking for an equivalent in this list should work by substring match
> too. And if an equivalent is found, searching for it in command names
> or elsewhere should also use substring match.
I agree.
--
Kim F. Storm <storm@cua.dk> http://www.cua.dk
* Re: Apropos commands and regexps
2002-05-12 0:57 Apropos commands and regexps Kim F. Storm
2002-05-12 5:28 ` Eli Zaretskii
@ 2002-05-12 10:06 ` Kai Großjohann
2002-05-12 17:03 ` Alex Schroeder
2002-05-16 11:04 ` Kai Großjohann
2002-05-23 21:41 ` Kim F. Storm
3 siblings, 1 reply; 56+ messages in thread
From: Kai Großjohann @ 2002-05-12 10:06 UTC (permalink / raw)
Cc: emacs-devel
storm@cua.dk (Kim F. Storm) writes:
> E.g. C-h a open file RET would find any matching
>
> open.*file and file.*open
Way cool!
kai
--
Silence is foo!
* Re: Apropos commands and regexps
2002-05-12 10:06 ` Kai Großjohann
@ 2002-05-12 17:03 ` Alex Schroeder
2002-05-13 19:26 ` Kim F. Storm
0 siblings, 1 reply; 56+ messages in thread
From: Alex Schroeder @ 2002-05-12 17:03 UTC (permalink / raw)
Cc: Kim F. Storm, emacs-devel
Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:
> storm@cua.dk (Kim F. Storm) writes:
>
>> E.g. C-h a open file RET would find any matching
>>
>> open.*file and file.*open
>
> Way cool!
I posted a similar thing in elisp some time back. Searching for it on
Google returned this:
From: Alex Schroeder (asc@bsiag.com)
Subject: Re: Emacs Boolean Help
Newsgroups: gnu.emacs.gnus
Date: 2000-08-17 10:07:22 PST
(defun ed-apropos (keywords)
  "Search for KEYWORDS.
This uses `apropos'. All the keywords must match.
KEYWORDS can be a comma-separated list."
  (interactive "sKeywords: (comma-separated) ")
  (apropos (my-csv-string-to-regexp keywords)))
(defun my-csv-string-to-regexp (str)
  "Translate comma separated values into regexp.
A,B,C turns into \\(A.*B.*C\\|A.*C.*B\\|B.*A.*C\\|B.*C.*A\\|C.*A.*B\\|C.*B.*A\\)"
  ;; Join each permutation with ".*", then join the permutations with "\\|".
  (let* ((l (perms (split-string str ",\\s-*"))))
    (mapconcat (function (lambda (n)
                           (mapconcat 'identity n ".*")))
               l "\\|")))
;; thanks to Christoph Conrad <cc@cli.de>
(require 'cl)
;; Return the list of all permutations of the list L.
(defun perms (l)
  (if (null l)
      (list '())
    (mapcan #'(lambda( a )
                (mapcan #'(lambda( p )
                            (list (cons a p)))
                        (perms (remove* a l :count 1))))
            l)))
--
http://www.electronicintifada.net/diaries/index.html
http://www.us-israel.org/jsource/US-Israel/hr2506c.html
* Re: Apropos commands and regexps
2002-05-12 17:03 ` Alex Schroeder
@ 2002-05-13 19:26 ` Kim F. Storm
2002-05-14 5:26 ` Miles Bader
0 siblings, 1 reply; 56+ messages in thread
From: Kim F. Storm @ 2002-05-13 19:26 UTC (permalink / raw)
Cc: Kai Großjohann, emacs-devel
Alex Schroeder <alex@emacswiki.org> writes:
> Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:
>
> > storm@cua.dk (Kim F. Storm) writes:
> >
> >> E.g. C-h a open file RET would find any matching
> >>
> >> open.*file and file.*open
> >
> > Way cool!
>
> I posted a similar thing in elisp some time back. Searching for it on
> Google returned this:
Interesting...
However, my proposal is to make keyword matching the default
(alternative) behaviour for all apropos commands.
Your "all permutations" seems useful -- but I wonder whether it is
overkill... If a user is searching for some command which does
something "useful", it is already quite hard to guess the terms emacs
may be using to accomplish a given task (e.g. some novice users may search
for "change file" when they really should look for "switch buffer").
So my idea of just searching for any entry matching at least two keywords
will find all the entries found by searching for all combinations - and
it may find some entries the user didn't think about...
>
> From: Alex Schroeder (asc@bsiag.com)
> Subject: Re: Emacs Boolean Help
> Newsgroups: gnu.emacs.gnus
> Date: 2000-08-17 10:07:22 PST
>
> (defun ed-apropos (keywords)
>   "Search for KEYWORDS.
> This uses `apropos'. All the keywords must match.
> KEYWORDS can be a comma-separated list."
>   (interactive "sKeywords: (comma-separated) ")
>   (apropos (my-csv-string-to-regexp keywords)))
>
> (defun my-csv-string-to-regexp (str)
>   "Translate comma separated values into regexp.
> A,B,C turns into \\(A.*B.*C\\|A.*C.*B\\|B.*A.*C\\|B.*C.*A\\|C.*A.*B\\|C.*B.*A\\)"
>   (let* ((l (perms (split-string str ",\\s-*"))))
>     (mapconcat (function (lambda (n)
>                            (mapconcat 'identity n ".*")))
>                l "\\|")))
>
> ;; thanks to Christoph Conrad <cc@cli.de>
> (require 'cl)
> (defun perms (l)
>   (if (null l)
>       (list '())
>     (mapcan #'(lambda( a )
>                 (mapcan #'(lambda( p )
>                             (list (cons a p)))
>                         (perms (remove* a l :count 1))))
>             l)))
>
--
Kim F. Storm <storm@cua.dk> http://www.cua.dk
* Re: Apropos commands and regexps
2002-05-13 19:26 ` Kim F. Storm
@ 2002-05-14 5:26 ` Miles Bader
0 siblings, 0 replies; 56+ messages in thread
From: Miles Bader @ 2002-05-14 5:26 UTC (permalink / raw)
Cc: Alex Schroeder, Kai Großjohann, emacs-devel
storm@cua.dk (Kim F. Storm) writes:
> If a user is searching for some command which does something "useful",
> it is already quite hard to guess the terms emacs may be using to
> accomplish a given task (e.g. some novice users may search for "change
> file" when they really should look for "switch buffer").
I wonder if the `apropos keyword' command being discussed could maintain
a list of common `equivalents', and try substituting some if the
original apropos doesn't return anything useful (or maybe even if
it returns only a few matches).
E.g., it might group (`file', `buffer', and `document') together, and
(`switch', `change', `select', `open', and `find') together, so a user that
searches for `open document' would find both `find-file' and
`switch-to-buffer'.
Maybe this would end up generating too many false positives, though.
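A rough sketch of such an equivalents list follows. The variable and function names are hypothetical, the two groups are simply the ones from the example above, and matching is by substring.

```elisp
(require 'seq)

;; Hypothetical equivalence groups, taken from the example in the message.
(defvar my-apropos-equivalents
  '(("file" "buffer" "document")
    ("switch" "change" "select" "open" "find"))
  "List of groups of terms treated as interchangeable.")

(defun my-apropos-term-regexp (word)
  "Return a regexp matching WORD or any term grouped with it."
  (regexp-opt (or (seq-find (lambda (group) (member word group))
                            my-apropos-equivalents)
                  (list word))))

(defun my-apropos-matches-all-p (query name)
  "Return non-nil if every word of QUERY, or an equivalent, occurs in NAME."
  (seq-every-p (lambda (word)
                 (string-match-p (my-apropos-term-regexp word) name))
               (split-string query " " t)))

;; (my-apropos-matches-all-p "open document" "find-file")         ; non-nil
;; (my-apropos-matches-all-p "open document" "switch-to-buffer")  ; non-nil
;; (my-apropos-matches-all-p "open document" "kill-line")         ; nil
```

Every word in a group widens the search, so large groups are where the false positives mentioned above would come from.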
-Miles
--
Quidquid latine dictum sit, altum viditur.
* Re: Apropos commands and regexps
2002-05-12 0:57 Apropos commands and regexps Kim F. Storm
2002-05-12 5:28 ` Eli Zaretskii
2002-05-12 10:06 ` Kai Großjohann
@ 2002-05-16 11:04 ` Kai Großjohann
2002-05-16 12:30 ` Eli Zaretskii
` (4 more replies)
2002-05-23 21:41 ` Kim F. Storm
3 siblings, 5 replies; 56+ messages in thread
From: Kai Großjohann @ 2002-05-16 11:04 UTC (permalink / raw)
Cc: emacs-devel
storm@cua.dk (Kim F. Storm) writes:
> Wouldn't it be simpler (for a novice user -- and for advanced users
> too) to simply write one or more words (substrings) and then search
> for all combinations of those words (substrings) in the relevant list.
There has been a lengthy discussion on this. Since my research area
is Information Retrieval, maybe I should say something about this...
IMHO, the long-term goal for searching the Emacs documentation should
look similar to the following:
The user enters a query. The system searches items of documentation
and computes a score for each one. Then the items with the highest
score come out first.
Now we need to decide on the query language (query format), and we
need to decide on the method of computing a score for each item of
documentation.
For the query language, I see these possibilities:
* List of words.
Here, items containing all words will come out first, followed by
items with all but one word, and so on. The presence or absence of
a very common word has less effect on the score than the presence
or absence of an unusual word. (Extreme example: if the user types
"A B C" and there is only one item of documentation matching C but
A and B are very common, then that item should come out first.)
* List of words, with optional "+" or "-" prefixes.
The idea is that words prefixed with + must occur, whereas words
prefixed with - must not occur. The presence or absence of
unprefixed words just raises/lowers the score, as in the first
alternative.
* Boolean expression with parentheses and AND, OR, NOT connectives.
Actually, these connectives just combine the individual scores.
For example, if some item has score X w.r.t. query A and score Y
w.r.t. query B, then the score for A AND B could be X * Y, the
score for A OR B could be X + Y - (X * Y), and so on. These
formulas come from a probabilistic interpretation of the scores
(which are assumed to be between 0 and 1). But other useful
formulas could be found, with different theoretical foundations.
Maybe people can suggest other possibilities.
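The probabilistic connectives from the third alternative can be written down directly. This is a sketch with hypothetical function names; the AND and OR formulas are the ones given above, while the NOT formula (1 - X) is an assumption, since the message only says "and so on".

```elisp
;; Scores are assumed to lie between 0 and 1.
(defun my-score-and (x y)
  "Score for A AND B, given scores X for A and Y for B."
  (* x y))

(defun my-score-or (x y)
  "Score for A OR B, given scores X for A and Y for B."
  (- (+ x y) (* x y)))

(defun my-score-not (x)
  "Score for NOT A, given score X for A.
This formula is an assumption; the message does not state it."
  (- 1.0 x))

;; (my-score-and 0.5 0.5) => 0.25
;; (my-score-or 0.5 0.5)  => 0.75
;; A nested query such as A AND (B OR NOT C) is scored by composing
;; the functions: (my-score-and a (my-score-or b (my-score-not c)))
```

Since every connective maps scores in [0, 1] back into [0, 1], arbitrarily nested Boolean expressions still yield a score that can be used for ranking.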
And then we need to compute the scores for each individual "word" in
the query. The Emacs documentation has a complex structure (for the
traditional Information Retrieval crowd anyway, who look at
retrieving "documents" where each such document is just a sequence of
terms). A number of rules come to my mind that might be useful to
implement:
* If the word occurs in a command/function/variable name, then the
score should be higher than a match in the docstring (or other
explanatory text) only.
* If the word does not occur at all, but a synonym of the word does,
the item should match (perhaps with a lowered score).
* Instead of just synonyms, also consider more general terms,
more specific terms, related terms.
* If the word does not occur, but a derived form does, then the item
should match (perhaps with a lowered score). So "mouse" should
find "mice" and so on. The Porter stemming algorithm appears to be
a useful thing here.
* I guess that "igrep" should be considered a "derived form" of "grep"
in the context of the Emacs documentation. Do we do this with an
explicit synonym list? Or perhaps with a metric of similarity
between terms which is based on editing distance or suchlike?
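As a sketch of the editing-distance idea from the last item: newer Emacs versions (27.1 and later, which is an assumption about the reader's setup) provide `string-distance', which computes the Levenshtein distance. The helper name below is hypothetical.

```elisp
;; Assumes Emacs 27.1 or later, where `string-distance' is built in.
(defun my-similar-term-p (a b &optional max-distance)
  "Return non-nil if A and B are within MAX-DISTANCE edits (default 1)."
  (<= (string-distance a b) (or max-distance 1)))

;; (string-distance "grep" "igrep")    => 1
;; (string-distance "mouse" "mice")    => 3
;; (my-similar-term-p "grep" "igrep")  ; non-nil
;; (my-similar-term-p "mouse" "mice")  ; nil
```

A threshold of 1 catches "igrep" but not "mice", and it would also treat unrelated pairs such as "file" and "fill" as similar, so editing distance on its own does not replace a synonym list or a stemmer.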
And then, there is the issue of where to look when searching.
Sometimes, people want to search in the docstrings, sometimes only in
the command names.
When searching in info files, some subdivision makes sense I think.
Should each node be considered a retrievable item, or is another
subdivision more sensible?
The above should be interpreted as a source of ideas. I think I
would like to implement most of the mechanisms someday, but who knows
when I will be able to do that. You just select from this something
which appears useful and implement that. I haven't covered result
presentation at all, yet.
For instance, a cool replacement (complement?) for M-x apropos RET
could be something that splits the query and each Lisp symbol into
words. So now the query is a set of words and the Lisp symbol is a
set of words. The score for the Lisp symbol would be the number of
elements in the intersection of these two sets. Sort the result by
decreasing score. (Also offer to sort by name of symbol, while
displaying the score.)
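The word-set scoring just described could look roughly like this. It is a sketch with hypothetical function names, splitting on spaces and hyphens only, and it works on a list of name strings rather than on the obarray.

```elisp
;; Hypothetical helpers, for illustration only.
(defun my-apropos-words (string)
  "Split STRING into a list of lowercase words at spaces and hyphens."
  (split-string (downcase string) "[- ]+" t))

(defun my-apropos-score (query name)
  "Return the number of distinct words shared by QUERY and NAME."
  (let ((name-words (my-apropos-words name))
        (score 0))
    (dolist (w (delete-dups (my-apropos-words query)) score)
      (when (member w name-words)
        (setq score (1+ score))))))

(defun my-apropos-rank (query names)
  "Return (NAME . SCORE) pairs for NAMES that share a word with QUERY.
The pairs are sorted by decreasing score, then by name."
  (let (scored)
    (dolist (name names)
      (let ((score (my-apropos-score query name)))
        (when (> score 0)
          (push (cons name score) scored))))
    (sort scored
          (lambda (a b)
            (or (> (cdr a) (cdr b))
                (and (= (cdr a) (cdr b))
                     (string< (car a) (car b))))))))

;; (my-apropos-rank "find file"
;;                  '("file-name-directory" "find-file" "find-tag" "kill-line"))
;;   => (("find-file" . 2) ("file-name-directory" . 1) ("find-tag" . 1))
```

Because whole words are compared, "grep" would not match "igrep" here; a substring test in place of `member' would trade that precision for recall.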
What do you think?
kai
--
Silence is foo!
* Re: Apropos commands and regexps
2002-05-16 11:04 ` Kai Großjohann
@ 2002-05-16 12:30 ` Eli Zaretskii
2002-05-16 13:05 ` D. Goel
` (3 subsequent siblings)
4 siblings, 0 replies; 56+ messages in thread
From: Eli Zaretskii @ 2002-05-16 12:30 UTC (permalink / raw)
Cc: emacs-devel
> From: Kai.Grossjohann@CS.Uni-Dortmund.DE
> Date: Thu, 16 May 2002 13:04:13 +0200
>
> The user enters a query. The system searches items of documentation
> and computes a score for each one. Then the items with the highest
> score come out first.
There is an alternative approach:
The user enters a query. The system does the search and presents a
menu of possible refinements of the original search spec. The user
chooses one of the possibilities, and the process repeats, until the
list of possible hits is shorter than some predefined value; when
that happens, the list of hits is displayed.
The advantage of this method is twofold:
- You don't need to invent a good scoring system.
- The user never needs to wade through gobs of hits, trying to
figure out which one is relevant to his/her query.
(If this description doesn't explain the suggestion, I can craft a
fictitious example that might help.)
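One possible reading of the refinement step, as a sketch only: the thread does not specify how the refinements would be chosen, so the choice below (offer the other words that occur in the current hits, most frequent first) and the function name are assumptions.

```elisp
;; Hypothetical helper; the refinement strategy itself is an assumption.
(defun my-apropos-refinements (query-words hits)
  "Return an alist of (WORD . COUNT) for words in HITS not in QUERY-WORDS.
HITS is a list of hyphenated names.  COUNT is the number of hits that
contain WORD.  The most frequent words come first."
  (let (counts)
    (dolist (hit hits)
      (dolist (w (delete-dups (split-string hit "-" t)))
        (unless (member w query-words)
          (let ((cell (assoc w counts)))
            (if cell
                (setcdr cell (1+ (cdr cell)))
              (push (cons w 1) counts))))))
    (sort counts (lambda (a b) (> (cdr a) (cdr b))))))

;; (car (my-apropos-refinements
;;       '("find" "file")
;;       '("find-file" "find-file-other-window"
;;         "find-file-other-frame" "find-file-read-only")))
;;   => ("other" . 2)
```

A menu built from this list would let the user add one word at a time until the number of hits drops below the chosen limit.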
I generally find scoring a poor means for me to decide whether the hit
is relevant. When I google, for example, I find myself examining the
search words shown with surrounding text much more than looking at the
scores. But if the number of hits shown is large, my method is not
very efficient, and can be even frustrating; thus the suggestion for
interactively refining the search before showing the hits.
> When searching in info files, some subdivision makes sense I think.
> Should each node be considered a retrievable item, or is another
> subdivision more sensible?
I'd say, as the first approximation, search node names, chapter/section
names, index entries, and glossary items.
* Re: Apropos commands and regexps
2002-05-16 11:04 ` Kai Großjohann
2002-05-16 12:30 ` Eli Zaretskii
@ 2002-05-16 13:05 ` D. Goel
2002-05-16 22:37 ` Alex Schroeder
` (2 subsequent siblings)
4 siblings, 0 replies; 56+ messages in thread
From: D. Goel @ 2002-05-16 13:05 UTC (permalink / raw)
Cc: emacs-devel
> What do you think?
IMHO, cool idea.. i do love it when the search-engine does smart stuff
for you using a lot of rules and intelligent guesses.. and various
combinations of what you typed.. in fact, i revel in programming
similar stuff for my company..
but, IMHO, that should be an alternative, and not replacement for
existing functionality.. there should always be a way where i can ask
the good old apropos to do 'precisely' what i ask it to do
viz. precisely match the regexp i asked it to match.. and not make
smart guesses of its own..
Have a good day,
D <http://www.glue.umd.edu/~deego/>
--
* Re: Apropos commands and regexps
2002-05-16 11:04 ` Kai Großjohann
2002-05-16 12:30 ` Eli Zaretskii
2002-05-16 13:05 ` D. Goel
@ 2002-05-16 22:37 ` Alex Schroeder
2002-05-16 22:44 ` Kim F. Storm
2002-05-17 19:28 ` Richard Stallman
4 siblings, 0 replies; 56+ messages in thread
From: Alex Schroeder @ 2002-05-16 22:37 UTC (permalink / raw)
Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:
> What do you think?
I think you should implement it! :)
Alex.
--
http://www.electronicintifada.net/diaries/index.html
http://www.us-israel.org/jsource/US-Israel/hr2506c.html
* Re: Apropos commands and regexps
2002-05-16 11:04 ` Kai Großjohann
` (2 preceding siblings ...)
2002-05-16 22:37 ` Alex Schroeder
@ 2002-05-16 22:44 ` Kim F. Storm
2002-05-17 19:28 ` Richard Stallman
4 siblings, 0 replies; 56+ messages in thread
From: Kim F. Storm @ 2002-05-16 22:44 UTC (permalink / raw)
Cc: emacs-devel
Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann) writes:
> The user enters a query. The system searches items of documentation
> and computes a score for each one. Then the items with the highest
> score come out first.
>
> What do you think?
Sounds really good to me!
But I still think we could start with the simple "hack" I proposed.
--
Kim F. Storm <storm@cua.dk> http://www.cua.dk
* Re: Apropos commands and regexps
2002-05-16 11:04 ` Kai Großjohann
` (3 preceding siblings ...)
2002-05-16 22:44 ` Kim F. Storm
@ 2002-05-17 19:28 ` Richard Stallman
2002-05-18 6:26 ` Eli Zaretskii
4 siblings, 1 reply; 56+ messages in thread
From: Richard Stallman @ 2002-05-17 19:28 UTC (permalink / raw)
Cc: storm, emacs-devel
    For the query language, I see these possibilities:
    * List of words.
    Here, items containing all words will come out first, followed by
    items with all but one word, and so on. The presence or absence of
    a very common word has less effect on the score than the presence
    or absence of an unusual word.

I think that is the best way to handle the argument. But the search
for these words should allow them to be substrings of words. It should
not require an exact match against an entire word in the command name.

    * If the word occurs in a command/function/variable name, then the
    score should be higher than a match in the docstring (or other
    explanatory text) only.

I am not sure it is worth distinguishing. If the user says to look
at the doc string, treat it as equally important.

    * If the word does not occur at all, but a synonym of the word does,
    the item should match (perhaps with a lowered score).

The synonym may as well have the same score. We don't need
a feature to make it different.

    * Instead of just synonyms, also consider more general terms,
    more specific terms, related terms.

That would match too much, so I recommend against writing it.

    * If the word does not occur, but a derived form does, then the item
    should match (perhaps with a lowered score). So "mouse" should
    find "mice" and so on. The Porter stemming algorithm appears to be
    a useful thing here.

These plurals can be defined as synonyms, so this is not needed
as a separate feature.

    * I guess that "igrep" should be considered a "derived form" of "grep"
    in the context of the Emacs documentation. Do we do this with an
    explicit synonym list? Or perhaps with a metric of similarity
    between terms which is based on editing distance or suchlike?

Substring matching will handle this with no extra features.
* Re: Apropos commands and regexps
2002-05-17 19:28 ` Richard Stallman
@ 2002-05-18 6:26 ` Eli Zaretskii
2002-05-19 5:30 ` Richard Stallman
0 siblings, 1 reply; 56+ messages in thread
From: Eli Zaretskii @ 2002-05-18 6:26 UTC (permalink / raw)
Cc: emacs-devel
> From: Richard Stallman <rms@gnu.org>
> Date: Fri, 17 May 2002 13:28:32 -0600 (MDT)
>
> But the search
> for these words should allow them to be substrings of words. It should
> not require an exact match against an entire word in the command name.
Substring search might yield too many hits, I'm afraid. I'd suggest
instead to use a list of synonyms; e.g., `buffer' might have
`iswitchb' and `ibuffer' as its synonyms. A list maintained by humans
should be more accurate than a substring search done by a program.
* Re: Apropos commands and regexps
2002-05-18 6:26 ` Eli Zaretskii
@ 2002-05-19 5:30 ` Richard Stallman
0 siblings, 0 replies; 56+ messages in thread
From: Richard Stallman @ 2002-05-19 5:30 UTC (permalink / raw)
Cc: emacs-devel
    > But the search
    > for these words should allow them to be substrings of words. It should
    > not require an exact match against an entire word in the command name.

    Substring search might yield too many hits, I'm afraid.

No more than Apropos gets now. Anyway, it is very useful. Let's try
it this way (i.e., no change in this regard) first; we could change
it later if necessary.
* Re: Apropos commands and regexps
2002-05-12 0:57 Apropos commands and regexps Kim F. Storm
` (2 preceding siblings ...)
2002-05-16 11:04 ` Kai Großjohann
@ 2002-05-23 21:41 ` Kim F. Storm
3 siblings, 0 replies; 56+ messages in thread
From: Kim F. Storm @ 2002-05-23 21:41 UTC (permalink / raw)
I have now added keyword-based searching to all the apropos commands.
I have incorporated a number of suggestions on this list, notably
(simple) scoring and matching of synonyms for commonly used terms.
IMHO, it actually works better than I had expected!
So now you can enter
C-h a open file RET
and learn about find-file (as the first choice).
And C-h a cut line RET tells you about kill-line!
M-x apropos-documentation RET mouse jump away RET
actually finds mouse-avoidance-mode for you.
There may be some rough edges, but please try it out and
tell me what you think about it.
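To make the idea above concrete (keywords matched as substrings, with
synonyms for commonly used terms and a simple score), here is a minimal
Emacs Lisp sketch. It is not the code committed to apropos.el. The names
my-apropos-synonyms, my-apropos-word-regexp, my-apropos-score and
my-apropos-matches, the synonym groups, and the scoring rule (one point
per matched keyword) are all assumptions made for this example.

```elisp
;;; Illustrative sketch only; NOT the code from apropos.el.
;;; Every name and the synonym table below are hypothetical.
(require 'cl-lib)

(defvar my-apropos-synonyms
  '(("find" "open" "edit")
    ("kill" "cut"))
  "Hypothetical synonym groups; words in one group are interchangeable.")

(defun my-apropos-word-regexp (word)
  "Return a regexp matching WORD or any synonym grouped with it."
  (let ((group (cl-find-if (lambda (g) (member word g))
                           my-apropos-synonyms)))
    (if group
        ;; Alternation over the whole group, wrapped in a shy group.
        (concat "\\(?:" (mapconcat #'regexp-quote group "\\|") "\\)")
      (regexp-quote word))))

(defun my-apropos-score (words name)
  "Return how many of WORDS (or their synonyms) are substrings of NAME."
  (cl-count-if (lambda (w)
                 (string-match-p (my-apropos-word-regexp w) name))
               words))

(defun my-apropos-matches (words names)
  "Return the elements of NAMES that match every entry of WORDS."
  (cl-remove-if-not (lambda (name)
                      (= (my-apropos-score words name) (length words)))
                    names))
```

With this table, (my-apropos-matches '("open" "file") '("find-file"
"kill-line" "open-line")) keeps only "find-file", and the keywords
("cut" "line") keep only "kill-line". A real implementation would
presumably also rank partial matches rather than discard them.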
--
Kim F. Storm <storm@cua.dk> http://www.cua.dk