* Why do apropos commands match only pairs of words in a word-list pattern?
@ 2015-05-09 19:16 Drew Adams
2015-05-11 5:21 ` Nicolas Richard
0 siblings, 1 reply; 6+ messages in thread
From: Drew Adams @ 2015-05-09 19:16 UTC (permalink / raw)
To: Emacs-Devel (emacs-devel@gnu.org)
If you type a list of words to match, instead of typing a regexp,
to a command such as `apropos', each word is not matched against
the candidates and then the intersection of those match sets
retained.
Instead of matching each word in the list you provide it, apropos
commands match each pair of words from the list.
For example, if you type `foo bar toto' then all matches of `foo'
& `bar' (in either order) are retained, plus all matches of `bar'
& `toto', plus all matches of `foo' & `toto'. So for instance, a
candidate `some-bar-foo-thing' is retained, even though it does
not also match `toto' - it is enough that it matches both `foo'
and `bar'.
Why is this the design? Wouldn't users more typically want *each*
of the words they type to be matched?
Is this perhaps only because the existing code before introducing
word-list patterns provided for using a regexp, and in order to
bolt word-list matching onto that existing code it was thought to
be easier to just come up with a single regexp to match, instead
of handling the word-list case as an intersection of separate
matches?
IOW, was this just an implementation decision, or is there some
more important reason for it, from a user point of view?
The behavior is documented, in (emacs)`Apropos', as follows:
    When you specify more than one word in the apropos pattern,
    a name must contain at least two of the words in order to match.
No reason given there as to why this would be behavior you might
want or expect. And beyond that brief description, there is
only this comment in the apropos.el code:
    ;; We don't actually make a regexp matching all permutations.
    ;; Instead, for e.g. "a b c", we make a regexp matching
    ;; any combination of two or more words like this:
    ;; (a|b|c).*(a|b|c) which may give some false matches,
    ;; but as long as it also gives the right ones, that's ok.
That tells what happens, but not why this choice was made.
And the last line almost sounds like an apology, as if this
is not ideal but it is generally OK, since although extra
junk is included at least we don't miss any sought matches.
IOW, it sounds like, even though you really want only matches
of all three: `foo' & `bar' & `toto', we think it's OK if you
get additional, false positives such as `some-bar-foo-thing',
as long as you also get all true positives (such as
`a-bar-toto-foo-thing').
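To make the difference concrete, here is a minimal Python sketch. It is not the apropos.el code: it only assumes a regexp of the shape given in the comment above, and the helper names (`pairwise_regexp`, `matches_pairwise`, `matches_all`) are made up for illustration.

```python
import re

def pairwise_regexp(words):
    """Build (w1|w2|...).*(w1|w2|...), the shape described in the comment."""
    alt = "|".join(re.escape(w) for w in words)
    return re.compile(f"({alt}).*({alt})")

def matches_pairwise(words, candidate):
    """True if the candidate matches the two-or-more-words regexp."""
    return pairwise_regexp(words).search(candidate) is not None

def matches_all(words, candidate):
    """True only if every word occurs somewhere in the candidate (plain AND)."""
    return all(w in candidate for w in words)

words = ["foo", "bar", "toto"]
candidates = ["some-bar-foo-thing", "a-bar-toto-foo-thing", "foo-only", "foo-foo"]

pairwise_hits = [c for c in candidates if matches_pairwise(words, c)]
and_hits = [c for c in candidates if matches_all(words, c)]

print(pairwise_hits)  # includes candidates that lack `toto`
print(and_hits)       # only the candidate containing all three words
```

In this sketch a regexp of exactly this shape also accepts `foo-foo`, where the same word occurs twice, which is presumably one kind of "false match" the comment has in mind.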
Is this the right behavior? If so, why - what am I missing?
* Re: Why do apropos commands match only pairs of words in a word-list pattern?
2015-05-09 19:16 Why do apropos commands match only pairs of words in a word-list pattern? Drew Adams
@ 2015-05-11 5:21 ` Nicolas Richard
2015-05-11 14:06 ` Drew Adams
0 siblings, 1 reply; 6+ messages in thread
From: Nicolas Richard @ 2015-05-11 5:21 UTC (permalink / raw)
To: Drew Adams; +Cc: Emacs-Devel (emacs-devel@gnu.org)
Drew Adams <drew.adams@oracle.com> writes:
> Instead of matching each word in the list you provide it, apropos
> commands match each pair of words from the list.
> Why is this the design? Wouldn't users more typically want *each*
> of the words they type to be matched?
My own experience is that I both sometimes liked and sometimes hated the
behaviour. Often the latter, though. I think it would be nice to sort by
relevance (e.g. the number of words that matched). How easy/difficult
would that be?
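As a rough illustration of what such a ranking could look like (a Python sketch, not a patch against apropos.el; `relevance` and `sort_by_relevance` are hypothetical names, and matching is plain substring matching):

```python
def relevance(words, candidate):
    """Number of distinct query words that occur in the candidate."""
    return sum(1 for w in set(words) if w in candidate)

def sort_by_relevance(words, candidates):
    """Most-matching candidates first; sorted() is stable, so ties keep their order."""
    return sorted(candidates, key=lambda c: -relevance(words, c))

words = ["foo", "bar", "toto"]
hits = ["some-bar-foo-thing", "toto-bar", "a-bar-toto-foo-thing"]
ranked = sort_by_relevance(words, hits)
print(ranked)
```

Under these assumptions the scoring itself is cheap; whether it fits the existing apropos output code is a separate question.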
--
Nicolas
* RE: Why do apropos commands match only pairs of words in a word-list pattern?
2015-05-11 5:21 ` Nicolas Richard
@ 2015-05-11 14:06 ` Drew Adams
2015-05-11 14:51 ` Nicolas Richard
2015-05-11 14:59 ` Eli Zaretskii
0 siblings, 2 replies; 6+ messages in thread
From: Drew Adams @ 2015-05-11 14:06 UTC (permalink / raw)
To: Nicolas Richard; +Cc: Emacs-Devel (emacs-devel@gnu.org)
> > Instead of matching each word in the list you provide it, apropos
> > commands match each pair of words from the list.
>
> > Why is this the design? Wouldn't users more typically want *each*
> > of the words they type to be matched?
>
> My own experience is that I both sometimes liked and sometimes hated
> the behaviour. Often the latter, though. I think it would be nice to
> sort by relevance (e.g. the number of words that matched). How
> easy/difficult would that be?
That's not the answer, IMO. Better is to give users control over the
behavior. A user option is one approach.
(I've done that in my library `apu.el': `apu-match-word-pairs-only-flag',
http://www.emacswiki.org/emacs/download/apu.el).
Another possibility is to have an option to define the default behavior,
but to let users decide immediately which behavior to get when they use
the command. That is the approach taken by apropos commands for DO-ALL.
That's the way to go, I think.
It's not just about ordering things. Order is a separate choice axis,
and yes, users should be able to order the output in different ways.
But simply always combining the two matching approaches mentioned, and
relegating the "looser" pair-matching candidates to the end of the
buffer is not a good design. (IMHO.)
But I would still like to hear from someone who gives a good reason
for the current design. You've said that you sometimes like it, but
that doesn't tell why. And why pairs and not triplets or...?
My guess so far is that this is just historical - a vestige of the
fact that apropos was implemented to use a regexp, so we cobbled
together a regexp that, while not doing what one would expect for
keyword matching, at least covers all of the true positives, even
if it also throws in a lot of false positives.
But I would like to hear arguments of why this is TRT for apropos.
* Re: Why do apropos commands match only pairs of words in a word-list pattern?
2015-05-11 14:06 ` Drew Adams
@ 2015-05-11 14:51 ` Nicolas Richard
2015-05-11 14:59 ` Eli Zaretskii
1 sibling, 0 replies; 6+ messages in thread
From: Nicolas Richard @ 2015-05-11 14:51 UTC (permalink / raw)
To: Drew Adams; +Cc: Nicolas Richard, Emacs-Devel (emacs-devel@gnu.org)
>> My own experience is that I both sometimes liked and sometimes hated
>> the behaviour.
> but
> that doesn't tell why. And why pairs and not triplets or...?
I liked it because it just worked. I threw keywords at it, I was lucky
enough to get a useful answer quickly. I know, that's not a very
interesting reason...
--
Nicolas
* Re: Why do apropos commands match only pairs of words in a word-list pattern?
2015-05-11 14:06 ` Drew Adams
2015-05-11 14:51 ` Nicolas Richard
@ 2015-05-11 14:59 ` Eli Zaretskii
1 sibling, 0 replies; 6+ messages in thread
From: Eli Zaretskii @ 2015-05-11 14:59 UTC (permalink / raw)
To: Drew Adams; +Cc: theonewiththeevillook, emacs-devel
> Date: Mon, 11 May 2015 07:06:01 -0700 (PDT)
> From: Drew Adams <drew.adams@oracle.com>
> Cc: "Emacs-Devel \(emacs-devel@gnu.org\)" <emacs-devel@gnu.org>
>
> But I would still like to hear from someone who gives a good reason
> for the current design.
In general, when such questions arise, I suggest finding out when
the related code was introduced, and then searching emacs-devel and
gnu-emacs-bug for relevant discussions.
In this case, the design was discussed in this long thread:
http://lists.gnu.org/archive/html/emacs-devel/2002-05/msg00397.html
You will see that the heuristic in question did get some attention.
* RE: Why do apropos commands match only pairs of words in a word-list pattern?
[not found] ` <<83lhgvma79.fsf@gnu.org>
@ 2015-05-11 17:01 ` Drew Adams
0 siblings, 0 replies; 6+ messages in thread
From: Drew Adams @ 2015-05-11 17:01 UTC (permalink / raw)
To: Eli Zaretskii, Drew Adams; +Cc: theonewiththeevillook, emacs-devel
> the design was discussed in this long thread:
> http://lists.gnu.org/archive/html/emacs-devel/2002-05/msg00397.html
> You will see that the heuristic in question did get some attention.
Thanks for the reference. I'll pull out what I see as a summary, with
some comments from me.
The arguments given there in favor of the current, 2-or-more, design and
against a straightforward AND design boiled down to these 3, all from Kim:
1. Emacs returns few hits anyway.
    For WEB search engines, I think AND does make sense -- since there
    are SOOOO many pages to match. But for a limited universe like
    emacs -- which doesn't always use the most obvious terms --
    using AND doesn't make a lot of sense to me.
2. It's good enough.
    I think it is adequate in practice.
3. It is more helpful when you don't know exactly what you're looking for.
    [it] has a more "novice" appeal: if [you] don't know what a specific
    function is called, it will be easier to enter a few more alternatives,
    and see what turns up. -- it specifying more words returns more
    alternatives.

    matching at least two keywords will find all the entries found by
    searching for all combinations - and it may find some entries the
    user didn't think about
My response to these arguments:
1. It's not clear to me that the "limited universe" of Emacs is so
limited that it is helpful to include the noise of false positives.
2. And that "adequate in practice" argument echoes the more-or-less
apologetic comment in the code that suggests that the design is
not ideal (not really what we want) but is probably OK in general.
3. And I think it is a mistake to try to be "smart", guessing
what's best for a novice, by using "dumb" matching. If you want
to try to be smart then you need to do something more/other than
just return all matches of any two of the words.
More importantly, as I said, I think this should be a user choice,
not just a design-time choice. Even Kim suggested user choice:
    We could put a "button bar" at the top of the apropos output with
    the following buttons:
    [Match all words] [anchored match] [search documentation]
(No such user choice was ever implemented, AFAIK.)
Back to my summary of the thread -
A certain Eli Z came out clearly in favor of AND, and against OR'd
pairwise (AND) matches:
    Perhaps that's because they want to show off the number of hits
    they return. I was always annoyed by ORing, and many times catch
    myself forgetting to type the magic that makes it do an AND.
    Because I always want the AND method.
and
    > I don't like the "and" approach -- at least not as the default.
    I'm afraid anything else will bring too many hits. A docs search
    tool that returns gobs of information is not very useful, in my
    experience.
and
    I'm afraid this rule will bring many false hits, and I think we
    should beware of that as the plague.
Kim then backed off a bit from pairwise matching:
    if matching only two words gives too many matches for documentation,
    require three (or four) matching words.
To which Mr Z said:
    a rule based on the number of matched keywords is not good enough,
    since sometimes even one word is enough to yield a very accurate
    result.
Miles said (and Mr Z agreed):
    I think it's clear that we need a bit of experience with this
    stuff, so we can see how well the various alternatives actually
    work in practice, rather than sitting around pontificating...
Well, we don't seem to have experimented with different approaches,
but rather have just gone with OR'd AND'd pair matches. In the end,
RMS decided that pairwise matching "seems more useful", Kim
implemented it, and that was that.
Note that Kai G mentioned what Nicolas R suggested recently: It's OK
to return tons of hits, including noise, if you sort by relevance.
And RMS said "that is the best way to handle the argument".
But as I said before, a major problem with that approach is that it
interferes with other sort possibilities. If you have many hits, most
of which are noise, the *only* order in which the noise can reasonably
be ignored is relevance (e.g. more AND matches first, fewer later).
You cannot reorder those alphabetically, or by putting all function
names first, then variable names, or any other meaningful order.
Doing that would push all the noise throughout the list of hits.
IOW, a high-noise (high recall, low precision) return set requires
an ordering that keeps the noise farther from immediate view. Users
deserve to be able to use different sort orders, and the approach
of noise-might-help-sometimes-&-costs-nothing-if-far-from-view
interferes with sorting.
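A small Python sketch of that interference, using made-up candidate names and plain substring matching (an assumption for illustration, not how apropos actually matches): ordered by relevance, the full matches sit together at the top, but an alphabetical reorder scatters them among the partial matches.

```python
words = ["foo", "bar", "toto"]

# Hypothetical high-recall result set: two full matches, three pair-only matches.
hits = ["zz-foo-bar", "a-bar-toto-foo-thing", "aa-toto-bar",
        "m-toto-foo-bar", "b-foo-toto"]

def is_full_match(candidate):
    """True if the candidate contains every query word."""
    return all(w in candidate for w in words)

# Relevance order: full matches first (False sorts before True).
by_relevance = sorted(hits, key=lambda c: not is_full_match(c))
# Any other order, e.g. alphabetical, ignores relevance entirely.
alphabetical = sorted(hits)

flags_by_relevance = [is_full_match(c) for c in by_relevance]
flags_alphabetical = [is_full_match(c) for c in alphabetical]

print(flags_by_relevance)   # full matches grouped at the front
print(flags_alphabetical)   # full matches interleaved with noise
```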
Finally, Eli mentioned also the possibility that is used in Icicles
and probably some other completion UIs: let the user progressively
refine the set of returned hits.
    The user enters a query. The system does the search and presents
    a menu of possible refinements of the original search spec. The
    user chooses one of the possibilities, and the process repeats,
    until the list of possible hits is shorter than some predefined
    value; when that happens, the list of hits is displayed.
    The user never needs to wade through gobs of hits, trying to
    figure out which one is relevant to his/her query.
Whether the hits returned at each refinement stage are displayed
or not is not the question (IMO). In Icicles, a user can choose
whether to see the hits at each stage. But it is generally useful,
IMO, to show them, even when there are many. That doesn't imply
that a user must "wade through" them, but s?he can get an idea
of what's there - and that helps guide upcoming refinement patterns.
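A minimal Python sketch of progressive refinement with the hits reported at each stage. This is not Icicles code: `refine` and `progressive_search` are hypothetical names, each pattern is a plain substring, and the `show` callback stands in for whatever display the UI would use.

```python
def refine(candidates, pattern):
    """Keep only the candidates that contain the pattern."""
    return [c for c in candidates if pattern in c]

def progressive_search(candidates, patterns, show=print):
    """Apply each pattern in turn, reporting the surviving hit count per stage."""
    current = list(candidates)
    for p in patterns:
        current = refine(current, p)
        show(f"{p!r}: {len(current)} hit(s)")
    return current

candidates = ["some-bar-foo-thing", "a-bar-toto-foo-thing", "toto-bar", "unrelated"]
log = []
remaining = progressive_search(candidates, ["bar", "foo", "toto"], show=log.append)
print(log)
print(remaining)
```

Passing a different `show` function (or one that does nothing) corresponds to the user choosing whether to see the hits at each stage.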