From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED.blaine.gmane.org!not-for-mail From: Drew Adams Newsgroups: gmane.emacs.devel Subject: RE: new-flex-completion-style Date: Thu, 14 Feb 2019 16:34:38 +0000 (UTC) Message-ID: <1f4513ab-cd39-4543-9b1a-743e1307dd54@default> References: <20190202232827.27331.87300@vcs0.savannah.gnu.org> <20190202232828.4AE452159A@vcs0.savannah.gnu.org> <87lg2mynrg.fsf@gmail.com> <871s4czm5n.fsf@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Injection-Info: blaine.gmane.org; posting-host="blaine.gmane.org:195.159.176.226"; logging-data="252896"; mail-complaints-to="usenet@blaine.gmane.org" Cc: Stefan Monnier , emacs-devel To: Daniel Pittman , =?utf-8?B?Sm/Do28gVMOhdm9yYQ==?= Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Thu Feb 14 17:50:42 2019 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([209.51.188.17]) by blaine.gmane.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:256) (Exim 4.89) (envelope-from ) id 1guKDh-0013cj-KL for ged-emacs-devel@m.gmane.org; Thu, 14 Feb 2019 17:50:41 +0100 Original-Received: from localhost ([127.0.0.1]:51384 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1guKDg-0003oK-Fx for ged-emacs-devel@m.gmane.org; Thu, 14 Feb 2019 11:50:40 -0500 Original-Received: from eggs.gnu.org ([209.51.188.92]:47263) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1guK9t-0000YQ-P3 for emacs-devel@gnu.org; Thu, 14 Feb 2019 11:46:47 -0500 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1guJyL-0007GE-DF for emacs-devel@gnu.org; Thu, 14 Feb 2019 11:34:51 -0500 Original-Received: from userp2130.oracle.com ([156.151.31.86]:54596) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1guJyJ-0007Ct-M0 for emacs-devel@gnu.org; Thu, 14 Feb 2019 11:34:48 -0500 Original-Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.27/8.16.0.27) with SMTP id x1EG9CEo066920; Thu, 14 Feb 2019 16:34:43 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=mime-version : message-id : date : from : sender : to : cc : subject : references : in-reply-to : content-type : content-transfer-encoding; s=corp-2018-07-02; bh=M03G0HtdbcWijCFEQw5pNb4F9E8PVfxL4fAYzqttJbY=; b=WuBDolzZk0eY8iTWY8G5a57FR7NpJTzePKvNp1f3cDmsGBvcAsTPSPvp8WurnrSLHFUg FER04V+CCtI/fWCY8gtoD1ut4Ix5J+oTsGC/Pn4sEyPfTDLtDwIMOSb/qz+BHhKUG3je dVsKXdjp1yga+VHi33rKJ+TTgXLv611eOQaaDNDGHvAa0Efw4um1OQ6BIfVWsYgxoPSm PLbJBHLh+oE1T2vjbK/n02dTcCSabKjjwWf9dTf5fB6Yk9D44qXT8I5sGoKuAIHYSxVQ ZCdyjR3wc5l77SlKBcjkG8NLhO07dYmPpfxQQ0UI12BG7Orjax1ujBiIOI27YawsXXNv vA== Original-Received: from aserv0022.oracle.com (aserv0022.oracle.com [141.146.126.234]) by userp2130.oracle.com with ESMTP id 2qhreks2x3-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 14 Feb 2019 16:34:42 +0000 Original-Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by aserv0022.oracle.com (8.14.4/8.14.4) with ESMTP id x1EGYfnf026067 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 14 Feb 2019 16:34:41 GMT Original-Received: from abhmp0009.oracle.com (abhmp0009.oracle.com [141.146.116.15]) by userv0122.oracle.com (8.14.4/8.14.4) with ESMTP id x1EGYd2T022320; Thu, 14 Feb 2019 16:34:39 GMT In-Reply-To: X-Priority: 3 X-Mailer: Oracle Beehive Extensions for Outlook 2.0.1.9.1 (1003210) [OL 16.0.4810.0 (x86)] X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=9167 signatures=668683 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1011 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1902140111 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [generic] X-Received-From: 156.151.31.86 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.org gmane.emacs.devel:233333 Archived-At: > The=C2=A0Levenshtein distance and close variants generally do > a reasonable job of approximating human expectations, yes. > Close variants (eg: only delete/insert, no mutate) also > work well. > > The cost of building the Levenshtein edit distance matrix > for picking or ordering candidates is quite high, though;=C2=A0... > gives a good summary, and you really don't want to use the > most naive option if you can avoid it.=C2=A0 Computation costs > go through the roof very fast; the cost is roughly > proportional to the length of the candidate cubed, using the > brute force approach. > > The other shortfall of using only edit distance is that most > of the time people expect to front-weight the search: you > may have a closer edit distance match later in the string, > but people usually want a less exact match, closer to the > start, intuitively. > > > I've been experimenting with a simpler function that just > > counts the number of "holes" and the length of those holes > > separately in the denominator. The numerator is the same > > and a perfect match is still a 1. It seems to fare better > > for your cases. For foo > > I'd very, very strongly encourage y'all to take a look at > the existing work on the problem, not least because of the > algorithmic complexity costs when you apply this to a large > corpus of candidates.=C2=A0 Consider, for example, completion > over the set of all defined symbols in Emacs, which is > ~ 55,000 candidates, and IIRC I calculated the average > length to 9 characters, so brute force matching would be > be on the order of=C2=A040 **million** comparisons for the first > pattern character, though thankfully dropping fast as you > eliminate non-matching candidates entirely.=C2=A0 (Raw comparisons, > accounting for all possible matches in all possible candidate, > of that pattern, so each candidate gets > $length / $pattern_length comparisons performed.) > > Anyway, the point is: this is an area where serious study > has been invested, and I strongly urge you to take advantage > of that, rather than trying to reinvent this surprisingly > complex wheel.=C2=A0 This isn't simple to do, and it definitely > isn't simple to do in a high performance way. > > PS: whatever else you do, make it trivial to configure what > function is used for ordering the candidate results.=C2=A0 That > matters far more that the details of the default match / > order choice, because it allows easy innovation over time. > > PPS: the best modern tools seem to be derived from the > problem of genome sequence matching, but heuristics are > definitely helpful too.=C2=A0 Again, fzf is the best version of > that I have found. All good info; thanks. +1 for (1) emphasizing the cost in general/naive cases and for (2) saying "make it trivial to configure" the sort order (function). Jo=C3=A3o's (first) reply to you, saying that an interactive Emacs context (e.g. completion) has needs that can be a bit different, is also relevant. FWIW, here is how Icicles uses (2 kinds of) Levenshtein matching: https://www.emacswiki.org/emacs/Icicles_-_Completion_Methods_and_Styles#Lev= enshteinCompletion And here is how it uses a Jaro-Winkler matching: https://www.emacswiki.org/emacs/Icicles_-_Completion_Methods_and_Styles#Jar= oWinklerMatchCompletion