From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED.blaine.gmane.org!not-for-mail From: =?iso-8859-1?Q?Jo=E3o_T=E1vora?= Newsgroups: gmane.emacs.devel Subject: Re: new-flex-completion-style Date: Wed, 13 Feb 2019 17:29:30 +0000 Message-ID: References: <20190202232827.27331.87300@vcs0.savannah.gnu.org> <20190202232828.4AE452159A@vcs0.savannah.gnu.org> <87lg2mynrg.fsf@gmail.com> <871s4czm5n.fsf@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable Injection-Info: blaine.gmane.org; posting-host="blaine.gmane.org:195.159.176.226"; logging-data="199631"; mail-complaints-to="usenet@blaine.gmane.org" User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (windows-nt) Cc: emacs-devel@gnu.org To: Stefan Monnier Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Wed Feb 13 18:29:49 2019 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([209.51.188.17]) by blaine.gmane.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:256) (Exim 4.89) (envelope-from ) id 1gtyM0-000ppn-Vz for ged-emacs-devel@m.gmane.org; Wed, 13 Feb 2019 18:29:49 +0100 Original-Received: from localhost ([127.0.0.1]:32771 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gtyLz-0005V7-Tl for ged-emacs-devel@m.gmane.org; Wed, 13 Feb 2019 12:29:47 -0500 Original-Received: from eggs.gnu.org ([209.51.188.92]:58731) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gtyLq-0005Ug-Oh for emacs-devel@gnu.org; Wed, 13 Feb 2019 12:29:39 -0500 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1gtyLp-0006AH-ME for emacs-devel@gnu.org; Wed, 13 Feb 2019 12:29:38 -0500 Original-Received: from mail-wm1-x32f.google.com ([2a00:1450:4864:20::32f]:39502) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1gtyLn-00066d-HG for emacs-devel@gnu.org; Wed, 13 Feb 2019 12:29:37 -0500 Original-Received: by mail-wm1-x32f.google.com with SMTP id f16so3305059wmh.4 for ; Wed, 13 Feb 2019 09:29:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:in-reply-to:references:user-agent:date :message-id:mime-version:content-transfer-encoding; bh=ziVQtzY6GjiqUbuyexqn3WSdlcf5BzpTig1ztVF5J1s=; b=uIs9RqqlG+yZfh3BKhPgldAFeDL++JOhoM/zk1xVchFCkWLmUJQwRtO1X6jPFGtoWk 8lYF3NLMaot7PnF+CU2TJgu0zbM6+haf1AqzJouzvmdODDo3otMvQJhFt3epNmO2GjNz yg0w9xdRXy++MOgMzfuHoSxnvRR2AXw+UAHXvUATH5O8obWMtMj2hUrs5L/kAgHqhRPF Gw3hnM///MovemL4JjhzmZR1tQeamZxBruiBJTkDeXAN0Dr15oGWGZ5OCUzjHU5hEcpw guQfihUGwc1VcPLIvCYtW6Vi6/TNn7kF3BY0tBeISDHQgnsRVIrI4hVVyHW/VFVpHb/X TCjA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:in-reply-to:references :user-agent:date:message-id:mime-version:content-transfer-encoding; bh=ziVQtzY6GjiqUbuyexqn3WSdlcf5BzpTig1ztVF5J1s=; b=rJdLvEnbvWuBasL9j5sRMG0rIspnd3XGrnngeW3P4xFAZu4+RAeUarhBHHThfMvSI0 ZZzLh33e3YTfma113WHLtVKYag2xKMdNe9LffQX6s9TTGWvCd9Of/+54d1XAzS8rSp1e NvDSXMAffpjkibViHn60Q8I/oDfl/RbkO52FLlamGPWMwaXIkQoiMXsKjuiJwBRimYhw ki19zRJzWCXghtM0z2HItfKfQwiTtWMwxkWK3fpSp7BLmj3ab/Ftc5KalL0UAjASMKb7 ebGsoK6FUzJd4UB/N1HuPbbq2iOncxNRttTJnqEaUztG3QplqEZzcpAZFHlXRFJny9Zu rXvg== X-Gm-Message-State: AHQUAuag7TwFbrAWvc7MnNDiauICEcAA2NY3nbpJVITEtEOuNEudFb0i JMVAeW/bBsVm0F2Jf+Fqw+EvDF+C X-Google-Smtp-Source: AHgI3IYTgXeKU8fTm+oHtSCxLbOzQtWOWjwmuFzbS3CEmzQNiYY8O5WPzQA7MuXpgU2+5ypbV6RDqw== X-Received: by 2002:a1c:5656:: with SMTP id k83mr1168334wmb.125.1550078973319; Wed, 13 Feb 2019 09:29:33 -0800 (PST) Original-Received: from GONDOMAR.yourcompany.com (mail3.siscog.pt. [195.23.29.18]) by smtp.gmail.com with ESMTPSA id j5sm2969963wrw.59.2019.02.13.09.29.32 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Wed, 13 Feb 2019 09:29:32 -0800 (PST) In-Reply-To: <871s4czm5n.fsf@gmail.com> (=?iso-8859-1?Q?=22Jo=E3o_T=E1vora?= =?iso-8859-1?Q?=22's?= message of "Tue, 12 Feb 2019 22:17:56 +0000") X-Antivirus: AVG (VPS 190213-2, 13-02-2019), Outbound message X-Antivirus-Status: Clean X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2a00:1450:4864:20::32f X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.org gmane.emacs.devel:233283 Archived-At: Jo=E3o T=E1vora writes: > I'll take a look at this new batch, do the obvious fixes. But before > submitting anything sorting-related, I'd like Dmitry to show a patch for > his sorting idea on top of scratch/new-flex-completion-style, try it out > a bit and then report back here. >> + (setq completions >> + (sort completions >> + (lambda (a b) >> + (let ((va (get-text-property 0 'completion-styl= e-sort-order a)) >> + (vb (get-text-property 0 'completion-styl= e-sort-order b))) >> + (if (and va vb) (< va vb) va))))) > > This will signal an error when one of the completions is the empty > string :-( I've removed this for now, I.e. there's absolutely no sorting yet before we agree on who should be doing it and where. But why would the empty string be there anyway? >> @@ -3056,22 +3067,41 @@ PATTERN is as returned by `completion-pcm--strin= g->pattern'." >> (let* ((pos (if point-idx (match-beginning point-idx) (match-e= nd 0))) >> (md (match-data)) >> (start (pop md)) >> - (end (pop md))) >> + (end (pop md)) >> + (len (length str)) >> + (score-numerator 0) >> + (score-denominator 0) >> + (aux 0) >> + (update-score >> + (lambda (a b) >> + "Update score variables given match range (A B)." >> + (setq >> + score-numerator (+ score-numerator (- b a)) >> + score-denominator (+ score-denominator (expt (- a a= ux) 1.5)) >> + aux b)))) > > I don't understand this scoring algorithm: please add a comment > explaining it I don't "understand" it either :-) i.e. it was developed empirically, more or less. But let's try to: Consider these bad ascii(tm) schematics matching the pattern "foo" to the strings "barfobazoquux" and "barfoobazquux": bar fo baz o quux --- ++ --- + ----=20 =20=20=20=20 =20=20=20=20 bar foo bazquux --- +++ -------=20 =20=20=20 +++ indicates parts where the pattern matched --- indicates parts where the pattern didn't match The formula assigns higher scores to "better" matches. It's bound by [0..1] and in the form of a quotient. For the numerator, it counts the +++. For the denominator, it takes counts the number of --- in each such group, exponentiates that number to a "falloff constant", adds it to the total, adds one to the final sum, and then multiples by the length of the string. If no --- you get (length/(length * (1 + 0)) =3D=3D 1 If no +++, you get 0 (but this never happens cause we pre-filtered the matches with pcm). The falloff constant (FC) used in the denominator is important. A negative falloff will value tighter matches, a positive value will value more evenly spread out matches. I set the constant to 1.5 some time ago, and I haven't gone back, but perhaps it could be -1.5. Anyway with FC=3D1.5 the examples above would be: first string 1.507% second string 1.197% with FC=3D-1.5 it would be first string 15.849% second string 19,834% If haven't mangled the numbers, this seems to point that -1.5 is a better default for the constant, which shouldn't be hardcoded anyway. On the other hand, I've been working with -1.5 for a long time now and never had any obvious problems. It's probably what you said elsewhere, "good" scoring is a question of getting used to whatever the system is doing, IOW humans learn, too. Also note that it doesn't make any sense to compare scores between different match lengths, only between same match lenghts at different positions on different strings. Anyway, what do you think? Is this acceptable or so you have anything better? > (and give a meaningful name to `aux`). Oh happy days when hackers could code "aux" and feel very smug about it. OK, I'll call it "last-b". >> + (funcall update-score 0 start) >> (while md >> - (put-text-property start (pop md) >> + (funcall update-score start (car md)) >> + (put-text-property start >> + (pop md) > > The extra line-break after `start` seems spurious. Were you expecting it to have very deep, hidden meaning? But OK. > >> + (put-text-property >> + 0 1 'completion-pcm-commonality-score >> + (/ score-numerator (* len (1+ score-denominator)) 1.0) str)) > > This will signal an error when `str` is the empty string :-( Ooops, but that that happen? Can an empty string be pcm-matched by some pattern? There's=20 > BTW, maybe PCM could also use this scoring for sorting (i.e. we could > set `completion-score` directly here) > >> + (let ((score (get-text-property 0 'completion-pcm-commonalit= y-score comp))) >> + (put-text-property 0 1 'completion-style-sort-order (- sco= re) comp) > > This will signal an error when `comp` is the empty string :-( Same here.