From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: [PATCH] add 'string-distance' to calculate Levenshtein distance
Date: Thu, 19 Apr 2018 11:05:52 +0300
Message-ID: <83wox3zkm7.fsf@gnu.org>
References: <87vacuecrn.fsf@gmail.com> <83po3246ah.fsf@gnu.org>
	<87lgdq831h.fsf@gmail.com> <83muy553ae.fsf@gnu.org>
	<87o9ilhhcd.fsf@gmail.com> <83d0z14sws.fsf@gnu.org>
	<87o9il0wka.fsf@gmail.com> <83bmek4jdn.fsf@gnu.org>
	<83k1t72b2o.fsf@gnu.org> <83bmei36dw.fsf@gnu.org>
Reply-To: Eli Zaretskii
Cc: emacs-devel@gnu.org
To: chen bin
In-reply-to: (message from chen bin on Tue, 17 Apr 2018 22:31:20 +1000)
Xref: news.gmane.org gmane.emacs.devel:224725

> From: chen bin
> Date: Tue, 17 Apr 2018 22:31:20 +1000
> Cc: emacs-devel@gnu.org
>
> As you suggested, I re-write the code using 'FETCH_STRING_CHAR_ADVANCE'.

Thanks.

> I also implemented the byte comparing version. It's 4 times as fast. And I do
> need use it to compare file path in my package 'counsel-etags'.

File names are just strings for this purpose, and they can potentially
include any non-zero characters.  So I don't see why they are special.

> The fille path couldn't contain any funny characters (emoji). so
> it'sperfectly fine
> to use byte comparing version.

File names can very well include emoji and other "funny" characters;
Emacs supports that on all modern systems (including even MS-Windows).

> diff --git a/etc/NEWS b/etc/NEWS
> index 5aa92e2991..3cce2c48c7 100644
> --- a/etc/NEWS
> +++ b/etc/NEWS
> @@ -490,6 +490,8 @@ x-lost-selection-hooks, x-sent-selection-hooks
>  +++
>  ** New function assoc-delete-all.
>
> +** New function string-distance to calculate Levenshtein distance between two strings.

This long line should be filled using the fill-column setting we use
in NEWS.
Even better, make the header a short summary, like

  ** New function 'string-distance'

and then describe its functionality in a separate sentence that starts
immediately below that header.

> +DEFUN ("string-distance", Fstring_distance, Sstring_distance, 2, 3, 0,
> +       doc: /* Return Levenshtein distance between STRING1 and STRING2.
> +If BYTECOMPARE is nil, we compare character of strings.
> +If BYTECOMPARE is t, we compare byte of strings.

Please lose the "we" part; it's inappropriate in documentation,
because it describes what Emacs does.

> +Comparing by byte is faster and non-ascii characters has weighted distance.

I would delete this sentence; it is IMO confusing more than anything
else.  (And I still think the bytewise comparison is not needed.)

> +  bool use_bytecompare = !NILP(bytecompare);
                                ^^
Space between these 2 characters.

> +  else
> +    {
> +      int c1, c2;
> +      ptrdiff_t i1, i1_byte, i2, i2_byte;
> +      i2 = i2_byte = 0;
> +      for (x = 1; x <= len2; x++)

Please move the initialization of i2 and i2_byte into the for-loop
initializer (using the comma operator).

> +          i1 = i1_byte = 0;
> +          for (y = 1, lastdiag = x - 1; y <= len1; y++)

Likewise here with i1 and i1_byte.

Thanks.
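
To illustrate the two loop comments above, and why bytewise and
character-wise comparison can disagree, here is a minimal standalone
sketch in plain C.  It is not the patch: it works on NUL-terminated C
strings rather than Lisp strings, it assumes well-formed UTF-8, and
the names string_distance_sketch, next_char, count_chars and fetch are
hypothetical, invented for this example only.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the character that starts at s[*i] and
   advance *i past it.  Assumes well-formed UTF-8 input.  */
static unsigned
next_char (const char *s, size_t *i)
{
  unsigned char b = (unsigned char) s[(*i)++];
  unsigned c = b;
  int extra = 0;

  if (b >= 0xF0)
    c = b & 0x07, extra = 3;
  else if (b >= 0xE0)
    c = b & 0x0F, extra = 2;
  else if (b >= 0xC0)
    c = b & 0x1F, extra = 1;
  while (extra-- > 0)
    c = (c << 6) | ((unsigned char) s[(*i)++] & 0x3F);
  return c;
}

/* Hypothetical helper: number of characters (not bytes) in S.  */
static size_t
count_chars (const char *s)
{
  size_t i, n;

  for (i = 0, n = 0; s[i] != '\0'; n++)
    next_char (s, &i);
  return n;
}

/* Fetch the next unit of S: one byte if BYTEWISE, else one character.  */
static unsigned
fetch (const char *s, size_t *i, int bytewise)
{
  return bytewise ? (unsigned char) s[(*i)++] : next_char (s, i);
}

/* Levenshtein distance between S1 and S2 using a single column of the
   dynamic-programming table.  Returns (size_t) -1 if memory runs out.  */
size_t
string_distance_sketch (const char *s1, const char *s2, int bytewise)
{
  size_t len1 = bytewise ? strlen (s1) : count_chars (s1);
  size_t len2 = bytewise ? strlen (s2) : count_chars (s2);
  size_t *column = malloc ((len1 + 1) * sizeof *column);
  size_t x, y, i1, i2, lastdiag, result;

  if (column == NULL)
    return (size_t) -1;
  for (y = 0; y <= len1; y++)
    column[y] = y;

  /* The index into S2 is set up in the loop initializer, using the
     comma operator, instead of in a separate statement.  */
  for (x = 1, i2 = 0; x <= len2; x++)
    {
      unsigned c2 = fetch (s2, &i2, bytewise);

      column[0] = x;
      /* Likewise for the index into S1.  */
      for (y = 1, i1 = 0, lastdiag = x - 1; y <= len1; y++)
        {
          unsigned c1 = fetch (s1, &i1, bytewise);
          size_t olddiag = column[y];
          size_t best = lastdiag + (c1 != c2);  /* substitution */

          if (column[y] + 1 < best)             /* insertion */
            best = column[y] + 1;
          if (column[y - 1] + 1 < best)         /* deletion */
            best = column[y - 1] + 1;
          column[y] = best;
          lastdiag = olddiag;
        }
    }
  result = column[len1];
  free (column);
  return result;
}
```

With this sketch, comparing "caf\xc3\xa9" (the UTF-8 encoding of
"café") with "cafe" gives 1 character-wise but 2 bytewise, because the
single non-ASCII character occupies two bytes.  For pure-ASCII input
the two modes agree.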