From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: chen bin Newsgroups: gmane.emacs.devel Subject: Re: [PATCH] add 'string-distance' to calculate Levenshtein distance Date: Fri, 20 Apr 2018 00:55:15 +1000 Message-ID: References: <87vacuecrn.fsf@gmail.com> <83po3246ah.fsf@gnu.org> <87lgdq831h.fsf@gmail.com> <83muy553ae.fsf@gnu.org> <87o9ilhhcd.fsf@gmail.com> <83d0z14sws.fsf@gnu.org> <87o9il0wka.fsf@gmail.com> <83bmek4jdn.fsf@gnu.org> <83k1t72b2o.fsf@gnu.org> <83bmei36dw.fsf@gnu.org> <83wox3zkm7.fsf@gnu.org> NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="f4f5e80e10306d3d71056a34c2c7" X-Trace: blaine.gmane.org 1524149640 10026 195.159.176.226 (19 Apr 2018 14:54:00 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Thu, 19 Apr 2018 14:54:00 +0000 (UTC) Cc: emacs-devel@gnu.org To: Eli Zaretskii Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Thu Apr 19 16:53:56 2018 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1f9AwZ-0002PY-JX for ged-emacs-devel@m.gmane.org; Thu, 19 Apr 2018 16:53:51 +0200 Original-Received: from localhost ([::1]:53481 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1f9Ayg-0003jS-6N for ged-emacs-devel@m.gmane.org; Thu, 19 Apr 2018 10:56:02 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:46272) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1f9Ay2-0003j5-EQ for emacs-devel@gnu.org; Thu, 19 Apr 2018 10:55:23 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1f9Ay1-0002TS-2e for emacs-devel@gnu.org; Thu, 19 Apr 2018 10:55:22 -0400 Original-Received: from mail-lf0-x236.google.com ([2a00:1450:4010:c07::236]:41509) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1f9Axy-0002SR-Fg; Thu, 19 Apr 2018 10:55:18 -0400 Original-Received: by mail-lf0-x236.google.com with SMTP id m202-v6so592785lfe.8; Thu, 19 Apr 2018 07:55:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=GPdLlOlRWWOL+RD5psStjy02h5/pHNGAV4L5yu0MqPc=; b=kj7lS7wqRyEHMhG1wabpzkr9JWQKCBxKh5zGOpkDV9HGhSaknI/lakjH47Cl3VM35i BBzg36+laHqiOXxbNY/dsFUSgN6FYdxKGvxP43s7WQfCCbLM4X28Wyk45Qy5cCIRn7lS Flu976QBdqUdTScMhOrUSyc5ZxnLcY1oxQpUSuxIu83gbADNMfrdShTrCx2woRCii+wC nCrR3LgJZKomVGhdxiBtgUN5vjxzRZ7zud83Fm2UClMik39xdDEyPSAPzjdX+n6thX8h Vz8fFobPtFhts2mTMooc8xdubkyQDl/NDbN4I9y/6CAILoFxfvFvxf6bwW3mPcIK/e1D QscA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=GPdLlOlRWWOL+RD5psStjy02h5/pHNGAV4L5yu0MqPc=; b=HpS9arVC6SUrylLvWtFkfS9SnwPjyzuEBp54xtz5HVoeRbyt5HZib2VC2sFZwDMIZd Yx8I0MkDGVIomGjnqQfB2Tzee6QPeRYRjXq5yJxK+nXin9PEct7URurreJRa6t5eAgI4 CteWmyZ7kwzgJ5SAHrBOAEIIebv6OXdU8r+poqILD/gm7/ijbA3I88sSFbXH4x8xr8jy tsgscZ3uVdZh8i+gbttWFRhvhcBUkF3EzPfAGeXXsQhs4rDG3S3dSvtCYXXOMxA2k2jC 0cDvDM7wFsWIb8b0oOPHxHEPz3HgqqwAMqlXHNYCIgnSJcr+bFqWYwlKu9rQUXUNzTb4 Sm5Q== X-Gm-Message-State: ALQs6tA44kTIHv9OW+n1fLTqJ6AXwMRaxe2ey0HB5MsC0/Ie54kab/3p TiRykcOoZi8yviH9j9BQ9ym/TZr8m2orNHPkLpE= X-Google-Smtp-Source: AIpwx4/Hr8jC5liZQdi1v0eCsvPo1RXjTn7nlCIp7LIgynfM0n8Ilam8VEt/hkHU45SsfN8JPHX55pGZdU7PDXSDdPo= X-Received: by 10.46.154.205 with SMTP id p13mr4449524ljj.60.1524149716324; Thu, 19 Apr 2018 07:55:16 -0700 (PDT) Original-Received: by 10.46.20.93 with HTTP; Thu, 19 Apr 2018 07:55:15 -0700 (PDT) In-Reply-To: <83wox3zkm7.fsf@gnu.org> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2a00:1450:4010:c07::236 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.org gmane.emacs.devel:224728 Archived-At: --f4f5e80e10306d3d71056a34c2c7 Content-Type: text/plain; charset="UTF-8" Hi, Eli, I attached the new patch. A software project usually contains nomal directory and file names. Linux kernel source, for example. My package `counsel-etags` will use byte compare version. Regards, Chen On Thu, Apr 19, 2018 at 6:05 PM, Eli Zaretskii wrote: >> From: chen bin >> Date: Tue, 17 Apr 2018 22:31:20 +1000 >> Cc: emacs-devel@gnu.org >> >> As you suggested, I re-write the code using 'FETCH_STRING_CHAR_ADVANCE'. > > Thanks. > >> I also implemented the byte comparing version. It's 4 times as fast. And I do >> need use it to compare file path in my package 'counsel-etags'. > > File names are just strings for this purpose, and they can potentially > include any non-zero characters. So I don't see why they are special. > >> The fille path couldn't contain any funny characters (emoji). so >> it'sperfectly fine >> to use byte comparing version. > > File names can very well include emoji and other "funny" characters, > Emacs supports that on all modern systems (including even MS-Windows). > >> diff --git a/etc/NEWS b/etc/NEWS >> index 5aa92e2991..3cce2c48c7 100644 >> --- a/etc/NEWS >> +++ b/etc/NEWS >> @@ -490,6 +490,8 @@ x-lost-selection-hooks, x-sent-selection-hooks >> +++ >> ** New function assoc-delete-all. >> >> +** New function string-distance to calculate Levenshtein distance between two strings. > > This long line should be filled using the fill-column setting we use > in NEWS. Even better, make the header a short summary, like > > ** New function 'string-distance' > > and then describe its functionality in a separate sentence that starts > immediately below that header. > >> +DEFUN ("string-distance", Fstring_distance, Sstring_distance, 2, 3, 0, >> + doc: /* Return Levenshtein distance between STRING1 and STRING2. >> +If BYTECOMPARE is nil, we compare character of strings. >> +If BYTECOMPARE is t, we compare byte of strings. > > Please lose the "we" part, it's inappropriate in documentation, > because it describes what Emacs does. > >> +Comparing by byte is faster and non-ascii characters has weighted distance. > > I would delete this sentence, it is IMO confusing more than anything > else. (And I still think the bytewise comparison is not needed.) > >> + bool use_bytecompare = !NILP(bytecompare); > ^^ > Space between these 2 characters. > >> + else >> + { >> + int c1, c2; >> + ptrdiff_t i1, i1_byte, i2, i2_byte; >> + i2 = i2_byte = 0; >> + for (x = 1; x <= len2; x++) > > Please move the initialization of i2 and i2_byte into the for-loop > initializer (suing the comma operator). > >> + i1 = i1_byte = 0; >> + for (y = 1, lastdiag = x - 1; y <= len1; y++) > > Likewise here with i1 and i1_byte. > > Thanks. -- help me, help you. --f4f5e80e10306d3d71056a34c2c7 Content-Type: text/x-patch; charset="UTF-8"; name="0001-add-api-string-distance.patch" Content-Disposition: attachment; filename="0001-add-api-string-distance.patch" Content-Transfer-Encoding: base64 X-Attachment-Id: f_jg6n45en0 RnJvbSBkNDIzMzhjZTVlY2I1OTJkMTJhYzQxMDgxMTkyODJjZjZhMGE1NTJiIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBDaGVuIEJpbiA8Y2hlbmJpbi5zaEBnbWFpbC5jb20+CkRhdGU6 IEZyaSwgMjAgQXByIDIwMTggMDA6Mzg6MjkgKzEwMDAKU3ViamVjdDogW1BBVENIXSBhZGQgYXBp IHN0cmluZy1kaXN0YW5jZQoKLS0tCiBldGMvTkVXUyAgICAgICAgICAgICAgICB8ICAzICsrKwog c3JjL2Zucy5jICAgICAgICAgICAgICAgfCA2MSArKysrKysrKysrKysrKysrKysrKysrKysrKysr KysrKysrKysrKysrKysrKysrKysrCiB0ZXN0L2xpc3Avc3Vici10ZXN0cy5lbCB8IDE4ICsrKysr KysrKysrKysrKwogMyBmaWxlcyBjaGFuZ2VkLCA4MiBpbnNlcnRpb25zKCspCgpkaWZmIC0tZ2l0 IGEvZXRjL05FV1MgYi9ldGMvTkVXUwppbmRleCA1YWE5MmUyOTkxLi40ZWZiN2U5ODNlIDEwMDY0 NAotLS0gYS9ldGMvTkVXUworKysgYi9ldGMvTkVXUwpAQCAtNDkwLDYgKzQ5MCw5IEBAIHgtbG9z dC1zZWxlY3Rpb24taG9va3MsIHgtc2VudC1zZWxlY3Rpb24taG9va3MKICsrKwogKiogTmV3IGZ1 bmN0aW9uIGFzc29jLWRlbGV0ZS1hbGwuCiAKKyoqIE5ldyBmdW5jdGlvbiBzdHJpbmctZGlzdGFu Y2UgdG8gY2FsY3VsYXRlIExldmVuc2h0ZWluIGRpc3RhbmNlCitiZXR3ZWVuIHR3byBzdHJpbmdz LgorCiAqKiAncHJpbnQtcXVvdGVkJyBub3cgZGVmYXVsdHMgdG8gdCwgc28gaWYgeW91IHdhbnQg dG8gc2VlCiAocXVvdGUgeCkgaW5zdGVhZCBvZiAneCB5b3Ugd2lsbCBoYXZlIHRvIGJpbmQgaXQg dG8gbmlsIHdoZXJlIGFwcGxpY2FibGUuCiAKZGlmZiAtLWdpdCBhL3NyYy9mbnMuYyBiL3NyYy9m bnMuYwppbmRleCA5NGI5ZDk4NGYwLi43NTNkYzg4ZWQ2IDEwMDY0NAotLS0gYS9zcmMvZm5zLmMK KysrIGIvc3JjL2Zucy5jCkBAIC0xNTMsNiArMTUzLDY2IEBAIElmIFNUUklORyBpcyBtdWx0aWJ5 dGUsIHRoaXMgbWF5IGJlIGdyZWF0ZXIgdGhhbiB0aGUgbGVuZ3RoIG9mIFNUUklORy4gICovKQog ICByZXR1cm4gbWFrZV9udW1iZXIgKFNCWVRFUyAoc3RyaW5nKSk7CiB9CiAKK0RFRlVOICgic3Ry aW5nLWRpc3RhbmNlIiwgRnN0cmluZ19kaXN0YW5jZSwgU3N0cmluZ19kaXN0YW5jZSwgMiwgMywg MCwKKyAgICAgICBkb2M6IC8qIFJldHVybiBMZXZlbnNodGVpbiBkaXN0YW5jZSBiZXR3ZWVuIFNU UklORzEgYW5kIFNUUklORzIuCitJZiBCWVRFQ09NUEFSRSBpcyBuaWwsIGNvbXBhcmUgY2hhcmFj dGVyIG9mIHN0cmluZ3MuCitJZiBCWVRFQ09NUEFSRSBpcyB0LCBjb21wYXJlIGJ5dGUgb2Ygc3Ry aW5ncy4KK0Nhc2UgaXMgc2lnbmlmaWNhbnQsIGJ1dCB0ZXh0IHByb3BlcnRpZXMgYXJlIGlnbm9y ZWQuICovKQorICAoTGlzcF9PYmplY3Qgc3RyaW5nMSwgTGlzcF9PYmplY3Qgc3RyaW5nMiwgTGlz cF9PYmplY3QgYnl0ZWNvbXBhcmUpCisKK3sKKyAgQ0hFQ0tfU1RSSU5HIChzdHJpbmcxKTsKKyAg Q0hFQ0tfU1RSSU5HIChzdHJpbmcyKTsKKworICBib29sIHVzZV9ieXRlX2NvbXBhcmUgPSAhTklM UCAoYnl0ZWNvbXBhcmUpOworICBwdHJkaWZmX3QgbGVuMSA9IHVzZV9ieXRlX2NvbXBhcmU/IFNC WVRFUyAoc3RyaW5nMSkgOiBTQ0hBUlMgKHN0cmluZzEpOworICBwdHJkaWZmX3QgbGVuMiA9IHVz ZV9ieXRlX2NvbXBhcmU/IFNCWVRFUyAoc3RyaW5nMikgOiBTQ0hBUlMgKHN0cmluZzIpOworICBw dHJkaWZmX3QgeCwgeSwgbGFzdGRpYWcsIG9sZGRpYWc7CisKKyAgVVNFX1NBRkVfQUxMT0NBOwor ICBwdHJkaWZmX3QgKmNvbHVtbiA9IFNBRkVfQUxMT0NBICgobGVuMSArIDEpICogc2l6ZW9mIChw dHJkaWZmX3QpKTsKKyAgZm9yICh5ID0gMTsgeSA8PSBsZW4xOyB5KyspCisgICAgY29sdW1uW3ld ID0geTsKKworICBpZiAodXNlX2J5dGVfY29tcGFyZSkKKyAgICB7CisgICAgICBjaGFyICpzMSA9 IFNTREFUQSAoc3RyaW5nMSk7CisgICAgICBjaGFyICpzMiA9IFNTREFUQSAoc3RyaW5nMik7CisK KyAgICAgIGZvciAoeCA9IDE7IHggPD0gbGVuMjsgeCsrKQorICAgICAgICB7CisgICAgICAgICAg Y29sdW1uWzBdID0geDsKKyAgICAgICAgICBmb3IgKHkgPSAxLCBsYXN0ZGlhZyA9IHggLSAxOyB5 IDw9IGxlbjE7IHkrKykKKyAgICAgICAgICAgIHsKKyAgICAgICAgICAgICAgb2xkZGlhZyA9IGNv bHVtblt5XTsKKyAgICAgICAgICAgICAgY29sdW1uW3ldID0gbWluIChtaW4gKGNvbHVtblt5XSAr IDEsIGNvbHVtblt5LTFdICsgMSksIGxhc3RkaWFnICsgKHMxW3ktMV0gPT0gczJbeC0xXT8gMCA6 IDEpKTsKKyAgICAgICAgICAgICAgbGFzdGRpYWcgPSBvbGRkaWFnOworICAgICAgICAgICAgfQor ICAgICAgICB9CisgICAgfQorICBlbHNlCisgICAgeworICAgICAgaW50IGMxLCBjMjsKKyAgICAg IHB0cmRpZmZfdCBpMSwgaTFfYnl0ZSwgaTIgPSAwLCBpMl9ieXRlID0gMDsKKyAgICAgIGZvciAo eCA9IDE7IHggPD0gbGVuMjsgeCsrKQorICAgICAgICB7CisgICAgICAgICAgY29sdW1uWzBdID0g eDsKKyAgICAgICAgICBGRVRDSF9TVFJJTkdfQ0hBUl9BRFZBTkNFIChjMiwgc3RyaW5nMiwgaTIs IGkyX2J5dGUpOworICAgICAgICAgIGkxID0gaTFfYnl0ZSA9IDA7CisgICAgICAgICAgZm9yICh5 ID0gMSwgbGFzdGRpYWcgPSB4IC0gMTsgeSA8PSBsZW4xOyB5KyspCisgICAgICAgICAgICB7Cisg ICAgICAgICAgICAgIG9sZGRpYWcgPSBjb2x1bW5beV07CisgICAgICAgICAgICAgIEZFVENIX1NU UklOR19DSEFSX0FEVkFOQ0UgKGMxLCBzdHJpbmcxLCBpMSwgaTFfYnl0ZSk7CisgICAgICAgICAg ICAgIGNvbHVtblt5XSA9IG1pbiAobWluIChjb2x1bW5beV0gKyAxLCBjb2x1bW5beS0xXSArIDEp LCBsYXN0ZGlhZyArIChjMSA9PSBjMj8gMCA6IDEpKTsKKyAgICAgICAgICAgICAgbGFzdGRpYWcg PSBvbGRkaWFnOworICAgICAgICAgICAgfQorICAgICAgICB9CisgICAgfQorCisgIFNBRkVfRlJF RSAoKTsKKyAgcmV0dXJuIG1ha2VfbnVtYmVyIChjb2x1bW5bbGVuMV0pOworfQorCiBERUZVTiAo InN0cmluZy1lcXVhbCIsIEZzdHJpbmdfZXF1YWwsIFNzdHJpbmdfZXF1YWwsIDIsIDIsIDAsCiAg ICAgICAgZG9jOiAvKiBSZXR1cm4gdCBpZiB0d28gc3RyaW5ncyBoYXZlIGlkZW50aWNhbCBjb250 ZW50cy4KIENhc2UgaXMgc2lnbmlmaWNhbnQsIGJ1dCB0ZXh0IHByb3BlcnRpZXMgYXJlIGlnbm9y ZWQuCkBAIC01MjI2LDYgKzUyODYsNyBAQCB0aGlzIHZhcmlhYmxlLiAgKi8pOwogICBkZWZzdWJy ICgmU2xlbmd0aCk7CiAgIGRlZnN1YnIgKCZTc2FmZV9sZW5ndGgpOwogICBkZWZzdWJyICgmU3N0 cmluZ19ieXRlcyk7CisgIGRlZnN1YnIgKCZTc3RyaW5nX2Rpc3RhbmNlKTsKICAgZGVmc3ViciAo JlNzdHJpbmdfZXF1YWwpOwogICBkZWZzdWJyICgmU2NvbXBhcmVfc3RyaW5ncyk7CiAgIGRlZnN1 YnIgKCZTc3RyaW5nX2xlc3NwKTsKZGlmZiAtLWdpdCBhL3Rlc3QvbGlzcC9zdWJyLXRlc3RzLmVs IGIvdGVzdC9saXNwL3N1YnItdGVzdHMuZWwKaW5kZXggNTJiNjFkOWZiOS4uMTMxMGY1YjJhNSAx MDA2NDQKLS0tIGEvdGVzdC9saXNwL3N1YnItdGVzdHMuZWwKKysrIGIvdGVzdC9saXNwL3N1YnIt dGVzdHMuZWwKQEAgLTI4MSw2ICsyODEsMjQgQEAgc3Vici10ZXN0LS1mcmFtZXMtMQogICAoc2hv dWxkIChlcXVhbCAoc3RyaW5nLW1hdGNoLXAgIlxcYFtbOmJsYW5rOl1dXFwnIiAiXHUzMDAwIikg MCkpCiAgIChzaG91bGQtbm90IChzdHJpbmctbWF0Y2gtcCAiXFxgW1s6Ymxhbms6XV1cXCciICJc TntMSU5FIFNFUEFSQVRPUn0iKSkpCiAKKyhlcnQtZGVmdGVzdCBzdWJyLXRlc3RzLS1zdHJpbmct ZGlzdGFuY2UgKCkKKyAgIlRlc3QgYHN0cmluZy1kaXN0YW5jZScgYmVoYXZpb3IuIgorICA7OyBB U0NJSSBjaGFyYWN0ZXJzIGFyZSBhbHdheXMgZmluZQorICAoc2hvdWxkIChlcXVhbCAxIChzdHJp bmctZGlzdGFuY2UgImhlZWxvIiAiaGVsbG8iKSkpCisgIChzaG91bGQgKGVxdWFsIDIgKHN0cmlu Zy1kaXN0YW5jZSAiYWVlbG8iICJoZWxsbyIpKSkKKyAgKHNob3VsZCAoZXF1YWwgMCAoc3RyaW5n LWRpc3RhbmNlICJhYiIgImFiIiB0KSkpCisgIChzaG91bGQgKGVxdWFsIDEgKHN0cmluZy1kaXN0 YW5jZSAiYWIiICJhYmMiIHQpKSkKKworICA7OyBzdHJpbmcgY29udGFpbmluZyBoYW56aSBjaGFy YWN0ZXIsIGNvbXBhcmUgYnkgYnl0ZQorICAoc2hvdWxkIChlcXVhbCA2IChzdHJpbmctZGlzdGFu Y2UgImFiIiAiYWLmiJHlpbkiIHQpKSkKKyAgKHNob3VsZCAoZXF1YWwgMyAoc3RyaW5nLWRpc3Rh bmNlICJhYiIgImHmiJFiIiB0KSkpCisgIChzaG91bGQgKGVxdWFsIDMgKHN0cmluZy1kaXN0YW5j ZSAi5oiRIiAi5aW5IiB0KSkKKworICA7OyBzdHJpbmcgY29udGFpbmluZyBoYW56aSBjaGFyYWN0 ZXIsIGNvbXBhcmUgYnkgY2hhcmFjdGVyCisgIChzaG91bGQgKGVxdWFsIDIgKHN0cmluZy1kaXN0 YW5jZSAiYWIiICJhYuaIkeWluSIpKSkKKyAgKHNob3VsZCAoZXF1YWwgMSAoc3RyaW5nLWRpc3Rh bmNlICJhYiIgImHmiJFiIikpKQorICAoc2hvdWxkIChlcXVhbCAxIChzdHJpbmctZGlzdGFuY2Ug IuaIkSIgIuWluSIpKSkpCisKIChlcnQtZGVmdGVzdCBzdWJyLXRlc3RzLS1kb2xpc3QtLXdyb25n LW51bWJlci1vZi1hcmdzICgpCiAgICJUZXN0IHRoYXQgYGRvbGlzdCcgZG9lc24ndCBhY2NlcHQg d3JvbmcgdHlwZXMgb3IgbGVuZ3RoIG9mIFNQRUMsCiBjZi4gQnVnIzI1NDc3LiIKLS0gCjIuMTYu MwoK --f4f5e80e10306d3d71056a34c2c7--