From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: Chen Bin Newsgroups: gmane.emacs.devel Subject: [PATCH] add 'string-distance' to calculate Levenshtein distance Date: Sat, 14 Apr 2018 12:35:08 +1000 Message-ID: <87vacuecrn.fsf@gmail.com> NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-Trace: blaine.gmane.org 1523673209 14032 195.159.176.226 (14 Apr 2018 02:33:29 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Sat, 14 Apr 2018 02:33:29 +0000 (UTC) To: emacs-devel@gnu.org Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sat Apr 14 04:33:24 2018 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1f7B0F-0003Yp-VZ for ged-emacs-devel@m.gmane.org; Sat, 14 Apr 2018 04:33:24 +0200 Original-Received: from localhost ([::1]:48298 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1f7B2L-0000y2-Gh for ged-emacs-devel@m.gmane.org; Fri, 13 Apr 2018 22:35:33 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:38871) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1f7B2B-0000v1-SB for emacs-devel@gnu.org; Fri, 13 Apr 2018 22:35:25 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1f7B28-0008Nf-Nn for emacs-devel@gnu.org; Fri, 13 Apr 2018 22:35:23 -0400 Original-Received: from mail-pf0-x243.google.com ([2607:f8b0:400e:c00::243]:41818) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1f7B28-0008NB-FD for emacs-devel@gnu.org; Fri, 13 Apr 2018 22:35:20 -0400 Original-Received: by mail-pf0-x243.google.com with SMTP id a2so7372787pff.8 for ; Fri, 13 Apr 2018 19:35:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:subject:date:message-id:mime-version; bh=AE11AOYEvbwl8xaojy4TbCG3bHJaRAKoQSdW5UFsbCM=; b=jdFhsrhe62Dfx0t1FKr6O+cYMrYSsAGZbjHXALKe1gtdjguwGlbk5/iJBsR906/z2K UvttkGgL/Nhu7lJ+xRQbHKN2YMhDqfKl5p4sTchbkT8a0utgTX/eOGY5iF75wVKT0jdW Mz0aC21FMbozr+5s/6yfERwE5t3LNEZTeSODSDxYy46/5L/2FYQ9Ri4CJ6dj5yaSrNGr cpqi+uLfhekHhqGH+MgbcER1yFGViHfeTx3iz/Hh0JKkUu4ZTUYkhxTXb4Mew6RXnGFK ubT86xe9NCsvoObnZTgWVwcG4/1kPNJbzNr/NU/LgkhEdffBm72iIEfDWfKxlH5dHUH1 Olow== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:mime-version; bh=AE11AOYEvbwl8xaojy4TbCG3bHJaRAKoQSdW5UFsbCM=; b=jOwjjIcPtHAgF4cpmve4+L673A4hKq6yqScaMFw6WLcWwxd8lH0SzUnoKNK3RLXQ5t lf74UHma+7yzcxQLd1x3+1ymlERO8JyDmPgACWAXHrVK5yYeG6FMOWw4V6METeYey6wB HrMPHW+JRQRqmGEGQwnm7ayUzMDL6vkasubWTEc2rRYs/kpjn93j7GMewBCCA6LCIvB9 SFfEHdFYNl9G6ssjoC6WTnWvZ9IjXHtgt2pzfs/Vzhjj0mU7fWQA2fQMheFDlmchFiBY vqnG9RK+z9VaZmuBIznLcvzrqWuua7SkAH+EW/1M88a8PnNYxi8aUSa/FzycMNB4Jcs9 Mbuw== X-Gm-Message-State: ALQs6tAclBbv48lhEdKc+NaQv5ukzro7lM0N0gNK9NwYGrtLPDsb3hyP gFeDOqvlybcUiZUFobI9O/zolw== X-Google-Smtp-Source: AIpwx4+btkxNKieYH2NOA/zzhU9yIzkVcpfnqcVv6NYnlWTFjSbdclqt3vwa6RSFPXYcap1VHV3C2g== X-Received: by 10.101.75.12 with SMTP id r12mr2633383pgq.36.1523673318615; Fri, 13 Apr 2018 19:35:18 -0700 (PDT) Original-Received: from sydneypc.homepc ([115.187.209.174]) by smtp.gmail.com with ESMTPSA id e82sm15393446pfh.115.2018.04.13.19.35.15 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 13 Apr 2018 19:35:16 -0700 (PDT) X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2607:f8b0:400e:c00::243 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.org gmane.emacs.devel:224577 Archived-At: --=-=-= Content-Type: text/plain Hello, Attached patch implemented 'string-distance' is much faster then existing Lisp implementation like 'org-babel-edit-distance'. I benchmarked by below snippet: (setq s1 "projs1/emacs/etc/imeees/icon/emacs-document.svg") (setq s2 "projs2/test/etc/images/icons/emacs-document23.svg") (let* ((gc-cons-threshold most-positive-fixnum)) (message "%S vs %S" (benchmark-run-compiled 100 (org-babel-edit-distance s1 s2)) (benchmark-run-compiled 100 (string-distance s1 s2)))) Got: (0.506494223 0 0.0) vs (0.001317414 0 0.0) --=-=-= Content-Type: text/x-diff Content-Disposition: inline; filename=0001-add-api-string-distance.patch >From bb32afb18bd06e6843921205987b23641145f51a Mon Sep 17 00:00:00 2001 From: Chen Bin Date: Sat, 14 Apr 2018 10:33:44 +1000 Subject: [PATCH] add api string-distance --- src/fns.c | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) diff --git a/src/fns.c b/src/fns.c index 94b9d98..b63d4f4 100644 --- a/src/fns.c +++ b/src/fns.c @@ -41,6 +41,8 @@ along with GNU Emacs. If not, see . */ # define gnutls_rnd w32_gnutls_rnd #endif +#define min3(a, b, c) ((a) < (b) ? ((a) < (c) ? (a) : (c)) : ((b) < (c) ? (b) : (c))) + static void sort_vector_copy (Lisp_Object, ptrdiff_t, Lisp_Object *restrict, Lisp_Object *restrict); enum equal_kind { EQUAL_NO_QUIT, EQUAL_PLAIN, EQUAL_INCLUDING_PROPERTIES }; @@ -153,6 +155,33 @@ If STRING is multibyte, this may be greater than the length of STRING. */) return make_number (SBYTES (string)); } +DEFUN ("string-distance", Fstring_distance, Sstring_distance, 2, 2, 0, + doc: /* Return Levenshtein distance of two strings. +Case is significant, but text properties are ignored. */) + (register Lisp_Object string1, Lisp_Object string2) +{ + CHECK_STRING (string1); + CHECK_STRING (string2); + + char *s1 = SSDATA (string1); + char *s2 = SSDATA (string2); + unsigned int s1len, s2len, x, y, lastdiag, olddiag; + s1len = strlen(s1); + s2len = strlen(s2); + unsigned int column[s1len+1]; + for (y = 1; y <= s1len; y++) + column[y] = y; + for (x = 1; x <= s2len; x++) { + column[0] = x; + for (y = 1, lastdiag = x-1; y <= s1len; y++) { + olddiag = column[y]; + column[y] = min3(column[y] + 1, column[y-1] + 1, lastdiag + (s1[y-1] == s2[x-1] ? 0 : 1)); + lastdiag = olddiag; + } + } + return make_number(column[s1len]); +} + DEFUN ("string-equal", Fstring_equal, Sstring_equal, 2, 2, 0, doc: /* Return t if two strings have identical contents. Case is significant, but text properties are ignored. @@ -5226,6 +5255,7 @@ this variable. */); defsubr (&Slength); defsubr (&Ssafe_length); defsubr (&Sstring_bytes); + defsubr (&Sstring_distance); defsubr (&Sstring_equal); defsubr (&Scompare_strings); defsubr (&Sstring_lessp); -- 2.16.3 --=-=-= Content-Type: text/plain -- Best Regards, Chen Bin -- Help me, help you --=-=-=--