unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* [PATCH] add 'string-distance' to calculate Levenshtein distance
@ 2018-04-14  2:35 Chen Bin
  2018-04-14  7:05 ` Eli Zaretskii
  2018-04-14 17:18 ` Nathan Moreau
  0 siblings, 2 replies; 24+ messages in thread
From: Chen Bin @ 2018-04-14  2:35 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 590 bytes --]

Hello,
Attached patch implemented 'string-distance' is much faster then
existing Lisp implementation like 'org-babel-edit-distance'.

I benchmarked by below snippet:

  (setq s1 "projs1/emacs/etc/imeees/icon/emacs-document.svg")
  (setq s2 "projs2/test/etc/images/icons/emacs-document23.svg")
  (let* ((gc-cons-threshold most-positive-fixnum))
    (message "%S vs %S"
             (benchmark-run-compiled 100
               (org-babel-edit-distance s1 s2))
             (benchmark-run-compiled 100
               (string-distance s1 s2))))

Got:
(0.506494223 0 0.0) vs (0.001317414 0 0.0)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-add-api-string-distance.patch --]
[-- Type: text/x-diff, Size: 2238 bytes --]

From bb32afb18bd06e6843921205987b23641145f51a Mon Sep 17 00:00:00 2001
From: Chen Bin <chenbin.sh@gmail.com>
Date: Sat, 14 Apr 2018 10:33:44 +1000
Subject: [PATCH] add api string-distance

---
 src/fns.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/src/fns.c b/src/fns.c
index 94b9d98..b63d4f4 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -41,6 +41,8 @@ along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.  */
 # define gnutls_rnd w32_gnutls_rnd
 #endif
 
+#define min3(a, b, c) ((a) < (b) ? ((a) < (c) ? (a) : (c)) : ((b) < (c) ? (b) : (c)))
+
 static void sort_vector_copy (Lisp_Object, ptrdiff_t,
 			      Lisp_Object *restrict, Lisp_Object *restrict);
 enum equal_kind { EQUAL_NO_QUIT, EQUAL_PLAIN, EQUAL_INCLUDING_PROPERTIES };
@@ -153,6 +155,33 @@ If STRING is multibyte, this may be greater than the length of STRING.  */)
   return make_number (SBYTES (string));
 }
 
+DEFUN ("string-distance", Fstring_distance, Sstring_distance, 2, 2, 0,
+       doc: /* Return Levenshtein distance of two strings.
+Case is significant, but text properties are ignored. */)
+  (register Lisp_Object string1, Lisp_Object string2)
+{
+  CHECK_STRING (string1);
+  CHECK_STRING (string2);
+
+  char *s1 = SSDATA (string1);
+  char *s2 = SSDATA (string2);
+  unsigned int s1len, s2len, x, y, lastdiag, olddiag;
+  s1len = strlen(s1);
+  s2len = strlen(s2);
+  unsigned int column[s1len+1];
+  for (y = 1; y <= s1len; y++)
+    column[y] = y;
+  for (x = 1; x <= s2len; x++) {
+    column[0] = x;
+    for (y = 1, lastdiag = x-1; y <= s1len; y++) {
+      olddiag = column[y];
+      column[y] = min3(column[y] + 1, column[y-1] + 1, lastdiag + (s1[y-1] == s2[x-1] ? 0 : 1));
+      lastdiag = olddiag;
+    }
+  }
+  return make_number(column[s1len]);
+}
+
 DEFUN ("string-equal", Fstring_equal, Sstring_equal, 2, 2, 0,
        doc: /* Return t if two strings have identical contents.
 Case is significant, but text properties are ignored.
@@ -5226,6 +5255,7 @@ this variable.  */);
   defsubr (&Slength);
   defsubr (&Ssafe_length);
   defsubr (&Sstring_bytes);
+  defsubr (&Sstring_distance);
   defsubr (&Sstring_equal);
   defsubr (&Scompare_strings);
   defsubr (&Sstring_lessp);
-- 
2.16.3


[-- Attachment #3: Type: text/plain, Size: 50 bytes --]


-- 
Best Regards,
Chen Bin

--
Help me, help you

^ permalink raw reply related	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2018-05-06  9:53 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-04-14  2:35 [PATCH] add 'string-distance' to calculate Levenshtein distance Chen Bin
2018-04-14  7:05 ` Eli Zaretskii
     [not found]   ` <87lgdq831h.fsf@gmail.com>
2018-04-14 13:24     ` Eli Zaretskii
2018-04-14 16:40       ` Chen Bin
2018-04-14 17:08         ` Eli Zaretskii
2018-04-15  7:15           ` Chen Bin
2018-04-15 14:47             ` Eli Zaretskii
     [not found]               ` <CAAE-R+-RDWvyrv+uqHszzh6VMH6An3disOw=PyPWaTnUTHDOCw@mail.gmail.com>
     [not found]                 ` <83k1t72b2o.fsf@gnu.org>
2018-04-17  2:43                   ` chen bin
2018-04-17 15:44                     ` Eli Zaretskii
2018-04-18  7:11                       ` chen bin
     [not found]                   ` <CAAE-R+8s++_LRcQCLX60Z=TQeQHdtbM5X1k525bfNnnPSLDvRw@mail.gmail.com>
     [not found]                     ` <83bmei36dw.fsf@gnu.org>
2018-04-17 12:31                       ` chen bin
2018-04-19  8:05                         ` Eli Zaretskii
2018-04-19 14:55                           ` chen bin
2018-04-20  4:37                             ` chen bin
2018-04-20  6:01                             ` Thien-Thi Nguyen
2018-04-20 10:47                             ` chen bin
2018-04-21  7:22                               ` Eli Zaretskii
2018-04-21 20:47                                 ` Juri Linkov
2018-04-28  7:36                                 ` Eli Zaretskii
2018-05-06  9:53                                   ` chen bin
2018-04-15 18:53             ` Paul Eggert
2018-04-14 17:18 ` Nathan Moreau
2018-04-14 17:36   ` Paul Eggert
2018-04-15 18:17     ` Andreas Politz

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).