From: Paul Eggert
Newsgroups: gmane.emacs.devel
Subject: Re: [PATCH] add 'string-distance' to calculate Levenshtein distance
Date: Sat, 14 Apr 2018 10:36:49 -0700
Organization: UCLA Computer Science Department
To: Nathan Moreau, Chen Bin
Cc: emacs-devel

Nathan Moreau wrote:
> What is the difference with the code present in lib/diffseq.h?

lib/diffseq.h uses the Myers-Ukkonen algorithm that scales better for the common
case where strings are closely related. If the two strings are length N and
their Levenshtein distance is D (where D is much less than N), then
lib/diffseq.h is O(N*D) whereas the proposed algorithm is O(N**2).

So yes, it'd be better if the code used lib/diffseq.h rather than rolled its own
algorithm.
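
To make the O(N**2) versus O(N*D) contrast concrete, here is a rough Python sketch. It is not the code in lib/diffseq.h and not the proposed patch: the function names are made up for illustration, and the second half uses a banded table with a doubling threshold, which is a simpler technique than the diagonal-based search in diffseq.h but reaches the same O(N*D) bound when D is small.

```python
def levenshtein_full(a: str, b: str) -> int:
    """Classic dynamic programming: fills the whole table, O(len(a) * len(b))."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def levenshtein_banded(a: str, b: str, k: int):
    """Return the distance if it is <= k, else None.

    Only cells with |i - j| <= k are filled, so the cost is O(len(a) * k).
    """
    n, m = len(a), len(b)
    if abs(n - m) > k:
        return None
    inf = k + 1  # stands for "more than k"; cells outside the band count as inf
    prev = {j: j for j in range(min(m, k) + 1)}
    for i in range(1, n + 1):
        cur = {}
        for j in range(max(0, i - k), min(m, i + k) + 1):
            if j == 0:
                cur[j] = i
                continue
            best = min(prev.get(j - 1, inf) + (a[i - 1] != b[j - 1]),
                       prev.get(j, inf) + 1,
                       cur.get(j - 1, inf) + 1)
            cur[j] = min(best, inf)
        prev = cur
    d = prev.get(m, inf)
    return d if d <= k else None


def levenshtein_doubling(a: str, b: str) -> int:
    """Retry the banded search with k = 1, 2, 4, ... until it succeeds.

    The passes cost O(N*1 + N*2 + ... + N*2D), which is O(N*D) in total.
    """
    k = 1
    while True:
        d = levenshtein_banded(a, b, k)
        if d is not None:
            return d
        k *= 2
```

For two long strings that differ in only a few places, levenshtein_doubling touches a narrow band around the main diagonal, whereas levenshtein_full always fills the entire table.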