From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: Paul Eggert Newsgroups: gmane.emacs.devel Subject: Re: [PATCH] add 'string-distance' to calculate Levenshtein distance Date: Sun, 15 Apr 2018 11:53:05 -0700 Organization: UCLA Computer Science Department Message-ID: References: <87vacuecrn.fsf@gmail.com> <83po3246ah.fsf@gnu.org> <87lgdq831h.fsf@gmail.com> <83muy553ae.fsf@gnu.org> <87o9ilhhcd.fsf@gmail.com> <83d0z14sws.fsf@gnu.org> <87o9il0wka.fsf@gmail.com> NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Trace: blaine.gmane.org 1523818284 20309 195.159.176.226 (15 Apr 2018 18:51:24 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Sun, 15 Apr 2018 18:51:24 +0000 (UTC) User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0 Cc: emacs-devel@gnu.org To: Chen Bin , Eli Zaretskii Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sun Apr 15 20:51:20 2018 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1f7mkB-00059Z-OO for ged-emacs-devel@m.gmane.org; Sun, 15 Apr 2018 20:51:19 +0200 Original-Received: from localhost ([::1]:35065 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1f7mmG-0000ZC-Ez for ged-emacs-devel@m.gmane.org; Sun, 15 Apr 2018 14:53:28 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:46230) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1f7mm6-0000YY-0l for emacs-devel@gnu.org; Sun, 15 Apr 2018 14:53:18 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1f7mm5-0004y0-3d for emacs-devel@gnu.org; Sun, 15 Apr 2018 14:53:18 -0400 Original-Received: from zimbra.cs.ucla.edu ([131.179.128.68]:58046) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1f7mm1-0004sc-Dq; Sun, 15 Apr 2018 14:53:13 -0400 Original-Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 475FE161642; Sun, 15 Apr 2018 11:53:10 -0700 (PDT) Original-Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id bkPb20rucevB; Sun, 15 Apr 2018 11:53:09 -0700 (PDT) Original-Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 149EC161684; Sun, 15 Apr 2018 11:53:09 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Original-Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id RCsY0tdDFO6X; Sun, 15 Apr 2018 11:53:08 -0700 (PDT) Original-Received: from Penguin.CS.UCLA.EDU (Penguin.CS.UCLA.EDU [131.179.64.200]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id EF0A9161642; Sun, 15 Apr 2018 11:53:08 -0700 (PDT) In-Reply-To: <87o9il0wka.fsf@gmail.com> Content-Language: en-US X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [fuzzy] X-Received-From: 131.179.128.68 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.org gmane.emacs.devel:224624 Archived-At: On 04/15/2018 12:15 AM, Chen Bin wrote: > As 'org-babel-edit-distance' documented, it will "Return the edit > (levenshtein) distance between strings S1 S2". So the problem here is to > calculate*Levenshtein distance*. First, I doubt whether the callers care whether the code computes Levenshtein distance, LCS distance, or some other reasonable string-distance measure. Second, the Myers-Ukkonen algorithm does compute Levenshtein distance; see, for example: Papamichail D, Papamichail G. Improved algorithms for approximate string matching. BMC Bioinformatics. 2009; 10(Suppl 1): S10. https://dx.doi.org/10.1186/1471-2105-10-S1-S10 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648743/ I don't offhand know whether diffseq.h uses the original Myers-Ukkonen algorithm or one of Myers's variations with a different distance measure, but if it's the latter and if users really care then we should be able to change the algorithm to match the requirements.