>>>>> "Eli" == Eli Zaretskii <eliz@gnu.org> writes:

    Eli> [Please CC the mailing list when you respond, so others could
    Eli> see your messages.]

    >> From: Chen Bin <chenbin.sh@gmail.com> Date: Sat, 14 Apr 2018
    >> 21:01:46 +1000
    >> 
    >> Hi, Eli, Thanks for the review.
    >> 
    >> I fixed most issues except two things.
    >> 
    >> 1. In Asia, it's possible to cacluate distance between one
    >> unibyte and one multibyte string. As a Chinese, I might create a
    >> document containing Hanzi characters whose file name is obviously
    >> multibyte string. I may need get the distance of this document to
    >> a file named "README.txt".

    Eli> If you mean unibyte pure-ASCII strings, then I agree.  But that
    Eli> doesn't mean we should avoid the text altogether, because we
    Eli> might compute non-zero distance between a string and its
    Eli> encoded unibyte variant, which will confuse users.  At the very
    Eli> least the doc string should say something about this.

    >> 2. Algorithm is based on
    >> https://en.wikibooks.org/w/index.php?title=Algorithm_Implementation/Strings/Levenshtein_distance&stable=0#C
    >> It's optimized to use O(min(m,n)) space instead of O(mn).  Say
    >> you compare two string whose string length is 512 bytes.  You
    >> only need allocate 512 bytes instead of 262K (512*512) in memory.
    >> 
    >> Please check attached patch for latest code.
    >> 
    >> --- a/etc/NEWS +++ b/etc/NEWS @@ -463,6 +463,8 @@
    >> x-lost-selection-hooks, x-sent-selection-hooks +++ ** New
    >> function assoc-delete-all.
    >> 
    >> +** New function string-distance.

    Eli> This should mention Levenshtein distance.

    >> +DEFUN ("string-distance", Fstring_distance, Sstring_distance, 2,
    >> 2, 0, + doc: /* Return Levenshtein distance of STRING1 and
    >> STRING2.
    Eli>                                               ^^^^^^^^^^^^^^^^^^^^^^
    Eli> "between STRING1 and STRING2"

    >> + unsigned int s1len, s2len, x, y, lastdiag, olddiag;

    Eli> These variables should be declared EMACS_INT, not unsigned int,
    Eli> because the size of Emacs strings can be larger than UINT_MAX,
    Eli> especially on 64-bit systems.

    >> + unsigned int *column = SAFE_ALLOCA ((s1len + 1) * sizeof
    >> (unsigned int));

    Eli> Likewise here.

    >> + char *s1 = SSDATA (string1); + char *s2 = SSDATA (string2); + +
    >> unsigned int s1len, s2len, x, y, lastdiag, olddiag; + s1len =
    >> strlen(s1); + s2len = strlen(s2);

    Eli> You could optimize the code by using SCHARS and SBYTES, instead
    Eli> of calling strlen.

    >> +(ert-deftest subr-tests--string-distance () + "Test
    >> `string-distance' behavior."  + (should (equal 1 (string-distance
    >> "heelo" "hello"))) + (should (equal 2 (string-distance "aeelo"
    >> "hello"))) + (should (equal 0 (string-distance "ab" "ab"))) +
    >> (should (equal 1 (string-distance "ab" "abc"))))

    Eli> Could you please add a test or two with non-ASCII characters?

    Eli> Thanks.

-- 
Best Regards,
Chen Bin

--
Help me, help you