From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mp1 ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by ms11 with LMTPS id KHMIND1PB2BiUwAA0tVLHw (envelope-from ) for ; Tue, 19 Jan 2021 21:29:33 +0000 Received: from aspmx1.migadu.com ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by mp1 with LMTPS id WA7xLz1PB2BPIAAAbx9fmQ (envelope-from ) for ; Tue, 19 Jan 2021 21:29:33 +0000 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by aspmx1.migadu.com (Postfix) with ESMTPS id 2F037940341 for ; Tue, 19 Jan 2021 21:29:33 +0000 (UTC) Received: from localhost ([::1]:49582 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1l1yZA-00055e-4P for larch@yhetil.org; Tue, 19 Jan 2021 16:29:32 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:40332) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1l1yYh-0004go-2f for guix-patches@gnu.org; Tue, 19 Jan 2021 16:29:03 -0500 Received: from debbugs.gnu.org ([209.51.188.43]:40398) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1l1yYg-0008MD-RB for guix-patches@gnu.org; Tue, 19 Jan 2021 16:29:02 -0500 Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1l1yYg-0005ow-Oo for guix-patches@gnu.org; Tue, 19 Jan 2021 16:29:02 -0500 X-Loop: help-debbugs@gnu.org Subject: [bug#45893] [PATCH v3 1/3] utils: Add string distance. References: <20210115163732.53665-1-zimon.toutoune@gmail.com> In-Reply-To: <20210115163732.53665-1-zimon.toutoune@gmail.com> Resent-From: zimoun Original-Sender: "Debbugs-submit" Resent-CC: guix-patches@gnu.org Resent-Date: Tue, 19 Jan 2021 21:29:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 45893 X-GNU-PR-Package: guix-patches X-GNU-PR-Keywords: patch To: 45893@debbugs.gnu.org Received: via spool by 45893-submit@debbugs.gnu.org id=B45893.161109171422322 (code B ref 45893); Tue, 19 Jan 2021 21:29:02 +0000 Received: (at 45893) by debbugs.gnu.org; 19 Jan 2021 21:28:34 +0000 Received: from localhost ([127.0.0.1]:51942 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1l1yYD-0005nr-TB for submit@debbugs.gnu.org; Tue, 19 Jan 2021 16:28:34 -0500 Received: from mail-wr1-f52.google.com ([209.85.221.52]:44971) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1l1yY7-0005nB-0P for 45893@debbugs.gnu.org; Tue, 19 Jan 2021 16:28:30 -0500 Received: by mail-wr1-f52.google.com with SMTP id w5so21112716wrm.11 for <45893@debbugs.gnu.org>; Tue, 19 Jan 2021 13:28:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=bvEA+JM8FF2+1rpkio+/nDOgmiys3kqOHjH7bNLroAQ=; b=cYoDO/oRpe2oMtspohAX5dTZkCg4rVGZBtygXrfvNs3PxNGtXukelN130Si1TENLhi 1SP2HZPM2YLTZYscMxD5ff+E/MEtuJ8dILvn81ZlosxwHxaz6ycS9VxVnoMbnxKqmyEz ZWa2UqkLnT8eHdvNYDjCWXCorZw3e7larkm1Y8jfG1dO3wg9je0yyhc9MP3G6fEpDS/E pLVqbnYAom3uaZr3p55Vm8K2RYqaWbKncGk+06JUw3H/vKRQwprl6PhUytd46ICZI26B vtD/h4WBL8OgTSYqe109xvNtTLZRuyByD/5TNuh6Cn1ZAcDtxxtT1AAwauWcqE+N/Qd5 P3NQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=bvEA+JM8FF2+1rpkio+/nDOgmiys3kqOHjH7bNLroAQ=; b=moKxQfiaTdNWU+ctVB/A9vKHOh+aqTee08zTv2J04JaLT3Q53FibtIquJj7t2psB5Y vgrpYXdgRWEFMBBIMr78LdNitGzjCy67TOQaUsM/ZMggMmdq7K7hD0AzN7Gbs7lZWBEk dX6wT5HQlA08DLj1YAEKJHiVf2DBGYsPRcqqKA292xjIjL0+tOcgefjzi+A9PhTF8ENN EApPmdwo3WXg1+OhfC2tHyRS7v/IO+XADu44QYJ6o2baVxiO4bFCwpZS7iS2hteZnPwp u6FatNw+hlaPnjdlGr2DtK2vFetiMhyrZeNIBei90VkuNMm3Jc78MCPHJEYGLdt2WkFM 2U+w== X-Gm-Message-State: AOAM533SzTOEJaWk0y0Ma8CK4qGroiSkylEKRcTUUJ4eK5clx5JIuXAI R588OwhO0JJmpCqvHrjfE5VbFYs2em8= X-Google-Smtp-Source: ABdhPJzoa4c/EdwJ9/+NcpmWRVvVqXqGOvlfAvN73H7wgWG/Clp75nxvG8W6SNPsEtlMfB0arrzsFQ== X-Received: by 2002:adf:d0d0:: with SMTP id z16mr6212835wrh.209.1611091700987; Tue, 19 Jan 2021 13:28:20 -0800 (PST) Received: from lili.univ-paris-diderot.fr ([2a01:e0a:59b:9120:65d2:2476:f637:db1e]) by smtp.gmail.com with ESMTPSA id w8sm37844239wrl.91.2021.01.19.13.28.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 19 Jan 2021 13:28:20 -0800 (PST) From: zimoun Date: Tue, 19 Jan 2021 22:28:08 +0100 Message-Id: <20210119212810.20681-1-zimon.toutoune@gmail.com> X-Mailer: git-send-email 2.29.2 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: guix-patches@gnu.org List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guix-patches-bounces+larch=yhetil.org@gnu.org Sender: "Guix-patches" X-Migadu-Flow: FLOW_IN X-Migadu-Spam-Score: -1.25 Authentication-Results: aspmx1.migadu.com; dkim=fail ("body hash did not verify") header.d=gmail.com header.s=20161025 header.b="cYoDO/oR"; dmarc=fail reason="SPF not aligned (relaxed)" header.from=gmail.com (policy=none); spf=pass (aspmx1.migadu.com: domain of guix-patches-bounces@gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=guix-patches-bounces@gnu.org X-Migadu-Queue-Id: 2F037940341 X-Spam-Score: -1.25 X-Migadu-Scanner: scn0.migadu.com X-TUID: Q3MNjRUV6hCg * guix/utils.scm (string-distance): New procedure. (string-closest): New procedure. * tests/utils.scm: Test it. --- guix/utils.scm | 47 ++++++++++++++++++++++++++++++++++++++++++++++- tests/utils.scm | 18 ++++++++++++++++++ 2 files changed, 64 insertions(+), 1 deletion(-) diff --git a/guix/utils.scm b/guix/utils.scm index f8b05e7e80..dc2259ef8c 100644 --- a/guix/utils.scm +++ b/guix/utils.scm @@ -8,6 +8,7 @@ ;;; Copyright © 2017 Mathieu Othacehe ;;; Copyright © 2018, 2020 Marius Bakke ;;; Copyright © 2020 Efraim Flashner +;;; Copyright © 2021 Simon Tournier ;;; ;;; This file is part of GNU Guix. ;;; @@ -37,6 +38,7 @@ #:use-module (guix memoization) #:use-module ((guix build utils) #:select (dump-port mkdir-p delete-file-recursively)) #:use-module ((guix build syscalls) #:select (mkdtemp! fdatasync)) + #:use-module ((guix combinators) #:select (fold2)) #:use-module (guix diagnostics) ;, &error-location, etc. #:use-module (ice-9 format) #:use-module (ice-9 regex) @@ -114,7 +116,10 @@ call-with-decompressed-port compressed-output-port call-with-compressed-output-port - canonical-newline-port)) + canonical-newline-port + + string-distance + string-closest)) ;;; @@ -847,6 +852,46 @@ be determined." ;; raising an error would upset Geiser users #f)))))) + +;;; +;;; String comparison. +;;; + +(define (string-distance s1 s2) + "Compute the Levenshtein distance between two strings." + ;; Naive implemenation + (define loop + (mlambda (as bt) + (match as + (() (length bt)) + ((a s ...) + (match bt + (() (length as)) + ((b t ...) + (if (char=? a b) + (loop s t) + (1+ (min + (loop as t) + (loop s bt) + (loop s t)))))))))) + + (let ((c1 (string->list s1)) + (c2 (string->list s2))) + (loop c1 c2))) + +(define* (string-closest trial tests #:key (threshold 3)) + "Return the string from TESTS that is the closest from the TRIAL, +according to 'string-distance'. If the TESTS are too far from TRIAL, +according to THRESHOLD, then #f is returned." + (identity ;discard second return value + (fold2 (lambda (test closest minimal) + (let ((dist (string-distance trial test))) + (if (and (< dist minimal) (< dist threshold)) + (values test dist) + (values closest minimal)))) + #f +inf.0 + tests))) + ;;; Local Variables: ;;; eval: (put 'call-with-progress-reporter 'scheme-indent-function 1) ;;; End: diff --git a/tests/utils.scm b/tests/utils.scm index 9bce446d98..40eaf65bbc 100644 --- a/tests/utils.scm +++ b/tests/utils.scm @@ -2,6 +2,7 @@ ;;; Copyright © 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 Ludovic Courtès ;;; Copyright © 2014 Eric Bavier ;;; Copyright © 2016 Mathieu Lirzin +;;; Copyright © 2021 Simon Tournier ;;; ;;; This file is part of GNU Guix. ;;; @@ -265,6 +266,23 @@ skip these tests." string-reverse) (call-with-input-file temp-file get-string-all))) +(test-equal "string-distance" + '(0 1 1 5 5) + (list + (string-distance "hello" "hello") + (string-distance "hello" "helo") + (string-distance "helo" "hello") + (string-distance "" "hello") + (string-distance "hello" ""))) + +(test-equal "string-closest" + '("hello" "hello" "helo" #f) + (list + (string-closest "hello" '("hello")) + (string-closest "hello" '("helo" "hello" "halo")) + (string-closest "hello" '("kikoo" "helo" "hihihi" "halo")) + (string-closest "hello" '("aaaaa" "12345" "hellohello" "h")))) + (test-end) (false-if-exception (delete-file temp-file)) base-commit: 884f320e7ceb35cb8472510e47fc5f1944675d82 prerequisite-patch-id: 07abf72be0f4db9fbc19cb719d87bc1c69e8479d -- 2.29.2