From: Manuel Giraud via "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Newsgroups: gmane.emacs.bugs
Subject: bug#61394: 30.0.50; [PATCH] Image-dired thumb name based on content
Date: Sat, 11 Feb 2023 13:30:48 +0100
Message-ID: <87k00oo03r.fsf@ledu-giraud.fr>
References: <874jruy7xx.fsf@ledu-giraud.fr> <87ttztk0yw.fsf@tcd.ie> <87v8k9s6j9.fsf@ledu-giraud.fr> <83ilg8jzti.fsf@gnu.org>
Reply-To: Manuel Giraud
To: Eli Zaretskii
Cc: contovob@tcd.ie, 61394@debbugs.gnu.org
In-Reply-To:
<83ilg8jzti.fsf@gnu.org> (Eli Zaretskii's message of "Sat, 11 Feb 2023 11:50:33 +0200")

Eli Zaretskii writes:

>> Cc: 61394@debbugs.gnu.org
>> Date: Fri, 10 Feb 2023 19:46:02 +0100
>> From: Manuel Giraud via "Bug reports for GNU Emacs,
>>  the Swiss army knife of text editors"
>>
>> +(defun image-dired-content-sha1 (filename)
>> +  "Compute the SHA-1 of a part of FILENAME."
>
> Not "part of FILENAME", but "the first 4KiB of FILENAME's contents".

Yes, I'll fix that.

> Btw, using only the first 4KiB would mean a collision is still
> possible, albeit rarely, right?  So your use case of having all the
> thumbnails in the same directory could sometimes fail, right?

The 4KiB size was a guess: quite large, but not too large.
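To make the limitation concrete, here is a sketch of the helper with the doc string fixed and `insert-file-contents-literally' used as you suggest below.  It is only an illustration, not the updated patch.  The check after it shows that two files whose first 4KiB are identical get the same hash, and so would get the same thumbnail name (the temporary files are just for the demonstration):

```elisp
;; Sketch only: the helper with both review fixes applied,
;; not the final patch.
(defun image-dired-content-sha1 (filename)
  "Compute the SHA-1 of the first 4KiB of FILENAME's contents."
  (with-temp-buffer
    ;; Read raw bytes 0..4096, without code conversion or
    ;; format decoding.  Shorter files are read whole.
    (insert-file-contents-literally filename nil 0 4096)
    (sha1 (current-buffer))))

;; Two files sharing the same first 4KiB but differing afterwards
;; hash to the same value.  Evaluates to t.
(let ((a (make-temp-file "img-a"))
      (b (make-temp-file "img-b"))
      (prefix (make-string 4096 ?x)))
  (unwind-protect
      (progn
        (with-temp-file a (insert prefix "tail A"))
        (with-temp-file b (insert prefix "tail B"))
        (string= (image-dired-content-sha1 a)
                 (image-dired-content-sha1 b)))
    (delete-file a)
    (delete-file b)))
```

So any edit that only touches bytes past the first 4KiB would leave a stale thumbnail.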
I've made tests with the following code:

--8<---------------cut here---------------start------------->8---
(defun sha1-test (filename size)
  "Return the SHA-1 of the first SIZE bytes of FILENAME."
  (with-temp-buffer
    (insert-file-contents-literally filename nil 0 size)
    (sha1 (current-buffer))))

;; From 1KiB to 64KiB
(list (benchmark-run-compiled 1000 (sha1-test "/tmp/a-5MiB-photo.jpg" (expt 2 10)))
      (benchmark-run-compiled 1000 (sha1-test "/tmp/a-5MiB-photo.jpg" (expt 2 11)))
      (benchmark-run-compiled 1000 (sha1-test "/tmp/a-5MiB-photo.jpg" (expt 2 12)))
      (benchmark-run-compiled 1000 (sha1-test "/tmp/a-5MiB-photo.jpg" (expt 2 13)))
      (benchmark-run-compiled 1000 (sha1-test "/tmp/a-5MiB-photo.jpg" (expt 2 14)))
      (benchmark-run-compiled 1000 (sha1-test "/tmp/a-5MiB-photo.jpg" (expt 2 15)))
      (benchmark-run-compiled 1000 (sha1-test "/tmp/a-5MiB-photo.jpg" (expt 2 16))))
--8<---------------cut here---------------end--------------->8---

And here are the results on my machine (each entry is the total time
in seconds, the number of garbage collections, and the time spent in
GC, as returned by `benchmark-run-compiled'):

((0.664336771 1 0.14466495299998883)         ;  1KiB
 (0.707937024 2 0.28811983400001395)         ;  2KiB
 (0.940229304 3 0.44037704100000497)         ;  4KiB <-
 (1.672118528 4 0.7672738199999856)          ;  8KiB
 (2.6194289370000003 6 1.046699996000001)    ; 16KiB
 (3.169999951 11 1.5916382949999957)         ; 32KiB
 (6.547043287 21 3.195145416999992))         ; 64KiB

So 4KiB seems practical: about 1 second for a thousand runs, i.e.
roughly 1 ms per hash.  WDYT?

About collisions, my wild guess is that, since we are dealing with
images, most modifications to an image will affect those first 4KiB
anyway.  But you're right that a collision is still possible, and the
thumbnail could then be wrong.  I'll try to find out what the
probability of a SHA-1 collision on 4KiB of data is.

>> +  (with-temp-buffer
>> +    (insert-file-contents filename nil 0 4096)
>
> Please use insert-file-contents-literally here.  It should be much
> faster, and we only care about the file's bytestream anyway.

Thanks, I'll do that too.

>>  (defun image-dired-thumb-name (file)
>>    "Return absolute file name for thumbnail FILE.
>>  Depending on the value of `image-dired-thumbnail-storage', the
>>  file name of the thumbnail will vary:
>> -- For `use-image-dired-dir', make a SHA1-hash of the image file's
>> -  directory name and add that to make the thumbnail file name
>> -  unique.
>> +- For `image-dired', make a SHA1-hash of some of the image file.
>>  - For `per-directory' storage, just add a subdirectory.
>>  - For `standard' storage, produce the file name according to the
>>    Thumbnail Managing Standard.  Among other things, an MD5-hash
>
> This doc string "needs work".  Could you please fix it as part of the
> patch, even though most of the problems are not due to this patch?  In
> any case, please either say here that only the first 4KiB of the file's
> contents are SHA1-hashed or include a link to the new function.

You're right, it is not even complete for `per-directory'.  I'll try
to come up with a fix.

Thanks.
-- 
Manuel Giraud