unofficial mirror of bug-gnu-emacs@gnu.org 
From: Manuel Giraud via "Bug reports for GNU Emacs, the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: contovob@tcd.ie, 61394@debbugs.gnu.org
Subject: bug#61394: 30.0.50; [PATCH] Image-dired thumb name based on content
Date: Sat, 11 Feb 2023 13:30:48 +0100
Message-ID: <87k00oo03r.fsf@ledu-giraud.fr>
In-Reply-To: <83ilg8jzti.fsf@gnu.org> (Eli Zaretskii's message of "Sat, 11 Feb 2023 11:50:33 +0200")

Eli Zaretskii <eliz@gnu.org> writes:

>> Cc: 61394@debbugs.gnu.org
>> Date: Fri, 10 Feb 2023 19:46:02 +0100
>> From:  Manuel Giraud via "Bug reports for GNU Emacs,
>>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
>> 
>> +(defun image-dired-content-sha1 (filename)
>> +  "Compute the SHA-1 of a part of FILENAME."
>
> Not "part of FILENAME", but "the first 4KiB of FILENAME's contents".

Yes, I'll fix that.

> Btw, using only the first 4KiB would mean a collision is still
> possible, albeit rarely, right?  So your use case of having all the
> thumbnails in the same directory could sometimes fail, right?

The 4KiB was a "quite large, but not too large" guess.  I ran some
tests with the following code:

--8<---------------cut here---------------start------------->8---
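(defun sha1-test (filename size)
  "Return the SHA-1 of the first SIZE bytes of FILENAME."
  (with-temp-buffer
    ;; Read the raw bytes only, without any decoding.
    (insert-file-contents-literally filename nil 0 size)
    (sha1 (current-buffer))))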

;; From 1KiB to 64KiB
(list
 (benchmark-run-compiled 1000
   (sha1-test "/tmp/a-5MiB-photo.jpg" (expt 2 10)))
 (benchmark-run-compiled 1000
   (sha1-test "/tmp/a-5MiB-photo.jpg" (expt 2 11)))
 (benchmark-run-compiled 1000
   (sha1-test "/tmp/a-5MiB-photo.jpg" (expt 2 12)))
 (benchmark-run-compiled 1000
   (sha1-test "/tmp/a-5MiB-photo.jpg" (expt 2 13)))
 (benchmark-run-compiled 1000
   (sha1-test "/tmp/a-5MiB-photo.jpg" (expt 2 14)))
 (benchmark-run-compiled 1000
   (sha1-test "/tmp/a-5MiB-photo.jpg" (expt 2 15)))
 (benchmark-run-compiled 1000
   (sha1-test "/tmp/a-5MiB-photo.jpg" (expt 2 16))))
--8<---------------cut here---------------end--------------->8---

And here are the results on my machine:
((0.664336771 1 0.14466495299998883)
 (0.707937024 2 0.28811983400001395)
 (0.940229304 3 0.44037704100000497) ;; <- 4KiB
 (1.672118528 4 0.7672738199999856)
 (2.6194289370000003 6 1.046699996000001)
 (3.169999951 11 1.5916382949999957)
 (6.547043287 21 3.195145416999992))

So 4KiB seems practical: about 1 second for one thousand runs.
WDYT?
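
To read the figures above: each row is the list that `benchmark-run'
returns, that is (ELAPSED GC-COUNT GC-ELAPSED), with times in seconds.
A minimal sketch for turning a row into non-GC milliseconds per call
follows.  The helper name `my-bench-ms-per-call' is hypothetical and
is not part of the patch.

```elisp
;; Hypothetical helper, not from the patch: non-GC time per call, in ms.
(defun my-bench-ms-per-call (result repetitions)
  "Return non-GC milliseconds per call for a `benchmark-run' RESULT.
RESULT is a list (ELAPSED GC-COUNT GC-ELAPSED); REPETITIONS is the
number of runs that produced it."
  (let ((elapsed (nth 0 result))
        (gc-elapsed (nth 2 result)))
    (* 1000.0 (/ (- elapsed gc-elapsed) repetitions))))

;; The 4KiB row from the results above: roughly 0.5 ms per call.
(my-bench-ms-per-call '(0.940229304 3 0.44037704100000497) 1000)
```

Nearly half of the elapsed time in the 4KiB row is garbage collection,
so the per-call cost of reading and hashing alone is lower than the
raw total suggests.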

About collisions, my wild guess here is that, as we are considering
images, most modifications to these images will have an impact on
those first 4KiB anyway.  But you're right that a collision is still
possible and the thumb could be wrong.  I'll try to find out what the
probability of a SHA-1 collision on 4KiB of data is.
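
Two separate effects are in play here, and a rough sketch may help
tell them apart.  First, two files that share their first 4KiB get the
same hash no matter how strong SHA-1 is.  Second, for files whose
first 4KiB differ, the birthday bound gives an estimate, assuming
SHA-1 behaves like a random 160-bit function.  All names starting with
`my-' are hypothetical and the temporary files are made up for the
demonstration.

```elisp
(defun my-prefix-sha1 (filename size)
  "Return the SHA-1 of the first SIZE bytes of FILENAME."
  (with-temp-buffer
    (insert-file-contents-literally filename nil 0 size)
    (sha1 (current-buffer))))

(defun my-prefix-demo ()
  "Hash two 8KiB files that differ only after their first 4KiB.
Return (SAME-AT-4KIB SAME-AT-8KIB)."
  (let ((a (make-temp-file "prefix-a"))
        (b (make-temp-file "prefix-b"))
        (head (make-string 4096 ?x)))
    (unwind-protect
        (progn
          ;; Identical first 4096 bytes, different second 4096 bytes.
          (with-temp-file a (insert head (make-string 4096 ?a)))
          (with-temp-file b (insert head (make-string 4096 ?b)))
          (list (equal (my-prefix-sha1 a 4096) (my-prefix-sha1 b 4096))
                (equal (my-prefix-sha1 a 8192) (my-prefix-sha1 b 8192))))
      (delete-file a)
      (delete-file b))))

(my-prefix-demo) ;; => (t nil)

(defun my-birthday-bound (n bits)
  "Approximate probability that N random BITS-bit hashes collide.
Uses the estimate N(N-1)/2 pairs divided by 2^BITS possible values."
  (/ (* (float n) (1- n)) (expt 2.0 (1+ bits))))

;; One million thumbnails: on the order of 1e-37.
(my-birthday-bound 1000000 160)
```

Under these assumptions the hash itself is not the weak point; the
realistic failure is two different images with identical leading
bytes.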

>> +  (with-temp-buffer
>> +    (insert-file-contents filename nil 0 4096)
>
> Please use insert-file-contents-literally here.  It should be much
> faster, and we only care about the file's bytestream anyway.

Thanks, I'll do that too.
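
The speed difference between the two functions can be checked with a
small sketch such as the one below.  The helper name
`my-compare-insert' is hypothetical, and the numbers depend on the
machine and on the file, so no expected output is given.

```elisp
(require 'benchmark)

;; Hypothetical helper, not from the patch.
(defun my-compare-insert (filename size reps)
  "Time reading the first SIZE bytes of FILENAME, REPS times each way.
Return (DECODED LITERAL): elapsed seconds with `insert-file-contents'
and with `insert-file-contents-literally', GC time included."
  (list
   (car (benchmark-run reps
          (with-temp-buffer
            ;; Decodes the bytes and may run format conversion.
            (insert-file-contents filename nil 0 size))))
   (car (benchmark-run reps
          (with-temp-buffer
            ;; Inserts the raw bytes as they are.
            (insert-file-contents-literally filename nil 0 size))))))
```

It could be called as (my-compare-insert "/tmp/a-5MiB-photo.jpg" 4096
1000), reusing the test file from the benchmark above.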

>>  (defun image-dired-thumb-name (file)
>>    "Return absolute file name for thumbnail FILE.
>>  Depending on the value of `image-dired-thumbnail-storage', the
>>  file name of the thumbnail will vary:
>> -- For `use-image-dired-dir', make a SHA1-hash of the image file's
>> -  directory name and add that to make the thumbnail file name
>> -  unique.
>> +- For `image-dired', make a SHA1-hash of some of the image file.
>>  - For `per-directory' storage, just add a subdirectory.
>>  - For `standard' storage, produce the file name according to the
>>    Thumbnail Managing Standard.  Among other things, an MD5-hash
>
> This doc string "needs work".  Could you please fix it as part of the
> patch, even though most of the problems are not due to this patch?  In
> any case, please either say here that only the first 4KiB of the file's
> contents are SHA1-hashed or include a link to the new function.

You're right, it is not even complete for `per-directory'.  I'll try
to come up with a fix.  Thanks.
-- 
Manuel Giraud
