unofficial mirror of bug-guile@gnu.org 
* bug#16361: compile cache confused about file identity
@ 2014-01-05 23:08 Zefram
  2014-10-01 19:22 ` Mark H Weaver
  0 siblings, 1 reply; 3+ messages in thread
From: Zefram @ 2014-01-05 23:08 UTC
  To: 16361

The automatic cache of compiled versions of scripts in guile-2.0.9
identifies scripts mainly by name, and partially by mtime.  This is not
actually sufficient: it is easily misled by a pathname that refers to
different files at different times.  Test case:

$ echo '(display "aaa\n")' >t13
$ echo '(display "bbb\n")' >t14
$ guile-2.0 t13
;;; note: auto-compilation is enabled, set GUILE_AUTO_COMPILE=0
;;;       or pass the --no-auto-compile argument to disable.
;;; compiling /home/zefram/usr/guile/t13
;;; compiled /home/zefram/.cache/guile/ccache/2.0-LE-8-2.0/home/zefram/usr/guile/t13.go
aaa
$ mv t14 t13
$ guile-2.0 t13
aaa

You can see that the mtime is not fully used here: the cache is misapplied
even if there is a delay of seconds between the creations of the two
script files.  The cache's mtime check will only notice a mismatch if
the script currently seen under the supplied name was modified later
than when the previous script was *compiled*.
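
[Editor's sketch] The behaviour described above can be modelled in a few
lines.  This is a minimal illustration only, not Guile's actual code: the
function names are hypothetical, the ".go" file is a plain stand-in, and
the check is assumed to be "compiled file is not older than the source",
as the report describes.  Timestamps are pinned with os.utime so the
outcome does not depend on timing.

```python
import os
import tempfile


def cache_looks_fresh(source_path, compiled_path):
    """Assumed model: accept the cache if it is not older than the source."""
    return os.stat(compiled_path).st_mtime >= os.stat(source_path).st_mtime


def demo_stale_hit():
    with tempfile.TemporaryDirectory() as d:
        t13 = os.path.join(d, "t13")
        t14 = os.path.join(d, "t14")
        go = os.path.join(d, "t13.go")  # stand-in for the cached compilation
        for path, text in ((t13, "aaa"), (t14, "bbb"), (go, "compiled aaa")):
            with open(path, "w") as f:
                f.write(text)
        # Pin timestamps: t13 written at 100 s, t14 at 105 s, compiled at 110 s.
        os.utime(t13, (100, 100))
        os.utime(t14, (105, 105))
        os.utime(go, (110, 110))
        os.replace(t14, t13)  # like `mv t14 t13`; the rename keeps t14's mtime
        # The name t13 now refers to the "bbb" script, yet the check passes.
        return cache_looks_fresh(t13, go)
```

Here demo_stale_hit() returns True: the replacement script's mtime (105)
is earlier than the compilation time (110), so the stale cache entry is
accepted.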

Obviously, in this test case the cache could trivially distinguish the
two script files by looking at the inode numbers.  On its own the inode
number isn't sufficient, but exact match on device, inode number, and
mtime would be far superior to the current behaviour, only going wrong
in the presence of deliberate timestamp manipulation.  As a bonus, if
the cache were actually *keyed* by inode number and device, rather than
by pathname, it would retain the caching of compilation across renamings
of the script.
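
[Editor's sketch] The exact-match check suggested above might look like
the following.  It is an illustration under stated assumptions, not a
patch for Guile: file_identity and demo_detects_replacement are
hypothetical names, and the identity tuple is assumed to be recorded
alongside the cached compilation at compile time.

```python
import os
import tempfile


def file_identity(path):
    """Return (device, inode, mtime in ns) for an exact-match comparison."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino, st.st_mtime_ns)


def demo_detects_replacement():
    with tempfile.TemporaryDirectory() as d:
        t13 = os.path.join(d, "t13")
        t14 = os.path.join(d, "t14")
        for path, text in ((t13, "aaa"), (t14, "bbb")):
            with open(path, "w") as f:
                f.write(text)
        recorded = file_identity(t13)  # what the cache would store at compile time
        os.replace(t14, t13)           # like `mv t14 t13`
        # Both files existed at once, so their inode numbers differ.
        return recorded == file_identity(t13)
```

In this sketch demo_detects_replacement() returns False, i.e. the cache
entry would be rejected after the rename, because the two scripts
occupied distinct inodes on the same device.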

Or, even better, the cache could be keyed by a cryptographic hash of the
file contents.  This would be immune even to timestamp manipulation, and
would preserve the cached compilation even across the script being copied
to a fresh file or being edited and reverted.  This would be a cache
worthy of the name.  The only downside is the expense of computing the
hash, but I expect this is small compared to the expense of compilation.
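
[Editor's sketch] A content-keyed cache of the kind proposed above could
be outlined as follows.  The choice of SHA-256, the in-memory dict, and
the names content_key and load_compiled are all assumptions made for
illustration; compile_fn stands in for the real compiler.

```python
import hashlib


def content_key(source: bytes) -> str:
    """Key the cache by a cryptographic hash of the script's contents."""
    return hashlib.sha256(source).hexdigest()


def load_compiled(source: bytes, cache: dict, compile_fn):
    """Return the cached compilation for these contents, compiling on a miss."""
    key = content_key(source)
    if key not in cache:
        cache[key] = compile_fn(source)
    return cache[key]
```

Because the key depends only on the bytes of the script, a copy of the
script, or a script edited and then reverted, maps to the same entry,
while any change in content maps to a different one.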

Debian incarnation of this bug report:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=734178

-zefram






* bug#16361: compile cache confused about file identity
  2014-01-05 23:08 bug#16361: compile cache confused about file identity Zefram
@ 2014-10-01 19:22 ` Mark H Weaver
  2015-05-13 11:07   ` Zefram
  0 siblings, 1 reply; 3+ messages in thread
From: Mark H Weaver @ 2014-10-01 19:22 UTC
  To: Zefram; +Cc: 16361, request

tags 16361 + notabug wontfix
close 16361
thanks

Zefram <zefram@fysh.org> writes:

> The automatic cache of compiled versions of scripts in guile-2.0.9
> identifies scripts mainly by name, and partially by mtime.  This is not
> actually sufficient: it is easily misled by a pathname that refers to
> different files at different times.  Test case:
>
> $ echo '(display "aaa\n")' >t13
> $ echo '(display "bbb\n")' >t14
> $ guile-2.0 t13
> ;;; note: auto-compilation is enabled, set GUILE_AUTO_COMPILE=0
> ;;;       or pass the --no-auto-compile argument to disable.
> ;;; compiling /home/zefram/usr/guile/t13
> ;;; compiled /home/zefram/.cache/guile/ccache/2.0-LE-8-2.0/home/zefram/usr/guile/t13.go
> aaa
> $ mv t14 t13
> $ guile-2.0 t13
> aaa
>
> You can see that the mtime is not fully used here: the cache is misapplied
> even if there is a delay of seconds between the creations of the two
> script files.  The cache's mtime check will only notice a mismatch if
> the script currently seen under the supplied name was modified later
> than when the previous script was *compiled*.
>
> Obviously, in this test case the cache could trivially distinguish the
> two script files by looking at the inode numbers.  On its own the inode
> number isn't sufficient, but exact match on device, inode number, and
> mtime would be far superior to the current behaviour, only going wrong
> in the presence of deliberate timestamp manipulation.  As a bonus, if
> the cache were actually *keyed* by inode number and device, rather than
> by pathname, it would retain the caching of compilation across renamings
> of the script.
>
> Or, even better, the cache could be keyed by a cryptographic hash of the
> file contents.  This would be immune even to timestamp manipulation, and
> would preserve the cached compilation even across the script being copied
> to a fresh file or being edited and reverted.  This would be a cache
> worthy of the name.  The only downside is the expense of computing the
> hash, but I expect this is small compared to the expense of compilation.

You could make the same complaint about 'make', 'rsync', or any number
of other programs.  It's true that a cryptographic hash would be more
robust, but it would also be considerably more expensive in the common
case where the .go file is already in the cache.

I don't think it's worth paying this cost every time a .go file is
loaded, to guard against the unlikely scenario you outlined above.

The mtime check is very widely used, and accepted practice.

I'm closing this ticket.

      Mark






* bug#16361: compile cache confused about file identity
  2014-10-01 19:22 ` Mark H Weaver
@ 2015-05-13 11:07   ` Zefram
  0 siblings, 0 replies; 3+ messages in thread
From: Zefram @ 2015-05-13 11:07 UTC
  To: 16361

Mark H Weaver wrote:
>You could make the same complaint about 'make', 'rsync', or any number
>of other programs.

Not really.  make does use this type of freshness check, but it's used
in a specific situation where the freshness issue is immediately obvious
and is part of the program's visible primary concern.  That's quite
unlike guile's compile cache, which as the name suggests is a cache.
It's meant to be unobtrusive, and the cache semantics are not a direct
part of the transaction that is ostensibly taking place, of running
a program that happens to be written in Scheme.  Those circumstances,
of running an arbitrary program, are much broader than circumstances in
which make's freshness checks become relevant.  make also gets a pass
from having always worked this way, whereas guile used to not cache
compilations.  rsync, by contrast, does not use this type of freshness
checking; I believe it uses a hash mechanism.

>                    It's true that a cryptographic hash would be more
>robust, but it would also be considerably more expensive in the common
>case where the .go file is already in the cache.
>
>I don't think it's worth paying this cost every time

OK, you can rule that suggestion out, but I think you have erred in
jumping from that to wontfix on the general problem.  You have not
addressed my prior suggestion of identifying programs by exact match on
device, inode number, and mtime.  (File size could also be included.)
This freshness check is very cheap, because it's just a few fixed-size
fields from the stat structure, and you're already necessarily doing a
stat on the program file.  Using the identifying fields as the cache
key even saves you a stat on the cached file.  Although not quite as
effective as a hash comparison, it would be a huge practical improvement
over the current filename-and-inexact-mtime comparison.
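
[Editor's sketch] Using the identifying stat fields as the cache key, as
suggested above, might be sketched like this.  The file-name layout and
the names stat_cache_name and cache_path_for are hypothetical; the point
is only that one stat of the script yields the full lookup path, so no
timestamp comparison against the cached file is needed.

```python
import os


def stat_cache_name(st: os.stat_result) -> str:
    """Build a cache file name from device, inode, mtime (ns), and size."""
    return f"{st.st_dev:x}-{st.st_ino:x}-{st.st_mtime_ns:x}-{st.st_size:x}.go"


def cache_path_for(source_path: str, cache_dir: str) -> str:
    """One stat of the script gives the complete path of its cache entry."""
    return os.path.join(cache_dir, stat_cache_name(os.stat(source_path)))
```

A lookup then reduces to checking whether cache_path_for(script, dir)
exists.  Renaming the script within one filesystem leaves all four
fields unchanged, so the entry is still found; replacing it with a
different file changes the inode and so selects a different entry.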

-zefram





