unofficial mirror of bug-guix@gnu.org 
 help / color / mirror / code / Atom feed
From: Mark H Weaver <mhw@netris.org>
To: Danny Milosavljevic <dannym@scratchpost.org>
Cc: 30820@debbugs.gnu.org
Subject: bug#30820: Chunked store references in compiled code break grafting (again)
Date: Mon, 19 Mar 2018 21:04:09 -0400	[thread overview]
Message-ID: <87efkfblrq.fsf@netris.org> (raw)
In-Reply-To: <20180319223402.1ba0a369@scratchpost.org> (Danny Milosavljevic's message of "Mon, 19 Mar 2018 22:34:02 +0100")

Danny Milosavljevic <dannym@scratchpost.org> writes:

> On Mon, 19 Mar 2018 15:05:26 -0400
> Mark H Weaver <mhw@netris.org> wrote:
>
>> I think that we should generalize our reference scanning and grafting
>> code to support store references broken into pieces, as long as each
>> piece containing part of the hash is at least 8 bytes long.
>
> I don't think it's possible to get that to work reliably over all possible
> optimizations.  I mean why 8 Byte?  That's probably because of the
> 8 Byte registers on x86_64.
>
> What would happen on ARM? (32 bit registers)?
>
> And on MIPS (immediates only smaller than 32 bits)?

It wouldn't make sense to do this kind of optimization with 4-byte
chunks, because the overhead for the instructions would be too large to
justify it.  In any case, I've not seen any reports of any compiler
generating code like this with 4-byte chunks.

> What if gcc finds some repetition in the hash and decides not to load
> the register again the second time?

The probability of that happening with 8-byte chunks is on the order of
1 in 10^14, even if the GCC developers implemented such a thing.

> I think the best way - since we have control over the entire toolchain anyway -
> is to make gcc not do the data inlining in the first place.
>
> As far as I understand the gcc patches work after all.  Ludo just made
> a mistake testing it.
>
> Although when we do this patching of gcc we should add a test to gcc
> that makes sure that the data inlining is indeed not there anymore
> for store paths.
>
>> Here's my preliminary proposal:
>> 
>> (1) The reference scanner should recognize any 8-byte substring of a
>>     hash as a valid reference to that hash.
>> 
>> (2) To enable reliable grafting of chunked references, we should impose
>>     the following new restrictions: (a) the store prefix must be at
>>     least 6 bytes, (b) grafting can change only the hash, not the
>>     readable part of the store name, and (c) the readable part of the
>>     store name must be at least 6 bytes.
>> 
>> (3) The grafter should recognize and replace any 8-byte subsequence of
>>     the absolute store file name.
>> 
>> The rationale for the restrictions is to ensure that any byte that needs
>> to be modified by the grafter should be part of an 8-byte substring of
>> the absolute store file name.  This requires that there be at least 7
>> bytes of known text before the first changed byte and after the last
>> changed byte.  This is needed to provide a reasonable upper bound on the
>> probability of grafting a matching sequence of bytes that is not a store
>> reference.
>
> I think that is a good way to get the risk down - but it will not actually
> fix the problem for good.

Nothing that we do will ever fix this problem for good.  Don't pretend
that patching GCC would fix the problem for good.  There are problems in
other software as well (e.g. in JAR manifests), and we already patched
GCC once, and it broke some time later without anyone noticing.

> Also, complexity like the above makes the reference scanner more brittle and
> it's easier for bugs to hide in it (and it's slower).

I think it could be implemented about as fast in practice, although it
would require more memory.  The hash table would be bigger, containing
all 8-byte substrings of the hashes.

As for bugs in the reference scanner: it's still simple enough that it
could be formally proved correct without much difficulty.

       Mark

  parent reply	other threads:[~2018-03-20  1:06 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-14 15:47 bug#30820: Chunked store references in compiled code break grafting (again) Ludovic Courtès
2018-03-14 17:24 ` Ludovic Courtès
2018-03-16  8:54   ` bug#30395: " Ludovic Courtès
2018-03-20 23:07     ` Ludovic Courtès
2018-03-21  6:39       ` Ricardo Wurmus
2018-03-21 20:59         ` Ludovic Courtès
2018-03-19 21:22   ` Danny Milosavljevic
2018-03-19 22:29     ` Ludovic Courtès
2018-03-19 19:05 ` Mark H Weaver
2018-03-19 19:16   ` Mark H Weaver
2018-03-19 21:34   ` Danny Milosavljevic
2018-03-19 22:27     ` Ludovic Courtès
2018-03-20  1:04     ` Mark H Weaver [this message]
2018-03-20  8:50       ` Ludovic Courtès
2018-03-19 22:34   ` Ludovic Courtès
2018-03-20  0:52     ` Mark H Weaver
2018-03-20  8:56       ` Ludovic Courtès
2018-03-21  4:17         ` Mark H Weaver
2018-03-21  5:43   ` Mark H Weaver

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://guix.gnu.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87efkfblrq.fsf@netris.org \
    --to=mhw@netris.org \
    --cc=30820@debbugs.gnu.org \
    --cc=dannym@scratchpost.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).