From mboxrd@z Thu Jan 1 00:00:00 1970 From: Mark H Weaver Subject: bug#30820: Chunked store references in compiled code break grafting (again) Date: Mon, 19 Mar 2018 21:04:09 -0400 Message-ID: <87efkfblrq.fsf@netris.org> References: <87o9jq7j7r.fsf@gnu.org> <87muz3dgy1.fsf@netris.org> <20180319223402.1ba0a369@scratchpost.org> Mime-Version: 1.0 Content-Type: text/plain Return-path: Received: from eggs.gnu.org ([2001:4830:134:3::10]:50938) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ey5j4-0000V9-4c for bug-guix@gnu.org; Mon, 19 Mar 2018 21:06:07 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ey5j0-0007nS-UK for bug-guix@gnu.org; Mon, 19 Mar 2018 21:06:06 -0400 Received: from debbugs.gnu.org ([208.118.235.43]:34515) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1ey5j0-0007nA-Pd for bug-guix@gnu.org; Mon, 19 Mar 2018 21:06:02 -0400 Sender: "Debbugs-submit" Resent-Message-ID: In-Reply-To: <20180319223402.1ba0a369@scratchpost.org> (Danny Milosavljevic's message of "Mon, 19 Mar 2018 22:34:02 +0100") List-Id: Bug reports for GNU Guix List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-guix-bounces+gcggb-bug-guix=m.gmane.org@gnu.org Sender: "bug-Guix" To: Danny Milosavljevic Cc: 30820@debbugs.gnu.org Danny Milosavljevic writes: > On Mon, 19 Mar 2018 15:05:26 -0400 > Mark H Weaver wrote: > >> I think that we should generalize our reference scanning and grafting >> code to support store references broken into pieces, as long as each >> piece containing part of the hash is at least 8 bytes long. > > I don't think it's possible to get that to work reliably over all possible > optimizations. I mean why 8 Byte? That's probably because of the > 8 Byte registers on x86_64. > > What would happen on ARM? (32 bit registers)? > > And on MIPS (immediates only smaller than 32 bits)? It wouldn't make sense to do this kind of optimization with 4-byte chunks, because the overhead for the instructions would be too large to justify it. In any case, I've not seen any reports of any compiler generating code like this with 4-byte chunks. > What if gcc finds some repetition in the hash and decides not to load > the register again the second time? The probability of that happening with 8-byte chunks is on the order of 1 in 10^14, even if the GCC developers implemented such a thing. > I think the best way - since we have control over the entire toolchain anyway - > is to make gcc not do the data inlining in the first place. > > As far as I understand the gcc patches work after all. Ludo just made > a mistake testing it. > > Although when we do this patching of gcc we should add a test to gcc > that makes sure that the data inlining is indeed not there anymore > for store paths. > >> Here's my preliminary proposal: >> >> (1) The reference scanner should recognize any 8-byte substring of a >> hash as a valid reference to that hash. >> >> (2) To enable reliable grafting of chunked references, we should impose >> the following new restrictions: (a) the store prefix must be at >> least 6 bytes, (b) grafting can change only the hash, not the >> readable part of the store name, and (c) the readable part of the >> store name must be at least 6 bytes. >> >> (3) The grafter should recognize and replace any 8-byte subsequence of >> the absolute store file name. >> >> The rationale for the restrictions is to ensure that any byte that needs >> to be modified by the grafter should be part of an 8-byte substring of >> the absolute store file name. This requires that there be at least 7 >> bytes of known text before the first changed byte and after the last >> changed byte. This is needed to provide a reasonable upper bound on the >> probability of grafting a matching sequence of bytes that is not a store >> reference. > > I think that is a good way to get the risk down - but it will not actually > fix the problem for good. Nothing that we do will ever fix this problem for good. Don't pretend that patching GCC would fix the problem for good. There are problems in other software as well (e.g. in JAR manifests), and we already patched GCC once, and it broke some time later without anyone noticing. > Also, complexity like the above makes the reference scanner more brittle and > it's easier for bugs to hide in it (and it's slower). I think it could be implemented about as fast in practice, although it would require more memory. The hash table would be bigger, containing all 8-byte substrings of the hashes. As for bugs in the reference scanner: it's still simple enough that it could be formally proved correct without much difficulty. Mark