From: Mark H Weaver
Subject: Re: Trustworthiness of build farms (was Re: CDN performance)
Date: Sat, 05 Jan 2019 12:16:25 -0500
Message-ID: <87pntbkosb.fsf@netris.org>
References: <87va34ywci.fsf@ITSx01.pdp10.guru>
In-Reply-To: <87va34ywci.fsf@ITSx01.pdp10.guru> (Jeremiah's message of
 "Fri, 04 Jan 2019 20:57:33 +0000")
List-Id: "Development of GNU Guix and the GNU System distribution."
To: Jeremiah@pdp10.guru
Cc: guix-devel@gnu.org

Hi Jeremiah,

Jeremiah@pdp10.guru writes:

>>>> To truly solve that problem, we need bug-free compilers.
>>> Impossible for all but the simplest of languages, as the complexity
>>> of implementing a compiler/assembler/interpreter is ln(c) + a, but
>>> the complexity of implementing a bug-free
>>> compiler/assembler/interpreter is (e^c)! - a, where a is the
>>> complexity cost of supporting its host architecture.
>>
>> Where are you getting those complexity expressions from?
> Approximation of developer effort spent on single-pass workflows and
> bug-free libraries in the State of Michigan Welfare Eligibility
> System, extracted from its ClearCase commit history.  (Thank god, I
> finally got them to convert to git after 3 years of wailing and
> gnashing of teeth.)
>
>> Can you cite references to back them up?
> I can cite the numbers I used to generate those approximate
> complexity equations.
>
>> If not, can you explain how you arrived at them?
> Simple graphing and curve matching of collected data.

In other words, it's anecdotal.  This is analogous to saying: "I once
saw someone adding numbers A and B by counting beads, and observed
that the time required was |A| + |B|.  From this I will conclude that
the complexity of adding numbers is |A| + |B|, and therefore that
adding numbers is impossible for all but the smallest numbers."

Obviously, the flaw in this argument is that there exist much more
efficient ways to add numbers than counting beads.  A claim of
complexity doesn't mean anything unless you can prove that it's a
lower bound.
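To make the contrast concrete, here is a rough sketch in OCaml (purely
illustrative; the representations are the simplest ones I could think
of, not anything taken from CompCert or CakeML):

  (* Unary addition, i.e. "counting beads": a number N is a list of N
     unit beads, so the work grows with the magnitudes themselves. *)
  let unary_add a b = a @ b

  (* Positional addition: numbers as little-endian lists of base-10
     digits.  The work grows with the number of digits, i.e. with
     log(max(A, B)) rather than with A + B. *)
  let positional_add a b =
    let rec go a b carry =
      match a, b with
      | [], [] -> if carry = 0 then [] else [carry]
      | [], d :: rest | d :: rest, [] ->
          let s = d + carry in
          (s mod 10) :: go rest [] (s / 10)
      | d1 :: r1, d2 :: r2 ->
          let s = d1 + d2 + carry in
          (s mod 10) :: go r1 r2 (s / 10)
    in
    go a b 0

Adding two 100-digit numbers with positional_add takes on the order of
100 steps; with bead counting it would take on the order of 10^100.
The observed cost of one method tells you nothing about the inherent
cost of the task.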
There's a reason that I followed my call for bug-free compilers with
the claim: "In practice, this requires provably correct compilers."  I
wrote that because I know that it's the only known practical way to do
it for non-trivial languages.

>> If you're referring to the bugs found in CompCert, the ones I know
>> about were actually bugs in the unverified parts of the toolchain.
> I was referring to the bugs in the verified parts in regards to C's
> undefined behavior.

Interesting.  I hadn't heard of those.  Can you cite the details?

>> In the past, its frontend was unverified, and several bugs were
>> found there.  Even today, it produces assembly code, and depends on
>> an unverified assembler and linker.
> Which, depending on whether the compiler has been formally proven to
> only output valid assembly (no 33-bit offset loads), may be less of a
> concern.
> (Let us just hope the assembler doesn't arbitrarily choose 8-bit
> immediates for performance reasons when given a 32-bit int.)

Right, which shows the problems with depending on unverified
components, but CompCert is no longer the state of the art in this
regard.  In contrast, the CakeML compiler not only produces raw
machine code; its proofs also apply to the raw machine code of the
CakeML compiler itself.  As a result, the only trusted specifications
are those of the source language and of the underlying machine code
behavior (for the subset of instructions that are actually generated).

>> Bugs can also exist in the specifications themselves, of course.
> The most important class of bugs indeed.
>
>> I'm not sure what "and human model first behavior" means, but if
>> you mean that the semantics of languages should strive to match
>> what a human would naturally expect, avoiding surprising or
>> unintuitive behavior, I certainly agree.
> That is exactly what I mean.  The problem is that ultimately, to do
> useful things, one has to support things that violate that in very
> terrible ways (like supporting IEEE floating point,

I agree that IEEE floating point (and more generally inexact
arithmetic) is a problem, but a great many useful programs can be
written without using them, including most (if not all) of the
programs that are most important for us to trust.  Operating systems,
compilers, editors, and communication tools such as chat clients and
mail readers have no need for inexact arithmetic.  These are the
programs I care most about.

> disabling garbage collection, etc.).

Why do you include garbage collection in this list?  I consider
garbage collection to be a _requirement_ for human model first
behavior.  Do you consider it a hindrance?  Also, multiple verified
garbage collectors already exist, e.g. those used in the runtimes of
CakeML and Jitawa.

>> I consider Standard ML, and some subsets of Scheme and Lisp, to be
>> such languages.
> They certainly do have wonderful properties, but I wouldn't say they
> qualify as matching the human model first behavior requirement (easy
> to verify by doing the 50 non-programmers first-hand test).

I think it's too ambitious to target non-programmers, because they do
not have the required skills to understand the behavior of complex
programs, regardless of what language is used.  They could perhaps
understand toy programs, but that's not useful for helping them
evaluate the trustworthiness of the computing tools they actually
use, which is what I'm interested in.

Anyway, what language(s) would you consider to be superior to ML or a
subset of Scheme, based on the goal of human model first behavior?
Would you consider your low-level macro languages to be better suited
to this task?

I might agree that such simple assembly-like languages are more
straightforward for understanding the behavior of toy programs, where
one can keep all of the moving parts in one's head.  However, I would
argue that those desirable properties of assembly-like languages do
not scale to larger programs.

      Regards,
        Mark
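P.S.  To make the point about inexact arithmetic concrete, here is a
rough OCaml sketch (again purely illustrative; the rational type is a
minimal stand-in, not any particular library):

  (* IEEE floating point violates the human model: the decimal
     literals 0.1 and 0.2 have no exact binary representation, so
     their sum is not the float written as 0.3. *)
  let () =
    assert (0.1 +. 0.2 <> 0.3);
    Printf.printf "%.17f\n" (0.1 +. 0.2)  (* 0.30000000000000004 *)

  (* Exact rational arithmetic behaves as a human would expect.
     Minimal sketch: a rational as a numerator/denominator pair. *)
  let rec gcd a b = if b = 0 then a else gcd b (a mod b)

  let add (n1, d1) (n2, d2) =
    let n = (n1 * d2) + (n2 * d1) and d = d1 * d2 in
    let g = gcd n d in
    (n / g, d / g)

  let () =
    (* 1/10 + 2/10 = 3/10, exactly. *)
    assert (add (1, 10) (2, 10) = (3, 10))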