From: Mark H Weaver
Subject: Re: Trustworthiness of build farms (was Re: CDN performance)
Date: Sat, 05 Jan 2019 12:16:25 -0500
Message-ID: <87pntbkosb.fsf@netris.org>
References: <87va34ywci.fsf@ITSx01.pdp10.guru>
In-Reply-To: <87va34ywci.fsf@ITSx01.pdp10.guru> (Jeremiah's message of
 "Fri, 04 Jan 2019 20:57:33 +0000")
List-Id: "Development of GNU Guix and the GNU System distribution."
To: Jeremiah@pdp10.guru
Cc: guix-devel@gnu.org

Hi Jeremiah,

Jeremiah@pdp10.guru writes:

>>>> To truly solve that problem, we need bug-free compilers.
>>> Impossible for all but the simplest of languages, as the complexity
>>> of implementing a compiler/assembler/interpreter is ln(c) + a, but
>>> the complexity of implementing a bug-free
>>> compiler/assembler/interpreter is (e^c)! - a, where a is the
>>> complexity cost of supporting its host architecture.
>>
>> Where are you getting those complexity expressions from?
> Approximation of developer effort spent on single-pass workflows and
> bug-free libraries in the State of Michigan Welfare Eligibility
> System, extracted from its ClearCase commit history.  (Thank god, I
> finally got them to convert to git after 3 years of wailing and
> gnashing of teeth.)
>
>> Can you cite references to back them up?
> I can cite the numbers I used to generate those approximate
> complexity equations.
>
>> If not, can you explain how you arrived at them?
> Simple graphing and curve matching of collected data.

In other words, it's anecdotal.  This is analogous to saying: "I once
saw someone adding numbers A and B by counting beads, and observed
that the time required was |A| + |B|.  From this I will conclude that
the complexity of adding numbers is |A| + |B|, and therefore that
adding numbers is impossible for all but the smallest numbers."

Obviously, the flaw in this argument is that there exist much more
efficient ways to add numbers than counting beads.  A claim of
complexity doesn't mean anything unless you can prove that it's a
lower bound.
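To make the contrast concrete, here is a rough sketch in OCaml (purely
illustrative; the representations are the simplest ones I could think
of, not anything taken from CompCert or CakeML):

  (* Unary addition, i.e. "counting beads": a number N is a list of N
     unit beads, so the work grows with the magnitudes themselves. *)
  let unary_add a b = a @ b

  (* Positional addition: numbers as little-endian lists of base-10
     digits.  The work grows with the number of digits, i.e. with
     log(max(A, B)) rather than with A + B. *)
  let positional_add a b =
    let rec go a b carry =
      match a, b with
      | [], [] -> if carry = 0 then [] else [carry]
      | [], d :: rest | d :: rest, [] ->
          let s = d + carry in
          (s mod 10) :: go rest [] (s / 10)
      | d1 :: r1, d2 :: r2 ->
          let s = d1 + d2 + carry in
          (s mod 10) :: go r1 r2 (s / 10)
    in
    go a b 0

Adding two 100-digit numbers with positional_add takes on the order of
100 steps; with bead counting it would take on the order of 10^100.
The observed cost of one method tells you nothing about the inherent
cost of the task.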
There's a reason that I followed my call for bug-free compilers with
the claim: "In practice, this requires provably correct compilers."  I
wrote that because I know that it's the only known practical way to do
it for non-trivial languages.

>> If you're referring to the bugs found in CompCert, the ones I know
>> about were actually bugs in the unverified parts of the toolchain.
> I was referring to the bugs in the verified parts in regards to C's
> undefined behavior.

Interesting.  I hadn't heard of those.  Can you cite the details?

>> In the past, its frontend was unverified, and several bugs were
>> found there.  Even today, it produces assembly code, and depends on
>> an unverified assembler and linker.
> Which, depending on whether the compiler has been formally proven to
> only output valid assembly (no 33-bit offset loads), may be less of a
> concern.
> (Let us just hope the assembler doesn't arbitrarily choose 8-bit
> immediates for performance reasons when given a 32-bit int.)

Right, which shows the problems with depending on unverified
components, but CompCert is no longer the state of the art in this
regard.  In contrast, the CakeML compiler not only produces raw
machine code; its proofs also apply to the raw machine code of the
CakeML compiler itself.  As a result, the only trusted specifications
are those of the source language and of the underlying machine code
behavior (for the subset of instructions that are actually generated).

>> Bugs can also exist in the specifications themselves, of course.
> The most important class of bugs indeed.
>
>> I'm not sure what "and human model first behavior" means, but if
>> you mean that the semantics of languages should strive to match
>> what a human would naturally expect, avoiding surprising or
>> unintuitive behavior, I certainly agree.
> That is exactly what I mean.  The problem is that ultimately, to do
> useful things, one has to support things that violate that in very
> terrible ways (like supporting IEEE floating point,

I agree that IEEE floating point (and more generally inexact
arithmetic) is a problem, but a great many useful programs can be
written without using them, including most (if not all) of the
programs that are most important for us to trust.  Operating systems,
compilers, editors, and communication tools such as chat clients and
mail readers have no need for inexact arithmetic.  These are the
programs I care most about.

> disabling garbage collection, etc.).

Why do you include garbage collection in this list?  I consider
garbage collection to be a _requirement_ for human model first
behavior.  Do you consider it a hindrance?  Also, multiple verified
garbage collectors already exist, e.g. those used in the runtimes of
CakeML and Jitawa.

>> I consider Standard ML, and some subsets of Scheme and Lisp, to be
>> such languages.
> They certainly do have wonderful properties, but I wouldn't say they
> qualify as matching the human model first behavior requirement (easy
> to verify by doing the 50 non-programmers first-hand test).

I think it's too ambitious to target non-programmers, because they do
not have the required skills to understand the behavior of complex
programs, regardless of what language is used.  They could perhaps
understand toy programs, but that's not useful for helping them
evaluate the trustworthiness of the computing tools they actually
use, which is what I'm interested in.

Anyway, what language(s) would you consider to be superior to ML or a
subset of Scheme, based on the goal of human model first behavior?
Would you consider your low-level macro languages to be better suited
to this task?

I might agree that such simple assembly-like languages are more
straightforward for understanding the behavior of toy programs, where
one can keep all of the moving parts in one's head.  However, I would
argue that those desirable properties of assembly-like languages do
not scale to larger programs.

      Regards,
        Mark
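P.S.  To make the point about inexact arithmetic concrete, here is a
rough OCaml sketch (again purely illustrative; the rational type is a
minimal stand-in, not any particular library):

  (* IEEE floating point violates the human model: the decimal
     literals 0.1 and 0.2 have no exact binary representation, so
     their sum is not the float written as 0.3. *)
  let () =
    assert (0.1 +. 0.2 <> 0.3);
    Printf.printf "%.17f\n" (0.1 +. 0.2)  (* 0.30000000000000004 *)

  (* Exact rational arithmetic behaves as a human would expect.
     Minimal sketch: a rational as a numerator/denominator pair. *)
  let rec gcd a b = if b = 0 then a else gcd b (a mod b)

  let add (n1, d1) (n2, d2) =
    let n = (n1 * d2) + (n2 * d1) and d = d1 * d2 in
    let g = gcd n d in
    (n / g, d / g)

  let () =
    (* 1/10 + 2/10 = 3/10, exactly. *)
    assert (add (1, 10) (2, 10) = (3, 10))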