unofficial mirror of guix-devel@gnu.org 
* Trustworthiness of build farms (was Re: CDN performance)
  2018-12-19 12:40             ` Giovanni Biscuolo
@ 2018-12-21  0:23               ` Mark H Weaver
  0 siblings, 0 replies; 11+ messages in thread
From: Mark H Weaver @ 2018-12-21  0:23 UTC (permalink / raw)
  To: Giovanni Biscuolo; +Cc: guix-devel, 33600

Hi Giovanni,

Giovanni Biscuolo <g@xelera.eu> writes:

> Mark H Weaver <mhw@netris.org> writes:
>
>> Giovanni Biscuolo <g@xelera.eu> writes:
>>> with a solid infrastructure of "scientifically" trustable build farms,
>>> there are no reasons not to trust substitute servers (this implies
>>> working towards 100% reproducibility of GuixSD)
>>
>> What does "scientifically trustable" mean?
>
> I'm still not able to elaborate on that (working on it, a sort of
> self-research-hack project), but I'm referring to this message related
> to reduced bootstrap tarballs:
>
>   https://lists.gnu.org/archive/html/guix-devel/2018-11/msg00347.html
>
> and the related reply by Jeremiah (unfortunately I cannot find it in
> the archives; Message-ID: <877eh81tm4.fsf@ITSx01.pdp10.guru>)
>
> in particular Jeremiah replied this:
>
>> so, if I don't get it wrong, every skilled engineer will be able to
>> build an "almost analog" (zero bits of software preloaded) computing
>> machine and use stage0/mes [1] as the "metre" [2] to calibrate all
>> other computing machines (thanks to reproducible builds)?
>
> well, I haven't thought of it in those terms but yes I guess that is one
> of the properties of the plan.
>
>
> and
>
>> so, having the scientific proof that binary conforms to source, there
>> will be no need to trust (the untrustable)
>
> Well, that is what someone else could do with it, but it is not a
> direct goal of the work.
>
> maybe a more correct definition of the above "scientific proof" should be
> "mathematical proof"

I agree that a mathematical proof is what we should be aiming for, and
the only kind of proof that I could trust in this scenario.

However, it's important to note several caveats:

* A mathematical proof showing that the binary conforms to the source
  requires a proof of correctness of the language implementation, which
  in turn requires formal semantics for both the source language and the
  underlying machine code.  As far as I know, the current
  bootstrappable.org effort does not include anything like this.
  Existing projects that provide this include CakeML and Jitawa.  (A
  sketch of the theorem such a proof establishes follows this list.)

* One must assume that the hardware behaves according to its formal
  specification.  The veracity of this assumption is not something we
  can currently verify, and even if we could, it would be invalidated if
  someone gained physical access to the machine and modified it.

* The hardware initialization program (e.g. coreboot) and the kernel
  used to perform the bootstrap must still be trusted, and unless I'm
  mistaken, the bootstrappable.org effort does not currently address
  these issues.

* The software running on the substitute servers could be compromised by
  stealing SSH keys from someone with root access.

* If the private signing key of the substitute server can be stolen,
  e.g. by gaining physical access to the machine for a short time, then
  a man-in-the-middle can deliver to users compromised binaries that
  appear to come from the substitute server itself.

* Not only the substitute server, but also all of its build slaves, must
  be trusted as well.
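
For concreteness, the theorem such a proof establishes has the standard
semantic-preservation shape used by verified compilers like CompCert and
CakeML.  The following is a sketch of the general form (my rendering,
not a verbatim theorem from either project), given formal semantics for
the source language S and the target machine M:

  \forall s\, b.\quad
    \mathit{compile}(s) = \mathrm{OK}(b)
    \;\Longrightarrow\;
    \mathit{behaviors}_M(b) \subseteq \mathit{behaviors}_S(s)

That is, every observable behavior of the generated code b is one the
source program s allows, so no exploitable divergence can exist in b
without already existing in s.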

So, while I certainly agree that it will be a *major* improvement to
avoid the need to blindly trust precompiled bootstrap binaries, we
should be clear that the current efforts fall far short of a proof, and
there still remain several valid reasons not to place one's trust in
substitute servers.

Does that make sense?

    Regards,
      Mark


* RE: Trustworthiness of build farms (was Re: CDN performance)
@ 2018-12-22 17:23 Jeremiah
  2018-12-23 16:37 ` Mark H Weaver
  0 siblings, 1 reply; 11+ messages in thread
From: Jeremiah @ 2018-12-22 17:23 UTC (permalink / raw)
  To: guix-devel

> I agree that a mathematical proof is what we should be aiming for, and
> the only kind of proof that I could trust in, in this scenario.

A formal proof would be just one piece in building layers of trust,
each of them independent and reinforcing one another like the layers of
kevlar in a bulletproof vest; even if some of the layers break, the
bullet doesn't get through to do damage.

> However, it's important to note several caveats:
Of course, we always miss things.

> * A mathematical proof showing that the binary conforms to the source
>   requires a proof of correctness of the language implementation, which
>   in turn requires formal semantics for both the source language and the
>   underlying machine code.  As far as I know, the current
>   bootstrappable.org effort does not include anything like this.
>   Existing projects that provide this include CakeML and Jitawa.
Well hex2's entire language is:
# line comment - ignore all text until lf
; line comment - ignore all text until lf
[0-F, 0-F] - The 2 chars are nibbles combined into a byte (Big Endian)
:label - The absolute address of the label used by pointers
!label{>label} - the 8bit relative address to label
@label{>label} - the 16bit relative address to label
$label - the 16bit absolute address of label
%label{>label} - the 32bit relative address to label
&label - the 32bit absolute address of label

The optional >label form (e.g. !label>label) changes the displacement
base from the current IP to the IP of the second label (i.e., it
calculates the distance between the two labels) and outputs the value
in the format indicated by the leading char (!, @ or %).
hex2 also explicitly requires the architecture (which determines the
output format and displacement calculation) and the base address (i.e.,
the address at which the IP starts) to be specified.
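
As an illustration, here is a hypothetical hex2 fragment written against
the grammar above, assuming x86 and a known base address (EB, 90, and B8
are the x86 opcodes for a short relative jump, NOP, and loading EAX with
a 32-bit immediate; this sketch is mine, not code from stage0):

    :_start
    EB !_end        ; jump forward using the 8-bit relative address of _end
    90 90           ; two NOP bytes that the jump skips over
    :_end
    B8 &_start      ; load the 32-bit absolute address of _start into EAX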

The M1-macro language is even smaller:
# line comment - ignore all text until lf
; line comment - ignore all text until lf
DEFINE STRING1 STRING2 - Replace all STRING1 with STRING2
!number - the 8bit value of number in hex form
@number - the 16bit value of number in hex form
%number - the 32bit value of number in hex form
{:!@$%&}string - do not change and just output
"RAW STRING" - Convert to Hex (Big Endian)
'HEX' - Do not change and just output
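
And a matching hypothetical M1 sketch of the same fragment: the DEFINEs
give the opcodes names, the :, !, and & forms pass through unchanged,
and M1's output is exactly the hex2 input shown above:

    DEFINE JMP8 EB
    DEFINE NOP 90
    DEFINE LOADEAX B8

    :_start
    JMP8 !_end      ; expands to: EB !_end
    NOP NOP         ; expands to: 90 90
    :_end
    LOADEAX &_start ; expands to: B8 &_start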

So the specification for correctness is defined by the languages
themselves, but you are correct that no one has yet picked up the task
of making formally verified versions of the tools.


> * One must assume that the hardware behaves according to its formal
>   specification.  The veracity of this assumption is not something we
>   can currently verify, and even if we could, it would be invalidated if
>   someone gained physical access to the machine and modified it.
Actually no: I am expecting arbitrary hardware platforms to cross-verify
each other, so that provided one system is uncompromised or trusted, the
rest can be validated.
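
As a minimal sketch of that cross-verification (my illustration, not
part of stage0): run the same bootstrap on two independent platforms,
then confirm the resulting artifacts are identical bit for bit.

    /* cross-check: compare two build artifacts byte by byte; any
       divergence means at least one platform is buggy or compromised */
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s a.bin b.bin\n", argv[0]);
            return 2;
        }
        FILE *a = fopen(argv[1], "rb");
        FILE *b = fopen(argv[2], "rb");
        if (!a || !b) {
            perror("fopen");
            return 2;
        }
        long off = 0;
        int ca, cb;
        do {
            ca = fgetc(a);
            cb = fgetc(b);
            if (ca != cb) {
                printf("artifacts differ at byte %ld\n", off);
                return 1;
            }
            off++;
        } while (ca != EOF);    /* both hit EOF together: identical */
        printf("artifacts identical (%ld bytes)\n", off);
        return 0;
    }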


> * The hardware initialization program (e.g. coreboot) and the kernel
>   used to perform the bootstrap must still be trusted, and unless I'm
>   mistaken, the bootstrappable.org effort does not currently address
>   these issues.
Stage0 assumes no operating system or firmware and runs on bare metal
(literally TTL when done).  Mescc-tools is just a port of the
specification to Linux, allowing developers to leverage it and
simplifying the later task of finishing.


> * The software running on the substitute servers could be compromised by
>   stealing SSH keys from someone with root access.
Yes, that is true, but I don't want you to trust the substitute
servers.  They exist for convenience, not security.


> * If the private signing key of the substitute server can be stolen,
>   e.g. by gaining physical access to the machine for a short time, then
>   a man-in-the-middle can deliver to users compromised binaries that
>   appear to come from the substitute server itself.
I believe that is a duplicate of the previous point, but yes, you are
correct that it is a risk of having a single point of trust.  That is
why stage0 by design cannot have a single point of trust: the bootstrap
steps can be done on arbitrary hardware and rewritten from scratch with
minimal effort.  Thus one need only review the source code (which
focuses on clarity and understandability; any issue with it is a bug
that should be reported).


> * Not only the substitute server, but also all of its build slaves, must
>   be trusted as well.
Assume all machines could be compromised; then design under the
assumption that any of them can be compromised at any time, but not all
of them all of the time.  That is the solution to trust.


> So, while I certainly agree that it will be a *major* improvement to
> avoid the need to blindly trust precompiled bootstrap binaries
In terms of size that is true, but the number of people who understand
the seed pieces has unfortunately shrunk, and that issue needs to be
addressed.

> we should be clear that the current efforts fall far short of a proof,
Absolutely agree; at this point we only have the specification against
which the proof must be written, and someone will need to write a
formally verified version to check that our roots of trust (which are
approximations of the specification, limited by the constraints of time
and human effort) have not missed a subtle error.

> and there still remain several valid reasons not to place one's trust in
> substitute servers.
Personally, I believe that in terms of security and trust in the
absolute senses of the words, you are beyond a shadow of a doubt
correct, but the substitutes have never been about trust or security;
they are about convenience, which many Guix users want.  That doesn't
mean there are not things we should start doing to make the task of
compromising that trust harder.

For example:
Cryptographic port knocking for administration of servers, requiring
access to a physically separate hard token.

Cryptographically signed software whitelisting on servers, with the
signing keys kept on a separate system that is offline, and the binaries
being signed built from source on machines that have no internal state
and run only software explicitly specified and required for the build.
(A sketch of the core signature check appears after these examples.)

Random system spot checks and wipes.
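
As a hedged sketch of the signature check at the core of that
whitelisting, assuming libsodium's Ed25519 detached signatures (the
function name and surrounding policy are mine, not an existing tool):

    #include <sodium.h>

    /* verify a binary against its detached signature before allowing
       it to run; pk is the public key of the offline signing system */
    int binary_is_whitelisted(const unsigned char *binary,
                              unsigned long long len,
                              const unsigned char sig[crypto_sign_BYTES],
                              const unsigned char pk[crypto_sign_PUBLICKEYBYTES])
    {
        if (sodium_init() < 0)
            return 0;   /* library failure: refuse to trust anything */
        return crypto_sign_verify_detached(sig, binary, len, pk) == 0;
    }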

In short, anything that makes single-point compromises worthless needs
to be actively considered.

> Does that make sense?
Yes and hopefully my perspective makes sense as well.

-Jeremiah


* Re: Trustworthiness of build farms (was Re: CDN performance)
  2018-12-22 17:23 Jeremiah
@ 2018-12-23 16:37 ` Mark H Weaver
  0 siblings, 0 replies; 11+ messages in thread
From: Mark H Weaver @ 2018-12-23 16:37 UTC (permalink / raw)
  To: Jeremiah; +Cc: guix-devel

Hi Jeremiah,

If you could add an "In-Reply-To:" header to your responses, that would
be very helpful.  It's easy to add it manually if needed: just copy the
"Message-ID:" header from the original message and replace "Message-ID:"
with "In-Reply-To:".  As is, it's very difficult for me to keep track of
any conversation with you.
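
For example, using the Message-ID that appears earlier in this thread:
the original message carries

  Message-ID: <877eh81tm4.fsf@ITSx01.pdp10.guru>

so a reply to it would add

  In-Reply-To: <877eh81tm4.fsf@ITSx01.pdp10.guru>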

Jeremiah@pdp10.guru writes:

>> I agree that a mathematical proof is what we should be aiming for, and
>> the only kind of proof that I could trust in, in this scenario.
>
> A formal proof would be just one piece in building layers of trust,
> each of them independent and reinforcing one another like the layers of
> kevlar in a bulletproof vest; even if some of the layers break, the
> bullet doesn't get through to do damage.

Agreed.

>> However, it's important to note several caveats:
> Of course, we always miss things.
>
>> * A mathematical proof showing that the binary conforms to the source
>>   requires a proof of correctness of the language implementation, which
>>   in turn requires formal semantics for both the source language and the
>>   underlying machine code.  As far as I know, the current
>>   bootstrappable.org effort does not include anything like this.
>>   Existing projects that provide this include CakeML and Jitawa.
> Well hex2's entire language is:
[...]
> The M1-macro language is even smaller:
[...]

I'm not worried about those languages.  Very little code is written in
them anyway, only a small part of the early bootstrap.  My concern is
the correspondence between the source code and machine code for the
majority of the operating system and applications.

It's important to note that even a relatively obscure bug in the
compiler is enough to create an exploitable bug in the machine code of
compiled programs that's not present in the source code.  Such compiler
bugs and the resulting exploits could be systematically searched for by
well-resourced attackers.
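
To illustrate the pattern, consider a hedged, hypothetical example (not
a real, specific GCC or Clang bug):

    #include <string.h>

    static char buf[16];

    void copy_input(const char *src, unsigned long len)
    {
        if (len > sizeof buf)   /* the bounds check is correct in the
                                   source ... */
            return;
        memcpy(buf, src, len);  /* ... but if a buggy optimizer ever
                                   mis-folded the comparison above to
                                   "always false", the check would
                                   vanish from the machine code and
                                   this copy would become exploitable */
    }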

So, if we want to *truly* solve the problem of exploitable bugs existing
only in the machine code and not in the corresponding source, it is not
enough to eliminate the possibility of code deliberately inserted in our
toolchain that inserts trojan horses in other software.

To truly solve that problem, we need bug-free compilers.  In practice,
this requires provably correct compilers.

Does that make sense?

       Mark


* Re: Trustworthiness of build farms (was Re: CDN performance)
@ 2018-12-26  2:29 Jeremiah
  2018-12-28  7:12 ` Mark H Weaver
  0 siblings, 1 reply; 11+ messages in thread
From: Jeremiah @ 2018-12-26  2:29 UTC (permalink / raw)
  To: mhw; +Cc: guix-devel

> If you could add an "In-Reply-To:" header to your responses, that would
> be very helpful.  It's easy to add it manually if needed: just copy the
> "Message-ID:" header from the original message and replace "Message-ID:"
> with "In-Reply-To:".  As is, it's very difficult for me to keep track of
> any conversation with you.

Sorry about that; it was not something I generally even consider, as
Emacs handles that bit of tracking for me, but I will try to add it in
the future.

> I'm not worried about those languages.  Very little code is written in
> them anyway, only a small part of the early bootstrap.
Yet it is also the point where trusting-trust attacks can most
effectively be inserted.  Hence more concern, and less trust, needs to
be placed there.


> My concern is the correspondence between the source code and machine
> code for the majority of the operating system and applications.
That seems more like a feature request for GCC and Guile than a matter
of the bootstrap or the trustworthiness of the build farms.

> It's important to note that even a relatively obscure bug in the
> compiler is enough to create an exploitable bug in the machine code of
> compiled programs that's not present in the source code.
Absolutely, and we already have dozens of living examples, thanks to
performance optimizations in GCC and Clang that exploit undefined
behavior in the C language and in some cases violate the spec for the
sake of compatibility.
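
One such living example is the pattern behind the 2009 Linux TUN/TAP
vulnerability; the following is my sketch of the pattern, not the
actual kernel code:

    struct tun_struct { int flags; };

    int tun_get_flags(struct tun_struct *tun)
    {
        int flags = tun->flags;  /* undefined behavior if tun == NULL,
                                    so the compiler may assume tun is
                                    non-null ... */
        if (tun == NULL)         /* ... and delete this check from the
                                    generated machine code */
            return -1;
        return flags;
    }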

> Such compiler bugs and the resulting exploits could be systematically
> searched for by well-resourced attackers.
And they are already actively searching for and exploiting them.

> So, if we want to *truly* solve the problem of exploitable bugs existing
> only in the machine code and not in the corresponding source, it is not
> enough to eliminate the possibility of code deliberately inserted in our
> toolchain that inserts trojan horses in other software.
Very true; we need formally defined languages which do not have
undefined behavior, and rigorous tests that prevent optimizations from
violating the spec.  In essence, a series of acid tests that ensure the
resulting mental model always exactly matches the computing model that
the compilers will be based upon.


> To truly solve that problem, we need bug-free compilers.
Impossible for all but the simplest of languages, as the complexity of
implementing a compiler/assembler/interpreter is ln(c)+a but the
complexity of implementing a bug-free compiler/assembler/interpreter is
(e^(c))! - a, where a is the complexity cost of supporting its host
architecture.


> In practice, this requires provably correct compilers.
Which in practice turn out *NOT* to be bug-free, nor complete with
regard to the standard specification.  Now, don't get me wrong;
provably correct compilers are a step in the right direction, but the
real solution is to first generate simplified languages that don't have
undefined behavior and human model first behavior.

> Does that make sense?
Absolutely; it is certainly possible to do, but it is extremely
human-effort-intensive, and I don't see anyone willing to throw 2+
years of human effort at the problem outside of non-free businesses
like CompCert.

I'd love to see someone do it; I'd even throw a few dollars into a pot
to fund it.  But it is so low on my list of priorities that I'm not
going to be touching it in the next two decades...

-Jeremiah


* Re: Trustworthiness of build farms (was Re: CDN performance)
  2018-12-26  2:29 Trustworthiness of build farms (was Re: CDN performance) Jeremiah
@ 2018-12-28  7:12 ` Mark H Weaver
  0 siblings, 0 replies; 11+ messages in thread
From: Mark H Weaver @ 2018-12-28  7:12 UTC (permalink / raw)
  To: Jeremiah; +Cc: guix-devel

Hi Jeremiah,

Jeremiah@pdp10.guru writes:

>> To truly solve that problem, we need bug-free compilers.
> Impossible for all but the simplest of languages, as the complexity of
> implementing a compiler/assembler/interpreter is ln(c)+a but the
> complexity of implementing a bug-free compiler/assembler/interpreter
> is (e^(c))! - a, where a is the complexity cost of supporting its
> host architecture.

Where are you getting those complexity expressions from?  Can you cite
references to back them up?  If not, can you explain how you arrived at
them?

What is 'c'?

>> In practice, this requires provably correct compilers.
> Which in practice turn out *NOT* to be bug-free, nor complete with
> regard to the standard specification.

If you're referring to the bugs found in CompCert, the ones I know about
were actually bugs in the unverified parts of the toolchain.  In the
past, its frontend was unverified, and several bugs were found there.
Even today, it produces assembly code, and depends on an unverified
assembler and linker.

Bugs can also exist in the specifications themselves, of course.

However, in the areas covered by the proofs, the results are quite
dramatic.  For example, in the paper "Finding and Understanding Bugs in
C Compilers" by Xuejun Yang, Yang Chen, Eric Eide, and John Regehr,

  https://web.stanford.edu/class/cs343/resources/finding-bugs-compilers.pdf

the authors state on page 6:

     The striking thing about our CompCert results is that the middle-
  end bugs we found in all other compilers are absent.  As of early
  2011, the under-development version of CompCert is the only compiler
  we have tested for which Csmith cannot find wrong-code errors.  This
  is not for lack of trying: we have devoted about six CPU-years to the
  task.  The apparent unbreakability of CompCert supports a strong
  argument that developing compiler optimizations within a proof
  framework, where safety checks are explicit and machine-checked, has
  tangible benefits for compiler users

> Now, don't get me wrong; provably correct
> compilers are a step in the right direction, but the real
> solution is to first generate simplified languages that don't have
> undefined behavior and human model first behavior.

I'm not sure what "and human model first behavior" means, but if you
mean that the semantics of languages should strive to match what a human
would naturally expect, avoiding surprising or unintuitive behavior, I
certainly agree.  I consider Standard ML, and some subsets of Scheme and
Lisp, to be such languages.

>> Does that make sense?
> Absolutely; it is certainly possible to do, but it is extremely
> human-effort-intensive, and I don't see anyone willing to throw 2+
> years of human effort at the problem outside of non-free businesses
> like CompCert.

If I understand correctly, what you don't expect to happen has already
been done.  CakeML is free software, and formally proven correct all the
way down to the machine code.  Moreover, it implements a language with
an exceptionally clear semantics and no undefined behavior.

Anyway, you've made it fairly clear that you're not interested in this
line of work, and that's fine.  I appreciate the work you're doing
nonetheless.

     Regards,
       Mark


* Re: Trustworthiness of build farms (was Re: CDN performance)
@ 2019-01-04 20:57 Jeremiah
  2019-01-05 17:16 ` Mark H Weaver
  0 siblings, 1 reply; 11+ messages in thread
From: Jeremiah @ 2019-01-04 20:57 UTC (permalink / raw)
  To: mhw; +Cc: guix-devel

> Where are you getting those complexity expressions from?
An approximation of developer effort spent on single-pass workflows and
bug-free libraries in the State of Michigan Welfare Eligibility System,
extracted from its ClearCase commit history.  (Thank god, I finally got
them to convert to git after 3 years of wailing and gnashing of teeth.)

> Can you cite references to back them up?
I can cite the numbers I used to generate those approximate complexity
equations.

> If not, can you explain how you arrived at them?
Simple graphing and curve matching of collected data.

> What is 'c'?
The number of 'features' a program had.
In short, I found that it is far easier for a developer to add features
to a program than it is to make sure the existing functionality is
bug-free.

> If you're referring to the bugs found in CompCert, the ones I know about
> were actually bugs in the unverified parts of the toolchain.
I was referring to the bugs in the verified parts with regard to C's
undefined behavior.

> In the past, its frontend was unverified, and several bugs were found there.
> Even today, it produces assembly code, and depends on an unverified
> assembler and linker.
If the compiler has been formally proven to only output valid assembly
(no 33-bit offset loads), then that is less of a concern.
(Let us just hope the assembler doesn't arbitrarily choose 8-bit
immediates for performance reasons when given a 32-bit int.)

> Bugs can also exist in the specifications themselves, of course.
The most important class of bugs, indeed.

> I'm not sure what "and human model first behavior" means, but if you
> mean that the semantics of languages should strive to match what a human
> would naturally expect, avoiding surprising or unintuitive behavior, I
> certainly agree.
That is exactly what I mean.  The problem is that ultimately, to do
useful things, one has to support things that violate that in very
terrible ways (like supporting IEEE floating point, disabling garbage
collection, etc.).

> I consider Standard ML, and some subsets of Scheme and
> Lisp, to be such languages
They certainly do have wonderful properties, but I wouldn't say they
qualify as matching the human-model-first-behavior requirement (easy to
verify firsthand by testing with 50 non-programmers).

Ask a room full of non-programmers to use Scheme to write a standard
web CRUD app using an SQLite3 database backend, to do XML parsing, or
to generate a PDF/Excel file.  It doesn't look pretty, and most just
give up on the offer of free money after a bit.


> If I understand correctly, what you don't expect to happen has already
> been done.  CakeML is free software, and formally proven correct all the
> way down to the machine code.  Moreover, it implements a language with
> an exceptionally clear semantics and no undefined behavior.
I don't deny that the CakeML team did an excellent job on their
formally verified backend and their type inferencer.  But most humans
are not programmers, and for most programmers, programming by proof
isn't in the realm of the intuitive.

> Anyway, you've made it fairly clear that you're not interested in this
> line of work, and that's fine.
It isn't so much that I'm not interested; rather, it is lower on my
list of priorities.

> I appreciate the work you're doing
> nonetheless.
As I appreciate the work you do as well.

-Jeremiah


* Re: Trustworthiness of build farms (was Re: CDN performance)
  2019-01-04 20:57 Jeremiah
@ 2019-01-05 17:16 ` Mark H Weaver
  0 siblings, 0 replies; 11+ messages in thread
From: Mark H Weaver @ 2019-01-05 17:16 UTC (permalink / raw)
  To: Jeremiah; +Cc: guix-devel

Hi Jeremiah,

Jeremiah@pdp10.guru writes:

>>>> To truly solve that problem, we need bug-free compilers.
>>> Impossible for all but the simplest of languages, as the complexity of
>>> implementing a compiler/assembler/interpreter is ln(c)+a but the
>>> complexity of implementing a bug-free compiler/assembler/interpreter
>>> is (e^(c))! - a, where a is the complexity cost of supporting its
>>> host architecture.
>>
>> Where are you getting those complexity expressions from?
> An approximation of developer effort spent on single-pass workflows and
> bug-free libraries in the State of Michigan Welfare Eligibility System,
> extracted from its ClearCase commit history.  (Thank god, I finally got
> them to convert to git after 3 years of wailing and gnashing of teeth.)
>
>> Can you cite references to back them up?
> I can cite the numbers I used to generate those approximate complexity
> equations.
>
>> If not, can you explain how you arrived at them?
> Simple graphing and curve matching of collected data.

In other words, it's anecdotal.  This is analogous to saying:

  "I once saw someone adding numbers A and B by counting beads, and
  observed that the time required was |A| + |B|.  From this I will
  conclude that the complexity of adding numbers is |A| + |B|, and
  therefore that adding numbers is impossible for all but the smallest
  numbers."

Obviously, the flaw in this argument is that there exist much more
efficient ways to add numbers than counting beads.  A claim of
complexity doesn't mean anything unless you can prove that it's a lower
bound.

There's a reason that I followed my call for bug-free compilers with the
claim: "In practice, this requires provably correct compilers."  I wrote
that because I know that it's the only known practical way to do it for
non-trivial languages.

>> If you're referring to the bugs found in CompCert, the ones I know about
>> were actually bugs in the unverified parts of the toolchain.
> I was referring to the bugs in the verified parts with regard to C's
> undefined behavior.

Interesting.  I hadn't heard of those.  Can you cite the details?

>> In the past, its frontend was unverified, and several bugs were found there.
>> Even today, it produces assembly code, and depends on an unverified
>> assembler and linker.
> If the compiler has been formally proven to only output valid assembly
> (no 33-bit offset loads), then that is less of a concern.
> (Let us just hope the assembler doesn't arbitrarily choose 8-bit
> immediates for performance reasons when given a 32-bit int.)

Right, which shows the problems with depending on unverified components,
but CompCert is no longer the state of the art in this regard.  In
contrast, the CakeML compiler produces raw machine code, and moreover
the proofs apply to the raw machine code of the CakeML compiler itself.
As a result, the only trusted specifications are those of the source
language and of the underlying machine code behavior (for the subset of
instructions that are actually generated).

>> Bugs can also exist in the specifications themselves, of course.
> The most important class of bugs indeed
>
>> I'm not sure what "and human model first behavior" means, but if you
>> mean that the semantics of languages should strive to match what a human
>> would naturally expect, avoiding surprising or unintuitive behavior, I
>> certainly agree.
> That is exactly what I mean.  The problem is that ultimately, to do
> useful things, one has to support things that violate that in very
> terrible ways (like supporting IEEE floating point,

I agree that IEEE floating point (and more generally inexact arithmetic)
is a problem, but a great many useful programs can be written without
using them, including most (if not all) of the programs that are most
important for us to trust.  Operating systems, compilers, editors, and
communication tools such as chat clients and mail readers, have no need
for inexact arithmetic.  These are the programs I care most about.

> disabling garbage collection, etc.).

Why do you include garbage collection in this list?  I consider garbage
collection to be a _requirement_ for human model first behavior.  Do you
consider it a hindrance?  Also, multiple verified garbage collectors
already exist, e.g. those used in the runtimes of CakeML and Jitawa.

>> I consider Standard ML, and some subsets of Scheme and
>> Lisp, to be such languages
> They certainly do have wonderful properties, but I wouldn't say they
> qualify as matching the human-model-first-behavior requirement (easy to
> verify firsthand by testing with 50 non-programmers).

I think it's too ambitious to target non-programmers, because they do
not have the required skills to understand the behavior of complex
programs, regardless of what language is used.  They could perhaps
understand toy programs, but that's not useful for helping them evaluate
the trustworthiness of the computing tools they actually use, which is
what I'm interested in.

Anyway, what language(s) would you consider to be superior to ML or a
subset of Scheme, based on the goal of human model first behavior?

Would you consider your low-level macro languages to be better suited to
this task?

I might agree that such simple assembly-like languages are more
straightforward for understanding the behavior of toy programs, where
one can keep all of the moving parts in one's head.  However, I would
argue that those desirable properties of assembly-like languages do not
scale to larger programs.

     Regards,
       Mark


* Re: Trustworthiness of build farms (was Re: CDN performance)
@ 2019-01-05 20:15 jeremiah
  2019-01-06 20:47 ` Chris Marusich
  0 siblings, 1 reply; 11+ messages in thread
From: jeremiah @ 2019-01-05 20:15 UTC (permalink / raw)
  To: mhw; +Cc: guix-devel

> In other words, it's anecdotal.
True, it definitely was not a formal proof just data based upon real
world development with large teams of programmers and formal
specifications for correct behavior.

> Obviously, the flaw in this argument is that there exist much more
> efficient ways to add numbers than counting beads.  A claim of
> complexity doesn't mean anything unless you can prove that it's a lower
> bound.
I don't deny that there are possibly lower bounds for the work; rather,
the observed values from large teams doing that development are what I
have available to share.

> There's a reason that I followed my call for bug-free compilers with the
> claim: "In practice, this requires provably correct compilers."  I wrote
> that because I know that it's the only known practical way to do it for
> non-trivial languages.
Perfection is a road, not a destination.
One requires a "provably correct compiler" to bootstrap a "provably
correct compiler", a step completely skipped.

> Interesting.  I hadn't heard of those.  Can you cite the details?
One of the most obvious is the handling of float size and rounding on
x86 hardware.  The C standard allows intermediate results to be kept at
extended precision (80 bits on the x87), while strict IEEE 754
semantics require each operation to round to its nominal format; the
extra precision and resulting double rounding can change the stability
of some algorithms.
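
A hedged sketch of that 80-bit expansion (mine; the outcome depends on
whether the compiler keeps intermediates in x87 extended registers, as
with gcc -mfpmath=387, or uses strict 64-bit SSE arithmetic):

    #include <stdio.h>

    int main(void)
    {
        volatile double a = 1e308, b = 10.0;
        double c = a * b / b;  /* with 80-bit x87 intermediates this
                                  survives as roughly 1e308; in strict
                                  64-bit arithmetic a * b overflows to
                                  inf, and inf / b stays inf */
        printf("%g\n", c);
        return 0;
    }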


> Right, which shows the problems with depending on unverified components,
> but CompCert is no longer the state of the art in this regard.
It goes deeper than that.

> In contrast, the CakeML compiler produces raw machine code, and
> moreover the proofs apply to the raw machine code of the CakeML
> compiler itself.  As a result, the only trusted specifications are
> those of the source language and of the underlying machine code
> behavior (for the subset of instructions that are actually generated).
So either CakeML must refuse to compile valid code, or it will generate
invalid binaries, or both.

> I agree that IEEE floating point (and more generally inexact arithmetic)
> is a problem, but a great many useful programs can be written without
> using them, including most (if not all) of the programs that are most
> important for us to trust.  Operating systems, compilers, editors, and
> communication tools such as chat clients and mail readers, have no need
> for inexact arithmetic.  These are the programs I care most about.
Aside from the compilers that janneke and I have written, there are no
real-world compilers that don't depend upon floating point; why do you
think janneke spent so many hours adding floating point to Mescc?
Try to build Linux without floating-point support: it can't be done.
Try to build Emacs or Vim without floating-point support in your
compiler.

The reason is simple: math libraries abuse it for performance, and
performance is the only metric most people care about.

The countless sins of our industry exist because of those cursed words.

> Why do you include garbage collection in this list?  I consider garbage
> collection to be a _requirement_ for human model first behavior.  Do you
> consider it a hindrance?  Also, multiple verified garbage collectors
> already exist, e.g. those used in the runtimes of CakeML and Jitawa.
Find me one enterprise-grade operating system, other than a Lisp
machine, that includes garbage collection at the native level.

> I think it's too ambitious to target non-programmers, because they do
> not have the required skills to understand the behavior of complex
> programs, regardless of what language is used.  They could perhaps
> understand toy programs, but that's not useful for helping them evaluate
> the trustworthiness of the computing tools they actually use, which is
> what I'm interested in.
A fair preference to have.

> Anyway, what language(s) would you consider to be superior to ML or a
> subset of Scheme, based on the goal of human model first behavior?
As much as I hate to admit it, C# and Go tend to be far easier for
most people new to programming.  Not that I would ever claim they are
superior languages in any technical sense.

> Would you consider your low-level macro languages to be better suited to
> this task?
Oh no; it exists solely to create a floor for ease of bootstrapping and
to simplify the problem of cross-platform bootstrap verification.

> I might agree that such simple assembly-like languages are more
> straightforward for understanding the behavior of toy programs, where
> one can keep all of the moving parts in one's head.  However, I would
> argue that those desirable properties of assembly-like languages do not
> scale to larger programs.
A self-hosting C compiler isn't exactly a toy program, and if you read
its code, you'll quickly discover there is no need to keep very much in
one's head at all.

Its real weakness is its type system and lack of a runtime.

One must remember that a language's greatest strength is also its
greatest weakness.  Every desired feature must be paid for, usually at
the cost of other desired features.

Performance, correctness, and ease of development:
pick one (the industry picked performance).

Things that should have been done differently if correctness were the
goal:
Hardware support for typed memory.
Hardware support for garbage collection.
Hardware support protecting against memory corruption (ECC, rowhammer
resistance circuits, and SRAM instead of DRAM).
Barrel processor architectures (no risk from Spectre, Meltdown, or
Raiden).
Hardware capability tagging of processes.
Flattening virtualization down to user processes.
Stack space and heap space kept distinct from each other and from
program space.
At least a single attempt in the last 60 years by any military on the
planet to deal with the risks written in the Nexus Intruder Report
published in 1958.

I could spend literal weeks ranting and raving about how modern
hardware makes correctness impossible for everything but the trivial,
or for software so isolated from the hardware that its performance
makes it a non-starter for anything but journal articles, which are
never read and are forgotten within a generation.

-Jeremiah


* Re: Trustworthiness of build farms (was Re: CDN performance)
  2019-01-05 20:15 jeremiah
@ 2019-01-06 20:47 ` Chris Marusich
  2019-01-18 10:54   ` Giovanni Biscuolo
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Marusich @ 2019-01-06 20:47 UTC (permalink / raw)
  To: jeremiah; +Cc: guix-devel


Hi Jeremiah,

jeremiah@pdp10.guru writes:

> At least a single attempt in the last 60 years by any military on the
> planet to deal with the risks written in the Nexus Intruder Report
> published in 1958.

Do you know where one can obtain a copy of this report?  I did an
Internet search but couldn't find anything.

-- 
Chris



* Re: Trustworthiness of build farms (was Re: CDN performance)
  2019-01-06 20:47 ` Chris Marusich
@ 2019-01-18 10:54   ` Giovanni Biscuolo
  0 siblings, 0 replies; 11+ messages in thread
From: Giovanni Biscuolo @ 2019-01-18 10:54 UTC (permalink / raw)
  To: Chris Marusich, jeremiah; +Cc: guix-devel


Chris Marusich <cmmarusich@gmail.com> writes:

> Hi Jeremiah,
>
> jeremiah@pdp10.guru writes:
>
>> At least a single attempt in the last 60 years by any military on the
>> planet to deal with the risks written in the Nexus Intruder Report
>> published in 1958.
>
> Do you know where one can obtain a copy of this report?  I did an
> Internet search but couldn't find anything.

me too

Jeremiah: sorry to insist (last time, I promise!), but could you give us
some more info about that report?

Thanks!
Giovanni

-- 
Giovanni Biscuolo

Xelera IT Infrastructures



* Re: Trustworthiness of build farms (was Re: CDN performance)
@ 2019-01-20 12:24 Jeremiah
  0 siblings, 0 replies; 11+ messages in thread
From: Jeremiah @ 2019-01-20 12:24 UTC (permalink / raw)
  To: cmmarusich, g; +Cc: guix-devel

> > Do you know where one can obtain a copy of this report?  I did an
> > Internet search but couldn't find anything.

> me too

> Jeremiah: sorry if I insist (last time, promised!) but could you give us
> some more info about that report?

I am sorry for the delay; the government shutdown really cut off my
access to the archives in which it was found.

As I am currently unable to link to that resource, I'll do my best to
summarize the key points:

It was a top-secret report for the Department of Defense, written in
1958 and declassified by the Clinton Administration.

1) Computers are being used to replace human thinking, and as computers
grow ever faster and more complex, there will come a point where
computers are required to design computers.  It references a 1952 paper
about lithography (which I couldn't find) and argues that chips will
likely replace single-piece logic, thus providing the ultimate place
for hiding malicious functionality.
2) It is possible to infect the software used in designing computers,
targeting elements common to all computers, so as to alter the circuits
to provide weaknesses we can exploit and/or functionality to leverage
that the computer's designer, builder, and owner do not know about.
3) If done on a large enough machine, there is room to include
infectors for tools such as assemblers, linkers, loaders, and compilers
in functionality that cannot be removed.
4) It then details how they could backdoor the Strela computer, and how
that could be leveraged to compromise future Soviet computers, ensuring
a permanent weapon against the Soviet Union.
5) Then it has a huge section of blacked-out text.
6) Then a section of possible future hooks, depending on how software
evolves in the Soviet Union, allowing more pervasive hardware
compromises and eliminating the possibility of trustworthy computing
ever becoming possible on Soviet computers.
7) Another big blacked-out section.
8) The final section detailed the list of steps required for an
intelligence agency to assemble a lithography plant, to prevent their
own infrastructure from being compromised by a similar Soviet attack,
with an estimated spin-up time of almost a decade.

Examples included running traces close to the transistors to create
radio-induced functionality, such as the intentional leaking of crypto
secrets upon receiving a very specific frequency.

Or allowing magic numbers in a set of memory addresses or registers to
engage hidden functionality, such as disabling protections or giving a
process privileges that would normally be restricted for security
reasons.

I'm sorry, as I am likely missing a lot of the details and attacks.
Once the shutdown is done, I'll try again to find that paper for you.

-Jeremiah


end of thread, other threads:[~2019-01-20 12:24 UTC | newest]

Thread overview: 11+ messages
2018-12-26  2:29 Trustworthiness of build farms (was Re: CDN performance) Jeremiah
2018-12-28  7:12 ` Mark H Weaver
  -- strict thread matches above, loose matches on Subject: below --
2019-01-20 12:24 Jeremiah
2019-01-05 20:15 jeremiah
2019-01-06 20:47 ` Chris Marusich
2019-01-18 10:54   ` Giovanni Biscuolo
2019-01-04 20:57 Jeremiah
2019-01-05 17:16 ` Mark H Weaver
2018-12-22 17:23 Jeremiah
2018-12-23 16:37 ` Mark H Weaver
2018-12-03 15:43 [PATCH 0/3] Defaulting to ci.guix.info (aka. berlin.guixsd.org) Ludovic Courtès
2018-12-03 16:12 ` Using a CDN or some other mirror? Ludovic Courtès
2018-12-09  3:33   ` Chris Marusich
2018-12-09 15:59     ` CDN performance Ludovic Courtès
2018-12-13  8:05       ` Chris Marusich
2018-12-13 10:41         ` Giovanni Biscuolo
2018-12-15  1:40           ` Mark H Weaver
2018-12-19 12:40             ` Giovanni Biscuolo
2018-12-21  0:23               ` Trustworthiness of build farms (was Re: CDN performance) Mark H Weaver
