unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* Verifying Toolchain Semantics
@ 2014-10-02 14:54 Ian Grant
  2014-10-03  6:23 ` Mark H Weaver
  0 siblings, 1 reply; 27+ messages in thread
From: Ian Grant @ 2014-10-02 14:54 UTC (permalink / raw)
  To: llvmdev, gcc@gcc.gnu.org, guile-devel, mlton-devel
  Cc: Linus Torvalds, Theo deRaadt, Richard Stallman, Markus Kuhn,
	schellr, John Harrison, monnier, Taylan Bayırlı

Dear programming language types,

I wrote this to try once again to explain what is the nature of the
problem that one would have in verifying the integrity of _any_
software toolchain, whether it is aimed ultimately at the production
of other software, or of hardware.

    http://livelogic.blogspot.com/2014/10/the-foundation-part-i.html

This three page text is ostensively about verifying the integrity of a
communications link operating over an un-trusted transport layer, but
a compiler is really a type of communications channel.

I am sure everyone still reading this has wondered about the
possibilities of using universal programming languages (universal in
the Church-Turing sense) as communications protocols. For example, one
could establish a point-to-point connection by writing a program which,
when run, output two more programs: one which, when run, outputs a
decoding pad for the next mesage one would transmit over that channel,
and the other the decoder which prints the message text together with
another program, which was the encoder for returning an
acknowledgement. Both endpoints would do this, and so the
programs/messages would be exchanged, with each one encoding the text
of the other.  Then these programs could include decisions not only
about the encoding of data, choice of one-time pads, etc. but perhaps
also the routes of messages, sending different parts via different
trusted routes over similar channels etc. etc. The variations are
endless, and limited only by the ingenuity of the programmers
communicating over those channels.

And really, I sorely pity anyone charged with organising any kind of
surveillance of a group of people who enjoy that sort of
game. Cracking "the code" would be practically impossible, because
there need never be any fixed concrete representation whatsoever of
the fundamental encoding as it is transmitted over the transport
medium: _all_ of the knowledge about the current and future encodings
can be sent over the previous "incarnations" of that and/or another
channel, and the encoding of channels thereby made non-deterministic:
this means that there could never be _in principle,_ any mechanical
process _whatsoever_ which could decode more than a few parts of any
of those messages. After this brief success, the poor would-be spy
would be right back at square one.

What I try to explain here is the essential distinction between what I
call _actual knowledge,_ as opposed to mere _represented knowledge,_
such as a password, or an SSL certificate, or the documentation for
some file format appearing on a web page. The distinction is that only
in the case of actual knowledge does one know _how_ and _why_ one
knows.

The motivation is the idea that by using actual rather than
represented knowledge, it is possible to construct such a trustworthy
system in practice. But there's a catch! The catch is that this will
only work for an organisation whose motives and governance are
completely open and transparent. This is because the technique relies
upon mutual trust, which is something that cannot exist without
openness and transparency. Bad guys just won't get it! To understand
why (in case it is not immediately obvious to you, that is) you will
need to read (or at least think carefully about) about how
error-detection would work in such a system.

The text consists of a title page with the abstract, and two full
pages. So it should be an easy read. I earlier sent out a nine page
document entitled GNU Thunder, in which I attempted to describe what I
consider to be essentially the same idea, but with the emphasis on
compilers and interpreters, rather than communications. The Thunder
text was a concrete suggestion for an implementation. This text
however is more abstract. But these two documents could be
considered to be complementary in two different senses.

I hope everyone enjoys this, and that it stimulates some interesting
thoughts, and subsequent discussion, and that those thoughts are all
good, and the discussion open and transparent. That way we could save
ourselves an awful lot of _really hairy_ metaprogramming!

Please feel free to copy either text and/or this message and pass it
around. Neither of these two texts are copyright, and the more people
that see them the better. Bad guys in particular need to know about
this much more than the good ones do.

Ian


^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: Verifying Toolchain Semantics
@ 2014-10-05 17:42 Ian Grant
  2014-10-05 18:19 ` Ian Grant
  0 siblings, 1 reply; 27+ messages in thread
From: Ian Grant @ 2014-10-05 17:42 UTC (permalink / raw)
  To: guile-devel, Taylan Bayırlı, William Leslie,
	Richard Stallman, Mark H Weaver

Taylan wrote:
> In your PDF analogy, the solution is to write a spurious
> amount of PDF implementations.  Or for C, to implement
> a spurious amount of C compilers.  That is impractical
> because C is complex.

It's not as complex as you might think. In the space of a couple of
months, I wrote what I reckon is 50% of a C interpreter all on my own.
And I have the basis of an abstract language for specifying semantics
as just sets of term-rewriting rules, which work on AST
representations of program source.

https://github.com/IanANGrant/red-october/blob/master/src/dynlibs/ffi/CPPLexer.lex
https://github.com/IanANGrant/red-october/blob/master/src/dynlibs/ffi/CParser.grm
https://github.com/IanANGrant/red-october/blob/master/src/dynlibs/ffi/CSyntax.sml

> What might be practical might be to write one new C
> compiler (guaranteed clean, since it's new), and verify
> GCC with that[*].

It will take a year or so to get a C compiler written in C working. So
it won't be new once you've got it to the point of being able to
compile GCC. And by the time that happens, if there has been a
compiler trap door, then the GCC binaries, and probably your "new" C
compiler binaries would "know about" each other. So you would not have
any more reason to believe the
new binaries are good than you had at the beginning. You will have
wasted a year.

> What's more useful in the
> long term is to define a much simpler language, and base
> our everything on that instead of on C.  Then we can verify
> our toolchain more often, since we needn't write a whole
> freaking C compiler every time. :-)

Almost, but that "Much Simpler Language" needs to be one designed to
write language interpreters by reading abstract specifications of the
syntax and semantics from "metadata" files. Then a standard
"interpreter interpreter for the 'MSL'" can be defined to do this job
automatically, and it could even implement itself. See the Reynolds'
paper I referenced in my first reply to William.

Then we will be able to write "a spurious number of whole freakin' C
interpreters" So for example, as I said to Richard a month ago, we
will be able to implement a C compiler in Microsoft Word BASIC, or in
COBOL, and that will be capable of compiling GCC, if we had a year or
so to wait while it does it ... but it could still be capable of
compiling checksumming programs, or diff programs, or unzipping
programs, and these would be extra data points to check whether other
tools like binary file viewers etc, have been compromised by a
compiler trap door.

Ian



^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: Verifying Toolchain Semantics
@ 2014-10-06  0:30 Ian Grant
       [not found] ` <CAKFjmdzxAMvcry8N6B_atM_8vGyzA1Dfz9ygWxSgh3fD7EUAuQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 27+ messages in thread
From: Ian Grant @ 2014-10-06  0:30 UTC (permalink / raw)
  To: William Leslie, Taylan Bayırlı, guile-devel, lightning

http://lists.gnu.org/archive/html/guile-devel/2014-10/msg00016.html

From:    William ML Leslie
Date:    Mon, 6 Oct 2014 00:57:49 +1100
On 3 October 2014 22:56, Taylan Ulrich Bayirli/Kammer <address@hidden> wrote:

> Say, for example, that I can guess you will end up viewing the
> document with one of five output programs - maybe, evince, chrome,
> tk, gecko translating xsl-fo to xul or firefox viewing html 4
> directly, and say that I only have exploits for the first three.
> Now I have a bit of a problem: I need to find a way to get your tool
> to emit the specific structure that will invoke the exploit.  Since
> it will show a document that looks correct, although underneath may
> have a different structure, I will have to account for variation in
> what it emits but there will be a limited class of algebraic
> manipulations that it can make without changing the semantics of the
> document.

It's not about this. These are all the programs that have problems.
We're not going to use these sorts of programs anymore. We're going to
use auto-generated, nearly perfect code. And after a few years it
won't be just near perfect, it'll be perfect.

> Maybe I would not find a way to /reliably/ produce a document that
> will infect your machine.  Or maybe I would.  It's hypothetical.
> But you know as well as I do that fuzzing a document generator to
> see what instructions produce exploitable output is easier than
> either writing a verifiable PDF viewer or writing a tool that
> bisimulates a PDF document with a class of layout languages.  The
> cherry on top is that all that extra code you introduce is surely
> ripe with additional vulnerabilities - so maybe I don't need to
> bother with producing those back-end exploits at all.

This is wrong. If you look at the automatically-generated assembler
code from a formal spec, in the form of a functional program
generating that code:

  https://github.com/IanANGrant/red-october/blob/master/src/dynlibs/ffi/AbstractMachineWord.sml

you will see that the extra code has no extra attack vectors. It's all
machine-generated. If this were a piece of code for reading a binary
file, then it would have been generated from a grammar describing
exactly what are the valid words in that language. (Using words and
language in the technical sense as "valid strings" and "set of all
valid strings".) So buffer overruns won't happen. And if all the
semantics are formalised, then off-by-one errors won't happen
either. If they do, they'll happen all over the place and you will
find out very quickly.

> Though I think your analogy is wrong.  PDF is already an abstract
> semantics, like any formal language spec.  The question is whether
> an implementation realizes precisely those semantics, or adds "oh
> and insert a backdoor" to some pieces of code in that language, "as
> an addition to the language semantics."  This augmentation to the
> semantics needn't be apparent in the source code of the
> implementation, since the language it's written in might again be
> compiled by a compiler that "augments" the semantics of the language
> it compiles, and that in turn needn't be apparent in the compiler's
> source ... until you arrive at the since long vanished initial
> source code that planted the virus.

> I hope by now you understand why I think this to be nonsense.  It's
> still a C compiler.  You know that, I know that.  Why wouldn't the
> machine be able to tell?

Because it's formally undecidable. That's Rice's Theorem. And it's not
"in practice" probably detectable by semantantic methods, because we
are going to extra lengths to obscure the effective operational
semantics by multiple re-encodings. How are you going to recognise a
compiled Fortran90 program translating a C program into an abstract
assembler like lightning, as a C compiler? You don't even know what
the opcode numbers are: we can permute them.

> The problem that I have with this discussion is that rather than
> address many* of the points in my email, he instead resorted to
> insisting I was a shill.  It's very difficult to have a conversation
> with someone who uses silencing tactics.  Or better - it's
> pointless.

What are you talking about? Here is what you wrote to me:

   http://lists.gnu.org/archive/html/guile-devel/2014-09/msg00036.html

> Please understand that our lack of enthusiasm for it is not without
> our own individual commitments to the security and freedoms of
> users; rather while the extra assurances your project may provide
> sound nice, they are frankly rather minimal and don't have much
> additional utility, and so are not priority one.

Now if you wanted me to continue answering your questions, then well,
I'm sorry, I just got the wrong end of stick completely. I thought you
wanted me to fuck off!

So I replied as below. To you, privately. And that was it. Silencing
tactics? You then copied it to the world, and a guy who I _told_ you,
was deputy director of the NSA's National Computer Security Centre!

And I didn't mind. (I did wonder what the hell does he think he's
doing?! But it's your privilege to say what you want, to whom you want,
and none of my business.)

Date: Sun, 7 Sep 2014 11:39:44 +1000
Message-ID: <CAHgd1hEP2=dnqXTQx-EdozAXc0vjYh0ucthbyWFnPecQqB=7pw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
Subject: Re: GNU Thunder
From: William ML Leslie <william.leslie.ttg-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: Ian Grant <ian.a.n.grant-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org>, guile-devel <guile-devel-mXXj517/zsQ@public.gmane.org>,
    lightning <lightning-mXXj517/zsQ@public.gmane.org>, schellr-EkmVulN54Sk@public.gmane.org
X-List-Received-Date: Sun, 07 Sep 2014 01:39:50 -0000

--e89a8ff1ca20e0b37105026fc673
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On 7 September 2014 01:49, Ian Grant <ian.a.n.grant-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:

> You are just not at all convincing, I'm afraid. Tell your boss they didn'=
t
> train you properly, and can you get assigned somewhere else.
>

=E2=80=8BUnfortunately, this is a reality we have to deal with when discuss=
ing
security on the internet: we've got to assume the enemy is listening.  It
could be me, I could even be intercepting your emails and making them sound
incoherrent, and upping the tinfoil-hat quotient by =E2=80=8Badding people =
like
Theo and Linus to the To: list.  I could be suggesting that effort be spent
in a project that will provide very little value, wasting your time.

Remember:  I'm not suggesting what the outcome of your project will be,
just that if the result is negative, we still know nothing.  When testing a
system for subterfuge, we need to examine *all* of the moving parts, even
those that appear to be unused.  If the system you're building your
assembler on is compromised, it can still give you a negative answer.
That's what was so scary about this particular type of attack.

--=20
William Leslie

[...]

Here's my e-mail:

MIME-Version: 1.0
Received: by 10.194.219.234 with HTTP; Sat, 6 Sep 2014 08:49:15 -0700 (PDT)
In-Reply-To: <CAHgd1hH+VD9HqE=pRZOHaPK-fOgMJbHpv_RgPt5LUdLkTXCBwA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
References: <CAKFjmdzP89qSD03_MGqS1UawQvaq6yvme-abKcmLuA8DfUQE+A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
    <E1XKsRt-0002zo-64-iW7gFb+/I3LZHJUXO5efmti2O/JbrIOy@public.gmane.org>
    <CAKFjmdyUphk2LmDdDE_7gkDSKAu4COurtvafBwO5XwCgyM1OfA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
    <E1XLP5I-0005zC-Sn-iW7gFb+/I3LZHJUXO5efmti2O/JbrIOy@public.gmane.org>
    <CAKFjmdxt4jWAHAYXjzwPeUw+dTUBTPC94YJDNifsO7JVkNHjTQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
    <E1XNssG-00083e-BB-iW7gFb+/I3LZHJUXO5efmti2O/JbrIOy@public.gmane.org>
    <CAKFjmdzdkC5rW3DjBRbRueSiBs9dmJUSxGkPp2-6uc7SH9o79g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
    <E1XPA13-0002hD-Kp-iW7gFb+/I3LZHJUXO5efmti2O/JbrIOy@public.gmane.org>
    <CAKFjmdyc0D5vYBK=rQKzKNK+WRmWhkkL-RXqBMSHvhOzX3fHiw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
    <E1XPifZ-0004g6-KA-iW7gFb+/I3LZHJUXO5efmti2O/JbrIOy@public.gmane.org>
    <CAKFjmdwEs8TtPZjXWDYKoQXz4FEKy6p3T9+8jWLMY87Onn=VaQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
    <CAHgd1hH+VD9HqE=pRZOHaPK-fOgMJbHpv_RgPt5LUdLkTXCBwA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
Date: Sat, 6 Sep 2014 11:49:15 -0400
Delivered-To: ian.a.n.grant-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org
Message-ID: <CAKFjmdzzRoTdzWxAOh2byRbWDtszWZtt8Z2k0eLCFYR+qmAC9g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
Subject: Re: GNU Thunder
From: Ian Grant <ian.a.n.grant-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org>
To: William ML Leslie <william.leslie.ttg-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Content-Type: multipart/alternative; boundary=001a1136144a26f19e05026787e6

--001a1136144a26f19e05026787e6
Content-Type: text/plain; charset=UTF-8

You are just not at all convincing, I'm afraid. Tell your boss they didn't
train you properly, and can you get assigned somewhere else.

Ian

--001a1136144a26f19e05026787e6
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">You are just not at all convincing, I&#39;m afraid. Tell y=
our boss they didn&#39;t train you properly, and can you get assigned somew=
here else.<br><div class=3D"gmail_extra"><br></div><div class=3D"gmail_extr=
a">Ian<br></div><div class=3D"gmail_extra"><br></div></div>

--001a1136144a26f19e05026787e6--

=============================================================================

You see, I thought you were a Microsoft plant. I just couldn't figure
out any other reason why someone who clearly didn't know anything
about computers, would say they were going to write a secure operating
system! :-)

Ian

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2014-10-08 18:26 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-10-02 14:54 Verifying Toolchain Semantics Ian Grant
2014-10-03  6:23 ` Mark H Weaver
2014-10-03  7:15   ` William ML Leslie
2014-10-03 12:56     ` Taylan Ulrich Bayirli/Kammer
2014-10-03 17:13       ` Mark H Weaver
2014-10-05 13:57       ` William ML Leslie
2014-10-03  8:45   ` Nala Ginrut
2014-10-05  1:35   ` Ian Grant
2014-10-05  5:39     ` Nala Ginrut
2014-10-05 14:14       ` Ian Grant
2014-10-05 15:15         ` Nala Ginrut
2014-10-05 16:24           ` Ian Grant
2014-10-06  8:25             ` Nala Ginrut
2014-10-05  6:58     ` Mike Gerwitz
2014-10-05 16:11       ` Ian Grant
2014-10-06  4:23         ` Mike Gerwitz
     [not found]           ` <20141006042323.GA31390-iW7gFb+/I3LZHJUXO5efmti2O/JbrIOy@public.gmane.org>
2014-10-07 17:18             ` Ian Grant
     [not found]               ` <CAKFjmdx+jzfapvrq6EEO8Skx2L2UZwi-DZ22xiq9t1438E7kOw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-10-07 17:28                 ` Mark H Weaver
2014-10-07 17:56                   ` Ian Grant
     [not found]                     ` <CAKFjmdwNTjJ7nU-rKEWkA+5whsfyrpqJ6RkhU+VRbUW6rqT03A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-10-07 19:24                       ` Philip Herron
     [not found]                         ` <CAEvRbeoEJPTtoDu0nDudJyfBoaT1vpuvHzL=t+TkJr_ZGkzYEQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-10-07 19:47                           ` Ian Grant
2014-10-08 18:26                       ` Mark H Weaver
2014-10-08  3:55               ` Mike Gerwitz
  -- strict thread matches above, loose matches on Subject: below --
2014-10-05 17:42 Ian Grant
2014-10-05 18:19 ` Ian Grant
2014-10-06  0:30 Ian Grant
     [not found] ` <CAKFjmdzxAMvcry8N6B_atM_8vGyzA1Dfz9ygWxSgh3fD7EUAuQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-10-06  8:51   ` William ML Leslie

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).