Re: Verifying Toolchain Semantics

unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed

From: William ML Leslie <william.leslie.ttg@gmail.com>
To: guile-devel <guile-devel@gnu.org>
Subject: Re: Verifying Toolchain Semantics
Date: Mon, 6 Oct 2014 00:57:49 +1100	[thread overview]
Message-ID: <CAHgd1hG_qAW54AByD9Y7mPtp3eS8zr+4A42YGPaL=NJvSrCkAg@mail.gmail.com> (raw)
In-Reply-To: <87vbo1l3ai.fsf@taylan.uni.cx>

[-- Attachment #1: Type: text/plain, Size: 9146 bytes --]

On 3 October 2014 22:56, Taylan Ulrich Bayirli/Kammer <
taylanbayirli@gmail.com> wrote:

> William ML Leslie <william.leslie.ttg@gmail.com> writes:
>
> > Oh, interesting point. Maybe we should define PDF as an abstract
> > semantics that we can convert into a wide range of equivalent document
> > layout languages? If the attacker can't tell exactly what xsl-fo or
> > dsssl the tool will output, or what software you're using to render
> > the result, it will magically make it more difficult to attack!
>
> Come on, that snark is undue. :-)
>
> It won't make it "magically" more difficult, it will make it actually
> more difficult.

It makes it harder in the same way ASLR makes attacks on a lack of memory
safety harder: there is less you can rely on, but it's still basically the
same game.

Say, for example, that I can guess you will end up viewing the document
with one of five output programs - maybe, evince, chrome, tk, gecko
translating xsl-fo to xul or firefox viewing html 4 directly, and say that
I only have exploits for the first three.  Now I have a bit of a problem: I
need to find a way to get your tool to emit the specific structure that
will invoke the exploit.  Since it will show a document that looks correct,
although underneath may have a different structure, I will have to account
for variation in what it emits but there will be a limited class of
algebraic manipulations that it can make without changing the semantics of
the document.

Maybe I would not find a way to /reliably/ produce a document that will
infect your machine.  Or maybe I would.  It's hypothetical.  But you know
as well as I do that fuzzing a document generator to see what instructions
produce exploitable output is easier than either writing a verifiable PDF
viewer or writing a tool that bisimulates a PDF document with a class of
layout languages.  The cherry on top is that all that extra code you
introduce is surely ripe with additional vulnerabilities - so maybe I don't
need to bother with producing those back-end exploits at all.

> 
>   Though I think your analogy is wrong.  PDF is already
> 
> an abstract semantics, like any formal language spec.  The question is
> 
> whether an implementation realizes precisely those semantics, or adds
> "oh and insert a backdoor" to some pieces of code in that language, "as
> an addition to the language semantics."  This augmentation to the
> semantics needn't be apparent in the source code of the implementation,
> since the language it's written in might again be compiled by a compiler
> that "augments" the semantics of the language it compiles, and that in
> turn needn't be apparent in the compiler's source ... until you arrive
> at the since long vanished initial source code that planted the virus.
>

I wasn't suggesting that someone had a thompson hack in your pdf viewer
(do you consider postscript a reasonable language for a bootstrap compiler?
:-).  I was implying that infection via PDF is similar to a Thompson hack
in an interesting way:  having many possibly pdf pipelines makes the
problem only marginally harder, and in this, the problem is dual to that of
the Thompson hack.

Where it would take a great fool to rely on syntactic properties - a
/specific set of instructions/ - to recognise and infect a C compiler in
practice, it would take similar foolishness to imagine that applying a
class of machine-describable translations to a source would so reduce the
ability to get exploitable code to come out at the other end.  It is
*possible*, at least, it is possible to get to the point where you now only
have to rely on the program consuming the output, be it x.org or a linux
vt, by verfying that the output program is well typed; but then you could
have just written the well-typed viewer and been done with it.

>
> That's just reiterating what the Thompson hack is.  It relies on us
> using a single lineage of compilers/software.  (Or a couple lineages
> could be compromised separately; that would still be plausible if it's
> plausible for one.)  Because at every layer the semantics can only be
> compromised enough to propagate the compromise through one concrete
> code-base.

No no no.  This is the point.  Anyone smart enough to have pulled of this
attack - someone who has compromised the sha function, too - is not relying
on syntactic features, they are doing decent static analysis of
higher-order behaviour.  Saying, this looks like - semantically - like
register allocation, that looks like it's producing an ELF header (very
easy to do), that has appropriate allocation and loop structure to do SSA
transformation.  Looking at the higher-order behaviour of code is not
difficult, it's exactly what Ian is suggesting we /do/ to combat the
problem.  I think it's probably easier to use this technique to propogate a
Thompson hack than to combat it.

>   In your PDF analogy, the solution is to write a spurious
> amount of PDF implementations.

Except that in doing so, you've structured your vulnerabilities.  Please
keep in mind that nobody sets out to write a PDF reader that has
vulnerabilities (or, well, maybe.  hello Adobe!) - they are there by
accident.  Prodicing a wide range of reader implementations just means
that the attacker needs to make some guesses.  You may have introduced more
severe bugs by this process.

>   Or for C, to implement a spurious amount
> of C compilers.  That is impractical because C is complex.  What might
> be practical might be to write one new C compiler (guaranteed clean,
> since it's new)

I hope by now you understand why I think this to be nonsense.  It's still a
C compiler.  You know that, I know that.  Why wouldn't the machine be able
to tell?

> , and verify GCC with that[*].  What's more useful in the
> long term is to define a much simpler language, and base our everything
> on that instead of on C.  Then we can verify our toolchain more often,
> since we needn't write a whole freaking C compiler every time. :-)
>
> [*] Compile version X of GCC with the Clean C Compiler™, compile version
> X of GCC with some GCC binary, then compile some GCC with both GCC-X
> binaries, one of which is guaranteed clean, to see if they behave same
> while compiling GCC; thus we get both a guaranteed clean GCC, and we get
> to know whether the other was compromised in first place, or whether we
> wasted a lot of time.
>
> I think Ian is doing himself a disservice by using many fancy abstract
> terms and, if I may be blunt, acting in some tinfoil hat-like ways, when
> in fact the underlying ideas are relatively sane and simple.  It's all
> there if you read a little between the lines, and a little around the
> parts written in all-caps, like in:
>
> http://lists.gnu.org/archive/html/guile-devel/2014-09/msg00035.html
>

This asssumption that I haven't been reading everything is unusual and
I'm not sure where it's coming from.  I guess Ian didn't manage to infect
me?  Maybe I wrote my own PDF viewer, maybe I use out of date software, or
maybe I just read the postscript in my terminal.

The problem that I have with this discussion is that rather than address
many* of the points in my email, he instead resorted to insisting I was a
shill.  It's very difficult to have a conversation with someone who uses
silencing tactics.  Or better - it's pointless.

If it were true, then you all lost a long time ago, I'm sorry.  So I
perpetuated a Thompson hack on a large, well-known free software codebase
while I was in primary school and without regular computer access, and
nobody noticed?  And now he doesn't realise I'm editing his emails and PDFs
in-flight to make him sound arrogant and clueless?

To be fair, I have seen similar levels of gaslighting before.  It's safe to
always assume /something/ is up.  But throwing around those sort of
allegations just because someone pokes holes in your arguments is an
acceptable cause for me to dismiss you as a nut, imho.

I don't have an issue with fancy abstract terms, but I do see clearly how
the proposition only solves a narrow subclass of the problems, and
specifically a subclass that would be better addressed with static analysis
and ideally types at the assembly level.  Show me a Thompson attack that
this 'semantic fixpoint' can work around, and I'll show you a crack that
can work around any similar 'semantic fixpoint'.

* well, he responded to the point about hardware.  If people want to know
why he's wrong, I can elaborate.  But by that point, I hoped people could
see for themselves that there's no reason to talk any further.

-- 
William Leslie

Notice:
Likely much of this email is, by the nature of copyright, covered under
copyright law.  You absolutely MAY reproduce any part of it in accordance
with the copyright law of the nation you are reading this in.  Any attempt
to DENY YOU THOSE RIGHTS would be illegal without prior contractual
agreement.

[-- Attachment #2: Type: text/html, Size: 12091 bytes --]

next prev parent reply	other threads:[~2014-10-05 13:57 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-10-02 14:54 Verifying Toolchain Semantics Ian Grant
2014-10-03  6:23 ` Mark H Weaver
2014-10-03  7:15   ` William ML Leslie
2014-10-03 12:56     ` Taylan Ulrich Bayirli/Kammer
2014-10-03 17:13       ` Mark H Weaver
2014-10-05 13:57       ` William ML Leslie [this message]
2014-10-03  8:45   ` Nala Ginrut
2014-10-05  1:35   ` Ian Grant
2014-10-05  5:39     ` Nala Ginrut
2014-10-05 14:14       ` Ian Grant
2014-10-05 15:15         ` Nala Ginrut
2014-10-05 16:24           ` Ian Grant
2014-10-06  8:25             ` Nala Ginrut
2014-10-05  6:58     ` Mike Gerwitz
2014-10-05 16:11       ` Ian Grant
2014-10-06  4:23         ` Mike Gerwitz
     [not found]           ` <20141006042323.GA31390-iW7gFb+/I3LZHJUXO5efmti2O/JbrIOy@public.gmane.org>
2014-10-07 17:18             ` Ian Grant
     [not found]               ` <CAKFjmdx+jzfapvrq6EEO8Skx2L2UZwi-DZ22xiq9t1438E7kOw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-10-07 17:28                 ` Mark H Weaver
2014-10-07 17:56                   ` Ian Grant
     [not found]                     ` <CAKFjmdwNTjJ7nU-rKEWkA+5whsfyrpqJ6RkhU+VRbUW6rqT03A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-10-07 19:24                       ` Philip Herron
     [not found]                         ` <CAEvRbeoEJPTtoDu0nDudJyfBoaT1vpuvHzL=t+TkJr_ZGkzYEQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-10-07 19:47                           ` Ian Grant
2014-10-08 18:26                       ` Mark H Weaver
2014-10-08  3:55               ` Mike Gerwitz
  -- strict thread matches above, loose matches on Subject: below --
2014-10-05 17:42 Ian Grant
2014-10-05 18:19 ` Ian Grant
2014-10-06  0:30 Ian Grant
     [not found] ` <CAKFjmdzxAMvcry8N6B_atM_8vGyzA1Dfz9ygWxSgh3fD7EUAuQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-10-06  8:51   ` William ML Leslie

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHgd1hG_qAW54AByD9Y7mPtp3eS8zr+4A42YGPaL=NJvSrCkAg@mail.gmail.com' \
    --to=william.leslie.ttg@gmail.com \
    --cc=guile-devel@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).