* Verifying Toolchain Semantics
@ 2014-10-02 14:54 Ian Grant
2014-10-03 6:23 ` Mark H Weaver
0 siblings, 1 reply; 27+ messages in thread
From: Ian Grant @ 2014-10-02 14:54 UTC (permalink / raw)
To: llvmdev, gcc@gcc.gnu.org, guile-devel, mlton-devel
Cc: Linus Torvalds, Theo deRaadt, Richard Stallman, Markus Kuhn,
schellr, John Harrison, monnier, Taylan Bayırlı
Dear programming language types,
I wrote this to try once again to explain what is the nature of the
problem that one would have in verifying the integrity of _any_
software toolchain, whether it is aimed ultimately at the production
of other software, or of hardware.
http://livelogic.blogspot.com/2014/10/the-foundation-part-i.html
This three page text is ostensively about verifying the integrity of a
communications link operating over an un-trusted transport layer, but
a compiler is really a type of communications channel.
I am sure everyone still reading this has wondered about the
possibilities of using universal programming languages (universal in
the Church-Turing sense) as communications protocols. For example, one
could establish a point-to-point connection by writing a program which,
when run, output two more programs: one which, when run, outputs a
decoding pad for the next mesage one would transmit over that channel,
and the other the decoder which prints the message text together with
another program, which was the encoder for returning an
acknowledgement. Both endpoints would do this, and so the
programs/messages would be exchanged, with each one encoding the text
of the other. Then these programs could include decisions not only
about the encoding of data, choice of one-time pads, etc. but perhaps
also the routes of messages, sending different parts via different
trusted routes over similar channels etc. etc. The variations are
endless, and limited only by the ingenuity of the programmers
communicating over those channels.
And really, I sorely pity anyone charged with organising any kind of
surveillance of a group of people who enjoy that sort of
game. Cracking "the code" would be practically impossible, because
there need never be any fixed concrete representation whatsoever of
the fundamental encoding as it is transmitted over the transport
medium: _all_ of the knowledge about the current and future encodings
can be sent over the previous "incarnations" of that and/or another
channel, and the encoding of channels thereby made non-deterministic:
this means that there could never be _in principle,_ any mechanical
process _whatsoever_ which could decode more than a few parts of any
of those messages. After this brief success, the poor would-be spy
would be right back at square one.
What I try to explain here is the essential distinction between what I
call _actual knowledge,_ as opposed to mere _represented knowledge,_
such as a password, or an SSL certificate, or the documentation for
some file format appearing on a web page. The distinction is that only
in the case of actual knowledge does one know _how_ and _why_ one
knows.
The motivation is the idea that by using actual rather than
represented knowledge, it is possible to construct such a trustworthy
system in practice. But there's a catch! The catch is that this will
only work for an organisation whose motives and governance are
completely open and transparent. This is because the technique relies
upon mutual trust, which is something that cannot exist without
openness and transparency. Bad guys just won't get it! To understand
why (in case it is not immediately obvious to you, that is) you will
need to read (or at least think carefully about) about how
error-detection would work in such a system.
The text consists of a title page with the abstract, and two full
pages. So it should be an easy read. I earlier sent out a nine page
document entitled GNU Thunder, in which I attempted to describe what I
consider to be essentially the same idea, but with the emphasis on
compilers and interpreters, rather than communications. The Thunder
text was a concrete suggestion for an implementation. This text
however is more abstract. But these two documents could be
considered to be complementary in two different senses.
I hope everyone enjoys this, and that it stimulates some interesting
thoughts, and subsequent discussion, and that those thoughts are all
good, and the discussion open and transparent. That way we could save
ourselves an awful lot of _really hairy_ metaprogramming!
Please feel free to copy either text and/or this message and pass it
around. Neither of these two texts are copyright, and the more people
that see them the better. Bad guys in particular need to know about
this much more than the good ones do.
Ian
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Verifying Toolchain Semantics
2014-10-02 14:54 Verifying Toolchain Semantics Ian Grant
@ 2014-10-03 6:23 ` Mark H Weaver
2014-10-03 7:15 ` William ML Leslie
` (2 more replies)
0 siblings, 3 replies; 27+ messages in thread
From: Mark H Weaver @ 2014-10-03 6:23 UTC (permalink / raw)
To: Ian Grant; +Cc: guile-devel
Ian Grant <ian.a.n.grant@googlemail.com> writes:
> Dear programming language types,
>
> I wrote this to try once again to explain what is the nature of the
> problem that one would have in verifying the integrity of _any_
> software toolchain, whether it is aimed ultimately at the production
> of other software, or of hardware.
>
> http://livelogic.blogspot.com/2014/10/the-foundation-part-i.html
I downloaded the PDF linked in that blog entry and attempted to view it
using Emacs's docview mode, which reported that the pdf->png process
died with a segfault.
It's ironic that someone who claims to be so concerned with security
steadfastly refuses to provide his most important essays in a simple,
transparent format. Instead, he insists to distribute them in an opaque
format that can only be interpreted by a small handful of very complex
programs with a large attack surface.
For that matter, it's also interesting that someone concerned about
Thompson viruses would suggest that Guile should distribute it's
compiler in the form of pre-compiled intermediate C code (compiled from
Scheme) instead of bootstrapping from source code, in order to speed up
the compilation process.
I've wasted more time than I should have reading Ian's writings, looking
for an answer to this apparent contradiction in his views, and I haven't
found it.
While we're on the subject of paranoid theories, here's one for you:
maybe Ian Grant's true motive is to induce some of the most important
developers of free toolchains and the Linux kernel to load PDFs that
infect their computers with malware, in order to subvert our core
infrastructure.
Ian: tell me again, why do you refuse to distribute your essays in plain
text? I read GNU Thunder and I don't remember seeing anything in there
that justifies the use of such a complex format. As I recall, it's just
plain text anyway.
Mark
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Verifying Toolchain Semantics
2014-10-03 6:23 ` Mark H Weaver
@ 2014-10-03 7:15 ` William ML Leslie
2014-10-03 12:56 ` Taylan Ulrich Bayirli/Kammer
2014-10-03 8:45 ` Nala Ginrut
2014-10-05 1:35 ` Ian Grant
2 siblings, 1 reply; 27+ messages in thread
From: William ML Leslie @ 2014-10-03 7:15 UTC (permalink / raw)
To: guile-devel
[-- Attachment #1: Type: text/plain, Size: 929 bytes --]
On 3 October 2014 16:23, Mark H Weaver <mhw@netris.org> wrote:
> Instead, he insists to distribute them in an opaque
> format that can only be interpreted by a small handful of very complex
> programs with a large attack surface.
>
Oh, interesting point. Maybe we should define PDF as an abstract
semantics that we can convert into a wide range of equivalent document
layout languages? If the attacker can't tell exactly what xsl-fo or dsssl
the tool will output, or what software you're using to render the result,
it will magically make it more difficult to attack!
--
William Leslie
Notice:
Likely much of this email is, by the nature of copyright, covered under
copyright law. You absolutely MAY reproduce any part of it in accordance
with the copyright law of the nation you are reading this in. Any attempt
to DENY YOU THOSE RIGHTS would be illegal without prior contractual
agreement.
[-- Attachment #2: Type: text/html, Size: 1349 bytes --]
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Verifying Toolchain Semantics
2014-10-03 7:15 ` William ML Leslie
@ 2014-10-03 12:56 ` Taylan Ulrich Bayirli/Kammer
2014-10-03 17:13 ` Mark H Weaver
2014-10-05 13:57 ` William ML Leslie
0 siblings, 2 replies; 27+ messages in thread
From: Taylan Ulrich Bayirli/Kammer @ 2014-10-03 12:56 UTC (permalink / raw)
To: William ML Leslie; +Cc: guile-devel
William ML Leslie <william.leslie.ttg@gmail.com> writes:
> Oh, interesting point. Maybe we should define PDF as an abstract
> semantics that we can convert into a wide range of equivalent document
> layout languages? If the attacker can't tell exactly what xsl-fo or
> dsssl the tool will output, or what software you're using to render
> the result, it will magically make it more difficult to attack!
Come on, that snark is undue. :-)
It won't make it "magically" more difficult, it will make it actually
more difficult. Though I think your analogy is wrong. PDF is already
an abstract semantics, like any formal language spec. The question is
whether an implementation realizes precisely those semantics, or adds
"oh and insert a backdoor" to some pieces of code in that language, "as
an addition to the language semantics." This augmentation to the
semantics needn't be apparent in the source code of the implementation,
since the language it's written in might again be compiled by a compiler
that "augments" the semantics of the language it compiles, and that in
turn needn't be apparent in the compiler's source ... until you arrive
at the since long vanished initial source code that planted the virus.
That's just reiterating what the Thompson hack is. It relies on us
using a single lineage of compilers/software. (Or a couple lineages
could be compromised separately; that would still be plausible if it's
plausible for one.) Because at every layer the semantics can only be
compromised enough to propagate the compromise through one concrete
code-base. In your PDF analogy, the solution is to write a spurious
amount of PDF implementations. Or for C, to implement a spurious amount
of C compilers. That is impractical because C is complex. What might
be practical might be to write one new C compiler (guaranteed clean,
since it's new), and verify GCC with that[*]. What's more useful in the
long term is to define a much simpler language, and base our everything
on that instead of on C. Then we can verify our toolchain more often,
since we needn't write a whole freaking C compiler every time. :-)
[*] Compile version X of GCC with the Clean C Compiler™, compile version
X of GCC with some GCC binary, then compile some GCC with both GCC-X
binaries, one of which is guaranteed clean, to see if they behave same
while compiling GCC; thus we get both a guaranteed clean GCC, and we get
to know whether the other was compromised in first place, or whether we
wasted a lot of time.
I think Ian is doing himself a disservice by using many fancy abstract
terms and, if I may be blunt, acting in some tinfoil hat-like ways, when
in fact the underlying ideas are relatively sane and simple. It's all
there if you read a little between the lines, and a little around the
parts written in all-caps, like in:
http://lists.gnu.org/archive/html/guile-devel/2014-09/msg00035.html
Taylan
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Verifying Toolchain Semantics
2014-10-03 12:56 ` Taylan Ulrich Bayirli/Kammer
@ 2014-10-03 17:13 ` Mark H Weaver
2014-10-05 13:57 ` William ML Leslie
1 sibling, 0 replies; 27+ messages in thread
From: Mark H Weaver @ 2014-10-03 17:13 UTC (permalink / raw)
To: Taylan Ulrich Bayirli/Kammer; +Cc: guile-devel
Taylan Ulrich Bayirli/Kammer <taylanbayirli@gmail.com> writes:
> In your PDF analogy, the solution is to write a spurious
> amount of PDF implementations. Or for C, to implement a spurious amount
> of C compilers. That is impractical because C is complex. What might
> be practical might be to write one new C compiler (guaranteed clean,
> since it's new), and verify GCC with that[*]. What's more useful in the
> long term is to define a much simpler language, and base our everything
> on that instead of on C.
Okay, so why does he insist on using PDF to distribute an essay that is
just plain text anyway? He should practice what he preaches.
Mark
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Verifying Toolchain Semantics
2014-10-03 12:56 ` Taylan Ulrich Bayirli/Kammer
2014-10-03 17:13 ` Mark H Weaver
@ 2014-10-05 13:57 ` William ML Leslie
1 sibling, 0 replies; 27+ messages in thread
From: William ML Leslie @ 2014-10-05 13:57 UTC (permalink / raw)
To: guile-devel
[-- Attachment #1: Type: text/plain, Size: 9146 bytes --]
On 3 October 2014 22:56, Taylan Ulrich Bayirli/Kammer <
taylanbayirli@gmail.com> wrote:
> William ML Leslie <william.leslie.ttg@gmail.com> writes:
>
> > Oh, interesting point. Maybe we should define PDF as an abstract
> > semantics that we can convert into a wide range of equivalent document
> > layout languages? If the attacker can't tell exactly what xsl-fo or
> > dsssl the tool will output, or what software you're using to render
> > the result, it will magically make it more difficult to attack!
>
> Come on, that snark is undue. :-)
>
> It won't make it "magically" more difficult, it will make it actually
> more difficult.
It makes it harder in the same way ASLR makes attacks on a lack of memory
safety harder: there is less you can rely on, but it's still basically the
same game.
Say, for example, that I can guess you will end up viewing the document
with one of five output programs - maybe, evince, chrome, tk, gecko
translating xsl-fo to xul or firefox viewing html 4 directly, and say that
I only have exploits for the first three. Now I have a bit of a problem: I
need to find a way to get your tool to emit the specific structure that
will invoke the exploit. Since it will show a document that looks correct,
although underneath may have a different structure, I will have to account
for variation in what it emits but there will be a limited class of
algebraic manipulations that it can make without changing the semantics of
the document.
Maybe I would not find a way to /reliably/ produce a document that will
infect your machine. Or maybe I would. It's hypothetical. But you know
as well as I do that fuzzing a document generator to see what instructions
produce exploitable output is easier than either writing a verifiable PDF
viewer or writing a tool that bisimulates a PDF document with a class of
layout languages. The cherry on top is that all that extra code you
introduce is surely ripe with additional vulnerabilities - so maybe I don't
need to bother with producing those back-end exploits at all.
>
> Though I think your analogy is wrong. PDF is already
>
> an abstract semantics, like any formal language spec. The question is
>
> whether an implementation realizes precisely those semantics, or adds
> "oh and insert a backdoor" to some pieces of code in that language, "as
> an addition to the language semantics." This augmentation to the
> semantics needn't be apparent in the source code of the implementation,
> since the language it's written in might again be compiled by a compiler
> that "augments" the semantics of the language it compiles, and that in
> turn needn't be apparent in the compiler's source ... until you arrive
> at the since long vanished initial source code that planted the virus.
>
I wasn't suggesting that someone had a thompson hack in your pdf viewer
(do you consider postscript a reasonable language for a bootstrap compiler?
:-). I was implying that infection via PDF is similar to a Thompson hack
in an interesting way: having many possibly pdf pipelines makes the
problem only marginally harder, and in this, the problem is dual to that of
the Thompson hack.
Where it would take a great fool to rely on syntactic properties - a
/specific set of instructions/ - to recognise and infect a C compiler in
practice, it would take similar foolishness to imagine that applying a
class of machine-describable translations to a source would so reduce the
ability to get exploitable code to come out at the other end. It is
*possible*, at least, it is possible to get to the point where you now only
have to rely on the program consuming the output, be it x.org or a linux
vt, by verfying that the output program is well typed; but then you could
have just written the well-typed viewer and been done with it.
>
> That's just reiterating what the Thompson hack is. It relies on us
> using a single lineage of compilers/software. (Or a couple lineages
> could be compromised separately; that would still be plausible if it's
> plausible for one.) Because at every layer the semantics can only be
> compromised enough to propagate the compromise through one concrete
> code-base.
No no no. This is the point. Anyone smart enough to have pulled of this
attack - someone who has compromised the sha function, too - is not relying
on syntactic features, they are doing decent static analysis of
higher-order behaviour. Saying, this looks like - semantically - like
register allocation, that looks like it's producing an ELF header (very
easy to do), that has appropriate allocation and loop structure to do SSA
transformation. Looking at the higher-order behaviour of code is not
difficult, it's exactly what Ian is suggesting we /do/ to combat the
problem. I think it's probably easier to use this technique to propogate a
Thompson hack than to combat it.
> In your PDF analogy, the solution is to write a spurious
> amount of PDF implementations.
Except that in doing so, you've structured your vulnerabilities. Please
keep in mind that nobody sets out to write a PDF reader that has
vulnerabilities (or, well, maybe. hello Adobe!) - they are there by
accident. Prodicing a wide range of reader implementations just means
that the attacker needs to make some guesses. You may have introduced more
severe bugs by this process.
> Or for C, to implement a spurious amount
> of C compilers. That is impractical because C is complex. What might
> be practical might be to write one new C compiler (guaranteed clean,
> since it's new)
I hope by now you understand why I think this to be nonsense. It's still a
C compiler. You know that, I know that. Why wouldn't the machine be able
to tell?
> , and verify GCC with that[*]. What's more useful in the
> long term is to define a much simpler language, and base our everything
> on that instead of on C. Then we can verify our toolchain more often,
> since we needn't write a whole freaking C compiler every time. :-)
>
> [*] Compile version X of GCC with the Clean C Compiler™, compile version
> X of GCC with some GCC binary, then compile some GCC with both GCC-X
> binaries, one of which is guaranteed clean, to see if they behave same
> while compiling GCC; thus we get both a guaranteed clean GCC, and we get
> to know whether the other was compromised in first place, or whether we
> wasted a lot of time.
>
> I think Ian is doing himself a disservice by using many fancy abstract
> terms and, if I may be blunt, acting in some tinfoil hat-like ways, when
> in fact the underlying ideas are relatively sane and simple. It's all
> there if you read a little between the lines, and a little around the
> parts written in all-caps, like in:
>
> http://lists.gnu.org/archive/html/guile-devel/2014-09/msg00035.html
>
This asssumption that I haven't been reading everything is unusual and
I'm not sure where it's coming from. I guess Ian didn't manage to infect
me? Maybe I wrote my own PDF viewer, maybe I use out of date software, or
maybe I just read the postscript in my terminal.
The problem that I have with this discussion is that rather than address
many* of the points in my email, he instead resorted to insisting I was a
shill. It's very difficult to have a conversation with someone who uses
silencing tactics. Or better - it's pointless.
If it were true, then you all lost a long time ago, I'm sorry. So I
perpetuated a Thompson hack on a large, well-known free software codebase
while I was in primary school and without regular computer access, and
nobody noticed? And now he doesn't realise I'm editing his emails and PDFs
in-flight to make him sound arrogant and clueless?
To be fair, I have seen similar levels of gaslighting before. It's safe to
always assume /something/ is up. But throwing around those sort of
allegations just because someone pokes holes in your arguments is an
acceptable cause for me to dismiss you as a nut, imho.
I don't have an issue with fancy abstract terms, but I do see clearly how
the proposition only solves a narrow subclass of the problems, and
specifically a subclass that would be better addressed with static analysis
and ideally types at the assembly level. Show me a Thompson attack that
this 'semantic fixpoint' can work around, and I'll show you a crack that
can work around any similar 'semantic fixpoint'.
* well, he responded to the point about hardware. If people want to know
why he's wrong, I can elaborate. But by that point, I hoped people could
see for themselves that there's no reason to talk any further.
--
William Leslie
Notice:
Likely much of this email is, by the nature of copyright, covered under
copyright law. You absolutely MAY reproduce any part of it in accordance
with the copyright law of the nation you are reading this in. Any attempt
to DENY YOU THOSE RIGHTS would be illegal without prior contractual
agreement.
[-- Attachment #2: Type: text/html, Size: 12091 bytes --]
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Verifying Toolchain Semantics
2014-10-03 6:23 ` Mark H Weaver
2014-10-03 7:15 ` William ML Leslie
@ 2014-10-03 8:45 ` Nala Ginrut
2014-10-05 1:35 ` Ian Grant
2 siblings, 0 replies; 27+ messages in thread
From: Nala Ginrut @ 2014-10-03 8:45 UTC (permalink / raw)
To: Mark H Weaver; +Cc: Ian Grant, guile-devel
On Fri, 2014-10-03 at 02:23 -0400, Mark H Weaver wrote:
> Ian Grant <ian.a.n.grant@googlemail.com> writes:
>
> > Dear programming language types,
> >
> > I wrote this to try once again to explain what is the nature of the
> > problem that one would have in verifying the integrity of _any_
> > software toolchain, whether it is aimed ultimately at the production
> > of other software, or of hardware.
> >
> > http://livelogic.blogspot.com/2014/10/the-foundation-part-i.html
>
> I downloaded the PDF linked in that blog entry and attempted to view it
> using Emacs's docview mode, which reported that the pdf->png process
> died with a segfault.
>
I encountered error when I tried to open it yesterday, several times. So
it maybe not a data transmitting problem.
> It's ironic that someone who claims to be so concerned with security
> steadfastly refuses to provide his most important essays in a simple,
> transparent format. Instead, he insists to distribute them in an opaque
> format that can only be interpreted by a small handful of very complex
> programs with a large attack surface.
Maybe the author want a better layout of an article. It's better to
provide multi formats include TXT. Some guys just want to get to known
the idea quickly, like me.
But I guess there're several maths formulas in the article, so maybe TXT
is not enough for it.
--
People who are really serious about software should make their own
hardware. -- Alan Kay
nalaginrut.com
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Verifying Toolchain Semantics
2014-10-03 6:23 ` Mark H Weaver
2014-10-03 7:15 ` William ML Leslie
2014-10-03 8:45 ` Nala Ginrut
@ 2014-10-05 1:35 ` Ian Grant
2014-10-05 5:39 ` Nala Ginrut
2014-10-05 6:58 ` Mike Gerwitz
2 siblings, 2 replies; 27+ messages in thread
From: Ian Grant @ 2014-10-05 1:35 UTC (permalink / raw)
To: Mark H Weaver, Markus Kuhn; +Cc: guile-devel
On Fri, Oct 3, 2014 at 2:23 AM, Mark H Weaver <mhw@netris.org> wrote:
>> http://livelogic.blogspot.com/2014/10/the-foundation-part-i.html
> I downloaded the PDF linked in that blog entry and attempted to view it
> using Emacs's docview mode, which reported that the pdf->png process
> died with a segfault.
This means that there is a bug in that program. PDF (or ISO 32000) is
a well-defined text file format. See
http://www.adobe.com/devnet/pdf/pdf_reference.html
So no software tools should ever segfault reading a PDF file. Even one
that contained a trojan program. If the software were properly
designed and written then it would be _impossible_ for there to be a
trojan in a PDF file.
> It's ironic that someone who claims to be so concerned with security
> steadfastly refuses to provide his most important essays in a simple,
> transparent format. Instead, he insists to distribute them in an opaque
> format that can only be interpreted by a small handful of very complex
> programs with a large attack surface.
The "attack surface" is indeed huge, but only because of the shoddy
standards of the software developers who implement it!
> For that matter, it's also interesting that someone concerned about
> Thompson viruses would suggest that Guile should distribute it's
> compiler in the form of pre-compiled intermediate C code (compiled from
> Scheme) instead of bootstrapping from source code, in order to speed up
> the compilation process.
I have already explained why this is an invalid argument. It is also
hypocritical, because the guile source distribution already includes
over 50,000 lines of intermediate shell script. This is unintelligible
(I have tried reading ./configure, there are these functions which
construct strings of 512 backslashes, and stuff like that, and _no_
explanation for why that is necessary!)
On the contrary, the intermediate code I suggest shipping need not
necessarily be unintelligible. It would be very clear, and quite
concise. For an example (in Standard ML) of the sort of
code-generating code I am talking about, see
http://livelogic.blogspot.com/2014/07/writing-assembler-using-standardml.html
Basically, the inductive structure of the parser and interpreter will
still be very clear in the application of the assembler-generating
primitives. Now Standard ML is nothing much more than than typed
scheme, with a _very_ powerful "module system" which has a sound
formal basis in constructive logic: an ML functor application is
essentiallya formal correctness proof. So we could quite easily
translate Standard ML to scheme, and the generated scheme would be
perfectly readable and auditable.
> I've wasted more time than I should have reading Ian's writings, looking
> for an answer to this apparent contradiction in his views, and I haven't
> found it.
You have indeed wasted your time, because the contradiction is not in
my writing, it's only in your own mind.
> While we're on the subject of paranoid theories, here's one for you:
> maybe Ian Grant's true motive is to induce some of the most important
> developers of free toolchains and the Linux kernel to load PDFs that
> infect their computers with malware, in order to subvert our core
> infrastructure.
Well, if I do succeed in distributing malware, it will be a good
demonstration of what I have been arguing for months now, which is
that your "core infrastructure" is _very,_ _very_ flaky, and that far
from being "the most important developers," you are in fact just
part-time amateur hackers playing at your 'hobbies'.
What I am trying to do here is wake you people up from what will
otherwise prove to be terminal sleep. This is not a hobby, you are
combatants in a global information war, and it will cost some of you
your lives,
> Ian: tell me again, why do you refuse to distribute your essays in plain
> text? I read GNU Thunder and I don't remember seeing anything in there
> that justifies the use of such a complex format. As I recall, it's just
> plain text anyway.
I don't distribute plain text because it is too easy to alter. Once I
send one of these "essays" out I have no control over what happens to
it. So I try to make it as hard as I reasonably can for people to edit
what I have written.
If you were "the most important developers" you would be able to write
a program to reliably display PDF.
Ian
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Verifying Toolchain Semantics
2014-10-05 1:35 ` Ian Grant
@ 2014-10-05 5:39 ` Nala Ginrut
2014-10-05 14:14 ` Ian Grant
2014-10-05 6:58 ` Mike Gerwitz
1 sibling, 1 reply; 27+ messages in thread
From: Nala Ginrut @ 2014-10-05 5:39 UTC (permalink / raw)
To: Ian Grant; +Cc: Mark H Weaver, Markus Kuhn, guile-devel
[-- Attachment #1: Type: text/plain, Size: 1388 bytes --]
2014年10月5日 上午9:35于 "Ian Grant" <ian.a.n.grant@googlemail.com>写道:
>
> Well, if I do succeed in distributing malware, it will be a good
> demonstration of what I have been arguing for months now, which is
> that your "core infrastructure" is _very,_ _very_ flaky, and that far
> from being "the most important developers," you are in fact just
> part-time amateur hackers playing at your 'hobbies'.
>
Ian, you don't have to say so, since people here are the guys who could
actually help you. Or you don't have to email them.
I understand you've been trying to explain you idea. And I believe that
there're many people are listening to you silently. Your mind is profound
and too complex so that your essays are very important for them.
The real problem here, is the provided PDF can't be opened normally. That's
bad, for your idea. It's your mistake, not others. IMO, the point is not
what you concerned that PDF is more security than TXT to protect your
original mind. You have to let them see the essay successfully and
understand your idea first.
> If you were "the most important developers" you would be able to write
> a program to reliably display PDF.
>
As I said, don't be so ridiculous and aggressive. I am one of the guys who
are interested your mind, please consider to build your mind easier, rather
than sarcasm.
> Ian
>
[-- Attachment #2: Type: text/html, Size: 1702 bytes --]
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Verifying Toolchain Semantics
2014-10-05 5:39 ` Nala Ginrut
@ 2014-10-05 14:14 ` Ian Grant
2014-10-05 15:15 ` Nala Ginrut
0 siblings, 1 reply; 27+ messages in thread
From: Ian Grant @ 2014-10-05 14:14 UTC (permalink / raw)
To: Nala Ginrut; +Cc: Mark H Weaver, Markus Kuhn, guile-devel
On Sun, Oct 5, 2014 at 1:39 AM, Nala Ginrut <nalaginrut@gmail.com> wrote:
>
> The real problem here, is the provided PDF can't be opened normally. That's
> bad, for your idea. It's your mistake, not others.
Then tell me the name, the sha512sum of the file, the URL from which
you downloaded it and the size of the file as reported by your
filesystem. Also tell me the exact error, and the program (including
the version number) and if possible the file location of the error.
Thanks
Ian
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Verifying Toolchain Semantics
2014-10-05 14:14 ` Ian Grant
@ 2014-10-05 15:15 ` Nala Ginrut
2014-10-05 16:24 ` Ian Grant
0 siblings, 1 reply; 27+ messages in thread
From: Nala Ginrut @ 2014-10-05 15:15 UTC (permalink / raw)
To: Ian Grant; +Cc: guile-devel
[-- Attachment #1: Type: text/plain, Size: 857 bytes --]
On Sun, Oct 5, 2014 at 10:14 PM, Ian Grant <ian.a.n.grant@googlemail.com>
wrote:
> On Sun, Oct 5, 2014 at 1:39 AM, Nala Ginrut <nalaginrut@gmail.com> wrote:
> >
> > The real problem here, is the provided PDF can't be opened normally.
> That's
> > bad, for your idea. It's your mistake, not others.
>
> Then tell me the name, the sha512sum of the file, the URL from which
> you downloaded it and the size of the file as reported by your
> filesystem. Also tell me the exact error, and the program (including
> the version number) and if possible the file location of the error.
>
>
Alright, I changed a system and try it again with evince successfully.
Anyway, I did't find any maths or special symbols in it, so it could be
published on your blog as plain text. But you may insist on the opinion of
PDF.
It's not your mistake but mine. ;-)
Thanks
> Ian
>
[-- Attachment #2: Type: text/html, Size: 1650 bytes --]
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Verifying Toolchain Semantics
2014-10-05 15:15 ` Nala Ginrut
@ 2014-10-05 16:24 ` Ian Grant
2014-10-06 8:25 ` Nala Ginrut
0 siblings, 1 reply; 27+ messages in thread
From: Ian Grant @ 2014-10-05 16:24 UTC (permalink / raw)
To: Nala Ginrut, guile-devel
On Sun, Oct 5, 2014 at 11:15 AM, Nala Ginrut <nalaginrut@gmail.com> wrote:
>
> Alright, I changed a system and try it again with evince successfully.
> Anyway, I did't find any maths or special symbols in it, so it could be
> published on your blog as plain text. But you may insist on the opinion of
> PDF.
There is another reason why I use PDF. It's much nicer to read. I love
Garamond. Did you see the Jobs movie? Do you remember what he said
about Garamond?
> It's not your mistake but mine. ;-)
Well being mistaken about somebody else's mistake that wasn't a
mistake is about the least mistaken it's possible to be: it really
doesn't matter at all.
What we really need to do is find out what software Mark was using to
pdf->png. If it segfaults then there is a good chance that that bug
can be turned into a working exploit. Because all binaries distributed
by the binary-distributors are identical, attackers can analyse the
file and work out how to turn a bad pointer dereference into an
exploitable "PDF attack vector" which executes binary code contained
in the PDF.
Ian
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Verifying Toolchain Semantics
2014-10-05 16:24 ` Ian Grant
@ 2014-10-06 8:25 ` Nala Ginrut
0 siblings, 0 replies; 27+ messages in thread
From: Nala Ginrut @ 2014-10-06 8:25 UTC (permalink / raw)
To: Ian Grant; +Cc: guile-devel
[-- Attachment #1: Type: text/plain, Size: 1466 bytes --]
On Mon, Oct 6, 2014 at 12:24 AM, Ian Grant <ian.a.n.grant@googlemail.com>
wrote:
> On Sun, Oct 5, 2014 at 11:15 AM, Nala Ginrut <nalaginrut@gmail.com> wrote:
> >
> > Alright, I changed a system and try it again with evince successfully.
> > Anyway, I did't find any maths or special symbols in it, so it could be
> > published on your blog as plain text. But you may insist on the opinion
> of
> > PDF.
>
> There is another reason why I use PDF. It's much nicer to read. I love
> Garamond. Did you see the Jobs movie? Do you remember what he said
> about Garamond?
>
>
Well, I'm a naive font fan so I may understand you. I usually use my fav
font on page according to the design with HTML5/CSS stuff.
Forget about it, it's far from the topic.
> What we really need to do is find out what software Mark was using to
> pdf->png. If it segfaults then there is a good chance that that bug
> can be turned into a working exploit. Because all binaries distributed
> by the binary-distributors are identical, attackers can analyse the
> file and work out how to turn a bad pointer dereference into an
> exploitable "PDF attack vector" which executes binary code contained
> in the PDF.
>
>
Agreed. But I think the PDF is more complicated than TXT. One of my fav
ezine, Phrack is plain txt, and it's cool.
So my suggestion is to provide both TXT and PDF if the article is important.
Anyway, the font is a big issue for whom loves it, so I don't involve such
argue. ;-)
[-- Attachment #2: Type: text/html, Size: 2271 bytes --]
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Verifying Toolchain Semantics
2014-10-05 1:35 ` Ian Grant
2014-10-05 5:39 ` Nala Ginrut
@ 2014-10-05 6:58 ` Mike Gerwitz
2014-10-05 16:11 ` Ian Grant
1 sibling, 1 reply; 27+ messages in thread
From: Mike Gerwitz @ 2014-10-05 6:58 UTC (permalink / raw)
To: Ian Grant; +Cc: Mark H Weaver, Markus Kuhn, guile-devel
[-- Attachment #1: Type: text/plain, Size: 3376 bytes --]
On Sat, Oct 04, 2014 at 09:35:09PM -0400, Ian Grant wrote:
> Well, if I do succeed in distributing malware, it will be a good
> demonstration of what I have been arguing for months now, which is
> that your "core infrastructure" is _very,_ _very_ flaky, and that far
> from being "the most important developers," you are in fact just
> part-time amateur hackers playing at your 'hobbies'.
>
> What I am trying to do here is wake you people up from what will
> otherwise prove to be terminal sleep. This is not a hobby, you are
> combatants in a global information war, and it will cost some of you
> your lives,
As has been stated---your concerns are substantiated and understood,
and you clearly have much experience and information to contribute,
but your unnecessary and unsubstantiated insults and holier-than-thou
attitude prevent meaningful discussion, especially from those who are
spectating and unwilling to participate in a discussion that is
consequently destined to yield little more than childish banter and
silence, albeit sprinkled with bits of very interesting information
and resources.
The additional drama you infuse into the conversation---an example
being the latter paragraph above---also works against you. There are
many things that may cost us our lives, and I'm fairly certain that
this does not make the top million or so for most of us. I'm killing
myself sitting here typing this message.[0] From my understanding,
you're allowing your body to degenerate as we speak.
> I don't distribute plain text because it is too easy to alter. Once I
> send one of these "essays" out I have no control over what happens to
> it. So I try to make it as hard as I reasonably can for people to edit
> what I have written.
This argument is not valid---why is it hard to alter a PDF? In fact,
PDF manipulation is a dark (and probably cancer-causing) art that's
automated by countless businesses worldwide; it is a topic that eats
up a significant portion of development time at my employer's office.
Have you considered just distributing a GPG/PGP signature with your
works, or even signing the work itself? After all, this whole
discussion is about proving the unlikelihood of and preventing the
modification of data. Unlike the topic of complex binaries, your works
are trivially verifiable even by hand---take advantage of that. If in ASCII,
verification is a simple matter of diffing, even without cryptographic
assurances, provided that your original work is archived in a number of
reputable places (though I'd still sign my works); however, PDFs introduce
an infinite number of display modifications that can be produce a document
yielding a text isomorphic to the original---just because two PDFs of your
work are 99% different when binary-compared doesn't mean that the visual
meaning of text it renders is not 100% the same.
(To be fair: I'm fine with PDFs; it's hard to convert most TeX-heavy
writings using equations into meaningful ASCII, but I still provide ASCII
alternatives whenever reasonable, which is >90% of the time. Unicode is
often suitable when ASCII isn't.)
[0]: http://apps.washingtonpost.com/g/page/national/the-health-hazards-of-sitting/750/
--
Mike Gerwitz
Free Software Hacker | GNU Maintainer
http://mikegerwitz.com
FSF Member #5804 | GPG Key ID: 0x8EE30EAB
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Verifying Toolchain Semantics
2014-10-05 6:58 ` Mike Gerwitz
@ 2014-10-05 16:11 ` Ian Grant
2014-10-06 4:23 ` Mike Gerwitz
0 siblings, 1 reply; 27+ messages in thread
From: Ian Grant @ 2014-10-05 16:11 UTC (permalink / raw)
To: Ian Grant, Mark H Weaver, Markus Kuhn, guile-devel
On Sun, Oct 5, 2014 at 2:58 AM, Mike Gerwitz <mikegerwitz@gnu.org> wrote:
> On Sat, Oct 04, 2014 at 09:35:09PM -0400, Ian Grant wrote:
>> Well, if I do succeed in distributing malware, it will be a good
>> demonstration of what I have been arguing for months now, which is
>> that your "core infrastructure" is _very,_ _very_ flaky, and that far
>> from being "the most important developers," you are in fact just
>> part-time amateur hackers playing at your 'hobbies'.
>>
>> What I am trying to do here is wake you people up from what will
>> otherwise prove to be terminal sleep. This is not a hobby, you are
>> combatants in a global information war, and it will cost some of you
>> your lives,
>
> As has been stated---your concerns are substantiated and understood,
I wasn't aware that my concerns _have_ been substantiated. How? I am
not sure they have been understood, either.
> and you clearly have much experience and information to contribute,
> but your unnecessary and unsubstantiated insults and holier-than-thou
> attitude prevent meaningful discussion, especially from those who are
> spectating and unwilling to participate in a discussion that is
> consequently destined to yield little more than childish banter and
> silence, albeit sprinkled with bits of very interesting information
> and resources.
I have not insulted anyone, as I explained in the other thread. I am
"holier than though" And I wish that wasn't the case. So if someone
can prove me wrong, then please do so, the sooner the better!
> The additional drama you infuse into the conversation---an example
> being the latter paragraph above---also works against you. There are
> many things that may cost us our lives, and I'm fairly certain that
> this does not make the top million or so for most of us. I'm killing
> myself sitting here typing this message.[0] From my understanding,
> you're allowing your body to degenerate as we speak.
Well my body is getting on for 50 years old, so it doesn't matter so
much. Here's a "better" example:
http://en.wikipedia.org/wiki/Aaron_Swartz
>> I don't distribute plain text because it is too easy to alter. Once I
>> send one of these "essays" out I have no control over what happens to
>> it. So I try to make it as hard as I reasonably can for people to edit
>> what I have written.
>
> This argument is not valid---why is it hard to alter a PDF? In fact,
> PDF manipulation is a dark (and probably cancer-causing) art that's
> automated by countless businesses worldwide; it is a topic that eats
> up a significant portion of development time at my employer's office.
It is valid. If you want to see why, then try to alter one of the PDFs
I've sent out.
What will you do when I later send out a series of checksums using a
checksum algorithm that neither you or anyone else ever heard of
before?
> Have you considered just distributing a GPG/PGP signature with your
> works, or even signing the work itself? After all, this whole
> discussion is about proving the unlikelihood of and preventing the
> modification of data.
Of course I have thought of that, and rejected it as a pointless waste
of time! If you read what I've written, on this list and also in that
blog article Mark pointed up, you will see why I think this. And even
if public key crypto was positively secure, it is worthless when I
have no rational basis for confidence in the security of the system
which holds the private keys. PK crypto, in this state we're in, has
negative value: it just creates a false sense oif security, and
_discourages_ people from actually thinking about the real problem
because they "know" their comms. are secure. They don't know that at
all.
> ... Unlike the topic of complex binaries, your works
> are trivially verifiable even by hand---take advantage of that. If in ASCII,
> verification is a simple matter of diffing, even without cryptographic
> assurances, provided that your original work is archived in a number of
> reputable places (though I'd still sign my works);
I could ship them as low-res PNG files. But really, it would be much
better just to get good working PDF viewers. There are _millions_ of
important documents which are in PDF form. We need to be able to
reliably verify and reference text within them. These include things
like the Intel architecture reference manuals.
> [...] however, PDFs introduce
> an infinite number of display modifications that can be produce a document
> yielding a text isomorphic to the original---just because two PDFs of your
> work are 99% different when binary-compared doesn't mean that the visual
> meaning of text it renders is not 100% the same.
I am well aware of this, and it is one reason why we need to be able
to read PDFs reliably. We can then vary the concrete representation
and defeat attempts at mass-surveillance and also prevent people from
using automated methods to edit the files in transit. This was
implicit in the very first message I sent Stallman, Torvalds, deRaadt
etc. and which I later cc'ed to this list:
http://lists.gnu.org/archive/html/guile-devel/2014-09/msg00021.html
And before some smart-ass says "Oh yeah, and how are you going to
verify the contents with cryptographic checksums?" I warn you to
_think_ first, because I will metaphorically tear you into fifty thin,
bloody little strips before you can say "Doh!"!
You do all know were making a movie about this whole affair, don't you?
http://livelogic.blogspot.com/2014/04/what-is-sixth-estate.html
Ian
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Verifying Toolchain Semantics
2014-10-05 16:11 ` Ian Grant
@ 2014-10-06 4:23 ` Mike Gerwitz
[not found] ` <20141006042323.GA31390-iW7gFb+/I3LZHJUXO5efmti2O/JbrIOy@public.gmane.org>
0 siblings, 1 reply; 27+ messages in thread
From: Mike Gerwitz @ 2014-10-06 4:23 UTC (permalink / raw)
To: Ian Grant; +Cc: Mark H Weaver, Markus Kuhn, guile-devel
[-- Attachment #1: Type: text/plain, Size: 10091 bytes --]
On Sun, Oct 05, 2014 at 12:11:00PM -0400, Ian Grant wrote:
> > As has been stated---your concerns are substantiated and understood,
>
> I wasn't aware that my concerns _have_ been substantiated. How? I am
> not sure they have been understood, either.
They were substantiated long ago by the very references you've provided.
Your application to the GNU project doesn't make this a novel discussion.
The problem is understood. A proper and **practical** solution to the
problem may be less well understood, but you haven't yet convinced those in
charge of making such decisions that this is a threat (a) greater than all
other threats worth devoting time to protect against; and (b) worthy of such
an enormous time investment at this time.
It is worthy of addressing, and addressing it in the manner that you and
Dijkstra suggest will certainly go a long way to preventing nearly all
problems contributing to (a), but you need to take a different approach if
you're going to incite such change.
> Well my body is getting on for 50 years old, so it doesn't matter so
> much. Here's a "better" example:
>
> http://en.wikipedia.org/wiki/Aaron_Swartz
I hope very much that using Aaron as an example is to further your argument
that programming is a life-or-death situation, and that it's not a "better"
example of personal neglect or instability. The former has no relation to
the topic at hand, and the latter is---well, let's just hope that you poorly
represented yourself.
> > This argument is not valid---why is it hard to alter a PDF? In fact,
> > PDF manipulation is a dark (and probably cancer-causing) art that's
> > automated by countless businesses worldwide; it is a topic that eats
> > up a significant portion of development time at my employer's office.
>
> It is valid. If you want to see why, then try to alter one of the PDFs
> I've sent out.
> What will you do when I later send out a series of checksums using a
> checksum algorithm that neither you or anyone else ever heard of
> before?
Your argument favored PDF over plain text. The point that I had made was
that you are unable to verify that two PDFs---unless binary
identical---render text that is unchanged from the original. If I convert
your PDF to a PNG and back to a PDF, it renders the same text. What does
that mean to you? Does that mean that the text has been altered? Do you care
that your work has been distribute in an alternative form? At what point
does it become "modified"? And how can you distinguish reasonable
alterations from unreasonable ones?
What does your checksum algorithm accomplish than existing checksum or
hashing algorithms do not? Why can that same principle not be applied to
plain text---why does the implementation of your checksum make it any more
difficult to modify the PDF? And why is your checksum better than an
established and heavily studied cryptographic standard (e.g. SHA-256)?
> > Have you considered just distributing a GPG/PGP signature with your
> > works, or even signing the work itself? After all, this whole
> > discussion is about proving the unlikelihood of and preventing the
> > modification of data.
>
> Of course I have thought of that, and rejected it as a pointless waste
> of time! If you read what I've written, on this list
You have written a lot on this list, and there has been an absurd amount of
redundancy. I have not read it all.
> and also in that blog article Mark pointed up, you will see why I think
> this. And even if public key crypto was positively secure, it is worthless
> when I have no rational basis for confidence in the security of the system
> which holds the private keys.
In the case of PGP, you hold your private keys. I suppose if you're worried
about a Thompson virus, you can't even provide yourself assurances of the
security of your own system, but in such a severe case, there is nothing you
do on your system that can be trusted, so surely you must not enter your
password to access your e-mail on any of those systems. Because if you did,
surely someone could just impersonate you to begin with, and there'd be no
point in signing anything cryptographically, because such attempts would
have been preempted.
All that aside, it is still possible to host your private key on a machine
that is wholly segregated from the rest of the world, perform your
cryptographic operation on that, transfer its output (doesn't really matter
how at this point, as long as it's output-only and cannot be used to
compromise the machine---it only matters that the same output is supplied to
each machine) to N machines with N different lineages (even if one is a
PDP-11!), and verify the result of the cryptographic operation on those
machines to ensure that your data were not modified in the process of
applying the cryptographic operation.
But we're not talking about such assurances here, so most of those steps
would be unnecessary---upon receiving your signed request from the machine
hosting your private key, it *does not matter* what its output is: the plain
text is left, well, plain. You can verify it by hand. Then everyone else in
the world with your public key can verify that it is indeed the signature
produced by the same machine that produced the signature for all of your
other archived messages all over the internet (mailing lists, your own blog
entries, etc)---a task that can be trivially automated to scan for
discrepancies.
Your checksum would merely state "this PDF has not been modified". This
would say "this PDF has not been modified and I am actually the same person
that wrote all these other articles / e-mail messages, while claiming to be
Ian". And even if you have no trust in the cryptography that links the
signature to the private key, or have no trust in public key cryptography,
the signature *still* contains a hash[0], and so you get the former
assurance for free, assuming a [currently] trusted digest algorithm.
So tell me: how does your yet-to-be-released checksum algorithm provide
any better assurances than this?
Oh, but let's not stop there! Your signature relies on the security of the
hashing algorithm---if a collision is found, what does that mean for
cleartext signing? Well, if you had distributed a plain-text document in
ASCII (and Unicode is almost certainly fine too), it's very highly likely,
unless the hashing algorithm is just brutally and fundamentally flawed, that
the collision will contain a lot of garbage---garbage that would be
immediately noticeable. But if you distribute a PDF, it might be possible to
hide all of that garbage within its constructs.
> PK crypto, in this state we're in, has negative value: it just creates a
> false sense oif security, and _discourages_ people from actually thinking
> about the real problem because they "know" their comms. are secure. They
> don't know that at all.
I certainly hope that developers of systems that use public-key cryptography
understand the obvious, fundamental principle that you described in the
article that Mark linked.[1] Of course, I understand that this is not often
the case, and it's certainly not the case for most *users* of the software.
And that is dangerous, because GPG/PGP/PK-crypto users should certainly be
aware that any message they encrypt, even if the private key is discarded,
may very well one day be broken. And the adversary may be little 10yo Bobby
on his modern-era PC in the not-too-distant future.
But for the purpose of *this* discussion, the integrity of your chosen PK
algorithm isn't entirely relevant. Yes, your identity assurance is nice to
have, but at the *very least*, your signature provides a hash that is
verifiable.
Let's not play the "well the software performing the verification may be
compromised" card---it's still possible to manually perform the verification
if you really felt it to be necessary, if the situation were dire enough.
So again---have you considered distributing a signature with your documents?
> I could ship them as low-res PNG files. But really, it would be much
> better just to get good working PDF viewers. There are _millions_ of
> important documents which are in PDF form. We need to be able to
> reliably verify and reference text within them. These include things
> like the Intel architecture reference manuals.
Certainly, and I'm not arguing that. I'm arguing against your assertion that
PDFs are harder to modify than plain text and somehow provide greater
assurances; I'm arguing the exact opposite.
> I am well aware of this, and it is one reason why we need to be able
> to read PDFs reliably. We can then vary the concrete representation
> and defeat attempts at mass-surveillance and also prevent people from
> using automated methods to edit the files in transit.
Preventing file modifications in transit is a well-understood problem
(and the problems with various implementations understood) unrelated to this
one.
But having a PDF reader that is generated from a formal definition won't
prevent the problems that I described (although it'd prevent the majority of
today's attacks against, say, well-known proprietary readers), because *PDF
exists to present formatted text*, and there are countless ways to present
what appears to be the exact same formatting. So the answer to the
fundamental question "do PDFs X and Y present equivalent content?" is
non-trivial.
> And before some smart-ass says "Oh yeah, and how are you going to
> verify the contents with cryptographic checksums?" I warn you to
> _think_ first, because I will metaphorically tear you into fifty thin,
> bloody little strips before you can say "Doh!"!
I don't follow. Surely your method isn't novel.
[0]: http://tools.ietf.org/html/rfc4880
[1]: http://livelogic.blogspot.dk/2014/09/the-free-fair-and-forged.html
--
Mike Gerwitz
Free Software Hacker | GNU Maintainer
http://mikegerwitz.com
FSF Member #5804 | GPG Key ID: 0x8EE30EAB
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Verifying Toolchain Semantics
@ 2014-10-05 17:42 Ian Grant
2014-10-05 18:19 ` Ian Grant
0 siblings, 1 reply; 27+ messages in thread
From: Ian Grant @ 2014-10-05 17:42 UTC (permalink / raw)
To: guile-devel, Taylan Bayırlı, William Leslie,
Richard Stallman, Mark H Weaver
Taylan wrote:
> In your PDF analogy, the solution is to write a spurious
> amount of PDF implementations. Or for C, to implement
> a spurious amount of C compilers. That is impractical
> because C is complex.
It's not as complex as you might think. In the space of a couple of
months, I wrote what I reckon is 50% of a C interpreter all on my own.
And I have the basis of an abstract language for specifying semantics
as just sets of term-rewriting rules, which work on AST
representations of program source.
https://github.com/IanANGrant/red-october/blob/master/src/dynlibs/ffi/CPPLexer.lex
https://github.com/IanANGrant/red-october/blob/master/src/dynlibs/ffi/CParser.grm
https://github.com/IanANGrant/red-october/blob/master/src/dynlibs/ffi/CSyntax.sml
> What might be practical might be to write one new C
> compiler (guaranteed clean, since it's new), and verify
> GCC with that[*].
It will take a year or so to get a C compiler written in C working. So
it won't be new once you've got it to the point of being able to
compile GCC. And by the time that happens, if there has been a
compiler trap door, then the GCC binaries, and probably your "new" C
compiler binaries would "know about" each other. So you would not have
any more reason to believe the
new binaries are good than you had at the beginning. You will have
wasted a year.
> What's more useful in the
> long term is to define a much simpler language, and base
> our everything on that instead of on C. Then we can verify
> our toolchain more often, since we needn't write a whole
> freaking C compiler every time. :-)
Almost, but that "Much Simpler Language" needs to be one designed to
write language interpreters by reading abstract specifications of the
syntax and semantics from "metadata" files. Then a standard
"interpreter interpreter for the 'MSL'" can be defined to do this job
automatically, and it could even implement itself. See the Reynolds'
paper I referenced in my first reply to William.
Then we will be able to write "a spurious number of whole freakin' C
interpreters" So for example, as I said to Richard a month ago, we
will be able to implement a C compiler in Microsoft Word BASIC, or in
COBOL, and that will be capable of compiling GCC, if we had a year or
so to wait while it does it ... but it could still be capable of
compiling checksumming programs, or diff programs, or unzipping
programs, and these would be extra data points to check whether other
tools like binary file viewers etc, have been compromised by a
compiler trap door.
Ian
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Verifying Toolchain Semantics
2014-10-05 17:42 Ian Grant
@ 2014-10-05 18:19 ` Ian Grant
0 siblings, 0 replies; 27+ messages in thread
From: Ian Grant @ 2014-10-05 18:19 UTC (permalink / raw)
To: guile-devel, Taylan Bayırlı, William Leslie,
Richard Stallman, Mark H Weaver
On Sun, Oct 5, 2014 at 1:42 PM, Ian Grant <ian.a.n.grant@googlemail.com> wrote:
> [we] will be able to implement a C compiler in Microsoft Word BASIC, or in
> COBOL, and that will be capable of compiling GCC, if we had a year or
> so to wait while it does it ...
This is not true. Word BASIC or COBOL could easily write out an
executable binary defined by an abstract formal description in a text
file. It would take a while, but the binary would be able to compile
GCC pretty quickly.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Verifying Toolchain Semantics
@ 2014-10-06 0:30 Ian Grant
[not found] ` <CAKFjmdzxAMvcry8N6B_atM_8vGyzA1Dfz9ygWxSgh3fD7EUAuQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 27+ messages in thread
From: Ian Grant @ 2014-10-06 0:30 UTC (permalink / raw)
To: William Leslie, Taylan Bayırlı, guile-devel, lightning
http://lists.gnu.org/archive/html/guile-devel/2014-10/msg00016.html
From: William ML Leslie
Date: Mon, 6 Oct 2014 00:57:49 +1100
On 3 October 2014 22:56, Taylan Ulrich Bayirli/Kammer <address@hidden> wrote:
> Say, for example, that I can guess you will end up viewing the
> document with one of five output programs - maybe, evince, chrome,
> tk, gecko translating xsl-fo to xul or firefox viewing html 4
> directly, and say that I only have exploits for the first three.
> Now I have a bit of a problem: I need to find a way to get your tool
> to emit the specific structure that will invoke the exploit. Since
> it will show a document that looks correct, although underneath may
> have a different structure, I will have to account for variation in
> what it emits but there will be a limited class of algebraic
> manipulations that it can make without changing the semantics of the
> document.
It's not about this. These are all the programs that have problems.
We're not going to use these sorts of programs anymore. We're going to
use auto-generated, nearly perfect code. And after a few years it
won't be just near perfect, it'll be perfect.
> Maybe I would not find a way to /reliably/ produce a document that
> will infect your machine. Or maybe I would. It's hypothetical.
> But you know as well as I do that fuzzing a document generator to
> see what instructions produce exploitable output is easier than
> either writing a verifiable PDF viewer or writing a tool that
> bisimulates a PDF document with a class of layout languages. The
> cherry on top is that all that extra code you introduce is surely
> ripe with additional vulnerabilities - so maybe I don't need to
> bother with producing those back-end exploits at all.
This is wrong. If you look at the automatically-generated assembler
code from a formal spec, in the form of a functional program
generating that code:
https://github.com/IanANGrant/red-october/blob/master/src/dynlibs/ffi/AbstractMachineWord.sml
you will see that the extra code has no extra attack vectors. It's all
machine-generated. If this were a piece of code for reading a binary
file, then it would have been generated from a grammar describing
exactly what are the valid words in that language. (Using words and
language in the technical sense as "valid strings" and "set of all
valid strings".) So buffer overruns won't happen. And if all the
semantics are formalised, then off-by-one errors won't happen
either. If they do, they'll happen all over the place and you will
find out very quickly.
> Though I think your analogy is wrong. PDF is already an abstract
> semantics, like any formal language spec. The question is whether
> an implementation realizes precisely those semantics, or adds "oh
> and insert a backdoor" to some pieces of code in that language, "as
> an addition to the language semantics." This augmentation to the
> semantics needn't be apparent in the source code of the
> implementation, since the language it's written in might again be
> compiled by a compiler that "augments" the semantics of the language
> it compiles, and that in turn needn't be apparent in the compiler's
> source ... until you arrive at the since long vanished initial
> source code that planted the virus.
> I hope by now you understand why I think this to be nonsense. It's
> still a C compiler. You know that, I know that. Why wouldn't the
> machine be able to tell?
Because it's formally undecidable. That's Rice's Theorem. And it's not
"in practice" probably detectable by semantantic methods, because we
are going to extra lengths to obscure the effective operational
semantics by multiple re-encodings. How are you going to recognise a
compiled Fortran90 program translating a C program into an abstract
assembler like lightning, as a C compiler? You don't even know what
the opcode numbers are: we can permute them.
> The problem that I have with this discussion is that rather than
> address many* of the points in my email, he instead resorted to
> insisting I was a shill. It's very difficult to have a conversation
> with someone who uses silencing tactics. Or better - it's
> pointless.
What are you talking about? Here is what you wrote to me:
http://lists.gnu.org/archive/html/guile-devel/2014-09/msg00036.html
> Please understand that our lack of enthusiasm for it is not without
> our own individual commitments to the security and freedoms of
> users; rather while the extra assurances your project may provide
> sound nice, they are frankly rather minimal and don't have much
> additional utility, and so are not priority one.
Now if you wanted me to continue answering your questions, then well,
I'm sorry, I just got the wrong end of stick completely. I thought you
wanted me to fuck off!
So I replied as below. To you, privately. And that was it. Silencing
tactics? You then copied it to the world, and a guy who I _told_ you,
was deputy director of the NSA's National Computer Security Centre!
And I didn't mind. (I did wonder what the hell does he think he's
doing?! But it's your privilege to say what you want, to whom you want,
and none of my business.)
Date: Sun, 7 Sep 2014 11:39:44 +1000
Message-ID: <CAHgd1hEP2=dnqXTQx-EdozAXc0vjYh0ucthbyWFnPecQqB=7pw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
Subject: Re: GNU Thunder
From: William ML Leslie <william.leslie.ttg-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: Ian Grant <ian.a.n.grant-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org>, guile-devel <guile-devel-mXXj517/zsQ@public.gmane.org>,
lightning <lightning-mXXj517/zsQ@public.gmane.org>, schellr-EkmVulN54Sk@public.gmane.org
X-List-Received-Date: Sun, 07 Sep 2014 01:39:50 -0000
--e89a8ff1ca20e0b37105026fc673
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
On 7 September 2014 01:49, Ian Grant <ian.a.n.grant-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:
> You are just not at all convincing, I'm afraid. Tell your boss they didn'=
t
> train you properly, and can you get assigned somewhere else.
>
=E2=80=8BUnfortunately, this is a reality we have to deal with when discuss=
ing
security on the internet: we've got to assume the enemy is listening. It
could be me, I could even be intercepting your emails and making them sound
incoherrent, and upping the tinfoil-hat quotient by =E2=80=8Badding people =
like
Theo and Linus to the To: list. I could be suggesting that effort be spent
in a project that will provide very little value, wasting your time.
Remember: I'm not suggesting what the outcome of your project will be,
just that if the result is negative, we still know nothing. When testing a
system for subterfuge, we need to examine *all* of the moving parts, even
those that appear to be unused. If the system you're building your
assembler on is compromised, it can still give you a negative answer.
That's what was so scary about this particular type of attack.
--=20
William Leslie
[...]
Here's my e-mail:
MIME-Version: 1.0
Received: by 10.194.219.234 with HTTP; Sat, 6 Sep 2014 08:49:15 -0700 (PDT)
In-Reply-To: <CAHgd1hH+VD9HqE=pRZOHaPK-fOgMJbHpv_RgPt5LUdLkTXCBwA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
References: <CAKFjmdzP89qSD03_MGqS1UawQvaq6yvme-abKcmLuA8DfUQE+A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
<E1XKsRt-0002zo-64-iW7gFb+/I3LZHJUXO5efmti2O/JbrIOy@public.gmane.org>
<CAKFjmdyUphk2LmDdDE_7gkDSKAu4COurtvafBwO5XwCgyM1OfA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
<E1XLP5I-0005zC-Sn-iW7gFb+/I3LZHJUXO5efmti2O/JbrIOy@public.gmane.org>
<CAKFjmdxt4jWAHAYXjzwPeUw+dTUBTPC94YJDNifsO7JVkNHjTQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
<E1XNssG-00083e-BB-iW7gFb+/I3LZHJUXO5efmti2O/JbrIOy@public.gmane.org>
<CAKFjmdzdkC5rW3DjBRbRueSiBs9dmJUSxGkPp2-6uc7SH9o79g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
<E1XPA13-0002hD-Kp-iW7gFb+/I3LZHJUXO5efmti2O/JbrIOy@public.gmane.org>
<CAKFjmdyc0D5vYBK=rQKzKNK+WRmWhkkL-RXqBMSHvhOzX3fHiw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
<E1XPifZ-0004g6-KA-iW7gFb+/I3LZHJUXO5efmti2O/JbrIOy@public.gmane.org>
<CAKFjmdwEs8TtPZjXWDYKoQXz4FEKy6p3T9+8jWLMY87Onn=VaQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
<CAHgd1hH+VD9HqE=pRZOHaPK-fOgMJbHpv_RgPt5LUdLkTXCBwA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
Date: Sat, 6 Sep 2014 11:49:15 -0400
Delivered-To: ian.a.n.grant-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org
Message-ID: <CAKFjmdzzRoTdzWxAOh2byRbWDtszWZtt8Z2k0eLCFYR+qmAC9g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
Subject: Re: GNU Thunder
From: Ian Grant <ian.a.n.grant-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org>
To: William ML Leslie <william.leslie.ttg-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Content-Type: multipart/alternative; boundary=001a1136144a26f19e05026787e6
--001a1136144a26f19e05026787e6
Content-Type: text/plain; charset=UTF-8
You are just not at all convincing, I'm afraid. Tell your boss they didn't
train you properly, and can you get assigned somewhere else.
Ian
--001a1136144a26f19e05026787e6
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
<div dir=3D"ltr">You are just not at all convincing, I'm afraid. Tell y=
our boss they didn't train you properly, and can you get assigned somew=
here else.<br><div class=3D"gmail_extra"><br></div><div class=3D"gmail_extr=
a">Ian<br></div><div class=3D"gmail_extra"><br></div></div>
--001a1136144a26f19e05026787e6--
=============================================================================
You see, I thought you were a Microsoft plant. I just couldn't figure
out any other reason why someone who clearly didn't know anything
about computers, would say they were going to write a secure operating
system! :-)
Ian
^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2014-10-08 18:26 UTC | newest]
Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-10-02 14:54 Verifying Toolchain Semantics Ian Grant
2014-10-03 6:23 ` Mark H Weaver
2014-10-03 7:15 ` William ML Leslie
2014-10-03 12:56 ` Taylan Ulrich Bayirli/Kammer
2014-10-03 17:13 ` Mark H Weaver
2014-10-05 13:57 ` William ML Leslie
2014-10-03 8:45 ` Nala Ginrut
2014-10-05 1:35 ` Ian Grant
2014-10-05 5:39 ` Nala Ginrut
2014-10-05 14:14 ` Ian Grant
2014-10-05 15:15 ` Nala Ginrut
2014-10-05 16:24 ` Ian Grant
2014-10-06 8:25 ` Nala Ginrut
2014-10-05 6:58 ` Mike Gerwitz
2014-10-05 16:11 ` Ian Grant
2014-10-06 4:23 ` Mike Gerwitz
[not found] ` <20141006042323.GA31390-iW7gFb+/I3LZHJUXO5efmti2O/JbrIOy@public.gmane.org>
2014-10-07 17:18 ` Ian Grant
[not found] ` <CAKFjmdx+jzfapvrq6EEO8Skx2L2UZwi-DZ22xiq9t1438E7kOw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-10-07 17:28 ` Mark H Weaver
2014-10-07 17:56 ` Ian Grant
[not found] ` <CAKFjmdwNTjJ7nU-rKEWkA+5whsfyrpqJ6RkhU+VRbUW6rqT03A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-10-07 19:24 ` Philip Herron
[not found] ` <CAEvRbeoEJPTtoDu0nDudJyfBoaT1vpuvHzL=t+TkJr_ZGkzYEQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-10-07 19:47 ` Ian Grant
2014-10-08 18:26 ` Mark H Weaver
2014-10-08 3:55 ` Mike Gerwitz
-- strict thread matches above, loose matches on Subject: below --
2014-10-05 17:42 Ian Grant
2014-10-05 18:19 ` Ian Grant
2014-10-06 0:30 Ian Grant
[not found] ` <CAKFjmdzxAMvcry8N6B_atM_8vGyzA1Dfz9ygWxSgh3fD7EUAuQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-10-06 8:51 ` William ML Leslie
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).