unofficial mirror of bug-guile@gnu.org 
* bug#16362: compiler disrespects referential integrity
@ 2014-01-05 23:13 Zefram
  2014-01-15 19:57 ` Mark H Weaver
From: Zefram @ 2014-01-05 23:13 UTC
  To: 16362

The guile-2.0.9 compiler doesn't preserve the distinctness of mutable
objects that are referenced in code via the read-eval (#.) facility.
(I'm not mutating the code itself, only quoted objects.)  The interpreter,
and for comparison guile-1.8, do preserve object identity, allowing
read-eval to be used to incorporate direct object references into code.
Test case:

$ cat t9
(cond-expand
  (guile-2 (defmacro compile-time f `(eval-when (compile eval) ,@f)))
  (else (defmacro compile-time f `(begin ,@f))))
(compile-time (fluid-set! read-eval? #t))
(compile-time (define aaa (cons 1 2)))
(set-car! '#.aaa 5)
(write '#.aaa)
(newline)
(write '(1 . 2))
(newline)
$ guile-1.8 t9
(5 . 2)
(1 . 2)
$ guile-2.0 --no-auto-compile t9
(5 . 2)
(1 . 2)
$ guile-2.0 t9
;;; note: auto-compilation is enabled, set GUILE_AUTO_COMPILE=0
;;;       or pass the --no-auto-compile argument to disable.
;;; compiling /home/zefram/usr/guile/t9
;;; compiled /home/zefram/.cache/guile/ccache/2.0-LE-8-2.0/home/zefram/usr/guile/t9.go
(5 . 2)
(5 . 2)
$ guile-2.0 t9
(5 . 2)
(5 . 2)

In the test case, the explicitly-constructed pair aaa is conflated with
the pair literal (1 . 2), and so the runtime modification of aaa (which
is correctly mutable) affects the literal.

This issue seems closely related to the problem described at
<http://debbugs.gnu.org/cgi/bugreport.cgi?bug=11198>, wherein the compiler
is entirely unable to handle code incorporating references to some kinds
of object.  In that case the failure mode is a compile-time error, so
the problem can be worked around.  The failure mode with pairs, silent
misbehaviour, is a more serious problem.  Between them, these problems
break most of the interesting uses for read-eval, albeit only when using
the compiler.

Debian incarnation of this bug report:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=734157

-zefram

* bug#16362: compiler disrespects referential integrity
  2014-01-05 23:13 bug#16362: compiler disrespects referential integrity Zefram
@ 2014-01-15 19:57 ` Mark H Weaver
  2014-01-15 21:02   ` Zefram
From: Mark H Weaver @ 2014-01-15 19:57 UTC
  To: Zefram; +Cc: 16362, request

tags 16362 notabug
thanks

Zefram <zefram@fysh.org> writes:
> The guile-2.0.9 compiler doesn't preserve the distinctness of mutable
> objects that are referenced in code via the read-eval (#.) facility.
> (I'm not mutating the code itself, only quoted objects.)

I'm sorry that you've written code that assumes that this is allowed,
but in Scheme all literals are immutable.
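
A minimal sketch of the portable alternative, assuming all the test case needs is a shared mutable pair: keep the pair in an ordinary variable created at run time instead of splicing it into the code as a quoted constant with `#.`. Nothing here mutates a literal, so the interpreter and the compiler should agree.

```scheme
;; The mutable pair lives in a variable, not in a quoted constant.
(define aaa (cons 1 2))
(set-car! aaa 5)        ; mutating a freshly consed pair is allowed
(write aaa)             ; prints (5 . 2)
(newline)
(write '(1 . 2))        ; this literal is never mutated: prints (1 . 2)
(newline)
```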

> The interpreter, and for comparison guile-1.8, do preserve object
> identity, allowing read-eval to be used to incorporate direct object
> references into code.

It worked by accident in Guile 1.8, but there's simply no way to support
this robustly in an ahead-of-time compiler, which must serialize all
literals to an object file.

    Thanks,
      Mark

* bug#16362: compiler disrespects referential integrity
  2014-01-15 19:57 ` Mark H Weaver
@ 2014-01-15 21:02   ` Zefram
  2014-01-15 22:15     ` Mark H Weaver
From: Zefram @ 2014-01-15 21:02 UTC
  To: Mark H Weaver; +Cc: 16362

Mark H Weaver wrote:
>I'm sorry that you've written code that assumes that this is allowed,
>but in Scheme all literals are immutable.

It's not a literal: the object was not constructed by the action of
the reader.  It was constructed by non-literal means, and merely *passed
through* the reader.

That's not to say your not-a-bug opinion is wrong, though.  Scheme as
defined by RnRS certainly doesn't support this kind of thing.  It treats
the print form of an expression as primary, and so doesn't like having
anything unprintable in the object form.

>It worked by accident in Guile 1.8,

This is the bit that's really news to me.  *Scheme* doesn't support
it, but *Guile* is more than just Scheme, and I presumed that it was
intentional that it took a more enlightened view of what constitutes
an expression.  If that was just an accident, then what you actually
support ought to be documented.  In principle it would also be a good
idea to enforce this restriction in the interpreter, to avoid having
this incompatibility between interpreter and compiler of the `same'
implementation.

>                                    but there's simply no way to support
>this robustly in an ahead-of-time compiler, which must serialize all
>literals to an object file.

Sure there is.  The object in question is eminently serialisable: it
contains only references to other serialisable data.  All that needs
to change is to distinguish between actual literal pairs (that can be
merged) and non-literals whose distinct identity needs to be preserved.
This might well be painful to add to your existing code, given the
way you represent pairs.  But that's a difficulty with the specific
implementation, not an inherent limitation of compilation.
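
A hypothetical sketch of that idea, restricted to pairs and to a single file: `flatten` gives every distinct pair (by `eq?`) one index, so a pair reached twice is written as two references to the same index, and `unflatten` allocates exactly one pair per index when loading. The names are illustrative only, `write`/`read` on a string stand in for the object file, and this is not how Guile's compiler actually serializes constants; vectors, strings and other mutable objects are simply treated as atoms here.

```scheme
;; Hypothetical flatten: returns (COUNT CELLS TOP), where each cell is
;; (INDEX CAR-REF CDR-REF) and a ref is either (ref I) or (atom X).
(define (flatten root)
  (let ((index (make-hash-table))   ; pair -> index, looked up with eq?
        (cells '())
        (count 0))
    (define (visit obj)
      (cond ((not (pair? obj)) (list 'atom obj))
            ((hashq-ref index obj) => (lambda (i) (list 'ref i)))
            (else
             (let ((i count))
               (hashq-set! index obj i)   ; record before recursing
               (set! count (+ count 1))
               (let* ((a (visit (car obj)))
                      (d (visit (cdr obj))))
                 (set! cells (cons (list i a d) cells)))
               (list 'ref i)))))
    (let ((top (visit root)))
      (list count cells top))))

;; Hypothetical unflatten: one fresh pair per index, then fill in the
;; cars and cdrs, so every (ref I) resolves to the very same pair.
(define (unflatten data)
  (let* ((count (car data))
         (cells (cadr data))
         (pairs (make-vector count #f)))
    (define (resolve r)
      (if (eq? (car r) 'ref)
          (vector-ref pairs (cadr r))
          (cadr r)))
    (do ((i 0 (+ i 1))) ((= i count))
      (vector-set! pairs i (cons #f #f)))
    (for-each (lambda (cell)
                (let ((p (vector-ref pairs (car cell))))
                  (set-car! p (resolve (cadr cell)))
                  (set-cdr! p (resolve (caddr cell)))))
              cells)
    (resolve (caddr data))))

;; The program's constants: aaa referenced twice, plus a separate pair
;; that is merely equal? to it.
(define aaa (cons 1 2))
(define constants (list aaa aaa (cons 1 2)))

;; "Compile": write the flattened table to one "file" (a string here).
(define file-contents
  (with-output-to-string (lambda () (write (flatten constants)))))

;; "Load": read the table back and rebuild the pairs.
(define loaded (unflatten (with-input-from-string file-contents read)))

(set-car! (car loaded) 5)   ; mutate the loaded copy of aaa
(write loaded)              ; prints ((5 . 2) (5 . 2) (1 . 2))
(newline)
```

The two references to `aaa` come back as one pair, while the pair that was only `equal?` to it stays distinct, which is the distinction the paragraph above asks for.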

-zefram

* bug#16362: compiler disrespects referential integrity
  2014-01-15 21:02   ` Zefram
@ 2014-01-15 22:15     ` Mark H Weaver
  2014-01-16  1:57       ` Zefram
From: Mark H Weaver @ 2014-01-15 22:15 UTC
  To: Zefram; +Cc: 16362

Zefram <zefram@fysh.org> writes:

> Mark H Weaver wrote:
>>I'm sorry that you've written code that assumes that this is allowed,
>>but in Scheme all literals are immutable.
>
> It's not a literal: the object was not constructed by the action of
> the reader.  It was constructed by non-literal means, and merely *passed
> through* the reader.

In Scheme terminology, an expression of the form (quote <datum>) is a
literal.  Where that <datum> came from is not relevant to the definition
of "literal".

> That's not to say your not-a-bug opinion is wrong, though.  Scheme as
> defined by RnRS certainly doesn't support this kind of thing.  It treats
> the print form of an expression as primary, and so doesn't like having
> anything unprintable in the object form.
>
>>It worked by accident in Guile 1.8,
>
> This is the bit that's really news to me.  *Scheme* doesn't support
> it, but *Guile* is more than just Scheme, and I presumed that it was
> intentional that it took a more enlightened view of what constitutes
> an expression.  If that was just an accident, then what you actually
> support ought to be documented.

Where does it say in the documentation that this is allowed?

To my mind, Guile documents itself as Scheme plus extensions, but you
cannot determine what extensions you can depend on by experiment.  If a
given extension is not documented, then you cannot depend on it.

> In principle it would also be a good idea to enforce this restriction
> in the interpreter, to avoid having this incompatibility between
> interpreter and compiler of the `same' implementation.

Perhaps, but there are always going to be discernible differences
between multiple implementations of the same language.

>>                                    but there's simply no way to support
>>this robustly in an ahead-of-time compiler, which must serialize all
>>literals to an object file.
>
> Sure there is.  The object in question is eminently serialisable: it
> contains only references to other serialisable data.

Yes, but the identity of the objects cannot in general be preserved by
serialization where multiple object files and multiple Guile sessions
are involved.

Consider this: you serialize an object to one file, and then the same
object to a second file.  Now you load them both in from a different
Guile session.  How can the Guile loader know whether these two objects
should have the same identity or be distinct?
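
To make the two-file scenario concrete, here is a small sketch that uses `write` and `read` on string ports as a stand-in for writing an object file and loading it in another session (an assumption for illustration; Guile's `.go` files are not textual). Nothing in either serialized form records that both came from the same pair, so the loaded copies are `equal?` but not `eq?`.

```scheme
;; Stand-in for writing an object file: print OBJ into a string.
(define (serialize obj)
  (with-output-to-string (lambda () (write obj))))

;; Stand-in for loading it in another session: read the string back.
(define (deserialize str)
  (with-input-from-string str read))

(define p (cons 1 2))
(define file-a (serialize p))   ; the same object written to one "file"...
(define file-b (serialize p))   ; ...and again to a second "file"

(define a (deserialize file-a))
(define b (deserialize file-b))
(display (equal? a b))          ; prints #t: same contents
(newline)
(display (eq? a b))             ; prints #f: identity was not preserved
(newline)
```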

> All that needs to change is to distinguish between actual literal
> pairs (that can be merged) and non-literals whose distinct identity
> needs to be preserved.

That information is not preserved by the reader.

> This might well be painful to add to your existing code, given the
> way you represent pairs.  But that's a difficulty with the specific
> implementation, not an inherent limitation of compilation.

There are inherent limitations to serialization.  In the general case,
the identity of mutable objects cannot be reliably preserved.

For example, how do you correctly serialize a procedure produced by
make-counter?

  (define (make-counter)
    (let ((n 0))
      (lambda ()
        (set! n (+ n 1))
        n)))

      Mark

* bug#16362: compiler disrespects referential integrity
  2014-01-15 22:15     ` Mark H Weaver
@ 2014-01-16  1:57       ` Zefram
  2014-10-01 19:04         ` Mark H Weaver
From: Zefram @ 2014-01-16  1:57 UTC
  To: Mark H Weaver; +Cc: 16362

Mark H Weaver wrote:
>In Scheme terminology, an expression of the form (quote <datum>) is a
>literal.

Ah, sorry, I see your usage now.  R6RS speaks of that kind of expression
being a "literal expression".  (Elsewhere it uses "literal" in the sense
I was using it, referring to the readable representation of an object.)
Section 5.10 "Storage model" says "It is desirable for constants (i.e. the
values of literal expressions) to reside in read-only memory.".  So in
the Scheme model, whatever that <datum> in the expression is, it's a
"constant".  Of course, that's in the RnRS view of expressions that
ignores the homoiconic representation.  It's assuming that these
"constants" will always be "literal" in the sense I was using.

>Where does it say in the documentation that this is allowed?

It doesn't: as far as I can see it doesn't document that aspect of the
language at all.  It would be nice if it did.

>To my mind, Guile documents itself as Scheme plus extensions,

I thought the documentation was attempting to document the language that
Guile implements per se.  It doesn't generally just refer to RnRS for the
language definition; it actually tells you most of what it could have
referred to RnRS for.  For example, it fully describes tail recursion,
without any reference to RnRS.  It's good that it does this, and it
would be good for it to be more complete in the areas such as this where
it's lacking.

So maybe I got the wrong impression of the documentation's role.  As the
documentation doesn't describe expressions in the RnRS character-based
way, I got the impression that Guile had not necessarily adopted that
restriction.  As it doesn't describe expressions in the homoiconic way
either, I interpreted it as silent on the issue, making experimentation
appropriate to determine the intent.

Maybe the documentation should have a note about its relationship
to the Scheme language definition: say which things it tries to be
authoritative about.

>cannot determine what extensions you can depend on by experiment.

Fair point, and I'm not bitter about my experiment turning out to have
this limited applicability.

>Consider this: you serialize an object to one file, and then the same
>object to a second file.  Now you load them both in from a different
>Guile session.  How can the Guile loader know whether these two objects
>should have the same identity or be distinct?

That's an interesting case, and I suppose I wouldn't expect that to
preserve identity.  I also wouldn't expect you to serialise an I/O port.
But the case I'm concerned about is a standalone script, being compiled
as a whole, and the objects it's setting up at compile time are made of
ordinary data.

I think some of our difference of opinion here comes because you're
mainly thinking of the compiler as something to apply to modules, so
you expect to deal with many compiled files in one session, whereas I'm
thinking about compilation of a program as a whole.  Your viewpoint is
the more general.

>For example, how do you correctly serialize a procedure produced by
>make-counter?

Assuming we're only serialising it to one file, it shouldn't be any more
difficult than my test case with a mutable pair.  The procedure object
needs to contain a reference to the body expression and a reference to
the lexical environment that it closed over.  The lexical environment
contains the binding of the symbol "n" to a variable, which contains
some current numeric value.  That variable is the basic mutable item
whose identity needs to be maintained through serialisation.  If we have
multiple procedures generated by make-counter, they'll have distinct
variables, and therefore distinct lexical environments, and therefore
be distinct procedures, though they'll share bodies.
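
That decomposition can be sketched by closure-converting make-counter by hand (the names `counter-code`, `make-counter/data` and `call-closure` are hypothetical): the "procedure" becomes a pair of shared code and a fresh environment cell holding n, so the only per-instance mutable identity is that cell. This models the data layout only; it does not attempt to serialize compiled code.

```scheme
;; Shared body: receives its environment explicitly instead of closing
;; over it.  ENV is a one-element list whose car holds n.
(define (counter-code env)
  (set-car! env (+ (car env) 1))
  (car env))

;; Hand closure-converted make-counter: shared code plus a fresh cell.
(define (make-counter/data)
  (cons counter-code (list 0)))

;; Invoke a converted closure by passing its environment to its code.
(define (call-closure c)
  ((car c) (cdr c)))

(define c1 (make-counter/data))
(define c2 (make-counter/data))
(call-closure c1)                  ; returns 1
(call-closure c1)                  ; returns 2
(call-closure c2)                  ; returns 1: c2 has its own n
(display (eq? (car c1) (car c2)))  ; prints #t: one shared body
(newline)
(display (eq? (cdr c1) (cdr c2)))  ; prints #f: distinct environments
(newline)
```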

The only part of this that looks at all difficult to me is that you may
have compiled the function body down to VM code, which is not exactly
a normal Lisp object and needs its own serialisation arrangements.
Presumably you already have that solved in order to compile code that
contains function definitions.  Aside from that it's all ordinary
Lisp objects that look totally serialisable.  What do you think is the
difficult part?

-zefram

* bug#16362: compiler disrespects referential integrity
  2014-01-16  1:57       ` Zefram
@ 2014-10-01 19:04         ` Mark H Weaver
From: Mark H Weaver @ 2014-10-01 19:04 UTC
  To: Zefram; +Cc: 16362, request

tags 16362 + notabug wontfix
close 16362
thanks

I'm sorry that you came to depend on the undocumented behavior of
earlier versions of Guile, but the Scheme standards are quite clear that
literals are immutable and that no guarantees are made about preserving
object identity as seen by eq? or eqv?.  To my knowledge we never made
any promises that this would work, and we can't make it work properly in
the general case in our new ahead-of-time compilation model.

I'm closing this ticket.

      Mark

end of thread

Thread overview: 6 messages
2014-01-05 23:13 bug#16362: compiler disrespects referential integrity Zefram
2014-01-15 19:57 ` Mark H Weaver
2014-01-15 21:02   ` Zefram
2014-01-15 22:15     ` Mark H Weaver
2014-01-16  1:57       ` Zefram
2014-10-01 19:04         ` Mark H Weaver
