unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#54501: 27.2; to be disclosed in private
@ 2022-03-21 14:26 Andy Gaynor
  2022-03-22 14:44 ` bug#54501: Segfault on recursive structure Lars Ingebrigtsen
  2022-03-26 15:58 ` Mattias Engdegård
  0 siblings, 2 replies; 9+ messages in thread
From: Andy Gaynor @ 2022-03-21 14:26 UTC (permalink / raw)
  To: 54501

[-- Attachment #1: Type: text/html, Size: 6086 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* bug#54501: Segfault on recursive structure
  2022-03-21 14:26 bug#54501: 27.2; to be disclosed in private Andy Gaynor
@ 2022-03-22 14:44 ` Lars Ingebrigtsen
  2022-03-22 15:02   ` Andreas Schwab
  2022-03-26 15:58 ` Mattias Engdegård
  1 sibling, 1 reply; 9+ messages in thread
From: Lars Ingebrigtsen @ 2022-03-22 14:44 UTC (permalink / raw)
  To: Andy Gaynor; +Cc: 54501, Stefan Monnier

So this bug report is about two things.  The first is that this reads to
(nil):

#0=#0#

Which seems odd.  Reading #0=#1# signals an error, but it's not
immediately clear to me whether #0=#0# is totally nonsensical or not.
And if not, is (nil) the right result?  Anybody?

The other thing is more serious, and reading the following will segfault
your Emacs, so don't do that:

#0=[#1=(#0# . #1#)]

Now, Emacs segfaults on trying to gc a number of recursive objects
(especially ones that recurse in the `car'), but this seems to actually
segfault in the reader.  Is it obvious to anybody why?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






^ permalink raw reply	[flat|nested] 9+ messages in thread

* bug#54501: Segfault on recursive structure
  2022-03-22 14:44 ` bug#54501: Segfault on recursive structure Lars Ingebrigtsen
@ 2022-03-22 15:02   ` Andreas Schwab
  2022-03-22 15:04     ` Lars Ingebrigtsen
  0 siblings, 1 reply; 9+ messages in thread
From: Andreas Schwab @ 2022-03-22 15:02 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: Andy Gaynor, 54501, Stefan Monnier

On Mär 22 2022, Lars Ingebrigtsen wrote:

> So this bug report is about two things.  The first is that this reads to
> (nil):
>
> #0=#0#
>
> Which seems odd.  Reading #0=#1# signals an error, but it's not
> immediately clear to me whether #0=#0# is totally nonsensical or not.
> And if not, is (nil) the right result?  Anybody?

That's an side effect of the implementation: (nil) is the placeholder
object which #0# then references.

> The other thing is more serious, and reading the following will segfault
> your Emacs, so don't do that:
>
> #0=[#1=(#0# . #1#)]
>
> Now, Emacs segfaults on trying to gc a number of recursive objects
> (especially ones that recurse in the `car'), but this seems to actually
> segfault in the reader.  Is it obvious to anybody why?

Does it crash in substitute_object_recurse?

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."





^ permalink raw reply	[flat|nested] 9+ messages in thread

* bug#54501: Segfault on recursive structure
  2022-03-22 15:02   ` Andreas Schwab
@ 2022-03-22 15:04     ` Lars Ingebrigtsen
       [not found]       ` <trinity-1bb5c502-bafe-4a6c-b6be-08a2a1b27232-1648049044877@3c-app-mailcom-lxa04>
  0 siblings, 1 reply; 9+ messages in thread
From: Lars Ingebrigtsen @ 2022-03-22 15:04 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Andy Gaynor, 54501, Stefan Monnier

Andreas Schwab <schwab@linux-m68k.org> writes:

> That's an side effect of the implementation: (nil) is the placeholder
> object which #0# then references.

Ah, right.  Should it signal an error instead, do you think?

> Does it crash in substitute_object_recurse?

Yup.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 9+ messages in thread

* bug#54501: Segfault on recursive structure
       [not found]       ` <trinity-1bb5c502-bafe-4a6c-b6be-08a2a1b27232-1648049044877@3c-app-mailcom-lxa04>
@ 2022-03-25 15:34         ` Lars Ingebrigtsen
  0 siblings, 0 replies; 9+ messages in thread
From: Lars Ingebrigtsen @ 2022-03-25 15:34 UTC (permalink / raw)
  To: Andy Gaynor; +Cc: 54501

(Re-sending for the bug tracker.)

Andy Gaynor <goldipox@mail.com> writes:

>> #0=#0#  =  (nil)
>
> This error is intrinsic to the process, much like the first time Scheme
> sees (define z z).  It probably needs to be checked explicitly--when
> first defining a label, its value cannot be a reference to that label.
> If the label is already defined, dandy, nothing to check or initialize.
>
> That (nil) = (nil . nil) looks suspiciously stubby.  (Nil nil nil, so
> much nil in the world, the most distinguished value in the language,
> tossed around so casually.)  I wouldn't be surprised to learn that it's
> an optimization, a speculative initialization favoring the common case of
> defining a label to a pair.  I could be wrong.  Hmmm, (#0=3 #0=#0#) is
> well-defined, and should be (3 3).  Or prohibited for no good reason,
> which seems to be the trend nowadays.  Prohibiting this is consistent
> with prohibiting assignments and restricting alists to only allow one
> association per key.  Stupid.  (Oh, did I write that out loud?)
>
> In Emacs, (#0=3 #0=#0#) = (3 (nil)), ung, (nil . nil) again.  Given that
> [#0=3 #0=#0#] = [3 3] and #s(#0=Z #0=#0#) = #s(Z Z), I'm more inclined to
> call this another pair-handling error.
>
>> Emacs segfaults on trying to gc a number of recursive objects,
>> but #0=[#1=(#0# . #1#)] seems to actually segfault in the reader.
>> Is it obvious to anybody why?
>
> Perhaps this instance is more... distilled.  Both objects are labeled,
> both labels are used, all components are labels, and one is self-cyclic.
> Note that the expression crashes when either pair component is
> self-cyclic, and doesn't crash when- Strike that, let's start with simple
> and work our way up.
>
>   #0=(#0# . #0#)  =  #1=(#1# . #1#)  =  ok
>
>   #1=#0=[#0#   #0#]  =   #1=[#1#   #1#]         =  ok
>   #1=#0=(#0# . #0#)  =  (#1=(#1# . #1#) . #1#)  =  bad
>
>   #2=#1=#0=[#0#   #0#]  =    #1=[#1#   #1#]         =  ok
>   #2=#1=#0=(#0# . #0#)  ->  (#1=(#1# . #1#) . #1#)  =  bad
>
> Another bug manifesting for pairs and not other stuff?  I'm satisfied.
>
> I haven't looked inside Emacs yet, but usually, most types are treated
> much the same, but pairs are augmented with optimizations for lists,
> making them more complicated.  Heck, in my still-skeletal fasl, arrays
> are handled with 4 instructions, the model-to-be for most referential
> types.  However, pairs/lists have 11 instructions, handling list and
> list* under various conditions (automatically selected, of course).  I
> added the list optimizations very early, in near isolation, because this
> is subtle business.
>
> I just polled 24 Lisps.  9 didn't implement labels.  2 gave me guff
> (dammit Racket and a no-name), so screw 'em.  Of the 13 left, the same 5
> flubbed label-thyself and relabel-thyself.  4 flubbed #0=(#0# . #0#), and
> 3 flubbed #0=[#1=(#0# . #1#)].  The point, oh yeah.  These folks are
> skilled programmers on familiar turf and still have problems.  Being a
> GNU venue, I'll mention that GCL failed to build (incorrectly setting
> things up for signal.h?) and Guile flubbed all four tests.
>
> Other than the label issues and pairs/lists going to hell in a humv, do
> things seem ok?  I just fed emacs a lot of funk, but with no pairs/lists
> or fringe label cases, and everything worked.  I recommend running with
> that, which seems safer than trying to debug something unfamiliar that
> trips up everyone.  Make a working copy of read.  Completely remove any
> handling for pairs/lists, label stubs, whatever.  Make labels nice, work
> the kinks out of the fringe cases.  Add pairs back generically, coded
> much like everything else--no label or list optimizations.  When you've
> got it right, commit to the copy.  The snipped optimizations can be
> snarfed from a trusted source (no guff or flubs from Bigloo, Chez, Clisp,
> Gambit, Gauche, Kawa, SBCL) at your convenience.  Kawa was the one that
> retained label redefinitions, making it worth a peek.
>
> Regards, Andy





^ permalink raw reply	[flat|nested] 9+ messages in thread

* bug#54501: Segfault on recursive structure
  2022-03-21 14:26 bug#54501: 27.2; to be disclosed in private Andy Gaynor
  2022-03-22 14:44 ` bug#54501: Segfault on recursive structure Lars Ingebrigtsen
@ 2022-03-26 15:58 ` Mattias Engdegård
  2022-03-26 16:33   ` Lars Ingebrigtsen
  1 sibling, 1 reply; 9+ messages in thread
From: Mattias Engdegård @ 2022-03-26 15:58 UTC (permalink / raw)
  To: Andy Gaynor; +Cc: Lars Ingebrigtsen, Andreas Schwab, 54501, Stefan Monnier

[-- Attachment #1: Type: text/plain, Size: 1388 bytes --]

> #0=[#1=(#0# . #1#)]

When the reader encounters an expression in the form #N=X, the following steps take place:

1. Create a placeholder value P which is a fresh (nil . nil) cons pair.
2. Assign the number N to P in the read_objects_map.
3. Read X as the value V, where P is used for any occurrences of #N#.
4. Add V to the read_objects_completed set. This is used for future substitutions.
5. Traverse V to replace any occurrence of P with V itself, and return V so modified.

So far all good, but there is an optimisation: if X is a cons, then step 5 is skipped. Instead, since P is already a cons, its CAR and CDR slots are modified to those of V, and P is returned. That way no potentially expensive traversal of V is required.

The alert (human) reader has now spotted the error in the (lisp) reader: step 4 added the now defunct value V to read_objects_completed, not the actually returned value P. The traversal of the outer value, the vector #0 in the above example, will then enter infinite recursion because value #1 was never added to read_objects_completed.

The simplest solution is to remove the optimisation but I'd say it's algorithmically valuable and propose the attached patch.

The patch fixes the #0=#0# nonsense as well since it's a trivial check. Admittedly it doesn't handle #1=#2=#1# -- please keep this bug open if you think it's important.


[-- Attachment #2: 0001-Fix-reader-infinite-recursion-for-circular-mixed-typ.patch --]
[-- Type: application/octet-stream, Size: 4545 bytes --]

From 6819b064585470f2bdcb7baf88beba6b2937d811 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Sat, 26 Mar 2022 16:44:18 +0100
Subject: [PATCH] Fix reader infinite recursion for circular mixed-type values

Make sure that the value added to the `read_objects_completed` set is
the one we actually return; previously this wasn't the case for conses
because of an optimisation (bug#54501).

Also add a check for vacuous self-references such as #1=#1# instead of
returning a nonsense value from thin air.

* src/lread.c (read1): Treat numbered conses correctly as described
above.  Detect vacuous self-references.
* test/src/lread-tests.el (lread-test-read-and-print)
(lread-test-circle-cases, lread-circle): Add tests.
---
 src/lread.c             | 46 +++++++++++++++++++++++++++--------------
 test/src/lread-tests.el | 22 ++++++++++++++++++++
 2 files changed, 52 insertions(+), 16 deletions(-)

diff --git a/src/lread.c b/src/lread.c
index d7b56c5087..17d993abd1 100644
--- a/src/lread.c
+++ b/src/lread.c
@@ -3480,6 +3480,29 @@ read1 (Lisp_Object readcharfun, int *pch, bool first_in_list, bool locate_syms)
 		      /* Read the object itself.  */
 		      Lisp_Object tem = read0 (readcharfun, locate_syms);
 
+                      if (CONSP (tem))
+                        {
+			  if (BASE_EQ (tem, placeholder))
+			    /* Catch silly games like #1=#1# */
+			    invalid_syntax ("nonsensical self-reference",
+					    readcharfun);
+
+			  /* Optimisation: since the placeholder is already
+			     a cons, repurpose it as the actual value.
+			     This allows us to skip the substition below,
+			     since the placeholder is already referenced
+			     inside TEM at the appropriate places.  */
+                          Fsetcar (placeholder, XCAR (tem));
+                          Fsetcdr (placeholder, XCDR (tem));
+
+			  struct Lisp_Hash_Table *h2
+			    = XHASH_TABLE (read_objects_completed);
+			  ptrdiff_t i = hash_lookup (h2, placeholder, &hash);
+			  eassert (i < 0);
+			  hash_put (h2, placeholder, Qnil, hash);
+			  return placeholder;
+			}
+
 		      /* If it can be recursive, remember it for
 			 future substitutions.  */
 		      if (! SYMBOLP (tem)
@@ -3494,24 +3517,15 @@ read1 (Lisp_Object readcharfun, int *pch, bool first_in_list, bool locate_syms)
 			}
 
 		      /* Now put it everywhere the placeholder was...  */
-                      if (CONSP (tem))
-                        {
-                          Fsetcar (placeholder, XCAR (tem));
-                          Fsetcdr (placeholder, XCDR (tem));
-                          return placeholder;
-                        }
-                      else
-                        {
-		          Flread__substitute_object_in_subtree
-			    (tem, placeholder, read_objects_completed);
+		      Flread__substitute_object_in_subtree
+			(tem, placeholder, read_objects_completed);
 
-		          /* ...and #n# will use the real value from now on.  */
-			  i = hash_lookup (h, number, &hash);
-			  eassert (i >= 0);
-			  set_hash_value_slot (h, i, tem);
+		      /* ...and #n# will use the real value from now on.  */
+		      i = hash_lookup (h, number, &hash);
+		      eassert (i >= 0);
+		      set_hash_value_slot (h, i, tem);
 
-		          return tem;
-                        }
+		      return tem;
 		    }
 
 		  /* #n# returns a previously read object.  */
diff --git a/test/src/lread-tests.el b/test/src/lread-tests.el
index 862f6a6595..9ec54c719c 100644
--- a/test/src/lread-tests.el
+++ b/test/src/lread-tests.el
@@ -258,5 +258,27 @@ lread-float
   (should (equal (read "-0.e-5") -0.0))
   )
 
+(defun lread-test-read-and-print (str)
+  (let* ((read-circle t)
+         (print-circle t)
+         (val (read-from-string str)))
+    (if (consp val)
+        (prin1-to-string (car val))
+      (error "reading %S failed: %S" str val))))
+
+(defconst lread-test-circle-cases
+  '("#1=(#1# . #1#)"
+    "#1=[#1# a #1#]"
+    "#1=(#2=[#1# #2#] . #1#)"
+    "#1=(#2=[#1# #2#] . #2#)"
+    "#1=[#2=(#1# . #2#)]"
+    "#1=(#2=[#3=(#1# . #2#) #4=(#3# . #4#)])"
+    ))
+
+(ert-deftest lread-circle ()
+  (dolist (str lread-test-circle-cases)
+    (ert-info (str :prefix "input: ")
+      (should (equal (lread-test-read-and-print str) str))))
+  (should-error (read-from-string "#1=#1#") :type 'invalid-read-syntax))
 
 ;;; lread-tests.el ends here
-- 
2.32.0 (Apple Git-132)


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* bug#54501: Segfault on recursive structure
  2022-03-26 15:58 ` Mattias Engdegård
@ 2022-03-26 16:33   ` Lars Ingebrigtsen
       [not found]     ` <8F7060F3-8137-4835-873F-68E3F6B8010D@acm.org>
  2022-03-26 18:00     ` Eli Zaretskii
  0 siblings, 2 replies; 9+ messages in thread
From: Lars Ingebrigtsen @ 2022-03-26 16:33 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: Andy Gaynor, 54501, Andreas Schwab, Stefan Monnier

Mattias Engdegård <mattiase@acm.org> writes:

> The patch fixes the #0=#0# nonsense as well since it's a trivial
> check.

Thanks; seems to work perfectly.

Do you think this is safe enough to go on the release branch?

This isn't a regression (Emacs has segfaulted on the #0=[#1=(#0# . #1#)]
as far back as I have Emacs versions), so I'd tend toward thinking it's
not vital enough to install on the release branch, but perhaps Eli has a
different opinion.  Eli?

> Admittedly it doesn't handle #1=#2=#1# -- please keep this bug
> open if you think it's important.

No, I think this fix is sufficient.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 9+ messages in thread

* bug#54501: Segfault on recursive structure
       [not found]     ` <8F7060F3-8137-4835-873F-68E3F6B8010D@acm.org>
@ 2022-03-26 17:43       ` Mattias Engdegård
  0 siblings, 0 replies; 9+ messages in thread
From: Mattias Engdegård @ 2022-03-26 17:43 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: Andy Gaynor, 54501-done, Andreas Schwab, Stefan Monnier

26 mars 2022 kl. 17.33 skrev Lars Ingebrigtsen <larsi@gnus.org>:

> Do you think this is safe enough to go on the release branch?
> 
> This isn't a regression (Emacs has segfaulted on the #0=[#1=(#0# . #1#)]
> as far back as I have Emacs versions), so I'd tend toward thinking it's
> not vital enough to install on the release branch

The context of the bug wasn't clear to me, but I didn't get the impression that it was impeding normal Emacs usage. I'm committing the patch to master for now.

Thanks for taking a look!






^ permalink raw reply	[flat|nested] 9+ messages in thread

* bug#54501: Segfault on recursive structure
  2022-03-26 16:33   ` Lars Ingebrigtsen
       [not found]     ` <8F7060F3-8137-4835-873F-68E3F6B8010D@acm.org>
@ 2022-03-26 18:00     ` Eli Zaretskii
  1 sibling, 0 replies; 9+ messages in thread
From: Eli Zaretskii @ 2022-03-26 18:00 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: mattiase, 54501, schwab, monnier, goldipox

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Andy Gaynor <goldipox@mail.com>,  54501@debbugs.gnu.org,  Stefan Monnier
>  <monnier@iro.umontreal.ca>,  Andreas Schwab <schwab@linux-m68k.org>, Eli
>  Zaretskii <eliz@gnu.org>
> Date: Sat, 26 Mar 2022 17:33:22 +0100
> 
> Mattias Engdegård <mattiase@acm.org> writes:
> 
> > The patch fixes the #0=#0# nonsense as well since it's a trivial
> > check.
> 
> Thanks; seems to work perfectly.
> 
> Do you think this is safe enough to go on the release branch?
> 
> This isn't a regression (Emacs has segfaulted on the #0=[#1=(#0# . #1#)]
> as far back as I have Emacs versions), so I'd tend toward thinking it's
> not vital enough to install on the release branch, but perhaps Eli has a
> different opinion.  Eli?

I think we shouldn't install this on the release branch, it's too
risky, and the original problem has been with us forever, or
thereabouts.





^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-03-26 18:00 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-21 14:26 bug#54501: 27.2; to be disclosed in private Andy Gaynor
2022-03-22 14:44 ` bug#54501: Segfault on recursive structure Lars Ingebrigtsen
2022-03-22 15:02   ` Andreas Schwab
2022-03-22 15:04     ` Lars Ingebrigtsen
     [not found]       ` <trinity-1bb5c502-bafe-4a6c-b6be-08a2a1b27232-1648049044877@3c-app-mailcom-lxa04>
2022-03-25 15:34         ` Lars Ingebrigtsen
2022-03-26 15:58 ` Mattias Engdegård
2022-03-26 16:33   ` Lars Ingebrigtsen
     [not found]     ` <8F7060F3-8137-4835-873F-68E3F6B8010D@acm.org>
2022-03-26 17:43       ` Mattias Engdegård
2022-03-26 18:00     ` Eli Zaretskii

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).