unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
From: "Mattias Engdegård" <mattiase@acm.org>
To: Andy Gaynor <goldipox@mail.com>
Cc: Lars Ingebrigtsen <larsi@gnus.org>,
	Andreas Schwab <schwab@linux-m68k.org>,
	54501@debbugs.gnu.org, Stefan Monnier <monnier@iro.umontreal.ca>
Subject: bug#54501: Segfault on recursive structure
Date: Sat, 26 Mar 2022 16:58:53 +0100	[thread overview]
Message-ID: <839D376A-2234-4A2A-AB7E-DCCB3D6CB149@acm.org> (raw)
In-Reply-To: <trinity-3fb98d93-dfba-4296-8aa2-25b533a96589-1647872762799@3c-app-mailcom-lxa14>

[-- Attachment #1: Type: text/plain, Size: 1388 bytes --]

> #0=[#1=(#0# . #1#)]

When the reader encounters an expression in the form #N=X, the following steps take place:

1. Create a placeholder value P which is a fresh (nil . nil) cons pair.
2. Assign the number N to P in the read_objects_map.
3. Read X as the value V, where P is used for any occurrences of #N#.
4. Add V to the read_objects_completed set. This is used for future substitutions.
5. Traverse V to replace any occurrence of P with V itself, and return V so modified.

So far all good, but there is an optimisation: if X is a cons, then step 5 is skipped. Instead, since P is already a cons, its CAR and CDR slots are modified to those of V, and P is returned. That way no potentially expensive traversal of V is required.

The alert (human) reader has now spotted the error in the (lisp) reader: step 4 added the now defunct value V to read_objects_completed, not the actually returned value P. The traversal of the outer value, the vector #0 in the above example, will then enter infinite recursion because value #1 was never added to read_objects_completed.

The simplest solution is to remove the optimisation but I'd say it's algorithmically valuable and propose the attached patch.

The patch fixes the #0=#0# nonsense as well since it's a trivial check. Admittedly it doesn't handle #1=#2=#1# -- please keep this bug open if you think it's important.


[-- Attachment #2: 0001-Fix-reader-infinite-recursion-for-circular-mixed-typ.patch --]
[-- Type: application/octet-stream, Size: 4545 bytes --]

From 6819b064585470f2bdcb7baf88beba6b2937d811 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Sat, 26 Mar 2022 16:44:18 +0100
Subject: [PATCH] Fix reader infinite recursion for circular mixed-type values

Make sure that the value added to the `read_objects_completed` set is
the one we actually return; previously this wasn't the case for conses
because of an optimisation (bug#54501).

Also add a check for vacuous self-references such as #1=#1# instead of
returning a nonsense value from thin air.

* src/lread.c (read1): Treat numbered conses correctly as described
above.  Detect vacuous self-references.
* test/src/lread-tests.el (lread-test-read-and-print)
(lread-test-circle-cases, lread-circle): Add tests.
---
 src/lread.c             | 46 +++++++++++++++++++++++++++--------------
 test/src/lread-tests.el | 22 ++++++++++++++++++++
 2 files changed, 52 insertions(+), 16 deletions(-)

diff --git a/src/lread.c b/src/lread.c
index d7b56c5087..17d993abd1 100644
--- a/src/lread.c
+++ b/src/lread.c
@@ -3480,6 +3480,29 @@ read1 (Lisp_Object readcharfun, int *pch, bool first_in_list, bool locate_syms)
 		      /* Read the object itself.  */
 		      Lisp_Object tem = read0 (readcharfun, locate_syms);
 
+                      if (CONSP (tem))
+                        {
+			  if (BASE_EQ (tem, placeholder))
+			    /* Catch silly games like #1=#1# */
+			    invalid_syntax ("nonsensical self-reference",
+					    readcharfun);
+
+			  /* Optimisation: since the placeholder is already
+			     a cons, repurpose it as the actual value.
+			     This allows us to skip the substition below,
+			     since the placeholder is already referenced
+			     inside TEM at the appropriate places.  */
+                          Fsetcar (placeholder, XCAR (tem));
+                          Fsetcdr (placeholder, XCDR (tem));
+
+			  struct Lisp_Hash_Table *h2
+			    = XHASH_TABLE (read_objects_completed);
+			  ptrdiff_t i = hash_lookup (h2, placeholder, &hash);
+			  eassert (i < 0);
+			  hash_put (h2, placeholder, Qnil, hash);
+			  return placeholder;
+			}
+
 		      /* If it can be recursive, remember it for
 			 future substitutions.  */
 		      if (! SYMBOLP (tem)
@@ -3494,24 +3517,15 @@ read1 (Lisp_Object readcharfun, int *pch, bool first_in_list, bool locate_syms)
 			}
 
 		      /* Now put it everywhere the placeholder was...  */
-                      if (CONSP (tem))
-                        {
-                          Fsetcar (placeholder, XCAR (tem));
-                          Fsetcdr (placeholder, XCDR (tem));
-                          return placeholder;
-                        }
-                      else
-                        {
-		          Flread__substitute_object_in_subtree
-			    (tem, placeholder, read_objects_completed);
+		      Flread__substitute_object_in_subtree
+			(tem, placeholder, read_objects_completed);
 
-		          /* ...and #n# will use the real value from now on.  */
-			  i = hash_lookup (h, number, &hash);
-			  eassert (i >= 0);
-			  set_hash_value_slot (h, i, tem);
+		      /* ...and #n# will use the real value from now on.  */
+		      i = hash_lookup (h, number, &hash);
+		      eassert (i >= 0);
+		      set_hash_value_slot (h, i, tem);
 
-		          return tem;
-                        }
+		      return tem;
 		    }
 
 		  /* #n# returns a previously read object.  */
diff --git a/test/src/lread-tests.el b/test/src/lread-tests.el
index 862f6a6595..9ec54c719c 100644
--- a/test/src/lread-tests.el
+++ b/test/src/lread-tests.el
@@ -258,5 +258,27 @@ lread-float
   (should (equal (read "-0.e-5") -0.0))
   )
 
+(defun lread-test-read-and-print (str)
+  (let* ((read-circle t)
+         (print-circle t)
+         (val (read-from-string str)))
+    (if (consp val)
+        (prin1-to-string (car val))
+      (error "reading %S failed: %S" str val))))
+
+(defconst lread-test-circle-cases
+  '("#1=(#1# . #1#)"
+    "#1=[#1# a #1#]"
+    "#1=(#2=[#1# #2#] . #1#)"
+    "#1=(#2=[#1# #2#] . #2#)"
+    "#1=[#2=(#1# . #2#)]"
+    "#1=(#2=[#3=(#1# . #2#) #4=(#3# . #4#)])"
+    ))
+
+(ert-deftest lread-circle ()
+  (dolist (str lread-test-circle-cases)
+    (ert-info (str :prefix "input: ")
+      (should (equal (lread-test-read-and-print str) str))))
+  (should-error (read-from-string "#1=#1#") :type 'invalid-read-syntax))
 
 ;;; lread-tests.el ends here
-- 
2.32.0 (Apple Git-132)


  parent reply	other threads:[~2022-03-26 15:58 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-21 14:26 bug#54501: 27.2; to be disclosed in private Andy Gaynor
2022-03-22 14:44 ` bug#54501: Segfault on recursive structure Lars Ingebrigtsen
2022-03-22 15:02   ` Andreas Schwab
2022-03-22 15:04     ` Lars Ingebrigtsen
     [not found]       ` <trinity-1bb5c502-bafe-4a6c-b6be-08a2a1b27232-1648049044877@3c-app-mailcom-lxa04>
2022-03-25 15:34         ` Lars Ingebrigtsen
2022-03-26 15:58 ` Mattias Engdegård [this message]
2022-03-26 16:33   ` Lars Ingebrigtsen
     [not found]     ` <8F7060F3-8137-4835-873F-68E3F6B8010D@acm.org>
2022-03-26 17:43       ` Mattias Engdegård
2022-03-26 18:00     ` Eli Zaretskii

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=839D376A-2234-4A2A-AB7E-DCCB3D6CB149@acm.org \
    --to=mattiase@acm.org \
    --cc=54501@debbugs.gnu.org \
    --cc=goldipox@mail.com \
    --cc=larsi@gnus.org \
    --cc=monnier@iro.umontreal.ca \
    --cc=schwab@linux-m68k.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).