From: "Mattias Engdegård" <mattiase@acm.org>
To: Andy Gaynor <goldipox@mail.com>
Cc: Lars Ingebrigtsen <larsi@gnus.org>,
Andreas Schwab <schwab@linux-m68k.org>,
54501@debbugs.gnu.org, Stefan Monnier <monnier@iro.umontreal.ca>
Subject: bug#54501: Segfault on recursive structure
Date: Sat, 26 Mar 2022 16:58:53 +0100 [thread overview]
Message-ID: <839D376A-2234-4A2A-AB7E-DCCB3D6CB149@acm.org> (raw)
In-Reply-To: <trinity-3fb98d93-dfba-4296-8aa2-25b533a96589-1647872762799@3c-app-mailcom-lxa14>
[-- Attachment #1: Type: text/plain, Size: 1388 bytes --]
> #0=[#1=(#0# . #1#)]
When the reader encounters an expression in the form #N=X, the following steps take place:
1. Create a placeholder value P which is a fresh (nil . nil) cons pair.
2. Assign the number N to P in the read_objects_map.
3. Read X as the value V, where P is used for any occurrences of #N#.
4. Add V to the read_objects_completed set. This is used for future substitutions.
5. Traverse V to replace any occurrence of P with V itself, and return V so modified.
So far all good, but there is an optimisation: if X is a cons, then step 5 is skipped. Instead, since P is already a cons, its CAR and CDR slots are modified to those of V, and P is returned. That way no potentially expensive traversal of V is required.
The alert (human) reader has now spotted the error in the (lisp) reader: step 4 added the now defunct value V to read_objects_completed, not the actually returned value P. The traversal of the outer value, the vector #0 in the above example, will then enter infinite recursion because value #1 was never added to read_objects_completed.
The simplest solution is to remove the optimisation but I'd say it's algorithmically valuable and propose the attached patch.
The patch fixes the #0=#0# nonsense as well since it's a trivial check. Admittedly it doesn't handle #1=#2=#1# -- please keep this bug open if you think it's important.
[-- Attachment #2: 0001-Fix-reader-infinite-recursion-for-circular-mixed-typ.patch --]
[-- Type: application/octet-stream, Size: 4545 bytes --]
From 6819b064585470f2bdcb7baf88beba6b2937d811 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mattias=20Engdeg=C3=A5rd?= <mattiase@acm.org>
Date: Sat, 26 Mar 2022 16:44:18 +0100
Subject: [PATCH] Fix reader infinite recursion for circular mixed-type values
Make sure that the value added to the `read_objects_completed` set is
the one we actually return; previously this wasn't the case for conses
because of an optimisation (bug#54501).
Also add a check for vacuous self-references such as #1=#1# instead of
returning a nonsense value from thin air.
* src/lread.c (read1): Treat numbered conses correctly as described
above. Detect vacuous self-references.
* test/src/lread-tests.el (lread-test-read-and-print)
(lread-test-circle-cases, lread-circle): Add tests.
---
src/lread.c | 46 +++++++++++++++++++++++++++--------------
test/src/lread-tests.el | 22 ++++++++++++++++++++
2 files changed, 52 insertions(+), 16 deletions(-)
diff --git a/src/lread.c b/src/lread.c
index d7b56c5087..17d993abd1 100644
--- a/src/lread.c
+++ b/src/lread.c
@@ -3480,6 +3480,29 @@ read1 (Lisp_Object readcharfun, int *pch, bool first_in_list, bool locate_syms)
/* Read the object itself. */
Lisp_Object tem = read0 (readcharfun, locate_syms);
+ if (CONSP (tem))
+ {
+ if (BASE_EQ (tem, placeholder))
+ /* Catch silly games like #1=#1# */
+ invalid_syntax ("nonsensical self-reference",
+ readcharfun);
+
+ /* Optimisation: since the placeholder is already
+ a cons, repurpose it as the actual value.
+ This allows us to skip the substition below,
+ since the placeholder is already referenced
+ inside TEM at the appropriate places. */
+ Fsetcar (placeholder, XCAR (tem));
+ Fsetcdr (placeholder, XCDR (tem));
+
+ struct Lisp_Hash_Table *h2
+ = XHASH_TABLE (read_objects_completed);
+ ptrdiff_t i = hash_lookup (h2, placeholder, &hash);
+ eassert (i < 0);
+ hash_put (h2, placeholder, Qnil, hash);
+ return placeholder;
+ }
+
/* If it can be recursive, remember it for
future substitutions. */
if (! SYMBOLP (tem)
@@ -3494,24 +3517,15 @@ read1 (Lisp_Object readcharfun, int *pch, bool first_in_list, bool locate_syms)
}
/* Now put it everywhere the placeholder was... */
- if (CONSP (tem))
- {
- Fsetcar (placeholder, XCAR (tem));
- Fsetcdr (placeholder, XCDR (tem));
- return placeholder;
- }
- else
- {
- Flread__substitute_object_in_subtree
- (tem, placeholder, read_objects_completed);
+ Flread__substitute_object_in_subtree
+ (tem, placeholder, read_objects_completed);
- /* ...and #n# will use the real value from now on. */
- i = hash_lookup (h, number, &hash);
- eassert (i >= 0);
- set_hash_value_slot (h, i, tem);
+ /* ...and #n# will use the real value from now on. */
+ i = hash_lookup (h, number, &hash);
+ eassert (i >= 0);
+ set_hash_value_slot (h, i, tem);
- return tem;
- }
+ return tem;
}
/* #n# returns a previously read object. */
diff --git a/test/src/lread-tests.el b/test/src/lread-tests.el
index 862f6a6595..9ec54c719c 100644
--- a/test/src/lread-tests.el
+++ b/test/src/lread-tests.el
@@ -258,5 +258,27 @@ lread-float
(should (equal (read "-0.e-5") -0.0))
)
+(defun lread-test-read-and-print (str)
+ (let* ((read-circle t)
+ (print-circle t)
+ (val (read-from-string str)))
+ (if (consp val)
+ (prin1-to-string (car val))
+ (error "reading %S failed: %S" str val))))
+
+(defconst lread-test-circle-cases
+ '("#1=(#1# . #1#)"
+ "#1=[#1# a #1#]"
+ "#1=(#2=[#1# #2#] . #1#)"
+ "#1=(#2=[#1# #2#] . #2#)"
+ "#1=[#2=(#1# . #2#)]"
+ "#1=(#2=[#3=(#1# . #2#) #4=(#3# . #4#)])"
+ ))
+
+(ert-deftest lread-circle ()
+ (dolist (str lread-test-circle-cases)
+ (ert-info (str :prefix "input: ")
+ (should (equal (lread-test-read-and-print str) str))))
+ (should-error (read-from-string "#1=#1#") :type 'invalid-read-syntax))
;;; lread-tests.el ends here
--
2.32.0 (Apple Git-132)
next prev parent reply other threads:[~2022-03-26 15:58 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-21 14:26 bug#54501: 27.2; to be disclosed in private Andy Gaynor
2022-03-22 14:44 ` bug#54501: Segfault on recursive structure Lars Ingebrigtsen
2022-03-22 15:02 ` Andreas Schwab
2022-03-22 15:04 ` Lars Ingebrigtsen
[not found] ` <trinity-1bb5c502-bafe-4a6c-b6be-08a2a1b27232-1648049044877@3c-app-mailcom-lxa04>
2022-03-25 15:34 ` Lars Ingebrigtsen
2022-03-26 15:58 ` Mattias Engdegård [this message]
2022-03-26 16:33 ` Lars Ingebrigtsen
[not found] ` <8F7060F3-8137-4835-873F-68E3F6B8010D@acm.org>
2022-03-26 17:43 ` Mattias Engdegård
2022-03-26 18:00 ` Eli Zaretskii
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.gnu.org/software/emacs/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=839D376A-2234-4A2A-AB7E-DCCB3D6CB149@acm.org \
--to=mattiase@acm.org \
--cc=54501@debbugs.gnu.org \
--cc=goldipox@mail.com \
--cc=larsi@gnus.org \
--cc=monnier@iro.umontreal.ca \
--cc=schwab@linux-m68k.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).