bug#55395: What does (1 2 3 . #2) mean?

unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed

* bug#55395: What does (1 2 3 . #2) mean?
@ 2022-05-13 11:32 Mattias Engdegård
  2022-05-13 15:22 ` Lars Ingebrigtsen
  2022-05-13 16:08 ` Andreas Schwab
  0 siblings, 2 replies; 10+ messages in thread
From: Mattias Engdegård @ 2022-05-13 11:32 UTC (permalink / raw)
  To: 55395

What, exactly, does the #N print notation mean (with print-circle=nil)?

Let's define (rho LEAD LOOP) as the iota list that has a loop LOOP long after LEAD initial elements:

(defun rho (lead loop)
  (let ((l (number-sequence 1 (+ lead loop))))
    (setcdr (nthcdr (+ lead loop -1) l) (nthcdr lead l))
    l))

Then we have:

(rho 0 1) => (1 . #0)
(rho 0 2) => (1 2 1 2 . #2)
(rho 0 3) => (1 2 3 1 2 . #2)
(rho 0 4) => (1 2 3 4 1 2 3 4 1 2 . #5)
(rho 0 5) => (1 2 3 4 5 1 2 3 4 5 1 . #5)
(rho 1 4) => (1 2 3 4 5 2 3 4 5 2 . #5)
(rho 4 1) => (1 2 3 4 5 5 5 . #3)

and so on. The pattern is not obvious to me.

It may have made more sense before the switch of cycle-detection algorithm from Floyd to Brent. This can be fixed by hand-coding the list iteration and explicitly remembering the index of the tortoise, but would that be correct? What's the spec?

If #N means 'Nth object from the top along the path to the current object, starting at 0' then we should have

(rho 2 3) => (1 2 3 4 5 . #2)
(list (rho 2 3)) => ((1 2 3 4 5 . #3))

ie, adding the print depth to the index in the list. Do you agree?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#55395: What does (1 2 3 . #2) mean?
  2022-05-13 11:32 bug#55395: What does (1 2 3 . #2) mean? Mattias Engdegård
@ 2022-05-13 15:22 ` Lars Ingebrigtsen
  2022-05-13 17:20   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-05-13 16:08 ` Andreas Schwab
  1 sibling, 1 reply; 10+ messages in thread
From: Lars Ingebrigtsen @ 2022-05-13 15:22 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 55395, Stefan Monnier

Mattias Engdegård <mattiase@acm.org> writes:

> It may have made more sense before the switch of cycle-detection algorithm from Floyd to Brent. This can be fixed by hand-coding the list iteration and explicitly remembering the index of the tortoise, but would that be correct? What's the spec?

I don't think I've ever considered #x to be meaningful outside of
print-circle, but I guess if we wanted to have some semantics here, I
think I would have expected the index of the tortoise?  But...

> If #N means 'Nth object from the top along the path to the current object, starting at 0' then we should have
>
> (rho 2 3) => (1 2 3 4 5 . #2)
> (list (rho 2 3)) => ((1 2 3 4 5 . #3))
>
> ie, adding the print depth to the index in the list. Do you agree?

I've added Stefan to the CCs; I'm sure he has an opinion.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#55395: What does (1 2 3 . #2) mean?
  2022-05-13 11:32 bug#55395: What does (1 2 3 . #2) mean? Mattias Engdegård
  2022-05-13 15:22 ` Lars Ingebrigtsen
@ 2022-05-13 16:08 ` Andreas Schwab
  1 sibling, 0 replies; 10+ messages in thread
From: Andreas Schwab @ 2022-05-13 16:08 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 55395

On Mai 13 2022, Mattias Engdegård wrote:

> Let's define (rho LEAD LOOP) as the iota list that has a loop LOOP long after LEAD initial elements:
>
> (defun rho (lead loop)
>   (let ((l (number-sequence 1 (+ lead loop))))
>     (setcdr (nthcdr (+ lead loop -1) l) (nthcdr lead l))
>     l))
>
> Then we have:
>
> (rho 0 1) => (1 . #0)
> (rho 0 2) => (1 2 1 2 . #2)
> (rho 0 3) => (1 2 3 1 2 . #2)
> (rho 0 4) => (1 2 3 4 1 2 3 4 1 2 . #5)
> (rho 0 5) => (1 2 3 4 5 1 2 3 4 5 1 . #5)
> (rho 1 4) => (1 2 3 4 5 2 3 4 5 2 . #5)
> (rho 4 1) => (1 2 3 4 5 5 5 . #3)
>
> and so on. The pattern is not obvious to me.
>
> It may have made more sense before the switch of cycle-detection algorithm from Floyd to Brent. This can be fixed by hand-coding the list iteration and explicitly remembering the index of the tortoise, but would that be correct? What's the spec?

I don't think there is a defined meaning behind the number, it's more an
implementation detail.  If you want to have precise cycle detection you
need to enable print-circle.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."





^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#55395: What does (1 2 3 . #2) mean?
  2022-05-13 15:22 ` Lars Ingebrigtsen
@ 2022-05-13 17:20   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-05-13 19:54     ` Lars Ingebrigtsen
  2022-05-13 20:01     ` Mattias Engdegård
  0 siblings, 2 replies; 10+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-05-13 17:20 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 55395, Mattias Engdegård

>> It may have made more sense before the switch of cycle-detection algorithm
>> from Floyd to Brent. This can be fixed by hand-coding the list iteration
>> and explicitly remembering the index of the tortoise, but would that be
>> correct? What's the spec?
>
> I don't think I've ever considered #x to be meaningful outside of
> print-circle, but I guess if we wanted to have some semantics here, I
> think I would have expected the index of the tortoise?  But...

Agreed (and agreed with Andreas as well).
The main goal is to avoid inf-looping and the #NNN chosen is
somewhat arbitrary.

I don't think it's worth it to try and make those #NNN more precise.
If the arbitrariness of the specific #NNN chosen is a problem, we could
replace it with a fixed `#¡cycle!` or something like that (or if we
want it to be more user-readable we could print something like
#cycle:<OBJ> where OBJ is the first N layers of the rest of the cycle,
like (rho 0 2) => (1 2 1 2 . #cycle:(1 2 1 ...))

I'm personally more bothered by the fact that those #NNN use exactly the
same syntax as used with `print-circle` yet they don't have the
same semantics.

        Stefan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#55395: What does (1 2 3 . #2) mean?
  2022-05-13 17:20   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-05-13 19:54     ` Lars Ingebrigtsen
  2022-05-13 20:01     ` Mattias Engdegård
  1 sibling, 0 replies; 10+ messages in thread
From: Lars Ingebrigtsen @ 2022-05-13 19:54 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 55395, Mattias Engdegård

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> Agreed (and agreed with Andreas as well).
> The main goal is to avoid inf-looping and the #NNN chosen is
> somewhat arbitrary.
>
> I don't think it's worth it to try and make those #NNN more precise.

So it seems like everybody basically agrees on that...  so if Mattias
wants to implement something here to give them some semantics, he can
pretty much choose whatever he wants?

Or just document that there's really no semantics here without
print-circle, if he prefers.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#55395: What does (1 2 3 . #2) mean?
  2022-05-13 17:20   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-05-13 19:54     ` Lars Ingebrigtsen
@ 2022-05-13 20:01     ` Mattias Engdegård
  2022-05-14 13:45       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 10+ messages in thread
From: Mattias Engdegård @ 2022-05-13 20:01 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 55395, Lars Ingebrigtsen

13 maj 2022 kl. 19.20 skrev Stefan Monnier <monnier@iro.umontreal.ca>:

> The main goal is to avoid inf-looping and the #NNN chosen is
> somewhat arbitrary.

It seems to have sort-of worked before and was broken, inadvertently and with the best intentions, by a later change.

Additionally the #N value appears to be correct for other object types. For example,

  [a [b #1=[[c #1# d]]]]

is printed as

  [a [b [[c #2 d]]]]

which is consistent with the manual.

> I don't think it's worth it to try and make those #NNN more precise.

I have a simple patch but it is not based on master so it will have to wait.

> I'm personally more bothered by the fact that those #NNN use exactly the
> same syntax as used with `print-circle` yet they don't have the
> same semantics.

The syntax isn't exactly the same (#N vs #N= and #N#) but annoyingly close.

For that matter I would have preferred numbering the other way, starting at the bottom going up, like de Bruijn indices. That way the numbering would be local and wouldn't depend on where the circular subtree occurs. Technically that would be an easy change, and if we agree that the numbers are unreliable right now, we might as well.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#55395: What does (1 2 3 . #2) mean?
  2022-05-13 20:01     ` Mattias Engdegård
@ 2022-05-14 13:45       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-05-18 14:29         ` Mattias Engdegård
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-05-14 13:45 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 55395, Lars Ingebrigtsen

>> The main goal is to avoid inf-looping and the #NNN chosen is
>> somewhat arbitrary.
> It seems to have sort-of worked before and was broken, inadvertently and
> with the best intentions, by a later change.
>
> Additionally the #N value appears to be correct for other object types. For example,
>
>   [a [b #1=[[c #1# d]]]]
>
> is printed as
>
>   [a [b [[c #2 d]]]]
>
> which is consistent with the manual.

I do have the impression that it used to be "correct", but I can't
remember of a single time where I actually made use of that NNN.

And even if you know what it means and it works correctly, it's pretty
hard for a normal human to correctly count the depth starting from
the root.

I suspect if we want to "do better", printing "the beginning" of the
object to which we're looping (and saying explicitly that there's
a cycle) will be a lot more useful to the average user.
[ Or we could even print "the period" rather than "the beginning".  ]

>> I'm personally more bothered by the fact that those #NNN use exactly the
>> same syntax as used with `print-circle` yet they don't have the
>> same semantics.
> The syntax isn't exactly the same (#N vs #N= and #N#) but annoyingly close.

Indeed, it's not as bad as I remembered.

> For that matter I would have preferred numbering the other way, starting at
> the bottom going up, like de Bruijn indices.

I love&hate de Bruijn indices as much as the next guy, but they're not
human-friendly (even more than the "like de Bruijn levels" we currently
use).

        Stefan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#55395: What does (1 2 3 . #2) mean?
  2022-05-14 13:45       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-05-18 14:29         ` Mattias Engdegård
  2022-05-18 21:16           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 10+ messages in thread
From: Mattias Engdegård @ 2022-05-18 14:29 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 55395, Lars Ingebrigtsen

[-- Attachment #1: Type: text/plain, Size: 1303 bytes --]

14 maj 2022 kl. 15.45 skrev Stefan Monnier <monnier@iro.umontreal.ca>:

> I do have the impression that it used to be "correct", but I can't
> remember of a single time where I actually made use of that NNN.
> 
> And even if you know what it means and it works correctly, it's pretty
> hard for a normal human to correctly count the depth starting from
> the root.

It is. Counting upwards from a leaf seems slightly easier.

Perhaps these attempts to generate a meaningful circularity reference is a fool's errand and we should just go with #!circle! or something similar.

Anyway, I'm attaching a patch that tries to put some meaning back in the ` . #N` notation: it's an index into the same list, as it used to be, I think, prior to the switch from Floyd to Brent. This is done by keeping track of the index of the tortoise at all times, which isn't very expensive at all.

Without the patch: (rho 4 1) => (1 2 3 4 5 5 5 . #3)
With the patch:    (rho 4 1) => (1 2 3 4 5 5 5 . #6)

which should be a mild improvement.
It's still ambiguous:

 (1 1 . #1)

could mean either

 #1=(1 . #1#)

or

 #1=(1 1 . #1#)

and

 [(1 2 3 . #0)]

could mean either

 [#1=(1 2 3 . #1#)]

or
 #1=[(1 2 3 . #1#)]

but maybe it's less wrong, with room for future improvement?


[-- Attachment #2: circular-list-print-index.diff --]
[-- Type: application/octet-stream, Size: 2471 bytes --]

diff --git a/src/print.c b/src/print.c
index da4869e8fb..823bc07406 100644
--- a/src/print.c
+++ b/src/print.c
@@ -2028,12 +2028,12 @@ named_escape (int i)
   union {
     struct {
       Lisp_Object last;		/* cons whose car was just printed  */
-      ptrdiff_t idx;		/* index of next element */
-      intmax_t maxlen;		/* max length (from Vprint_length) */
+      intmax_t maxlen;		/* max number of elements left to print */
       /* state for Brent cycle detection */
       Lisp_Object tortoise;     /* slow pointer */
       ptrdiff_t n;		/* tortoise step countdown */
       ptrdiff_t m;		/* tortoise step period */
+      ptrdiff_t tortoise_idx;	/* index of tortoise */
     } list;
     struct {
       Lisp_Object obj;		/* object to print after " . " */
@@ -2408,10 +2408,10 @@ print_object (Lisp_Object obj, Lisp_Object printcharfun, bool escapeflag)
 		  .type = PE_list,
 		  .u.list.last = obj,
 		  .u.list.maxlen = print_length,
-		  .u.list.idx = 1,
 		  .u.list.tortoise = obj,
 		  .u.list.n = 2,
 		  .u.list.m = 2,
+		  .u.list.tortoise_idx = 0,
 		});
 	      /* print the car */
 	      obj = XCAR (obj);
@@ -2582,10 +2582,8 @@ print_object (Lisp_Object obj, Lisp_Object printcharfun, bool escapeflag)
 
 		printchar (' ', printcharfun);
 
-		/* FIXME: We wouldn't need to keep track of idx if we
-		   count down maxlen instead, and maintain a separate
-		   tortoise index if required.  */
-		if (e->u.list.idx >= e->u.list.maxlen)
+		--e->u.list.maxlen;
+		if (e->u.list.maxlen <= 0)
 		  {
 		    print_c_string ("...)", printcharfun);
 		    --prstack.sp;
@@ -2594,22 +2592,21 @@ print_object (Lisp_Object obj, Lisp_Object printcharfun, bool escapeflag)
 		  }
 
 		e->u.list.last = next;
-		e->u.list.idx++;
 		e->u.list.n--;
 		if (e->u.list.n == 0)
 		  {
 		    /* Double tortoise update period and teleport it.  */
+		    e->u.list.tortoise_idx += e->u.list.m;
 		    e->u.list.m <<= 1;
 		    e->u.list.n = e->u.list.m;
 		    e->u.list.tortoise = next;
 		  }
 		else if (BASE_EQ (next, e->u.list.tortoise))
 		  {
-		    /* FIXME: This #N tail index is bug-compatible with
-		       previous implementations but actually nonsense;
+		    /* FIXME: This #N tail index is somewhat ambiguous;
 		       see bug#55395.  */
 		    int len = sprintf (buf, ". #%" PRIdMAX ")",
-				       (e->u.list.idx >> 1) - 1);
+				       e->u.list.tortoise_idx);
 		    strout (buf, len, len, printcharfun);
 		    --prstack.sp;
 		    --print_depth;

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* bug#55395: What does (1 2 3 . #2) mean?
  2022-05-18 14:29         ` Mattias Engdegård
@ 2022-05-18 21:16           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2022-05-23 14:59             ` Mattias Engdegård
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-05-18 21:16 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: 55395, Lars Ingebrigtsen

>  [(1 2 3 . #0)]
>
> could mean either
>
>  [#1=(1 2 3 . #1#)]
>
> or
>
>  #1=[(1 2 3 . #1#)]

AFAIK it *should* mean the latter (#0 is the root of the overall
printed object, not just the list).

Mattias Engdegård [2022-05-18 16:29:51] wrote:

> Perhaps these attempts to generate a meaningful circularity reference is
> a fool's errand and we should just go with #!circle! or something similar.

Yup.  But we don't shy away from playing the fools.


        Stefan






^ permalink raw reply	[flat|nested] 10+ messages in thread

* bug#55395: What does (1 2 3 . #2) mean?
  2022-05-18 21:16           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2022-05-23 14:59             ` Mattias Engdegård
  0 siblings, 0 replies; 10+ messages in thread
From: Mattias Engdegård @ 2022-05-23 14:59 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 55395, Lars Ingebrigtsen

18 maj 2022 kl. 23.16 skrev Stefan Monnier <monnier@iro.umontreal.ca>:
> 
>> [(1 2 3 . #0)]
>> 
>> could mean either
>> 
>> [#1=(1 2 3 . #1#)]
>> 
>> or
>> 
>> #1=[(1 2 3 . #1#)]
> 
> AFAIK it *should* mean the latter (#0 is the root of the overall
> printed object, not just the list).

Yes, but the code was already a bit ambiguous in that respect. We don't keep track of the cons-level depth when printing; print_depth treats each list nesting as one step regardless of where in the list the nesting occurs. I suppose this could be remedied but I'm not going to work on it right now.

The previously attached patch has been pushed to master, on the grounds that it should be strictly better than what we had before; correctness should be back at the level before the change in circularity detection algorithm broke the tail index completely.

> Yup.  But we don't shy away from playing the fools.

You know me, I don't even need to play one. It all comes natural.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-05-23 14:59 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-13 11:32 bug#55395: What does (1 2 3 . #2) mean? Mattias Engdegård
2022-05-13 15:22 ` Lars Ingebrigtsen
2022-05-13 17:20   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-05-13 19:54     ` Lars Ingebrigtsen
2022-05-13 20:01     ` Mattias Engdegård
2022-05-14 13:45       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-05-18 14:29         ` Mattias Engdegård
2022-05-18 21:16           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2022-05-23 14:59             ` Mattias Engdegård
2022-05-13 16:08 ` Andreas Schwab

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).