Re: master d582356: * src/fns.c (Frandom): Handle bignum `limit`s

unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed

* Re: master d582356: * src/fns.c (Frandom): Handle bignum `limit`s
       [not found] ` <20210305170957.AF99920E1B@vcs0.savannah.gnu.org>
@ 2021-03-05 19:42   ` Pip Cet
  2021-03-05 19:56     ` Stefan Monnier
  0 siblings, 1 reply; 19+ messages in thread
From: Pip Cet @ 2021-03-05 19:42 UTC (permalink / raw)
  To: emacs-devel, Stefan Monnier

On Fri, Mar 5, 2021 at 5:28 PM Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> branch: master
> commit d582356a7f704f8a209a3ef31d6ea970520c6224
> Author: Stefan Monnier <monnier@iro.umontreal.ca>
> Commit: Stefan Monnier <monnier@iro.umontreal.ca>
>
>     * src/fns.c (Frandom): Handle bignum `limit`s
>
>     (ccall2, get_random_bignum): New functions.
> ---
>  doc/lispref/numbers.texi |  2 +-
>  src/fns.c                | 53 +++++++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 53 insertions(+), 2 deletions(-)
>
> diff --git a/doc/lispref/numbers.texi b/doc/lispref/numbers.texi
> index 63e3e0b..4c5f7212 100644
> --- a/doc/lispref/numbers.texi
> +++ b/doc/lispref/numbers.texi
> @@ -1250,7 +1250,7 @@ other strings to choose various seed values.
>  This function returns a pseudo-random integer.  Repeated calls return a
>  series of pseudo-random integers.
>
> -If @var{limit} is a positive fixnum, the value is chosen to be
> +If @var{limit} is a positive integer, the value is chosen to be
>  nonnegative and less than @var{limit}.  Otherwise, the value might be

Should we add "with every value equally likely" here, or is that
perfectly obvious?

>  any fixnum, i.e., any integer from @code{most-negative-fixnum} through
>  @code{most-positive-fixnum} (@pxref{Integer Basics}).
> diff --git a/src/fns.c b/src/fns.c
> index 7914bd4..b193ad6 100644
> --- a/src/fns.c
> +++ b/src/fns.c
> @@ -54,10 +54,55 @@ DEFUN ("identity", Fidentity, Sidentity, 1, 1, 0,
>    return argument;
>  }
>
> +static Lisp_Object
> +ccall2 (Lisp_Object (f) (ptrdiff_t nargs, Lisp_Object *args),
> +        Lisp_Object arg1, Lisp_Object arg2)
> +{
> +  Lisp_Object args[2] = {arg1, arg2};
> +  return f (2, args);
> +}

Can't we use CALLN?

> +static Lisp_Object
> +get_random_bignum (Lisp_Object limit)
> +{
> +  /* This is a naive transcription into bignums of the fixnum algorithm.
> +     I'd be quite surprised if that's anywhere near the best algorithm
> +     for it.  */
> +  while (true)
> +    {
> +      Lisp_Object val = make_fixnum (0);
> +      Lisp_Object lim = limit;
> +      int bits = 0;
> +      int bitsperiteration = FIXNUM_BITS - 1;
> +      do
> +        {
> +          /* Shift by one so it is a valid positive fixnum.  */
> +          EMACS_INT rand = get_random () >> 1;
> +          Lisp_Object lrand = make_fixnum (rand);
> +          bits += bitsperiteration;
> +          val = ccall2 (Flogior,
> +                        Fash (val, make_fixnum (bitsperiteration)),
> +                        lrand);
> +          lim = Fash (lim, make_fixnum (- bitsperiteration));
> +        }
> +      while (!EQ (lim, make_fixnum (0)));
> +      /* Return the remainder, except reject the rare case where
> +        get_random returns a number so close to INTMASK that the

No longer INTMASK.

> +        remainder isn't random.  */
> +      Lisp_Object remainder = Frem (val, limit);
> +      if (!NILP (ccall2 (Fleq,
> +                        ccall2 (Fminus, val, remainder),
> +                        ccall2 (Fminus,
> +                                Fash (make_fixnum (1), make_fixnum (bits)),
> +                                limit))))
> +       return remainder;

Whenever I see that algorithm, I think it can't possibly be correct,
but it is :-)

> +    }
> +}
> +
>  DEFUN ("random", Frandom, Srandom, 0, 1, 0,
>         doc: /* Return a pseudo-random integer.
>  By default, return a fixnum; all fixnums are equally likely.
> -With positive fixnum LIMIT, return random integer in interval [0,LIMIT).
> +With positive integer LIMIT, return random integer in interval [0,LIMIT).
>  With argument t, set the random number seed from the system's entropy
>  pool if available, otherwise from less-random volatile data such as the time.
>  With a string argument, set the seed based on the string's contents.

That docstring always tricks me into thinking "oh, don't worry about
passing something invalid, you'll get an error", when in fact, you get
a fixnum. (random -1)? Random fixnum. (random 1.0)? Random fixnum.
(random 'many)? Random fixnum.

> @@ -71,6 +116,12 @@ See Info node `(elisp)Random Numbers' for more details.  */)
>      init_random ();
>    else if (STRINGP (limit))
>      seed_random (SSDATA (limit), SBYTES (limit));
> +  if (BIGNUMP (limit))
> +    {
> +      if (0 > mpz_sgn (*xbignum_val (limit)))
> +        xsignal2 (Qwrong_type_argument, Qnatnump, limit);

It's inconsistent for this function to handle the negative bignum case
correctly.

But I'm really writing to ask whether it might be a good idea to add
float support while we're there. I'll leave uniformly-distributed big
ratios as an exercise for the reader ;-)

And,all of this could happen in Lisp, couldn't it? Should it?

Pip



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: master d582356: * src/fns.c (Frandom): Handle bignum `limit`s
  2021-03-05 19:42   ` master d582356: * src/fns.c (Frandom): Handle bignum `limit`s Pip Cet
@ 2021-03-05 19:56     ` Stefan Monnier
  2021-03-05 20:13       ` Pip Cet
  2021-03-06  7:42       ` Pip Cet
  0 siblings, 2 replies; 19+ messages in thread
From: Stefan Monnier @ 2021-03-05 19:56 UTC (permalink / raw)
  To: Pip Cet; +Cc: emacs-devel

>> -If @var{limit} is a positive fixnum, the value is chosen to be
>> +If @var{limit} is a positive integer, the value is chosen to be
>>  nonnegative and less than @var{limit}.  Otherwise, the value might be
> Should we add "with every value equally likely" here, or is that
> perfectly obvious?

We could do that, yes.  While I do understand what's a probability
distribution, that's about as far as much knowledge goes in this area,
so I'll let others take care of that.

>> +static Lisp_Object
>> +ccall2 (Lisp_Object (f) (ptrdiff_t nargs, Lisp_Object *args),
>> +        Lisp_Object arg1, Lisp_Object arg2)
>> +{
>> +  Lisp_Object args[2] = {arg1, arg2};
>> +  return f (2, args);
>> +}
>
> Can't we use CALLN?

And you thought it'd be funny to wait until after I push the patch to
tell me?

>> +      /* Return the remainder, except reject the rare case where
>> +        get_random returns a number so close to INTMASK that the
> No longer INTMASK.

Yet close enough ;-)

>> +        remainder isn't random.  */
>> +      Lisp_Object remainder = Frem (val, limit);
>> +      if (!NILP (ccall2 (Fleq,
>> +                        ccall2 (Fminus, val, remainder),
>> +                        ccall2 (Fminus,
>> +                                Fash (make_fixnum (1), make_fixnum (bits)),
>> +                                limit))))
>> +       return remainder;
>
> Whenever I see that algorithm, I think it can't possibly be correct,
> but it is :-)

I'll trust you on that.

> That docstring always tricks me into thinking "oh, don't worry about
> passing something invalid, you'll get an error", when in fact, you get
> a fixnum. (random -1)? Random fixnum. (random 1.0)? Random fixnum.
> (random 'many)? Random fixnum.

Yes, this sucks, but I didn't dare to fix it.
I did fix the negative bignum case, tho: it now signals an error ;-)

> But I'm really writing to ask whether it might be a good idea to add
> float support while we're there.

Could be: AFAIK we already have code for it in Calc, so it might be
a small matter of moving the code.

To be honest: I only added support for it because I wanted to write some
randomized tests for 64bit bindat support and my machine is using 32bit
pointers still ;-)

> And, all of this could happen in Lisp, couldn't it? Should it?

You might be right: we should probably export just `get_random` (and the
seeding part) to ELisp and then write the rest in ELisp.


        Stefan




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: master d582356: * src/fns.c (Frandom): Handle bignum `limit`s
  2021-03-05 19:56     ` Stefan Monnier
@ 2021-03-05 20:13       ` Pip Cet
  2021-03-05 20:34         ` Stefan Monnier
  2021-03-06  7:42       ` Pip Cet
  1 sibling, 1 reply; 19+ messages in thread
From: Pip Cet @ 2021-03-05 20:13 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

On Fri, Mar 5, 2021 at 7:56 PM Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> >> -If @var{limit} is a positive fixnum, the value is chosen to be
> >> +If @var{limit} is a positive integer, the value is chosen to be
> >>  nonnegative and less than @var{limit}.  Otherwise, the value might be
> > Should we add "with every value equally likely" here, or is that
> > perfectly obvious?
>
> We could do that, yes.  While I do understand what's a probability
> distribution, that's about as far as much knowledge goes in this area,
> so I'll let others take care of that.

Some people get philosophical about PRNGs, and they might be upset
about the mere suggestion that 2^N states can be used to generate an
(N+K)-bit pseudo-random integer...

> > That docstring always tricks me into thinking "oh, don't worry about
> > passing something invalid, you'll get an error", when in fact, you get
> > a fixnum. (random -1)? Random fixnum. (random 1.0)? Random fixnum.
> > (random 'many)? Random fixnum.
>
> Yes, this sucks, but I didn't dare to fix it.
> I did fix the negative bignum case, tho: it now signals an error ;-)

But what if someone relied on that?! Let's add a defcustom for it!

> > But I'm really writing to ask whether it might be a good idea to add
> > float support while we're there.
>
> Could be: AFAIK we already have code for it in Calc, so it might be
> a small matter of moving the code.

I'm so glad you didn't suggest cl-random.

> To be honest: I only added support for it because I wanted to write some
> randomized tests for 64bit bindat support and my machine is using 32bit
> pointers still ;-)

Should work for that :-)

> > And, all of this could happen in Lisp, couldn't it? Should it?
>
> You might be right: we should probably export just `get_random` (and the
> seeding part) to ELisp and then write the rest in ELisp.

And then we can get rid of the horror that is cl-random, too, right?

Pip



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: master d582356: * src/fns.c (Frandom): Handle bignum `limit`s
  2021-03-05 20:13       ` Pip Cet
@ 2021-03-05 20:34         ` Stefan Monnier
  0 siblings, 0 replies; 19+ messages in thread
From: Stefan Monnier @ 2021-03-05 20:34 UTC (permalink / raw)
  To: Pip Cet; +Cc: emacs-devel

> Some people get philosophical about PRNGs, and they might be upset
> about the mere suggestion that 2^N states can be used to generate an
> (N+K)-bit pseudo-random integer...

I haven't even bothered to look at the size of the state of our PRNG, to
tell you the truth.  `random` doesn't try to be
cryptography-quality anyway.

> And then we can get rid of the horror that is cl-random, too, right?

I'll let whoever implements it figure that out.


        Stefan




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: master d582356: * src/fns.c (Frandom): Handle bignum `limit`s
  2021-03-05 19:56     ` Stefan Monnier
  2021-03-05 20:13       ` Pip Cet
@ 2021-03-06  7:42       ` Pip Cet
  2021-03-06  8:44         ` Eli Zaretskii
  1 sibling, 1 reply; 19+ messages in thread
From: Pip Cet @ 2021-03-06  7:42 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 356 bytes --]

On Fri, Mar 5, 2021 at 7:56 PM Stefan Monnier <monnier@iro.umontreal.ca> wrote:

> > And, all of this could happen in Lisp, couldn't it? Should it?
>
> You might be right: we should probably export just `get_random` (and the
> seeding part) to ELisp and then write the rest in ELisp.

Does this look okay? It passes make check, so it must be correct!

Pip

[-- Attachment #2: 0001-Implement-random-in-Lisp-exposing-only-random-fixnum.patch --]
[-- Type: text/x-patch, Size: 7204 bytes --]

From d1e97317b0fdb8eb6cd34d13a2874c1792a484c3 Mon Sep 17 00:00:00 2001
From: Pip Cet <pipcet@gmail.com>
Date: Sat, 6 Mar 2021 07:37:22 +0000
Subject: [PATCH] Implement random in Lisp, exposing only random-fixnum from C

* src/buffer.c (Fgenerate_new_buffer_name): Call intern ("random").
* src/fns.c (Frandom): Rename to Frandom_fixnum and simplify.
(ccall2): Remove function.
(random_bignum): Remove function.
(syms_of_fns): Register random-fixnum, not random.
* lisp/subr.el (random): New function.
---
 lisp/subr.el | 43 ++++++++++++++++++++++++++++
 src/buffer.c |  6 +++-
 src/fns.c    | 80 +++++-----------------------------------------------
 3 files changed, 55 insertions(+), 74 deletions(-)

diff --git a/lisp/subr.el b/lisp/subr.el
index 0b5634739993f..fd7bc0283875b 100644
--- a/lisp/subr.el
+++ b/lisp/subr.el
@@ -6238,4 +6238,47 @@ internal--format-docstring-line
 This is intended for internal use only."
   (internal--fill-string-single-line (apply #'format string objects)))
 
+(defun random (&optional limit)
+  "Return a pseudo-random integer.
+By default, return a fixnum; all fixnums are equally likely.
+With positive integer LIMIT, return random integer in interval [0,LIMIT).
+With float argument LIMIT, return a float between 0 and LIMIT.
+With argument t, set the random number seed from the system's entropy
+pool if available, otherwise from less-random volatile data such as the time.
+With a string argument, set the seed based on the string's contents.
+
+See Info node `(elisp)Random Numbers' for more details."
+  (cond
+   ((null limit)
+    (random-fixnum))
+   ((natnump limit)
+    (if (<= limit 0)
+        (error "Non-positive argument"))
+    (let (okay remainder)
+      (while (not okay)
+        (let ((val 0)
+              (lim limit)
+              (bits 0)
+              (bits-per-iteration (1-
+                                   (truncate
+                                    (log (1+ most-positive-fixnum) 2)))))
+          (while (not (zerop lim))
+            (let ((rand (logand (1- (lsh 1 bits-per-iteration))
+                                (random-fixnum))))
+              (setq bits (+ bits bits-per-iteration))
+              (setq val (logior (lsh val bits-per-iteration)
+                                rand))
+              (setq lim (lsh lim (- bits-per-iteration)))))
+          (setq remainder (% val limit))
+          (setq okay (<= (- val remainder)
+                         (- (lsh 1 bits) limit)))))
+      remainder))
+  ((floatp limit)
+   (* (random-integer (lsh 1 64))
+      (/ 1.0 (float (lsh 1 64)))
+      limit))
+  ((eq limit t) (random-fixnum limit))
+  ((stringp limit) (random-fixnum limit))
+  ((error "invalid limit %S" limit))))
+
 ;;; subr.el ends here
diff --git a/src/buffer.c b/src/buffer.c
index 03c10cc7ae5ba..20a2219e10a49 100644
--- a/src/buffer.c
+++ b/src/buffer.c
@@ -1149,7 +1149,11 @@ DEFUN ("generate-new-buffer-name", Fgenerate_new_buffer_name,
       char number[sizeof "-999999"];
 
       /* Use XFIXNUM instead of XFIXNAT to work around GCC bug 80776.  */
-      int i = XFIXNUM (Frandom (make_fixnum (1000000)));
+      Lisp_Object rand = CALLN (Ffuncall, intern ("random"),
+				make_fixnum (1000000));
+      if (!FIXNUMP (rand) || XFIXNUM (rand) < 0 || XFIXNUM (rand) >= 1000000)
+	error ("random broken");
+      int i = XFIXNUM (rand);
       eassume (0 <= i && i < 1000000);
 
       AUTO_STRING_WITH_LEN (lnumber, number, sprintf (number, "-%d", i));
diff --git a/src/fns.c b/src/fns.c
index b193ad648a96c..a65a5d88d4e4a 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -54,88 +54,22 @@ DEFUN ("identity", Fidentity, Sidentity, 1, 1, 0,
   return argument;
 }
 
-static Lisp_Object
-ccall2 (Lisp_Object (f) (ptrdiff_t nargs, Lisp_Object *args),
-        Lisp_Object arg1, Lisp_Object arg2)
-{
-  Lisp_Object args[2] = {arg1, arg2};
-  return f (2, args);
-}
-
-static Lisp_Object
-get_random_bignum (Lisp_Object limit)
-{
-  /* This is a naive transcription into bignums of the fixnum algorithm.
-     I'd be quite surprised if that's anywhere near the best algorithm
-     for it.  */
-  while (true)
-    {
-      Lisp_Object val = make_fixnum (0);
-      Lisp_Object lim = limit;
-      int bits = 0;
-      int bitsperiteration = FIXNUM_BITS - 1;
-      do
-        {
-          /* Shift by one so it is a valid positive fixnum.  */
-          EMACS_INT rand = get_random () >> 1;
-          Lisp_Object lrand = make_fixnum (rand);
-          bits += bitsperiteration;
-          val = ccall2 (Flogior,
-                        Fash (val, make_fixnum (bitsperiteration)),
-                        lrand);
-          lim = Fash (lim, make_fixnum (- bitsperiteration));
-        }
-      while (!EQ (lim, make_fixnum (0)));
-      /* Return the remainder, except reject the rare case where
-	 get_random returns a number so close to INTMASK that the
-	 remainder isn't random.  */
-      Lisp_Object remainder = Frem (val, limit);
-      if (!NILP (ccall2 (Fleq,
-	                 ccall2 (Fminus, val, remainder),
-	                 ccall2 (Fminus,
-	                         Fash (make_fixnum (1), make_fixnum (bits)),
-	                         limit))))
-	return remainder;
-    }
-}
-
-DEFUN ("random", Frandom, Srandom, 0, 1, 0,
+DEFUN ("random-fixnum", Frandom_fixnum, Srandom_fixnum, 0, 1, 0,
        doc: /* Return a pseudo-random integer.
-By default, return a fixnum; all fixnums are equally likely.
-With positive integer LIMIT, return random integer in interval [0,LIMIT).
-With argument t, set the random number seed from the system's entropy
-pool if available, otherwise from less-random volatile data such as the time.
-With a string argument, set the seed based on the string's contents.
+Return a fixnum; all fixnums are equally likely.  With argument t, also set
+the random number seed from the system's entropy pool if available, otherwise
+from less-random volatile data such as the time.
+With a string argument, also set the seed based on the string's contents.
 
 See Info node `(elisp)Random Numbers' for more details.  */)
   (Lisp_Object limit)
 {
-  EMACS_INT val;
-
   if (EQ (limit, Qt))
     init_random ();
   else if (STRINGP (limit))
     seed_random (SSDATA (limit), SBYTES (limit));
-  if (BIGNUMP (limit))
-    {
-      if (0 > mpz_sgn (*xbignum_val (limit)))
-        xsignal2 (Qwrong_type_argument, Qnatnump, limit);
-      return get_random_bignum (limit);
-    }
 
-  val = get_random ();
-  if (FIXNUMP (limit) && 0 < XFIXNUM (limit))
-    while (true)
-      {
-	/* Return the remainder, except reject the rare case where
-	   get_random returns a number so close to INTMASK that the
-	   remainder isn't random.  */
-	EMACS_INT remainder = val % XFIXNUM (limit);
-	if (val - remainder <= INTMASK - XFIXNUM (limit) + 1)
-	  return make_fixnum (remainder);
-	val = get_random ();
-      }
-  return make_ufixnum (val);
+  return make_ufixnum (get_random ());
 }
 \f
 /* Random data-structure functions.  */
@@ -5968,7 +5902,7 @@ syms_of_fns (void)
   use_short_answers = false;
 
   defsubr (&Sidentity);
-  defsubr (&Srandom);
+  defsubr (&Srandom_fixnum);
   defsubr (&Slength);
   defsubr (&Ssafe_length);
   defsubr (&Slength_less);
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: master d582356: * src/fns.c (Frandom): Handle bignum `limit`s
  2021-03-06  7:42       ` Pip Cet
@ 2021-03-06  8:44         ` Eli Zaretskii
  2021-03-06  9:44           ` Pip Cet
  0 siblings, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2021-03-06  8:44 UTC (permalink / raw)
  To: Pip Cet; +Cc: monnier, emacs-devel

> From: Pip Cet <pipcet@gmail.com>
> Date: Sat, 6 Mar 2021 07:42:20 +0000
> Cc: emacs-devel@gnu.org
> 
> * src/buffer.c (Fgenerate_new_buffer_name): Call intern ("random").

Previously, Fgenerate_new_buffer_name couldn't cause GC, but with this
change it can, because the modified code calls funcall.  For the same
reason, it could now QUIT, where it previously couldn't.  So this
change would need to be accompanied by auditing the C callers of
Fgenerate_new_buffer_name, to make sure they don't use any code that
doesn't expect any of these 2 events to happen (such as 'char *'
pointers lying around, which might become invalid after GC).

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: master d582356: * src/fns.c (Frandom): Handle bignum `limit`s
  2021-03-06  8:44         ` Eli Zaretskii
@ 2021-03-06  9:44           ` Pip Cet
  2021-03-06 10:56             ` Eli Zaretskii
  0 siblings, 1 reply; 19+ messages in thread
From: Pip Cet @ 2021-03-06  9:44 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Stefan Monnier, emacs-devel

On Sat, Mar 6, 2021 at 8:45 AM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Sat, 6 Mar 2021 07:42:20 +0000
> > Cc: emacs-devel@gnu.org
> >
> > * src/buffer.c (Fgenerate_new_buffer_name): Call intern ("random").
>
> Previously, Fgenerate_new_buffer_name couldn't cause GC, but with this
> change it can, because the modified code calls funcall.  For the same
> reason, it could now QUIT, where it previously couldn't.

Thanks! You're quite right.

> So this
> change would need to be accompanied by auditing the C callers of
> Fgenerate_new_buffer_name, to make sure they don't use any code that
> doesn't expect any of these 2 events to happen (such as 'char *'
> pointers lying around, which might become invalid after GC).

I thought this code in code_conversion_save was safe:

      Lisp_Object name
        = Fgenerate_new_buffer_name (Vcode_conversion_workbuf_name, Qnil);
      workbuf = Fget_buffer_create (name, Qt);

but I had misread the second argument to Fget_buffer_create: it's
inhibit-hooks, not run-hooks.

So I'm not sure whether code_conversion_save is allowed to call Lisp.
It would really help to document the "doesn't call Lisp" and "doesn't
quit" restrictions somewhere (but I'm not volunteering...)

As an alternative, we could simply use get_random() % 1000000 and
accept that the first 737418-ish buffer names are microscopically more
likely to be used on 32-bit narrow-int systems.

Pip



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: master d582356: * src/fns.c (Frandom): Handle bignum `limit`s
  2021-03-06  9:44           ` Pip Cet
@ 2021-03-06 10:56             ` Eli Zaretskii
  2021-03-06 13:22               ` Pip Cet
  0 siblings, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2021-03-06 10:56 UTC (permalink / raw)
  To: Pip Cet; +Cc: monnier, emacs-devel

> From: Pip Cet <pipcet@gmail.com>
> Date: Sat, 6 Mar 2021 09:44:18 +0000
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, emacs-devel@gnu.org
> 
> I thought this code in code_conversion_save was safe:
> 
>       Lisp_Object name
>         = Fgenerate_new_buffer_name (Vcode_conversion_workbuf_name, Qnil);
>       workbuf = Fget_buffer_create (name, Qt);
> 
> but I had misread the second argument to Fget_buffer_create: it's
> inhibit-hooks, not run-hooks.
> 
> So I'm not sure whether code_conversion_save is allowed to call Lisp.

I'd rather it didn't, for more than one reason.  But we can side-step
this by making Fgenerate_new_buffer_name use random-fixnum, which is
still a pure-C implementation.

> It would really help to document the "doesn't call Lisp" and "doesn't
> quit" restrictions somewhere (but I'm not volunteering...)

I agree, on both counts.  We are making Emacs ever more complex, and
keeping correctness as we go, let alone avoiding performance
degradations, becomes more and more problematic.  So much so that you
can easily observe what control engineers call "limit-cycle problem"
in Emacs development, whereby the stability of Emacs doesn't become
steadily better with time, but instead shows small oscillations, even
though we fix bugs at a very high rate, including in old code.

> As an alternative, we could simply use get_random() % 1000000 and
> accept that the first 737418-ish buffer names are microscopically more
> likely to be used on 32-bit narrow-int systems.

Yes, we could do that.  Perhaps it's even better, and we could re-use
the same logic as in Frandom regarding the proximity to INTMASK.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: master d582356: * src/fns.c (Frandom): Handle bignum `limit`s
  2021-03-06 10:56             ` Eli Zaretskii
@ 2021-03-06 13:22               ` Pip Cet
  2021-03-06 14:45                 ` Eli Zaretskii
  0 siblings, 1 reply; 19+ messages in thread
From: Pip Cet @ 2021-03-06 13:22 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Stefan Monnier, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1831 bytes --]

On Sat, Mar 6, 2021 at 10:57 AM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Sat, 6 Mar 2021 09:44:18 +0000
> > Cc: Stefan Monnier <monnier@iro.umontreal.ca>, emacs-devel@gnu.org
> >
> > I thought this code in code_conversion_save was safe:
> >
> >       Lisp_Object name
> >         = Fgenerate_new_buffer_name (Vcode_conversion_workbuf_name, Qnil);
> >       workbuf = Fget_buffer_create (name, Qt);
> >
> > but I had misread the second argument to Fget_buffer_create: it's
> > inhibit-hooks, not run-hooks.
> >
> > So I'm not sure whether code_conversion_save is allowed to call Lisp.

> I'd rather it didn't, for more than one reason.  But we can side-step
> this by making Fgenerate_new_buffer_name use random-fixnum, which is
> still a pure-C implementation.

Here's a patch which makes it use get_random() directly.

> > It would really help to document the "doesn't call Lisp" and "doesn't
> > quit" restrictions somewhere (but I'm not volunteering...)
>
> I agree, on both counts.

Actually, I think it would be best to have these restrictions
represented in the code. I see two ways of doing that:

1. Have FUNCTION_MAY_GC etc. translate into a GCC attribute in debug
builds so we can statically check that a function that says it never
calls GC doesn't call a function that says it may call GC.
2. Have a statement at the beginning of non-GCing functions which sets
a flag that is then checked by garbage-collecting functions, so that
we may dynamically check this.

(1) seems easy to implement, but has a high rate of false negatives as
many functions are safe to call from non-GCing functions as long as
the arguments are correct.
(2) is difficult to implement, and would only trigger at runtime.

So I say we should do (1) in preference to (2), but maybe we should do both.

Pip

[-- Attachment #2: 0001-Implement-random-in-Lisp-exposing-only-random-fixnum.patch --]
[-- Type: text/x-patch, Size: 7507 bytes --]

From 755d9fe9a256a72f8cac9e6a006151bd6cc4c46e Mon Sep 17 00:00:00 2001
From: Pip Cet <pipcet@gmail.com>
Date: Sat, 6 Mar 2021 07:37:22 +0000
Subject: [PATCH] Implement random in Lisp, exposing only random-fixnum from C

* src/buffer.c (Fgenerate_new_buffer_name): Call get_random().
* src/fns.c (Frandom): Rename to Frandom_fixnum and simplify.
(ccall2): Remove function.
(random_bignum): Remove function.
(syms_of_fns): Register random-fixnum, not random.
* lisp/subr.el (random): New function.
---
 lisp/subr.el | 43 ++++++++++++++++++++++++++++
 src/buffer.c | 15 ++++++++--
 src/fns.c    | 80 +++++-----------------------------------------------
 3 files changed, 62 insertions(+), 76 deletions(-)

diff --git a/lisp/subr.el b/lisp/subr.el
index 0b5634739993f..fd7bc0283875b 100644
--- a/lisp/subr.el
+++ b/lisp/subr.el
@@ -6238,4 +6238,47 @@ internal--format-docstring-line
 This is intended for internal use only."
   (internal--fill-string-single-line (apply #'format string objects)))
 
+(defun random (&optional limit)
+  "Return a pseudo-random integer.
+By default, return a fixnum; all fixnums are equally likely.
+With positive integer LIMIT, return random integer in interval [0,LIMIT).
+With float argument LIMIT, return a float between 0 and LIMIT.
+With argument t, set the random number seed from the system's entropy
+pool if available, otherwise from less-random volatile data such as the time.
+With a string argument, set the seed based on the string's contents.
+
+See Info node `(elisp)Random Numbers' for more details."
+  (cond
+   ((null limit)
+    (random-fixnum))
+   ((natnump limit)
+    (if (<= limit 0)
+        (error "Non-positive argument"))
+    (let (okay remainder)
+      (while (not okay)
+        (let ((val 0)
+              (lim limit)
+              (bits 0)
+              (bits-per-iteration (1-
+                                   (truncate
+                                    (log (1+ most-positive-fixnum) 2)))))
+          (while (not (zerop lim))
+            (let ((rand (logand (1- (lsh 1 bits-per-iteration))
+                                (random-fixnum))))
+              (setq bits (+ bits bits-per-iteration))
+              (setq val (logior (lsh val bits-per-iteration)
+                                rand))
+              (setq lim (lsh lim (- bits-per-iteration)))))
+          (setq remainder (% val limit))
+          (setq okay (<= (- val remainder)
+                         (- (lsh 1 bits) limit)))))
+      remainder))
+  ((floatp limit)
+   (* (random-integer (lsh 1 64))
+      (/ 1.0 (float (lsh 1 64)))
+      limit))
+  ((eq limit t) (random-fixnum limit))
+  ((stringp limit) (random-fixnum limit))
+  ((error "invalid limit %S" limit))))
+
 ;;; subr.el ends here
diff --git a/src/buffer.c b/src/buffer.c
index 03c10cc7ae5ba..6bdc857bff0b0 100644
--- a/src/buffer.c
+++ b/src/buffer.c
@@ -1149,10 +1149,19 @@ DEFUN ("generate-new-buffer-name", Fgenerate_new_buffer_name,
       char number[sizeof "-999999"];
 
       /* Use XFIXNUM instead of XFIXNAT to work around GCC bug 80776.  */
-      int i = XFIXNUM (Frandom (make_fixnum (1000000)));
-      eassume (0 <= i && i < 1000000);
+      EMACS_INT random_fixnum;
+      int random;
+      do
+	{
+	  random_fixnum = get_random ();
+	  random = random_fixnum % 1000000;
+	}
+      while (random_fixnum - random >= INTMASK + 1 - 1000000);
+      /* GCC sometimes doesn't grok this if written as a single eassume. */
+      eassume (0 <= random);
+      eassume (random < 1000000);
 
-      AUTO_STRING_WITH_LEN (lnumber, number, sprintf (number, "-%d", i));
+      AUTO_STRING_WITH_LEN (lnumber, number, sprintf (number, "-%d", random));
       genbase = concat2 (name, lnumber);
       if (NILP (Fget_buffer (genbase)))
 	return genbase;
diff --git a/src/fns.c b/src/fns.c
index b193ad648a96c..a65a5d88d4e4a 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -54,88 +54,22 @@ DEFUN ("identity", Fidentity, Sidentity, 1, 1, 0,
   return argument;
 }
 
-static Lisp_Object
-ccall2 (Lisp_Object (f) (ptrdiff_t nargs, Lisp_Object *args),
-        Lisp_Object arg1, Lisp_Object arg2)
-{
-  Lisp_Object args[2] = {arg1, arg2};
-  return f (2, args);
-}
-
-static Lisp_Object
-get_random_bignum (Lisp_Object limit)
-{
-  /* This is a naive transcription into bignums of the fixnum algorithm.
-     I'd be quite surprised if that's anywhere near the best algorithm
-     for it.  */
-  while (true)
-    {
-      Lisp_Object val = make_fixnum (0);
-      Lisp_Object lim = limit;
-      int bits = 0;
-      int bitsperiteration = FIXNUM_BITS - 1;
-      do
-        {
-          /* Shift by one so it is a valid positive fixnum.  */
-          EMACS_INT rand = get_random () >> 1;
-          Lisp_Object lrand = make_fixnum (rand);
-          bits += bitsperiteration;
-          val = ccall2 (Flogior,
-                        Fash (val, make_fixnum (bitsperiteration)),
-                        lrand);
-          lim = Fash (lim, make_fixnum (- bitsperiteration));
-        }
-      while (!EQ (lim, make_fixnum (0)));
-      /* Return the remainder, except reject the rare case where
-	 get_random returns a number so close to INTMASK that the
-	 remainder isn't random.  */
-      Lisp_Object remainder = Frem (val, limit);
-      if (!NILP (ccall2 (Fleq,
-	                 ccall2 (Fminus, val, remainder),
-	                 ccall2 (Fminus,
-	                         Fash (make_fixnum (1), make_fixnum (bits)),
-	                         limit))))
-	return remainder;
-    }
-}
-
-DEFUN ("random", Frandom, Srandom, 0, 1, 0,
+DEFUN ("random-fixnum", Frandom_fixnum, Srandom_fixnum, 0, 1, 0,
        doc: /* Return a pseudo-random integer.
-By default, return a fixnum; all fixnums are equally likely.
-With positive integer LIMIT, return random integer in interval [0,LIMIT).
-With argument t, set the random number seed from the system's entropy
-pool if available, otherwise from less-random volatile data such as the time.
-With a string argument, set the seed based on the string's contents.
+Return a fixnum; all fixnums are equally likely.  With argument t, also set
+the random number seed from the system's entropy pool if available, otherwise
+from less-random volatile data such as the time.
+With a string argument, also set the seed based on the string's contents.
 
 See Info node `(elisp)Random Numbers' for more details.  */)
   (Lisp_Object limit)
 {
-  EMACS_INT val;
-
   if (EQ (limit, Qt))
     init_random ();
   else if (STRINGP (limit))
     seed_random (SSDATA (limit), SBYTES (limit));
-  if (BIGNUMP (limit))
-    {
-      if (0 > mpz_sgn (*xbignum_val (limit)))
-        xsignal2 (Qwrong_type_argument, Qnatnump, limit);
-      return get_random_bignum (limit);
-    }
 
-  val = get_random ();
-  if (FIXNUMP (limit) && 0 < XFIXNUM (limit))
-    while (true)
-      {
-	/* Return the remainder, except reject the rare case where
-	   get_random returns a number so close to INTMASK that the
-	   remainder isn't random.  */
-	EMACS_INT remainder = val % XFIXNUM (limit);
-	if (val - remainder <= INTMASK - XFIXNUM (limit) + 1)
-	  return make_fixnum (remainder);
-	val = get_random ();
-      }
-  return make_ufixnum (val);
+  return make_ufixnum (get_random ());
 }
 \f
 /* Random data-structure functions.  */
@@ -5968,7 +5902,7 @@ syms_of_fns (void)
   use_short_answers = false;
 
   defsubr (&Sidentity);
-  defsubr (&Srandom);
+  defsubr (&Srandom_fixnum);
   defsubr (&Slength);
   defsubr (&Ssafe_length);
   defsubr (&Slength_less);
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: master d582356: * src/fns.c (Frandom): Handle bignum `limit`s
  2021-03-06 13:22               ` Pip Cet
@ 2021-03-06 14:45                 ` Eli Zaretskii
  2021-03-07 13:27                   ` Pip Cet
  0 siblings, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2021-03-06 14:45 UTC (permalink / raw)
  To: Pip Cet; +Cc: monnier, emacs-devel

> From: Pip Cet <pipcet@gmail.com>
> Date: Sat, 6 Mar 2021 13:22:10 +0000
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, emacs-devel@gnu.org
> 
> > > So I'm not sure whether code_conversion_save is allowed to call Lisp.
> 
> > I'd rather it didn't, for more than one reason.  But we can side-step
> > this by making Fgenerate_new_buffer_name use random-fixnum, which is
> > still a pure-C implementation.
> 
> Here's a patch which makes it use get_random() directly.

Thanks, maybe add a comment explaining the need for the do-while loop
in generate-new-buffer-name.

> Actually, I think it would be best to have these restrictions
> represented in the code. I see two ways of doing that:
> 
> 1. Have FUNCTION_MAY_GC etc. translate into a GCC attribute in debug
> builds so we can statically check that a function that says it never
> calls GC doesn't call a function that says it may call GC.
> 2. Have a statement at the beginning of non-GCing functions which sets
> a flag that is then checked by garbage-collecting functions, so that
> we may dynamically check this.
> 
> (1) seems easy to implement, but has a high rate of false negatives as
> many functions are safe to call from non-GCing functions as long as
> the arguments are correct.
> (2) is difficult to implement, and would only trigger at runtime.
> 
> So I say we should do (1) in preference to (2), but maybe we should do both.

I don't think I understand how will we know which function says it
never calls GC.  And the FUNCTION_MAY_GC attribute, even if applied to
the lowest-level functions that actually call maybe_gc, would be a
maintenance headache because we do change this from time to time.  So
we'd need something that checks the attribute's accuracy at compile
time, otherwise the attribute will bitrot.

For the same reasons, I don't see how (2) can be done in practice.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: master d582356: * src/fns.c (Frandom): Handle bignum `limit`s
  2021-03-06 14:45                 ` Eli Zaretskii
@ 2021-03-07 13:27                   ` Pip Cet
  2021-03-07 14:04                     ` Eli Zaretskii
  2021-03-07 18:37                     ` Stefan Monnier
  0 siblings, 2 replies; 19+ messages in thread
From: Pip Cet @ 2021-03-07 13:27 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Stefan Monnier, emacs-devel

On Sat, Mar 6, 2021 at 2:46 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Sat, 6 Mar 2021 13:22:10 +0000
> > Cc: Stefan Monnier <monnier@iro.umontreal.ca>, emacs-devel@gnu.org
> >
> > > > So I'm not sure whether code_conversion_save is allowed to call Lisp.
> >
> > > I'd rather it didn't, for more than one reason.  But we can side-step
> > > this by making Fgenerate_new_buffer_name use random-fixnum, which is
> > > still a pure-C implementation.
> >
> > Here's a patch which makes it use get_random() directly.
>
> Thanks, maybe add a comment explaining the need for the do-while loop
> in generate-new-buffer-name.

I received some very intelligent suggestions on how to improve the
code, will follow up with a better patch (unless the anonymous
benefactor beats me to it, of course :-) ).

> > Actually, I think it would be best to have these restrictions
> > represented in the code. I see two ways of doing that:
> >
> > 1. Have FUNCTION_MAY_GC etc. translate into a GCC attribute in debug
> > builds so we can statically check that a function that says it never
> > calls GC doesn't call a function that says it may call GC.
> > 2. Have a statement at the beginning of non-GCing functions which sets
> > a flag that is then checked by garbage-collecting functions, so that
> > we may dynamically check this.
> >
> > (1) seems easy to implement, but has a high rate of false negatives as

"Seems". If you have a computer fast enough and enough RAM to actually
compile emacs with -flto -fanalyzer -fdump-analyzer-json. I don't.

> > many functions are safe to call from non-GCing functions as long as
> > the arguments are correct.
> > (2) is difficult to implement, and would only trigger at runtime.
> >
> > So I say we should do (1) in preference to (2), but maybe we should do both.
>
> I don't think I understand how will we know which function says it
> never calls GC.

By tagging it in the source code?

> And the FUNCTION_MAY_GC attribute, even if applied to
> the lowest-level functions that actually call maybe_gc, would be a
> maintenance headache because we do change this from time to time.  So
> we'd need something that checks the attribute's accuracy at compile
> time, otherwise the attribute will bitrot.

Indeed. We should walk the call graph and determine which basic blocks
end up (potentially) calling GC, which is what I set out to do with
-fanalyzer but can't continue working on because it's too slow for
Emacs...

> For the same reasons, I don't see how (2) can be done in practice.

Sorry, I don't understand.  We'd have

void
f (void)
{
  DONT_CALL_GC ();
  g();
}

void
g (void)
{
  maybe_gc ();
}

and that would throw a runtime error because maybe_gc checks the flag
set by DONT_CALL_GC.

I don't think that would be too hard to maintain; would it?

(It would, alas, throw the error only at runtime (or -fanalyzer time,
potentially), and implementing it wouldn't be entirely trivial because
of stack unwinds, but doable).

What I'd like to know is whether something like this is worth pursuing
at all, and that mostly depends on whether people are willing to build
with --enable-checking=nogc once in a while and fix any assertion
errors that pop up.

Pip



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: master d582356: * src/fns.c (Frandom): Handle bignum `limit`s
  2021-03-07 13:27                   ` Pip Cet
@ 2021-03-07 14:04                     ` Eli Zaretskii
  2021-03-07 14:21                       ` Pip Cet
  2021-03-07 18:37                     ` Stefan Monnier
  1 sibling, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2021-03-07 14:04 UTC (permalink / raw)
  To: Pip Cet; +Cc: monnier, emacs-devel

> From: Pip Cet <pipcet@gmail.com>
> Date: Sun, 7 Mar 2021 13:27:01 +0000
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, emacs-devel@gnu.org
> 
> > > 1. Have FUNCTION_MAY_GC etc. translate into a GCC attribute in debug
> > > builds so we can statically check that a function that says it never
> > > calls GC doesn't call a function that says it may call GC.
> > > 2. Have a statement at the beginning of non-GCing functions which sets
> > > a flag that is then checked by garbage-collecting functions, so that
> > > we may dynamically check this.
> > >
> > > (1) seems easy to implement, but has a high rate of false negatives as
> 
> "Seems". If you have a computer fast enough and enough RAM to actually
> compile emacs with -flto -fanalyzer -fdump-analyzer-json. I don't.
> 
> > > many functions are safe to call from non-GCing functions as long as
> > > the arguments are correct.
> > > (2) is difficult to implement, and would only trigger at runtime.
> > >
> > > So I say we should do (1) in preference to (2), but maybe we should do both.
> >
> > I don't think I understand how will we know which function says it
> > never calls GC.
> 
> By tagging it in the source code?

How do you know which functions to tag?

> > For the same reasons, I don't see how (2) can be done in practice.
> 
> Sorry, I don't understand.  We'd have
> 
> void
> f (void)
> {
>   DONT_CALL_GC ();
>   g();
> }
> 
> void
> g (void)
> {
>   maybe_gc ();
> }
> 
> and that would throw a runtime error because maybe_gc checks the flag
> set by DONT_CALL_GC.

How do you know in which functions to add DONT_CALL_GC ?

Or did you intend to add that to all the functions one by one, then
removing them as needed, until you stop getting runtime exceptions?



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: master d582356: * src/fns.c (Frandom): Handle bignum `limit`s
  2021-03-07 14:04                     ` Eli Zaretskii
@ 2021-03-07 14:21                       ` Pip Cet
  2021-03-07 15:22                         ` Eli Zaretskii
  0 siblings, 1 reply; 19+ messages in thread
From: Pip Cet @ 2021-03-07 14:21 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Stefan Monnier, emacs-devel

On Sun, Mar 7, 2021 at 2:04 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Sun, 7 Mar 2021 13:27:01 +0000
> > Cc: Stefan Monnier <monnier@iro.umontreal.ca>, emacs-devel@gnu.org
> >
> > > > 1. Have FUNCTION_MAY_GC etc. translate into a GCC attribute in debug
> > > > builds so we can statically check that a function that says it never
> > > > calls GC doesn't call a function that says it may call GC.
> > > > 2. Have a statement at the beginning of non-GCing functions which sets
> > > > a flag that is then checked by garbage-collecting functions, so that
> > > > we may dynamically check this.
> > > >
> > > > (1) seems easy to implement, but has a high rate of false negatives as
> >
> > "Seems". If you have a computer fast enough and enough RAM to actually
> > compile emacs with -flto -fanalyzer -fdump-analyzer-json. I don't.
> >
> > > > many functions are safe to call from non-GCing functions as long as
> > > > the arguments are correct.
> > > > (2) is difficult to implement, and would only trigger at runtime.
> > > >
> > > > So I say we should do (1) in preference to (2), but maybe we should do both.
> > >
> > > I don't think I understand how will we know which function says it
> > > never calls GC.
> >
> > By tagging it in the source code?
>
> How do you know which functions to tag?

That's the part I wasn't volunteering for :-) We'd have to do that
manually, but we wouldn't have to do it all at once.

> > > For the same reasons, I don't see how (2) can be done in practice.
> >
> > Sorry, I don't understand.  We'd have
> >
> > void
> > f (void)
> > {
> >   DONT_CALL_GC ();
> >   g();
> > }
> >
> > void
> > g (void)
> > {
> >   maybe_gc ();
> > }
> >
> > and that would throw a runtime error because maybe_gc checks the flag
> > set by DONT_CALL_GC.
>
> How do you know in which functions to add DONT_CALL_GC ?

That's not something we can decide automatically, is it?

But usually there will already be a comment in those functions
explaining that they must not call GC, right? I think a runtime check
is better than a comment, and either is much better than not
documenting the assumption at all.

> Or did you intend to add that to all the functions one by one, then
> removing them as needed, until you stop getting runtime exceptions?

No. That sounds painful.

Pip



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: master d582356: * src/fns.c (Frandom): Handle bignum `limit`s
  2021-03-07 14:21                       ` Pip Cet
@ 2021-03-07 15:22                         ` Eli Zaretskii
  2021-03-07 17:23                           ` Pip Cet
  0 siblings, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2021-03-07 15:22 UTC (permalink / raw)
  To: Pip Cet; +Cc: monnier, emacs-devel

> From: Pip Cet <pipcet@gmail.com>
> Date: Sun, 7 Mar 2021 14:21:51 +0000
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, emacs-devel@gnu.org
> 
> > > > I don't think I understand how will we know which function says it
> > > > never calls GC.
> > >
> > > By tagging it in the source code?
> >
> > How do you know which functions to tag?
> 
> That's the part I wasn't volunteering for :-) We'd have to do that
> manually, but we wouldn't have to do it all at once.

I don't see how we can do that, without some methodical procedure.

> > How do you know in which functions to add DONT_CALL_GC ?
> 
> That's not something we can decide automatically, is it?

No, I don't think so.

> But usually there will already be a comment in those functions
> explaining that they must not call GC, right?

No, I think these are rare exceptions rather than the rule.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: master d582356: * src/fns.c (Frandom): Handle bignum `limit`s
  2021-03-07 15:22                         ` Eli Zaretskii
@ 2021-03-07 17:23                           ` Pip Cet
  2021-03-07 17:47                             ` Eli Zaretskii
  0 siblings, 1 reply; 19+ messages in thread
From: Pip Cet @ 2021-03-07 17:23 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Stefan Monnier, emacs-devel

On Sun, Mar 7, 2021 at 3:22 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Sun, 7 Mar 2021 14:21:51 +0000
> > Cc: Stefan Monnier <monnier@iro.umontreal.ca>, emacs-devel@gnu.org
> >
> > > > > I don't think I understand how will we know which function says it
> > > > > never calls GC.
> > > >
> > > > By tagging it in the source code?
> > >
> > > How do you know which functions to tag?
> >
> > That's the part I wasn't volunteering for :-) We'd have to do that
> > manually, but we wouldn't have to do it all at once.
> I don't see how we can do that, without some methodical procedure.

Incrementally? I say it might be worth a try, as long as it doesn't go
the GCPRO route. If it does, let's delete it again.

> > But usually there will already be a comment in those functions
> > explaining that they must not call GC, right?
> No, I think these are rare exceptions rather than the rule.

So we can cover these rare exceptions, at least, converting an
ambiguous and hard-to-grep comment into an unambiguous macro
invocation? I don't really see the cons, though it's not as great as
automatic tagging of functions would be...

Pip



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: master d582356: * src/fns.c (Frandom): Handle bignum `limit`s
  2021-03-07 17:23                           ` Pip Cet
@ 2021-03-07 17:47                             ` Eli Zaretskii
  0 siblings, 0 replies; 19+ messages in thread
From: Eli Zaretskii @ 2021-03-07 17:47 UTC (permalink / raw)
  To: Pip Cet; +Cc: monnier, emacs-devel

> From: Pip Cet <pipcet@gmail.com>
> Date: Sun, 7 Mar 2021 17:23:04 +0000
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, emacs-devel@gnu.org
> 
> > > > How do you know which functions to tag?
> > >
> > > That's the part I wasn't volunteering for :-) We'd have to do that
> > > manually, but we wouldn't have to do it all at once.
> > I don't see how we can do that, without some methodical procedure.
> 
> Incrementally?

What does this mean in practice, though?  Do you try tagging one
function after another, or do you have some guidelines which ones to
tag and which not to tag?

> > > But usually there will already be a comment in those functions
> > > explaining that they must not call GC, right?
> > No, I think these are rare exceptions rather than the rule.
> 
> So we can cover these rare exceptions, at least, converting an
> ambiguous and hard-to-grep comment into an unambiguous macro
> invocation? I don't really see the cons, though it's not as great as
> automatic tagging of functions would be...

Maybe there are no cons, but there are also no pro's: covering a small
fraction of the problem means the problem is still with us.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: master d582356: * src/fns.c (Frandom): Handle bignum `limit`s
  2021-03-07 13:27                   ` Pip Cet
  2021-03-07 14:04                     ` Eli Zaretskii
@ 2021-03-07 18:37                     ` Stefan Monnier
  2021-03-07 19:54                       ` Andrea Corallo via Emacs development discussions.
  2021-03-07 19:55                       ` Pip Cet
  1 sibling, 2 replies; 19+ messages in thread
From: Stefan Monnier @ 2021-03-07 18:37 UTC (permalink / raw)
  To: Pip Cet; +Cc: Eli Zaretskii, emacs-devel

>> I don't think I understand how will we know which function says it
>> never calls GC.
> By tagging it in the source code?

Random thoughts on this:
- AFAIK in the current code, the places where we can't run GC are much
  more rare than the cases where we can run GC, so we'd be better off
  trying to annotate the places where it can't happen.
- Those places are currently not annotated at all, by and large.
  There are a few comments here and there stating that GC shouldn't
  happen, but those comments shouldn't be trusted.
- The trend is to reduce the amount of code where GC cannot take place
  [ I think and I hope.  ]
- As you have noted some functions can be called in contexts where they
  may GC and in other contexts where they may not GC.  So we don't have
  a clear static partitioning of the code.  So maybe a dynamic-test
  approach is the better option (it will rarely catch the unlikely
  corner cases, but if it does catch them in their rare occurrences it'll
  still help us diagnose those problems which tend to be very painful
  to track down).


        Stefan




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: master d582356: * src/fns.c (Frandom): Handle bignum `limit`s
  2021-03-07 18:37                     ` Stefan Monnier
@ 2021-03-07 19:54                       ` Andrea Corallo via Emacs development discussions.
  2021-03-07 19:55                       ` Pip Cet
  1 sibling, 0 replies; 19+ messages in thread
From: Andrea Corallo via Emacs development discussions. @ 2021-03-07 19:54 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Pip Cet, Eli Zaretskii, emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>>> I don't think I understand how will we know which function says it
>>> never calls GC.
>> By tagging it in the source code?
>
> Random thoughts on this:
> - AFAIK in the current code, the places where we can't run GC are much
>   more rare than the cases where we can run GC, so we'd be better off
>   trying to annotate the places where it can't happen.
> - Those places are currently not annotated at all, by and large.
>   There are a few comments here and there stating that GC shouldn't
>   happen, but those comments shouldn't be trusted.
> - The trend is to reduce the amount of code where GC cannot take place
>   [ I think and I hope.  ]

I hope too, I believe this is the main barrier in trying to transition
to a parallel GC (thing that on the paper shouldn't be that complex).

  Andrea



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: master d582356: * src/fns.c (Frandom): Handle bignum `limit`s
  2021-03-07 18:37                     ` Stefan Monnier
  2021-03-07 19:54                       ` Andrea Corallo via Emacs development discussions.
@ 2021-03-07 19:55                       ` Pip Cet
  1 sibling, 0 replies; 19+ messages in thread
From: Pip Cet @ 2021-03-07 19:55 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, emacs-devel

On Sun, Mar 7, 2021 at 6:37 PM Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> >> I don't think I understand how will we know which function says it
> >> never calls GC.
> > By tagging it in the source code?
>
> Random thoughts on this:
> - AFAIK in the current code, the places where we can't run GC are much
>   more rare than the cases where we can run GC, so we'd be better off
>   trying to annotate the places where it can't happen.

I think there's one exception: DEFUNs should be assumed to call GC
unless they're explicitly tagged as not doing so, and they should
assert they're allowed to call GC when run (again, unless explicitly
tagged as not requiring GC).

> - Those places are currently not annotated at all, by and large.
>   There are a few comments here and there stating that GC shouldn't
>   happen, but those comments shouldn't be trusted.

Indeed.

> - The trend is to reduce the amount of code where GC cannot take place
>   [ I think and I hope.  ]

There are several analogous problems, IMHO:
1. Code that can't GC
2. Code that can't quit
3. Code that leaves my X session in an unusable state if gdb interrupts it
4. Code that mustn't call Lisp (signal handlers)
5. Code that mustn't call malloc
6. Code that never signals an error

I think (5) is less of an issue now that reentrant libcs are a thing,
and I'll freely admit I don't understand (3) (or the X/Wayland
situation, at all). But my suspicion is critical sections of one kind
or another will keep coming up.

> - As you have noted some functions can be called in contexts where they
>   may GC and in other contexts where they may not GC.  So we don't have
>   a clear static partitioning of the code.  So maybe a dynamic-test
>   approach is the better option (it will rarely catch the unlikely
>   corner cases, but if it does catch them in their rare occurrences it'll
>   still help us diagnose those problems which tend to be very painful
>   to track down).

I'm currently leaning towards a dynamic approach together with running
GCC with -fanalyzer -flto. (IIUC, the analyzer differs from normal
warning passes in that it tries a hell of a lot harder to prove the
code is okay, but if it fails to do so, it outputs a warning rather
than erring on the side of caution. For example,
verify_interval_modification in textprop.c generates a warning; i and
prev are two variables which cannot simultaneously both be NULL (but
either one of them can be), and code dereferences them in the i ==
prev branch of an if. The code's correct, but it's definitely not
obviously correct.) So I hope the analyzer will be able to turn the
dynamic check into a static check given a few starting points...

And as I said, I think DEFUNs are a good starting point, because it's
easy to redefine DEFUN() to define a wrapper around the actual
Fwhatever function instead of the Fwhatever function directly.

But what I would like to know is whether it's potentially worth it to
investigate this further. If there's a strong consensus (or a
maintainer veto implying) that we'd rather keep catching bugs by hand,
well, I'd rather know.

I'd also like to point out that this approach is not cheap. Running
the analyzer on an LTO'd Emacs (compiled with -O0; I don't know
whether that makes it slower or faster) takes 20 GB of RAM and 5
minutes of CPU time, and that's with the configurable parameters tuned
down quite a bit.

Pip

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-03-07 19:55 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20210305170955.27732.27579@vcs0.savannah.gnu.org>
     [not found] ` <20210305170957.AF99920E1B@vcs0.savannah.gnu.org>
2021-03-05 19:42   ` master d582356: * src/fns.c (Frandom): Handle bignum `limit`s Pip Cet
2021-03-05 19:56     ` Stefan Monnier
2021-03-05 20:13       ` Pip Cet
2021-03-05 20:34         ` Stefan Monnier
2021-03-06  7:42       ` Pip Cet
2021-03-06  8:44         ` Eli Zaretskii
2021-03-06  9:44           ` Pip Cet
2021-03-06 10:56             ` Eli Zaretskii
2021-03-06 13:22               ` Pip Cet
2021-03-06 14:45                 ` Eli Zaretskii
2021-03-07 13:27                   ` Pip Cet
2021-03-07 14:04                     ` Eli Zaretskii
2021-03-07 14:21                       ` Pip Cet
2021-03-07 15:22                         ` Eli Zaretskii
2021-03-07 17:23                           ` Pip Cet
2021-03-07 17:47                             ` Eli Zaretskii
2021-03-07 18:37                     ` Stefan Monnier
2021-03-07 19:54                       ` Andrea Corallo via Emacs development discussions.
2021-03-07 19:55                       ` Pip Cet

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).