unofficial mirror of bug-gnu-emacs@gnu.org 
From: Stefan Kangas <stefan@marxist.se>
To: Sheng Yang <styang@fastmail.com>
Cc: 45379@debbugs.gnu.org, Juri Linkov <juri@linkov.net>,
	Kenichi Handa <handa@gnu.org>,
	Stefan Monnier <monnier@iro.umontreal.ca>,
	Stephen Berman <stephen.berman@gmx.net>
Subject: bug#45379: 28.0.50; Degraded Performance of describe-buffer-bindings
Date: Tue, 4 May 2021 18:31:10 -0500
Message-ID: <CADwFkm=JH2KOEhSySifL6XYT5KdJYHROgb4c4oLMq54rMjgsHw@mail.gmail.com>
In-Reply-To: <CADwFkmkgYWQOzDP7WaYeyS5pS3ZA7iY4Fs-1F2Gymtata7A8nw@mail.gmail.com>

I finally had time/energy to look into this again!  Sorry for taking
more time than expected.

handa <handa@gnu.org> writes:

> In article <838s65ktvk.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
>
>> > > Is the patch for the above improvement the one included in the file
>> > > 0001-Fix-describe-buffer-bindings-performance-regression.patch?
>> >
>> > Yes, it is.
>
> It seems that the main intention of that patch is to avoid unnecessary
> call of char_table_ref_and_range introduced by the commit below:
>
>> >     Don't show key ranges if shadowed by different commands
>> >
>> >     * src/keymap.c (describe_vector): Make sure found consecutive keys
>> >     are either not shadowed or, if they are, that they are shadowed by
>> >     the same command.  (Bug#9293)
>
> In describe_vector, if VECTOR is a char-table, char_table_ref_and_range
> is already called at the fairly beginning of the main loop.  So, we do
> not have to call it again, and thus, I think the patch is doing the
> correct thing.

Yes, this is all correct.

> But, I don't know whether the following part in the patch is correct or
> not.
>
> +	  /* Ignore `self-insert-command' for performance.  */
> +	  && !EQ (definition, Qself_insert_command))

(This is explained below.)

Eli Zaretskii <eliz@gnu.org> writes:

> I'm not sure I understand the reasons for each of the changes here.
> char-tables are a tricky data structure, so I'd like to make sure this
> change doesn't make our code subtly incorrect.
>
> So could you please walk us through the proposed changes, adding
> explanations for each part as you go?

This code is a bit complicated, so please bear with me if I go into
too much detail.  BTW, note that I have also carried out a lot of
testing to check that my change does the same thing as before, only
faster (unfortunately, it has been hard to come up with useful
automated tests beyond the ones we already have).

First, it might help to think of this as consisting of two parts:

1. A cleanup of the boundary condition check.  Its only purpose is to
   make this code clearer and easier to follow.

2. The actual bug fix for the performance bug.

I put a divider in between these two parts to make things hopefully a
bit more clear.

Stefan Kangas <stefan@marxist.se> writes:

> From f95c75f1112c1aae0bd06a6753b60ce8a591d6e2 Mon Sep 17 00:00:00 2001
> From: Stefan Kangas <stefan@marxist.se>
> Date: Sat, 6 Mar 2021 05:32:32 +0100
> Subject: [PATCH] Fix describe-buffer-bindings performance regression
>
> * src/keymap.c (describe_vector): Improve char-table performance by
> removing an unnecessary loop.  (Bug#45379)
> (syms_of_keymap) <Qself_insert_command>: New DEFSYM.
> ---
>  src/keymap.c | 47 +++++++++++++++++++----------------------------
>  1 file changed, 19 insertions(+), 28 deletions(-)
>
> diff --git a/src/keymap.c b/src/keymap.c
> index 782931fadf..c70df98a6e 100644
> --- a/src/keymap.c
> +++ b/src/keymap.c
> @@ -2920,7 +2920,7 @@ describe_vector (Lisp_Object vector, Lisp_Object prefix, Lisp_Object args,
>    Lisp_Object suppress = Qnil;
>    bool first = true;
>    /* Range of elements to be handled.  */
> -  int from, to, stop;
> +  int to, stop;
>
>    if (!keymap_p)
>      {
> @@ -2940,32 +2940,33 @@ describe_vector (Lisp_Object vector, Lisp_Object prefix, Lisp_Object args,
>    if (partial)
>      suppress = intern ("suppress-keymap");
>
> -  from = 0;

The "from" variable is only ever set to 0 here, so it is redundant.  It
is replaced with the constant 0, which I think makes the intention of
this code clearer.  IOW, this is just a cleanup.

> +  /* If VECTOR is a char-table, we had better put a boundary
> +     between normal characters (-#x3FFF7F) and 8-bit characters
> +     (#x3FFF80-).  */
>    if (CHAR_TABLE_P (vector))
>      stop = MAX_5_BYTE_CHAR + 1, to = MAX_CHAR + 1;
>    else
>      stop = to = ASIZE (vector);

The above puts a "boundary" that we need to handle below by stopping
(skipping to the next range) when we reach "stop".

We must end the loop altogether only when we reach "to".

Note that for char-tables stop != to; otherwise (for plain vectors)
stop == to.
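
As a toy illustration of the two limits (not code from keymap.c), the
setup can be sketched as below.  The helper name "initial_bounds" and
the "struct bounds" type are hypothetical, and the two constants are
the values I believe character.h defines; the patch comment gives the
same boundary (#x3FFF7F / #x3FFF80).

```c
#include <stdbool.h>

/* Character-code limits, assumed to match Emacs's character.h.  */
enum { MAX_5_BYTE_CHAR = 0x3FFF7F, MAX_CHAR = 0x3FFFFF };

/* Hypothetical container for the two loop limits.  */
struct bounds { int stop; int to; };

/* Return the initial "stop" and "to" the way describe_vector sets
   them: a char-table gets an intermediate boundary before the 8-bit
   characters, a plain vector does not.  */
static struct bounds
initial_bounds (bool is_char_table, int vector_size)
{
  struct bounds b;
  if (is_char_table)
    {
      b.stop = MAX_5_BYTE_CHAR + 1;   /* first 8-bit character */
      b.to = MAX_CHAR + 1;            /* one past the last character */
    }
  else
    b.stop = b.to = vector_size;
  return b;
}
```

For a char-table this gives stop == #x3FFF80 and to == #x400000, so the
loop has to pass "stop" once before it can reach "to".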

>
> -  for (int i = from; ; i++)
> +  for (int i = 0; i < to; i++)
>      {

Here we stop when we reach "to", which is what we intend.

The "from" mentioned above is also replaced with the constant 0 here.

>        bool this_shadowed = false;
>        Lisp_Object shadowed_by = Qnil;
> -      int range_beg, range_end;
> +      int range_beg;

[range_end is now unused and so removed.]

>        Lisp_Object val, tem2;
>
>        maybe_quit ();
>
> -      if (i == stop)
> -	{
> -	  if (i == to)
> -	    break;

This is a bit complicated to follow, so I have cleaned it up.

What happens here is that we exit the loop if "i == to".

The rest is to handle the above "boundary".  We have two cases:

1. If this is not a char table:

    i == stop  implies that  i == to

   (The loop will always end here.)

2. If this is a char table:

   i == stop   does not imply that   i == to

  a) The loop will end if

   i == stop  ∧  i == to

   (This can never be the case the first time we reach this, see above.
   We must first have reached the 2b) immediately below in a previous
   iteration.)

> -	  stop = to;
> -	}
> -

  b) Otherwise, if "i == stop ∧ i != to", we set "stop = to"

   (Again, only when this has happened can we reach 2a.)

But this is all removed, so the 2b) action is moved here:

>        int starting_i = i;
>
>        if (CHAR_TABLE_P (vector))
>  	{
> +	  /* Take care of the boundary.  */
> +	  if (i == stop)
> +	    stop = to;

IOW, here "i != to", but "i == stop" so we set "stop = to".  Just as
before.

Thus, the boundary condition is handled.
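
To convince yourself that the two forms of the loop visit the same
indices, here is a small stand-alone model.  It is only a sketch: the
table, the sizes and the "run_end" helper are made up for the
illustration, and "run_end" merely imitates the way "i" is advanced to
the end of a range, never past "stop - 1".

```c
enum { TOY_STOP = 6, TOY_TO = 10, TRACE_MAX = 32 };

/* Toy table: equal adjacent entries form one range.  The run of 2s
   deliberately straddles the boundary at TOY_STOP.  */
static const int toy_table[TOY_TO] = { 1, 1, 1, 0, 0, 2, 2, 2, 2, 3 };

/* Last index in [i, limit] holding the same value as toy_table[i].  */
static int
run_end (int i, int limit)
{
  int end = i;
  while (end + 1 <= limit && toy_table[end + 1] == toy_table[i])
    end++;
  return end;
}

/* Control flow before the patch.  Records the index at which each
   iteration starts and returns the number of iterations.  */
static int
old_loop (int *trace)
{
  int n = 0, stop = TOY_STOP, to = TOY_TO;
  for (int i = 0; ; i++)
    {
      if (i == stop)
        {
          if (i == to)
            break;
          stop = to;
        }
      trace[n++] = i;
      i = run_end (i, stop - 1);
    }
  return n;
}

/* Control flow after the patch: the exit test lives in the loop
   header, only the boundary adjustment stays in the body.  */
static int
new_loop (int *trace)
{
  int n = 0, stop = TOY_STOP, to = TOY_TO;
  for (int i = 0; i < to; i++)
    {
      if (i == stop)
        stop = to;
      trace[n++] = i;
      i = run_end (i, stop - 1);
    }
  return n;
}
```

Both functions start iterations at 0, 3, 5, 6 and 9 for this table.
Since "i" never gets past "stop - 1" inside an iteration, it can only
equal "to" after "stop" has been moved up to "to".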

---------- End part 1, performance bug fix follows:

> +	  /* Find the first element between i and stop - 1.  Put its
> +	     index in i.  */
>  	  range_beg = i;
>  	  i = stop - 1;
>  	  val = char_table_ref_and_range (vector, range_beg, &range_beg, &i);
                ^^^^^^^^^^^^^^^^^^^^^^^^

First call to "char_table_ref_and_range".

This puts the correct values in the "range_beg" and "i" variables, where
"range_beg" is the start of the range and "i" is the last item in the
range that has the same value.
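
A toy model of that call pattern may help.  This is not the real
char_table_ref_and_range: "toy_ref_and_range" works on a flat int
array (0 standing in for nil) and only imitates the contract as I
understand it, namely "return the value at C and shrink [*FROM, *TO]
to the characters around C with that value".  "describe_step" is
likewise a hypothetical wrapper around the three statements quoted
above.

```c
enum { TOY_SIZE = 8 };

/* Hypothetical stand-in for a char-table; 0 plays the role of nil.  */
static const int toy_table[TOY_SIZE] = { 7, 7, 7, 0, 0, 5, 5, 5 };

/* Return TABLE[C] and shrink [*FROM, *TO] to the run of equal values
   that contains C.  The caller must pass *FROM <= C <= *TO.  */
static int
toy_ref_and_range (const int *table, int c, int *from, int *to)
{
  int val = table[c];
  int lo = c, hi = c;
  while (lo > *from && table[lo - 1] == val)
    lo--;
  while (hi < *to && table[hi + 1] == val)
    hi++;
  *from = lo;
  *to = hi;
  return val;
}

/* Same shape as the code in the patch: on entry *I is the index to
   look up, on exit it is the last index of the range, clipped to
   STOP - 1.  *RANGE_BEG receives the start of the range.  */
static int
describe_step (int *i, int stop, int *range_beg)
{
  *range_beg = *i;
  *i = stop - 1;
  return toy_ref_and_range (toy_table, *range_beg, range_beg, i);
}
```

Starting at index 0 with stop == 8, this returns 7 and leaves
range_beg == 0 and i == 2, so the loop's i++ then moves on to index 3,
the first element of the next range.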

This is followed by:

>	}
>      else
>	val = AREF (vector, i);
>      Lisp_Object definition = get_keyelt (val, 0);
>
>      if (NILP (definition)) continue;

IOW, we skip it if it is not defined.

This is important to see why we can remove the next part.

> @@ -3024,21 +3025,8 @@ describe_vector (Lisp_Object vector, Lisp_Object prefix, Lisp_Object args,
>        insert1 (Fkey_description (kludge, prefix));
>
>        /* Find all consecutive characters or rows that have the same
> -	 definition.  But, if VECTOR is a char-table, we had better
> -	 put a boundary between normal characters (-#x3FFF7F) and
> -	 8-bit characters (#x3FFF80-).  */
> -      if (CHAR_TABLE_P (vector))
> -	{
> -	  while (i + 1 < stop
> -		 && (range_beg = i + 1, range_end = stop - 1,
> -		   val = char_table_ref_and_range (vector, range_beg,
> -						   &range_beg, &range_end),
                         ^^^^^^^^^^^^^^^^^^^^^^^^

This second call simply looks up a *second* range within the same
iteration.  This is to "put a boundary" (commit bed6185fecbb), but
it is crucial to note this is _already handled_ above.

This is therefore superfluous, as we can see from what happens next:

> -		   tem2 = get_keyelt (val, 0),
> -		   !NILP (tem2))
> -		 && !NILP (Fequal (tem2, definition)))
> -	    i = range_end;

This is all just to continue advancing down the char table until we find
something.  Again, note that above we already do exactly the same thing,
so doing it here as well is superfluous.

I.e. compare these statements to the lines above, specifically:

    Lisp_Object definition = get_keyelt (val, 0);
    if (NILP (definition)) continue;

Pay particular attention to the variables i, range_beg, and range_end.

> -	}
> -      else
> +	 definition.  */
> +      if (!CHAR_TABLE_P (vector))
>  	while (i + 1 < stop
>  	       && (tem2 = get_keyelt (AREF (vector, i + 1), 0),
>  		   !NILP (tem2))

(Note that there is no change if this is not a char-table.)

> @@ -3047,10 +3035,12 @@ describe_vector (Lisp_Object vector, Lisp_Object prefix, Lisp_Object args,
>
>        /* Make sure found consecutive keys are either not shadowed or,
>  	 if they are, that they are shadowed by the same command.  */
> -      if (CHAR_TABLE_P (vector) && i != starting_i)
> +      if (CHAR_TABLE_P (vector) && i != starting_i
> +	  /* Ignore `self-insert-command' for performance.  */
> +	  && !EQ (definition, Qself_insert_command))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To see if the shadowing is the same for an entire range, we need to run
shadow_lookup() *once for each character* in that range to see if
they are shadowed.  This is expensive.

One observation is that we often have *very long* ranges of characters
where the value is "self-insert-command", as in:

    (lookup-key global-map "文")

This is because a char-table will cover the range of all valid character
codes.  [Note again that we use a char-table only if the keymap is
defined with `make-keymap' (as opposed to `make-sparse-keymap', which is
just a list)]

Let's just assume that it is unlikely that there is any shadowing going
on for all of these self-inserting keys.  If there is shadowing going
on, we are probably not looking at a keymap where the default value is
set to self-insert-command.

So we basically say here: let's just not care about
`self-insert-command' and skip the check.  Yes, in theory we will not
get a perfect result, as there will be some cases where we miss the
shadowing.  OTOH, we are sure to have something that is not very slow.
(I don't know of any examples where this will fail, and if they exist
we will still be doing better than Emacs 27, as this entire check is
new in Emacs 28.)
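
To give a feeling for the cost, here is a rough counting model.  It is
an illustration only: "toy_shadow_lookup" does nothing but count calls
(the real shadow_lookup does actual keymap lookups), the "toy_def" enum
stands in for the definition, and I use "range_beg" for both the guard
and the loop start, which is a simplification of the condition in the
patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the definition found for a range.  */
enum toy_def { DEF_SELF_INSERT, DEF_OTHER };

static long lookup_calls;

/* Stand-in for shadow_lookup: counts invocations, finds nothing.  */
static bool
toy_shadow_lookup (int ch)
{
  (void) ch;
  lookup_calls++;
  return false;
}

/* Run the per-character shadow check over [RANGE_BEG, I] and return
   how many lookups it took.  With SKIP_SELF_INSERT set, ranges bound
   to self-insert-command are not checked at all.  */
static long
check_range (enum toy_def def, int range_beg, int i, bool skip_self_insert)
{
  lookup_calls = 0;
  if (i != range_beg && !(skip_self_insert && def == DEF_SELF_INSERT))
    for (int j = range_beg + 1; j <= i; j++)
      toy_shadow_lookup (j);
  return lookup_calls;
}
```

In this model a self-inserting range of N characters costs N - 1
lookups without the skip and none with it, while ranges bound to any
other command are checked exactly as before.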

>  	{
>  	  Lisp_Object key = make_nil_vector (1);
> -	  for (int j = starting_i + 1; j <= i; j++)
> +	  for (int j = range_beg + 1; j <= i; j++)
                       ^^^^^^^^^^

("range_beg" is the start of the actual range here, previously it was
starting_i due to the second call to char_table_ref_and_range.)

>  	    {
>  	      ASET (key, 0, make_fixnum (j));
>  	      Lisp_Object tem = shadow_lookup (shadow, key, Qt, 0);
> @@ -3109,6 +3099,7 @@ syms_of_keymap (void)
>    DEFSYM (Qdescribe_map_tree, "describe-map-tree");
>
>    DEFSYM (Qkeymap_canonicalize, "keymap-canonicalize");
> +  DEFSYM (Qself_insert_command, "self-insert-command");
>
>    /* Now we are ready to set up this property, so we can
>       create char tables.  */
> --
> 2.30.1

Phew!




