From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Stefan Kangas Newsgroups: gmane.emacs.bugs Subject: bug#45379: 28.0.50; Degraded Performance of describe-buffer-bindings Date: Tue, 4 May 2021 18:31:10 -0500 Message-ID: References: <02f717c6-dc96-4ba0-9117-2ef079ac556f@www.fastmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="28076"; mail-complaints-to="usenet@ciao.gmane.io" Cc: 45379@debbugs.gnu.org, Juri Linkov , Kenichi Handa , Stefan Monnier , Stephen Berman To: Sheng Yang Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Wed May 05 01:32:30 2021 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1le4Wk-0007DF-DP for geb-bug-gnu-emacs@m.gmane-mx.org; Wed, 05 May 2021 01:32:30 +0200 Original-Received: from localhost ([::1]:52954 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1le4Wj-0002cf-Fn for geb-bug-gnu-emacs@m.gmane-mx.org; Tue, 04 May 2021 19:32:29 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:35326) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1le4WL-0002Jk-1f for bug-gnu-emacs@gnu.org; Tue, 04 May 2021 19:32:05 -0400 Original-Received: from debbugs.gnu.org ([209.51.188.43]:44516) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1le4WI-0004Ey-Ep for bug-gnu-emacs@gnu.org; Tue, 04 May 2021 19:32:03 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1le4WI-0001Rj-7W for bug-gnu-emacs@gnu.org; Tue, 04 May 2021 19:32:02 -0400 X-Loop: help-debbugs@gnu.org Resent-From: Stefan Kangas Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Tue, 04 May 2021 23:32:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 45379 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: confirmed patch Original-Received: via spool by 45379-submit@debbugs.gnu.org id=B45379.16201710805550 (code B ref 45379); Tue, 04 May 2021 23:32:02 +0000 Original-Received: (at 45379) by debbugs.gnu.org; 4 May 2021 23:31:20 +0000 Original-Received: from localhost ([127.0.0.1]:56060 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1le4Vc-0001RS-1L for submit@debbugs.gnu.org; Tue, 04 May 2021 19:31:20 -0400 Original-Received: from mail-pf1-f174.google.com ([209.85.210.174]:36801) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1le4VZ-0001R7-57 for 45379@debbugs.gnu.org; Tue, 04 May 2021 19:31:18 -0400 Original-Received: by mail-pf1-f174.google.com with SMTP id p4so658395pfo.3 for <45379@debbugs.gnu.org>; Tue, 04 May 2021 16:31:17 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:in-reply-to:references:mime-version:date :message-id:subject:to:cc:content-transfer-encoding; bh=h/t//vg7uIczPpI8PkJC3nw6QCgF33tjnEHC9bu9gOM=; b=kuXwBVQVS/5GjWCeVo/1NdUMQeBfTh1RM7RX69h+IQKe+O+sLlIpXBxy6vRV0zJEEP n/1Si2y0ddDRjptKn77LQYtSA9dNOL5/sqITMVQ/CKhHVmEHt/LqLjJ4lSLsYsnMnU/P jZP0HqeMaW88Ha2rdqKraTomZC0najyPL0ya5qk2g02Lq31zhDnMeq/R+e9/PALtv0bz Vrqp4qswsT/+qN/s4Uxcgc+Xmr5RtvyCmNaInGQm45I+9e+5mGevPwttJ4Ol6oYSYAoN jekBaaJ6uk2FQTeVF2X/P+wc/YzKwDzXERVtUJey/OI0HZV28tl50//BuNkqOU2Tuwef H7Tg== X-Gm-Message-State: AOAM532aG248wKszG5nWr20KieLOYzfsRnyqj/vimy3lwJGiHhifmiHl PEgR6oUAK4deSMOfxtbFe+1JUoAcc5JDpa8otHY= X-Google-Smtp-Source: ABdhPJzwQlPLgcn45x3BE5WVF5DP9ujA1Gyi50h90fUyurUiMEis0fMcu0FITJ5STEguFqR5C5zL1BzhRtq1on38Bbc= X-Received: by 2002:a17:90a:b38a:: with SMTP id e10mr30938628pjr.175.1620171071301; Tue, 04 May 2021 16:31:11 -0700 (PDT) Original-Received: from 753933720722 named unknown by gmailapi.google.com with HTTPREST; Tue, 4 May 2021 18:31:10 -0500 In-Reply-To: X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: "bug-gnu-emacs" Xref: news.gmane.io gmane.emacs.bugs:205653 Archived-At: I finally had time/energy to look into this again! Sorry for taking more time than expected. handa writes: > In article <838s65ktvk.fsf@gnu.org>, Eli Zaretskii writes: > >> > > Is the patch for the above improvement the one included in the file >> > > 0001-Fix-describe-buffer-bindings-performance-regression.patch? >> > >> > Yes, it is. > > It seems that the main intention of that patch is to avoid unnecessary > call of char_table_ref_and_range introduced by the commit below: > >> > Don't show key ranges if shadowed by different commands >> > >> > * src/keymap.c (describe_vector): Make sure found consecutive keys >> > are either not shadowed or, if they are, that they are shadowed by >> > the same command. (Bug#9293) > > In describe_vector, if VECTOR is a char-table, char_table_ref_and_range > is already called at the fairly beginning of the main loop. So, we do > not have to call it again, and thus, I think the patch is doing the > correct thing. Yes, this is all correct. > But, I don't know whether the following part in the patch is correct or > not. > > + /* Ignore `self-insert-command' for performance. */ > + && !EQ (definition, Qself_insert_command)) (This is explained below.) Eli Zaretskii writes: > I'm not sure I understand the reasons for each of the changes here. > char-tables are a tricky data structure, so I'd like to make sure this > change doesn't make our code subtly incorrect. > > So could you please walk us through the proposed changes, adding > explanations for each part as you go? This code is a bit complicated, so please bare with me if I am going into too much detail. BTW, note that I have also carried out a lot of testing to see that my change does the same thing as before, only faster (unfortunately it has been harder to come up with useful automated tests beyond the ones we already have). First, it might help to think of this as consisting of two parts: 1. A cleanup of the boundary condition check. It is simply to make this code a bit more clear and easier to follow. 2. The actual bug fix for the performance bug. I put a divider in between these two parts to make things hopefully a bit more clear. Stefan Kangas writes: > From f95c75f1112c1aae0bd06a6753b60ce8a591d6e2 Mon Sep 17 00:00:00 2001 > From: Stefan Kangas > Date: Sat, 6 Mar 2021 05:32:32 +0100 > Subject: [PATCH] Fix describe-buffer-bindings performance regression > > * src/keymap.c (describe_vector): Improve char-table performance by > removing an unnecessary loop. (Bug#45379) > (syms_of_keymap) : New DEFSYM. > --- > src/keymap.c | 47 +++++++++++++++++++---------------------------- > 1 file changed, 19 insertions(+), 28 deletions(-) > > diff --git a/src/keymap.c b/src/keymap.c > index 782931fadf..c70df98a6e 100644 > --- a/src/keymap.c > +++ b/src/keymap.c > @@ -2920,7 +2920,7 @@ describe_vector (Lisp_Object vector, Lisp_Object pr= efix, Lisp_Object args, > Lisp_Object suppress =3D Qnil; > bool first =3D true; > /* Range of elements to be handled. */ > - int from, to, stop; > + int to, stop; > > if (!keymap_p) > { > @@ -2940,32 +2940,33 @@ describe_vector (Lisp_Object vector, Lisp_Object = prefix, Lisp_Object args, > if (partial) > suppress =3D intern ("suppress-keymap"); > > - from =3D 0; The "from" variable is initialized to 0 below and is redundant. So it is replaced with the constant 0, which I think makes the intention of this code more clear. IOW, this is just a cleanup. > + /* If VECTOR is a char-table, we had better put a boundary > + between normal characters (-#x3FFF7F) and 8-bit characters > + (#x3FFF80-). */ > if (CHAR_TABLE_P (vector)) > stop =3D MAX_5_BYTE_CHAR + 1, to =3D MAX_CHAR + 1; > else > stop =3D to =3D ASIZE (vector); The above puts a "boundary" that we need to handle below by stopping (skipping to the next range) when we reach "stop". We must end the loop altogether only when we reach "to". Note that for char tables stop !=3D to, otherwise stop =3D=3D to > > - for (int i =3D from; ; i++) > + for (int i =3D 0; i < to; i++) > { Here we stop when we reach "to", which is what we intend. The "from" mentioned above is also here replaced with constant 0. > bool this_shadowed =3D false; > Lisp_Object shadowed_by =3D Qnil; > - int range_beg, range_end; > + int range_beg; [range_end is now unused and so removed.] > Lisp_Object val, tem2; > > maybe_quit (); > > - if (i =3D=3D stop) > - { > - if (i =3D=3D to) > - break; This is a bit complicated to follow, so I have cleaned it up. What happens here is that we exit the loop if "i =3D=3D to". The rest is to handle the above "boundary". We have two cases: 1. If this is not a char table: i =3D=3D stop implies that i =3D=3D to (The loop will always end here.) 2. If this is a char table: i =3D=3D stop does not imply that i =3D=3D to a) The loop will end if i =3D=3D stop =E2=88=A7 i =3D=3D to (This can never be the case the first time we reach this, see above. We must first have reached the 2b) immediately below in a previous iteration.) > - stop =3D to; > - } > - b) Otherwise, if "i =3D=3D stop =E2=88=A7 i !=3D to", we set "stop =3D to= " (Again, only when this has happened can we reach 2a.) But this is all removed, so the 2b) action is moved here: > int starting_i =3D i; > > if (CHAR_TABLE_P (vector)) > { > + /* Take care of the boundary. */ > + if (i =3D=3D stop) > + stop =3D to; IOW, here "i !=3D to", but "i =3D=3D stop" so we set "stop =3D to". Just a= s before. Thus, the boundary condition is handled. =E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2= =80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=93 End part 1, performance bug fix = follows: > + /* Find the first element between i and stop - 1. Put its > + index in i. */ > range_beg =3D i; > i =3D stop - 1; > val =3D char_table_ref_and_range (vector, range_beg, &range_beg, &i); ^^^^^^^^^^^^^^^^^^^^^^^^ First call to "char_table_ref_and_range". This puts the correct values in the "range_beg" variables and "i", where "range_beg" is the start of the range and "i" is the last item in the range that has the same value. This is followed by: > } > else > val =3D AREF (vector, i); > Lisp_Object definition =3D get_keyelt (val, 0); > > if (NILP (definition)) continue; IOW, we skip it if it is not defined. This is important to see why we can remove the next part. > @@ -3024,21 +3025,8 @@ describe_vector (Lisp_Object vector, Lisp_Object p= refix, Lisp_Object args, > insert1 (Fkey_description (kludge, prefix)); > > /* Find all consecutive characters or rows that have the same > - definition. But, if VECTOR is a char-table, we had better > - put a boundary between normal characters (-#x3FFF7F) and > - 8-bit characters (#x3FFF80-). */ > - if (CHAR_TABLE_P (vector)) > - { > - while (i + 1 < stop > - && (range_beg =3D i + 1, range_end =3D stop - 1, > - val =3D char_table_ref_and_range (vector, range_beg, > - &range_beg, &range_end), ^^^^^^^^^^^^^^^^^^^^^^^^ This second call simply tries to call up a *second* range within the same iteration. This is to "put a boundary" (commit bed6185fecbb), but it is crucial to note this is _already handled_ above. This is therefore superfluous, as we can see from what happens next: > - tem2 =3D get_keyelt (val, 0), > - !NILP (tem2)) > - && !NILP (Fequal (tem2, definition))) > - i =3D range_end; This is all just to continue advancing down the char table until we find something. Again, note that above we already do exactly the same thing, so doing it here as well is superfluous. I.e. compare these statements to the lines above, specifically: Lisp_Object definition =3D get_keyelt (val, 0); if (NILP (definition)) continue; Pay particular attention to the variables i, range_beg, and range_end. > - } > - else > + definition. */ > + if (!CHAR_TABLE_P (vector)) > while (i + 1 < stop > && (tem2 =3D get_keyelt (AREF (vector, i + 1), 0), > !NILP (tem2)) (Note that there is no change if this is not a char-table.) > @@ -3047,10 +3035,12 @@ describe_vector (Lisp_Object vector, Lisp_Object = prefix, Lisp_Object args, > > /* Make sure found consecutive keys are either not shadowed or, > if they are, that they are shadowed by the same command. */ > - if (CHAR_TABLE_P (vector) && i !=3D starting_i) > + if (CHAR_TABLE_P (vector) && i !=3D starting_i > + /* Ignore `self-insert-command' for performance. */ > + && !EQ (definition, Qself_insert_command)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ To see if the shadowing is the same for an entire range, we need to run shadow_lookup() for *once for each character* in that range to see if they are shadowed. This is expensive. One observation is that we often have *very long* ranges of characters where the value is "self-insert-command", as in: (lookup-key global-map "=E6=96=87") This is because a char-table will cover the range of all valid character codes. [Note again that we use a char-table only if the keymap is defined with `make-keymap' (as opposed to `make-sparse-keymap', which is just a list)] Let's just assume that it is unlikely that there is any shadowing going on for all of these self-inserting keys. If there is shadowing going on, we are probably not looking at a keymap where we have the default value is set to self-insert-command. So we basically say here: let's just not care about `self-insert-command' and skip the check. Yes, we will in theory not get a perfect result, as there will be some cases where we miss the shadowing. OTOH, we are sure to have something that is not very slow. (And in any case, I don't know of any examples where this will fail, and if they exist we will in any case already be doing better than Emacs 27, as this entire check is new in Emacs 28.) > { > Lisp_Object key =3D make_nil_vector (1); > - for (int j =3D starting_i + 1; j <=3D i; j++) > + for (int j =3D range_beg + 1; j <=3D i; j++) ^^^^^^^^^^ ("range_beg" is the start of the actual range here, previously it was starting_i due to the second call to char_table_ref_and_range.) > { > ASET (key, 0, make_fixnum (j)); > Lisp_Object tem =3D shadow_lookup (shadow, key, Qt, 0); > @@ -3109,6 +3099,7 @@ syms_of_keymap (void) > DEFSYM (Qdescribe_map_tree, "describe-map-tree"); > > DEFSYM (Qkeymap_canonicalize, "keymap-canonicalize"); > + DEFSYM (Qself_insert_command, "self-insert-command"); > > /* Now we are ready to set up this property, so we can > create char tables. */ > -- > 2.30.1 Phew!