unofficial mirror of bug-gnu-emacs@gnu.org 
From: Michal Nazarewicz <mina86@mina86.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 24603@debbugs.gnu.org
Subject: bug#24603: [RFC 16/18] Refactor character class checking; optimise ASCII case
Date: Tue, 20 Dec 2016 15:32:27 +0100
Message-ID: <xa1tinqellhg.fsf@mina86.com>
In-Reply-To: <83k2cgiea2.fsf@gnu.org>

Sorry about the delay.  I hope I’ll have some time during Xmas to work
on this more.

On Sun, Nov 06 2016, Eli Zaretskii wrote:
>> From: Michal Nazarewicz <mina86@mina86.com>
>> Cc: 24603@debbugs.gnu.org
>> Date: Sun, 06 Nov 2016 20:26:11 +0100
>> 
>> On Tue, Oct 04 2016, Eli Zaretskii wrote:
>> > Thanks.  I think this change will require a benchmark to make sure we
>> > don't lose too much in terms of performance.
>> 
>> Benchmark and its results included below.
>> 
>> It’s a bit noisy and as all benchmarks of that kind it doesn’t really
>> measure the real usage, but I think it’s safe to say that things aren’t
>> getting worse.
>
> Thanks.  What happened here:
>
>> ==== Refactor character class checking; optimise ASCII case ====
>> alnum         54.643     54.301     56.668     56.898
>> alpha         54.654     54.558     56.134     56.281
>> digit         26.103     26.044      0.495      0.443
>> xdigit        25.606     25.690      0.815      0.806
>> upper         83.269     83.306     36.704     36.487
>> lower         56.278     55.804     54.872     54.917
>> word          34.820     55.092     99.577    100.618
>                            ^^^^^^
> Is this slow-down real?

I re-ran the benchmarks five times and took the averages.  Based on that,
this slow-down does not appear to be real, but there seem to be some
others which I hadn’t noticed previously:

Class      [[:cc:]]              no-case               [^[:cc:]]             no-case             
---------  --------------------  --------------------  --------------------  --------------------
alnum         56.772                54.973                58.132                58.388           
              -1.385     -2.44%      0.571     +1.04%     -0.041     -0.07%     -0.346     -0.59%
              -1.539     -2.71%      0.198     +0.36%     -0.967     -1.66%     -1.272     -2.18%
              -3.017     -5.31%      2.990     +5.44%     -1.013     -1.74%     -1.681     -2.88%
              -3.850     -6.78%     -1.229     -2.24%     -0.086     -0.15%     -1.453     -2.49%
---------  --------------------  --------------------  --------------------  --------------------
alpha         54.386                54.380                56.698                58.332           
               1.135     +2.09%      0.892     +1.64%      0.587     +1.04%      0.667     +1.14%
               1.052     +1.93%      1.108     +2.04%      0.661     +1.17%     -1.555     -2.67%
              -0.338     -0.62%     -0.235     -0.43%     -0.363     -0.64%     -1.788     -3.06%
              -1.068     -1.96%     -0.541     -1.00%     -0.182     -0.32%     -2.659     -4.56%
---------  --------------------  --------------------  --------------------  --------------------
digit         26.416                26.574                 0.454                 0.455           
               0.203     +0.77%     -0.030     -0.11%     -0.010     -2.20%     -0.007     -1.58%
               0.138     +0.52%     -0.013     -0.05%     -0.006     -1.28%     -0.008     -1.71%
              -0.021     -0.08%     -0.161     -0.61%     -0.014     -3.08%     -0.018     -4.04%
              -0.293     -1.11%     -0.417     -1.57%     -0.003     -0.57%     -0.009     -2.02%
---------  --------------------  --------------------  --------------------  --------------------
xdigit        26.015                25.956                 0.902                 0.898           
               0.194     +0.75%      0.186     +0.72%     -0.074     -8.20%     -0.075     -8.33%
               1.092     +4.20%      0.191     +0.74%     -0.073     -8.13%     -0.070     -7.84%
              -0.003     -0.01%      0.239     +0.92%     -0.084     -9.35%     -0.083     -9.22%
              -0.345     -1.32%     -0.124     -0.48%     -0.069     -7.62%     -0.060     -6.64%
---------  --------------------  --------------------  --------------------  --------------------
upper         83.257                82.562                41.189                41.284           
               3.298     +3.96%      0.683     +0.83%     -4.733    -11.49%     -3.970     -9.62%
               1.791     +2.15%      3.616     +4.38%     -3.875     -9.41%     -3.845     -9.31%
               0.045     +0.05%      5.854     +7.09%     -8.977    -21.80%     -9.105    -22.05%
             -28.204    -33.88%    -27.548    -33.37%     13.052    +31.69%     12.946    +31.36%
---------  --------------------  --------------------  --------------------  --------------------
lower         64.299                64.218                61.111                62.093           
              -7.671    -11.93%     -8.443    -13.15%     -6.356    -10.40%     -7.320    -11.79%
              -7.251    -11.28%     -5.967     -9.29%     -5.593     -9.15%     -6.500    -10.47%
              -7.901    -12.29%     -8.447    -13.15%     -6.268    -10.26%     -7.304    -11.76%
              -9.213    -14.33%     -9.183    -14.30%     -4.879     -7.98%     -7.422    -11.95%
---------  --------------------  --------------------  --------------------  --------------------
word          35.618                37.086               104.661               105.706           
               0.198     +0.55%     -1.206     -3.25%      1.497     +1.43%      2.618     +2.48%
               0.614     +1.72%      0.263     +0.71%      1.618     +1.55%      2.099     +1.99%
               0.692     +1.94%     -0.403     -1.09%     -2.975     -2.84%     -3.099     -2.93%
              -1.210     -3.40%     -1.759     -4.74%     -3.491     -3.34%     -3.722     -3.52%
---------  --------------------  --------------------  --------------------  --------------------
punct        107.447               107.661                33.509                33.453           
               3.037     +2.83%      1.931     +1.79%      0.640     +1.91%      0.596     +1.78%
               3.106     +2.89%      4.309     +4.00%      0.539     +1.61%      0.680     +2.03%
              -0.588     -0.55%      3.730     +3.46%     -1.138     -3.40%     -1.046     -3.13%
               1.013     +0.94%      2.857     +2.65%      1.679     +5.01%     -1.142     -3.41%
---------  --------------------  --------------------  --------------------  --------------------
cntrl         25.770                25.718                 1.246                 1.229           
               0.115     +0.45%      0.150     +0.58%     -0.068     -5.47%     -0.063     -5.11%
               0.031     +0.12%      0.112     +0.44%     -0.087     -7.00%     -0.057     -4.64%
              -0.089     -0.35%     -0.034     -0.13%     -0.103     -8.30%     -0.088     -7.16%
              -0.410     -1.59%     -0.334     -1.30%     -0.047     -3.77%     -0.043     -3.53%
---------  --------------------  --------------------  --------------------  --------------------
graph         23.703                23.595                69.221                70.017           
               0.306     +1.29%      0.245     +1.04%      0.592     +0.85%     -0.146     -0.21%
               1.838     +7.75%      0.641     +2.71%      0.517     +0.75%     -0.316     -0.45%
               4.503    +19.00%      4.599    +19.49%     13.219    +19.10%     15.108    +21.58%
               6.628    +27.96%      4.209    +17.84%     12.004    +17.34%     11.160    +15.94%
---------  --------------------  --------------------  --------------------  --------------------
print         22.798                22.781                69.873                69.795           
               0.670     +2.94%      0.607     +2.67%      0.826     +1.18%      0.699     +1.00%
               1.225     +5.37%      1.171     +5.14%      2.049     +2.93%      1.427     +2.04%
               4.540    +19.91%      4.574    +20.08%     14.046    +20.10%     17.268    +24.74%
               4.178    +18.33%      4.188    +18.38%     12.189    +17.44%     12.351    +17.70%
---------  --------------------  --------------------  --------------------  --------------------
space        141.314               144.661                 1.130                 1.125           
               0.331     +0.23%     -3.395     -2.35%      0.011     +0.99%      0.011     +0.94%
               2.535     +1.79%      0.202     +0.14%      0.029     +2.53%      0.029     +2.60%
              -5.808     -4.11%     -6.856     -4.74%     -0.001     -0.11%      0.076     +6.79%
              -6.470     -4.58%     -9.847     -6.81%      0.010     +0.85%      0.005     +0.48%
---------  --------------------  --------------------  --------------------  --------------------
blank         26.706                26.740                 0.183                 0.181           
               0.147     +0.55%     -0.009     -0.04%      0.003     +1.74%      0.004     +2.10%
               1.461     +5.47%      0.091     +0.34%      0.006     +3.05%      0.007     +3.99%
               3.021    +11.31%      0.591     +2.21%     -0.002     -0.98%      0.000     +0.11%
              -0.305     -1.14%     -0.372     -1.39%     -0.001     -0.33%      0.000     +0.22%
---------  --------------------  --------------------  --------------------  --------------------
ascii         22.202                22.140                 4.722                 4.751           
               0.489     +2.20%      0.601     +2.71%     -0.493    -10.44%     -0.436     -9.18%
               0.625     +2.81%      0.597     +2.69%     -0.397     -8.41%     -0.436     -9.18%
               0.348     +1.57%      1.043     +4.71%      0.287     +6.08%      0.249     +5.25%
              -0.033     -0.15%      0.826     +3.73%      0.398     +8.42%      0.251     +5.29%
---------  --------------------  --------------------  --------------------  --------------------
nonascii       5.586                 5.544                85.792                83.721           
              -0.392     -7.02%     -0.405     -7.30%      5.600     +6.53%      1.420     +1.70%
              -0.459     -8.21%      0.213     +3.84%      5.553     +6.47%      3.031     +3.62%
               0.461     +8.25%     -0.144     -2.59%      4.086     +4.76%      1.803     +2.15%
              -0.368     -6.58%     -0.296     -5.35%     -0.947     -1.10%      1.088     +1.30%
---------  --------------------  --------------------  --------------------  --------------------
unibyte       22.166                22.172                 5.299                 5.403           
               0.545     +2.46%      0.533     +2.40%     -1.041    -19.65%     -1.140    -21.09%
               1.187     +5.36%      0.843     +3.80%     -1.068    -20.16%     -1.182    -21.87%
               0.429     +1.94%      0.385     +1.74%     -1.043    -19.69%     -1.163    -21.52%
               0.237     +1.07%      0.063     +0.28%     -0.915    -17.26%     -1.025    -18.98%
---------  --------------------  --------------------  --------------------  --------------------
multibyte      6.031                 5.571                83.834                85.391           
              -0.875    -14.50%     -0.432     -7.75%      1.855     +2.21%     -0.073     -0.09%
              -0.902    -14.96%     -0.440     -7.89%      7.195     +8.58%      1.665     +1.95%
              -0.904    -14.99%     -0.531     -9.53%      2.005     +2.39%      0.094     +0.11%
              -0.786    -13.03%     -0.336     -6.04%      0.692     +0.83%      1.607     +1.88%
---------  --------------------  --------------------  --------------------  --------------------
...all...      0.928                 0.927                89.115                89.857           
               0.080     +8.60%      0.076     +8.22%     -0.314     -0.35%      5.126     +5.70%
               0.058     +6.23%      0.058     +6.30%     -0.304     -0.34%      0.038     +0.04%
               0.002     +0.19%      0.001     +0.11%     -0.413     -0.46%     -1.742     -1.94%
               0.037     +3.97%      0.034     +3.64%      0.824     +0.92%     -1.253     -1.39%

(The first line in each group gives the absolute results with Emacs
before my changes.  The remaining lines show the absolute and relative
change against that baseline, so negative is good.)
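As a rough illustration of how the percentage columns relate to the
absolute ones, here is a minimal Python sketch.  It assumes the relative
change is simply the delta divided by the baseline; the helper name
relative_change is made up for this example and is not part of the
benchmark script.

```python
def relative_change(baseline, delta):
    """Return delta as a percentage of baseline, rounded to two decimals."""
    return round(100.0 * delta / baseline, 2)

# Figures copied from the [[:cc:]] column of the table above.
graph_last = relative_change(23.703, 6.628)    # graph, last row
upper_last = relative_change(83.257, -28.204)  # upper, last row
print(graph_last, upper_last)
```

Because the deltas in the table are themselves rounded, recomputing a
percentage this way may differ from the printed one in the last digit.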

Slow-downs in intermediate commits aren’t that big of an issue as long
as the last line shows an improvement (or at least negligible
regression) but sadly that is not always the case.

As can be seen, [[:graph:]] slows down by almost 28% :( and I don’t
quite understand where all of that could be coming from.

-- 
Best regards
ミハウ “𝓶𝓲𝓷𝓪86” ナザレヴイツ
«If at first you don’t succeed, give up skydiving»




