From: Michal Nazarewicz <mina86@mina86.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 24603@debbugs.gnu.org
Subject: bug#24603: [RFC 16/18] Refactor character class checking; optimise ASCII case
Date: Tue, 20 Dec 2016 15:32:27 +0100
Message-ID: <xa1tinqellhg.fsf@mina86.com>
In-Reply-To: <83k2cgiea2.fsf@gnu.org>

Sorry about the delay. I hope I’ll have some time during Xmas to work
on this more.

On Sun, Nov 06 2016, Eli Zaretskii wrote:
>> From: Michal Nazarewicz <mina86@mina86.com>
>> Cc: 24603@debbugs.gnu.org
>> Date: Sun, 06 Nov 2016 20:26:11 +0100
>>
>> On Tue, Oct 04 2016, Eli Zaretskii wrote:
>> > Thanks. I think this change will require a benchmark to make sure we
>> > don't lose too much in terms of performance.
>>
>> Benchmark and its results included below.
>>
>> It’s a bit noisy and as all benchmarks of that kind it doesn’t really
>> measure the real usage, but I think it’s safe to say that things aren’t
>> getting worse.
>
> Thanks. What happened here:
>
>> ==== Refactor character class checking; optimise ASCII case ====
>> alnum    54.643   54.301   56.668   56.898
>> alpha    54.654   54.558   56.134   56.281
>> digit    26.103   26.044    0.495    0.443
>> xdigit   25.606   25.690    0.815    0.806
>> upper    83.269   83.306   36.704   36.487
>> lower    56.278   55.804   54.872   54.917
>> word     34.820   55.092   99.577  100.618
>                    ^^^^^^
> Is this slow-down real?

I’ve re-run the benchmarks five times and taken the averages. Based on
that, this slow-down does not appear to be real, but there seem to be
some others which I hadn’t noticed previously:

Class     [[:cc:]]             no-case              [^[:cc:]]            no-case
--------- -------------------- -------------------- -------------------- --------------------
alnum        56.772               54.973               58.132               58.388
             -1.385   -2.44%       0.571   +1.04%      -0.041   -0.07%      -0.346   -0.59%
             -1.539   -2.71%       0.198   +0.36%      -0.967   -1.66%      -1.272   -2.18%
             -3.017   -5.31%       2.990   +5.44%      -1.013   -1.74%      -1.681   -2.88%
             -3.850   -6.78%      -1.229   -2.24%      -0.086   -0.15%      -1.453   -2.49%
--------- -------------------- -------------------- -------------------- --------------------
alpha        54.386               54.380               56.698               58.332
              1.135   +2.09%       0.892   +1.64%       0.587   +1.04%       0.667   +1.14%
              1.052   +1.93%       1.108   +2.04%       0.661   +1.17%      -1.555   -2.67%
             -0.338   -0.62%      -0.235   -0.43%      -0.363   -0.64%      -1.788   -3.06%
             -1.068   -1.96%      -0.541   -1.00%      -0.182   -0.32%      -2.659   -4.56%
--------- -------------------- -------------------- -------------------- --------------------
digit        26.416               26.574                0.454                0.455
              0.203   +0.77%      -0.030   -0.11%      -0.010   -2.20%      -0.007   -1.58%
              0.138   +0.52%      -0.013   -0.05%      -0.006   -1.28%      -0.008   -1.71%
             -0.021   -0.08%      -0.161   -0.61%      -0.014   -3.08%      -0.018   -4.04%
             -0.293   -1.11%      -0.417   -1.57%      -0.003   -0.57%      -0.009   -2.02%
--------- -------------------- -------------------- -------------------- --------------------
xdigit       26.015               25.956                0.902                0.898
              0.194   +0.75%       0.186   +0.72%      -0.074   -8.20%      -0.075   -8.33%
              1.092   +4.20%       0.191   +0.74%      -0.073   -8.13%      -0.070   -7.84%
             -0.003   -0.01%       0.239   +0.92%      -0.084   -9.35%      -0.083   -9.22%
             -0.345   -1.32%      -0.124   -0.48%      -0.069   -7.62%      -0.060   -6.64%
--------- -------------------- -------------------- -------------------- --------------------
upper        83.257               82.562               41.189               41.284
              3.298   +3.96%       0.683   +0.83%      -4.733  -11.49%      -3.970   -9.62%
              1.791   +2.15%       3.616   +4.38%      -3.875   -9.41%      -3.845   -9.31%
              0.045   +0.05%       5.854   +7.09%      -8.977  -21.80%      -9.105  -22.05%
            -28.204  -33.88%     -27.548  -33.37%      13.052  +31.69%      12.946  +31.36%
--------- -------------------- -------------------- -------------------- --------------------
lower        64.299               64.218               61.111               62.093
             -7.671  -11.93%      -8.443  -13.15%      -6.356  -10.40%      -7.320  -11.79%
             -7.251  -11.28%      -5.967   -9.29%      -5.593   -9.15%      -6.500  -10.47%
             -7.901  -12.29%      -8.447  -13.15%      -6.268  -10.26%      -7.304  -11.76%
             -9.213  -14.33%      -9.183  -14.30%      -4.879   -7.98%      -7.422  -11.95%
--------- -------------------- -------------------- -------------------- --------------------
word         35.618               37.086              104.661              105.706
              0.198   +0.55%      -1.206   -3.25%       1.497   +1.43%       2.618   +2.48%
              0.614   +1.72%       0.263   +0.71%       1.618   +1.55%       2.099   +1.99%
              0.692   +1.94%      -0.403   -1.09%      -2.975   -2.84%      -3.099   -2.93%
             -1.210   -3.40%      -1.759   -4.74%      -3.491   -3.34%      -3.722   -3.52%
--------- -------------------- -------------------- -------------------- --------------------
punct       107.447              107.661               33.509               33.453
              3.037   +2.83%       1.931   +1.79%       0.640   +1.91%       0.596   +1.78%
              3.106   +2.89%       4.309   +4.00%       0.539   +1.61%       0.680   +2.03%
             -0.588   -0.55%       3.730   +3.46%      -1.138   -3.40%      -1.046   -3.13%
              1.013   +0.94%       2.857   +2.65%       1.679   +5.01%      -1.142   -3.41%
--------- -------------------- -------------------- -------------------- --------------------
cntrl        25.770               25.718                1.246                1.229
              0.115   +0.45%       0.150   +0.58%      -0.068   -5.47%      -0.063   -5.11%
              0.031   +0.12%       0.112   +0.44%      -0.087   -7.00%      -0.057   -4.64%
             -0.089   -0.35%      -0.034   -0.13%      -0.103   -8.30%      -0.088   -7.16%
             -0.410   -1.59%      -0.334   -1.30%      -0.047   -3.77%      -0.043   -3.53%
--------- -------------------- -------------------- -------------------- --------------------
graph        23.703               23.595               69.221               70.017
              0.306   +1.29%       0.245   +1.04%       0.592   +0.85%      -0.146   -0.21%
              1.838   +7.75%       0.641   +2.71%       0.517   +0.75%      -0.316   -0.45%
              4.503  +19.00%       4.599  +19.49%      13.219  +19.10%      15.108  +21.58%
              6.628  +27.96%       4.209  +17.84%      12.004  +17.34%      11.160  +15.94%
--------- -------------------- -------------------- -------------------- --------------------
print        22.798               22.781               69.873               69.795
              0.670   +2.94%       0.607   +2.67%       0.826   +1.18%       0.699   +1.00%
              1.225   +5.37%       1.171   +5.14%       2.049   +2.93%       1.427   +2.04%
              4.540  +19.91%       4.574  +20.08%      14.046  +20.10%      17.268  +24.74%
              4.178  +18.33%       4.188  +18.38%      12.189  +17.44%      12.351  +17.70%
--------- -------------------- -------------------- -------------------- --------------------
space       141.314              144.661                1.130                1.125
              0.331   +0.23%      -3.395   -2.35%       0.011   +0.99%       0.011   +0.94%
              2.535   +1.79%       0.202   +0.14%       0.029   +2.53%       0.029   +2.60%
             -5.808   -4.11%      -6.856   -4.74%      -0.001   -0.11%       0.076   +6.79%
             -6.470   -4.58%      -9.847   -6.81%       0.010   +0.85%       0.005   +0.48%
--------- -------------------- -------------------- -------------------- --------------------
blank        26.706               26.740                0.183                0.181
              0.147   +0.55%      -0.009   -0.04%       0.003   +1.74%       0.004   +2.10%
              1.461   +5.47%       0.091   +0.34%       0.006   +3.05%       0.007   +3.99%
              3.021  +11.31%       0.591   +2.21%      -0.002   -0.98%       0.000   +0.11%
             -0.305   -1.14%      -0.372   -1.39%      -0.001   -0.33%       0.000   +0.22%
--------- -------------------- -------------------- -------------------- --------------------
ascii        22.202               22.140                4.722                4.751
              0.489   +2.20%       0.601   +2.71%      -0.493  -10.44%      -0.436   -9.18%
              0.625   +2.81%       0.597   +2.69%      -0.397   -8.41%      -0.436   -9.18%
              0.348   +1.57%       1.043   +4.71%       0.287   +6.08%       0.249   +5.25%
             -0.033   -0.15%       0.826   +3.73%       0.398   +8.42%       0.251   +5.29%
--------- -------------------- -------------------- -------------------- --------------------
nonascii      5.586                5.544               85.792               83.721
             -0.392   -7.02%      -0.405   -7.30%       5.600   +6.53%       1.420   +1.70%
             -0.459   -8.21%       0.213   +3.84%       5.553   +6.47%       3.031   +3.62%
              0.461   +8.25%      -0.144   -2.59%       4.086   +4.76%       1.803   +2.15%
             -0.368   -6.58%      -0.296   -5.35%      -0.947   -1.10%       1.088   +1.30%
--------- -------------------- -------------------- -------------------- --------------------
unibyte      22.166               22.172                5.299                5.403
              0.545   +2.46%       0.533   +2.40%      -1.041  -19.65%      -1.140  -21.09%
              1.187   +5.36%       0.843   +3.80%      -1.068  -20.16%      -1.182  -21.87%
              0.429   +1.94%       0.385   +1.74%      -1.043  -19.69%      -1.163  -21.52%
              0.237   +1.07%       0.063   +0.28%      -0.915  -17.26%      -1.025  -18.98%
--------- -------------------- -------------------- -------------------- --------------------
multibyte      6.031                5.571               83.834               85.391
             -0.875  -14.50%      -0.432   -7.75%       1.855   +2.21%      -0.073   -0.09%
             -0.902  -14.96%      -0.440   -7.89%       7.195   +8.58%       1.665   +1.95%
             -0.904  -14.99%      -0.531   -9.53%       2.005   +2.39%       0.094   +0.11%
             -0.786  -13.03%      -0.336   -6.04%       0.692   +0.83%       1.607   +1.88%
--------- -------------------- -------------------- -------------------- --------------------
...all...      0.928                0.927               89.115               89.857
              0.080   +8.60%       0.076   +8.22%      -0.314   -0.35%       5.126   +5.70%
              0.058   +6.23%       0.058   +6.30%      -0.304   -0.34%       0.038   +0.04%
              0.002   +0.19%       0.001   +0.11%      -0.413   -0.46%      -1.742   -1.94%
              0.037   +3.97%       0.034   +3.64%       0.824   +0.92%      -1.253   -1.39%

(The first line in each group shows the absolute results with Emacs
before my changes. The other lines show the absolute and relative
change against that baseline, i.e. negative is good.)
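
To make the arithmetic behind the change lines concrete, here is a minimal Python sketch. It is not part of the benchmark itself; `describe_change` is a hypothetical helper, and the only inputs are two rows of the [[:cc:]] column copied from the table.

```python
def describe_change(baseline: float, measured: float) -> str:
    """Format the absolute and relative change of `measured` against `baseline`.

    A negative result means the measured run was faster than the baseline.
    """
    delta = measured - baseline
    percent = 100.0 * delta / baseline
    return f"{delta:.3f} {percent:+.2f}%"


# alnum, [[:cc:]] column: baseline 56.772, last line is 3.850 lower.
print(describe_change(56.772, 56.772 - 3.850))  # -3.850 -6.78%

# graph, [[:cc:]] column: baseline 23.703, last line is 6.628 higher.
print(describe_change(23.703, 23.703 + 6.628))  # 6.628 +27.96%
```

Both printed values match the corresponding cells in the table, which is a quick way to check that the percentages are relative to the first line of each group rather than to the preceding line.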
Slow-downs in intermediate commits aren’t that big of an issue as long
as the last line shows an improvement (or at least negligible
regression) but sadly that is not always the case.

As can be seen, [[:graph:]] slows down by almost 28% :( and I don’t
quite understand where all of that comes from.
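
One way to pick out the classes that still regress on the last line is sketched below in Python. This is only an illustration: the percentages are the last-line values of the [[:cc:]] column for a handful of classes, copied from the table, while the `regressions` helper and the 5% threshold for a "negligible" regression are assumptions made for the example.

```python
# Relative change (%) on the last line of each group, [[:cc:]] column.
# Values copied from the table; only a subset of classes is listed.
FINAL_CHANGE = {
    "alnum": -6.78,
    "upper": -33.88,
    "lower": -14.33,
    "word": -3.40,
    "punct": 0.94,
    "graph": 27.96,
    "print": 18.33,
    "space": -4.58,
}


def regressions(changes: dict[str, float], threshold: float = 5.0) -> list[str]:
    """Return class names whose slow-down exceeds `threshold` percent, worst first.

    The default threshold is an arbitrary cut-off chosen for illustration.
    """
    slow = [name for name, pct in changes.items() if pct > threshold]
    return sorted(slow, key=lambda name: changes[name], reverse=True)


print(regressions(FINAL_CHANGE))  # ['graph', 'print']
```

With that cut-off only graph and print stand out in this column, which lines up with the [[:graph:]] slow-down noted above.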
--
Best regards
ミハウ “𝓶𝓲𝓷𝓪86” ナザレヴイツ
«If at first you don’t succeed, give up skydiving»