From: Michal Nazarewicz
To: Eli Zaretskii
Cc: 24603@debbugs.gnu.org
Subject: bug#24603: [RFC 16/18] Refactor character class checking; optimise ASCII case
Date: Sun, 06 Nov 2016 20:26:11 +0100
Organization: http://mina86.com/
In-Reply-To: <837f9oo8q3.fsf@gnu.org>

On Tue, Oct 04 2016, Eli Zaretskii wrote:
> Thanks.  I think this change will require a benchmark to make sure we
> don't lose too much in terms of performance.

Benchmark and its results are included below.  It’s a bit noisy and,
like all benchmarks of this kind, it doesn’t really measure real usage,
but I think it’s safe to say that things aren’t getting worse.

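For reference, each figure in the table is the time of one pass over all
the test strings, averaged over ten passes, with time spent in garbage
collection subtracted.  A minimal sketch of that measurement is shown
here; `my/time-regex-ms' is a hypothetical name used only for
illustration and is not part of the patch.

```elisp
;; Sketch only: `my/time-regex-ms' is a hypothetical helper, not part
;; of the patch.  It mirrors the arithmetic used by the benchmark.
(require 'benchmark)

(defun my/time-regex-ms (re strings)
  "Return the mean time in ms of matching RE against each of STRINGS.
The mean is taken over 10 passes and excludes garbage collection time."
  ;; Match once up front so the compiled regex is already in the cache.
  (string-match re "")
  (let ((res (benchmark-run 10
               (dolist (str strings) (string-match re str)))))
    ;; RES is (ELAPSED GC-COUNT GC-ELAPSED), with both times in seconds.
    ;; Seconds for 10 passes -> ms per pass: / 10 * 1000 = * 100.
    (* (- (nth 0 res) (nth 2 res)) 100)))
```

For example, (my/time-regex-ms "[[:digit:]]" (list (make-string 10000 ?a)))
returns a float; the exact value depends on the machine and on load.
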
---- >8 ------------------------------------------------------------------------
Class      [[:cc:]]   no-case    [^[:cc:]]  no-case
---------  ---------  ---------  ---------  ---------

==== Add regex character class matching benchmark ====
alnum         59.870     60.148     63.548     64.048
alpha         60.355     60.137     63.333     62.684
digit         27.835     27.648      0.513      0.488
xdigit        27.160     27.320      0.969      0.883
upper         91.027     91.572     39.423     39.595
lower         60.591     61.307     60.332     59.730
word          36.201     36.046    108.118    109.396
punct        110.987    111.683     35.110     35.200
cntrl         27.005     26.756      1.212      1.176
graph         25.694     26.097     75.872     75.711
print         24.783     24.976     76.652     74.921
space        147.210    148.431      1.261      1.252
blank         27.602     27.722      0.373      0.189
ascii         23.243     23.302      4.550      4.486
nonascii       5.448      5.407     90.733     90.410
unibyte       22.986     23.342      4.559      4.655
multibyte      5.508      5.535     92.457     91.163
...all...      1.138      1.030     93.275     93.383

==== Refactor character class checking; optimise ASCII case ====
alnum         54.643     54.301     56.668     56.898
alpha         54.654     54.558     56.134     56.281
digit         26.103     26.044      0.495      0.443
xdigit        25.606     25.690      0.815      0.806
upper         83.269     83.306     36.704     36.487
lower         56.278     55.804     54.872     54.917
word          34.820     55.092     99.577    100.618
punct        103.410    103.465     31.673     31.590
cntrl         25.509     25.274      1.119      1.101
graph         23.593     23.673     69.335     69.481
print         23.003     23.123     69.962     70.132
space        132.224    132.458      1.143      1.120
blank         26.223     26.342      0.193      0.187
ascii         22.329     22.257      4.094      4.082
nonascii       4.910      4.897     84.633     84.515
unibyte       22.866     22.385      4.094      4.078
multibyte      4.913      4.886     95.385     85.341
...all...      0.942      0.936     88.979     88.744

==== Optimise character class matching in regexes ====
alnum         53.338     53.052     56.571     56.434
alpha         53.591     53.350     56.218     56.255
digit         26.266     26.502      0.438      0.438
xdigit        25.793     25.887      0.877      0.876
upper         82.539     82.700     31.994     32.200
lower         55.280     55.040     54.615     54.429
word          33.666     33.530    100.678    101.721
punct        101.714    101.715     31.766     31.620
cntrl         25.669     25.068      1.113      1.114
graph         27.848     28.067     81.669     81.619
print         27.128     28.297     82.326     82.306
space        131.847    132.242      1.124      1.128
blank         26.493     26.607      0.190      0.188
ascii         22.332     22.315      4.379      4.358
nonascii       5.169      5.159     84.872     85.488
unibyte       22.259     22.529      4.374      4.361
multibyte      5.193      5.181     86.421     86.568
...all...      0.945      0.939     92.903     93.209

==== Fix case-fold-search character class matching ====
alnum         53.553     53.527     56.918     56.886
alpha         53.657     53.758     56.541     57.107
digit         26.616     26.641      0.467      0.510
xdigit        27.255     26.271      0.894      0.923
upper         56.608     55.073     55.792     55.422
lower         55.419     55.330     55.486     55.018
word          35.537     35.434    103.414    103.516
punct        105.810    106.618     33.454     33.322
cntrl         25.875     26.020      1.274      1.271
graph         28.011     28.185     82.239     82.245
print         26.935     27.016     99.945     83.213
space        136.774    138.135      1.170      1.159
blank         26.984     26.976      0.192      0.204
ascii         22.365     22.661      4.652      4.652
nonascii       5.759      5.524     85.805     86.403
unibyte       22.568     22.375      4.995      4.909
multibyte      5.729      5.749     84.671     84.396
...all...      0.990      0.978     89.520     89.612

All times in ms; lower is better.
---- >8 ------------------------------------------------------------------------

From 23d8fe0b093730406b64e0e20207c2fb929f707f Mon Sep 17 00:00:00 2001
From: Michal Nazarewicz
Date: Fri, 7 Oct 2016 02:44:30 +0200
Subject: [PATCH] Add regex character class matching benchmark

* test/src/regex-tests.el (regex-tests-benchmark-cc-match): New function
running character class matching benchmark.
---
 test/src/regex-tests.el | 59 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/test/src/regex-tests.el b/test/src/regex-tests.el
index fc50344..d0aad97 100644
--- a/test/src/regex-tests.el
+++ b/test/src/regex-tests.el
@@ -98,6 +98,65 @@ regex--test-cc
        (eval `(ert-deftest ,name () ,doc ,(cons 'regex--test-cc test)) t)))
 
 
+(defun regex-tests-benchmark-cc-match ()
+  "Benchmark regex character class matching."
+  (interactive)
+  (let* ((prn (if (called-interactively-p)
+                  'insert
+                (lambda (&rest args) (mapc 'princ args))))
+         (strings
+          (nconc (list
+                  (apply 'string (number-sequence 32 126))
+                  (apply 'string (number-sequence 0 127))
+                  (apply 'unibyte-string (number-sequence 128 255))
+                  (concat (apply 'string (number-sequence 0 255))
+                          (apply 'unibyte-string (number-sequence 128 255)))
+                  (make-string 10000 #x3FFF80)
+                  (make-string 10000 #x3FFFFF))
+                 (mapcar (lambda (ch) (make-string 10000 ch))
+                         (number-sequence 0 256))))
+
+         (ccs '("alnum" "alpha" "digit" "xdigit" "upper" "lower"
+                "word" "punct" "cntrl" "graph" "print" "space" "blank"
+                "ascii" "nonascii" "unibyte" "multibyte"))
+
+         (benchmark-re
+          (lambda (re)
+            (dolist (cf '(nil t))
+              ;; Compile the regex so it ends up in cache.
+              (string-match re "")
+              (let ((res (benchmark-run 10
+                           (dolist (str strings) (string-match re str)))))
+                (funcall prn (format " %10.3f"
+                                     (* (- (nth 0 res) (nth 2 res)) 100))))))))
+
+    (when (called-interactively-p)
+      (switch-to-buffer (get-buffer-create "*Regex Benchmark*"))
+      (delete-region (point-min) (point-max)))
+
+    (funcall prn (format "%-9s  %-9s  %-9s  %-9s  %-9s\n"
+                         "Class" "[[:cc:]]" "no-case"
+                         "[^[:cc:]]" "no-case")
+             (make-string 9 ?-)
+             "  " (make-string 9 ?-) "  " (make-string 9 ?-)
+             "  " (make-string 9 ?-) "  " (make-string 9 ?-) "\n")
+
+    (dolist (cc ccs)
+      (funcall prn (format "%-9s" cc))
+      (dolist (re (list (format "[[:%s:]]" cc)
+                        (format "[^[:%s:]]" cc)))
+        (funcall benchmark-re re))
+      (funcall prn "\n"))
+
+    (funcall prn (format "%-9s" "...all..."))
+    (let ((all-ccs (mapconcat (lambda (cc) (format "[:%s:]" cc)) ccs "")))
+      (funcall benchmark-re (concat "[" all-ccs "]"))
+      (funcall benchmark-re (concat "[^" all-ccs "]")))
+
+    (funcall prn "\n" (make-string 53 ?-)
+             "\nAll times in ms; lower is better.\n")))
+
+
 (defmacro regex-tests-generic-line (comment-char test-file whitelist &rest body)
   "Reads a line of the test file TEST-FILE, skipping
 comments (defined by COMMENT-CHAR), and evaluates the tests in
-- 
2.8.0.rc3.226.g39d4020