From: Konstantin Kharlamov
Newsgroups: gmane.emacs.devel
Subject: Re: Using __builtin_expect (likely/unlikely macros)
Date: Wed, 17 Apr 2019 00:27:50 +0300
Message-ID: <1555450070.23658.4@yandex.ru>
References: <87a7gst973.fsf@gmail.com> <875zrgt12q.fsf@gmail.com>
 <6919a4c8-df76-ea1e-34db-1fa62a360e5a@cs.ucla.edu>
 <87h8aykdod.fsf@gmail.com>
 <4fa7885e-8c66-c7c4-ff71-a013505863af@cs.ucla.edu>
 <2dfb837d-989d-c736-b6e6-b20c0e940596@cs.ucla.edu>
 <87o956c4n4.fsf@gmail.com>
 <1fbd2fca-18f0-0a90-7a45-58419a9e11ee@cs.ucla.edu>
In-Reply-To: <87sguhbrof.fsf@gmail.com>
To: Alex Gramiak
Cc: Paul Eggert, Stefan Monnier, emacs-devel@gnu.org

FWIW I was in a similar search not so long ago, and I was told that e.g.
the "cold" attribute can sometimes produce unbearably slow code:
https://gcc.gnu.org/ml/gcc-help/2019-01/msg00035.html

On Tue, Apr 16, 2019 at 14:50, Alex Gramiak wrote:
> Paul Eggert writes:
>
>> That being said, it might make sense for a few obviously-rarely-called
>> functions like 'emacs-abort' to be marked with __attribute__ ((cold)),
>> so long as we don't turn this into a mission to mark all cold functions
>> (which would cost us more than it would benefit). That is what GCC
>> itself does, with its own functions. However, I'd like to see
>> performance figures. Could you try it out on the benchmark of 'cd lisp
>> && time make compile-always'?
>
> Right, I agree that if used, they should be used sparingly. I tested
> three versions a few times each with both 'make' and 'make -j4':
>
> a) Regular Emacs master.
> b) The below diff with only the _Cold attribute
> c) The below diff with both _Cold and _Hot attributes
>
> a) Normal
>    real  4:17.97s
>    user  3:57.18s
>    sys   20.394s
>
>    real  1:17.67s
>    user  4:23.78s
>    sys   18.888s
>
> b) Cold
>    real  4:10.92s
>    user  3:50.34s
>    sys   20.178s
>
>    real  1:15.77s
>    user  4:16.73s
>    sys   18.943s
>
> c) Hot/Cold
>    real  4:11.43s
>    user  3:51.07s
>    sys   19.961s
>
>    real  1:16.01s
>    user  4:17.63s
>    sys   18.662s
>
> So not much of a difference. For some reason the Hot/Cold performed
> consistently worse than Cold.
>
> I also tested startup/shutdown with perf:
>
> Performance counter stats for '../emacs-normal -f kill-emacs' (20 runs):
>
>       762.17 msec  task-clock:u               #    0.844 CPUs utilized            ( +- 0.23% )
>                 0  context-switches:u         #    0.000 K/sec
>                 0  cpu-migrations:u           #    0.000 K/sec
>            12,941  page-faults:u              #    0.017 M/sec                    ( +- 0.01% )
>     2,998,322,125  cycles:u                   #    3.934 GHz                      ( +- 0.06% )
>     1,392,869,413  stalled-cycles-frontend:u  #   46.45% frontend cycles idle     ( +- 0.15% )
>       982,206,843  stalled-cycles-backend:u   #   32.76% backend cycles idle      ( +- 0.18% )
>     4,874,186,825  instructions:u             #    1.63 insn per cycle
>                                               #    0.29 stalled cycles per insn   ( +- 0.01% )
>     1,037,929,374  branches:u                 # 1361.802 M/sec                    ( +- 0.01% )
>        17,930,471  branch-misses:u            #    1.73% of all branches          ( +- 0.16% )
>     1,209,539,215  L1-dcache-loads:u          # 1586.960 M/sec                    ( +- 0.01% )
>        42,346,229  L1-dcache-load-misses:u    #    3.50% of all L1-dcache hits    ( +- 0.05% )
>         9,088,647  LLC-loads:u                #   11.925 M/sec                    ( +- 0.29% )
>                    LLC-load-misses:u
>
>     0.90325 +- 0.00441 seconds time elapsed  ( +- 0.49% )
>
>
> Performance counter stats for '../emacs.cold -f kill-emacs' (20 runs):
>
>       755.94 msec  task-clock:u               #    0.845 CPUs utilized            ( +- 0.24% )
>                 0  context-switches:u         #    0.000 K/sec
>                 0  cpu-migrations:u           #    0.000 K/sec
>            12,941  page-faults:u              #    0.017 M/sec                    ( +- 0.01% )
>     2,976,036,365  cycles:u                   #    3.937 GHz                      ( +- 0.06% )
>     1,374,451,779  stalled-cycles-frontend:u  #   46.18% frontend cycles idle     ( +- 0.14% )
>       990,227,732  stalled-cycles-backend:u   #   33.27% backend cycles idle      ( +- 0.18% )
>     4,878,661,927  instructions:u             #    1.64 insn per cycle
>                                               #    0.28 stalled cycles per insn   ( +- 0.00% )
>     1,038,495,525  branches:u                 # 1373.782 M/sec                    ( +- 0.00% )
>        17,859,906  branch-misses:u            #    1.72% of all branches          ( +- 0.16% )
>     1,209,345,531  L1-dcache-loads:u          # 1599.792 M/sec                    ( +- 0.00% )
>        42,444,358  L1-dcache-load-misses:u    #    3.51% of all L1-dcache hits    ( +- 0.06% )
>         9,204,368  LLC-loads:u                #   12.176 M/sec                    ( +- 0.41% )
>                    LLC-load-misses:u
>
>     0.89430 +- 0.00217 seconds time elapsed  ( +- 0.24% )
>
>
> Performance counter stats for '../emacs.hot-cold -f kill-emacs' (20 runs):
>
>       761.97 msec  task-clock:u               #    0.845 CPUs utilized            ( +- 0.20% )
>                 0  context-switches:u         #    0.000 K/sec
>                 0  cpu-migrations:u           #    0.000 K/sec
>            12,947  page-faults:u              #    0.017 M/sec                    ( +- 0.01% )
>     2,989,750,359  cycles:u                   #    3.924 GHz                      ( +- 0.04% )
>     1,383,312,275  stalled-cycles-frontend:u  #   46.27% frontend cycles idle     ( +- 0.12% )
>       994,643,853  stalled-cycles-backend:u   #   33.27% backend cycles idle      ( +- 0.13% )
>     4,879,318,990  instructions:u             #    1.63 insn per cycle
>                                               #    0.28 stalled cycles per insn   ( +- 0.00% )
>     1,038,584,045  branches:u                 # 1363.022 M/sec                    ( +- 0.00% )
>        17,863,736  branch-misses:u            #    1.72% of all branches          ( +- 0.13% )
>     1,209,327,347  L1-dcache-loads:u          # 1587.103 M/sec                    ( +- 0.00% )
>        42,501,374  L1-dcache-load-misses:u    #    3.51% of all L1-dcache hits    ( +- 0.05% )
>         9,201,311  LLC-loads:u                #   12.076 M/sec                    ( +- 0.28% )
>                    LLC-load-misses:u
>
>     0.90132 +- 0.00201 seconds time elapsed  ( +- 0.22% )
>
>
> Which again shows a slight improvement with the Cold attributes, and
> still shows the hot attributes degrading performance. Perhaps I was too
> overzealous with the hot tagging?
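
For readers who have not used these annotations, here is a minimal C sketch of what the thread is discussing: a rarely taken error path marked cold, and a branch hint built on __builtin_expect. The macro names ATTR_COLD and UNLIKELY and both functions are hypothetical, made up for illustration; they are not the _Cold/_Hot macros from the diff mentioned above, nor anything in the Emacs sources. Whether such hints help at all depends on the compiler and workload, so they are worth measuring before adopting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical portability macros (illustrative names, not Emacs's).
   GCC and Clang accept the cold attribute and __builtin_expect;
   other compilers fall back to no-ops.  */
#if defined __GNUC__ || defined __clang__
# define ATTR_COLD __attribute__ ((cold))
# define UNLIKELY(x) __builtin_expect (!!(x), 0)
#else
# define ATTR_COLD
# define UNLIKELY(x) (x)
#endif

/* Error reporting is expected to run rarely, so it is marked cold.  */
ATTR_COLD static void
report_bad_input (int value)
{
  fprintf (stderr, "bad input: %d\n", value);
}

/* Sum the non-negative entries of VALUES and store the number of
   rejected (negative) entries in *REJECTED.  */
static long
sum_nonnegative (const int *values, size_t n, size_t *rejected)
{
  assert (rejected != NULL);
  long total = 0;
  *rejected = 0;
  for (size_t i = 0; i < n; i++)
    {
      /* Tell the compiler the error branch is the uncommon one.  */
      if (UNLIKELY (values[i] < 0))
        {
          report_bad_input (values[i]);
          ++*rejected;
          continue;
        }
      total += values[i];
    }
  return total;
}
```

The hints do not change what the program computes; they only influence code layout and optimization decisions, which is why the comparison has to be made with benchmarks like the ones quoted above.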