From mboxrd@z Thu Jan 1 00:00:00 1970
From: Efraim Flashner
Subject: [nmeyerha@amzn.com: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1]
Date: Sat, 11 Apr 2020 22:21:02 +0300
Message-ID: <20200411192102.GA2191@E5400>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="VrqPEDrXMn8OVzN4"
Return-path:
Received: from eggs.gnu.org ([2001:470:142:3::10]:41204) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jNLhg-0007JO-MB for guix-devel@gnu.org; Sat, 11 Apr 2020 15:22:10 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1jNLhe-0000Dc-Tw for guix-devel@gnu.org; Sat, 11 Apr 2020 15:22:08 -0400
Received: from flashner.co.il ([178.62.234.194]:44942) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1jNLhe-0008Uw-I2 for guix-devel@gnu.org; Sat, 11 Apr 2020 15:22:06 -0400
Received: from localhost (unknown [141.226.9.17]) by flashner.co.il (Postfix) with ESMTPSA id 6C9154034D for ; Sat, 11 Apr 2020 19:21:34 +0000 (UTC)
Content-Disposition: inline
List-Id: "Development of GNU Guix and the GNU System distribution."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: guix-devel-bounces+gcggd-guix-devel=m.gmane-mx.org@gnu.org
Sender: "Guix-devel"
To: guix-devel@gnu.org

--VrqPEDrXMn8OVzN4
Content-Type: multipart/mixed; boundary="AqsLC8rIMeq19msA"
Content-Disposition: inline

--AqsLC8rIMeq19msA
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

A copy of a bug report sent to debian-arm. I'll try to keep an eye on it
to see if this is something we want to do for our aarch64 builds. I don't
have any ARMv8.1 boards to test this out on.

-- 
Efraim Flashner   אפרים פלשנר
GPG key = A28B F40C 3E55 1372 662D 14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted

--AqsLC8rIMeq19msA
Content-Type: message/rfc822
Content-Disposition: inline

Return-Path:
Delivered-To: efraim@flashner.co.il
Received: from flashner.co.il by do1 with LMTP id t3oMKWHVkF4bNwAAymwEiA (envelope-from ) for ; Fri, 10 Apr 2020 20:21:53 +0000
Received: from bendel.debian.org (bendel.debian.org [82.195.75.100]) by flashner.co.il (Postfix) with ESMTPS id 96D26402FD for ; Fri, 10 Apr 2020 20:21:53 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1]) by bendel.debian.org (Postfix) with QMQP id 11A672057A; Fri, 10 Apr 2020 20:21:22 +0000 (UTC)
X-Mailbox-Line: From debian-arm-request@lists.debian.org Fri Apr 10 20:21:22 2020
Old-Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on bendel.debian.org
X-Spam-Level:
X-Spam-Status: No, score=0.4 required=4.0 tests=FOURLA, HEADER_FROM_DIFFERENT_DOMAINS,MURPHY_DRUGS_REL8,RCVD_IN_DNSWL_NONE autolearn=no autolearn_force=no version=3.4.2
X-Original-To: lists-debian-arm@bendel.debian.org
Delivered-To: lists-debian-arm@bendel.debian.org
Received: from localhost (localhost [127.0.0.1]) by bendel.debian.org (Postfix) with ESMTP id CF1DB2057C for ; Fri, 10 Apr 2020 20:21:13 +0000 (UTC)
X-Virus-Scanned: at lists.debian.org with policy bank en-lt
X-Amavis-Spam-Status: No, score=-1.631 tagged_above=-10000 required=5.3 tests=[BAYES_00=-2, FOURLA=0.1, HEADER_FROM_DIFFERENT_DOMAINS=0.249, MURPHY_DRUGS_REL8=0.02, RCVD_IN_DNSWL_NONE=-0.0001] autolearn=no autolearn_force=no
Received: from bendel.debian.org ([127.0.0.1]) by localhost (lists.debian.org [127.0.0.1]) (amavisd-new, port 2525) with ESMTP id U60zapgGh5d5 for ; Fri, 10 Apr 2020 20:21:08 +0000 (UTC)
Received: from buxtehude.debian.org (buxtehude.debian.org [IPv6:2607:f8f0:614:1::1274:39]) (using TLSv1.3 with cipher
 TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (P-256) server-signature RSA-PSS (2048 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "buxtehude.debian.org", Issuer "Debian SMTP CA" (not verified)) by bendel.debian.org (Postfix) with ESMTPS id DAD0220574; Fri, 10 Apr 2020 20:21:07 +0000 (UTC)
Received: from debbugs by buxtehude.debian.org with local (Exim 4.92) (envelope-from ) id 1jN09A-0005ep-0C; Fri, 10 Apr 2020 20:21:04 +0000
X-Loop: owner@bugs.debian.org
Subject: Bug#956418: src:glibc: Please provide optimized builds for ARMv8.1
Reply-To: Noah Meyerhans , 956418@bugs.debian.org
Resent-From: Noah Meyerhans
Resent-To: debian-bugs-dist@lists.debian.org
Resent-CC: debian-arm@lists.debian.org, GNU Libc Maintainers
X-Loop: owner@bugs.debian.org
Resent-Date: Fri, 10 Apr 2020 20:21:02 +0000
Resent-Message-ID:
X-Debian-PR-Message: report 956418
X-Debian-PR-Package: src:glibc
X-Debian-PR-Keywords:
X-Debian-PR-Source: glibc
Received: via spool by submit@bugs.debian.org id=B.158654993420946 (code B); Fri, 10 Apr 2020 20:21:02 +0000
Received: (at submit) by bugs.debian.org; 10 Apr 2020 20:18:54 +0000
Received: from smtp-fw-6001.amazon.com ([52.95.48.154]:17262) by buxtehude.debian.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1jN073-0005Mg-R2 for submit@bugs.debian.org; Fri, 10 Apr 2020 20:18:54 +0000
IronPort-SDR: CsEBeqOGoUt7POPMQPBaAWikdkcEMdBtVdHD8egMikPWOcAwDLpERU4i4VhzV84anv6sGHWI26 zi9SmzCJvVtQ==
X-Amazon-filename: a.c
X-IronPort-AV: E=Sophos;i="5.72,368,1580774400"; d="c'?scan'208";a="26471859"
Received: from iad12-co-svc-p1-lb1-vlan3.amazon.com (HELO email-inbound-relay-1d-5dd976cd.us-east-1.amazon.com) ([10.43.8.6]) by smtp-border-fw-out-6001.iad6.amazon.com with ESMTP; 10 Apr 2020 20:18:38 +0000
Received: from EX13MTAUEE002.ant.amazon.com (iad55-ws-svc-p15-lb9-vlan3.iad.amazon.com [10.40.159.166]) by email-inbound-relay-1d-5dd976cd.us-east-1.amazon.com
 (Postfix) with ESMTPS id 95A70A1C6C; Fri, 10 Apr 2020 20:18:37 +0000 (UTC)
Received: from EX13D03UEE001.ant.amazon.com (10.43.62.140) by EX13MTAUEE002.ant.amazon.com (10.43.62.24) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 10 Apr 2020 20:18:36 +0000
Received: from EX13MTAUEE002.ant.amazon.com (10.43.62.24) by EX13D03UEE001.ant.amazon.com (10.43.62.140) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 10 Apr 2020 20:18:36 +0000
Received: from u310cf36631d552.ant.amazon.com (10.119.88.48) by mail-relay.amazon.com (10.43.62.224) with Microsoft SMTP Server (TLS) id 15.0.1497.2 via Frontend Transport; Fri, 10 Apr 2020 20:18:36 +0000
Received: from nmeyerha by u310cf36631d552.ant.amazon.com with local (Exim 4.90_1) (envelope-from ) id 1jN04l-0006Se-Ng; Fri, 10 Apr 2020 13:16:31 -0700
Date: Fri, 10 Apr 2020 13:16:31 -0700
From: Noah Meyerhans
To: Debian Bug Tracking System
Message-ID: <20200410201631.GA23377@amazon.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BXVAT5kNtrzKuDFl"
Content-Disposition: inline
User-Agent: Mutt/1.9.4 (2018-02-28)
Delivered-To: submit@bugs.debian.org
X-Rc-Virus: 2007-09-13_01
X-Rc-Spam: 2008-11-04_01
X-Mailing-List: archive/latest/22571
X-Loop: debian-arm@lists.debian.org
List-Id:
List-URL:
List-Post:
List-Help:
List-Subscribe:
List-Unsubscribe:
Precedence: list
Resent-Sender: debian-arm-request@lists.debian.org
List-Archive: https://lists.debian.org/msgid-search/20200410201631.GA23377@amazon.com
X-TUID: 3SQ0Qw5s1KI8

--BXVAT5kNtrzKuDFl
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline

Package: src:glibc
Version: 2.30-4
Severity: wishlist
X-Debbugs-CC: debian-arm@lists.debian.org

The ARMv8.1 spec, as implemented by the ARM Neoverse N1 processor,
introduces a set of instructions [1] that result in significant performance
improvements for multithreaded applications. Sample code demonstrating the
performance improvements is attached.
When run on a 16-core Neoverse N1 host with glibc 2.30-4, runtimes vary
significantly, ranging from lows around 250ms to highs around 15 seconds.
When linked against glibc rebuilt with support for these instructions,
runtimes are consistently <50ms. Significant performance impact has also
been observed in less contrived cases (MariaDB and Postgres), but I don't
have a repro to share.

GCC provides two ways to enable support for these instructions at build
time. The simplest, and least disruptive, is to enable -moutline-atomics
globally in the arm64 glibc build. As described at [2], this option
enables runtime checks for the availability of the atomic instructions.
If found, they are used; otherwise ARMv8.0-compatible code is used. The
drawback of this option is that the check happens at runtime, thus
introducing some overhead on all arm64 installations.

The second option is to provide libraries built with explicit support for
the ARM v8.1a spec via the -march=armv8.1-a flag. This option is also
described at [2]. This build would be incompatible with earlier versions
of the spec, so it would need to be provided in a location where the
linker will automatically discover it if it is usable
(e.g. /lib/aarch64-linux-gnu/atomics/). This does not incur any runtime
overhead, but obviously involves an additional libc build, and the
corresponding complexity and disk space utilization. I'm not sure if this
is an option that the glibc maintainers are interested in pursuing.

I've tested both options and found them to be acceptable on v8.1a
(Neoverse N1) and v8a (Cortex A72) CPUs. I can provide bulk test run data
of the various configuration permutations if you'd like to see additional
data.

I can provide patches or merge requests implementing either option, at
least for a starting point, if you'd like to see them.

Thanks!
noah

1. https://static.docs.arm.com/ddi0557/a/DDI0557A_b_armv8_1_supplement.pdf
   Section B1
2.
https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html

--BXVAT5kNtrzKuDFl
Content-Type: text/x-csrc; charset="us-ascii"
Content-Disposition: attachment; filename="a.c"

/*
 * Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
 *
 * Licensed under the Apache License, Version 2.0 (the "License"). You may
 * not use this file except in compliance with the License. A copy of the
 * License is located at
 *
 *      http://aws.amazon.com/apache2.0/
 *
 * or in the "license" file accompanying this file. This file is distributed
 * on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
 * express or implied. See the License for the specific language governing
 * permissions and limitations under the License.
 */

/* Build with:
 *   gcc -O2 -o a.out a.c -lpthread -DITER=1000 -DTHREADS=64
 */

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#ifndef ITER
# define ITER 1000
#endif

#ifndef THREADS
# define THREADS 3
#endif

#if THREADS < 1
# error "THREADS is supposed to be at least 1"
#endif

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_ptr = 0;

typedef struct stats_s {
  uint64_t min, max;
  int times;
  uint64_t total;
  uint64_t flips;
} stats_t;

/* One slot per worker thread, plus a final slot for the main thread. */
stats_t stats[THREADS + 1];
pthread_t threads[THREADS];

#ifdef __aarch64__
static uint64_t cpu_shift()
{
  uint64_t shift = 0;
  __asm__ __volatile__ ("mrs %0,cntfrq_el0; clz %w0, %w0":"=&r"(shift));
  return shift;
}
#endif

/* Read the CPU's cycle/tick counter. */
static uint64_t gettime()
{
#ifdef __aarch64__
  uint64_t ret = 0;
  __asm__ __volatile__ ("isb; mrs %0,cntvct_el0":"=r"(ret));
  return ret << cpu_shift();
#elif defined __x86_64__
  uint64_t a, d;
  __asm__ __volatile__ ("rdtsc" : "=a" (a), "=d" (d));
  return ((uint64_t)a + ((uint64_t)d << 32));
#endif
  return 0;
}

static void init_stats()
{
  int i;
  for (i = 0; i <= THREADS; i++) {
    stats_t *s = &stats[i];
    s->min = 1000000;
    s->max = 0;
    s->times = 0;
    s->total = 0;
    s->flips = 0;
  }
}

static void print_stat(int i)
{
  stats_t *s = &stats[i];
  float average = (float) s->total / s->times;
  if (i == THREADS)
    fprintf(stdout, "server: min=%ld, max=%ld, average=%f, mutexes_locked=%d, flips=%ld\n",
            s->min, s->max, average, s->times, s->flips);
  else
    fprintf(stdout, "thread %d: min=%ld, max=%ld, average=%f, mutexes_locked=%d, flips=%ld\n",
            i, s->min, s->max, average, s->times, s->flips);
}

static void print_stats()
{
  int i;
  for (i = 0; i <= THREADS; i++)
    print_stat(i);
}

static void update_stats(stats_t *s, uint64_t time)
{
  ++s->times;
  if (time < s->min)
    s->min = time;
  if (time > s->max)
    s->max = time;
  s->total += time;
}

/* Take the mutex repeatedly until shared_ptr equals `check`, then flip it
   to `set`.  Every lock/unlock round trip is timed. */
static void fun(int check, int set, stats_t *stat)
{
  int loop = 1;
  while (loop) {
    uint64_t start = gettime();
    pthread_mutex_lock (&lock);
    if (shared_ptr == check) {
      loop = 0;
      ++stat->flips;
      shared_ptr = set;
    }
    pthread_mutex_unlock (&lock);
    update_stats(stat, gettime() - start);
  }
}

static void *tf (void *arg)
{
  int i;
  stats_t *stat = NULL;
  pthread_t tid = pthread_self();
  for (i = 0; i < THREADS; i++)
    if (tid == threads[i]) {
      stat = &stats[i];
      break;
    }
  /* Run until canceled. */
  while(1)
    fun(1, 0, stat);
  return NULL;
}

int main (int argc, char **argv)
{
  int i;
  for (i = 0; i < THREADS; i++) {
    if (pthread_create (&threads[i], NULL, tf, NULL) != 0) {
      puts ("pthread_create failed");
      exit (1);
    }
  }
  init_stats();
  for (i = 0; i < ITER; i++)
    fun(0, 1, &stats[THREADS]);
  for (i = 0; i < THREADS; i++) {
    if (pthread_cancel (threads[i]) != 0) {
      puts ("pthread_cancel failed");
      exit (1);
    }
  }
  print_stats();
  return 0;
}

--BXVAT5kNtrzKuDFl--

--AqsLC8rIMeq19msA--

--VrqPEDrXMn8OVzN4
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIzBAABCgAdFiEEoov0DD5VE3JmLRT3Qarn3Mo9g1EFAl6SGJ0ACgkQQarn3Mo9
g1H7RxAAjM0G5TCspKfzsOXghWTJJMtlZnSOv1xYLtEZ5KaWgsTNuwXxyBsQgWDj
bkzXf99YVwki2UxLv0zxVGrpcAO+UC6I08Kq7uwVI7tuNiIDspV75wTJ6ePfaNMo
nf5lST0rMg+PWvdlcLzWTlDoyOdOF9XyAzrZaTURVv5+S4nPPXoaHB/BdTg3ckm0
7QU/+u4rbcyBw5yH+f1ruZaOSc93jR+XCfVvs+8fLN4IFtCQUPCgkEGMny35wtCT
PavyNbYe2N0z5Vcp/rWnmVSqv6Hrdhu5n08Spy9isXtRWGFJQEBA6jv/Ge8eX7ww
B+Q6JZIJ5zPiO8qeT9HbQBxpBzq2BFLbxct1Ux4YBPTm2LgPL5O3oEWA2daaCxsb
SSTL4hgrPIqrDWmGxr19GdEGSRO35Ti5NVMw5EfJWo/CbeXrQSUaMOEi8ELwhRJ8
oRqIlcm6X/2dkKlhhkxMeiPl7GkfBHa7hLtwRVvgxNJnFM8qmitzLeBbTRcu25Tf
wp4YIKIuvnCEorUZ/457tEJHBQ55npaTipMuzbb12u1Y2Mpz2B454KmBRe3iNfHG
vt9sgGJDDWCUKqs/iw01IBuZBqK7ryQe28NKbKMhSM6zEbg57ezEuNww8gCwgrQn
OboCl/sfJC0TkaEBbxeCWF1EzuwkP+aYk4JQekwx29waTFVsiZc=
=PdcY
-----END PGP SIGNATURE-----

--VrqPEDrXMn8OVzN4--