From: Ludovic Courtès
To: Rob Browning
Cc: 56413@debbugs.gnu.org
Newsgroups: gmane.lisp.guile.bugs
Subject: bug#56413: [PATCH 1/1] scm_i_utf8_string_hash: compute u8 chars not bytes
Date: Sat, 05 Nov 2022 23:18:26 +0100
Message-ID: <87zgd5gi4t.fsf@gnu.org>
References: <20220706012323.1024763-1-rlb@defaultvalue.org>

Hi,

Rob Browning wrote:

> Noticed while investigating a migration to utf-8 strings. After making
> changes that routed non-ascii symbol hashing through this function,
> encoding-iso88597.test began intermittently failing because it would
> traverse trailing garbage when u8_strnlen reported 8 chars instead of 4.
>
> Change the scm_i_str2symbol internal hash type to unsigned long to
> explicitly match the hashing result type.

Oh, good catch.

For the final patch please add a ChangeLog-style entry.

> +  // Make sure a utf-8 symbol has the expected hash. In addition to
> +  // catching algorithmic regressions, this would have caught a
> +  // long-standing buffer overflow.
> +
> +  // περί
> +  char about_u8[] = {0xce, 0xa0, 0xce, 0xb5, 0xcf, 0x81, 0xce, 0xaf, 0};
> +  SCM sym = scm_from_utf8_symbol (about_u8);
> +
> +  const unsigned long expect = 4029223418961680680;
> +  const unsigned long actual = scm_to_ulong (scm_symbol_hash (sym));

Is this a documented example of Jenkins? Or did you use a reference
implementation?

> Hmm. I suppose the current test could be handled on the scheme side
> instead. (I'd started off attempting some more direct, elaborate tests
> that didn't pan out.) Happy to rework that if desired.

Yes, it may be nicer to have it in ‘test-suite/tests/hash.test’.

AFAICS this will only change the hash of UTF-8 symbols and won’t have
any effect on the output of ‘string-hash’, right? If not, that would be
an incompatibility.

Thanks and sorry for the delay!

Ludo’.