From: Rob Browning
Newsgroups: gmane.lisp.guile.bugs
Subject: bug#56413: [PATCH 1/1] scm_i_utf8_string_hash: compute u8 chars not bytes
Date: Tue, 5 Jul 2022 20:23:23 -0500
Message-ID: <20220706012323.1024763-1-rlb@defaultvalue.org>
Content-Type: text/plain; charset=UTF-8
To: 56413@debbugs.gnu.org

Noticed while investigating a migration to UTF-8 strings. After making
changes that routed non-ASCII symbol hashing through this function,
encoding-iso88597.test began failing intermittently: u8_strnlen
reported 8 (the length in bytes) rather than 4 (the length in
characters), so the hash loop traversed trailing garbage past the end
of the string. Count characters with u8_mbsnlen instead.

Change the scm_i_str2symbol internal hash type to unsigned long to
explicitly match the hashing result type.
---

Proposed for at least main.

 libguile/hash.c                      |  2 +-
 libguile/symbols.c                   |  2 +-
 test-suite/standalone/Makefile.am    |  7 ++++
 test-suite/standalone/test-hashing.c | 61 ++++++++++++++++++++++++++++
 4 files changed, 70 insertions(+), 2 deletions(-)
 create mode 100644 test-suite/standalone/test-hashing.c

diff --git a/libguile/hash.c b/libguile/hash.c
index 93431102f..0740b2645 100644
--- a/libguile/hash.c
+++ b/libguile/hash.c
@@ -188,7 +188,7 @@ scm_i_utf8_string_hash (const char *str, size_t len)
     /* Invalid UTF-8; punt.  */
     return scm_i_string_hash (scm_from_utf8_stringn (str, len));
 
-  length = u8_strnlen (ustr, len);
+  length = u8_mbsnlen (ustr, len);
 
   /* Set up the internal state.  */
   a = b = c = 0xdeadbeef + ((uint32_t)(length<<2)) + 47;
diff --git a/libguile/symbols.c b/libguile/symbols.c
index ad5f22f57..cd9cda3de 100644
--- a/libguile/symbols.c
+++ b/libguile/symbols.c
@@ -239,7 +239,7 @@ static SCM
 scm_i_str2symbol (SCM str)
 {
   SCM symbol;
-  size_t raw_hash = scm_i_string_hash (str);
+  unsigned long raw_hash = scm_i_string_hash (str);
 
   symbol = lookup_interned_symbol (str, raw_hash);
   if (scm_is_true (symbol))
diff --git a/test-suite/standalone/Makefile.am b/test-suite/standalone/Makefile.am
index e87100c96..ca1b3131b 100644
--- a/test-suite/standalone/Makefile.am
+++ b/test-suite/standalone/Makefile.am
@@ -167,6 +167,13 @@ test_conversion_LDADD = $(LIBGUILE_LDADD) $(top_builddir)/lib/libgnu.la
 check_PROGRAMS += test-conversion
 TESTS += test-conversion
 
+# test-hashing
+test_hashing_SOURCES = test-hashing.c
+test_hashing_CFLAGS = ${test_cflags}
+test_hashing_LDADD = $(LIBGUILE_LDADD) $(top_builddir)/lib/libgnu.la
+check_PROGRAMS += test-hashing
+TESTS += test-hashing
+
 # test-loose-ends
 test_loose_ends_SOURCES = test-loose-ends.c
 test_loose_ends_CFLAGS = ${test_cflags}
diff --git a/test-suite/standalone/test-hashing.c b/test-suite/standalone/test-hashing.c
new file mode 100644
index 000000000..476181fe2
--- /dev/null
+++ b/test-suite/standalone/test-hashing.c
@@ -0,0 +1,61 @@
+/* Copyright 2022
+     Free Software Foundation, Inc.
+
+   This file is part of Guile.
+
+   Guile is free software: you can redistribute it and/or modify it
+   under the terms of the GNU Lesser General Public License as published
+   by the Free Software Foundation, either version 3 of the License, or
+   (at your option) any later version.
+
+   Guile is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+   FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
+   License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with Guile.  If not, see
+   .  */
+
+#if HAVE_CONFIG_H
+# include <config.h>
+#endif
+
+#include <libguile.h>
+
+#include <stdio.h>
+
+static void
+test_hashing (void)
+{
+  // Make sure a utf-8 symbol has the expected hash.  In addition to
+  // catching algorithmic regressions, this would have caught a
+  // long-standing buffer overflow.
+
+  // Περί
+  char about_u8[] = "\xce\xa0\xce\xb5\xcf\x81\xce\xaf";
+  SCM sym = scm_from_utf8_symbol (about_u8);
+
+  const unsigned long expect = 4029223418961680680;
+  const unsigned long actual = scm_to_ulong (scm_symbol_hash (sym));
+
+  if (actual != expect)
+    {
+      fprintf (stderr, "fail: unexpected utf-8 symbol hash (%lu != %lu)\n",
+               actual, expect);
+      exit (EXIT_FAILURE);
+    }
+}
+
+static void
+tests (void *data, int argc, char **argv)
+{
+  test_hashing ();
+}
+
+int
+main (int argc, char *argv[])
+{
+  scm_boot_guile (argc, argv, tests, NULL);
+  return 0;
+}
-- 
2.30.2