From: "Ludovic Courtès" <ludo@gnu.org>
To: Rob Browning <rlb@defaultvalue.org>
Cc: 56413@debbugs.gnu.org
Subject: bug#56413: [PATCH 1/1] scm_i_utf8_string_hash: compute u8 chars not bytes
Date: Sat, 05 Nov 2022 23:18:26 +0100
Message-ID: <87zgd5gi4t.fsf@gnu.org>
In-Reply-To: <20220706012323.1024763-1-rlb@defaultvalue.org> (Rob Browning's message of "Tue, 5 Jul 2022 20:23:23 -0500")
Hi,
Rob Browning <rlb@defaultvalue.org> writes:
> Noticed while investigating a migration to utf-8 strings. After making
> changes that routed non-ascii symbol hashing through this function,
> encoding-iso88597.test began intermittently failing because it would
> traverse trailing garbage when u8_strnlen reported 8 chars instead of 4.
>
> Change the scm_i_str2symbol internal hash type to unsigned long to
> explicitly match the hashing result type.
Oh, good catch.
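To make the bytes-versus-characters distinction concrete, here is a minimal sketch in plain C. The helper name utf8_char_count is hypothetical and is not part of Guile or libunistring; it assumes the input is well-formed UTF-8 and simply counts the bytes that are not continuation bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only (not Guile's code):
   return the number of code points in the first LEN bytes of S.
   Assumes S is well-formed UTF-8, so every code point contributes
   exactly one byte that is not a continuation byte (10xxxxxx).  */
static size_t
utf8_char_count (const unsigned char *s, size_t len)
{
  size_t chars = 0;

  for (size_t i = 0; i < len; i++)
    if ((s[i] & 0xC0) != 0x80)   /* lead byte or ASCII byte */
      chars++;

  return chars;
}

/* Hypothetical helper: byte length of a NUL-terminated UTF-8 string.
   This is what a plain strlen-style scan reports, and it is larger
   than the character count whenever the string is not pure ASCII.  */
static size_t
utf8_byte_count (const unsigned char *s)
{
  return strlen ((const char *) s);
}
```

For the eight-byte test string in the patch quoted below, utf8_byte_count returns 8 and utf8_char_count returns 4. A loop that decodes 8 characters from that buffer would therefore run past the terminating NUL, which matches the trailing-garbage symptom described in the report.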
For the final patch please add a ChangeLog-style entry.
> + // Make sure a utf-8 symbol has the expected hash. In addition to
> + // catching algorithmic regressions, this would have caught a
> + // long-standing buffer overflow.
> +
> + // περί
> + char about_u8[] = {0xce, 0xa0, 0xce, 0xb5, 0xcf, 0x81, 0xce, 0xaf, 0};
> + SCM sym = scm_from_utf8_symbol (about_u8);
> +
> + const unsigned long expect = 4029223418961680680;
> + const unsigned long actual = scm_to_ulong (scm_symbol_hash (sym));
Is this a documented example of Jenkins? Or did you use a reference
implementation?
> Hmm. I suppose the current test could be handled on the scheme side
> instead. (I'd started off attempting some more direct, elaborate tests
> that didn't pan out.) Happy to rework that if desired.
Yes, it may be nicer to have it in ‘test-suite/tests/hash.test’.
AFAICS this will only change the hash of UTF-8 symbols and won’t have
any effect on the output of ‘string-hash’, right? If not that would be
an incompatibility.
Thanks and sorry for the delay!
Ludo’.