On 05/09/2022 at 21:24, Ludovic Courtès wrote:
> Yes, that’d be welcome. I would not call it a constraint or limitation;
> for example, that ‘w’ is not a letter in Swedish is the kind of thing
> you’d generally want to take into account. Now, it’d be nice if one
> could easily specify the locale to operate under, with an API similar to
> that of (ice-9 i18n) and its first-class locale objects.

Sorry that it took me forever to send this.

From c666ca4f72dc0a00d28b8d7ef1221ebfc9741551 Mon Sep 17 00:00:00 2001
From: Jean Abou Samra
Date: Thu, 17 Nov 2022 21:26:07 +0100
Subject: [PATCH] Doc: clarification on regexes and encodings

* doc/ref/api-regex.texi: make it more obviously clear that regexp
  matching supports only characters supported by the locale encoding.
---
 doc/ref/api-regex.texi | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/doc/ref/api-regex.texi b/doc/ref/api-regex.texi
index b14c2b39c..bd1f4079d 100644
--- a/doc/ref/api-regex.texi
+++ b/doc/ref/api-regex.texi
@@ -57,7 +57,11 @@ locale's encoding, and then passed to the C library's regular expression
 routines (@pxref{Regular Expressions,,, libc, The GNU C Library
 Reference Manual}).  The returned match structures always point to
 characters in the strings, not to individual bytes, even in the case of
-multi-byte encodings.
+multi-byte encodings.  This ensures that the match structures are
+correct when performing matching with characters that have a multi-byte
+representation in the locale encoding.  Note, however, that using
+characters which cannot be represented in the locale encoding can lead
+to surprising results.
 
 @deffn {Scheme Procedure} string-match pattern str [start]
 Compile the string @var{pattern} into a regular expression and compare
-- 
2.38.1