From f5140bf63c624b975f1dcf98dcf18bf7cc44abfa Mon Sep 17 00:00:00 2001
From: Jean Abou Samra
Date: Sun, 11 Dec 2022 12:28:02 +0100
Subject: [PATCH 2/2] Doc: clarification on regexes and encodings

* doc/ref/api-regex.texi: make it more obviously clear that regexp
matching supports only characters supported by the locale encoding.
---
 doc/ref/api-regex.texi | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/doc/ref/api-regex.texi b/doc/ref/api-regex.texi
index b14c2b39c..d778f969f 100644
--- a/doc/ref/api-regex.texi
+++ b/doc/ref/api-regex.texi
@@ -57,7 +57,11 @@ locale's encoding, and then passed to the C library's regular
 expression routines (@pxref{Regular Expressions,,, libc, The GNU C
 Library Reference Manual}). The returned match structures always point
 to characters in the strings, not to individual bytes, even in the case of
-multi-byte encodings.
+multi-byte encodings. This ensures that the match structures are
+correct when performing matching with characters that have a multi-byte
+representation in the locale encoding. Note, however, that using
+characters which cannot be represented in the locale encoding can
+lead to surprising results.
 
 @deffn {Scheme Procedure} string-match pattern str [start]
 Compile the string @var{pattern} into a regular expression and compare
@@ -325,7 +329,7 @@ example the following is the date example from @code{string-match}
 call.
 
 @lisp
-(define date-regex
+(define date-regex
   "([0-9][0-9][0-9][0-9])([0-9][0-9])([0-9][0-9])")
 (define s "Date 20020429 12am.")
 (regexp-substitute/global #f date-regex s
-- 
2.38.1