* [PATCH 0/3] Documentation improvements
From: Andrew Tropin @ 2024-06-25 11:20 UTC
To: guile-devel; +Cc: Andrew Tropin

Fix spelling, mentions of removed code and factual inaccuracies.

Andrew Tropin (3):
  Make string-length documentation more correct
  Change make-dynamic-state mentions to current-dynamic-state
  Fix spelling

 doc/r5rs/r5rs.texi          | 2 +-
 doc/ref/api-scheduling.texi | 2 +-
 doc/ref/srfi-modules.texi   | 2 +-
 doc/ref/vm.texi             | 2 +-
 libguile/fluids.c           | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

--
2.45.1
* [PATCH 1/3] Make string-length documentation more correct
From: Andrew Tropin @ 2024-06-25 11:20 UTC
To: guile-devel; +Cc: Andrew Tropin

* doc/r5rs/r5rs.texi:
---
 doc/r5rs/r5rs.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/r5rs/r5rs.texi b/doc/r5rs/r5rs.texi
index 775c93094..f2e9dda19 100644
--- a/doc/r5rs/r5rs.texi
+++ b/doc/r5rs/r5rs.texi
@@ -5846,7 +5846,7 @@ Returns a newly allocated string composed of the arguments.
 @deffn {procedure} string-length string
-Returns the number of characters in the given @var{string}.
+Returns the number of bytes in the given @var{string}.
 @end deffn

--
2.45.1
* RE: [PATCH 1/3] Make string-length documentation more correct
From: Maxime Devos @ 2024-06-25 11:27 UTC
To: Andrew Tropin, guile-devel@gnu.org; +Cc: Andrew Tropin

> -Returns the number of characters in the given @var{string}.
> +Returns the number of bytes in the given @var{string}.

This is false. For example, (string-length "😀") is 1, whereas in all encodings I know of it takes more than one byte. Also, R5RS says:

> procedure: string-length string
> Returns the number of characters in the given string.

not “return the number of bytes”. Without mentioning the encoding, the “number of bytes” would be ill-defined anyway.

Best regards,
Maxime Devos.
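The character-versus-byte distinction raised in this reply can be checked at a Guile prompt. The following is only a minimal sketch, assuming a Guile that ships the (rnrs bytevectors) module; the names `smiley` and `e-acute` are illustrative, and the strings are built from code points so the result does not depend on the encoding of the source file.

```scheme
(use-modules (rnrs bytevectors))

;; U+1F600 GRINNING FACE and U+00E9 LATIN SMALL LETTER E WITH ACUTE,
;; built from their code points (illustrative names).
(define smiley  (string (integer->char #x1F600)))
(define e-acute (string (integer->char #x00E9)))

;; string-length counts characters (code points), not bytes.
(display (string-length smiley))  (newline)   ; prints 1
(display (string-length e-acute)) (newline)   ; prints 1

;; The byte count only exists once an encoding is chosen.
(display (bytevector-length (string->utf8  smiley)))  (newline) ; prints 4
(display (bytevector-length (string->utf8  e-acute))) (newline) ; prints 2
(display (bytevector-length (string->utf16 e-acute))) (newline) ; prints 2
(display (bytevector-length (string->utf32 e-acute))) (newline) ; prints 4
```

The same one-character string occupies 2 or 4 bytes depending on the encoding, which is why a byte count is ill-defined unless the encoding is named.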
* RE: [PATCH 1/3] Make string-length documentation more correct
From: Andrew Tropin @ 2024-06-26 11:18 UTC
To: Maxime Devos, guile-devel@gnu.org

On 2024-06-25 13:27, Maxime Devos wrote:

>> -Returns the number of characters in the given @var{string}.
>> +Returns the number of bytes in the given @var{string}.
>
> This is false. For example, (string-length "😀") is 1, whereas in all encodings I know of it is more than one byte. Also, R5RS says:

Maybe `the number of codepoints` will work here.

(string-length "👨‍🏭") ;; => 3
(string-length "é") ;; => 2

The number of characters here is 1 in both cases.

>> procedure: string-length string
>> Returns the number of characters in the given string.
>
> , not “return the number of bytes”. Without mentioning the encoding, the “number of bytes” would be ill-defined anyways.

--
Best regards,
Andrew Tropin
* RE: [PATCH 1/3] Make string-length documentation more correct
From: Maxime Devos @ 2024-06-26 11:46 UTC
To: Andrew Tropin, guile-devel@gnu.org

>>> -Returns the number of characters in the given @var{string}.
>>> +Returns the number of bytes in the given @var{string}.
>>
>> This is false. For example, (string-length "😀") is 1, whereas in all encodings I know of it is more than one byte. Also, R5RS says: [...]
>
> Maybe `the number of codepoints` will work here.
>
> (string-length "👨‍🏭") ;; => 3
> (string-length "é") ;; => 2
>
> The number of characters here is 1 in both cases.

No, in Unicode (and Guile equates character=Unicode character) all characters correspond to a single codepoint.

You need to fix your setup, that’s not what Guile does. Are you sure you have set the encoding of current-input-port correctly? (Probably by setting LC_ALL or the like to a UTF-8 locale.) Otherwise the 3 bytes in the UTF-8 encoding might be interpreted in terms of some 8-bit encoding.

Here’s a test: if you can input #\👨‍🏭 without errors and it evaluates to #\👨‍🏭, then the encoding should be set up correctly.

Best regards,
Maxime Devos
* Re: [PATCH 1/3] Make string-length documentation more correct
From: tomas @ 2024-06-26 12:07 UTC
To: Maxime Devos; +Cc: Andrew Tropin, guile-devel@gnu.org

On Wed, Jun 26, 2024 at 01:46:28PM +0200, Maxime Devos wrote:

> >> >-Returns the number of characters in the given @var{string}.
> >> +Returns the number of bytes in the given @var{string}.
> >>
> >> This is false. For example, (string-length "😀") is 1, whereas in all encodings I know of it is more than one byte. Also, R5RS says: [...]
> >
> >Maybe `the number of codepoints` will work here.
> >
> >(string-length "👨‍🏭") ;; => 3
> >(string-length "é") ;; => 2
> >
> >The number of characters here is 1 in both cases.
>
> No, in Unicode (and Guile equates character=Unicode character) all characters correspond to a single codepoint.

It's more subtle than that: Unicode knows about "combining characters", so it's quite possible that Andrew's "é" consists of two code points (FWIW, it arrives to me as just one, but perhaps there was some canonicalization [1] step in between).

ISTR that "Unicode character" is actually synonymous with "Unicode code point" -- but the common meaning of "character" is more fuzzy. Perhaps it's wise to avoid that word when trying to be precise.

Cheers

[1] https://en.wikipedia.org/wiki/Unicode_normalization

--
t
* RE: [PATCH 1/3] Make string-length documentation more correct
From: Maxime Devos @ 2024-06-26 12:09 UTC
To: tomas@tuxteam.de; +Cc: Andrew Tropin, guile-devel@gnu.org

> ISTR that "Unicode character" is actually synonymous with "Unicode code point" -- but the common meaning of "character" is more fuzzy. Perhaps it's wise to avoid that word when trying to be precise.

My second point was that it is too late for that, unless you intend to rename procedures like char? etc.
* Re: [PATCH 1/3] Make string-length documentation more correct
From: Jean Abou Samra @ 2024-06-26 12:18 UTC
To: Maxime Devos, Andrew Tropin, guile-devel@gnu.org

On Wednesday, 26 June 2024 at 13:46 +0200, Maxime Devos wrote:

> > Maybe `the number of codepoints` will work here.
> >
> > (string-length "👨‍🏭") ;; => 3
> > (string-length "é") ;; => 2
> >
> > The number of characters here is 1 in both cases.
>
> No, in Unicode (and Guile equates character=Unicode character) all
> characters correspond to a single codepoint.

Agreed. "The number of code points" would be correct, but "the number of characters" (i.e., the current wording) is correct too. In the Scheme terminology, a character is just a Unicode code point, as can be seen from the name of the procedure char? and related APIs.

> You need to fix your setup, that’s not what Guile does.

No; he wrote é, U+0065 LATIN SMALL LETTER E + U+0301 COMBINING ACUTE ACCENT, which is two characters unlike é, U+00E9 LATIN SMALL LETTER E WITH ACUTE.

Likewise 👨‍🏭 is U+1F468 MAN + U+200D ZERO WIDTH JOINER + U+1F3ED FACTORY.

The "visual characters" are called grapheme clusters, and AFAIK Guile doesn't provide any API that relates to grapheme clusters. (Note that the number of grapheme clusters in a given string depends on the Unicode database and therefore on the Unicode version.)

There are programming languages where the data type called "character" corresponds to grapheme clusters, but I don't think this is common. Swift is the only example I know.

Obligatory reading: https://hsivonen.fi/string-length/
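The two spellings of "é" discussed above can be compared directly. This is a small sketch, assuming a Guile that provides `string-normalize-nfc` and `string-normalize-nfd`; the names `decomposed` and `precomposed` are illustrative only.

```scheme
;; "é" as two code points: U+0065 followed by U+0301 COMBINING ACUTE ACCENT.
(define decomposed (string #\e (integer->char #x0301)))

;; "é" as a single code point: U+00E9 LATIN SMALL LETTER E WITH ACUTE.
(define precomposed (string (integer->char #x00E9)))

(display (string-length decomposed))  (newline)   ; prints 2
(display (string-length precomposed)) (newline)   ; prints 1

;; The strings differ code point by code point ...
(display (string=? decomposed precomposed)) (newline)   ; prints #f

;; ... but canonical normalization maps one form onto the other.
(display (string=? (string-normalize-nfc decomposed) precomposed))
(newline)                                                ; prints #t
(display (string-length (string-normalize-nfd precomposed)))
(newline)                                                ; prints 2
```

Both strings render as one visible glyph, so `string-length` reports code points in either case, never grapheme clusters.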
* RE: [PATCH 1/3] Make string-length documentation more correct
From: Maxime Devos @ 2024-06-26 12:26 UTC
To: Jean Abou Samra, Andrew Tropin, guile-devel@gnu.org

> No; he wrote é, U+0065 LATIN SMALL LETTER E + U+0301 COMBINING ACUTE ACCENT,
> which is two characters unlike é, LATIN SMALL LETTER E WITH ACUTE.
>
> Likewise 👨‍🏭 is U+1F468 MAN + U+200D ZERO WIDTH JOINER + U+1F3ED FACTORY.

Right, I should have tested that instead of assuming it’s the pre-combined é and a single-codepoint emoji.
* Re: [PATCH 1/3] Make string-length documentation more correct
From: Damien Mattei @ 2024-06-26 14:47 UTC
To: guile-devel@gnu.org

and how long for this one? 𓄿𓎢𓆑𓇋𓅃𓉔𓇌𓃀𓆓 😂
* RE: [PATCH 1/3] Make string-length documentation more correct
From: Andrew Tropin @ 2024-06-28 13:42 UTC
To: Maxime Devos, Jean Abou Samra, guile-devel@gnu.org

On 2024-06-26 14:26, Maxime Devos wrote:

>> No; he wrote é, U+0065 LATIN SMALL LETTER E + U+0301 COMBINING ACUTE ACCENT,
>> which is two characters unlike é, LATIN SMALL LETTER E WITH ACUTE.
>>
>> Likewise 👨‍🏭 is U+1F468 MAN + U+200D ZERO WIDTH JOINER + U+1F3ED FACTORY.
>
> Right, I should have tested that instead of assuming it’s the
> pre-combined é and a single-codepoint emoji.

Let's keep the string-length documentation intact :)

It would be cool if somebody applied the remaining two patches.

--
Best regards,
Andrew Tropin
* RE: [PATCH 1/3] Make string-length documentation more correct
From: Andrew Tropin @ 2024-06-28 13:38 UTC
To: Maxime Devos, guile-devel@gnu.org

On 2024-06-26 13:46, Maxime Devos wrote:

>>> -Returns the number of characters in the given @var{string}.
>>> +Returns the number of bytes in the given @var{string}.
>>>
>>> This is false. For example, (string-length "😀") is 1, whereas in all encodings I know of it is more than one byte. Also, R5RS says: [...]
>>
>> Maybe `the number of codepoints` will work here.
>>
>> (string-length "👨‍🏭") ;; => 3
>> (string-length "é") ;; => 2
>>
>> The number of characters here is 1 in both cases.
>
> No, in Unicode (and Guile equates character=Unicode character) all characters correspond to a single codepoint.
>
> You need to fix your setup, that’s not what Guile does. Are you sure you have set the encoding of current-input-port correctly? (Probably by setting LC_ALL or the like to a UTF-8 locale.) Otherwise the 3 bytes in the UTF-8 encoding might be interpreted in terms of some 8-bit encoding.
>
> Here’s a test: if you can input #\👨‍🏭 without errors and it evaluates to #\👨‍🏭, then the encoding should be set up correctly.

(setlocale LC_ALL) ;; => "en_US.utf8"
(display #\👨‍🏭)
;; => /home/bob/guile-ares-rs/dev/guile/tmp.scm:84:15: unknown character name 👨‍🏭

The same happens if I do it in the usual REPL: LC_ALL=en_US.utf8 guile

--
Best regards,
Andrew Tropin
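The reader error reported here is consistent with the analysis elsewhere in the thread: a `#\` literal names exactly one character, while this emoji is a sequence of three. A minimal sketch, building the sequence from the code points quoted in the thread (the name `factory-worker` is illustrative):

```scheme
;; The emoji as the three code points listed in the thread.
(define factory-worker
  (string (integer->char #x1F468)    ; MAN
          (integer->char #x200D)     ; ZERO WIDTH JOINER
          (integer->char #x1F3ED)))  ; FACTORY

;; Three characters, although it renders as one glyph.
(display (string-length factory-worker)) (newline)   ; prints 3

;; Each element is an ordinary character with its own code point.
(display (map char->integer (string->list factory-worker)))
(newline)                                ; prints (128104 8205 127981)
```

Since no single character corresponds to the whole sequence, it can only be held in a string, not in a character literal.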
* [PATCH 2/3] Change make-dynamic-state mentions to current-dynamic-state
From: Andrew Tropin @ 2024-06-25 11:20 UTC
To: guile-devel; +Cc: Andrew Tropin

* doc/ref/api-scheduling.texi:
* libguile/fluids.c:
---
 doc/ref/api-scheduling.texi | 2 +-
 libguile/fluids.c           | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/ref/api-scheduling.texi b/doc/ref/api-scheduling.texi
index d79808049..f6cc942a1 100644
--- a/doc/ref/api-scheduling.texi
+++ b/doc/ref/api-scheduling.texi
@@ -199,7 +199,7 @@ A fluid created with @code{make-thread-local-fluid} won't be captured by
 Return a newly created fluid, whose initial value is @var{dflt}, or
 @code{#f} if @var{dflt} is not given. Unlike fluids made with
 @code{make-fluid}, thread local fluids are not captured by
-@code{make-dynamic-state}. Similarly, a newly spawned child thread does
+@code{current-dynamic-state}. Similarly, a newly spawned child thread does
 not inherit thread-local fluid values from the parent thread.
 @end deffn
diff --git a/libguile/fluids.c b/libguile/fluids.c
index ebdb48fbc..4632f32ae 100644
--- a/libguile/fluids.c
+++ b/libguile/fluids.c
@@ -264,7 +264,7 @@ SCM_DEFINE (scm_make_thread_local_fluid, "make-thread-local-fluid", 0, 1, 0,
 "Return a newly created fluid, whose initial value is @var{dflt},\n"
 "or @code{#f} if @var{dflt} is not given. Unlike fluids made\n"
 "with @code{make-fluid}, thread local fluids are not captured\n"
- "by @code{make-dynamic-state}. Similarly, a newly spawned\n"
+ "by @code{current-dynamic-state}. Similarly, a newly spawned\n"
 "child thread does not inherit thread-local fluid values from\n"
 "the parent thread.")
 #define FUNC_NAME s_scm_make_thread_local_fluid

--
2.45.1
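The behaviour described by the corrected docstring can be sketched as follows. This is an illustration under assumptions, not taken from the patch: it assumes a Guile that provides `make-thread-local-fluid`, `current-dynamic-state` and `with-dynamic-state`, and the names `ordinary`, `thread-local`, `captured` and `seen` are hypothetical.

```scheme
;; Two fluids with the same default value (illustrative names).
(define ordinary     (make-fluid 'outer))
(define thread-local (make-thread-local-fluid 'outer))

;; Snapshot the dynamic state while both fluids are rebound to 'inner.
(define captured
  (with-fluids ((ordinary 'inner)
                (thread-local 'inner))
    (current-dynamic-state)))

;; Re-enter the snapshot after the with-fluids bindings are gone.
(define seen
  (with-dynamic-state captured
    (lambda ()
      (list (fluid-ref ordinary)         ; binding carried by the snapshot
            (fluid-ref thread-local))))) ; per the docstring, not captured

(display seen)
(newline)
```

If the docstring is accurate, the first element is `inner` because the ordinary fluid's binding is part of the snapshot, while the second should fall back to the thread's own value, `outer`.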
* [PATCH 3/3] Fix spelling
From: Andrew Tropin @ 2024-06-25 11:20 UTC
To: guile-devel; +Cc: Andrew Tropin

* doc/ref/srfi-modules.texi:
* doc/ref/vm.texi:
---
 doc/ref/srfi-modules.texi | 2 +-
 doc/ref/vm.texi           | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/ref/srfi-modules.texi b/doc/ref/srfi-modules.texi
index 02da3e2f2..7e2295acd 100644
--- a/doc/ref/srfi-modules.texi
+++ b/doc/ref/srfi-modules.texi
@@ -5093,7 +5093,7 @@ wrap procedure bodies with @code{(lazy ...)}.
 @end itemize
 @node SRFI-46
-@subsection SRFI-46 Basic syntax-rules Extensions
+@subsection SRFI-46 - Basic syntax-rules Extensions
 @cindex SRFI-46
 Guile's core @code{syntax-rules} supports the extensions specified by
diff --git a/doc/ref/vm.texi b/doc/ref/vm.texi
index b0669f0d4..c0c4aa3c4 100644
--- a/doc/ref/vm.texi
+++ b/doc/ref/vm.texi
@@ -533,7 +533,7 @@ Side tables of procedure properties, arities, and docstrings.
 @item .guile.docstrs.strtab
 Side table of frame maps, describing the set of live slots for ever
 return point in the program text, and whether those slots are pointers
-are not. Used by the garbage collector.
+or not. Used by the garbage collector.
 @item .debug_info
 @itemx .debug_abbrev
 @itemx .debug_str

--
2.45.1