From: Jean Abou Samra <jean@abou-samra.fr>
To: Maxime Devos <maximedevos@telenet.be>,
Andrew Tropin <andrew@trop.in>,
"guile-devel@gnu.org" <guile-devel@gnu.org>
Subject: Re: [PATCH 1/3] Make string-length documentation more correct
Date: Wed, 26 Jun 2024 14:18:16 +0200
Message-ID: <0ed0f868d5faef088b4b2b7fa2d7457b5f6629b8.camel@abou-samra.fr>
In-Reply-To: <20240626134628.gBmU2C00E3K6y2F01BmUew@xavier.telenet-ops.be>
On Wednesday, 26 June 2024 at 13:46 +0200, Maxime Devos wrote:
> >
> > Maybe `the number of codepoints` will work here.
> > (string-length "👨🏭") ;; => 3
> > (string-length "é") ;; => 2
> >
> > The number of characters here is 1 in both cases.
>
> No, in Unicode (and Guile equates character=Unicode character) all
> characters correspond to a single codepoint.
Agreed. "The number of code points" would be correct, but "the number
of characters" (i.e., the current wording) is correct too. In the
Scheme terminology, a character is just a Unicode code point,
as can be seen from the name of the procedure character? and related
APIs.
> You need to fix your setup, that’s not what Guile does.
No; he wrote é, U+0065 LATIN SMALL LETTER E + U+0301 COMBINING ACUTE ACCENT,
which is two characters, unlike é, U+00E9 LATIN SMALL LETTER E WITH ACUTE.
Likewise 👨🏭 is U+1F468 MAN + U+200D ZERO WIDTH JOINER + U+1F3ED FACTORY.
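Since mail clients and editors may silently normalize such text, the counts are easier to reproduce when the strings are built from explicit code points. A minimal Guile sketch (the variable names are illustrative and not from the thread; it assumes only the standard string, integer->char, char->integer and string-length procedures):

```scheme
;; Build each string from explicit code points so that the result does
;; not depend on how the text was normalized in transit.
(define decomposed-e-acute               ; illustrative name
  (string #\e (integer->char #x0301)))   ; U+0065 + U+0301

(define precomposed-e-acute              ; illustrative name
  (string (integer->char #x00E9)))       ; U+00E9

(define man-zwj-factory                  ; illustrative name
  (string (integer->char #x1F468)        ; MAN
          (integer->char #x200D)         ; ZERO WIDTH JOINER
          (integer->char #x1F3ED)))      ; FACTORY

;; string-length counts Scheme characters, i.e. code points.
(display (string-length decomposed-e-acute))  (newline)  ; prints 2
(display (string-length precomposed-e-acute)) (newline)  ; prints 1
(display (string-length man-zwj-factory))     (newline)  ; prints 3

;; List the code points of the decomposed string as integers.
(display (map char->integer (string->list decomposed-e-acute)))
(newline)                                                ; prints (101 769)
```

All three strings display as a single "visual character", yet only the precomposed one has length 1.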
The "visual characters" are called grapheme clusters, and AFAIK Guile
doesn't provide any API that relates to grapheme clusters. (Note that
the number of grapheme clusters in a given string depends on the Unicode
database and therefore on the Unicode version.)
There are programming languages where the data type called "character"
corresponds to grapheme clusters, but I don't think this is common.
Swift is the only example I know.
Obligatory reading: https://hsivonen.fi/string-length/