unofficial mirror of guile-devel@gnu.org 
From: Jean Abou Samra <jean@abou-samra.fr>
To: Maxime Devos <maximedevos@telenet.be>,
	Andrew Tropin <andrew@trop.in>,
	 "guile-devel@gnu.org" <guile-devel@gnu.org>
Subject: Re: [PATCH 1/3] Make string-length documentation more correct
Date: Wed, 26 Jun 2024 14:18:16 +0200
Message-ID: <0ed0f868d5faef088b4b2b7fa2d7457b5f6629b8.camel@abou-samra.fr>
In-Reply-To: <20240626134628.gBmU2C00E3K6y2F01BmUew@xavier.telenet-ops.be>

On Wednesday, 26 June 2024 at 13:46 +0200, Maxime Devos wrote:
> > 
> > Maybe `the number of codepoints` will work here.
> > (string-length "👨‍🏭") ;; => 3
> > (string-length "é") ;; => 2
> > 
> > The number of characters here is 1 in both cases.
> 
> No, in Unicode (and Guile equates character=Unicode character) all
> characters correspond to a single codepoint.


Agreed. "The number of code points" would be correct, but "the number
of characters" (i.e., the current wording) is correct too. In the
Scheme terminology, a character is just a Unicode code point,
as can be seen from the name of the procedure char? and related
APIs.
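
A minimal sketch of that point, assuming a Guile REPL (the variable name
acute is only for illustration): a combining mark on its own satisfies
char?, and integer->char / char->integer convert between characters and
code point numbers.

```scheme
;; A combining mark is a character in its own right, even though it is
;; never displayed alone.
(define acute (integer->char #x301))    ; U+0301 COMBINING ACUTE ACCENT

(display (char? acute))                 ; prints #t
(newline)

;; char->integer returns the code point number of a character.
(display (number->string (char->integer #\e) 16))   ; prints 65
(newline)

;; A string made of those two characters has length 2.
(display (string-length (string #\e acute)))        ; prints 2
(newline)
```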


> You need to fix your setup, that’s not what Guile does.


No; he wrote é, U+0065 LATIN SMALL LETTER E + U+0301 COMBINING ACUTE ACCENT,
which is two characters, unlike the precomposed é, U+00E9 LATIN SMALL LETTER E WITH ACUTE.

Likewise 👨‍🏭 is U+1F468 MAN + U+200D ZERO WIDTH JOINER + U+1F3ED FACTORY.
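
This can be checked without relying on what the mail client did to the
text. The following is only a sketch: code-points->string and
hex-code-points are helper names made up for this example, while
string-length, string->list and string-normalize-nfc are the usual Guile
procedures.

```scheme
;; Hypothetical helper: build a string from explicit code point numbers,
;; so the result does not depend on how the text was normalized in transit.
(define (code-points->string . cps)
  (list->string (map integer->char cps)))

;; Hypothetical helper: list the code points of a string in hexadecimal.
(define (hex-code-points str)
  (map (lambda (c) (number->string (char->integer c) 16))
       (string->list str)))

(define decomposed     (code-points->string #x65 #x301))
(define precomposed    (code-points->string #xE9))
(define factory-worker (code-points->string #x1F468 #x200D #x1F3ED))

(write (hex-code-points decomposed))    ; prints ("65" "301")
(newline)
(write (list (string-length decomposed)
             (string-length precomposed)
             (string-length factory-worker)))   ; prints (2 1 3)
(newline)

;; NFC normalization composes e + combining acute into the single
;; character U+00E9, so the two spellings become equal.
(write (string=? (string-normalize-nfc decomposed) precomposed))  ; prints #t
(newline)
```

Normalization only helps in the é case: the emoji sequence has no
precomposed form, so it stays three characters long.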

The "visual characters" are called grapheme clusters, and AFAIK Guile
doesn't provide any API that relates to grapheme clusters. (Note that
the number of grapheme clusters in a given string depends on the Unicode
database and therefore on the Unicode version.)

There are programming languages where the data type called "character"
corresponds to grapheme clusters, but I don't think this is common.
Swift is the only example I know.

Obligatory reading: https://hsivonen.fi/string-length/


