* bug#38235: string-foldcase bug for trailing sigma
@ 2019-11-16 20:41 Andy Wingo
2019-11-17 11:19 ` tomas
2019-11-17 18:13 ` John Cowan
0 siblings, 2 replies; 3+ messages in thread
From: Andy Wingo @ 2019-11-16 20:41 UTC (permalink / raw)
To: 38235
Given the following example, using (rnrs unicode):
(string-foldcase "ΜΈΛΟΣ")
The expected result is "μέλοσ"; see R6RS libraries section 1.2. However
instead Guile's result is "μέλος". Note that although Σ usually
downcases to σ, at the end of a string it's ς. This test shows a
limitation of defining string-foldcase as simply (string-downcase
(string-upcase str)).
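
The gap between lowercasing and case folding can be reproduced outside Guile. What follows is only a minimal sketch in Python, not part of the report itself; it assumes CPython 3.3 or later, where `str.lower` applies the Unicode final-sigma rule and `str.casefold` applies the default case folding.

```python
# Illustration only: Python stands in for Scheme here.
word = "\u039c\u0388\u039b\u039f\u03a3"  # "ΜΈΛΟΣ"

# Lowercasing is context-sensitive: a word-final Σ becomes ς (U+03C2).
lowered = word.lower()

# Case folding is context-free: every sigma form becomes σ (U+03C3).
folded = word.casefold()

# The composition downcase(upcase(s)) inherits the final-sigma rule
# from its downcase step, so it cannot serve as a fold.
composed = word.upper().lower()

print(lowered[-1], folded[-1], composed[-1])
```

Here `lowered` and `composed` end in ς while `folded` ends in σ, which is the same discrepancy the report describes for string-foldcase.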
* bug#38235: string-foldcase bug for trailing sigma
From: tomas @ 2019-11-17 11:19 UTC (permalink / raw)
To: 38235
On Sat, Nov 16, 2019 at 09:41:05PM +0100, Andy Wingo wrote:
> Given the following example, using (rnrs unicode):
>
> (string-foldcase "ΜΈΛΟΣ")
Good catch. I think there's even a worse example: dotless
and dotted I [1]. Here it seems even impossible to do
up- and downcase correctly without knowing the language
context.
Cheers
[1] https://en.wikipedia.org/wiki/%C4%B0
-- tomás
* bug#38235: string-foldcase bug for trailing sigma
From: John Cowan @ 2019-11-17 18:13 UTC (permalink / raw)
To: Andy Wingo, tomas; +Cc: 38235
On Sat, Nov 16, 2019 at 3:42 PM Andy Wingo <wingo@pobox.com> wrote:
> The expected result is "μέλοσ"; see R6RS libraries section 1.2. However
> instead Guile's result is "μέλος". Note that although Σ usually
> downcases to σ, at the end of a string it's ς.
More precisely, it downcases to σ if a letter follows and to ς if not
(being at the end of a string is a particular case). However, this is not
actually always Greekly correct: the string "ΦΙΛΟΣ." with a period at the
end downcases to "φιλος." if it is the word φίλος 'friend' (without its
proper accent) at the end of a sentence, but to "φιλοσ." if it is an
abbreviation for φιλοσοφία 'philosophy'. For this reason, R7RS does not
require mapping to ς in this situation as R6RS does.
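
The context rule can be seen in a small sketch, again in Python and under the assumption that `str.lower` implements the Unicode default (language-neutral) final-sigma condition. The helper `lower_medial_sigma` is hypothetical and exists only to show that the abbreviation reading needs an explicit opt-in from the caller.

```python
# Illustration only; the strings are spelled with escapes to avoid
# confusion with look-alike Latin letters.
sentence_end = "\u03a6\u0399\u039b\u039f\u03a3."  # "ΦΙΛΟΣ."
full_word = "\u03a6\u0399\u039b\u039f\u03a3\u039f\u03a6\u0399\u0391"  # "ΦΙΛΟΣΟΦΙΑ"

# No letter follows the Σ, so the default rule picks final sigma ς.
lowered_end = sentence_end.lower()

# A letter follows the Σ, so the default rule picks medial sigma σ.
lowered_full = full_word.lower()


def lower_medial_sigma(text):
    """Hypothetical helper: lowercase, then force medial sigma everywhere.

    The default rule cannot tell that a string is an abbreviation, so a
    caller who knows this would have to ask for σ explicitly.
    """
    return text.lower().replace("\u03c2", "\u03c3")


abbreviation = lower_medial_sigma(sentence_end)
print(lowered_end, lowered_full, abbreviation)
```

The default lowercasing always yields the 'friend' spelling for "ΦΙΛΟΣ."; only outside knowledge about the text can select the other one.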
> This test shows a
> limitation of defining string-foldcase as simply (string-downcase
> (string-upcase str)).
As explained in Unicode section 5.18, the foldcase mappings (in
<https://www.unicode.org/Public/UNIDATA/CaseFolding.txt>, the lines with
status C and F) actually create a set of equivalence classes that are
closed under {upper,lower,title}case mapping, and then choose a single
character to represent each class. This is usually the unique lowercase
character, but not always: in Cherokee it is the uppercase character, and
in the set {Σ, σ, ς} it is σ.
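
The equivalence classes can be observed with a short sketch. It is written in Python on the assumption that `str.casefold` uses the C and F lines of CaseFolding.txt and that the interpreter ships Unicode 8.0 data or newer (Python 3.5 or later) for the Cherokee lowercase letters; `fold_classes` is a name made up for this example.

```python
def fold_classes(chars):
    """Group characters by the string that str.casefold maps them to."""
    classes = {}
    for ch in chars:
        classes.setdefault(ch.casefold(), []).append(ch)
    return classes


# Capital, medial, and final sigma.
sigmas = ["\u03a3", "\u03c3", "\u03c2"]
# Cherokee letter A: uppercase U+13A0 and lowercase U+AB70.
cherokee_a = ["\u13a0", "\uab70"]

sigma_classes = fold_classes(sigmas)          # one class, keyed by σ (U+03C3)
cherokee_classes = fold_classes(cherokee_a)   # one class, keyed by U+13A0

# Closure check: upper- or lowercasing a member and folding again
# lands on the same representative.
closed = all(
    ch.upper().casefold() == ch.casefold() == ch.lower().casefold()
    for ch in sigmas + cherokee_a
)
print(sigma_classes, cherokee_classes, closed)
```

The sigma class is represented by its medial lowercase form, while the Cherokee class is represented by its uppercase member.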
On Sun, Nov 17, 2019 at 6:20 AM <tomas@tuxteam.de> wrote:
> Good catch. I think there's even a worse example: dotless
> and dotted I [1]. Here it seems even impossible to do
> up- and downcase correctly without knowing the language
> context.
>
Language-specific case mappings are explicitly out of Scheme's remit: they
have to be performed by specialized libraries. There is an additional
situation in Lithuanian dictionaries (but not running text): an "i" with a
tone accent is represented as "i" + dot above + accent, like this: "i̇́".
However, this dot above must be dropped when uppercasing, producing
ordinary "Í".
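
Both language-specific cases can be sketched in Python, whose `str.upper` and `str.lower` apply only the language-neutral default mappings. The helper `lt_upper` is hypothetical and deliberately simplified: it handles only a dot above that directly follows "i", which is an assumption of this sketch and not the full Lithuanian rule.

```python
import unicodedata

# Default mappings know nothing about Turkish:
# İ (U+0130) lowercases to "i" + U+0307, not to a plain "i" ...
dotted_lower = "\u0130".lower()
# ... and dotless ı (U+0131) uppercases to plain "I".
dotless_upper = "\u0131".upper()

# Lithuanian dictionary form: i + dot above (U+0307) + acute (U+0301).
lt_i = "i\u0307\u0301"

# The default mapping keeps the dot above when uppercasing.
default_upper = lt_i.upper()


def lt_upper(text):
    """Hypothetical, simplified Lithuanian-style uppercasing.

    Drops U+0307 directly after "i", uppercases, then composes to NFC.
    """
    stripped = text.replace("i\u0307", "i")
    return unicodedata.normalize("NFC", stripped.upper())


print(ascii(dotted_lower), ascii(default_upper), ascii(lt_upper(lt_i)))
```

With the dot removed first, the result composes to the ordinary Í (U+00CD); the default mapping leaves the dot above in place.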