bug#69968: Case-folding of Mathematical Alphanumeric Symbols

all messages for Emacs-related lists mirrored at yhetil.org

* bug#69968: Case-folding of Mathematical Alphanumeric Symbols
@ 2024-03-23 18:27 Juri Linkov
  2024-03-24  6:27 ` Eli Zaretskii
  0 siblings, 1 reply; 7+ messages in thread
From: Juri Linkov @ 2024-03-23 18:27 UTC (permalink / raw)
  To: 69968

I wonder why case-folding is not supported for letters from
the Unicode block "Mathematical Alphanumeric Symbols":
https://en.wikipedia.org/wiki/Mathematical_Alphanumeric_Symbols

Is it because the Unicode standard doesn't provide information
about their case-folding?  And indeed they are missing from
https://unicode.org/Public/UNIDATA/CaseFolding.txt

But OTOH, I can't find the file CaseFolding.txt in admin/unidata.
Does this mean Emacs doesn't use this file?

Then should we add more case-folding information explicitly
for this Unicode block?

Case-folding is already supported for some characters from other
Unicode blocks, e.g. FULLWIDTH LATIN CAPITAL LETTERs,
CIRCLED LATIN CAPITAL LETTERs, etc.
But e.g. PARENTHESIZED LATIN CAPITAL LETTERs are missing too.
What is worse, in Emacs ⒜ doesn't even have word syntax,
whereas its counterpart 🄐 does.

* bug#69968: Case-folding of Mathematical Alphanumeric Symbols
  2024-03-23 18:27 bug#69968: Case-folding of Mathematical Alphanumeric Symbols Juri Linkov
@ 2024-03-24  6:27 ` Eli Zaretskii
  2024-03-24 17:09   ` Juri Linkov
  0 siblings, 1 reply; 7+ messages in thread
From: Eli Zaretskii @ 2024-03-24  6:27 UTC (permalink / raw)
  To: Juri Linkov; +Cc: 69968

> From: Juri Linkov <juri@linkov.net>
> Date: Sat, 23 Mar 2024 20:27:45 +0200
> 
> I wonder why case-folding is not supported for letters from
> the Unicode block "Mathematical Alphanumeric Symbols":
> https://en.wikipedia.org/wiki/Mathematical_Alphanumeric_Symbols

These are not letters, they are symbols.  And letter-case is not
defined for symbols.

> Is it because the Unicode standard doesn't provide information
> about their case-folding?  And indeed they are missing from
> https://unicode.org/Public/UNIDATA/CaseFolding.txt

Unicode doesn't consider them letters.

> But OTOH, I can't find the file CaseFolding.txt in admin/unidata.
> Does this mean Emacs doesn't use this file?

We don't.  We use the case-conversion information in UnicodeData.txt,
as it tells us everything we need to know.
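A quick way to see this from Lisp is to compare characters that have a lowercase mapping in UnicodeData.txt, such as FULLWIDTH LATIN CAPITAL LETTER A (U+FF21), with MATHEMATICAL BOLD CAPITAL A (U+1D400), which has none. This is only a sketch that reads properties and changes nothing; it assumes a buffer using the standard case table.

```elisp
;; Print the Unicode lowercase mapping (as exposed by
;; `get-char-code-property') and the result of `downcase'
;; for three different "capital A" characters.
(dolist (ch '(#xFF21     ; FULLWIDTH LATIN CAPITAL LETTER A
              #x24B6     ; CIRCLED LATIN CAPITAL LETTER A
              #x1D400))  ; MATHEMATICAL BOLD CAPITAL A
  (message "U+%04X %c  lowercase property: %S  downcase: %c"
           ch ch
           (get-char-code-property ch 'lowercase)
           (downcase ch)))
```

With the default case tables, `downcase` maps the first two to U+FF41 and U+24D0 and leaves U+1D400 unchanged.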

> Then should we add more case-folding information explicitly
> for this Unicode block?

What is the rationale for doing so?  It's against Unicode, so we need
to have a good reason, as this will have to be maintained by hand, and
also because some users might be surprised.

> Case-folding is already supported for some characters from other
> Unicode blocks, e.g. FULLWIDTH LATIN CAPITAL LETTERs,
> CIRCLED LATIN CAPITAL LETTERs, etc.

That's because UnicodeData.txt defines their letter-case conversions.

> But e.g. PARENTHESIZED LATIN CAPITAL LETTERs are missing too.
> What is worse, in Emacs ⒜ doesn't even have word syntax,
> whereas its counterpart 🄐 does.

I think the fact that 🄐 has the word syntax might be a mistake.  These
are both symbols, so why would we want them to have the word syntax?

* bug#69968: Case-folding of Mathematical Alphanumeric Symbols
  2024-03-24  6:27 ` Eli Zaretskii
@ 2024-03-24 17:09   ` Juri Linkov
  2024-03-24 17:45     ` Eli Zaretskii
  0 siblings, 1 reply; 7+ messages in thread
From: Juri Linkov @ 2024-03-24 17:09 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 69968

>> I wonder why case-folding is not supported for letters from
>> the Unicode block "Mathematical Alphanumeric Symbols":
>> https://en.wikipedia.org/wiki/Mathematical_Alphanumeric_Symbols
>
> These are not letters, they are symbols.  And letter-case is not
> defined for symbols.

𝘋𝘰 𝘺𝘰𝘶 𝘳𝘦𝘢𝘭𝘭𝘺 𝘵𝘩𝘪𝘯𝘬 𝘵𝘩𝘪𝘴 𝘵𝘦𝘹𝘵 𝘪𝘴 𝘯𝘰𝘵 𝘸𝘳𝘪𝘵𝘵𝘦𝘯 𝘸𝘪𝘵𝘩 𝙡𝙚𝙩𝙩𝙚𝙧𝙨?

>> Is it because the Unicode standard doesn't provide information
>> about their case-folding?  And indeed they are missing from
>> https://unicode.org/Public/UNIDATA/CaseFolding.txt
>
> Unicode doesn't consider them letters.

OK, if Unicode doesn't consider them letters,
let's stick to the Unicode standard.

>> But OTOH, I can't find the file CaseFolding.txt in admin/unidata.
>> Does this mean Emacs doesn't use this file?
>
> We don't.  We use the case-conversion information in UnicodeData.txt,
> as it tells us everything we need to know.

Thanks, I didn't remember that case-conversion is in UnicodeData.txt.
I checked admin/unidata/UnicodeData.txt and indeed there is
no case-conversion for Mathematical Alphanumeric Symbols.

>> Then should we add more case-folding information explicitly
>> for this Unicode block?
>
> What is the rationale for doing so?  It's against Unicode, so we need
> to have a good reason, as this will have to be maintained by hand, and
> also because some users might be surprised.

I don't think users would be surprised: when they
don't need to change case, they simply don't use
case-changing functions.  But when they expect the case
to change, they will indeed be surprised that it
doesn't.

>> Case-folding is already supported for some characters from other
>> Unicode blocks, e.g. FULLWIDTH LATIN CAPITAL LETTERs,
>> CIRCLED LATIN CAPITAL LETTERs, etc.
>
> That's because UnicodeData.txt defines their letter-case conversions.

OK, then it's very strange that the Unicode standard doesn't define
letter-case conversions for these other letters.  But what can we do?

>> But e.g. PARENTHESIZED LATIN CAPITAL LETTERs are missing too.
>> What is worse, in Emacs ⒜ doesn't even have word syntax,
>> whereas its counterpart 🄐 does.
>
> I think the fact that 🄐 has the word syntax might be a mistake.  These
> are both symbols, so why would we want them to have the word syntax?

Because they look like letters with diacritics.

* bug#69968: Case-folding of Mathematical Alphanumeric Symbols
  2024-03-24 17:09   ` Juri Linkov
@ 2024-03-24 17:45     ` Eli Zaretskii
  2024-03-25  7:37       ` Juri Linkov
  0 siblings, 1 reply; 7+ messages in thread
From: Eli Zaretskii @ 2024-03-24 17:45 UTC (permalink / raw)
  To: Juri Linkov; +Cc: 69968

> From: Juri Linkov <juri@linkov.net>
> Cc: 69968@debbugs.gnu.org
> Date: Sun, 24 Mar 2024 19:09:10 +0200
> 
> >> I wonder why case-folding is not supported for letters from
> >> the Unicode block "Mathematical Alphanumeric Symbols":
> >> https://en.wikipedia.org/wiki/Mathematical_Alphanumeric_Symbols
> >
> > These are not letters, they are symbols.  And letter-case is not
> > defined for symbols.
> 
> 𝘋𝘰 𝘺𝘰𝘶 𝘳𝘦𝘢𝘭𝘭𝘺 𝘵𝘩𝘪𝘯𝘬 𝘵𝘩𝘪𝘴 𝘵𝘦𝘹𝘵 𝘪𝘴 𝘯𝘰𝘵 𝘸𝘳𝘪𝘵𝘵𝘦𝘯 𝘸𝘪𝘵𝘩 𝙡𝙚𝙩𝙩𝙚𝙧𝙨?

What does that prove?  The fact that the glyphs look like normal
letters doesn't mean they are.  For example, ℵ and ℶ are not the
Hebrew letters they look like (they have left-to-right directionality).
Similarly, 𞸀, 𞸁, and the other mathematical symbols in that block
are not Arabic letters, and in particular don't shape like Arabic letters.

> >> Case-folding is already supported for some characters from other
> >> Unicode blocks such e.g. FULLWIDTH LATIN CAPITAL LETTERs,
> >> CIRCLED LATIN CAPITAL LETTERs, etc.
> >
> > That's because UnicodeData.txt defines their letter-case conversions.
> 
> OK, then it's very strange that the Unicode standard doesn't define
> letter-case conversions for these other letters.  But what can we do?

We can define case-conversions for them if we decide to do so.
Moreover, Lisp programs which for some reason need that can do that
themselves, even if by default there are no case-conversions defined
for them.  The question is when and why this is needed.
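For instance, a Lisp program that needs such conversions only temporarily could pair the 26 MATHEMATICAL BOLD capitals (U+1D400..U+1D419) with the small letters (U+1D41A..U+1D433) in a copy of the standard case table and bind it with `with-case-table`. This is a minimal sketch: the helper name `my-math-bold-case-table` is hypothetical, and `set-case-syntax-pair` also gives both characters word syntax in the standard syntax table as a side effect.

```elisp
(defun my-math-bold-case-table ()
  "Return a copy of the standard case table with Mathematical Bold letters paired.
Hypothetical helper: capitals U+1D400..U+1D419 are paired with the
small letters U+1D41A..U+1D433.  The standard case table itself is
not modified, because the pairs are added to a copy."
  (let ((tbl (copy-case-table (standard-case-table))))
    (dotimes (i 26)
      (set-case-syntax-pair (+ #x1D400 i) (+ #x1D41A i) tbl))
    tbl))

;; The new pairs are in effect only inside the body.
(with-case-table (my-math-bold-case-table)
  (upcase "𝐛𝐨𝐥𝐝"))
;; => "𝐁𝐎𝐋𝐃"
```

Copying first keeps the default case table, and thus case conversion in other buffers, unaffected.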

> >> But e.g. PARENTHESIZED LATIN CAPITAL LETTERs are missing too.
> >> What is worse, in Emacs ⒜ doesn't even have word syntax,
> >> whereas its counterpart 🄐 does.
> >
> > I think the fact that 🄐 has the word syntax might be a mistake.  These
> > are both symbols, so why would we want them to have the word syntax?
> 
> Because they look like letters with diacritics.

Not sure I agree.





* bug#69968: Case-folding of Mathematical Alphanumeric Symbols
  2024-03-24 17:45     ` Eli Zaretskii
@ 2024-03-25  7:37       ` Juri Linkov
  2024-03-25 12:37         ` Eli Zaretskii
  0 siblings, 1 reply; 7+ messages in thread
From: Juri Linkov @ 2024-03-25  7:37 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 69968

>> >> I wonder why case-folding is not supported for letters from
>> >> the Unicode block "Mathematical Alphanumeric Symbols":
>> >> https://en.wikipedia.org/wiki/Mathematical_Alphanumeric_Symbols
>> >
>> > These are not letters, they are symbols.  And letter-case is not
>> > defined for symbols.
>>
>> 𝘋𝘰 𝘺𝘰𝘶 𝘳𝘦𝘢𝘭𝘭𝘺 𝘵𝘩𝘪𝘯𝘬 𝘵𝘩𝘪𝘴 𝘵𝘦𝘹𝘵 𝘪𝘴 𝘯𝘰𝘵 𝘸𝘳𝘪𝘵𝘵𝘦𝘯 𝘸𝘪𝘵𝘩 𝙡𝙚𝙩𝙩𝙚𝙧𝙨?
>
> What does that prove?  The fact that the glyphs look like normal
> letters doesn't mean they are.  For example, ℵ and ℶ are not the
> Hebrew letters they look like (they have left-to-right directionality).
> Similarly, 𞸀, 𞸁, and the other mathematical symbols in that block
> are not Arabic letters, and in particular don't shape like Arabic letters.

I agree that these characters were intended to be used only
as mathematical symbols.  The problem is that these symbols are often
abused as letters to get styled text in applications that
don't support styles.  There are dedicated sites such as
https://www.textconverter.net/
that convert ASCII text to styled Unicode characters.

I don't use such sites, but I once tried copying such text into Emacs
and discovered that Isearch already supports searching for these
characters nicely via char-fold.  So it was a surprise that,
unlike char-fold, case-fold doesn't let you ignore their case
while searching.

>> >> Case-folding is already supported for some characters from other
>> >> Unicode blocks, e.g. FULLWIDTH LATIN CAPITAL LETTERs,
>> >> CIRCLED LATIN CAPITAL LETTERs, etc.
>> >
>> > That's because UnicodeData.txt defines their letter-case conversions.
>>
>> OK, then it's very strange that the Unicode standard doesn't define
>> letter-case conversions for these other letters.  But what can we do?
>
> We can define case-conversions for them if we decide to do so.
> Moreover, Lisp programs which for some reason need that can do that
> themselves, even if by default there are no case-conversions defined
> for them.  The question is when and why this is needed.

Probably case-conversions for them could be added later,
once there is more support for such symbols in Emacs:
for example, after creating an input method to enter them,
or better, a command that converts a region of ASCII characters,
etc.
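A region command of the kind mentioned here might look roughly like the following sketch. The name `my-ascii-region-to-math-bold` is hypothetical and no such command is part of Emacs; it handles only the bold style and relies on bold A..Z and a..z occupying the contiguous ranges U+1D400..U+1D419 and U+1D41A..U+1D433 (some other styles, such as italic, have gaps).

```elisp
(defun my-ascii-region-to-math-bold (beg end)
  "Replace ASCII letters between BEG and END with Mathematical Bold ones.
Hypothetical command; digits and other characters are left alone."
  (interactive "r")
  (save-excursion
    (let (;; Insertion type t keeps the marker after replaced text.
          (end-marker (copy-marker end t))
          ;; Match only the ASCII letters themselves.
          (case-fold-search nil))
      (goto-char beg)
      (while (re-search-forward "[A-Za-z]" end-marker t)
        (let* ((ch (char-before))
               (new (if (<= ?A ch ?Z)
                        (+ #x1D400 (- ch ?A))     ; bold capital
                      (+ #x1D41A (- ch ?a)))))  ; bold small
          (replace-match (string new) t t)))
      (set-marker end-marker nil))))
```

Other styles would need their own offset tables that account for the unassigned code points.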

* bug#69968: Case-folding of Mathematical Alphanumeric Symbols
  2024-03-25  7:37       ` Juri Linkov
@ 2024-03-25 12:37         ` Eli Zaretskii
  2024-03-25 17:18           ` Juri Linkov
  0 siblings, 1 reply; 7+ messages in thread
From: Eli Zaretskii @ 2024-03-25 12:37 UTC (permalink / raw)
  To: Juri Linkov; +Cc: 69968

> From: Juri Linkov <juri@linkov.net>
> Cc: 69968@debbugs.gnu.org
> Date: Mon, 25 Mar 2024 09:37:10 +0200
> 
> >> OK, then it's very strange that the Unicode standard doesn't define
> >> letter-case conversions for these other letters.  But what can we do?
> >
> > We can define case-conversions for them if we decide to do so.
> > Moreover, Lisp programs which for some reason need that can do that
> > themselves, even if by default there are no case-conversions defined
> > for them.  The question is when and why this is needed.
> 
> Probably case-conversions for them could be added later,
> once there is more support for such symbols in Emacs:
> for example, after creating an input method to enter them,
> or better, a command that converts a region of ASCII characters,
> etc.

I agree that case-conversions for these characters would make more
sense as part of a larger package which would allow using these
characters as letters.  In any case, making a lower-case character L
and upper-case character U a case-pair is simple:

  (let ((tbl (standard-case-table)))
    (set-case-syntax-pair U L tbl))

The above makes the change global, but it can also be made
buffer-locally; see "Case Tables" in the ELisp manual for more
details.
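For the buffer-local variant, one possibility is to modify a copy of the current buffer's case table and install it with `set-case-table`. A minimal sketch, using the Mathematical Bold letters as example pairs; the command name `my-math-bold-case-fold-locally` is hypothetical, and `set-case-syntax-pair` still gives the characters word syntax in the standard syntax table:

```elisp
(defun my-math-bold-case-fold-locally ()
  "Pair Mathematical Bold capitals and small letters in this buffer only.
Hypothetical command: installs a modified copy of the current
buffer's case table, leaving the standard case table untouched."
  (interactive)
  (let ((tbl (copy-case-table (current-case-table))))
    ;; U+1D400..U+1D419 are bold A..Z, U+1D41A..U+1D433 bold a..z.
    (dotimes (i 26)
      (set-case-syntax-pair (+ #x1D400 i) (+ #x1D41A i) tbl))
    (set-case-table tbl)))
```

Since case-insensitive search consults the buffer's case table, after running the command a search for "𝐛𝐨𝐥𝐝" with case folding enabled should also match "𝐁𝐎𝐋𝐃" in that buffer, while other buffers are unaffected.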

I guess we can now close this bug?  Or is there anything else to do
here?





* bug#69968: Case-folding of Mathematical Alphanumeric Symbols
  2024-03-25 12:37         ` Eli Zaretskii
@ 2024-03-25 17:18           ` Juri Linkov
  0 siblings, 0 replies; 7+ messages in thread
From: Juri Linkov @ 2024-03-25 17:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 69968-done

> I agree that case-conversions for these characters would make more
> sense as part of a larger package which would allow using these
> characters as letters.  In any case, making a lower-case character L
> and upper-case character U a case-pair is simple:
>
>   (let ((tbl (standard-case-table)))
>     (set-case-syntax-pair U L tbl))
>
> The above makes the change global, but it can also be made
> buffer-locally; see "Case Tables" in the ELisp manual for more
> details.
>
> I guess we can now close this bug?  Or is there anything else to do
> here?

Thanks for the explanations, so I'm closing this now.





end of thread (last message 2024-03-25 17:18 UTC)

Thread overview: 7+ messages
2024-03-23 18:27 bug#69968: Case-folding of Mathematical Alphanumeric Symbols Juri Linkov
2024-03-24  6:27 ` Eli Zaretskii
2024-03-24 17:09   ` Juri Linkov
2024-03-24 17:45     ` Eli Zaretskii
2024-03-25  7:37       ` Juri Linkov
2024-03-25 12:37         ` Eli Zaretskii
2024-03-25 17:18           ` Juri Linkov

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git
