From: Eli Zaretskii <eliz@gnu.org>
To: Juri Linkov <juri@linkov.net>
Cc: 36923@debbugs.gnu.org
Subject: bug#36923: Combining Diacritical Marks are not Latin only
Date: Mon, 05 Aug 2019 19:08:21 +0300
Message-ID: <83k1brd28a.fsf@gnu.org>
In-Reply-To: <87lfw8r744.fsf@mail.linkov.net> (message from Juri Linkov on Sun, 04 Aug 2019 23:40:38 +0300)

> From: Juri Linkov <juri@linkov.net>
> Date: Sun, 04 Aug 2019 23:40:38 +0300
> 
> The generated file lisp/international/charscript.el
> assigns the block “Combining Diacritical Marks” to the ‘latin’ script
> on the assumption that these characters are used only in Latin.
> 
> But in fact according to e.g. https://en.wikipedia.org/wiki/Acute_accent
> the acute accent marks the stressed vowel of a word in several languages
> with alphabets based on the Latin, Cyrillic, and Greek scripts.
> In particular https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode
> mentions how characters from other blocks are used in Cyrillic script.
> Moreover, the Combining Diacritical Marks block also
> contains several characters from the Greek script:
> COMBINING GREEK PERISPOMENI, COMBINING GREEK KORONIS
> COMBINING GREEK DIALYTIKA TONOS, COMBINING GREEK YPOGEGRAMMENI
> 
> I noticed this problem recently while helping to develop char-fold where
> GREEK SMALL LETTER IOTA combined with COMBINING GREEK DIALYTIKA TONOS was
> alarmingly highlighted as “mixed scripts” by markchars-mode from GNU ELPA.
> 
> Of course, it's possible to add exceptions for characters in this block
> in markchars-mode.  But before doing this, I'm asking a confirmation
> whether Unicode data should be fixed in ‘char-script-table’, so e.g.
> 
>   (aref char-script-table ?\N{COMBINING ACUTE ACCENT})
> 
> could return
> 
>   (latin greek cyrillic)
> 
> instead of the current
> 
>   latin

char-script-table is documented to yield a single symbol, so returning
a list would be an incompatible change, which we should avoid.

More generally, I think what you describe is a clear conceptual bug in
markchars-mode: it should only pay attention to the script of the base
characters, not to the script of combining accents.  The latter is
mostly irrelevant, certainly so for the purpose of detecting
confusables.

So I think this should be fixed in markchars-mode; the fact that we
somewhat arbitrarily assign those diacritics to the latin script is
not a serious problem, if it is a problem at all.
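
A minimal sketch of the "base characters only" check described above,
assuming two hypothetical helpers (my-base-char-scripts and
my-mixed-scripts-p) that are not part of Emacs or of markchars-mode.
It treats any character whose Unicode general category is Mn, Mc or Me
as a combining mark and skips it, so only base characters contribute a
script:

```elisp
;; Hypothetical helpers for illustration; not part of Emacs or markchars-mode.
(defun my-base-char-scripts (string)
  "Return the distinct scripts of the base characters in STRING.
Characters whose Unicode general category is a mark (Mn, Mc or Me)
are skipped, so combining accents never contribute a script."
  (let (scripts)
    (dolist (ch (string-to-list string))
      ;; Ignore combining marks regardless of their assigned script.
      (unless (memq (get-char-code-property ch 'general-category)
                    '(Mn Mc Me))
        (let ((script (aref char-script-table ch)))
          ;; Collect each script once, in order of first appearance.
          (when (and script (not (memq script scripts)))
            (push script scripts)))))
    (nreverse scripts)))

(defun my-mixed-scripts-p (string)
  "Return non-nil if the base characters of STRING use more than one script."
  (cdr (my-base-char-scripts string)))
```

With this, GREEK SMALL LETTER IOTA followed by COMBINING GREEK
DIALYTIKA TONOS is not reported as mixed, while a Latin "a" followed
by a Cyrillic "а" still is.  A real checker would probably also need a
policy for digits and punctuation, which this sketch does not attempt.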
