From: Juri Linkov <juri@linkov.net>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 36923@debbugs.gnu.org
Subject: bug#36923: Combining Diacritical Marks are not Latin only
Date: Mon, 05 Aug 2019 22:41:59 +0300
Message-ID: <87zhknzc7c.fsf@mail.linkov.net>
In-Reply-To: <83k1brd28a.fsf@gnu.org> (Eli Zaretskii's message of "Mon, 05 Aug 2019 19:08:21 +0300")

>>   (aref char-script-table ?\N{COMBINING ACUTE ACCENT})
>>
>> could return
>>
>>   (latin greek cyrillic)
>>
>> instead of the current
>>
>>   latin
>
> char-script-table is documented to yield a single symbol, so returning
> a list would be an incompatible change, which we should avoid.

The docstring of char-script-table says:

  Char table of script symbols.
  It has one extra slot whose value is a list of script symbols.

So it seems char-script-table should yield a list of script symbols?
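
To compare the two things the docstring mentions, the per-character
value and the extra slot can be looked at separately.  A minimal
sketch; the helper name `my-char-script-info' is made up for
illustration and is not part of Emacs:

```elisp
;; Hypothetical helper (not part of Emacs): return a cons of
;; the per-character value and the table's extra slot 0.
(defun my-char-script-info (ch)
  "Return (VALUE . EXTRA-SLOT) for CH from `char-script-table'."
  (cons
   ;; What `aref' yields for this one character.
   (aref char-script-table ch)
   ;; The single extra slot that the docstring mentions.
   (char-table-extra-slot char-script-table 0)))

;; Example call for the character discussed above:
;; (my-char-script-info ?\N{COMBINING ACUTE ACCENT})
```

The car is what a lookup for one character gives; the cdr is the
list stored in the extra slot.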

I searched the documentation for other uses of char-script-table, and
one place where it's used is forward-word.  But I don't understand why
forward-word doesn't stop between “COMBINING ACUTE ACCENT” (which is in
the Latin script) and non-Latin letters.

It is good that it doesn't stop there, and I'm just trying to
understand why, so the same logic could be used in markchars-mode.
Maybe it doesn't stop because of special script handling in
‘find-word-boundary-function-table’?  Or because it ignores all
combining characters?

BTW, while looking at forward-word and right-word I noticed an
inconsistency: there are left-word and right-word commands, but no
left-sexp and right-sexp to accompany forward-sexp.

> More generally, I think what you describe is a clear conceptual bug in
> markchars-mode: it should only pay attention to the script of the base
> characters, not to the script of combining accents.  The latter is
> mostly irrelevant, certainly so for the purpose of detecting
> confusables.

Could you suggest a proper function to strip all combining characters
from a string?
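
One approach I could imagine, only as a sketch and not necessarily
the proper way: drop every character whose Unicode general category
is a mark (Mn, Mc or Me).  The function name
`my-strip-combining-chars' is hypothetical; `get-char-code-property'
and `seq-remove' are existing Emacs functions.

```elisp
(require 'seq)

;; Hypothetical helper (not part of Emacs or markchars-mode).
(defun my-strip-combining-chars (string)
  "Return a copy of STRING without combining mark characters.
A character counts as combining here when its Unicode general
category is Mn, Mc or Me; that choice is an assumption."
  (concat
   ;; `seq-remove' returns a list of the remaining characters,
   ;; which `concat' turns back into a string.
   (seq-remove
    (lambda (ch)
      (memq (get-char-code-property ch 'general-category)
            '(Mn Mc Me)))
    string)))

;; Example call: "a" + COMBINING ACUTE ACCENT + "b"
;; (my-strip-combining-chars "a\u0301b")
```

This leaves precomposed characters such as “é” untouched; if those
should lose their accents too, the string could first be decomposed
with `ucs-normalize-NFD-string' from ucs-normalize.el.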


Thread overview: 5+ messages
2019-08-04 20:40 bug#36923: Combining Diacritical Marks are not Latin only Juri Linkov
2019-08-05 16:08 ` Eli Zaretskii
2019-08-05 19:41   ` Juri Linkov [this message]
2019-08-06 14:32     ` Eli Zaretskii
2019-08-07 21:44       ` Juri Linkov
