From: Ted Zlatanov <tzz@lifelogs.com>
To: emacs-devel@gnu.org
Subject: Re: highlighting non-ASCII characters
Date: Fri, 26 Mar 2010 12:35:36 -0500
Message-ID: <87pr2rj89j.fsf@lifelogs.com>
In-Reply-To: e01d8a51003241234t7fd61191ua23c8152c3ac705@mail.gmail.com
On Wed, 24 Mar 2010 20:34:41 +0100 Lennart Borgman <lennart.borgman@gmail.com> wrote:
LB> The attached file sets up IDN chars as above. How about defining a
LB> character class [:idnchars:]?
The IDN character class could be useful. The list changes so rarely
that it can be hard-coded like the POSIX classes IMO. I think this
would be done in src/regex.c by defining RECC_IDNCHARS for instance.
This could highlight when non-IDN characters are used in a domain name.
But IDN characters are separate from the "confusables" (homoglyphs) we
should discuss, which are much more problematic and more complex because
they are not just a character class.
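To make the highlighting idea concrete, here is a minimal sketch of
flagging non-IDN characters in a domain name. It is written in Python
purely for illustration, not as the src/regex.c change itself. The
IDN_CHARS set and the non_idn_chars function are hypothetical names, and
the set is a tiny stand-in: the real list would come from the IDN tables.

```python
import re

# ASSUMPTION: a tiny stand-in for an [:idnchars:] class. The real set
# would be generated from the IDN tables, not typed in by hand.
IDN_CHARS = "a-zA-Z0-9\\-" + "àáâäåçèéêëñöøü"

# Matches any character that is neither in the stand-in set nor the
# dot that separates domain labels.
NON_IDN = re.compile("[^" + IDN_CHARS + ".]")


def non_idn_chars(domain):
    """Return (offset, character) pairs for characters outside the set."""
    return [(m.start(), m.group()) for m in NON_IDN.finditer(domain)]
```

A highlighter would put a face on each returned offset. Note that a
Cyrillic letter inside an otherwise Latin name is flagged here only
because the stand-in set is Latin-only; with a full IDN set it would
pass, which is why the confusables need separate handling.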
On Thu, 25 Mar 2010 09:11:35 +0200 Juri Linkov <juri@jurta.org> wrote:
JL> I think it would be more useful to implement this spec:
JL> http://www.unicode.org/reports/tr39/data/confusables.txt
JL> "Visually Confusable Characters: Provides a mapping for visual
JL> confusables for use in further restricting identifiers for security".
JL> It's very large, but it seems it's still incomplete. I can't find
JL> a "confusable" mapping for the problem I reported:
JL> BOX DRAWINGS DOUBLE HORIZONTAL -> EQUALS SIGN
We can have a [:confusable:] character class defined in src/regex.c.
That lets us find these characters. It could be generated from the TXT
database and augmented with our own mappings. But there's grouping
information, so maybe that should be available too. For highlighting we
don't need grouping information, but the user would find it useful to
look at a glyph and find out that it looks like 3 other glyphs. So this
can be in a Lisp-level data structure like a hashtable with list values.
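As a rough sketch of that data structure, the following Python (used
only for illustration; the real thing would be Lisp) builds a table
keyed by the target glyph with a list of look-alike sources as the
value. It assumes the "source ; target ; type # comment" field layout
of confusables.txt. The SAMPLE lines are illustrative entries written
for this example, not copied from the real file, and the last one
stands for one of our own added mappings. The names parse_confusables
and lookalikes are hypothetical.

```python
from collections import defaultdict

# ASSUMPTION: illustrative lines in the confusables.txt field layout,
# not an excerpt of the real file. The last line is a local addition.
SAMPLE = """\
# source ; target ; type # comment
0441 ;	0063 ;	MA	# CYRILLIC SMALL LETTER ES -> LATIN SMALL LETTER C
03F2 ;	0063 ;	MA	# GREEK LUNATE SIGMA SYMBOL -> LATIN SMALL LETTER C
0435 ;	0065 ;	MA	# CYRILLIC SMALL LETTER IE -> LATIN SMALL LETTER E
2550 ;	003D ;	MA	# BOX DRAWINGS DOUBLE HORIZONTAL -> EQUALS SIGN
"""


def parse_confusables(text):
    """Map each target string to the list of sources that resemble it."""
    groups = defaultdict(list)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        if len(fields) < 2:
            continue
        # Each field is one or more space-separated hex code points.
        source = "".join(chr(int(cp, 16)) for cp in fields[0].split())
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        groups[target].append(source)
    return dict(groups)


def lookalikes(groups, char):
    """Return the other members of the group that contains char."""
    for target, sources in groups.items():
        members = [target] + sources
        if char in members:
            return [m for m in members if m != char]
    return []
```

With the sample data, asking about CYRILLIC SMALL LETTER ES returns the
Latin "c" and the Greek lunate sigma, which is the "looks like these
other glyphs" answer a user would want. The keys and values together
also give the flat set of characters a [:confusable:] class would need.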
I looked at whitespace.el and it looks generally suitable for this kind
of highlighting. I can't decide if the work should augment
whitespace.el or if it should be a new library called visible.el
(because the name whitespace.el is so specific).
On Thu, 25 Mar 2010 15:07:04 +0100 Lennart Borgman <lennart.borgman@gmail.com> wrote:
LB> To me it looks like IDN is the most important. Is not this a
LB> derivative work from "confusables"?
I think they are separate logically. TR39 cares about "confusables" in
the context of IDN but Emacs has a wider view as a general text editor,
IIUC.
Ted