From: Ted Zlatanov <tzz@lifelogs.com>
To: emacs-devel@gnu.org
Subject: Re: idn.el and confusables.txt
Date: Sun, 15 May 2011 07:14:47 -0500
Message-ID: <87hb8w5few.fsf@lifelogs.com>
In-Reply-To: E1QLUJ8-0003pE-HP@fencepost.gnu.org
On Sun, 15 May 2011 01:56:02 -0400 Eli Zaretskii <eliz@gnu.org> wrote:
EZ> These all examine portions of a buffer ("words") for being a match to
EZ> some string or regexp. So I think having strings in the char-table
EZ> will be more convenient, because you could then use looking-at,
EZ> string=, string-match, etc.
Oh, good point. OK, strings it is. I'll write the converter.
>> As a general rule I'd say that if the mapping is to a single character
>> with the SL/SA single-script property, chances are it's a true
>> confusable. Otherwise it could be legitimate and we'd need to convert
>> the string to a normalized form, which is probably slow (do you know?)
EZ> What do you mean by "normalized form"?
Unicode defines normalization forms that let you check whether two
strings are equivalent regardless of how combining characters and other
sequences are encoded. But thinking about it, even if normalization
says they're the same, it's still a potential problem for the user, so
we can skip normalization and always mark those.
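To make the normalization point concrete, here is a minimal sketch in
Python rather than Emacs Lisp. It uses the standard unicodedata module;
the helper name same_after_normalization is made up for illustration
and is not part of markchars.el or any Emacs code. It compares a
precomposed character with the equivalent base-plus-combining sequence.

```python
import unicodedata


def same_after_normalization(a: str, b: str, form: str = "NFC") -> bool:
    """Return True if a and b are identical once both are normalized.

    Hypothetical helper for illustration only.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# The raw strings differ code point by code point...
print(precomposed == decomposed)
# ...but they compare equal after normalization.
print(same_after_normalization(precomposed, decomposed))
```

Both strings render the same way, which is why marking them without
normalizing first still flags something a user may want to know about.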
>> Based on all this, I think it's best to make the confusables char-table
>> values atoms or sequences (strings or lists) but split them into two
>> char-tables for the single-script and multi-script mappings.
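The two-table split could be sketched roughly as follows. This is
Python, not the Emacs Lisp converter, and it rests on assumptions: each
data line is taken to have the form "source ; target ; type # comment"
with type one of SL, SA, ML or MA, and the sample lines are
illustrative rather than copied from confusables.txt. The function name
parse_confusables is hypothetical. Plain dicts stand in for char-tables,
and every value is stored as a string, even for one-character targets.

```python
from typing import Dict, Iterable, Tuple

SINGLE_SCRIPT = {"SL", "SA"}   # assumed single-script type tags
MIXED_SCRIPT = {"ML", "MA"}    # assumed mixed-script type tags


def parse_confusables(lines: Iterable[str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split mappings into (single_script, mixed_script) dicts.

    Hypothetical helper; a later line for the same source and table
    overwrites an earlier one.
    """
    single: Dict[str, str] = {}
    mixed: Dict[str, str] = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()   # drop trailing comment
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        if len(fields) < 3:
            continue                          # not a data line
        source = chr(int(fields[0], 16))
        # The target may be several code points; always store a string.
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        kind = fields[2]
        if kind in SINGLE_SCRIPT:
            single[source] = target
        elif kind in MIXED_SCRIPT:
            mixed[source] = target
    return single, mixed


# Illustrative input only, written in the assumed format.
SAMPLE = """\
0031 ; 006C ; SA # digit one -> l
006D ; 0072 006E ; SA # m -> rn
0441 ; 0063 ; ML # Cyrillic es -> Latin c
"""

single, mixed = parse_confusables(SAMPLE.splitlines())
print(single)
print(mixed)
```

Because the values are always strings, a lookup result can be compared
against buffer text with ordinary string equality or a regexp match,
whether the target is one character or several.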
EZ> If we were to implement the full IDNA protocol, would the above be
EZ> enough? Or will we need additional information?
Oh, all this has been for confusables (TR39) only. IDNA and uni-idn.el
will have their own needs! IIUC, Lennart used IDNA only as a character
set in markchars.el (I didn't write that functionality and he maintains
idn.el), but there are more security issues with it we may need to
handle.
IDNA is better described in http://unicode.org/reports/tr46/ and the
links at the end of that document (a whole bunch of RFCs). I'm not
interested in implementing the IDNA code beyond supporting the current
character set detection because I don't think IDNA is popular enough,
but maybe Lennart and others want to do it.
For further possible markchars.el functionality, take a look at
http://www.unicode.org/reports/tr36/ (Unicode Security Considerations).
It talks about the confusables issues, IDNA issues, and bidi issues
among others. It's a really good explanation of what security-related
functionality is needed from the confusables char-table and potentially
other places in Emacs.
Ted