From: Eli Zaretskii <eliz@gnu.org>
To: Ted Zlatanov <tzz@lifelogs.com>
Cc: emacs-devel@gnu.org
Subject: Re: idn.el and confusables.txt
Date: Sat, 14 May 2011 17:38:11 +0300
Message-ID: <83aaepfiuk.fsf@gnu.org>
In-Reply-To: <87y629ien3.fsf@lifelogs.com>
> From: Ted Zlatanov <tzz@lifelogs.com>
> Date: Sat, 14 May 2011 08:40:48 -0500
>
> EZ> You see, the uni-*.el files we create out of the Unicode DB are not
> EZ> used anywhere in application code, AFAIK. We use them to display
> EZ> character properties in the likes of "C-u C-x =", and that's it. I'm
> EZ> not even sure they are organized in a way that makes them useful.
>
> markchars.el could use other Unicode properties if people ask.
I'm talking about the details. The way we currently set the tables in
uni-*.el is that many of the values are symbols. For example:

  (get-char-code-property ?1 'general-category) => Nd
  (get-char-code-property ?א 'bidi-class) => R
  (get-char-code-property ?\( 'mirrored) => Y

The `Nd', `R', and `Y' are symbols.
Now, suppose you wanted to use these values in some code that needs to
be fast -- how would you feel about having to write multi-branch
`cond' forms to compare the value against all the possibilities?
For bidi reordering, which runs in the innermost loop of the display
engine, using the `bidi-class' or `mirrored' properties that are
symbols would be prohibitively expensive.
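
To make the concern concrete, here is a minimal sketch of the kind of
multi-branch `cond' such code would need. The helper name
`my-bidi-direction' and its return values are hypothetical, invented
for this illustration; only `get-char-code-property' and the
`bidi-class' property come from the discussion above.

```elisp
;; Hypothetical helper, not part of Emacs: classify a character by
;; comparing its `bidi-class' symbol against each case in turn.
(defun my-bidi-direction (char)
  "Return `ltr', `rtl', `number', or `neutral' for CHAR (illustrative)."
  (let ((class (get-char-code-property char 'bidi-class)))
    (cond ((eq class 'L) 'ltr)              ; strong left-to-right
          ((eq class 'R) 'rtl)              ; strong right-to-left
          ((eq class 'AL) 'rtl)             ; Arabic letter
          ((memq class '(EN AN)) 'number)   ; European/Arabic number
          (t 'neutral))))                   ; every other class
```

Each call pays for a property lookup plus a chain of symbol
comparisons, and a real implementation would have to distinguish many
more classes than the handful shown here.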
For now, with markchars.el, all you need is a boolean value for each
character. However, in other use cases, some other Lisp code will
want the paired character. Yet another application will want to
compare characters such that confusable pairs will compare equal. Can
a single table satisfy all these needs efficiently? Maybe it can, but
we need to design that table carefully.
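
As one possible shape for such a table, the sketch below stores the
paired character as the value and derives the other two uses from it.
Everything named `my-...' is hypothetical, the two entries are
hand-picked for illustration rather than generated from
confusables.txt, and it assumes each character pairs with a single
character, which is a simplification.

```elisp
;; Hypothetical char-table: value is the paired character, or nil.
(defvar my-confusables-table
  (let ((table (make-char-table 'my-confusables)))
    (aset table ?\u0430 ?a)   ; CYRILLIC SMALL LETTER A -> LATIN a
    (aset table ?\u03bf ?o)   ; GREEK SMALL LETTER OMICRON -> LATIN o
    table)
  "Map a confusable character to its paired character, or nil if none.")

(defun my-confusable-p (char)
  "Return t if CHAR has an entry in the table (the boolean use)."
  (and (aref my-confusables-table char) t))

(defun my-confusable-prototype (char)
  "Return CHAR's paired character, or CHAR itself if it has none."
  (or (aref my-confusables-table char) char))

(defun my-confusable-equal (c1 c2)
  "Return non-nil if C1 and C2 match after mapping both through the table."
  (= (my-confusable-prototype c1) (my-confusable-prototype c2)))
```

With this layout (my-confusable-equal ?\u0430 ?a) returns non-nil
while (my-confusable-p ?a) returns nil. Whether one value per
character is enough for all three uses is exactly the design question
raised above.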
> But specifically regarding the ones I'm proposing for inclusion,
> since we've started using the GNU ELPA more and markchars.el lives
> in it, we can put uni-confusables.el and uni-idn.el in the GNU ELPA
> instead of the Emacs trunk.
I'm not arguing about where to put them. I'm saying that for such
basic infrastructure, we should consider the possible uses before we
rush into implementation. Otherwise, we will again repeat the same
mistake, whose result is that the only real user of bidirectional
properties cannot use uni-bidi.el!
> EZ> So I'd really like to avoid introducing yet another huge table whose
> EZ> only effects are to show one more property in "C-u C-x =" and bloat
> EZ> the ELisp manual some more.
>
> IMO it's not a huge table
??? It's a char-table that can be indexed by any character supported
by Emacs. Even if you count only the characters mentioned in
confusables.txt, there are 20 thousand of them. char-tables are
memory-efficient, but their footprint is not negligible.
The bloat may be insignificant by comparison, but if the _only_ useful
effect is the bloat, why should we do that?
> Also the char-table doesn't have to
> cover the Asian confusables--I'm not sure anyone would need those.
Well, the Unicode consortium definitely thought they were needed.
Either we follow established standards, or we don't.