On Sat, 16 Apr 2011 11:10:03 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote: 

LB> 2011/4/16 Ted Zlatanov <tzz@lifelogs.com>:
>> On Sat, 16 Apr 2011 01:07:06 +0200 Lennart Borgman <lennart.borgman@gmail.com> wrote:
>> 
LB> Nice to see you are enhancing it, Ted. However I wonder if you are
LB> working on an older copy of it since it does not use idn.el. Could you
LB> please take a look at the latest version and see how
LB> idn-is-recommended compares to what you call confusables?
>> 
>> Where is the latest version? I didn't see any further messages from you
>> in that thread after 2010-03 so I didn't know you had updated it.

LB> Oh, I am very sorry Ted. I have put mostly every elisp library I have
LB> written into nXhtml. So you find it in the nXhtml repository at
LB> Launchpad.

I merged your changes with my version and called myself a "contrbuthor" :)

I'd like to keep markchars.el a standalone library, so the attached
version does not require idn.el.  I also set the version to 0.2.  I
would like to put it in the GNU ELPA, if you don't mind (it can still
live in nXhtml; we can mirror it).  You'll need to assign the copyright,
though.

The major change is that instead of detecting the range at the font-lock
keyword level, I run non-IDN detection at the word markup level (just
like confusables detection).  I think that results in cleaner, easily
extensible code--take a look and see what you think.

For the IDN markup I defined a new face.  It's your call what it should
be; I just set it to a white underline for now.

This is IMO a good change:

(make-obsolete-variable 'markchars-keywords 'markchars-what "markchars.el 0.2")

because having both `markchars-keywords' and `markchars-used-keywords'
was confusing.

`markchars--render-nonidn' is not optimized: it steps through the word
in the buffer and assigns the properties to each individual character
instead of each range it finds.  I don't think that's a big deal but it
could be done better.  I couldn't reuse your non-IDN detection logic
because it was not word-oriented.
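
As a rough sketch of the range-based alternative (this is not what the
attached file does): `my-mark-runs', its PRED argument, and the use of
the `face' property are all hypothetical, chosen only for illustration.
It collects each maximal run of matching characters and applies the
property once per run instead of once per character:

```elisp
;; Sketch only: `my-mark-runs' and PRED are hypothetical names, not part
;; of markchars.el.  PRED takes a character and returns non-nil if that
;; character should be marked.
(defun my-mark-runs (beg end pred face)
  "Put FACE on each maximal run of chars satisfying PRED in BEG..END.
Return the list of (START . STOP) runs that were marked."
  (let ((pos beg)
        run-start
        runs)
    (while (< pos end)
      (if (funcall pred (char-after pos))
          ;; Open a run if we are not already inside one.
          (unless run-start (setq run-start pos))
        ;; A non-matching char closes the current run, if any.
        (when run-start
          (push (cons run-start pos) runs)
          (setq run-start nil)))
      (setq pos (1+ pos)))
    ;; Close a run that extends to END.
    (when run-start
      (push (cons run-start end) runs))
    (setq runs (nreverse runs))
    ;; One property call per run rather than per character.
    (dolist (run runs)
      (put-text-property (car run) (cdr run) 'face face))
    runs))
```

It would be called with the word boundaries, e.g.
(my-mark-runs word-beg word-end #'some-predicate 'underline), where
`some-predicate' stands in for whatever non-IDN test is in use.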

I would use a char-table for idn.el instead of a bool-vector.  Also
perhaps idn.el's .txt files and confusables.txt should simply be part of
Emacs, so the IDN and confusables properties can be looked up like the
other properties.  Emacs already does that for many properties, see for
example:

(format "%S" (mapcar 'car char-code-property-alist))
(get-char-code-property ?q 'titlecase)

I think that inclusion would benefit everyone, but the original .txt
files are large so I'll leave it up to the experts.  If they are
included, `markchars--render-nonidn' would be much smaller.
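
On the char-table suggestion, here is a minimal sketch of what I have in
mind.  `my-idn-table' and `my-idn-recommended-p' are hypothetical names,
and the two ranges are placeholders, not data taken from idn.el's .txt
files:

```elisp
;; Sketch only: the table name, the predicate name, and the ranges below
;; are made up for illustration.
(defvar my-idn-table (make-char-table 'my-idn nil)
  "Hypothetical char-table; non-nil means the character is allowed.")

;; A whole range is set with one call, so the data can be loaded
;; range by range rather than code point by code point.
(set-char-table-range my-idn-table '(?a . ?z) t)
(set-char-table-range my-idn-table '(?0 . ?9) t)

(defun my-idn-recommended-p (char)
  "Return non-nil if CHAR is marked in `my-idn-table'."
  (aref my-idn-table char))
```

Lookups are then plain `aref' calls on the character, and characters
that were never set simply return the table's default of nil.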

Ted