From: Lennart Borgman
Newsgroups: gmane.emacs.devel
Subject: Re: highlighting non-ASCII characters
Date: Fri, 26 Mar 2010 23:50:26 +0100
To: Ted Zlatanov
Cc: emacs-devel@gnu.org

2010/3/26 Ted Zlatanov:
> On Fri, 26 Mar 2010 12:35:36 -0500 Ted Zlatanov wrote:
>
> TZ> We can have a [:confusable:] character class defined in src/regex.c.
> TZ> That lets us find these characters.  It could be generated from the TXT
> TZ> database and augmented with our own mappings.  But there's grouping
> TZ> information, so maybe that should be available too.  For highlighting we
> TZ> don't need grouping information, but the user would find it useful to
> TZ> look at a glyph and find out that it looks like 3 other glyphs.  So this
> TZ> can be in a Lisp-level data structure like a hashtable with list values.
>
> I forgot to mention this RFC is relevant as well, section 2.2.6:
>
> http://www.ietf.org/rfc/rfc4690.txt
>
> Like the IDN character class, the discussion centers on homoglyphs
> inside domain names, but it mentions general relationship-based
> confusable detection and points to further RFCs.

Thanks, but what would be the difference between doing confusable
detection and flagging characters outside the IDN character class? I
believe marking confusable characters would also mean marking the
confusables that IDN has already made decisions about.

Perhaps confusables could also be handled in another way, for example by
helping the user switch or rotate between the confusable characters,
either all at once or one by one. Would that be useful?
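For illustration only, here is a rough Python sketch (not Emacs code) of the two ideas above: the table Ted describes, mapping each glyph to the list of glyphs it looks like, and the rotate-through-confusables idea. It assumes the "source ; target ; type # comment" line layout of Unicode's confusables.txt and uses a few hand-written entries in that layout rather than the real file. The names parse_confusables, build_groups, confusable_positions and rotate are hypothetical.

```python
from collections import defaultdict

# Illustrative entries in the confusables.txt line layout
# ("source ; target ; type # comment"); not copied from the real file.
SAMPLE = """\
0441 ;\t0063 ;\tMA\t# CYRILLIC SMALL LETTER ES -> LATIN SMALL LETTER C
0430 ;\t0061 ;\tMA\t# CYRILLIC SMALL LETTER A -> LATIN SMALL LETTER A
043E ;\t006F ;\tMA\t# CYRILLIC SMALL LETTER O -> LATIN SMALL LETTER O
03BF ;\t006F ;\tMA\t# GREEK SMALL LETTER OMICRON -> LATIN SMALL LETTER O
"""


def parse_confusables(text):
    """Map each source string to its prototype (target) string."""
    table = {}
    text = text.lstrip("\ufeff")  # in case the text starts with a byte-order mark
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2:
            continue  # skip malformed lines in this sketch
        source = "".join(chr(int(cp, 16)) for cp in fields[0].split())
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        table[source] = target
    return table


def build_groups(table):
    """Map every member of a confusable group to the sorted list of the
    other members, i.e. "this glyph looks like these N glyphs"."""
    by_proto = defaultdict(set)
    for source, proto in table.items():
        by_proto[proto].update((source, proto))
    groups = {}
    for members in by_proto.values():
        for m in members:
            groups[m] = sorted(members - {m})
    return groups


def confusable_positions(text, table):
    """Indices of characters whose prototype differs from themselves,
    i.e. the ones a highlighter would mark."""
    return [i for i, ch in enumerate(text) if ch in table]


def rotate(ch, groups):
    """Next member of ch's group in code-point order (wrapping around);
    ch itself if it has no confusables."""
    if ch not in groups:
        return ch
    cycle = sorted(groups[ch] + [ch])
    return cycle[(cycle.index(ch) + 1) % len(cycle)]


if __name__ == "__main__":
    table = parse_confusables(SAMPLE)
    groups = build_groups(table)
    print(confusable_positions("p\u0430ypal", table))  # [1]
    print([hex(ord(c)) for c in groups["o"]])          # ['0x3bf', '0x43e']
    print(hex(ord(rotate("o", groups))))               # 0x3bf
```

The sketch flags only characters whose prototype differs from themselves, so a plain ASCII "c" or "o" stays unmarked even though it belongs to a group, while the group table still answers "what does this glyph look like?".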