From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Ted Zlatanov
Newsgroups: gmane.emacs.devel
Subject: Re: face for non-ASCII characters
Date: Sat, 16 Apr 2011 10:05:48 -0500
Organization: =?utf-8?B?0KLQtdC+0LTQvtGAINCX0LvQsNGC0LDQvdC+0LI=?= @ Cienfuegos
Message-ID: <87zknqnsmr.fsf@lifelogs.com>
References: <87k4t4zb5l.fsf@lifelogs.com> <87r5ncxp4z.fsf@lifelogs.com>
 <87hbo8tf4i.fsf@turtle.gmx.de> <87hbo8xis5.fsf@lifelogs.com>
 <87aau0t7uy.fsf@turtle.gmx.de> <87sk7svyam.fsf@lifelogs.com>
 <87vdcngws4.fsf@mail.jurta.org> <87y6hjxgfn.fsf_-_@lifelogs.com>
 <87hbo6x5pe.fsf@lifelogs.com> <87tydzdtn9.fsf@lifelogs.com>
 <877hav2f30.fsf@lifelogs.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
To: emacs-devel@gnu.org
User-Agent: Gnus/5.110016 (No Gnus v0.16) Emacs/24.0.50 (gnu/linux)
Xref: news.gmane.org gmane.emacs.devel:138517

--=-=-=
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit

On Sat, 16 Apr 2011 11:10:03 +0200 Lennart Borgman wrote:

LB> 2011/4/16 Ted Zlatanov:
>> On Sat, 16 Apr 2011 01:07:06 +0200 Lennart Borgman wrote:
>>
LB> Nice to see you are enhancing it, Ted. However I wonder if you are
LB> working on an older copy of it since it does not use idn.el.
LB> Could you please take a look at the latest version and see how
LB> idn-is-recommended compares to what you call confusables?
>>
>> Where is the latest version?  I didn't see any further messages from you
>> in that thread after 2010-03 so I didn't know you had updated it.

LB> Oh, I am very sorry Ted. I have put mostly every elisp library I have
LB> written into nXhtml. So you find it in the nXhtml repository at
LB> Launchpad.

I merged your changes with my version and called myself a "contributhor" :)
I'd like to keep markchars.el a standalone library, so the attached does
not require idn.el.  I also set the version to 0.2.

I would like to put it in the GNU ELPA, if you don't mind (it can still
live in nXhtml, we can mirror it).  You'll need to assign the copyright,
though.

The major change is that instead of detecting the range at the font-lock
keyword level, I run non-IDN detection at the word markup level (just
like confusables detection).  I think that results in cleaner, easily
extensible code--take a look and see what you think.

For an IDN markup face I defined a new one.  Your call on what it should
be; I just set it to a white underline for now.

This is IMO a good change:

(make-obsolete-variable 'markchars-keywords 'markchars-what "markchars.el 0.2")

because you had both `markchars-keywords' and `markchars-used-keywords',
which was confusing.

`markchars--render-nonidn' is not optimized: it steps through the word
in the buffer and assigns the properties to each individual character
instead of to each range it finds.  I don't think that's a big deal but
it could be done better.  I couldn't reuse your non-IDN detection logic
because it was not word-oriented.

I would use a char-table for idn.el instead of a bool-vector.  Also
perhaps idn.el's .txt files and confusables.txt should simply be part of
Emacs, so the IDN and confusables properties can be looked up like the
other properties.
Emacs already does that for many properties, see for example:

(format "%S" (mapcar 'car char-code-property-alist))
(get-char-code-property ?q 'titlecase)

I think that inclusion would benefit everyone, but the original .txt
files are large so I'll leave it up to the experts.  If they are
included, `markchars--render-nonidn' would be much much smaller.

Ted

--=-=-=
Content-Type: application/emacs-lisp
Content-Disposition: attachment; filename=markchars.el
Content-Transfer-Encoding: quoted-printable

;;; markchars.el --- Mark chars fitting certain characteristics
;;
;; Author: Lennart Borgman (lennart O borgman A gmail O com)
;; Contributhor: Ted Zlatanov
;; Created: 2010-03-22 Mon
;; Version: 0.2
;; Last-Updated: 2011-04-15
;; URL:
;; Keywords:
;; Compatibility:
;;
;; Features that might be required by this library:
;;
;;   `idn', `nxhtml-base'.
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;;; Commentary:
;;
;; Mark special chars, by default nonascii, non-IDN chars, in modes
;; where they may be confused with regular chars. See `markchars-mode'
;; and `markchars-what'. There are three modes: confusable detection
;; (where we look for mixed scripts within a word, without using the
;; http://www.unicode.org/reports/tr39/ confusable tables), non-IDN
;; detection (where `idn-is-recommended' from idn.el is consulted, if
;; available) and pattern detection (where any regular expressions can
;; be matched).
;;
;; The marked text will have the 'markchars property set to
;; 'confusable, 'nonidn or 'pattern and the face set to
;; `markchars-face-confusable', `markchars-face-nonidn' or
;; `markchars-face-pattern' respectively.
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;;; Change log:
;;
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; This program is free software; you can redistribute it and/or
;; modify it under the terms of the GNU General Public License as
;; published by the Free Software Foundation; either version 3, or
;; (at your option) any later version.
;;
;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
;; General Public License for more details.
;;
;; You should have received a copy of the GNU General Public License
;; along with this program; see the file COPYING.  If not, write to
;; the Free Software Foundation, Inc., 51 Franklin Street, Fifth
;; Floor, Boston, MA 02110-1301, USA.
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;;; Code:

(require 'idn nil t)
(autoload 'idn-is-recommended "idn")

;;;###autoload
(defgroup markchars nil
  "Customization group for `markchars-mode'."
  :group 'convenience)

(defface markchars-light
  '((t (:underline "light blue")))
  "Light face for `markchars-mode' char marking."
  :group 'markchars)

(defface markchars-heavy
  '((t (:underline "magenta")))
  "Heavy face for `markchars-mode' char marking."
  :group 'markchars)

(defface markchars-white
  '((t (:underline "white")))
  "White face for `markchars-mode' char marking."
  :group 'markchars)

(defcustom markchars-face-pattern 'markchars-heavy
  "Pointer to face used for marking matched patterns."
  :type 'face
  :group 'markchars)

(defcustom markchars-face-confusable 'markchars-light
  "Pointer to face used for marking confusables."
  :type 'face
  :group 'markchars)

(defcustom markchars-face-nonidn 'markchars-white
  "Pointer to face used for marking non-IDN characters."
  :type 'face
  :group 'markchars)

(defcustom markchars-simple-pattern "[[:nonascii:]]+"
  "Regexp for characters to mark, a simple pattern.

By default it matches nonascii-chars."
  :type 'regexp
  :group 'markchars)

(defcustom markchars-what
  `(markchars-simple-pattern
    markchars-confusables
    ,@(when (fboundp 'idn-is-recommended) '(markchars-nonidn-fun)))
  "Things to mark, a list of regular expressions or symbols."
  :type `(repeat (choice :tag "Marking choices"
                         (const :tag "Non IDN chars (Unicode.org tr39 suggestions)"
                                markchars-nonidn-fun)
                         (const :tag "Confusables" markchars-confusables)
                         (const :tag "`markchars-simple-pattern'"
                                markchars-simple-pattern)
                         (regexp :tag "Arbitrary pattern")))
  :group 'markchars)

(make-obsolete-variable 'markchars-keywords 'markchars-what "markchars.el 0.2")

(defvar markchars-used-keywords nil
  "Keywords for font lock.")
(put 'markchars-used-keywords 'permanent-local t)

(defun markchars-set-keywords ()
  "Set `markchars-used-keywords' from options."
  (set (make-local-variable 'markchars-used-keywords)
       (delq nil
             (mapcar
              (lambda (what)
                ;; Resolve the symbol to the regexp it names.
                (when (eq what 'markchars-simple-pattern)
                  (setq what markchars-simple-pattern))
                (cond
                 ((eq what 'markchars-nonidn-fun)
                  (list "\\<\\w+\\>"
                        (list 0 '(markchars--render-nonidn
                                  (match-beginning 0) (match-end 0)))))
                 ;; The symbol must match the one listed in `markchars-what'.
                 ((eq what 'markchars-confusables)
                  (list "\\<\\w+\\>"
                        (list 0 '(markchars--render-confusables
                                  (match-beginning 0) (match-end 0)))))
                 ((stringp what)
                  (list what
                        (list 0 '(markchars--render-pattern
                                  (match-beginning 0) (match-end 0)))))))
              markchars-what))))

(defun markchars--render-pattern (beg end)
  "Assign markchars pattern properties between BEG and END."
  (put-text-property beg end 'face markchars-face-pattern)
  (put-text-property beg end 'markchars 'pattern))

(defun markchars--render-confusables (beg end)
  "Assign markchars confusable properties between BEG and END."
  (let* ((text (buffer-substring-no-properties beg end))
         (scripts (mapcar
                   (lambda (c) (aref char-script-table c))
                   (string-to-list text)))
         ;; `scripts-extra' is not nil if there was more than one script
         (scripts-extra (delq (car scripts) scripts)))
    (when scripts-extra
      (put-text-property beg end 'markchars 'confusable)
      (put-text-property beg end 'face markchars-face-confusable))))

(defun markchars--render-nonidn (beg end)
  "Assign markchars non-IDN properties between BEG and END."
  (save-excursion
    (goto-char beg)
    ;; Stop before END so `char-after' never looks past the word
    ;; (it returns nil at the end of the buffer).
    (while (< (point) end)
      (let ((c (char-after)))
        (when (and (> c 256)
                   (not (idn-is-recommended c)))
          (put-text-property (point) (1+ (point)) 'markchars 'nonidn)
          (put-text-property (point) (1+ (point)) 'face markchars-face-nonidn)))
      (forward-char))))

;;;###autoload
(define-minor-mode markchars-mode
  "Mark special characters.
Which characters to mark are defined by `markchars-what'.

The default is to mark nonascii chars with a magenta underline."
  :group 'markchars
  :lighter " Mchar"
  (if markchars-mode
      (progn
        (markchars-set-keywords)
        (let ((props (make-local-variable 'font-lock-extra-managed-props)))
          (add-to-list props 'markchars))
        (font-lock-add-keywords nil markchars-used-keywords))
    (font-lock-remove-keywords nil markchars-used-keywords))
  (font-lock-fontify-buffer))

;;;###autoload
(define-globalized-minor-mode markchars-global-mode markchars-mode
  (lambda () (markchars-mode 1))
  :group 'markchars)

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; markchars.el ends here

--=-=-=--
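
The message notes that `markchars--render-nonidn' sets properties one
character at a time instead of once per range.  Below is a minimal
sketch of the per-range idea, not code from markchars.el: the helper
name `my-mark-runs' and its PRED and PROPS arguments are hypothetical,
introduced only for illustration.  It uses only standard buffer
primitives and takes the character test as a parameter, so it does not
depend on idn.el.

```elisp
;; Hypothetical helper, not part of markchars.el.
(defun my-mark-runs (beg end pred props)
  "Add plist PROPS to each maximal run of chars satisfying PRED.
Only the text between BEG and END is examined.  PRED is called with
one character and should return non-nil for characters to mark."
  (save-excursion
    (goto-char beg)
    (while (< (point) end)
      (if (not (funcall pred (char-after)))
          ;; Not a character of interest: skip it.
          (forward-char 1)
        (let ((start (point)))
          ;; Extend the run while the next character also qualifies.
          (while (and (< (point) end)
                      (funcall pred (char-after)))
            (forward-char 1))
          ;; One call per run instead of one call per character.
          (add-text-properties start (point) props))))))

;; Possible use inside a renderer, assuming idn.el is loaded:
;;
;;   (my-mark-runs beg end
;;                 (lambda (c) (and (> c 256) (not (idn-is-recommended c))))
;;                 (list 'markchars 'nonidn 'face markchars-face-nonidn))
```

Whether this is worth doing for words of ordinary length is an open
question; as the message says, the per-character version is probably
not a big deal in practice.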