From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: =?iso-2022-jp?B?GyRCQG5IKBsoQiAbJEJCQDBsGyhC?= ( Taichi KAWABATA ) Newsgroups: gmane.emacs.devel Subject: Re: normalize.el Date: Sun, 19 Apr 2009 12:39:58 +0900 Message-ID: <87skk5wbu9.fsf%kawabata.taichi@gmail.com> References: <877i1hzsqx.fsf%kawabata.taichi@gmail.com> NNTP-Posting-Host: lo.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Trace: ger.gmane.org 1240116461 19674 80.91.229.12 (19 Apr 2009 04:47:41 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Sun, 19 Apr 2009 04:47:41 +0000 (UTC) Cc: Stefan Monnier To: Kenichi Handa , emacs-devel@gnu.org Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sun Apr 19 06:48:59 2009 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([199.232.76.165]) by lo.gmane.org with esmtp (Exim 4.50) id 1LvOxa-0005M7-Lj for ged-emacs-devel@m.gmane.org; Sun, 19 Apr 2009 06:48:59 +0200 Original-Received: from localhost ([127.0.0.1]:50463 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1LvOwB-0006tN-EL for ged-emacs-devel@m.gmane.org; Sun, 19 Apr 2009 00:47:27 -0400 Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1LvNtj-0002mL-UP for emacs-devel@gnu.org; Sat, 18 Apr 2009 23:40:51 -0400 Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1LvNtj-0002m9-3V for emacs-devel@gnu.org; Sat, 18 Apr 2009 23:40:51 -0400 Original-Received: from [199.232.76.173] (port=40919 helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1LvNtj-0002m6-0b for emacs-devel@gnu.org; Sat, 18 Apr 2009 23:40:51 -0400 Original-Received: from clock.ocn.ne.jp ([122.28.30.213]:64134 helo=smtp.clock.ocn.ne.jp) by monty-python.gnu.org with esmtp (Exim 4.60) (envelope-from ) id 1LvNth-00038r-Ed for emacs-devel@gnu.org; Sat, 18 Apr 2009 23:40:50 -0400 Original-Received: from ubuntu.clock.ocn.ne.jp (p59211-adsau18honb3-acca.tokyo.ocn.ne.jp [219.161.210.211]) by smtp.clock.ocn.ne.jp (Postfix) with ESMTP id 731E82693; Sun, 19 Apr 2009 12:40:46 +0900 (JST) User-Agent: Gnus/5.110011 (No Gnus v0.11) Emacs/23.0.92 (gnu/linux) Mule/6.0 =?iso-2022-jp?B?KBskQjJWOzZOJBsoQik=?= X-Operating-System: Linux ubuntu 2.6.24-22-server #1 SMP Mon Nov 24 19:14:19 UTC 2008 i686 GNU/Linux Accept-Language: ja, en;q=0.6, zh;q=0.3, fr;q=0.1, la;q=0.01, sa;q=0.001 X-GPG-Key: http://pgp.nic.ad.jp:11371/pks/lookup?op=index&search=kawabata.taichi@gmail.com In-Reply-To: <877i1hzsqx.fsf%kawabata.taichi@gmail.com> (=?iso-2022-jp?B?IhskQkBuSCgbKEIgGyRCQkAwbBsoQiIncw==?= message of "Sun, 19 Apr 2009 04:06:30 +0900") X-detected-operating-system: by monty-python.gnu.org: HP-UX 11.00-11.11 X-Mailman-Approved-At: Sun, 19 Apr 2009 00:47:21 -0400 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Xref: news.gmane.org gmane.emacs.devel:110344 Archived-At: Handa-san, >> In <877i1hzsqx.fsf%kawabata.taichi@gmail.com>,=20 >> I wrote: > > > Great. Please rename it to uni-normalize.el or ucs-normalize.el. > > I renamed it to ucs-normalize.el and changed > > variable/function names as attached. But, byte-compiling > > shows these warnings and error. As the original > > normalize.el causes the same error, Kawabata-san, could you > > please fix it. > I'm sorry for late reply. Here is fixed version I tried to remove all > warnings. I believe this version fully passes `NormalizationTest-5.1.0.t= xt'. > However, it takes somewhat longer time for byte-compilation (About 1 min > for my 2GHz machine), and byte-compiled code will be quite larger than > original code.=20=20 > These problems would be reduced if we could provide specifically > customized codes for Hangul character decompositions. Here is revised code that tried to reduce the compilation time (about half) and byte-compiled file size (from 2M to 200K) by customizing Hangul Normalization process. It passed NormalizationTest-5.1.0.txt. Cheers, ;;=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D ;;; ucs-normalize.el --- Unicode normalization NFC/NFD/NFKD/NFKC ;; Copyright (C) 2009 ;; Free Software Foundation, Inc. ;; Author: Taichi Kawabata ;; Keywords: unicode, normalization ;; This file is part of GNU Emacs. ;; GNU Emacs is free software: you can redistribute it and/or modify ;; it under the terms of the GNU General Public License as published by ;; the Free Software Foundation, either version 3 of the License, or ;; (at your option) any later version. ;; GNU Emacs is distributed in the hope that it will be useful, ;; but WITHOUT ANY WARRANTY; without even the implied warranty of ;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ;; GNU General Public License for more details. ;; You should have received a copy of the GNU General Public License ;; along with GNU Emacs. If not, see . ;;; Commentary: ;; ;; This program has passed the NormalizationTest-5.1.0.txt. ;; ;; References: ;; http://www.unicode.org/reports/tr15/ ;; http://www.unicode.org/review/pr-29.html ;; ;; HFS-Normalization: ;; Reference: ;; http://developer.apple.com/technotes/tn/tn1150.html ;; ;; HFS Normalization excludes following area for decomposition. ;; ;; U+02000 .. U+02FFF :: Punctuation, symbols, dingbats, arrows, etc. ;; (Characters in this region will be composed.) ;; U+0F900 .. U+0FAFF :: CJK compatibility Ideographs. ;; U+2F800 .. U+2FFFF :: CJK compatibility Ideographs. ;; ;; HFS-Normalization is useful for normalizing text involving CJK Ideograph= s. ;; ;;; ;;; Implementation Notes on NFC/HFS-NFC. ;;; ;; ;; Decomposition Composition ;; NFD: 'nfd nil ;; NFC: 'nfd t ;; NFKD: 'nfkd nil ;; NFKC: 'nfkd t ;; HFS-NFD: 'hfs-nfd 'hfs-nfd-comp-p ;; HFS-NFC: 'hfs-nfd t ;; ;; Algorithm for Normalization ;; ;; Before normalization, following data will be prepared. ;; ;; 1. quick-check-list ;; ;; `quick-check-list' consists of characters that will be decomposed ;; during normalization. It includes composition-exclusions, ;; singletons, non-starter-decompositions and decomposable ;; characters. ;; ;; `quick-check-regexp' will search the above characters plus ;; combining characters. ;; ;; 2. decomposition-translation ;; ;; `decomposition-translation' is a translation table that will be ;; used to decompose the characters. ;; ;; ;; Normalization Process ;; ;; A. Searching (`ucs-normalize-region') ;; ;; Region is searched for `quick-check-regexp' to find possibly ;; normalizable point. ;; ;; B. Identification of Normalization Block ;; ;; (1) start of the block ;; If the searched character is in quick-check-list, then the ;; beginning of the block is the searched character. ;; If searched character is combining character, then previous ;; character will be the target character ;; (2) end of the block ;; Block ends at non-composable starter character. ;; ;; C. Decomposition (`ucs-normalize-block') ;; ;; The entire block will be decomposed by ;; `decomposition-translation' table. ;; ;; D. Sorting and Composition of Smaller Blocks (`ucs-normalize-block-comp= ose-chars') ;; ;; The block will be split to multiple samller blocks by starter ;; charcters. Each block is sorted, and composed if necessary. ;; ;; E. Composition of Entire Block (`ucs-normalize-compose-chars') ;; ;; Composed blocks are collected and again composed. ;;; Code: (defconst ucs-normalize-version "1.1beta") (eval-when-compile (require 'cl)) (eval-when-compile (defconst ucs-normalize-composition-exclusions '(#x0958 #x0959 #x095A #x095B #x095C #x095D #x095E #x095F #x09DC #x09DD #x09DF #x0A33 #x0A36 #x0A59 #x0A5A #x0A5B #x0A5E #x0B5C #x0B5D #x0F43 #x0F4D #x0F52 #x0F57 #x0F5C #x0F69 #x0F76 #x0F78 #x0F93 #x0F9D #x0FA2 #x0FA7 #x0FAC #x0FB9 #xFB1D #xFB1F #xFB2A #xFB2B #xFB2C #xFB2D #xFB2E #xFB2F #xFB30 #xFB31 #xFB32 #xFB33 #xFB34 #xFB35 #xFB36 #xFB38 #xFB39 #xFB3A #xFB3B #xFB3C #xFB3E #xFB40 #xFB41 #xFB43 #xFB44 #xFB46 #xFB47 #xFB48 #xFB49 #xFB4A #xFB4B #xFB4C #xFB4D #xFB4E #x2ADC #x1D15E #x1D15F #x1D160 #x1D161 #x1D162 #x1D163 #x1D164 #x1D1BB #x1D1BC #x1D1BD #x1D1BE #x1D1BF #x1D1C0) "Composition Exclusion List. This list is taken from http://www.unicode.org/Public/UNIDATA/CompositionExclusions-5.1.0.txt") ;; Unicode ranges that decompositions & combinings are defined. (defvar check-range nil) (setq check-range '((#x00a0 . #x3400) (#xA600 . #xAC00) (#xF900 . #x10f= ff) (#x1d000 . #x1dfff) (#x2f800 . #x2faff))) ;; Basic normalization functions (defun nfd (char) (let ((decomposition (get-char-code-property char 'decomposition))) (if (and decomposition (numberp (car decomposition))) decomposition))) (defun nfkd (char) (let ((decomposition (get-char-code-property char 'decomposition))) (if (symbolp (car decomposition)) (cdr decomposition) decomposition))) (defun hfs-nfd (char) (when (or (and (>=3D char 0) (< char #x2000)) (and (>=3D char #x3000) (< char #xf900)) (and (>=3D char #xfb00) (< char #x2f800)) (>=3D char #x30000)) (nfd char)))) (defun ucs-normalize-hfs-nfd-comp-p (char) (and (>=3D char #x2000) (< char #x3000))) (defsubst ucs-normalize-ccc (char) (get-char-code-property char 'canonical-combining-class)) ;; Data common to all normalizations (eval-when-compile (defvar combining-chars nil)=20 (setq combining-chars nil) (defvar decomposition-pair-to-composition nil) (setq decomposition-pair-to-composition nil) (defvar non-starter-decompositions nil) (setq non-starter-decompositions nil) (let ((char 0) ccc decomposition) (mapc (lambda (start-end) (do ((char (car start-end) (+ char 1))) ((> char (cdr start-end))) (setq ccc (ucs-normalize-ccc char)) (setq decomposition (get-char-code-property char 'decomposition)) (if (and ccc (/=3D 0 ccc)) (add-to-list 'combining-chars char)) (if (and (numberp (car decomposition)) (/=3D (ucs-normalize-ccc (car decomposition)) 0)) (add-to-list 'non-starter-decompositions char)) (when (numberp (car decomposition)) (if (and (=3D 2 (length decomposition)) (null (memq char ucs-normalize-composition-exclusions)) (null (memq char non-starter-decompositions))) (setq decomposition-pair-to-composition (cons (cons decomposition char) decomposition-pair-to-composition))) ;; If not singleton decomposition, second and later characters in ;; decomposition will be the subject of combining characters. (if (cdr decomposition) (dolist (char (cdr decomposition)) (add-to-list 'combining-chars char)))))) check-range)) (setq combining-chars (append combining-chars=20 '(?=E1=84=80 ?=E1=84=81 ?=E1=84=82 ?=E1=84=83 ?=E1=84=84 ?= =E1=84=85 ?=E1=84=86 ?=E1=84=87 ?=E1=84=88 ?=E1=84=89 ?=E1=84=8A ?=E1=84=8B ?=E1=84=8C ?=E1=84=8D ?=E1=84=8E ?=E1=84=8F ?=E1=84=90 ?=E1= =84=91 ?=E1=84=92 ?=E1=85=A1 ?=E1=85=A2 ?=E1=85=A3 ?=E1=85=A4 ?=E1=85=A5 ?= =E1=85=A6 ?=E1=85=A7 ?=E1=85=A8 ?=E1=85=A9 ?=E1=85=AA ?=E1=85=AB ?=E1=85=AC ?=E1=85=AD ?=E1=85=AE ?=E1=85=AF ?=E1= =85=B0 ?=E1=85=B1 ?=E1=85=B2 ?=E1=85=B3 ?=E1=85=B4 ?=E1=85=B5 ?=E1=86=A8 ?= =E1=86=A9 ?=E1=86=AA ?=E1=86=AB=20 ?=E1=86=AC ?=E1=86=AD ?=E1=86=AE ?=E1=86=AF ?=E1=86=B0 ?=E1= =86=B1 ?=E1=86=B2 ?=E1=86=B3 ?=E1=86=B4 ?=E1=86=B5 ?=E1=86=B6 ?=E1=86=B7 ?=E1=86=B8 ?=E1=86=B9 ?=E1= =86=BA ?=E1=86=BB ?=E1=86=BC ?=E1=86=BD ?=E1=86=BE ?=E1=86=BF ?=E1=87=80 ?= =E1=87=81 ?=E1=87=82))) ) (defun ucs-normalize-make-hash-table-from-alist (alist) (let ((table (make-hash-table :test 'equal :size 2000))) (mapc (lambda (x) (puthash (car x) (cdr x) table)) alist) table)) (defvar ucs-normalize-decomposition-pair-to-primary-composite nil "Hashtable of decomposed pair to primary composite. Note that Hangul are excluded.") (setq ucs-normalize-decomposition-pair-to-primary-composite (ucs-normalize-make-hash-table-from-alist (eval-when-compile decomposition-pair-to-composition))) (defun ucs-normalize-primary-composite (decomposition-pair composition-pred= icate) "Convert DECOMPOSITION-PAIR to primay composit using COMPOSITION-PREDICAT= E." (let ((char (or (gethash decomposition-pair ucs-normalize-decomposition-pair-to-primary-comp= osite) (and (<=3D #x1100 (car decomposition-pair)) (< (car decomposition-pair) #x1113) (<=3D #x1161 (cadr decomposition-pair)) (< (car decomposition-pair) #x1176) (let ((lindex (- (car decomposition-pair) #x1100)) (vindex (- (cadr decomposition-pair) #x1161))) (+ #xAC00 (* (+ (* lindex 21) vindex) 28)))) (and (<=3D #xac00 (car decomposition-pair)) (< (car decomposition-pair) #xd7a4) (<=3D #x11a7 (cadr decomposition-pair)) (< (cadr decomposition-pair) #x11c3) (=3D 0 (% (- (car decomposition-pair) #xac00) 28)) (let ((tindex (- (cadr decomposition-pair) #x11a7))) (+ (car decomposition-pair) tindex)))))) (if (and char (functionp composition-predicate) (null (funcall composition-predicate char))) nil char))) (defvar ucs-normalize-combining-chars nil) (setq ucs-normalize-combining-chars (eval-when-compile combining-chars)) (defvar ucs-normalize-combining-chars-regexp nil "Regular expression to match sequence of combining characters.") (setq ucs-normalize-combining-chars-regexp (eval-when-compile (concat (regexp-opt (mapcar 'char-to-string combining-= chars)) "+"))) (eval-when-compile (defun decomposition-translation-alist (decomposition-function) (let (decomposition alist) (mapc (lambda (start-end) (do ((char (car start-end) (+ char 1))) ((> char (cdr start-end))) (setq decomposition (funcall decomposition-function char)) (if decomposition (setq alist (cons (cons char (apply 'append (mapcar (lambda (x)=20 (decomposition-char= -recursively x decomposition-fu= nction)) decomposition))) alist))))) check-range) alist)) (defun decomposition-char-recursively (char decomposition-function) (let ((decomposition (funcall decomposition-function char))) (if decomposition (apply 'append (mapcar (lambda (x)=20 (decomposition-char-recursively x decomposition-= function)) decomposition)) (list char)))) (defun alist-list-to-vector (alist) (mapcar (lambda (x) (cons (car x) (apply 'vector (cdr x)))) alist)) (defvar nfd-alist nil) (setq nfd-alist (alist-list-to-vector (decomposition-translation-alist = 'nfd))) (defvar nfkd-alist nil) (setq nfkd-alist (alist-list-to-vector (decomposition-translation-alist= 'nfkd))) (defvar hfs-nfd-alist nil) (setq hfs-nfd-alist (alist-list-to-vector (decomposition-translation-al= ist 'hfs-nfd))) ) (defvar ucs-normalize-hangul-translation-alist nil) (setq ucs-normalize-hangul-translation-alist (let ((i 0) entries) (while (< i 11172) (setq entries=20 (cons (cons (+ #xac00 i)=20 (if (=3D 0 (% i 28)) (vector (+ #x1100 (/ i 588)) (+ #x1161 (/ (% i 588) 28))) (vector (+ #x1100 (/ i 588)) (+ #x1161 (/ (% i 588) 28)) (+ #x11a7 (% i 28))))) entries) i (1+ i))) entries)) (defun ucs-normalize-make-translation-table-from-alist (alist) (make-translation-table-from-alist=20 (append alist ucs-normalize-hangul-translation-alist))) (define-translation-table 'ucs-normalize-nfd-table (ucs-normalize-make-translation-table-from-alist (eval-when-compile nfd-a= list))) (define-translation-table 'ucs-normalize-nfkd-table (ucs-normalize-make-translation-table-from-alist (eval-when-compile nfkd-= alist))) (define-translation-table 'ucs-normalize-hfs-nfd-table (ucs-normalize-make-translation-table-from-alist (eval-when-compile hfs-n= fd-alist))) (defun ucs-normalize-sort (chars) "Sort by canonical combining class of chars." (sort chars (lambda (ch1 ch2) (< (ucs-normalize-ccc ch1) (ucs-normalize-ccc ch2))))) (defun ucs-normalize-compose-chars (chars composition-predicate) "Compose CHARS by COMPOSITION-PREDICATE. CHARS must be sorted and normalized in starter-combining pairs." (if composition-predicate (let* ((starter (car chars)) remain result prev-ccc (target-chars (cdr chars)) target target-ccc primary-composite) (while target-chars (setq target (car target-chars) target-ccc (ucs-normalize-ccc target)) (if (and (or (null prev-ccc) (< prev-ccc target-ccc)) (setq primary-composite (ucs-normalize-primary-composite (list starter target) composition-predicat= e))) ;; case 1: composable (setq starter primary-composite prev-ccc nil) (if (=3D 0 target-ccc) ;; case 2: move starter (setq result (nconc result (cons starter (nreverse remain))) starter target remain nil) ;; case 3: move target (setq prev-ccc target-ccc remain (cons target remain)))) (setq target-chars (cdr target-chars))) (nconc result (cons starter (nreverse remain)))) chars)) (defun ucs-normalize-block-compose-chars (chars composition-predicate) "Try composing CHARS by COMPOSITION-PREDICATE. If COMPOSITION-PREDICATE is not given, then do nothing." (let ((chars (ucs-normalize-sort chars))) (if composition-predicate (ucs-normalize-compose-chars chars composition-predicate) chars))) (eval-when-compile (defun quick-check-list (decomposition-translation &optional composition-predicate) "Quick-Check List for DECOMPOSITION-TRANSLATION and COMPOSITION-PREDICA= TE. It includes Singletons, CompositionExclusions, and Non-Starter decomposition. " (let (entries decomposition composition) (mapc (lambda (start-end) (do ((i (car start-end) (+ i 1))) ((> i (cdr start-end))) (setq decomposition (string-to-list (translate-string (char-to-string i) decomposition-transl= ation))) (setq composition (ucs-normalize-block-compose-chars decomposition compositi= on-predicate)) (when (not (equal composition (list i))) (setq entries (cons i entries))))) check-range) ;;(remove-duplicates (append entries ucs-normalize-composition-exclusions non-starter-decompositions))) ;;) (defvar nfd-quick-check-list nil) (setq nfd-quick-check-list (quick-check-list 'ucs-normalize-nfd-tab= le )) (defvar nfc-quick-check-list nil) (setq nfc-quick-check-list (quick-check-list 'ucs-normalize-nfd-tab= le t )) (defvar nfkd-quick-check-list nil) (setq nfkd-quick-check-list (quick-check-list 'ucs-normalize-nfkd-ta= ble )) (defvar nfkc-quick-check-list nil) (setq nfkc-quick-check-list (quick-check-list 'ucs-normalize-nfkd-ta= ble t )) (defvar hfs-nfd-quick-check-list nil) (setq hfs-nfd-quick-check-list (quick-check-list 'ucs-normalize-hfs-nfd= -table 'ucs-normalize-hfs-nfd= -comp-p)) (defvar hfs-nfc-quick-check-list nil) (setq hfs-nfc-quick-check-list (quick-check-list 'ucs-normalize-hfs-nfd= -table t )) (defun quick-check-list-to-regexp (quick-check-list) (regexp-opt (mapcar 'char-to-string (append quick-check-list combining-= chars)))) (defun quick-check-decomposition-list-to-regexp (quick-check-list) (concat (quick-check-list-to-regexp quick-check-list) "\\|[=EA=B0=80-= =ED=9E=A3]")) (defun quick-check-composition-list-to-regexp (quick-check-list) (concat (quick-check-list-to-regexp quick-check-list) "\\|[=E1=84=80-= =E1=84=92=E1=85=A1-=E1=85=B5=E1=86=A7-=E1=87=82]")) ) ;; NFD/NFC (defvar ucs-normalize-nfd-quick-check-regexp nil) (setq ucs-normalize-nfd-quick-check-regexp (eval-when-compile (quick-check-decomposition-list-to-regexp nfd-quick-ch= eck-list))) (defvar ucs-normalize-nfc-quick-check-regexp nil) (setq ucs-normalize-nfc-quick-check-regexp (eval-when-compile (quick-check-composition-list-to-regexp nfc-quick-chec= k-list))) ;; NFKD/NFKC (defvar ucs-normalize-nfkd-quick-check-regexp nil) (setq ucs-normalize-nfkd-quick-check-regexp (eval-when-compile (quick-check-decomposition-list-to-regexp nfkd-quick-c= heck-list))) (defvar ucs-normalize-nfkc-quick-check-regexp nil) (setq ucs-normalize-nfkc-quick-check-regexp (eval-when-compile (quick-check-composition-list-to-regexp nfkc-quick-che= ck-list))) ;; HFS-NFD/HFS-NFC (defvar ucs-normalize-hfs-nfd-quick-check-regexp nil) (setq ucs-normalize-hfs-nfd-quick-check-regexp (eval-when-compile (concat (quick-check-decomposition-list-to-regexp hfs-= nfd-quick-check-list)))) (defvar ucs-normalize-hfs-nfc-quick-check-regexp nil) (setq ucs-normalize-hfs-nfc-quick-check-regexp (eval-when-compile (quick-check-composition-list-to-regexp hfs-nfc-quick-= check-list))) ;;-------------------------------------------------------------------------= ----------------- ;; Normalize local region. (defun ucs-normalize-block (from to &optional decomposition-translation-table composition-predicate) "Normalize region FROM TO, by sorting the region with canonical-cc. If DECOMPOSITION-TRANSLATION-TABLE is given, translate region before sorting. If COMPOSITION-PREDICATE is given, then compose the region by using it." (save-restriction (narrow-to-region from to) (goto-char (point-min)) (if decomposition-translation-table (translate-region from to decomposition-translation-table)) (goto-char (point-min)) (let ((start (point)) chars); ccc) (while (not (eobp)) (forward-char) (when (or (eobp) (=3D 0 (ucs-normalize-ccc (char-after (point))))) (setq chars (nconc chars (ucs-normalize-block-compose-chars (string-to-list (buffer-substring start (point))) composition-predicate)) start (point))) ;;(unless ccc (error "Undefined character can not be normalized!")) ) (delete-region (point-min) (point-max)) (apply 'insert (ucs-normalize-compose-chars chars composition-predicate))))) (defun ucs-normalize-region (from to quick-check-regexp translation-table composition-predicate) "Normalize region from FROM to TO. QUICK-CHECK-REGEXP is applied for searching the region. TRANSLATION-TABLE will be used to decompose region. COMPOSITION-PREDICATE will be used to compose region." (save-excursion (save-restriction (narrow-to-region from to) (goto-char (point-min)) (let (start-pos starter) (while (re-search-forward quick-check-regexp nil t) (setq starter (string-to-char (match-string 0))) (setq start-pos (match-beginning 0)) (ucs-normalize-block ;; from (if (or (=3D start-pos (point-min)) (and (=3D 0 (ucs-normalize-ccc starter)) (not (memq starter ucs-normalize-combining-chars)))) start-pos (1- start-pos)) ;; to (if (looking-at ucs-normalize-combining-chars-regexp) (match-end 0) (1+ start-pos)) translation-table composition-predicate)))))) ;; ------------------------------------------------------------------------= -------- (defmacro ucs-normalize-string (ucs-normalize-region) `(with-temp-buffer (insert str) (,ucs-normalize-region (point-min) (point-max)) (buffer-string))) ;;;###autoload (defun ucs-normalize-NFD-region (from to) "Normalize the current region by the Unicode NFD." (interactive "r") (ucs-normalize-region from to ucs-normalize-nfd-quick-check-regexp 'ucs-normalize-nfd-table nil)) ;;;###autoload (defun ucs-normalize-NFD-string (str) "Normalize the string STR by the Unicode NFD." (ucs-normalize-string ucs-normalize-NFD-region)) ;;;###autoload (defun ucs-normalize-NFC-region (from to) "Normalize the current region by the Unicode NFC." (interactive "r") (ucs-normalize-region from to ucs-normalize-nfc-quick-check-regexp 'ucs-normalize-nfd-table t)) ;;;###autoload (defun ucs-normalize-NFC-string (str) "Normalize the string STR by the Unicode NFC." (ucs-normalize-string ucs-normalize-NFC-region)) ;;;###autoload (defun ucs-normalize-NFKD-region (from to) "Normalize the current region by the Unicode NFKD." (interactive "r") (ucs-normalize-region from to ucs-normalize-nfkd-quick-check-regexp 'ucs-normalize-nfkd-table nil)) ;;;###autoload (defun ucs-normalize-NFKD-string (str) "Normalize the string STR by the Unicode NFKD." (ucs-normalize-string ucs-normalize-NFKD-region)) ;;;###autoload (defun ucs-normalize-NFKC-region (from to) "Normalize the current region by the Unicode NFKC." (interactive "r") (ucs-normalize-region from to ucs-normalize-nfkc-quick-check-regexp 'ucs-normalize-nfkd-table t)) ;;;###autoload (defun ucs-normalize-NFKC-string (str) "Normalize the string STR by the Unicode NFKC." (ucs-normalize-string ucs-normalize-NFKC-region)) ;;;###autoload (defun ucs-normalize-HFS-NFD-region (from to) "Normalize the current region by the Unicode NFD and Mac OS's HFS Plus." (interactive "r") (ucs-normalize-region from to ucs-normalize-hfs-nfd-quick-check-regexp 'ucs-normalize-hfs-nfd-table 'ucs-normalize-hfs-nfd-comp-p)) ;;;###autoload (defun ucs-normalize-HFS-NFD-string (str) "Normalize the string STR by the Unicode NFD and Mac OS's HFS Plus." (ucs-normalize-string ucs-normalize-HFS-NFD-region)) ;;;###autoload (defun ucs-normalize-HFS-NFC-region (from to) "Normalize the current region by the Unicode NFC and Mac OS's HFS Plus." (interactive "r") (ucs-normalize-region from to ucs-normalize-hfs-nfc-quick-check-regexp 'ucs-normalize-hfs-nfd-table t)) ;;;###autoload (defun ucs-normalize-HFS-NFC-string (str) "Normalize the string STR by the Unicode NFC and Mac OS's HFS Plus." (ucs-normalize-string ucs-normalize-HFS-NFC-region)) ;; Post-read-conversion function for `utf-8-hfs'. (defun ucs-normalize-hfs-nfd-post-read-conversion (len) (save-excursion (save-restriction (narrow-to-region (point) (+ (point) len)) (let ((buffer-modified-p (buffer-modified-p))) (ucs-normalize-HFS-NFC-region (point-min) (point-max)) (- (point-max) (point-min)))))) ;;; coding-system definition (define-coding-system 'utf-8-hfs "UTF-8 base coding system with normalization on decoding. The singleton characters in HFS normalization exclusion will not be decomposed. It doesn't perform normalization on encoding." :coding-type 'utf-8 :mnemonic ?U :charset-list '(unicode) ;; :decode-translation-table (not necessary) :post-read-conversion 'ucs-normalize-hfs-nfd-post-read-conversion ;; NFD encoder is not necessary because MacOS will automatically do it. ;; :encode-translation-table 'nfd-encode-translation-table ;; :pre-write-conversion 'nfd-encode-function ) (provide 'ucs-normalize) ;;; ucs-normalize.el ends here --=20 --------------------------------------------------------------------- =E5=B7=9D=E5=B9=A1 =E5=A4=AA=E4=B8=80 (kawabata.taichi@gmail.com) = KAWABATA, Taichi