From: James Thomas
Newsgroups: gmane.emacs.devel
Subject: [PATCH] Improve Malayalam language transliteration
Date: Sun, 26 Apr 2020 19:19:10 +0530
Message-ID: <87d07ul5m1.fsf@gmx.net>
To: emacs-devel@gnu.org

The existing Quail ITRANS scheme for Malayalam is incomplete and
ill-suited to the language. It does not support some common
characters and does not handle language quirks such as chillus.
The Inscript method also has errors and does not follow the latest
standard.

This patch implements a sufficient subset of the Mozhi scheme (the
complete scheme is unnecessarily complicated, IMO) and updates
Inscript.

References:
https://malayalam.kerala.gov.in/index.php/InputMethods
https://sites.google.com/site/cibu/

Note: when testing on Ubuntu (and possibly Debian), set the font to
Noto Sans Mono to avoid problems with the default font.

-- 
Jim

From 7261783271799b0d9cbd5c49afb119f1b8d9e9d6 Mon Sep 17 00:00:00 2001
From: James Thomas
Date: Sun, 26 Apr 2020 18:59:56 +0530
Subject: [PATCH] Improve Malayalam language transliteration

The current ITRANS scheme does not support some characters and
language quirks such as chillus.  The Inscript method is not complete.

* lisp/language/ind-util.el (indian-mlm-base-table): Add archaic
chars and combos; clean up.
(indian-mlm-mozhi-table): New table for the Mozhi scheme.
* lisp/leim/quail/indian.el (inscript-mlm-keytable): Correct errors.
Add Inscript chillus and zero-width chars, and the Mozhi scheme.
* etc/NEWS: Mention the change.

Replace ITRANS with a sufficient implementation of the Mozhi scheme.
Complete Inscript implementation.
Reference: https://malayalam.kerala.gov.in/index.php/InputMethods
---
 etc/NEWS                  |   7 +++
 lisp/language/ind-util.el |  40 +++++++++++---
 lisp/leim/quail/indian.el | 106 ++++++++++++++++++++++++++++++++------
 3 files changed, 129 insertions(+), 24 deletions(-)

diff --git a/etc/NEWS b/etc/NEWS
index 025d5c14a7..e701cfef41 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -288,6 +288,13 @@ prefix on the Subject line in various languages.
 These new navigation commands are bound to 'n' and 'p' in
 'apropos-mode'.
 
+** Quail
+
+---
+*** Improved Malayalam language transliteration
+A sufficient implementation of the Mozhi scheme replaces the
+incomplete ITRANS scheme. Inscript method updated to latest standard.
+
 ^L
 * New Modes and Packages in Emacs 28.1
 
diff --git a/lisp/language/ind-util.el b/lisp/language/ind-util.el
index 4319e5537e..fd21f3a6a6 100644
--- a/lisp/language/ind-util.el
+++ b/lisp/language/ind-util.el
@@ -232,8 +232,8 @@ indian-mlm-base-table
   '(
     (;; VOWELS
      (?അ nil) (?ആ ?ാ) (?ഇ ?ി) (?ഈ ?ീ) (?ഉ ?ു) (?ഊ ?ൂ)
-     (?ഋ ?ൃ) (?ഌ nil) nil (?ഏ ?േ) (?എ ?െ) (?ഐ ?ൈ)
-     nil (?ഓ ?ോ) (?ഒ ?ൊ) (?ഔ ?ൌ) nil nil)
+     (?ഋ ?ൃ) (?ഌ ?ൢ) (?ൡ ?ൣ) (?ഏ ?േ) (?എ ?െ) (?ഐ ?ൈ)
+     nil (?ഒ ?ൊ) (?ഓ ?ോ) (?ഔ ?ൗ) (?് ?്) (?ൠ ?ൄ))
     (;; CONSONANTS
      ?ക ?ഖ ?ഗ ?ഘ ?ങ                  ;; GUTTRULS
      ?ച ?ഛ ?ജ ?ഝ ?ഞ                  ;; PALATALS
@@ -243,13 +243,14 @@ indian-mlm-base-table
      ?യ ?ര ?റ ?ല ?ള ?ഴ ?വ          ;; SEMIVOWELS
      ?ശ ?ഷ ?സ ?ഹ                    ;; SIBILANTS
      nil nil nil nil nil nil nil nil ;; NUKTAS
-     "ജ്ഞ" "ക്ഷ")
+     "ക്ഷ"
+     "റ്റ" "ന്റ" "ത്ത" "ത്ഥ" "ഞ്ഞ" "ങ്ങ" "ന്ന"
+     "ഞ്ച" "ന്ക" "ങ്ക" "ച്ച" "ച്ഛ" "ക്ക"
+     "ബ്ബ" "ക്ക" "ഗ്ഗ" "ജ്ജ" "മ്മ" "പ്പ" "വ്വ" "ക്സ" "ശ്ശ")
     (;; Misc Symbols
      nil ?ം ?ഃ nil ?് nil nil)
     (;; Digits
-     ?൦ ?൧ ?൨ ?൩ ?൪ ?൫ ?൬ ?൭ ?൮ ?൯)
-    (;; Inscript-extra (4)  (#, $, ^, *, ])
-     "്ര" "ര്" "ത്ര" "ശ്ര" nil)))
+     ?൦ ?൧ ?൨ ?൩ ?൪ ?൫ ?൬ ?൭ ?൮ ?൯)))
 
 (defvar indian-tml-base-table
   '(
@@ -323,6 +324,29 @@ indian-itrans-v5-table-for-tamil
     (;; misc -- 7
      ".N" (".n" "M") "H" ".a" ".h" ("AUM" "OM") "..")))
 
+(defvar indian-mlm-mozhi-table
+  '(;; for encode/decode
+    (;; vowels -- 18
+     "a" ("aa" "A") "i" ("ii" "I") "u" ("uu" "U")
+     "R" "Ll" "Lll" ("E" "ae") "e" "ai"
+     nil "o" "O" "au" "~" "RR")
+    (;; consonants -- 40
+     ("k" "c") "kh" "g" "gh" "ng"
+     "ch" ("Ch" "chh") "j" "jh" "nj"
+     "T" "Th" "D" "Dh" "N"
+     "th" "thh" "d" "dh" "n" nil
+     "p" ("ph" "f") "b" "bh" "m"
+     "y" "r" "rr" "l" "L" "zh" ("v" "w")
+     ("S" "z") "sh" "s" "h"
+     nil nil nil nil nil nil nil nil
+     "X"
+     ;; some of these are extra to Mozhi
+     ("t" "tt") "nt" "tth" "tthh" "nnj" "nng" "nn"
+     "nch" "nc" "nk" "cch" "cchh" "cc"
+     "B" ("C" "K" "q") "G" "J" "M" "P" "V" "x" "Z")
+    (;; misc -- 7
+     nil nil "H")))
+
 (defvar indian-kyoto-harvard-table
   '(;; for encode/decode
     (;; vowel
@@ -520,9 +544,9 @@ indian-knd-itrans-v5-hash
   (indian-make-hash indian-knd-base-table
                     indian-itrans-v5-table))
 
-(defvar indian-mlm-itrans-v5-hash
+(defvar indian-mlm-mozhi-hash
   (indian-make-hash indian-mlm-base-table
-                    indian-itrans-v5-table))
+                    indian-mlm-mozhi-table))
 
 (defvar indian-tml-itrans-v5-hash
   (indian-make-hash indian-tml-base-table
diff --git a/lisp/leim/quail/indian.el b/lisp/leim/quail/indian.el
index 2681eab0e5..7fd2b8ed65 100644
--- a/lisp/leim/quail/indian.el
+++ b/lisp/leim/quail/indian.el
@@ -117,12 +117,6 @@ "\\''"
  indian-knd-itrans-v5-hash "kannada-itrans" "Kannada" "KndIT"
  "Kannada transliteration by ITRANS method.")
 
-(if nil
-    (quail-define-package "malayalam-itrans" "Malayalam" "MlmIT" t "Malayalam ITRANS"))
-(quail-define-indian-trans-package
- indian-mlm-itrans-v5-hash "malayalam-itrans" "Malayalam" "MlmIT"
- "Malayalam transliteration by ITRANS method.")
-
 (defvar quail-tamil-itrans-syllable-table
   (let ((vowels
          '(("அ" nil "a")
@@ -358,24 +352,21 @@ inscript-mlm-keytable
   '(
     (;; VOWELS  (18)
      (?D nil) (?E ?e) (?F ?f) (?R ?r) (?G ?g) (?T ?t)
-     (?+ ?=) ("F]" "f]") (?! ?@) (?S ?s) (?Z ?z) (?W ?w)
-     (?| ?\\) (?~ ?`) (?A ?a) (?Q ?q) ("+]" "=]") ("R]" "r]"))
+     (?= ?+) nil nil (?S ?s) (?Z ?z) (?W ?w)
+     nil (?~ ?`) (?A ?a) (?Q ?q))
     (;; CONSONANTS (42)
      ?k ?K ?i ?I ?U                ;; GRUTTALS
      ?\; ?: ?p ?P ?}               ;; PALATALS
      ?' ?\" ?\[ ?{ ?C              ;; CEREBRALS
-     ?l ?L ?o ?O ?v ?V             ;; DENTALS
+     ?l ?L ?o ?O ?v nil            ;; DENTALS
      ?h ?H ?y ?Y ?c                ;; LABIALS
-     ?/ ?j ?J ?n ?N "N]" ?b        ;; SEMIVOWELS
+     ?/ ?j ?J ?n ?N ?B ?b          ;; SEMIVOWELS
      ?M ?< ?m ?u                   ;; SIBILANTS
-     "k]" "K]" "i]" "p]" "[]" "{]" "H]" "/]" ;; NUKTAS
-     ?% ?&)
+     nil nil nil nil nil nil nil nil nil) ;; NUKTAS
     (;; Misc Symbols (7)
-     ?X ?x ?_ ">]" ?d "X]" ?>)
+     nil ?x ?_ nil ?d)
     (;; Digits
-     ?0 ?1 ?2 ?3 ?4 ?5 ?6 ?7 ?8 ?9)
-    (;; Inscripts
-     ?# ?$ ?^ ?* ?\])))
+     ?0 ?1 ?2 ?3 ?4 ?5 ?6 ?7 ?8 ?9)))
 
 (defvar inscript-tml-keytable
   '(
@@ -463,6 +454,21 @@ inscript-tml-keytable
  "malayalam-inscript" "Malayalam" "MlmIS"
  "Malayalam keyboard Inscript.")
 
+;; Chillus
+(quail-defrule "Cd" ["ണ്"])
+(quail-defrule "Cd]" ?ൺ)
+(quail-defrule "vd" ["ന്"])
+(quail-defrule "vd]" ?ൻ)
+(quail-defrule "jd" ["ര്"])
+(quail-defrule "jd]" ?ർ)
+(quail-defrule "nd" ["ല്"])
+(quail-defrule "nd]" ?ൽ)
+(quail-defrule "Nd" ["ള്"])
+(quail-defrule "Nd]" ?ൾ)
+
+(quail-defrule "\\" ?\u200C)
+(quail-defrule "X" ?\u200B)
+
 (if nil
     (quail-define-package "tamil-inscript" "Tamil" "TmlIS" t "Tamil keyboard Inscript"))
 (quail-define-inscript-package
@@ -571,4 +577,72 @@ inscript-tml-keytable
      ("?" ?\?)
     ("/" ?্))
 
+(defun indian-mlm-mozhi-update-translation (control-flag)
+  (let ((len (length quail-current-key)) chillu
+        (vowels '(?a ?e ?i ?o ?u ?A ?E ?I ?O ?U ?R)))
+    (cond ((numberp control-flag)
+           (progn (if (= control-flag 0)
+                      (setq quail-current-str quail-current-key)
+                    (cond (input-method-exit-on-first-char)
+                          ((and (memq (aref quail-current-key
+                                            (1- control-flag))
+                                      vowels)
+                                (setq chillu (cl-position
+                                              (aref quail-current-key
+                                                    control-flag)
+                                              '(?m ?N ?n ?r ?l ?L))))
+                           ;; conditions for putting chillu
+                           (and (or (and (= control-flag (1- len))
+                                         (not (setq control-flag nil)))
+                                    (and (= control-flag (- len 2))
+                                         (let ((temp (aref quail-current-key
+                                                           (1- len))))
+                                           ;; is it last char of word?
+                                           (not
+                                            (or (and (>= temp ?a) (<= temp ?z))
+                                                (and (>= temp ?A) (<= temp ?Z))
+                                                (eq temp ?~))))
+                                         (setq control-flag (1+ control-flag))))
+                                (setq quail-current-str ;; put chillu
+                                      (concat (if (not (stringp
+                                                        quail-current-str))
+                                                  (string quail-current-str)
+                                                quail-current-str)
+                                              (string
+                                               (nth chillu '(?ം ?ൺ ?ൻ ?ർ ?ൽ ?ൾ)))))))))
+                  (and (not input-method-exit-on-first-char) control-flag
+                       (while (> len control-flag)
+                         (setq len (1- len))
+                         (setq unread-command-events
+                               (cons (aref quail-current-key len)
+                                     unread-command-events))))
+                  ))
+          ((null control-flag)
+           (unless quail-current-str
+             (setq quail-current-str quail-current-key)
+             ))
+          ((equal control-flag t)
+           (if (memq (aref quail-current-key (1- len)) ;; If vowel ending,
+                     vowels)                           ;; may have to put
+               (setq control-flag nil)))))             ;; chillu. So don't
+  control-flag)                                        ;; end translation
+
+(quail-define-package "malayalam-mozhi" "Malayalam" "MlmMI" t
+                      "Malayalam transliteration by Mozhi method."
+                      nil nil t nil nil nil t nil
+                      'indian-mlm-mozhi-update-translation)
+
+(maphash
+ (lambda (key val)
+   (quail-defrule key (if (= (length val) 1)
+                          (string-to-char val)
+                        (vector val))))
+ (cdr indian-mlm-mozhi-hash))
+
+(defun indian-mlm-mozhi-underscore (key len) (throw 'quail-tag nil))
+
+(quail-defrule "_" 'indian-mlm-mozhi-underscore)
+(quail-defrule "|" ?\u200C)
+(quail-defrule "||" ?\u200B)
+
 ;;; indian.el ends here
-- 
2.20.1
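The least obvious part of the patch is the chillu handling in `indian-mlm-mozhi-update-translation`. To make it easier to review, here is a minimal standalone sketch of its word-final case only. It is not part of the patch, and `my-mozhi-final-chillu`, `my-mozhi-vowel-keys` and `my-mozhi-chillu-alist` are hypothetical names introduced for illustration. Following the patch, the sketch assumes that one of the keys m, N, n, r, l, L becomes a chillu (or the anusvara, for m) when it directly follows a vowel key from the patch's `vowels` list. The patch also covers the case where one non-letter key (other than ~) follows, and it feeds unconsumed keys back through `unread-command-events`. This sketch omits both.

```elisp
;;; Hypothetical sketch, NOT part of the patch: only the word-final
;;; chillu rule from `indian-mlm-mozhi-update-translation'.

(defconst my-mozhi-vowel-keys
  '(?a ?e ?i ?o ?u ?A ?E ?I ?O ?U ?R)
  "Keys after which a final consonant key may become a chillu.
Same list as the `vowels' binding in the patch.")

(defconst my-mozhi-chillu-alist
  '((?m . ?\u0D02)   ; m -> MALAYALAM SIGN ANUSVARA
    (?N . ?\u0D7A)   ; N -> MALAYALAM LETTER CHILLU NN
    (?n . ?\u0D7B)   ; n -> MALAYALAM LETTER CHILLU N
    (?r . ?\u0D7C)   ; r -> MALAYALAM LETTER CHILLU RR
    (?l . ?\u0D7D)   ; l -> MALAYALAM LETTER CHILLU L
    (?L . ?\u0D7E))  ; L -> MALAYALAM LETTER CHILLU LL
  "Final consonant key -> character used at the end of a word.
Same order as the patch's (?m ?N ?n ?r ?l ?L) key list.")

(defun my-mozhi-final-chillu (word)
  "Return the chillu for the last key of WORD, or nil.
WORD is a romanized Mozhi string.  A chillu applies only when the
last key is in `my-mozhi-chillu-alist' and the key before it is in
`my-mozhi-vowel-keys'."
  (let ((len (length word)))
    (when (>= len 2)
      (let ((final-key (aref word (1- len)))
            (prev-key (aref word (- len 2))))
        (and (memq prev-key my-mozhi-vowel-keys)
             (cdr (assq final-key my-mozhi-chillu-alist)))))))
```

For example, `(my-mozhi-final-chillu "avan")` gives the chillu for n, while `"kath"` gives nil because h has no chillu. With the patch applied, the input method itself would be selected with `M-x set-input-method RET malayalam-mozhi RET`.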