From: Richard Wordingham
Newsgroups: gmane.emacs.bugs
Subject: bug#20140: 24.4; M17n shaper output rejected
Date: Wed, 18 Mar 2015 22:20:40 +0000
Message-ID: <20150318222040.4066e6e9@JRWUBU2>
To: 20140@debbugs.gnu.org

I am running Emacs 24.4 in an Ubuntu 12.04 Precise Pangolin
installation, for which the version of libm17n-0 is 1.6.3-1.  I am
attempting to induce Emacs to render the Tai Tham script.  There
appears to be a bug/feature in Emacs which makes this unnecessarily
difficult.

To achieve Tai Tham rendering, I added the following in a new file,
tai-tham.el, which I loaded:

(defvar tai-tham-composable-pattern
  (let ((table
         ;; C is letters, independent vowels, digits, punctuation and symbols.
         '(("C" . "[\u1A20-\u1A54\u1A80-\u1A89\u1A90-\u1A99\u1AA0-\u1AAD]")
           ("M" . "[\u1A55-\u1A5E\u1A61-\u1A7C\u1A7F]"); Mark
           ("S" . "[\u1A75-\u1A7C]") ; Marks commuting with sakot
           ("H" . "\u1A60") ; sakot
           ("N" . "\u1A58"))) ; mai kang lai - also included in M.
        ;; Which orthographic syllable mai kang lai belongs to can depend on the font!
        (regexp "C\\(M\\|HS*C?\\)*\\(NC\\(M\\|HS*C?\\)*\\)*N?"))
    (let ((case-fold-search nil))
      (dolist (elt table)
        (setq regexp (replace-regexp-in-string (car elt) (cdr elt)
                                               regexp t t))))
    regexp))

(let ((elt (list (vector tai-tham-composable-pattern 0 'font-shape-gstring)
                 (vector "." 0 'font-shape-gstring)
                 )))
  (set-char-table-range composition-function-table '(#x1A20 . #x1AAD) elt))

I added the following (cut-down) file LANA-OFT.flt to the m17n
database:

(font layouter lana-otf nil
      (font (nil nil unicode-bmp :otf=lana)))

(category
 ;; H: SAKOT
 ;; N: Other character with non-zero canonical combining class
 ;; Z: Character with ccc=0 or other with ccc=9
 (0x0000 0x1A5F ?Z)
 (0x1A60 ?H)
 (0x1A61 0x1A74 ?Z)
 (0x1A75 0x1A7C ?N)
 (0x1A7D 0xFFFF ?Z)
 )

(generator
 (0 (cond
     ("(H)(N+)" (2 = *) (1 =))
     ("." =)
     ) *
    )
 )

(category
 ;; C: Consonant and non-mark (lenient processing)
 ;; H: SAKOT
 ;; P: Preposed vowel
 ;; R: Medial RA (preposed dependent consonant)
 ;; M: Mark
 (0x1A20 0x1A54 ?C)
 (0x1A55 0x1A55 ?R)
 (0x1A56 0x1A5E ?M)
 (0x1A5F ?C) ; Unassigned
 (0x1A60 ?H)
 (0x1A61 0x1A6D ?M)
 (0x1A6E 0x1A72 ?P)
 (0x1A73 0x1A7C ?M)
 (0x1A7D 0x1A7E ?C) ; Unassigned
 (0x1A7F ?M)
 (0x1A80 0x1A89 ?C)
 (0x1A8A 0x1A8F ?C) ; Unassigned
 (0x1A90 0x1A99 ?C)
 (0x1A9A 0x1A9F ?C) ; Unassigned
 (0x1AA0 0x1AAC ?C) ; Punctuation
 (0x1AAD ?C) ; Can take a vowel!
 (0x1AAE 0x1AAF ?C) ; Unassigned
 )

(generator
 (0 (cond
     ("(C)(R|P)" (2 =) (1 =) )
     ("." =)
     )*
    )
 )

(generator
 (0 otf:lana))

However, much Tai Tham text failed to render properly.
To determine what was wrong, I added some monitoring code to ftfont.c:

*** ftfont.c.orig	2014-03-21 05:34:40.000000000 +0000
--- ftfont.c	2015-03-18 19:47:30.032718995 +0000
***************
*** 2516,2522 ****
--- 2516,2553 ----
        flt = mflt_get (msymbol ("combining"));
    for (i = 0; i < 3; i++)
      {
+       int k;
+       fprintf(stdout, "mflt_run(");
+       if (gstring.glyphs[0].encoded) {
+         for (k = 0; k < len; k++) {
+           fprintf(stdout, " %d", gstring.glyphs[k].code);
+         }
+       } else {
+         for (k = 0; k < len; k++) {
+           fprintf(stdout, " %4.4X", gstring.glyphs[k].c);
+         }
+       }
        int result = mflt_run (&gstring, 0, len, &flt_font_ft.flt_font, flt);
+       if (-1 == result) {
+         fprintf(stdout, ") failed.\n");
+       } else if (result >= 0) {
+         fprintf(stdout, ") produced (");
+         for (k = 0; k < result; k++) {
+ #if 0
+           fprintf(stdout, " %d", gstring.glyphs[k].code);
+ #else
+           fprintf(stdout, " %4.4X>%d:%d:%d",
+                   gstring.glyphs[k].c, gstring.glyphs[k].code,
+                   gstring.glyphs[k].from, gstring.glyphs[k].to);
+ #endif
+         }
+         fprintf(stdout, ")\n");
+         if (result != gstring.used) {
+           fprintf(stdout, "Anomalously, gstring.used = %d\n",
+                   (int) gstring.used);
+         }
+         fflush(0);
+       }
        if (result != -2)
          break;
        if (INT_MAX / 2 < gstring.allocated)

The sample Tai Tham text was:

;; ᩈᩣᩴᩁᩢ᩠ᨷᨽᩣᩈᩣᩃ᩶ᩣ᩠ᨶᨶᩣ / ᨣᩣᩴᨾᩮᩬᩥᨦ - ᩈᩢᨬ᩠ᨬᩣ ᨠ᩠᩵ᨷ ᩃ᩠᩶ᨯ ᨮ᩠ ᨳᩫ᩠᩵ᨶ
ᨠᩢ᩠᩵ᨷᨠᩫ᩠᩶ᨯᨿᩥ᩠ᨷᨶᩦ᩠᩵ᨷ
;; ᨣᩕ ᨲᩱ

I extract and analyse what was rendered as shaped ('accepted') and
what was not ('rejected'), quoting the monitoring output.  I suspect
the problem is the strict testing of the from and to fields in Lisp
function font-shape-gstring, which is defined in file font.c.

The shaping of the following was accepted:

mflt_run( 1A48 1A63 1A74) produced ( 1A48>820:0:0 1A63>858:1:1 1A74>878:2:2)
mflt_run( 1A41 1A62 1A60 1A37) produced ( 1A41>813:0:1 1A62>853:0:1 0000>953:2:3)
mflt_run( 1A3D 1A63) produced ( 1A3D>808:0:0 1A63>858:1:1)
mflt_run( 1A48 1A63) produced ( 1A48>820:0:0 1A63>858:1:1)
mflt_run( 1A43 1A76 1A63 1A60 1A36) produced ( 1A43>815:0:1 1A76>890:0:1 1A63>858:2:4 0000>952:2:4)
mflt_run( 1A36 1A63) produced ( 1A36>800:0:0 1A63>858:1:1)
mflt_run( 1A23 1A63 1A74) produced ( 1A23>777:0:0 0000>859:1:2)
mflt_run( 1A26) produced ( 1A26>780:0:0)
mflt_run( 1A48 1A62) produced ( 1A48>820:0:1 1A62>853:0:1)
mflt_run( 1A2C 1A60 1A2C 1A63) produced ( 0000>789:0:2 1A63>858:3:3)
mflt_run( 1A43 1A60 1A76 1A2F) produced ( 1A43>815:0:3 1A76>890:0:3 0000>941:0:3)
mflt_run( 1A2E 1A60) produced ( 1A2E>792:0:1 1A60>851:0:1)
mflt_run( 1A33 1A6B 1A60 1A75 1A36) produced ( 1A33>797:0:4 1A6B>868:0:4 1A75>889:0:4 0000>952:0:4)
mflt_run( 1A20 1A6B 1A76 1A60 1A2F) produced ( 1A20>774:0:4 1A6B>868:0:4 1A76>890:0:4 0000>941:0:4)
mflt_run( 1A3F 1A65 1A60 1A37) produced ( 1A3F>811:0:1 1A65>862:0:1 0000>953:2:3)

The shaping of the following, with vowels or MEDIAL RA that should be
rendered before the consonant, was rejected:

mflt_run( 1A3E 1A6E 1A6C 1A65) produced ( 1A6E>872:1:1 1A3E>810:0:3 1A6C>869:0:3 1A65>862:0:3)
mflt_run( 1A23 1A55) produced ( 1A55>835:1:1 1A23>777:0:0)
mflt_run( 1A32 1A71) produced ( 1A71>875:1:1 1A32>796:0:0)

The problem is that the first glyph does not derive from the first
character.

The shaping of the following was rejected:

mflt_run( 1A20 1A60 1A75 1A37) produced ( 1A20>774:0:2 1A75>889:0:2 0000>953:1:3)

In this case, character 2 is stacked below character 0, and characters
1 and 3 combine to form a spacing glyph.

mflt_run( 1A20 1A62 1A60 1A75 1A37) produced ( 1A20>774:0:1 1A62>853:0:3 1A75>889:0:3 0000>953:2:4)

Character 1 is mounted on character 0, and character 3 on character 1.
Characters 2 and 4 combine to form a spacing glyph.

mflt_run( 1A36 1A66 1A75 1A60 1A37) produced ( 1A36>800:0:1 1A66>863:0:2 1A75>889:0:2 0000>953:3:4)

Character 1 is mounted on character 0, and character 2 on character 1.
Characters 3 and 4 form a spacing glyph.

There does appear to be a workaround, which is to have m17n declare
the orthographic syllables it receives to be 'grapheme clusters'.  It
solves at least some of the problems above.  However, it then makes
editing of the 'clusters' more difficult.  Note that there are examples
above with 5 characters in a cluster, and this is by no means the
limit.

Richard.