From: Richard Wordingham via "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Newsgroups: gmane.emacs.bugs
Subject: bug#20140: 24.4; M17n shaper output rejected
Date: Sat, 5 Feb 2022 22:52:51 +0000
Message-ID: <20220205225251.08a0faab@JRWUBU2>
References: <20150318222040.4066e6e9@JRWUBU2> <87r18jk5nr.fsf@gnus.org> <83v8xv2icg.fsf@gnu.org>
In-Reply-To: <83v8xv2icg.fsf@gnu.org>
Reply-To: Richard Wordingham
To: Eli Zaretskii
Cc: 20140@debbugs.gnu.org, Lars Ingebrigtsen
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="MP_/Imvf5sngy_WuL_y0QW2hf.K"

--MP_/Imvf5sngy_WuL_y0QW2hf.K
Content-Type: text/plain; charset=UTF-8
Content-Disposition: inline

On Fri, 04 Feb 2022 09:37:03 +0200
Eli Zaretskii wrote:

> > From: Lars Ingebrigtsen
> > Date: Thu, 03 Feb 2022 22:21:28 +0100
> > Cc: 20140@debbugs.gnu.org
> >
> > Richard Wordingham writes:
> >
> > > I am running Emacs 24.4 in a Ubuntu 12.04 Precise Pangolin
> > > installation, for which the version of libm17n-0 is 1.6.3-1. I am
> > > attempting to induce Emacs to render the Tai Tham script. There
> > > appears to be a bug/feature in Emacs which makes this
> > > unnecessarily difficult.
> >
> > (I'm going through old bug reports that unfortunately weren't
> > resolved at the time.)
> >
> > I vaguely remember there having been some fixes in this area since
> > this bug report was opened -- does this work better for you in more
> > recent versions of Emacs?

I'm currently using the vanilla emacs on Ubuntu Focal, which is
described as 'GNU Emacs 26.3 (build 2, x86_64-pc-linux-gnu, GTK+ Version
3.24.14) of 2020-03-26, modified by Debian'. The key good news is that
the commands forward-char-intrusive and backward-char-intrusive are now
standard, so I can position the cursor by dead-reckoning. You can
reasonably mark the issue as solved.

> The most important change is that we now use HarfBuzz by default.

Isn't that only true for Emacs 27.1 and above?

> Richard didn't contribute the Tai Tham composition rules to us
> (AFAIR), so I cannot test what happens now in Emacs with HarfBuzz.
> Maybe we should revisit this issue, but first I hope Richard could
> tell whether the issue still exists, and if so, what composition rules
> he uses or suggests to use for Tai Tham.

Sad to see that Khaled Hosny's suggestion not to use composition rules
seems not to have been taken.

You're welcome to include my composition rules. They're complicated by
the facts that the 'regular expressions' are not interpreted as regular
expressions and they are not interpreted as closed under canonical
equivalence. I therefore calculate the regular expression.

My composition rules are attached as tai-tham.el, which was last
modified on 20 March 2015. (It would need reformatting to paste into
this email.) There are some deficiencies; I've a feeling there may be
a problem with adding ZWNJ and CGJ as marks; ZWJ should also be added
for completeness. I need ZWNJ to write 4-column ᨴᩣᩴᨶ᩠ᩅ‌ᩣ᩠ᨿ as opposed
to 3-column ᨴᩣᩴᨶ᩠ᩅᩣ᩠ᨿ, and even with my font, HarfBuzz will need CGJ for
the suppression of jack-booted dotted circles.
Additionally, for didactic text, what can I do for U+25CC for explicit
display of marks and their equivalents on a dotted circle, and for that
matter, for display on NBSP?

Richard.

--MP_/Imvf5sngy_WuL_y0QW2hf.K
Content-Type: text/x-emacs-lisp
Content-Disposition: attachment; filename=tai-tham.el

;;; tai-tham.el --- support for Tai Tham -*- coding: utf-8 -*-

;; Copyright (C) 2008, 2009, 2010, 2011
;;   National Institute of Advanced Industrial Science and Technology (AIST)
;;   Registration Number H13PRO009

;; Keywords: multilingual, Tai Tham, i18n

;; This file is part of GNU Emacs.

;; GNU Emacs is free software: you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation, either version 3 of the License, or
;; (at your option) any later version.

;; GNU Emacs is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;; GNU General Public License for more details.

;; You should have received a copy of the GNU General Public License
;; along with GNU Emacs.  If not, see <http://www.gnu.org/licenses/>.

;;; Code:

;; (set-language-info-alist
;;  "Northern Thai" '((charset unicode)
;;                    (coding-system utf-8)
;;                    (coding-priority utf-8)
;;                    (sample-text .
;;                     "Northern Thai (ᨣᩣᩴᨾᩮᩬᩥᨦ / ᨽᩣᩈᩣᩃ᩶ᩣ᩠ᨶᨶᩣ) ᩈ᩠ᩅᩢᩔ᩠ᨯᩦᨣᩕᩢ᩠ᨸ")
;;                    (documentation . t)))

;; To load:
;; (load-file "~/tham/tai-tham.el") tai-tham-composable-pattern
;;
(defvar tai-tham-composable-pattern
  (let ((table
         ;; C is letters, independent vowels, digits, punctuation and symbols.
         '(("C" . "[\u1A20-\u1A54\u1A80-\u1A89\u1A90-\u1A99\u1AA0-\u1AAD]")
           ("M" . "[\u1A55-\u1A57\u1A59-\u1A5E\u1A61-\u1A7C\u1A7F]") ; Mark
           ("H" . "\u1A60")          ; sakot
           ("S" . "[\u1A75-\u1A7C]") ; Marks commuting with sakot
           ("N" . "\u1A58")))        ; mai kang lai
        ;; The definition of a sequence of interacting Tai Tham characters
        ;; is surprisingly complicated.  The basic syllable structure
        ;; should just be:
        ;;
        ;; C(M|HC)*
        ;;
        ;; There are three complications:
        ;;
        ;; 1. Emacs uses a backtracking regular expression engine, but it
        ;;    only backtracks if the characters accepted so far don't only
        ;;    match the regular expression.  Thus if M includes sakot, CHC
        ;;    will be parsed as CH and then C - there is no cause to
        ;;    backtrack!  On the other hand, missing consonants should not
        ;;    disrupt display - the glyph for sakot will normally alert the
        ;;    user that text entry is incomplete.
        ;;
        ;; 2. Some characters can be swapped round with sakot without
        ;;    changing the signification of the sequence of characters.
        ;;    The regular expression works with strings of characters
        ;;    rather than traces of fully decomposed characters subject to
        ;;    Unicode's canonical equivalence.
        ;;
        ;; 3. Which syllable mai kang lai belongs to depends on the font.
        ;;    Again, if M included mai kang lai, CNC would be parsed as CN
        ;;    and C.  The word ᨴᩘ᩠ᩃᩣ᩠ᨿ has mai kang lai in the middle of an
        ;;    orthographic syllable.
        ; (basic_syllable "C\\(N*\\(M\\|HS*C?\\)\\)*")
        (basic_syllable "C\\(N*\\(M\\|HS*C\\)\\)*")
        (regexp "X\\(N\\(X\\)?\\)*H?")) ; X is basic syllable
    (let ((case-fold-search nil))
      (setq regexp (replace-regexp-in-string "X" basic_syllable regexp t t))
      (dolist (elt table)
        (setq regexp (replace-regexp-in-string (car elt) (cdr elt)
                                               regexp t t))))
    regexp))

; Failed attempt to get proper composition for incomplete word ᨴᩘ᩠ᩃᩣ᩠.
;(let ((elt (list (vector tai-tham-composable-pattern 3 'font-shape-gstring)
;                 (vector tai-tham-composable-pattern 2 'font-shape-gstring)
;                 (vector tai-tham-composable-pattern 1 'font-shape-gstring)
;                 (vector tai-tham-composable-pattern 0 'font-shape-gstring)
;                 (vector "." 0 'font-shape-gstring)
;                 )))
;  (set-char-table-range composition-function-table '(#x1A20 . #x1AAD) elt))
(let ((elt (list (vector tai-tham-composable-pattern 0 'font-shape-gstring)
                 (vector "." 0 'font-shape-gstring)
                 )))
  (set-char-table-range composition-function-table '(#x1A20 . #x1AAD) elt))

--MP_/Imvf5sngy_WuL_y0QW2hf.K--
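
The placeholder expansion described in the attachment's comments can be
tried in isolation. What follows is a minimal sketch, not the attachment
itself. It assumes that string-match on a plain string is an adequate
stand-in for the way the composition code applies the pattern, which may
not hold. The names my-tham-expand, my-tham-pattern and
my-tham-single-cluster-p are hypothetical and do not appear in
tai-tham.el; the character classes are copied from the attachment.

```elisp
;; Sketch only: hypothetical helpers, not part of tai-tham.el.

(defun my-tham-expand (template table)
  "Replace each placeholder letter in TEMPLATE by its entry in TABLE.
TABLE is an alist of (LETTER . REPLACEMENT) strings, applied in order.
Replacements are inserted literally, so their backslashes survive."
  (let ((case-fold-search nil))
    (dolist (elt table template)
      (setq template
            (replace-regexp-in-string (car elt) (cdr elt) template t t)))))

(defvar my-tham-pattern
  (my-tham-expand
   "X\\(N\\(X\\)?\\)*H?"                      ; X is a basic syllable
   '(("X" . "C\\(N*\\(M\\|HS*C\\)\\)*")       ; must be expanded first
     ("C" . "[\u1A20-\u1A54\u1A80-\u1A89\u1A90-\u1A99\u1AA0-\u1AAD]")
     ("M" . "[\u1A55-\u1A57\u1A59-\u1A5E\u1A61-\u1A7C\u1A7F]")
     ("H" . "\u1A60")                         ; sakot
     ("S" . "[\u1A75-\u1A7C]")                ; marks commuting with sakot
     ("N" . "\u1A58")))                       ; mai kang lai
  "Expanded cluster pattern, rebuilt here for experimentation only.")

(defun my-tham-single-cluster-p (string)
  "Return non-nil if one match starting at index 0 covers all of STRING."
  (and (string-match (concat "\\`" my-tham-pattern) string)
       (= (match-end 0) (length string))))

;; The word from the comment in the attachment, written as code points:
;; (my-tham-single-cluster-p "\u1A34\u1A58\u1A60\u1A43\u1A63\u1A60\u1A3F")
;; The same word without its final consonant (the incomplete word):
;; (my-tham-single-cluster-p "\u1A34\u1A58\u1A60\u1A43\u1A63\u1A60")
```

Traced by hand against the pattern, both commented calls should return
non-nil: the complete word is consumed by the basic syllable, and the
incomplete word relies on the trailing H? to pick up the final sakot.
Two bare consonants in a row should return nil, since nothing in the
pattern joins them.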