unofficial mirror of bug-gnu-emacs@gnu.org 
From: Richard Wordingham via "Bug reports for GNU Emacs, the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 20140@debbugs.gnu.org, Lars Ingebrigtsen <larsi@gnus.org>
Subject: bug#20140: 24.4; M17n shaper output rejected
Date: Sat, 5 Feb 2022 22:52:51 +0000
Message-ID: <20220205225251.08a0faab@JRWUBU2>
In-Reply-To: <83v8xv2icg.fsf@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 2805 bytes --]

On Fri, 04 Feb 2022 09:37:03 +0200
Eli Zaretskii <eliz@gnu.org> wrote:

> > From: Lars Ingebrigtsen <larsi@gnus.org>
> > Date: Thu, 03 Feb 2022 22:21:28 +0100
> > Cc: 20140@debbugs.gnu.org
> > 
> > Richard Wordingham <richard.wordingham@ntlworld.com> writes:
> >   
> > > I am running Emacs 24.4 in a Ubuntu 12.04 Precise Pangolin
> > > installation, for which the version of libm17n-0 is 1.6.3-1.  I am
> > > attempting to induce Emacs to render the Tai Tham script.  There
> > > appears to be a bug/feature in Emacs which makes this
> > > unnecessarily difficult.  
> > 
> > (I'm going through old bug reports that unfortunately weren't
> > resolved at the time.)
> > 
> > I vaguely remember there having been some fixes in this area since
> > this bug report was opened -- does this work better for you in more
> > recent versions of Emacs?  

I'm currently using the vanilla emacs on Ubuntu Focal, which is
described as 'GNU Emacs 26.3 (build 2, x86_64-pc-linux-gnu, GTK+
Version 3.24.14) of 2020-03-26, modified by Debian'.  The key good news
is that the commands forward-char-intrusive and backward-char-intrusive
are now standard, so I can position the cursor by dead-reckoning.  You
can reasonably mark the issue as solved.

> The most important change is that we now use HarfBuzz by default.

Isn't that only true for Emacs 27.1 and above?

> Richard didn't contribute the Tai Tham composition rules to us
> (AFAIR), so I cannot test what happens now in Emacs with HarfBuzz.
> Maybe we should revisit this issue, but first I hope Richard could
> tell whether the issue still exists, and if so, what composition rules
> he uses or suggests to use for Tai Tham.

Sad to see that Khaled Hosny's suggestion not to use composition rules
seems not to have been taken.

You're welcome to include my composition rules.  They're complicated by
the facts that the 'regular expressions' are not interpreted as true
regular expressions and that matching is not closed under canonical
equivalence.  I therefore compute the regular expression from a
template instead of writing it out by hand.  My composition rules are
attached as tai-tham.el, which was last modified on 20 March 2015.  (It
would need reformatting to paste into this email.)

There are some deficiencies; I've a feeling there may be a problem with
adding ZWNJ and CGJ as marks; ZWJ should also be added for
completeness.  I need ZWNJ to write 4-column ᨴᩣᩴᨶ᩠ᩅ‌ᩣ᩠ᨿ as opposed to
3-column ᨴᩣᩴᨶ᩠ᩅᩣ᩠ᨿ, and even with my font, HarfBuzz will need CGJ for
the suppression of jack-booted dotted circles. Additionally, for
didactic text, what can I do for U+25CC for explicit display of marks
and their equivalents on a dotted circle, and for that matter, for
display on NBSP?

Richard.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: tai-tham.el --]
[-- Type: text/x-emacs-lisp, Size: 4070 bytes --]

;;; tai-tham.el --- support for Tai Tham -*- coding: utf-8 -*-

;; Copyright (C) 2008, 2009, 2010, 2011
;;   National Institute of Advanced Industrial Science and Technology (AIST)
;;   Registration Number H13PRO009

;; Keywords: multilingual, Tai Tham, i18n

;; This file is part of GNU Emacs.

;; GNU Emacs is free software: you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation, either version 3 of the License, or
;; (at your option) any later version.

;; GNU Emacs is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;; GNU General Public License for more details.

;; You should have received a copy of the GNU General Public License
;; along with GNU Emacs.  If not, see <http://www.gnu.org/licenses/>.

;;; Code:

;; (set-language-info-alist
;;  "Northern Thai" '((charset unicode)
;; 		   (coding-system utf-8)
;;		   (coding-priority utf-8)
;;		   (sample-text .
;;		     "Northern Thai (ᨣᩣᩴᨾᩮᩬᩥᨦ / ᨽᩣᩈᩣᩃ᩶ᩣ᩠ᨶᨶᩣ)	ᩈ᩠ᩅᩢᩔ᩠ᨯᩦᨣᩕᩢ᩠ᨸ")
;;		   (documentation . t)))

;; To load:
;; (load-file "~/tham/tai-tham.el") tai-tham-composable-pattern
;; 

(defvar tai-tham-composable-pattern
  (let ((table
         ;; C is letters, independent vowels, digits, punctuation and symbols.
         '(("C" . "[\u1A20-\u1A54\u1A80-\u1A89\u1A90-\u1A99\u1AA0-\u1AAD]")
           ("M" . "[\u1A55-\u1A57\u1A59-\u1A5E\u1A61-\u1A7C\u1A7F]") ; Mark
           ("H" . "\u1A60") ; sakot
           ("S" . "[\u1A75-\u1A7C]") ; Marks commuting with sakot
           ("N" . "\u1A58"))) ; mai kang lai
;; The definition of a sequence of interacting Tai Tham characters is
;; surprisingly complicated.  The basic syllable structure should just be:
;;
;;                           C(M|HC)*
;;
;; There are three complications:
;;
;; 1. Emacs uses a backtracking regular expression engine, but it only
;;    backtracks if the characters accepted so far fail to match the regular
;;    expression.  Thus if M included sakot, CHC would be parsed as CH and then
;;    C - there is no cause to backtrack!  On the other hand, missing consonants
;;    should not disrupt display - the glyph for sakot will normally alert the
;;    user that text entry is incomplete.
;;
;; 2. Some characters can be swapped round with sakot without changing the
;;    signification of the sequence of characters.  The regular expression
;;    works with strings of characters rather than traces of fully decomposed
;;    characters subject to Unicode's canonical equivalence.
;;
;; 3. Which syllable mai kang lai belongs to depends on the font.  Again, if
;;    M included mai kang lai, CNC would be parsed as CN and C.  The word
;;    ᨴᩘ᩠ᩃᩣ᩠ᨿ has mai kang lai in the middle of an orthographic syllable.
;	(basic_syllable "C\\(N*\\(M\\|HS*C?\\)\\)*")
	(basic_syllable "C\\(N*\\(M\\|HS*C\\)\\)*")
        (regexp "X\\(N\\(X\\)?\\)*H?")) ; X is basic syllable
    (let ((case-fold-search nil))
      (setq regexp (replace-regexp-in-string "X" basic_syllable regexp t t))
      (dolist (elt table)
	(setq regexp (replace-regexp-in-string (car elt) (cdr elt)
					       regexp t t))))
    regexp)
  "Regexp matching a composable sequence of Tai Tham characters.")

; Failed attempt to get proper composition for incomplete word ᨴᩘ᩠ᩃᩣ᩠.
;(let ((elt (list (vector tai-tham-composable-pattern 3 'font-shape-gstring)
;		 (vector tai-tham-composable-pattern 2 'font-shape-gstring)
;		 (vector tai-tham-composable-pattern 1 'font-shape-gstring)
;		 (vector tai-tham-composable-pattern 0 'font-shape-gstring)
;		 (vector "." 0 'font-shape-gstring)
;		 )))
;  (set-char-table-range composition-function-table '(#x1A20 . #x1AAD) elt))

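;; Register the composition rules for the Tai Tham block (U+1A20..U+1AAD):
;; the full pattern is listed first, then a single-character fallback.
(let ((elt (list (vector tai-tham-composable-pattern 0 'font-shape-gstring)
                 (vector "." 0 'font-shape-gstring))))
  (set-char-table-range composition-function-table '(#x1A20 . #x1AAD) elt))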
