From: Richard Wordingham via "Bug reports for GNU Emacs, the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 20140@debbugs.gnu.org, larsi@gnus.org
Subject: bug#20140: 24.4; M17n shaper output rejected
Date: Sun, 13 Feb 2022 20:53:10 +0000
Message-ID: <20220213205310.0b8a715c@JRWUBU2>
In-Reply-To: <831r06rbwk.fsf@gnu.org>

On Sun, 13 Feb 2022 18:04:11 +0200
Eli Zaretskii <eliz@gnu.org> wrote:

> > Date: Sat, 5 Feb 2022 22:52:51 +0000
> > From: Richard Wordingham <richard.wordingham@ntlworld.com>
> > Cc: Lars Ingebrigtsen <larsi@gnus.org>, 20140@debbugs.gnu.org
> > 
> > You're welcome to include my composition rules.  
> 
> Thanks.  I started with your code:
> 
> > (defvar tai-tham-composable-pattern
> >   (let ((table
> >          ;; C is letters, independent vowels, digits, punctuation and symbols.
> >          '(("C" . "[\u1A20-\u1A54\u1A80-\u1A89\u1A90-\u1A99\u1AA0-\u1AAD]")
> >            ("M" . "[\u1A55-\u1A57\u1A59-\u1A5E\u1A61-\u1A7C\u1A7F]") ; Mark
> >            ("H" . "\u1A60") ; sakot
> >            ("S" . "[\u1A75-\u1A7C]") ; Marks commuting with sakot
> >            ("N" . "\u1A58"))) ; mai kang lai
> >         (basic_syllable "C\\(N*\\(M\\|HS*C\\)\\)*")
> >         (regexp "X\\(N\\(X\\)?\\)*H?")) ; X is basic syllable
> >     (let ((case-fold-search nil))
> >       (setq regexp (replace-regexp-in-string "X" basic_syllable regexp t t))
> >       (dolist (elt table)
> >         (setq regexp (replace-regexp-in-string (car elt) (cdr elt)
> >                                                regexp t t))))
> >     regexp))
> > 
> > (let ((elt (list (vector tai-tham-composable-pattern 0 'font-shape-gstring)
> >                  (vector "." 0 'font-shape-gstring))))
> >   (set-char-table-range composition-function-table '(#x1A20 . #x1AAD) elt))
> 
> But that didn't seem to work well enough: e.g., some marks in your
> "sample text" didn't combine with letters, as I think they should.

Which ones?  Are you sure they didn't combine at the Emacs level?
I did suspect the problem was writing '\u1A7C' instead of
'\u1a7c', but I'm no longer so sure.  (The 'C' might get expanded, but
I'm beginning to think not.)
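
Both questions can be checked from Lisp. The following is a minimal
sketch, not a diagnosis of the actual failure: the names prefixed with
my- are hypothetical, and it only relies on the Lisp reader turning a
\uXXXX escape inside a string literal into a single character before
any regexp substitution sees it.

```elisp
;; Hypothetical names, for illustration only.
(defvar my-upper-escape "\u1A7C")   ; escape written with upper-case hex
(defvar my-lower-escape "\u1a7c")   ; same escape written in lower case

;; Both literals are read as the same one-character string.
(string= my-upper-escape my-lower-escape)   ; => t
(length my-upper-escape)                    ; => 1

;; No ASCII "C" survives in the string, so a later
;; (replace-regexp-in-string "C" ...) has nothing to expand there.
(let ((case-fold-search nil))
  (string-match-p "C" my-upper-escape))     ; => nil

(defun my-composed-at-point-p ()
  "Return non-nil if Emacs reports a composition covering point.
This asks Emacs itself, independently of how the font renders it."
  (and (find-composition (point) nil nil t) t))
```

In an interactive session, C-u C-x = on a character also reports
whether it is composed with its neighbours.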

> Then I tried this simplistic setting:
> 
>   (set-char-table-range composition-function-table
>                         '(#x1a20 . #x1aaf)
>                         (list (vector "[\u1a20-\u1aaf]+" 0
>                                       'font-shape-gstring)))
> 
> and it worked much better, including passing a small number of the
> tests from your renderer test page that I threw on Emacs.  This is on
> MS-Windows with Emacs 29 and HarfBuzz 2.4.0 (which is not even the
> latest release of HarfBuzz), and with the A Tai Tham KH New V3 font.

> Any reason not to use the above simple setup for Tai Tham text
> composition?

Mainly that you would have to edit the text with "autocomposition
at point disabled" or mark word boundaries, e.g. with U+200B ZERO WIDTH
SPACE. The Tai languages that use Tai Tham use scriptio continua.  While
modern Pali does separate words with visible white space, its words
tend to be polysyllabic; with discerning composition, it would be about
as tolerable as editing Hindi in Devanagari with autocomposition
enabled. (Quite a few people edit Devanagari in transliteration to
Latin!)

You should also add CGJ and ZWNJ, and some people may appreciate ZWJ -
the Khottabun font has ligatures involving ZWJ, though it may just be
an experimental feature - and ultimately WJ, for when someone writes a
Tai Tham word breaker. Oh, and Thai and Lao mai t(r)i and mai
chat(t)awa and U+0324 COMBINING DIAERESIS BELOW turn up occasionally -
U+0324 is supported in Thep's Khottabun font, and my Da Lekh series
supports Thai mai tri and mai chattawa. These characters seem to work
with HarfBuzz.
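
As a rough illustration of those additions, the simple pattern could be
widened as below. This is an untested sketch under stated assumptions:
the variable name my-tai-tham-simple-pattern is hypothetical, a cluster
is still required to start with a character from the Tai Tham block,
and whether each extra character should really be allowed anywhere
after that first character is a guess rather than an analysis.

```elisp
;; Hypothetical variable name; the pattern is a sketch, not a tested rule.
(defvar my-tai-tham-simple-pattern
  (concat "[\u1a20-\u1aaf]"            ; must start inside the Tai Tham block
          "["
          "\u1a20-\u1aaf"              ; more Tai Tham
          "\u0324"                     ; COMBINING DIAERESIS BELOW
          "\u034f"                     ; CGJ
          "\u0e4a\u0e4b"               ; Thai mai tri, mai chattawa
          "\u0eca\u0ecb"               ; Lao mai ti, mai catawa
          "\u200c\u200d"               ; ZWNJ, ZWJ
          "\u2060"                     ; WJ
          "]*")
  "Simple composition pattern for Tai Tham with a few extra characters.")

(set-char-table-range composition-function-table
                      '(#x1a20 . #x1aaf)
                      (list (vector my-tai-tham-simple-pattern 0
                                    'font-shape-gstring)))
```

Like the one-class pattern it extends, this composes a whole run at
once, so the editing drawback described above still applies.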

If using the native Windows renderer is an option with Emacs, then 'A
Tai Tham KH New' works better than 'A Tai Tham KH New V3'.  I've
created https://wrdingham.co.uk/lanna/font_test.htm to do _font_
comparisons.  I'd delayed because I've only recently satisfied myself
that it is lawful, at least under English law.  (The qualms were
with the samples taken from books.)  It's still very much a work in
progress.

> I needed a couple more additions to Emacs to make Tai Tham support
> work OOTB: for example, script-representative-chars lacked an entry
> for Tai Tham, and the default fontset needed an addition.  (And on
> MS-Windows, one needs to run the w32-find-non-USB-fonts magic once, to
> notice the newly installed Tai Tham font.)

> Other than that, assuming the above setting of
> composition-function-table is okay, we are ready to officially add Tai
> Tham to scripts supported by Emacs.

> Btw, is there a way to get all the examples from your
> https://wrdingham.co.uk/lanna/renderer_test.htm as a UTF-8 encoded
> text file?  I'd like to test the Emacs rendering with all of the
> examples, but copy-pasting each example separately from the browser is
> not my idea of useful time investment.  So if you could provide the
> examples as a downloadable text file, I'd appreciate.

As buried (you're not the only one to have overlooked it) in the
penultimate paragraph of the 'Content and Layout' section, "The test words
may, in principle, be extracted quite simply from this web page. Each
test 'word' is the content of the first cell in each row whose class is
tst1. For convenience*, I have extracted the first two cells in such
rows, along with titles, to a CSV file."  The file is rt.csv in the
same directory.  I included the meaning and pronunciation as those who
don't know the script may find it easier to refer to the words by
translation or transcription.  You may prefer to use the file more or
less as it is, but one can easily knock up an Emacs macro sequence to
delete the first comma and the rest of the line.  I left the
section titles in for easier navigation to the renderer test file.
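
For instance, something along these lines would do the deletion. It is
a sketch only: both function names are hypothetical, and it assumes
that the test word in the first field never itself contains a comma or
CSV quoting, which I have not checked against every row.

```elisp
(defun my-rt-csv-first-field (line)
  "Return LINE up to, but not including, its first comma.
LINE is assumed to be a single line with no embedded newline."
  (replace-regexp-in-string ",.*\\'" "" line))

(defun my-rt-csv-keep-first-column ()
  "In the current buffer, delete the first comma and the rest of each line.
Lines that contain no comma are left untouched."
  (interactive)
  (save-excursion
    (goto-char (point-min))
    (while (re-search-forward ",.*$" nil t)
      (replace-match ""))))
```

Visit a copy of rt.csv and run M-x my-rt-csv-keep-first-column to be
left with one test word (or section title) per line.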

*Some people claim to find XML files easy to use; they should then be
able to analyse a file conforming to HTML4 syntax.

Dodgy spellings go in pink rows whose class is 'tst2'.  The alternative
encodings demanded by the USE go in orange rows whose class is 'tst3'.
I have not extracted these.

Richard.




