unofficial mirror of help-gnu-emacs@gnu.org
From: Richard Wordingham <richard.wordingham@ntlworld.com>
To: help-gnu-emacs@gnu.org
Subject: Re: Manually parsing char-tables
Date: Mon, 21 Feb 2022 01:39:41 +0000
Message-ID: <20220221013941.7e97dba1@JRWUBU2>
In-Reply-To: <835yp9ya4x.fsf@gnu.org>

On Sun, 20 Feb 2022 14:50:54 +0200
Eli Zaretskii <eliz@gnu.org> wrote:

> > Date: Sun, 20 Feb 2022 11:09:26 +0000
> > From: Richard Wordingham <richard.wordingham@ntlworld.com>
> > 
> > I am trying to understand how Arabic script rendering works in Emacs
> > 28.0.90, as it seems to be using a different mechanism to that used
> > for Indic or European scripts.  (There seems to be more to it than
> > just the asymmetries between right-to-left and left-to-right.)  To
> > that end, I am trying to understand the contents of the variable
> > composition-function-table.  
> 
> I think it is easier to just look at how the Arabic part of this table
> is populated.  See lisp/language/misc-lang.el starting from line 105.

I first wanted to check whether it was overwritten somewhere else.

> >       #^^[3 1152 nil nil nil #1# #1# #1# #1# #1# #1# #1# nil nil nil
> >       nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil
> > nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil
> >       nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil
> > nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil
> >       nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil
> > nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil
> >       nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil
> > nil nil nil nil]
> > 
> > (I've converted lines to paragraphs and abbreviated leading white
> > space.)
> > 
> >  I'm guessing that #1# is a macro invocation; when I invoke (print
> >  composition-function-table), I get something similar, but with #1#
> >  expanded and the '#1=' in the apparent macro definition omitted.  
> 
> #1# is a backreference to the value indicated by #1=.
> 
> > Where is this syntax explained?  I've looked in the elisp manual,
> > but not found it, though I may simply have failed to guess where
> > such a description was.  
> 
> See the node "Circular Objects" there.

That was reassuring - but I'm wondering why it was not familiar.  Had I
forgotten it?  Perhaps it's later than Emacs 19, when I last came close
to reading the lisp reference manual cover to cover.

Even the read syntax of a char-table is poorly documented.  Using the
hint of an unexpanded reference to a 'sub-char-table', I've discovered
that the first key to understanding it is in lisp.h, and I may have to
delve into the .c files for the finer details.  It looks full of tricks
to reduce the storage requirement, which are reflected in the read
syntax.  Perhaps it's not been documented because someone hopes it will
be cleaned up, but it is a useful syntax for dumping the table if
someone suspects the structure has been corrupted.  I will now present
my analysis in the hope that someone will find it useful.

Basically the data is stored in 64 blocks (of 'depth' 1) each for 2^16
characters, which in turn are composed of 16 blocks (of 'depth' 2) each
for 2^12 characters, which in turn are composed of 32 blocks (of 'depth'
3) each for 128 characters.  These blocks are the 'sub-char-tables', and
are introduced as a vector with two prepended items - the depth and the
first character code.  If all the data in a block is the same, that
same value replaces its sub-char-table.  (That happens with the
Unicode Arabic Block, which is covered by two sub-char-tables.)  This
structure is, eminently sensibly, hidden from the lisp interfaces.  The
sub-char-tables' syntax is basically

#^^[depth min_char ...]

where the ellipsis is the values at the lower level.
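
The block arithmetic above can be sketched in a few lines of Python.
This is only a model of the layout as described in this message, not
the Emacs C code: the names BITS, indices, min_char and lookup are
made up for illustration, and the nested lists stand in for
sub-char-tables, with a uniform block stored as its bare value.

```python
# Index widths in bits at each level: 64 top-level entries, then
# 16, 32 and 128 entries in sub-char-tables of depth 1, 2 and 3.
BITS = (6, 4, 5, 7)

def chars_per_entry(depth):
    """Characters covered by one entry of a table at DEPTH (0 = top level)."""
    return 1 << sum(BITS[depth + 1:])

def indices(char):
    """Split a character code into its index at each of the four levels."""
    out = []
    for depth in range(4):
        shift = sum(BITS[depth + 1:])
        out.append((char >> shift) & ((1 << BITS[depth]) - 1))
    return tuple(out)

def min_char(char, depth):
    """First character code of the depth-DEPTH sub-char-table holding CHAR."""
    span = chars_per_entry(depth - 1)  # one entry of the level above
    return char - char % span

def lookup(table, char):
    """Walk nested lists; a non-list node is a value shared by its whole block.

    Assumes the stored values themselves are never Python lists.
    """
    node = table
    for i in indices(char):
        if not isinstance(node, list):
            break
        node = node[i]
    return node

# Rebuild the shape of the quoted dump: depth 3, min_char 1152,
# with entries 3..9 all sharing one value (the #1# in the dump).
leaf = [None] * 128
for i in range(3, 10):
    leaf[i] = "rule"
depth2 = [None] * 32
depth2[9] = leaf
depth1 = [None] * 16
depth1[0] = depth2
table = [None] * 64      # every other plane collapsed to a single nil
table[0] = depth1

print(indices(0x483))         # (0, 0, 9, 3)
print(min_char(0x483, 3))     # 1152
print(lookup(table, 0x483))   # rule
```

With these widths the min_char of 1152 in the quoted dump comes out as
the start of the tenth 128-character block of the first 4096-character
block, and the first #1# sits at offset 3 within it.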

I suspect that the char-table syntax is basically

#^[default parent purpose ascii_block ...]

but I haven't verified the order of those first four values, and indeed
I may have them wrong.

(In case anyone is wondering, the Emacs code space consists of 64
planes, rather than Unicode's 'measly' 17.)
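
As a quick check of the plane counts, taking a plane to be 2^16 code
points (the variable names here are just for illustration):

```python
PLANE = 1 << 16                # code points per plane

emacs_chars = 64 * PLANE       # size of the Emacs code space
unicode_chars = 17 * PLANE     # size of the Unicode code space

print(hex(emacs_chars - 1))    # 0x3fffff, the largest Emacs character code
print(hex(unicode_chars - 1))  # 0x10ffff, the largest Unicode code point
```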

Richard.


