From: Richard Wordingham
Newsgroups: gmane.emacs.help
Subject: Re: Manually parsing char-tables
Date: Mon, 21 Feb 2022 01:39:41 +0000
Message-ID: <20220221013941.7e97dba1@JRWUBU2>
References: <20220220110926.25c675be@JRWUBU2> <835yp9ya4x.fsf@gnu.org>
In-Reply-To: <835yp9ya4x.fsf@gnu.org>
To: help-gnu-emacs@gnu.org

On Sun, 20 Feb 2022 14:50:54 +0200
Eli Zaretskii wrote:

> > Date: Sun, 20 Feb 2022 11:09:26 +0000
> > From: Richard Wordingham
> >
> > I am trying to understand how Arabic script rendering works in Emacs
> > 28.0.90, as it seems to be using a different mechanism to that used
> > for Indic or European scripts.  (There seems to be more to it than
> > just the asymmetries between right-to-left and left-to-right.)  To
> > that end, I am trying to understand the contents of the variable
> > composition-function-table.
>
> I think it is easier to just look at how the Arabic part of this table
> is populated.  See lisp/language/misc-lang.el starting from line 105.

I first wanted to check that it was overwritten somewhere else.

> > #^^[3 1152 nil nil nil #1# #1# #1# #1# #1# #1# #1# nil nil nil
> > nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil
> > nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil
> > nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil
> > nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil
> > nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil
> > nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil
> > nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil
> > nil nil nil nil]
> >
> > (I've converted lines to paragraphs and abbreviated leading white
> > space.)
> >
> > I'm guessing that #1# is a macro invocation; when I invoke (print
> > composition-function-table), I get something similar, but with #1#
> > expanded and the '#1=' in the apparent macro definition omitted.
>
> #1# is a backreference to the value indicated by #1=.
>
> > Where is this syntax explained?
> > I've looked in the elisp manual,
> > but not found it, though I may simply have failed to guess where
> > such a description was.
>
> See the node "Circular Objects" there.

That was reassuring - but I'm wondering why it was not familiar.  Had I
forgotten it?  Perhaps it's later than Emacs 19, when I last came close
to reading the lisp reference manual cover to cover.

Even the read syntax of a char-table is poorly documented.  Using the
hint of an unexpanded reference to a 'sub-char-table', I've discovered
that the first key to understanding it is in lisp.h, and I may have to
delve into the .c files for the finer details.  It looks full of tricks
to reduce the storage requirement, which are reflected in the read
syntax.  Perhaps it's not been documented because someone hopes it will
be cleaned up, but it is a useful syntax for dumping the table if
someone suspects the structure has been corrupted.

I will now present my analysis in the hope that someone will find it
useful.

Basically the data is stored in 64 blocks (of 'depth' 1) each for 2^16
characters, which in turn are composed of 16 blocks (of 'depth' 2) each
for 2^12 characters, which in turn are composed of 32 blocks (of
'depth' 3) each for 128 characters.  These blocks are the
'sub-char-tables', and are introduced as a vector with two prepended
items - the depth and the first character code.  If all the data in a
block is the same, that same value replaces its sub-char-table.  (That
happens with the Unicode Arabic Block, which is covered by two
sub-char-tables.)  This structure is, eminently sensibly, hidden from
the lisp interfaces.

The sub-char-tables' syntax is basically

  #^^[depth min_char ...]

where the ellipsis is the values at the lower level.  I suspect that
the char-table syntax is basically

  #^[default parent purpose ascii_block ...]

but I haven't verified the order of those first four values, and indeed
I may have them wrong.
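
For anyone who wants to check the arithmetic, here is a rough sketch in
Python (not Emacs code) of how a character code maps onto the nested
blocks described above, and of how a dumped depth-3 sub-char-table can
be split into its header and values.  The bit widths (6, 4, 5, 7) are
an assumption inferred from the block counts 64, 16, 32 and 128; the
function names are made up for illustration, and the parser only copes
with flat tables whose entries are simple atoms such as nil or #1#.

```python
# Index widths at depths 0..3, inferred from 64, 16, 32 blocks and
# 128 characters per lowest-level block (an assumption, see above).
BITS = (6, 4, 5, 7)


def table_span(depth):
    """Characters covered by one table of the given depth (0 = whole table)."""
    return 1 << sum(BITS[depth:])


def chartab_path(char):
    """Index of `char` within the table at each depth, top level first."""
    return [(char % table_span(d)) // table_span(d + 1)
            for d in range(len(BITS))]


def parse_sub_char_table(text):
    """Split '#^^[depth min_char v0 v1 ...]' into (depth, min_char, values)."""
    prefix, suffix = "#^^[", "]"
    text = text.strip()
    if not (text.startswith(prefix) and text.endswith(suffix)):
        raise ValueError("not a sub-char-table dump")
    fields = text[len(prefix):-len(suffix)].split()
    return int(fields[0]), int(fields[1]), fields[2:]


def non_nil_runs(depth, min_char, values):
    """Merge adjacent equal non-nil entries into (first, last, value) runs."""
    step = table_span(depth + 1)  # characters covered by one entry
    runs = []
    for i, value in enumerate(values):
        if value == "nil":
            continue
        first = min_char + i * step
        last = first + step - 1
        if runs and runs[-1][2] == value and runs[-1][1] + 1 == first:
            runs[-1] = (runs[-1][0], last, value)
        else:
            runs.append((first, last, value))
    return runs


# The table quoted earlier in this thread: 3 nil, 7 #1#, then 118 nil.
EXAMPLE = "#^^[3 1152 " + " ".join(
    ["nil"] * 3 + ["#1#"] * 7 + ["nil"] * 118) + "]"

if __name__ == "__main__":
    depth, min_char, values = parse_sub_char_table(EXAMPLE)
    print(depth, min_char, len(values))
    print(non_nil_runs(depth, min_char, values))
    # The Arabic block U+0600..U+06FF spans two depth-3 blocks.
    print(chartab_path(0x600), chartab_path(0x6FF))
```

With these assumptions the quoted table covers characters 1152 to 1279,
and its #1# entries sit at 1155 to 1161; the first and last characters
of the Arabic block land in depth-3 blocks 12 and 13 of the first
depth-2 block, which agrees with the two sub-char-tables mentioned
above.
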

(In case anyone is wondering, the Emacs code space consists of 64
planes, rather than Unicode's 'measly' 17.)

Richard.