From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#20789: Invalid script or charset name: cuneiform-numbers-and-punctuation
Date: Fri, 12 Jun 2015 11:28:09 +0300
Message-ID: <83y4jpqqjq.fsf@gnu.org>
References: <21zj45kiix.fsf@fencepost.gnu.org>
Reply-To: Eli Zaretskii
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8BIT
Cc: 20789@debbugs.gnu.org
To: Glenn Morris
X-GNU-PR-Message: followup 20789
X-GNU-PR-Package: emacs
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Xref: news.gmane.org gmane.emacs.bugs:103856

>
> From: Glenn Morris
> Date: Thu, 11 Jun 2015 18:24:06 -0400
>
> Glenn Morris wrote:
>
> > Error (initialization): Creation of the default fontsets failed: (error
> > Invalid script or charset name: cuneiform-numbers-and-punctuation)
>
> I fixed a typo that seems to have caused that.

Sorry about that.

> I don't suppose that big list can be auto-generated from the inputs?

It's not trivial.  I describe below some of the issues, in the hope
that Someone™ will volunteer:

  . Most of the script names come from the corresponding Unicode
    blocks, with trivial transformations (downcase words and replace
    blanks with a hyphen).  So basically, we will need to use the
    information in Blocks.txt, a file that is part of the Unicode
    Character Database (UCD), but with quirks described below.

  . The first quirk is that we lump together all the blocks that
    belong to the same script, like "Basic Latin", "Latin Extended-A",
    "Latin-1 Supplement", etc. -- these all go to the single script
    called 'latin'.  Likewise with other similar blocks that are
    either "SOMETHING Extended" or "Supplement" or whatever.

  . The second quirk is with the CJK characters: those are divided
    into several broad scripts like 'han', 'kana', and 'cjk-misc'
    whose exact rules I don't know.

  . The third quirk is with the 'symbol' pseudo-script: we lump there
    all punctuation characters and all symbol characters (those for
    which the General Category is one of Pc, Pd, Ps, Pe, Pi, Pf, Po,
    Sm, Sc, Sk, So), but with the following notable exception:
    punctuation characters that belong to blocks that include
    non-punctuation characters are left in those blocks -- those are
    punctuation characters used only with the scripts named by those
    blocks, like U+05BE HEBREW PUNCTUATION MAQAF, which is only used
    by the Hebrew script.

  . Another quirk is that mathematical alphanumerics (which are just
    letters from the Unicode POV) are lumped into a separate script
    'mathematical'.
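The first two points above could be sketched roughly as follows. This is in Python rather than Emacs Lisp, purely as an illustration, and it assumes the `START..END; Block Name` line format of Blocks.txt. The `LUMPED` table and the function names are hypothetical; a real table would need to cover many more blocks, and the CJK, 'symbol' and 'mathematical' quirks are not handled at all.

```python
import re

# Hypothetical override table: blocks that are lumped into a broader script.
# These three entries are only the examples named in the message above.
LUMPED = {
    "Basic Latin": "latin",
    "Latin-1 Supplement": "latin",
    "Latin Extended-A": "latin",
}

# One data line of Blocks.txt looks like "0590..05FF; Hebrew".
BLOCK_LINE = re.compile(r"^([0-9A-Fa-f]+)\.\.([0-9A-Fa-f]+);\s*(.+?)\s*$")


def block_to_script(block_name):
    """Map a block name to a script name.

    Lumped blocks use the override table; everything else gets the
    trivial transformation: downcase, blanks replaced by hyphens.
    """
    if block_name in LUMPED:
        return LUMPED[block_name]
    return "-".join(block_name.lower().split())


def parse_blocks(text):
    """Yield (start, end, script) for each data line of Blocks.txt-style text."""
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        match = BLOCK_LINE.match(line)
        if match:
            start, end, name = match.groups()
            yield int(start, 16), int(end, 16), block_to_script(name)


# A few lines in the Blocks.txt format, used here as sample input.
SAMPLE = """\
# Blocks-style sample
0000..007F; Basic Latin
0080..00FF; Latin-1 Supplement
0590..05FF; Hebrew
12400..1247F; Cuneiform Numbers and Punctuation
"""

if __name__ == "__main__":
    for start, end, script in parse_blocks(SAMPLE):
        print(f"{start:04X}..{end:04X} {script}")
```

Keeping the lumping rules in a data table rather than in code is one way to get the "small database" of exceptions; whether a table or pattern rules ("SOMETHING Extended", "Supplement") fits better is left open here.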
Alternatively, one could use Scripts.txt from the UCD, and then the
only problem is to subdivide what they call "Common" into the scripts
we use.

For the general category of a character, one can do in Emacs:

  (get-char-code-property CHAR 'general-category)

Alternatively, one can search UnicodeData.txt directly: the General
Category is the 3rd field there.

Patches are welcome to do all of the above automatically, perhaps with
some small database that expresses the more tricky of the above rules.
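For the General Category lookup mentioned above, a minimal sketch outside Emacs might look like this, again in Python and only as an illustration. It assumes the semicolon-separated layout of UnicodeData.txt, where the General Category is the third field; the function names are hypothetical. Python's standard `unicodedata.category` is shown as a cross-check, though its answers depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

# The categories that the message above lists for the 'symbol' pseudo-script.
SYMBOL_CATEGORIES = {
    "Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po",
    "Sm", "Sc", "Sk", "So",
}


def general_category_from_line(line):
    """Return (code point, General Category) from one UnicodeData.txt-style line."""
    fields = line.split(";")
    return int(fields[0], 16), fields[2]  # third field, zero-based index 2


def is_symbol_candidate(category):
    """True if the category is punctuation or symbol.

    This is only a first filter: punctuation living in a block that
    also holds non-punctuation characters stays with that block's script.
    """
    return category in SYMBOL_CATEGORIES


if __name__ == "__main__":
    line = "05BE;HEBREW PUNCTUATION MAQAF;Pd;0;R;;;;;N;;;;;"
    code_point, category = general_category_from_line(line)
    print(f"U+{code_point:04X}", category, is_symbol_candidate(category))
    # Cross-check against the interpreter's own Unicode tables.
    print(unicodedata.category(chr(code_point)))
```

U+05BE is a useful test case because it passes the category filter yet must remain in 'hebrew', so the filter has to be combined with the per-block information before a character is assigned to 'symbol'.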