From mboxrd@z Thu Jan 1 00:00:00 1970
From: Glenn Morris
Newsgroups: gmane.emacs.bugs
Subject: bug#20789: Invalid script or charset name: cuneiform-numbers-and-punctuation
Date: Sat, 20 Jun 2015 19:34:01 -0400
Message-ID: <6pp4qlzti.fsf@fencepost.gnu.org>
References: <21zj45kiix.fsf@fencepost.gnu.org> <83y4jpqqjq.fsf@gnu.org> <834mm7ogv3.fsf@gnu.org> <4cegla7rnj.fsf@fencepost.gnu.org> <83eglamha2.fsf@gnu.org>
In-Reply-To: <21zj45kiix.fsf@fencepost.gnu.org>
To: Eli Zaretskii
Cc: 20789@debbugs.gnu.org

I spent some time looking at some of these.
In no case could I see a clear path from the inputs to the outputs.

Eli Zaretskii wrote:

>  . characters.el:
>
>     . The modify-category-entry calls -- they basically can be derived
>       from Blocks.txt

I looked at it briefly. I can see that they are somewhat related, but
not precisely how. Eg:

Emacs: 2E80:312F and 3190:33FF are "line breakable".
Which means that "Hangul Compatibility Jamo" isn't. I have no idea why.

Emacs: 3400:4DBF and 4E00:9FAF are "2-byte han".
Which means that "Yijing Hexagram Symbols" aren't. Again, I have no
idea why.

I didn't look any further.

>     . The modify-syntax-entry and set-case-syntax calls can be derived
>       from the values of the 'general-category' property returned by
>       'get-char-code-property', perhaps augmented by 'paired-bracket'
>       and 'paired-type' properties

I didn't look at this yet.

>     . The set-case-syntax-pair calls (perhaps use the data in
>       CaseFolding.txt, or even the case mapping information in
>       UnicodeData.txt)

I didn't look at this yet.

>     . The setup of char-width-table -- I think the information is in
>       EastAsianWidth.txt, with background information described in
>       UAX#11 (http://www.unicode.org/reports/tr11/)

Looks somewhat promising, but could you be more specific?
There's nothing in that file that defines "zero width" characters, so I
don't see where Emacs's width 0 characters come from.
The width 2 characters look like they might be the "W" and "F"
characters, but just doing that gives a list that has many differences
to the list Emacs uses.

>     . The setup of char-acronym-table: at least some of the data is in
>       NameAliases.txt and NameList.txt

Looks somewhat promising. I can see how most of this comes from
NameAliases.txt. But there are many oddities:

Why does Emacs not have anything for 0009 (HT or TAB) or 000A (LF, NL,
or EOL)?

0019 is EOM in the source but EM in Emacs.
0080 is PAD in the source but XXX in Emacs.
0081 is HOP in the source but XXX in Emacs.
008F is SS3 in the source but SS1 in Emacs.
0099 is SGC in the source but XXX in Emacs.

How does Emacs choose which entries to list? There are many more in the
source. Could it do any harm to add more?

Where does "KIVAQ" come from? That appears nowhere in the source AFAICS.

Why does Emacs list two Khmer entries, and nothing else? There are
loads more of them.

>  . fontset.el:
>
>     . The setup of script-representative-chars

I don't see how. It seems to be "for some of, but not all, the entries
in char-script-table, choose a single character somewhere in the range."
There seems to be no pattern to how the character is chosen within the
range. Often the first one, but by no means always.

>  . mule-cmds.el:
>
>     . The setting of locale-language-names -- the data is available in
>       IANA's Language Subtag Registry
>       (http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry)
>       and in ISO 639-2 (http://www.loc.gov/standards/iso639-2/,
>       http://www.loc.gov/standards/iso639-2/php/English_list.php)

Again, I don't see how. Eg nowhere in those source files do I see Welsh
associated with iso-8859-14, and the comment in mule-cmds says that the
last part is "implementation dependent".

> P.S. It would be good to add to somewhere (admin/make-tarball.txt?) a
> reminder to fetch all those reference files and regenerate their
> dependencies, before we prepare a release.

admin/FOR-RELEASE contains that kind of thing.
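
For reference, the "W" and "F" experiment on EastAsianWidth.txt mentioned
above can be sketched roughly as follows. This is only a minimal
illustration in Python, not the procedure Emacs uses to build
char-width-table. The function name wide_ranges and the embedded sample
lines are made up for the example, and the sample is a tiny excerpt in the
file's format rather than the real data. Width 0 is not covered at all,
since that file does not define it.

```python
# Sample lines in the EastAsianWidth.txt format (hypothetical excerpt,
# not the full file).
SAMPLE = """\
# EastAsianWidth excerpt
0020;Na          # Zs         SPACE
1100..115F;W     # Lo    [96] HANGUL CHOSEONG KIYEOK..HANGUL CHOSEONG FILLER
3000;F           # Zs         IDEOGRAPHIC SPACE
3001..3003;W     # Po     [3] IDEOGRAPHIC COMMA..DITTO MARK
FF61;H           # Po         HALFWIDTH IDEOGRAPHIC FULL STOP
"""


def wide_ranges(text, classes=("W", "F")):
    """Return merged (start, end) code point ranges whose width class
    is in `classes`. Assumes the input is sorted by code point."""
    ranges = []
    for line in text.splitlines():
        # Drop the trailing comment and skip blank or comment-only lines.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        field, cls = (part.strip() for part in data.split(";", 1))
        if cls not in classes:
            continue
        # A field is either a single code point or "START..END".
        start, _, end = field.partition("..")
        lo = int(start, 16)
        hi = int(end, 16) if end else lo
        # Merge with the previous range when the two are adjacent.
        if ranges and ranges[-1][1] + 1 == lo:
            ranges[-1] = (ranges[-1][0], hi)
        else:
            ranges.append((lo, hi))
    return ranges


if __name__ == "__main__":
    # Print in the START:END style used earlier in this message.
    for lo, hi in wide_ranges(SAMPLE):
        print(f"{lo:04X}:{hi:04X}")
```

Running the same function over the real file and comparing the result with
the width 2 ranges in characters.el would show exactly where the two lists
differ.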