From: Eli Zaretskii <eliz@gnu.org>
To: Glenn Morris <rgm@gnu.org>
Cc: 20789@debbugs.gnu.org
Subject: bug#20789: Invalid script or charset name: cuneiform-numbers-and-punctuation
Date: Fri, 12 Jun 2015 11:28:09 +0300
Message-ID: <83y4jpqqjq.fsf@gnu.org>
In-Reply-To: <rek2v93mux.fsf@fencepost.gnu.org>
> From: Glenn Morris <rgm@gnu.org>
> Date: Thu, 11 Jun 2015 18:24:06 -0400
>
> Glenn Morris wrote:
>
> > Error (initialization): Creation of the default fontsets failed: (error
> > Invalid script or charset name: cuneiform-numbers-and-punctuation)
>
> I fixed a typo that seems to have caused that.

Sorry about that.

> I don't suppose that big list can be auto-generated from the inputs?

It's not trivial.  I describe below some of the issues, in the hope
that Someone™ will volunteer:

 . Most of the script names come from the corresponding Unicode
   blocks, with trivial transformations (downcase words and replace
   blanks with hyphens).  So basically, we will need to use the
   information in Blocks.txt, a file that is part of the Unicode
   Character Database (UCD), but with quirks described below.

 . The first quirk is that we lump together all the blocks that
   belong to the same script, like "Basic Latin", "Latin Extended-A",
   "Latin-1 Supplement", etc. -- these all go to the single script
   called 'latin'.  Likewise for other blocks named "SOMETHING
   Extended", "SOMETHING Supplement", and the like.

 . The second quirk is with the CJK characters: those are divided
   into several broad scripts like 'han', 'kana', and 'cjk-misc',
   and I don't know the exact rules for that division.

 . The third quirk is with the 'symbol' pseudo-script: we lump there
   all punctuation characters and all symbol characters (those for
   which the General Category is one of Pc, Pd, Ps, Pe, Pi, Pf, Po,
   Sm, Sc, Sk, So), but with the following notable exception:
   punctuation characters that belong to blocks that include
   non-punctuation characters are left in those blocks -- those are
   punctuation characters used only with the scripts named by those
   blocks, like U+05BE HEBREW PUNCTUATION MAQAF, which is only used
   by the Hebrew script.

 . Another quirk is that mathematical alphanumerics (which are just
   letters from the Unicode POV) are lumped into a separate script
   'mathematical'.

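The name transformation and the lumping of related blocks described in
the list above could be sketched in Python roughly as follows.  This is
only an illustration under stated assumptions, not how Emacs builds its
tables: the sample lines imitate the Blocks.txt format, the names
LUMPED, block_to_script and parse_blocks are hypothetical, and the
override table covers just the three Latin blocks named above.

```python
import re

# Sample lines in the Blocks.txt format ("START..END; Block Name").
SAMPLE_BLOCKS = """\
# comment lines and blank lines are skipped
0000..007F; Basic Latin
0080..00FF; Latin-1 Supplement
0100..017F; Latin Extended-A
0590..05FF; Hebrew
12400..1247F; Cuneiform Numbers and Punctuation
"""

# Hypothetical hand-written override table: transformed block name ->
# script name, for blocks that are lumped into one script.
LUMPED = {
    "basic-latin": "latin",
    "latin-1-supplement": "latin",
    "latin-extended-a": "latin",
}


def block_to_script(block_name):
    """Downcase the block name, replace blanks with hyphens, apply overrides."""
    name = re.sub(r"\s+", "-", block_name.strip().lower())
    return LUMPED.get(name, name)


def parse_blocks(text):
    """Return a list of (start, end, script) tuples from Blocks.txt-style text."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        span, name = line.split(";", 1)
        start, end = span.split("..")
        ranges.append((int(start, 16), int(end, 16), block_to_script(name)))
    return ranges


if __name__ == "__main__":
    for start, end, script in parse_blocks(SAMPLE_BLOCKS):
        print(f"{start:04X}..{end:04X} {script}")
```

The CJK, 'symbol' and 'mathematical' quirks are not handled here; they
would need further entries or rules beyond a plain name lookup.
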
Alternatively, one could use Scripts.txt from the UCD, and then the
only problem is to subdivide what they call "Common" into the scripts
we use.
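
A possible starting point for the Scripts.txt route, again as a hedged
Python sketch: the sample lines are written in the Scripts.txt format
for illustration, parse_scripts is a hypothetical name, and downcasing
the script names (with underscores turned into hyphens) is an
assumption about how they would be mapped to the names we use.  The
sketch only groups ranges by script; it does not attempt to subdivide
"Common".

```python
from collections import defaultdict

# Sample lines in the Scripts.txt format ("RANGE ; Script # comment").
SAMPLE_SCRIPTS = """\
# comment-only lines are skipped
002B          ; Common # Sm       PLUS SIGN
0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
05BE          ; Hebrew # Pd       HEBREW PUNCTUATION MAQAF
"""


def parse_scripts(text):
    """Group code point ranges by (downcased) script name."""
    by_script = defaultdict(list)
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop trailing comment
        if not data:
            continue
        span, script = (part.strip() for part in data.split(";", 1))
        first, _, last = span.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start  # single code point line
        by_script[script.lower().replace("_", "-")].append((start, end))
    return dict(by_script)


if __name__ == "__main__":
    scripts = parse_scripts(SAMPLE_SCRIPTS)
    # Whatever lands under "common" still has to be assigned by other rules.
    print(scripts.get("common"))
```
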

For the general category of a character, one can evaluate this in
Emacs:

  (get-char-code-property CHAR 'general-category)

Alternatively, one can search UnicodeData.txt directly: the General
Category is the 3rd semicolon-separated field of each line there.
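
Outside Emacs, the same lookup and the 'symbol' rule with its exception
could be sketched in Python like this.  The assumptions: Python's
unicodedata module stands in for UnicodeData.txt (its data version
depends on the Python build), the function names are hypothetical, and
"block includes non-punctuation characters" is read naively as "some
assigned character in the block has a category outside the listed
ones", which is probably too coarse for real use.

```python
import unicodedata

# General Categories listed in the message for the 'symbol' pseudo-script.
SYMBOL_CATEGORIES = {"Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po",
                     "Sm", "Sc", "Sk", "So"}


def category_from_line(line):
    """Return the General Category (3rd field) of a UnicodeData.txt line."""
    return line.split(";")[2]


def block_is_mixed(start, end):
    """True if the block has an assigned character outside SYMBOL_CATEGORIES."""
    for cp in range(start, end + 1):
        cat = unicodedata.category(chr(cp))
        if cat != "Cn" and cat not in SYMBOL_CATEGORIES:  # Cn = unassigned
            return True
    return False


def script_for(ch, block):
    """Pick 'symbol' or the block's script for CH; BLOCK is (start, end, script)."""
    start, end, block_script = block
    is_symbolic = unicodedata.category(ch) in SYMBOL_CATEGORIES
    if is_symbolic and not block_is_mixed(start, end):
        return "symbol"
    # Punctuation inside a mixed block stays with that block's script.
    return block_script


if __name__ == "__main__":
    hebrew = (0x0590, 0x05FF, "hebrew")
    print(script_for("\u05be", hebrew))  # HEBREW PUNCTUATION MAQAF
```

A block such as General Punctuation also contains space and format
characters, so this naive reading would treat it as mixed; cases like
that are where a hand-maintained table of exceptions would come in.
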

Patches are welcome to do all of the above automatically, perhaps with
some small database that expresses the trickier of the above rules.