From: Eli Zaretskii <eliz@gnu.org>
To: Glenn Morris <rgm@gnu.org>
Cc: 20789@debbugs.gnu.org
Subject: bug#20789: Invalid script or charset name: cuneiform-numbers-and-punctuation
Date: Fri, 12 Jun 2015 11:28:09 +0300
Message-ID: <83y4jpqqjq.fsf@gnu.org>
In-Reply-To: <rek2v93mux.fsf@fencepost.gnu.org>

> From: Glenn Morris <rgm@gnu.org>
> Date: Thu, 11 Jun 2015 18:24:06 -0400
> 
> Glenn Morris wrote:
> 
> >   Error (initialization): Creation of the default fontsets failed: (error
> >   Invalid script or charset name: cuneiform-numbers-and-punctuation)
> 
> I fixed a typo that seems to have caused that.

Sorry about that.

> I don't suppose that big list can be auto-generated from the inputs?

It's not trivial.  I describe below some of the issues, in the hope
that Someone™ will volunteer:

  . Most of the script names come from the corresponding Unicode
    blocks, with trivial transformations (downcase the words and
    replace blanks with hyphens).  So basically, we will need to use
    the information in Blocks.txt, a file that is part of the Unicode
    Character Database (UCD), but with the quirks described below.

  . The first quirk is that we lump together all the blocks that
    belong to the same script, like "Basic Latin", "Latin Extended-A",
    "Latin-1 Supplement", etc. -- these all go to the single script
    called 'latin'.  Likewise with other similar blocks that are
    either "SOMETHING Extended" or "Supplement" or whatever.

  . The second quirk is with the CJK characters: those are divided
    into several broad scripts like 'han', 'kana', and 'cjk-misc'
    whose exact rules I don't know.

  . The third quirk is with the 'symbol' pseudo-script: we lump there
    all punctuation characters and all symbol characters (those for
    which the General Category is one of Pc, Pd, Ps, Pe, Pi, Pf, Po,
    Sm, Sc, Sk, So), but with the following notable exception:
    punctuation characters that belong to blocks that include
    non-punctuation characters are left in those blocks -- those are
    punctuation characters used only with the scripts named by those
    blocks, like U+05BE HEBREW PUNCTUATION MAQAF, which is only used
    by the Hebrew script.

  . Another quirk is that mathematical alphanumerics (which are just
    letters from the Unicode POV) are lumped into a separate script
    'mathematical'.
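
The name transformation and the lumping of related blocks can be
sketched roughly as follows.  This is a minimal Python sketch under
stated assumptions, not the code Emacs uses: BLOCKS_SAMPLE is a
hand-copied excerpt in the Blocks.txt format, and OVERRIDES is a
hypothetical table of exceptions that covers only the Latin and
mathematical cases named above.  The CJK and 'symbol' quirks are not
handled here.

```python
import re

# A few lines in the Blocks.txt format: "START..END; Block Name".
BLOCKS_SAMPLE = """\
0000..007F; Basic Latin
0080..00FF; Latin-1 Supplement
0100..017F; Latin Extended-A
0590..05FF; Hebrew
12400..1247F; Cuneiform Numbers and Punctuation
1D400..1D7FF; Mathematical Alphanumeric Symbols
"""

# Hypothetical exception table (an assumption for illustration only):
# blocks that do not get a script name derived from their own name.
OVERRIDES = {
    "Basic Latin": "latin",
    "Latin-1 Supplement": "latin",
    "Latin Extended-A": "latin",
    "Mathematical Alphanumeric Symbols": "mathematical",
}

BLOCK_LINE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+);\s*(.+?)\s*$")


def default_script_name(block_name):
    """Downcase the words and join them with hyphens."""
    return "-".join(block_name.lower().split())


def parse_blocks(text):
    """Return a list of (start, end, script) tuples, one per block."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        match = BLOCK_LINE.match(line)
        if match is None:
            continue
        start = int(match.group(1), 16)
        end = int(match.group(2), 16)
        name = match.group(3)
        ranges.append((start, end, OVERRIDES.get(name, default_script_name(name))))
    return ranges


def merge_adjacent(ranges):
    """Merge neighbouring ranges that ended up with the same script."""
    merged = []
    for start, end, script in sorted(ranges):
        if merged and merged[-1][2] == script and merged[-1][1] + 1 == start:
            merged[-1] = (merged[-1][0], end, script)
        else:
            merged.append((start, end, script))
    return merged
```

Calling merge_adjacent(parse_blocks(BLOCKS_SAMPLE)) collapses the three
Latin blocks into a single 'latin' range and leaves the other blocks
with their derived or overridden names.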

Alternatively, one could use Scripts.txt from the UCD, and then the
only problem is to subdivide what they call "Common" into the scripts
we use.
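
A rough Python sketch of that alternative, assuming input lines in the
Scripts.txt format ("RANGE ; Script # Category ...").  SCRIPTS_SAMPLE
is a hand-copied excerpt, and the function names are hypothetical.
The sketch only separates the "Common" entries from the rest; deciding
where each "Common" range belongs is exactly the part that still needs
rules.

```python
import re

# A few lines in the Scripts.txt format; a range or a single code point,
# then the script name, then a comment starting with the General Category.
SCRIPTS_SAMPLE = """\
0020          ; Common # Zs       SPACE
002B          ; Common # Sm       PLUS SIGN
0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
05BE          ; Hebrew # Pd       HEBREW PUNCTUATION MAQAF
"""

SCRIPT_LINE = re.compile(r"^([0-9A-F]+)(?:\.\.([0-9A-F]+))?\s*;\s*(\w+)")


def parse_scripts(text):
    """Return (start, end, ucd_script, general_category) tuples."""
    entries = []
    for line in text.splitlines():
        data, _, comment = line.partition("#")
        match = SCRIPT_LINE.match(data.strip())
        if match is None:
            continue
        start = int(match.group(1), 16)
        end = int(match.group(2), 16) if match.group(2) else start
        words = comment.split()
        category = words[0] if words else None
        entries.append((start, end, match.group(3), category))
    return entries


def split_known_and_common(entries):
    """Separate entries with a usable script from the "Common" ones."""
    known = [
        (start, end, name.lower().replace("_", "-"))
        for start, end, name, _ in entries
        if name != "Common"
    ]
    common = [
        (start, end, category)
        for start, end, name, category in entries
        if name == "Common"
    ]
    return known, common
```

The `common` list keeps the General Category from the comment field, so
that later rules (for example, the punctuation and symbol categories)
have something to work with.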

For the general category of a character, one can do in Emacs:

      (get-char-code-property CHAR 'general-category)

Alternatively, one can search UnicodeData.txt directly: the General
Category is the 3rd semicolon-separated field of each line there.
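
For someone working outside Emacs, both lookups can be approximated in
Python.  This is a sketch only: SAMPLE_LINE is a hand-copied line in
the UnicodeData.txt format, the function names are hypothetical, and
Python's unicodedata module uses whatever Unicode version ships with
the interpreter, which may differ from the UCD files in the Emacs
tree.

```python
import unicodedata

# General Categories that feed the 'symbol' pseudo-script, as listed above.
SYMBOL_CATEGORIES = frozenset("Pc Pd Ps Pe Pi Pf Po Sm Sc Sk So".split())

# One line in the UnicodeData.txt format (semicolon-separated fields).
SAMPLE_LINE = "05BE;HEBREW PUNCTUATION MAQAF;Pd;0;R;;;;;N;;;;;"


def category_from_ucd_line(line):
    """Return (codepoint, general_category) from a UnicodeData.txt line."""
    fields = line.split(";")
    # Field 1 is the code point, field 3 (index 2) the General Category.
    return int(fields[0], 16), fields[2]


def category_from_python(codepoint):
    """Look the category up in Python's bundled Unicode database."""
    return unicodedata.category(chr(codepoint))


def is_symbol_candidate(category):
    """True if the category is one of the punctuation/symbol classes."""
    return category in SYMBOL_CATEGORIES
```

Note that is_symbol_candidate only tests the category.  U+05BE passes
that test, yet by the exception described above it stays with the
Hebrew script, so a real generator would also need the block
information.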

Patches are welcome to do all of the above automatically, perhaps with
some small database that expresses the more tricky of the above rules.




