From: Eli Zaretskii <eliz@gnu.org>
To: "Stephen J. Turnbull" <stephen@xemacs.org>
Cc: handa@m17n.org, emacs-devel@gnu.org
Subject: Re: bidi properties from uniprop tables
Date: Fri, 19 Aug 2011 13:36:22 +0300
Message-ID: <8339gxsn6h.fsf@gnu.org>
In-Reply-To: <87fwkxlq29.fsf@uwakimon.sk.tsukuba.ac.jp>

> From: "Stephen J. Turnbull" <stephen@xemacs.org>
> Cc: emacs-devel@gnu.org,
>     handa@m17n.org
> Date: Fri, 19 Aug 2011 18:15:58 +0900
> 
>     Unassigned characters are given strong types in the
>     algorithm. This is an explicit exception to the general Unicode
>     conformance requirements with respect to unassigned characters. As
>     characters become assigned in the future, these bidirectional
>     types may change. For assignments to character types, see
>     DerivedBidiClass.txt [DerivedBIDI] in the [UCD].

Thanks, I've managed to miss that addition to the UBA.

> Since Bidi_Class is only used in this algorithm (and explicit property
> lookups) AFAIK

That's not true: it is also used in regexp search by category.  So we
should decide whether to assign these types in the uniprop table or
have a fallback for them in bidi.c.  Any opinions?  Handa-san?

> it seems reasonable to me that get-char-code-property
> and friends should return the "strong type" specified by DerivedBIDI
> (which is LTR it seems, but you should check that).

No, the type depends on the block:

  # Unlike other properties, unassigned code points in blocks
  # reserved for right-to-left scripts are given either types R or AL.
  #
  # The unassigned code points that default to AL are in the ranges:
  #     [\u0600-\u07BF \uFB50-\uFDFF \uFE70-\uFEFF]
  #
  #     Arabic:            U+0600  -  U+06FF
  #     Syriac:            U+0700  -  U+074F
  #     Arabic_Supplement: U+0750  -  U+077F
  #     Thaana:            U+0780  -  U+07BF
  #     Arabic_Presentation_Forms_A:
  #                        U+FB50  -  U+FDFF
  #     Arabic_Presentation_Forms_B:
  #                        U+FE70  -  U+FEFF
  #           minus noncharacter code points.
  #
  # The unassigned code points that default to R are in the ranges:
  #     [\u0590-\u05FF \u07C0-\u08FF \uFB1D-\uFB4F \U00010800-\U00010FFF \U0001E800-\U0001EFFF]
  #
  #     Hebrew:            U+0590  -  U+05FF
  #     NKo:               U+07C0  -  U+07FF
  #     Cypriot_Syllabary: U+10800 - U+1083F
  #     Phoenician:        U+10900 - U+1091F
  #     Lydian:            U+10920 - U+1093F
  #     Kharoshthi:        U+10A00 - U+10A5F
  #     and any others in the ranges:
  #                        U+0800  -  U+08FF,
  #                        U+FB1D  -  U+FB4F,
  #                        U+10840 - U+10FFF,
  #                        U+1E800 - U+1EFFF
  #
  # For all other cases:

  #  All code points not explicitly listed for Bidi_Class
  #  have the value Left_To_Right (L).
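For illustration, here is a rough sketch of what the "fallback in bidi.c" option could look like if it simply encoded the ranges quoted above.  It assumes the caller has already established that the code point is unassigned (assigned characters keep their own Bidi_Class), and it reads "minus noncharacter code points" as sending U+FDD0..U+FDEF to the general L default.  All names (default_bidi_class, DEFAULT_BIDI_*, cp_range) are hypothetical and are not the identifiers used in Emacs; the ranges are the ones in the file quoted here and may differ in later Unicode versions.

```c
#include <stddef.h>

/* Hypothetical names for illustration only; not the identifiers used
   in Emacs's bidi.c or in the uniprop tables.  */
typedef enum
{
  DEFAULT_BIDI_L,   /* Left_To_Right */
  DEFAULT_BIDI_R,   /* Right_To_Left */
  DEFAULT_BIDI_AL   /* Arabic_Letter */
} default_bidi_class_t;

struct cp_range
{
  unsigned int first, last;     /* inclusive bounds */
};

/* Blocks whose unassigned code points default to AL.  */
static const struct cp_range al_ranges[] = {
  { 0x0600, 0x07BF }, { 0xFB50, 0xFDFF }, { 0xFE70, 0xFEFF }
};

/* Blocks whose unassigned code points default to R.  */
static const struct cp_range r_ranges[] = {
  { 0x0590, 0x05FF }, { 0x07C0, 0x08FF }, { 0xFB1D, 0xFB4F },
  { 0x10800, 0x10FFF }, { 0x1E800, 0x1EFFF }
};

#define N_RANGES(a) (sizeof (a) / sizeof ((a)[0]))

/* Return 1 if C lies in any of the N inclusive ranges in R.  */
static int
in_ranges (unsigned int c, const struct cp_range *r, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (c >= r[i].first && c <= r[i].last)
      return 1;
  return 0;
}

/* Default Bidi_Class for code point C, assuming C is unassigned.  */
default_bidi_class_t
default_bidi_class (unsigned int c)
{
  /* The noncharacters U+FDD0..U+FDEF sit inside an AL block but are
     excluded from it; here they fall through to the L default.  */
  if (c >= 0xFDD0 && c <= 0xFDEF)
    return DEFAULT_BIDI_L;
  if (in_ranges (c, al_ranges, N_RANGES (al_ranges)))
    return DEFAULT_BIDI_AL;
  if (in_ranges (c, r_ranges, N_RANGES (r_ranges)))
    return DEFAULT_BIDI_R;
  /* "For all other cases": Left_To_Right.  */
  return DEFAULT_BIDI_L;
}
```

The alternative, assigning these types in the uniprop table, would amount to pre-filling the table with the same per-block defaults before overlaying the explicit values, so that regexp search by category and get-char-code-property see them too, not only the display engine.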

Thread overview: 11+ messages
2011-08-18 19:06 bidi properties from uniprop tables Eli Zaretskii
2011-08-19  4:44 ` Stephen J. Turnbull
2011-08-19  6:43   ` Eli Zaretskii
2011-08-19  9:15     ` Stephen J. Turnbull
2011-08-19 10:36       ` Eli Zaretskii [this message]
2011-08-19 12:10         ` Stephen J. Turnbull
2011-08-20 12:42         ` Kenichi Handa
2011-08-20 13:00           ` Eli Zaretskii
2011-08-23 12:51   ` Kenichi Handa
2011-08-23 14:49     ` Eli Zaretskii
2011-08-23 23:36       ` Kenichi Handa
