unofficial mirror of bug-gnu-emacs@gnu.org 
From: Kenichi Handa <handa@m17n.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 11073@debbugs.gnu.org
Subject: bug#11073: 24.0.94; BIDI-related crash in redisplay with certain byte sequences
Date: Mon, 26 Mar 2012 16:45:56 +0900
Message-ID: <tl7limnls97.fsf@m17n.org>
In-Reply-To: <837gybupdf.fsf@gnu.org> (message from Eli Zaretskii on Fri, 23 Mar 2012 20:46:36 +0200)

In article <837gybupdf.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > Why do we need this unification?  Or rather, why do we need multiple
> > codepoints, which then forces us to unify them?

> That's something Handa-san (CC'ed) will be able to explain much better
> than I ever could.

It's a long story.  When I designed emacs-unicode (the
version before it was merged into the trunk, more than 10
years ago), the unification maps of CJK charsets to Unicode
were not stable.  In addition, there were various
conflicting policies on which character to unify with which
character.  One reason for this confusion was that Unicode
itself didn't define mappings to/from such CJK charsets
(JIS, GB, KSC).

The unification problem is not limited to ideographic
characters.  Many CJK charsets contain, for instance,
full-width versions of Greek characters, but Unicode doesn't
distinguish them from single-width versions (though Unicode
does have full-width versions of 'A'..'Z', etc).  There were
people who wanted to distinguish full-width Greek chars from
single-width chars.

There were also people who had iso-2022-7bit text files that
distinguish characters of the GB charset from those of the
JIS charset.  To edit such a file and write it back in its
original form, one has to disable unification of either GB
or JIS (or both of them).

So, I decided at that time to give each CJK charset a unique
code space (above #x110000) in Emacs, and to allow users to
freely unify/disunify them with the Unicode code space
(below #x110000) by providing the function unify-charset.
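
To make the idea concrete, here is a toy sketch in Python.  It
is not Emacs code and does not reflect how Emacs lays out its
code space internally: the class ToyCharset, the function
toy_unify, the private base offsets, and the charset code used
for Greek Alpha are all assumptions made up for illustration.
It only shows the general scheme: each charset decodes into its
own range at or above #x110000 until it is unified, after which
mapped characters land on their Unicode code points.

```python
UNICODE_LIMIT = 0x110000  # Unicode code points are below this value


class ToyCharset:
    """Hypothetical charset with a private code space (not an Emacs API)."""

    def __init__(self, name, private_base, unify_map):
        # The private range must not overlap the Unicode range.
        assert private_base >= UNICODE_LIMIT
        self.name = name
        self.private_base = private_base   # assumed offset for this sketch
        self.unify_map = dict(unify_map)   # charset code -> Unicode code point
        self.unified = False

    def decode(self, code):
        """Return the internal character code for a charset code."""
        if self.unified and code in self.unify_map:
            return self.unify_map[code]
        # Not unified (or no mapping): stay in the private code space.
        return self.private_base + code


def toy_unify(charset, deunify=False):
    """Toy stand-in for the unify/disunify switch described above."""
    charset.unified = not deunify


GREEK_ALPHA = 0x2621  # illustrative charset code, assumed for this sketch
jis = ToyCharset("toy-jis", 0x110000, {GREEK_ALPHA: 0x0391})
gb = ToyCharset("toy-gb", 0x120000, {GREEK_ALPHA: 0x0391})
```

With neither charset unified, jis.decode(GREEK_ALPHA) and
gb.decode(GREEK_ALPHA) give different codes, so a file that
mixes the two charsets can be written back unchanged.  After
toy_unify(jis) and toy_unify(gb), both decode to the same
Unicode code point.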

FYI, http://www.unicode.org/reports/tr38/ describes some of
the difficulties of such mappings.

> AFAIU, there are good reasons to have some CJK
> characters on separate codepoints, because they need to be treated
> differently from their Unicode codepoints (perhaps a different choice
> of font to display them?)

That was one reason, but the current code pays attention to
the `charset' text property of each character to select a
proper font.

---
Kenichi Handa
handa@m17n.org




