From: Stefan Monnier <monnier@iro.umontreal.ca>
To: Kenichi Handa <handa@m17n.org>
Cc: 11073@debbugs.gnu.org
Subject: bug#11073: 24.0.94; BIDI-related crash in redisplay with certain byte sequences
Date: Tue, 03 Apr 2012 00:22:32 -0400
Message-ID: <jwvwr5xwimc.fsf-monnier+INBOX@gnu.org>
In-Reply-To: <tl74nt1d068.fsf@m17n.org> (Kenichi Handa's message of "Tue, 03 Apr 2012 11:22:23 +0900")
>> > Usually, yes. But as long as there is a code space in the high
>> > area for a CJK charset, it is unavoidable to have a
>> > buffer/string that contains a character represented by a
>> > byte sequence in that high area, as in the test case of
>> > Bug#11073. And, as "unification" means treating such a
>> > character the same way as the unified character, I thought
>> > they would both have the same character code.
>> Since there are two internal byte-sequence representations, I don't see
>> any good reason why we shouldn't have 2 internal int representations.
>> I.e. if unification failed for the byte-sequence (which might be the
>> result of a bug, for all I know), we may as well keep them non-unified
>> in the int representation.
> Please note that not all characters in the code-space of a
> CJK charset are unified. For instance, Big5 has its own
> PUA (private use area), and characters in PUA are not
> unified by default. So, if Emacs reads a Big5 file that
> contains PUA chars, those chars stay in the high area. Then,
> one can provide one's own unification map that also maps PUA
> chars to some Unicode chars, like this:
> (unify-charset 'big5 "MyBig5.map")
> After this, I thought that previously read PUA chars staying
> in the high-area should be treated as the corresponding
> Unicode chars (in displaying, search, etc).
But again, this unification takes place during decoding, whereas what
I'm talking about takes place when reading the internal utf-8
representation, which should already be unified.
Stefan