all messages for Emacs-related lists mirrored at yhetil.org
* what-cursor-position vs. Unicode
@ 2006-06-03  2:34 Dan Jacobson
  2006-06-03  8:09 ` Eli Zaretskii
  2006-06-05  7:01 ` Kenichi Handa
  0 siblings, 2 replies; 8+ messages in thread
From: Dan Jacobson @ 2006-06-03  2:34 UTC
  Cc: handa

Today we shall discuss what-cursor-position when given an argument of ^U.
We see that it gives Unicode information:
  character: Z (90, #o132, #x5a, U+005A)
Except when you really need it:
  character: 丹 (107109, #o321145, #x1a265)
It should mention U+4E39.
emacs-version "22.0.50.1"
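For reference, the `C-u C-x =` line shows one character code three ways (decimal, octal, hex) and adds a `U+` field only when a Unicode mapping is known. Below is a minimal Python sketch of that formatting. It assumes the mapping arrives as an optional argument, and `describe_char` is a hypothetical helper, not Emacs code:

```python
from typing import Optional


def describe_char(glyph: str, code: int, ucs: Optional[int] = None) -> str:
    """Build a summary line in the style of `C-u C-x =`.

    `code` is the editor's internal character code; `ucs` is the Unicode
    code point when a mapping is known (hypothetical input), else None.
    """
    fields = [str(code), f"#o{code:o}", f"#x{code:x}"]
    if ucs is not None:
        # Only characters with a known Unicode mapping get the U+ field.
        fields.append(f"U+{ucs:04X}")
    return f"character: {glyph} ({', '.join(fields)})"


print(describe_char("Z", 90, 0x5A))
print(describe_char("丹", 0x1A265))          # no mapping known: no U+ field
print(describe_char("丹", 0x1A265, 0x4E39))  # with the mapping Dan expects
```

The three numeric fields are the same value in different bases, so the only thing missing for 丹 is the Unicode lookup itself.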

* Re: what-cursor-position vs. Unicode
  2006-06-03  2:34 what-cursor-position vs. Unicode Dan Jacobson
@ 2006-06-03  8:09 ` Eli Zaretskii
  2006-06-05 23:17   ` Dan Jacobson
       [not found]   ` <mailman.2667.1149552104.9609.bug-gnu-emacs@gnu.org>
  2006-06-05  7:01 ` Kenichi Handa
  1 sibling, 2 replies; 8+ messages in thread
From: Eli Zaretskii @ 2006-06-03  8:09 UTC
  Cc: bug-gnu-emacs, handa

> From: Dan Jacobson <jidanni@jidanni.org>
> Date: Sat, 03 Jun 2006 10:34:59 +0800
> Cc: handa@etl.go.jp
> 
> It should mention U+4E39.
> emacs-version "22.0.50.1"

It does for me, at least when reading your mail.

Perhaps you should send the original file (as a binary attachment), or
describe how you produced the character, if it wasn't from a file.

Also, please tell when was your Emacs resync'ed with CVS, and please
try looking at the character in "emacs -Q".

* Re: what-cursor-position vs. Unicode
  2006-06-03  2:34 what-cursor-position vs. Unicode Dan Jacobson
  2006-06-03  8:09 ` Eli Zaretskii
@ 2006-06-05  7:01 ` Kenichi Handa
  2006-06-05  8:22   ` Werner LEMBERG
  1 sibling, 1 reply; 8+ messages in thread
From: Kenichi Handa @ 2006-06-05  7:01 UTC
  Cc: bug-gnu-emacs

In article <87irnjklv0.fsf@jidanni.org>, Dan Jacobson <jidanni@jidanni.org> writes:

> Today we shall discuss what-cursor-position when given an argument of ^U.
> We see that it gives Unicode information:
>   character: Z (90, #o132, #x5a, U+005A)
> Except when you really need it:
>   character: 丹 (107109, #o321145, #x1a265)
> It should mention U+4E39.
> emacs-version "22.0.50.1"

#x1a265 is a character of chinese-cns11643-1, and the
current Emacs doesn't support Unicode mapping for that
character set.
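Handa's answer can be made concrete by unpacking #x1a265. Below is a Python sketch. It assumes the pre-Unicode MULE layout that Emacs 21/22 used for official two-byte charsets, and it assumes chinese-cns11643-1 has charset id 0x95. Both assumptions are recalled from the old charset.h and are not stated in this thread. `split_mule_char` and `make_mule_char` are hypothetical helpers:

```python
from typing import Tuple

# Assumed Emacs 21/22 layout for official dimension-2 charsets:
#   code = ((charset_id - 0x8F) << 14) | (byte1 << 7) | byte2
CHARSET_BASE = 0x8F


def split_mule_char(code: int) -> Tuple[int, int, int]:
    """Split a dimension-2 character code into (charset id, byte1, byte2)."""
    charset_id = (code >> 14) + CHARSET_BASE
    byte1 = (code >> 7) & 0x7F  # row byte of the 94x94 set
    byte2 = code & 0x7F         # column byte
    return charset_id, byte1, byte2


def make_mule_char(charset_id: int, byte1: int, byte2: int) -> int:
    """Inverse of split_mule_char under the same assumed layout."""
    return ((charset_id - CHARSET_BASE) << 14) | (byte1 << 7) | byte2


cs, b1, b2 = split_mule_char(0x1A265)
print(f"charset {cs:#x}, position {b1:02X}{b2:02X}")
```

Under these assumptions, #x1a265 decodes to charset 0x95 at position 4465. That triple carries no Unicode information, so a separate mapping table is needed to get from it to U+4E39.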

---
Kenichi Handa
handa@m17n.org

* Re: what-cursor-position vs. Unicode
  2006-06-05  7:01 ` Kenichi Handa
@ 2006-06-05  8:22   ` Werner LEMBERG
  2006-06-05 11:07     ` Kenichi Handa
  0 siblings, 1 reply; 8+ messages in thread
From: Werner LEMBERG @ 2006-06-05  8:22 UTC
  Cc: bug-gnu-emacs, jidanni


> #x1a265 is a character of chinese-cns11643-1, and the
> current Emacs doesn't support Unicode mapping for that
> character set.

Just wondering: Why not?


    Werner

* Re: what-cursor-position vs. Unicode
  2006-06-05  8:22   ` Werner LEMBERG
@ 2006-06-05 11:07     ` Kenichi Handa
  2006-06-09 15:09       ` Werner LEMBERG
  0 siblings, 1 reply; 8+ messages in thread
From: Kenichi Handa @ 2006-06-05 11:07 UTC
  Cc: bug-gnu-emacs, jidanni

In article <20060605.102222.112830788.wl@gnu.org>, Werner LEMBERG <wl@gnu.org> writes:

>> #x1a265 is a character of chinese-cns11643-1, and the
>> current Emacs doesn't support Unicode mapping for that
>> character set.

> Just wondering: Why not?

Because no one has implemented it.  I myself want to avoid
spending a time on what becomes useless in the future.  In
addition, in the current Emacs code, adding something like
lisp/international/subst-cns.el leads to slower startup in
CJK locales, which I want to avoid.

But if someone implements it and Richard agrees to include it before
the release, please go ahead.

---
Kenichi Handa
handa@m17n.org

* Re: what-cursor-position vs. Unicode
  2006-06-03  8:09 ` Eli Zaretskii
@ 2006-06-05 23:17   ` Dan Jacobson
       [not found]   ` <mailman.2667.1149552104.9609.bug-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 8+ messages in thread
From: Dan Jacobson @ 2006-06-05 23:17 UTC
  Cc: handa

EZ> It does for me, at least when reading your mail.
Hmmm, me too, but not from a file.
EZ> Also, please tell when was your Emacs resync'ed with CVS
All I know beyond emacs-version "22.0.50.1" is I use Debian
emacs-snapshot 20060518-1.
KH> #x1a265 is a character of chinese-cns11643-1, and the
KH> current Emacs doesn't support Unicode mapping for that
KH> character set.
All I know is me and my Unicode UTF-8 char sitting in the file.
>> Just wondering: Why not?
KH> I myself want to avoid spending a time on what becomes useless in
KH> the future.
I see, there is some funny level of indirection that will be
eliminated in the future. Good.

* Re: what-cursor-position vs. Unicode
  2006-06-05 11:07     ` Kenichi Handa
@ 2006-06-09 15:09       ` Werner LEMBERG
  0 siblings, 0 replies; 8+ messages in thread
From: Werner LEMBERG @ 2006-06-09 15:09 UTC
  Cc: emacs-devel

[-- Attachment #1: Type: Text/Plain, Size: 1209 bytes --]

> >> #x1a265 is a character of chinese-cns11643-1, and the
> >> current Emacs doesn't support Unicode mapping for that
> >> character set.
>
> > Just wondering: Why not?
>
> Because no one has implemented it.

I've sent a `subst-cns.el' file to you, Ken'ichi-san, and the
experimental diff for utf-8.el is attached.  A great many of the
character codes are larger than U+20000; they work just fine.
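The attached diff puts four ranges into `utf-translate-cjk-unicode-range`, and you can check them with a simple inclusive-range test. Below is a Python sketch. `in_cjk_ranges` is hypothetical and does not model how utf-8.el actually consults the list. The two new ranges are CJK Unified Ideographs Extension B and the CJK Compatibility Ideographs Supplement. Both lie outside the BMP, so their characters take four bytes in UTF-8:

```python
# Ranges copied from the patched `utf-translate-cjk-unicode-range`.
CJK_RANGES = [
    (0x2E80, 0xD7A3),
    (0xFF00, 0xFFEF),
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
    (0x2F800, 0x2FA1F),  # CJK Compatibility Ideographs Supplement
]


def in_cjk_ranges(cp: int) -> bool:
    """Return True if code point cp lies in one of the inclusive ranges."""
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


for cp in (0x4E39, 0x20000, 0x1F600):
    # Show membership and the UTF-8 byte sequence for each code point.
    print(f"U+{cp:04X}", in_cjk_ranges(cp), chr(cp).encode("utf-8").hex())
```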

> I myself want to avoid spending a time on what becomes useless in
> the future.

Well, it was rather simple; I just wrote a small Perl script to
extract the data from the Unihan.txt database.  Besides, I think it
is *very* important to provide good conversion from and to Unicode
for all the charsets Emacs supports, so it wasn't wasted time IMHO.
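The Perl script itself is not shown in the thread. Below is a rough Python equivalent. It assumes the CNS positions come from the Unihan field kIRG_TSource, whose values look like `T1-4465` (plane 1, code 0x4465). That field choice is an assumption, and the script may have used a different field. `parse_unihan_tsource` is hypothetical. The sample line is constructed from the thread's own example (U+4E39 in chinese-cns11643-1), and its 0x4465 position is assumed, not quoted from Unihan.txt:

```python
import re
from typing import Dict, Iterable, Tuple

# Unihan.txt data lines are "U+XXXX<TAB>field<TAB>value"; '#' lines are comments.
# Using kIRG_TSource with planes T1..T7 is an assumption.
TSOURCE = re.compile(r"^U\+([0-9A-F]{4,6})\tkIRG_TSource\tT([1-7])-([0-9A-F]{4})$")


def parse_unihan_tsource(lines: Iterable[str]) -> Dict[Tuple[int, int], int]:
    """Map (CNS plane, two-byte code) -> Unicode code point.

    Only planes 1..7 are kept, matching chinese-cns11643-1 .. -7.
    """
    table: Dict[Tuple[int, int], int] = {}
    for line in lines:
        m = TSOURCE.match(line.rstrip())
        if m:
            ucs, plane, code = m.groups()
            table[(int(plane), int(code, 16))] = int(ucs, 16)
    return table


sample = [
    "# comment lines are skipped",
    "U+4E39\tkIRG_TSource\tT1-4465",  # illustrative line, see lead-in
]
print(parse_unihan_tsource(sample))
```

Keying the table by (plane, code) lines up with the seven chinese-cns11643-N charsets in the patch. Turning the table into Lisp in the style of subst-cns.el is not shown.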

> In addition, in the current Emacs code, adding something like
> lisp/international/subst-cns.el leads to slower startup in CJK
> locales, which I want to avoid.

Agreed -- my changes to utf-8.el don't take this into account.  What
about an additional `unicode' language environment which really loads
all mapping tables?

BTW, I suggest setting up a `Chinese-EUC-TW' language environment for
which `subst-cns.el' is loaded by default.


    Werner

[-- Attachment #2: utf-8.el.diff --]
[-- Type: Text/Plain, Size: 3445 bytes --]

--- utf-8.el.old	2005-10-15 07:43:43.000000000 +0200
+++ utf-8.el	2006-06-09 17:01:46.000000000 +0200
@@ -1,7 +1,7 @@
 ;;; utf-8.el --- UTF-8 decoding/encoding support -*- coding: iso-2022-7bit -*-
 
 ;; Copyright (C) 2001, 2002, 2003, 2004  Free Software Foundation, Inc.
-;; Copyright (C) 2001, 2002, 2003, 2004
+;; Copyright (C) 2001, 2002, 2003, 2004, 2006
 ;;   National Institute of Advanced Industrial Science and Technology (AIST)
 ;;   Registration Number H14PRO021
 
@@ -194,6 +194,10 @@
 
 (defconst utf-translate-cjk-charsets '(chinese-gb2312
 				       chinese-big5-1 chinese-big5-2
+				       chinese-cns11643-1 chinese-cns11643-2
+				       chinese-cns11643-3 chinese-cns11643-4
+				       chinese-cns11643-5 chinese-cns11643-6
+				       chinese-cns11643-7
 				       japanese-jisx0208 japanese-jisx0212
 				       katakana-jisx0201
 				       korean-ksc5601)
@@ -267,7 +271,9 @@
 	ucs-unicode-to-mule-cjk (make-hash-table :test 'eq)))
 
 (defcustom utf-translate-cjk-unicode-range '((#x2e80 . #xd7a3)
-					     (#xff00 . #xffef))
+					     (#xff00 . #xffef)
+					     (#x20000 . #x2a6df)
+					     (#x2f800 . #x2fa1f))
   "List of Unicode code ranges supported by `utf-translate-cjk-mode'.
 Setting this variable directly does not take effect;
 use either \\[customize] or the function
@@ -314,22 +320,26 @@
 	     (load "subst-jis")
 	     (load "subst-big5")
 	     (load "subst-gb2312")
-	     (load "subst-ksc"))
+	     (load "subst-ksc")
+	     (load "subst-cns"))
 	    ((string= "Chinese-BIG5" current-language-environment)
 	     (load "subst-jis")
 	     (load "subst-ksc")
 	     (load "subst-gb2312")
-	     (load "subst-big5"))
+	     (load "subst-big5")
+	     (load "subst-cns"))
 	    ((string= "Chinese-GB" current-language-environment)
 	     (load "subst-jis")
 	     (load "subst-ksc")
 	     (load "subst-big5")
-	     (load "subst-gb2312"))
+	     (load "subst-gb2312")
+	     (load "subst-cns"))
 	    (t
 	     (load "subst-ksc")
 	     (load "subst-gb2312")
 	     (load "subst-big5")
-	     (load "subst-jis")))) ; jis covers as much as big5, gb2312
+	     (load "subst-jis")
+	     (load "subst-cns")))) ; jis covers as much as big5, gb2312
 
     (when redefined
       (define-translation-hash-table 'utf-subst-table-for-decode
@@ -365,14 +375,22 @@
 zero or negative.  This is a minor mode.
 Enabling this allows the coding systems mule-utf-8,
 mule-utf-16le and mule-utf-16be to encode characters in the charsets
-`korean-ksc5601', `chinese-gb2312', `chinese-big5-1',
-`chinese-big5-2', `japanese-jisx0208' and `japanese-jisx0212', and to
-decode the corresponding unicodes into such characters.
+
+  korean-ksc5601
+  chinese-gb2312
+  chinese-big5-1 chinese-big5-2
+  chinese-cns11643-1 chinese-cns11643-2 chinese-cns11643-3
+  chinese-cns11643-4 chinese-cns11643-5 chinese-cns11643-6
+  chinese-cns11643-7
+  japanese-jisx0208 japanese-jisx0212
+
+and to decode the corresponding unicodes into such characters.
 
 Where the charsets overlap, the one preferred for decoding is chosen
 according to the language environment in effect when this option is
 turned on: ksc5601 for Korean, gb2312 for Chinese-GB, big5 for
-Chinese-Big5 and jisx for other environments.
+Chinese-Big5 and jisx for other environments.  The CNS charsets
+are always loaded last.
 
 This mode is on by default.  If you are not interested in CJK
 characters and want to avoid some overhead on encoding/decoding

[-- Attachment #3: Type: text/plain, Size: 142 bytes --]

_______________________________________________
Emacs-devel mailing list
Emacs-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-devel

* Re: what-cursor-position vs. Unicode
       [not found]   ` <mailman.2667.1149552104.9609.bug-gnu-emacs@gnu.org>
@ 2006-06-09 22:17     ` Miles Bader
  0 siblings, 0 replies; 8+ messages in thread
From: Miles Bader @ 2006-06-09 22:17 UTC
  Cc: bug-gnu-emacs, handa

Dan Jacobson <jidanni@jidanni.org> writes:
> KH> I myself want to avoid spending a time on what becomes useless in
> KH> the future.
>
> I see, there is some funny level of indirection that will be
> eliminated in the future. Good.

In the future (well actually right now, on a CVS branch) Emacs will use
a unicode internal representation, where obviously this sort of thing
will be easier...

-Miles
-- 
"Most attacks seem to take place at night, during a rainstorm, uphill,
 where four map sheets join."   -- Anon. British Officer in WW I

end of thread

Thread overview: 8+ messages
2006-06-03  2:34 what-cursor-position vs. Unicode Dan Jacobson
2006-06-03  8:09 ` Eli Zaretskii
2006-06-05 23:17   ` Dan Jacobson
     [not found]   ` <mailman.2667.1149552104.9609.bug-gnu-emacs@gnu.org>
2006-06-09 22:17     ` Miles Bader
2006-06-05  7:01 ` Kenichi Handa
2006-06-05  8:22   ` Werner LEMBERG
2006-06-05 11:07     ` Kenichi Handa
2006-06-09 15:09       ` Werner LEMBERG

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git
