Re: MML charset tag regression

unofficial mirror of emacs-devel@gnu.org 

* Re: MML charset tag regression
       [not found]                 ` <ilubryuyty8.fsf@latte.josefsson.org>
@ 2003-04-26 10:50                   ` James H. Cloos Jr.
  2003-04-28 11:58                     ` Kenichi Handa
  0 siblings, 1 reply; 49+ messages in thread
From: James H. Cloos Jr. @ 2003-04-26 10:50 UTC (permalink / raw)
  Cc: emacs-devel

>>>>> "Simon" == Simon Josefsson <jas@extundo.com> writes:

Simon> For me, when I yanked the string into emacs from galeon it
Simon> becomes double-width.  It is single-width in galeon though.

I also see that; any pasting of cyrillic text via pasting X's
primary or from the clipboard.  The wide cyrillic is from the
japanese-jisx0208 charset.  Eg, Cyrillic т gets buffer code
0x92 0xA7 0xE4 when pasted, but 0x8C 0xE2 and charset
cyrillic-iso8859-5 when typed directly, or inserted from a utf-8
encoded file.  In both cases, (describe-char) shows the same value
on the Unicode: line, eg 0442 for т.

The next line is typed directly:

        тхис ис а тест

This line is pasted: (this probably won't be visible after I send)

        à‘‚à‘…à¸à‘ à¸à‘ à° à‘‚àµà‘à‘‚

The first issue is to get emacs to prefer 8859-5 over jisx0208 when
pasting cyrillic utf8.  The next is getting the cyrillic in jisx0208
to properly convert to utf8.
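
To make the byte values concrete, here is a minimal sketch of how the
same character, U+0442, comes out under three external encodings.  It
assumes a current Emacs whose internal representation is Unicode, so it
shows encoded output rather than the 21.3.50 buffer codes quoted above;
`demo-encodings-of' is a hypothetical helper name, not part of Emacs.

```elisp
;; Sketch only: `demo-encodings-of' is a hypothetical helper.
(defun demo-encodings-of (char)
  "Return an alist mapping coding systems to the hex bytes encoding CHAR."
  (mapcar (lambda (coding)
            (cons coding
                  ;; `encode-coding-string' returns a unibyte string, so
                  ;; each element seen here is one byte (0..255).
                  (mapconcat (lambda (byte) (format "%02X" byte))
                             (encode-coding-string (string char) coding)
                             " ")))
          '(utf-8 iso-8859-5 iso-2022-jp)))

;; CYRILLIC SMALL LETTER TE.
(demo-encodings-of #x0442)
```

The utf-8 entry is the two bytes D1 82 and the iso-8859-5 entry is the
single byte E2, the same 0xE2 that appears in the buffer code above.
The iso-2022-jp entry goes through JIS X 0208, the charset behind the
double-width glyphs.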

I'm using GNU Emacs 21.3.50.1 (i686-pc-linux-gnu, X toolkit, Xaw3d
scroll bars) of 2003-03-06 in en_US.UTF-8.

-JimC

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-04-26 10:50                   ` MML charset tag regression James H. Cloos Jr.
@ 2003-04-28 11:58                     ` Kenichi Handa
  2003-04-28 12:43                       ` Stephen J. Turnbull
                                         ` (2 more replies)
  0 siblings, 3 replies; 49+ messages in thread
From: Kenichi Handa @ 2003-04-28 11:58 UTC (permalink / raw)
  Cc: emacs-devel

In article <m3wuhhwn0l.fsf@lugabout.jhcloos.org>, "James H. Cloos Jr." <cloos@jhcloos.com> writes:
>>>>>>  "Simon" == Simon Josefsson <jas@extundo.com> writes:
Simon>  For me, when I yanked the string into emacs from galeon it
Simon>  becomes double-width.  It is single-width in galeon though.

> I also see that; any pasting of cyrillic text via pasting X's
> primary or from the clipboard.  The wide cyrillic is from the
> japanese-jisx0208 charset.
[...]

In article <iluu1clxymd.fsf@latte.josefsson.org>, Simon Josefsson <jas@extundo.com> writes:
> That may be interesting by itself.  Go to
> http://www.nns.ru/persons/gorbach.html using galeon (or mozilla, I
> think).  Cut'n'paste the first word and yank it in Emacs.  It looks as
> single-width in galeon, but when yanked into emacs it becomes double
> width. Yanking it into xterm or gnome-terminal doesn't change the
> string, it looks like single-width.  Save the HTML file and open it in
> emacs as a koi8 file (note that emacs doesn't auto detect it as koi8
> so you have to do that manually), then it is single-width too.

> I guess it is the emacs X cut'n'paste code that somehow makes the
> string into double width japanese characters.

I don't think so.  There's no such code in Emacs that does
such a conversion.

I think galeon sends Emacs those cyrillic characters by
encoding into COMPOUND_TEXT as a charset of JISX0208.

Please try this:

First, select some Cyrillic text in galeon.  Then type this
in Emacs: C-x RET X raw-text RET C-y.  You'll see something
like this: "ESC $ ( B ...".

Next, try this:

First, select some Cyrillic text in galeon.  Then evaluate
this in Emacs:
   (decode-coding-string (x-get-selection 'PRIMARY 'UTF8_STRING) 'utf-8)
I think you'll see single-width Cyrillic chars (you have to
have an iso10646-1 font containing Cyrillic glyphs).

The selection problem is very deep.  :-(

Ideally, the requester should be able to request the type
'TEXT instead of the specific 'COMPOUND_TEXT or
'UTF8_STRING, and the requestee should return the text in
the first of these types that can encode it:
STRING, COMPOUND_TEXT, or UTF8_STRING (in this priority
order).

But, unfortunately, many X clients (requestees) don't behave
like that.  If 'TEXT is requested, many return just "?????"
even if the text can be correctly encoded by COMPOUND_TEXT
or UTF8_STRING.

So, it is necessary for Emacs to request by a specific type
'COMPOUND_TEXT ('UTF8_STRING has been recently introduced in
XFree86, and there are many clients that still doesn't
support it).

Recently, many gtk clients have started supporting UTF8_STRING
without making COMPOUND_TEXT support better.  It may cause
no problem between gtk clients because they will request
only the type UTF8_STRING.  But it's too shortsighted an
approach.  :-(

The new encoding method using "Non-Standard Character Set
Encodings" of COMPOUND_TEXT makes the cyrillic case much
more complicated.  In some case (perhaps only in KOI8
locale), X clients recently start to encode cyrillic
characters in "ESC % / 0 ...".  They don't consider the
situation that the requester is running in a different
locale.  :-(

Perhaps, we should make Emacs to request UTF8_STRING at
first if the locale is UTF8, and if that request fails,
request COMPOUND_TEXT.
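
The fallback order proposed here can be sketched as follows.  This is
only an illustration: FETCH is a stand-in for whatever function performs
the real selection request (it takes a target symbol and returns a
string, returns nil, or signals an error), and
`demo-request-selection' is a hypothetical name.

```elisp
;; Sketch only: names are hypothetical, and FETCH abstracts away the
;; actual X selection request.
(defun demo-request-selection (fetch utf8-locale-p)
  "Return selection text obtained through FETCH, or nil.
If UTF8-LOCALE-P is non-nil, try UTF8_STRING before COMPOUND_TEXT."
  (let ((targets (if utf8-locale-p
                     '(UTF8_STRING COMPOUND_TEXT)
                   '(COMPOUND_TEXT)))
        text)
    (while (and targets (not text))
      ;; A failed request must not abort the loop, so errors are
      ;; treated the same as a nil result.
      (setq text (condition-case nil
                     (funcall fetch (car targets))
                   (error nil))
            targets (cdr targets)))
    text))
```

Passing the fetch function as an argument keeps the ordering logic
separate from the request itself, so the order can be tried against a
fake owner that fails on one target.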

---
Ken'ichi HANDA
handa@m17n.org

* Re: MML charset tag regression
  2003-04-28 11:58                     ` Kenichi Handa
@ 2003-04-28 12:43                       ` Stephen J. Turnbull
  2003-04-28 12:59                         ` Kenichi Handa
  2003-04-28 23:05                       ` Simon Josefsson
  2003-04-29  5:38                       ` Richard Stallman
  2 siblings, 1 reply; 49+ messages in thread
From: Stephen J. Turnbull @ 2003-04-28 12:43 UTC (permalink / raw)
  Cc: jas

>>>>> "Kenichi" == Kenichi Handa <handa@m17n.org> writes:

    Kenichi> 'UTF8_STRING has been recently introduced in XFree86, and
    Kenichi> there are many clients that still doesn't support it.

I thought the UTF-8 interfaces of XFree86 were deprecated because they
are redundant (extended segments work fine, and XFree86 already uses
them even where specifically prohibited by the ICCCM, namely for ISO
8859/15), and I thought the X Consortium opposed it.  It was discussed
on emacs-devel about a year ago.  Has this changed?

    Kenichi> Perhaps, we should make Emacs to request UTF8_STRING at
    Kenichi> first if the locale is UTF8, and if that request fails,
    Kenichi> request COMPOUND_TEXT.

Surely this will need to be configurable in Lisp, maybe even
controlled by the user.  The variety of conventions used by
applications is too wide, and the standards hard to interpret.

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

* Re: MML charset tag regression
  2003-04-28 12:43                       ` Stephen J. Turnbull
@ 2003-04-28 12:59                         ` Kenichi Handa
  0 siblings, 0 replies; 49+ messages in thread
From: Kenichi Handa @ 2003-04-28 12:59 UTC (permalink / raw)
  Cc: jas

In article <87d6j64ws4.fsf@tleepslib.sk.tsukuba.ac.jp>, "Stephen J. Turnbull" <stephen@xemacs.org> writes:

>>>>>>  "Kenichi" == Kenichi Handa <handa@m17n.org> writes:
Kenichi>  'UTF8_STRING has been recently introduced in XFree86, and
Kenichi>  there are many clients that still doesn't support it.

> I thought the UTF-8 interfaces of XFree86 were deprecated because they
> are redundant (extended segments work fine, and XFree86 already uses
> them even where specifically prohibited by the ICCCM, namely for ISO
> 8859/15), and I thought the X Consortium opposed it.  It was discussed
> on emacs-devel about a year ago.  Has this changed?

As far as I remember, that was the different topic.  What
deprecated was the function Xutf8LookupString (and perhaps
Xutf8DrawString as well).

Kenichi>  Perhaps, we should make Emacs to request UTF8_STRING at
Kenichi>  first if the locale is UTF8, and if that request fails,
Kenichi>  request COMPOUND_TEXT.

> Surely this will need to be configurable in Lisp, maybe even
> controlled by the user.  The variety of conventions used by
> applications is too wide, and the standards hard to interpret.

Yes, of course.  What I wrote was the default behavior of
Emacs.

---
Ken'ichi HANDA
handa@m17n.org

* Re: MML charset tag regression
  2003-04-28 11:58                     ` Kenichi Handa
  2003-04-28 12:43                       ` Stephen J. Turnbull
@ 2003-04-28 23:05                       ` Simon Josefsson
  2003-04-29  7:12                         ` Stephen J. Turnbull
  2003-04-29  5:38                       ` Richard Stallman
  2 siblings, 1 reply; 49+ messages in thread
From: Simon Josefsson @ 2003-04-28 23:05 UTC (permalink / raw)
  Cc: cloos, emacs-devel, ding

Kenichi Handa <handa@m17n.org> writes:

>> I guess it is the emacs X cut'n'paste code that somehow makes the
>> string into double width japanese characters.
>
> I don't think so.  There's no such code in Emacs that does
> such a conversion.

Emacs behaves different from xterm, gnome-terminal, gedit, etc though.

> I think galeon sends Emacs those cyrillic characters by
> encoding into COMPOUND_TEXT as a charset of JISX0208.
>
> Please try this:
>
> First, select some Cyrillic text in galeon.  Then type this
> in Emacs: C-x RET X raw-text RET C-y.  You'll see something
> like this: "ESC $ ( B ...".

I see ^[$(B'$'`'b'R'Q'i'V'S...
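
Those bytes are plain JIS X 0208 codes from the Cyrillic row, which can
be checked by decoding the visible part of the string.  A minimal
sketch, assuming a current Emacs in which the `compound-text' coding
system decodes to Unicode; `demo-decode-ctext' is a hypothetical name,
and the trailing "..." of the paste is left out rather than guessed.

```elisp
;; Sketch only: `demo-decode-ctext' is a hypothetical helper.
(defun demo-decode-ctext (bytes)
  "Decode the unibyte string BYTES as COMPOUND_TEXT."
  (decode-coding-string bytes 'compound-text))

;; ESC $ ( B designates JIS X 0208; each following byte pair is one
;; character from that set.
(demo-decode-ctext "\e$(B'$'`'b'R'Q'i'V'S")
```

Evaluating this should give the eight-letter Cyrillic word Горбачев;
whether it is displayed single-width or double-width depends on the
font Emacs picks for it.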

> Next, try this:
>
> First, select some Cyrillic text in galeon.  Then evaluate
> this in Emacs:
>    (decode-coding-string (x-get-selection 'PRIMARY 'UTF8_STRING) 'utf-8)
> I think you'll see single-width Cyrillic chars (you have to
> have an iso10646-1 font containing Cyrillic glyphs).

Yes, this works fine.

> Ideally, the requester should be able to request the type
> 'TEXT instead of the specific 'COMPOUND_TEXT or
> 'UTF8_STRING, and the requestee should return the text in
> the first of these types that can encode it:
> STRING, COMPOUND_TEXT, or UTF8_STRING (in this priority
> order).
>
> But, unfortunately, many X clients (requestees) don't behave
> like that.  If 'TEXT is requested, many return just "?????"
> even if the text can be correctly encoded by COMPOUND_TEXT
> or UTF8_STRING.

Is this a bug in that client?

Or maybe emacs can detect that the TEXT request failed?  Is "?????"
some magic string emacs can test for?  If it could detect this, it
could continue and try to ask for COMPOUND_TEXT or UTF8_STRING.

This isn't the problem I'm seeing though.

> So, it is necessary for Emacs to request by a specific type
> 'COMPOUND_TEXT ('UTF8_STRING has been recently introduced in
> XFree86, and there are many clients that still doesn't
> support it).

What do XFree86 recommend applications to use?  UTF8_STRING with
fallback to COMPOUND_TEXT?  Or TEXT?  Unless there is some well-agreed
on non-controversial recommendation on how internationalized X11
cut'n'paste should work, all attempts to get a complete system working
seems futile.

> Recently, many gtk clients have started supporting UTF8_STRING
> without making COMPOUND_TEXT support better.  It may cause
> no problem between gtk clients because they will request
> only the type UTF8_STRING.  But it's too shortsighted an
> approach.  :-(

Ouch.  Some people claim GTK2 support both UTF8_STRING and
COMPOUND_TEXT though
<http://mail.nl.linux.org/linux-utf8/2002-09/msg00115.html>, but
Galeon uses GTK2 and obviously it doesn't produce a good
COMPOUND_TEXT.

> The new encoding method using "Non-Standard Character Set
> Encodings" of COMPOUND_TEXT makes the cyrillic case much
> more complicated.  In some case (perhaps only in KOI8
> locale), X clients recently start to encode cyrillic
> characters in "ESC % / 0 ...".  They don't consider the
> situation that the requester is running in a different
> locale.  :-(

Do you mean the client sends data in a locale-specific charset via
COMPOUND_TEXT?  Ouch.

> Perhaps, we should make Emacs to request UTF8_STRING at
> first if the locale is UTF8, and if that request fails,
> request COMPOUND_TEXT.

This sounds like a good idea to me.




* Re: MML charset tag regression
  2003-04-28 11:58                     ` Kenichi Handa
  2003-04-28 12:43                       ` Stephen J. Turnbull
  2003-04-28 23:05                       ` Simon Josefsson
@ 2003-04-29  5:38                       ` Richard Stallman
  2003-05-20 12:47                         ` Kenichi Handa
  2 siblings, 1 reply; 49+ messages in thread
From: Richard Stallman @ 2003-04-29  5:38 UTC (permalink / raw)
  Cc: jas

    Recently, many gtk clients have started supporting UTF8_STRING
    without making COMPOUND_TEXT support better.  It may cause
    no problem between gtk clients because they will request
    only the type UTF8_STRING.  But it's too shortsighted an
    approach.  :-(

Is this an issue I should raise with the GTK developers?  Could they,
should they, do something to encourage app developers to handle
COMPOUND_TEXT properly?

    Perhaps, we should make Emacs to request UTF8_STRING at
    first if the locale is UTF8, and if that request fails,
    request COMPOUND_TEXT.

If we do this, it should be controlled by a Lisp variable, not by the
locale.  Perhaps the Lisp variable could default based on the locale.

Is there a reason not to do this unconditionally, always?

* Re: MML charset tag regression
  2003-04-28 23:05                       ` Simon Josefsson
@ 2003-04-29  7:12                         ` Stephen J. Turnbull
  0 siblings, 0 replies; 49+ messages in thread
From: Stephen J. Turnbull @ 2003-04-29  7:12 UTC (permalink / raw)
  Cc: emacs-devel

>>>>> "Simon" == Simon Josefsson <jas@extundo.com> writes:

    Simon> Emacs behaves different from xterm, gnome-terminal, gedit,
    Simon> etc though.

The X protocol is designed so that clients with different needs/wants
can negotiate the best available transfer.

    Simon> Is this a bug in that client?

Yes.  We lose.

    Simon> Or maybe emacs can detect that the TEXT request failed?  Is
    Simon> "?????"  some magic string emacs can test for?

No.  Heuristic, yes.  Standard or wide-spread practice, no.
Unfortunately, a failed request should return a failure indication,
and no data, not some bogus data.  Apparently these clients fail to do
that correctly.

The big problem with TEXT is that it gives the requestor no way to
negotiate content.  TEXT is simply whatever the selection owner
chooses to spew; you'd better be able to handle it.

Emacs should avoid asking for TEXT.  The algorithm should be

0. Ask for TARGETS.  A proper client will be able to tell you what it
   supports.  (We may be able to cache this information, and avoid
   X protocol round-trips.)  In steps 1-4 below, qualify with "unless
   known to be unavailable."
1. Ask for UTF8_STRING or COMPOUND_TEXT first.  Default to
   UTF8_STRING, but there should be a user option to start with
   COMPOUND_TEXT (the Unihan disambiguation problem).
2. Ask for the other universal encoding.
3. Ask for STRING (ISO 8859/1, if that is not known to be unacceptable).
4. Ask for Heaven's intercession, and TEXT.
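
The ordering in steps 0 to 4 can be sketched as a preference list
filtered by the owner's TARGETS reply.  This is an illustration under
assumptions: the variable and function names are hypothetical, the
TARGETS reply is taken to be already available as a list of symbols,
and nil is taken to mean that the owner did not answer.

```elisp
;; Sketch only: all names here are hypothetical.
(defvar demo-selection-target-order
  '(UTF8_STRING COMPOUND_TEXT STRING TEXT)
  "Text targets in order of desirability.
A user who prefers COMPOUND_TEXT would move it to the front.")

(defun demo-choose-targets (offered &optional order)
  "Return the targets in ORDER that also appear in OFFERED.
ORDER defaults to `demo-selection-target-order'.  If OFFERED is nil,
nothing is known to be unavailable, so ORDER is returned unchanged."
  (let ((order (or order demo-selection-target-order)))
    (if (null offered)
        order
      (delq nil (mapcar (lambda (target)
                          (and (memq target offered) target))
                        order)))))

;; An owner that advertises only STRING and UTF8_STRING:
(demo-choose-targets '(STRING UTF8_STRING TIMESTAMP))
```

The caller would then request each returned target in turn and stop at
the first one that yields data.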

(Now I see why UTF8_STRING is a good thing; even though the _sender_
can use COMPOUND_TEXT to send UTF-8 reliably, requesting COMPOUND_TEXT
doesn't restrict the sender to UTF-8.)

    Simon> Unless there is some well-agreed on non-controversial
    Simon> recommendation on how internationalized X11 cut'n'paste
    Simon> should work, all attempts to get a complete system working
    Simon> seems futile.

I don't see why the above should be controversial, except that there's
the Unihan political issue, and some Asian language users would want
the factory default to be COMPOUND_TEXT in Han-using locales.

To deal with broken clients, it might be best to have the above
algorithm implemented as a Lisp list containing targets in order of
desirability.  Then if a client is known to send junk when
COMPOUND_TEXT is requested, you can leave it off the list.  This might also
allow the selection request function to be flexibly used.  (Eg, if the
selection contains an image, you could prepend (PIXMAP POSTSCRIPT) to
the list of text targets, where presumably the text targets would get
the ALT string from HTML or a tooltip from a toolbar button, etc.  To
get a file name, you could prepend (FILE) (the problem with the text
targets is that they might be interpreted as "send me the file
contents").  And so on.)

By having a cache of windows we've gotten stuff from, we could (1)
avoid round-trips to get the TARGET list, and (2) keep a record of
TARGETs that give undesired results, etc.
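
A cache of that kind could be as small as a table from owner to the
targets that misbehaved.  A sketch, with all names hypothetical and the
owner identified by whatever key the caller has (a window id in the
example):

```elisp
;; Sketch only: all names here are hypothetical.
(defvar demo-bad-targets (make-hash-table :test 'equal)
  "Map from a selection owner key to targets known to give junk.")

(defun demo-note-bad-target (owner target)
  "Record that OWNER returned an undesired result for TARGET."
  (let ((bad (gethash owner demo-bad-targets)))
    (unless (memq target bad)
      (puthash owner (cons target bad) demo-bad-targets))))

(defun demo-usable-targets (owner order)
  "Return ORDER without the targets recorded as bad for OWNER."
  (let ((bad (gethash owner demo-bad-targets)))
    (delq nil (mapcar (lambda (target)
                        (unless (memq target bad) target))
                      order))))

;; After one bad COMPOUND_TEXT reply from window 42, stop asking it.
(demo-note-bad-target 42 'COMPOUND_TEXT)
(demo-usable-targets 42 '(UTF8_STRING COMPOUND_TEXT STRING))
```

Owners that were never recorded keep the full preference list.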

    Simon> Galeon uses GTK2 and obviously it doesn't produce a good
    Simon> COMPOUND_TEXT.

Depends on what you mean by "good."  This method guarantees that a
font capable of displaying the text is available in the standard X
distribution (ISTR that ISO 8859/5 fonts appeared well after Japanese
fonts in X, and I doubt that X distributes KOI8 fonts at all, although
they're easily available).

    >> The new encoding method using "Non-Standard Character Set
    >> Encodings" of COMPOUND_TEXT makes the cyrillic case much more
    >> complicated.  In some case (perhaps only in KOI8 locale), X
    >> clients recently start to encode cyrillic characters in "ESC %
    >> / 0 ...".  They don't consider the situation that the requester
    >> is running in a different locale.  :-(

I don't understand the problem; as long as the extended segment is
properly formed, you know it's KOI8.  How is this different from TEXT?
The extended segment is much better than the alternative I've seen,
which is sending non-Latin-1 text as STRING!

    Simon> Do you mean the client sends data in a locale-specific
    Simon> charset via COMPOUND_TEXT?  Ouch.

COMPOUND_TEXT _is_ basically locale-specific.  It's a modal ISO 2022
encoding.  The only semantic difference between the usual escape
sequence and the extended segment used for UTF-8 and KOI8 is that
extended segments can be used for not-yet-standardized encodings that
don't have an ISO-registered final byte.  The method is actually
better than that for the standard encodings since it includes a length
parameter.
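
For reference, the layout of an extended segment is ESC % / F M L, then
the encoding name, an STX byte, and the data, where M and L carry the
length of everything after L.  A minimal parsing sketch, assuming that
layout; `demo-parse-extended-segment' is a hypothetical name, and the
encoding name "koi8-r" in the example is an assumption about what a
client would send.

```elisp
;; Sketch only: `demo-parse-extended-segment' is a hypothetical helper.
(defun demo-parse-extended-segment (bytes)
  "Parse a COMPOUND_TEXT extended segment at the start of unibyte BYTES.
Return (ENCODING-NAME DATA REST), or nil if BYTES is malformed."
  (when (and (>= (length bytes) 6)
             (string-prefix-p "\e%/" bytes)
             ;; F is a byte from `0' to `4' describing octets per char.
             (<= ?0 (aref bytes 3) ?4)
             ;; M and L both have the high bit set.
             (>= (aref bytes 4) 128)
             (>= (aref bytes 5) 128))
    (let* ((len (+ (* 128 (- (aref bytes 4) 128))
                   (- (aref bytes 5) 128)))
           (end (+ 6 len))
           ;; The encoding name runs up to the first STX (octal 002).
           (stx (and (<= end (length bytes))
                     (string-match "\002" bytes 6))))
      (when (and stx (< stx end))
        (list (substring bytes 6 stx)
              (substring bytes (1+ stx) end)
              (substring bytes end))))))

;; One KOI8-R byte (octal 324) wrapped in an extended segment; the
;; length is 6 (name) + 1 (STX) + 1 (data) = 8.
(let ((parsed (demo-parse-extended-segment
               "\e%/1\200\210koi8-r\002\324")))
  (decode-coding-string (nth 1 parsed) (intern (car parsed))))
```

Because the length is explicit, the receiver can skip a segment whose
encoding name it does not recognize and carry on with REST.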

    >> Perhaps, we should make Emacs to request UTF8_STRING at first
    >> if the locale is UTF8, and if that request fails, request
    >> COMPOUND_TEXT.

    Simon> This sounds like a good idea to me.

Locales are just plain broken for this purpose.  As Handa-san points
out, you have no idea what locale the partner is running in.  Our own
locale is the best heuristic for Emacs if the partner is unwilling to
talk about it, but really we need clients that implement a proper
negotiation protocol.  I'm regularly running clients in three separate
locales simultaneously on the same host (POSIX, ja_JP.eucJP,
en_US.utf8).  I imagine many Europeans are in a similar situation.
(And I haven't even started to talk about my development/testing
environment!)

I think that we should start by being "selfish", ie, think about what
form of text Emacs is best prepared to use, and request that.  I would
say _always_ request UTF8_STRING unless we have reason to believe the
sender can't do it (eg, previously failed) or our user would prefer
COMPOUND_TEXT (eg, that fraction of Han users).  (I'm thinking in
terms of emacs-unicode, obviously.)

Also, a related topic, I think that we should think carefully about
canonicalizing variant codes (such as "full-width" Latin or Cyrillic
characters).  For example, I'm pretty careful about the aesthetics of
half-width and full-width characters in my Japanese mail, but my
colleagues no longer are (in fact, I once received a mail in which the
4 digits of the year were in three different encodings! JIS X 0201,
JIS X 0208, and ASCII).  When I investigated this curiosity, what I
found is that on most Windows and Mac systems the full- and half-width
variants are visually hard to distinguish, and the JIS Roman and ASCII
characters are the identical glyph with different indices in the Cmap
going to the same CID.

Of course such canonicalization needs to be user-controllable, but I
doubt most users will even notice if we default to canonicalization.
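
As an illustration of what a default-on, user-controllable
canonicalization might look like, here is a minimal sketch that folds
the full-width ASCII variants (U+FF01 to U+FF5E) onto ASCII.  It
assumes an Emacs with a Unicode internal representation; the variable
and function names are hypothetical.  Unicode has no separate
full-width Cyrillic code points, so only the Latin case is shown.

```elisp
;; Sketch only: all names here are hypothetical.
(defvar demo-canonicalize-fullwidth t
  "Non-nil means `demo-canonicalize' folds full-width ASCII variants.")

(defun demo-canonicalize (string)
  "Return STRING with U+FF01..U+FF5E replaced by their ASCII equivalents.
Do nothing if `demo-canonicalize-fullwidth' is nil."
  (if (not demo-canonicalize-fullwidth)
      string
    (concat (mapcar (lambda (ch)
                      ;; The full-width block is offset from ASCII by
                      ;; a constant #xFEE0.
                      (if (<= #xFF01 ch #xFF5E) (- ch #xFEE0) ch))
                    string))))

;; A year typed with mixed full-width and ASCII digits.
(demo-canonicalize "２0０3")
```

Binding the variable to nil around a call turns the folding off, which
is the user-level control suggested above.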

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

* Re: MML charset tag regression
  2003-04-29  5:38                       ` Richard Stallman
@ 2003-05-20 12:47                         ` Kenichi Handa
  2003-05-20 19:42                           ` Jan D.
  2003-05-23 12:03                           ` Richard Stallman
  0 siblings, 2 replies; 49+ messages in thread
From: Kenichi Handa @ 2003-05-20 12:47 UTC (permalink / raw)
  Cc: cloos, jas, emacs-devel, ding

I'm sorry for this late response.

In article <E19ANpQ-0003Pb-00@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:

>     Recently, many gtk clients have started supporting UTF8_STRING
>     without making COMPOUND_TEXT support better.  It may cause
>     no problem between gtk clients because they will request
>     only the type UTF8_STRING.  But it's too shortsighted an
>     approach.  :-(

> Is this an issue I should raise with the GTK developers?  Could they,
> should they, do something to encourage app developers to handle
> COMPOUND_TEXT properly?

Perhaps app developers are just using some GTK function for
X selection handling (I don't know if such a function really
exists).  In that case, improving that function would solve the
problem.

>     Perhaps, we should make Emacs to request UTF8_STRING at
>     first if the locale is UTF8, and if that request fails,
>     request COMPOUND_TEXT.

> If we do this, it should be controlled by a Lisp variable, not by the
> locale.  Perhaps the Lisp variable could default based on the locale.

> Is there a reason not to do this unconditionally, always?

Perhaps, no.

I've just written these changes (not yet installed).  What
do you think?  When I finish writing ChangeLog and do some
more tests, I'll install it.

(1) Modify selection_data_to_lisp_data (in xselect.c) to
    simply return a unibyte string generated from selection
    data while putting text property `foreign-selection'
    (value is a symbol of type).  This property is to
    distinguish it from the return string of
    x_get_local_selection.  This property is also used to
    decode it properly.  With this change, we can completely
    get rid of code-conversion routine from xselect.c.

(2) Modify x-get-selection (in select.el) so that it decodes
    the received data if it has the `foreign-selection'
    property.

(3) New variable x-select-request-type (in x-win.el) which
    is nil (default), a data-type, or a list of data-types.
    As I don't know how to write customization code for
    this kind of data type, it's currently just `defvar'ed.

(4) Modify x-cut-buffer-or-selection-value (in x-win.el) to
    call the following x-selection-value.

(5) New function x-selection-value (in x-win.el), a support
    function for x-cut-buffer-or-selection-value that calls
    x-get-selection according to x-select-request-type.  If
    x-select-request-type is nil, it tries both
    COMPOUND_TEXT and UTF8_STRING, and chooses the better one
    by these heuristics.

;;   (1) If their lengths are different, select the longer one.  This
;;   is because an X client may just cut off unsupported characters.
;;
;;   (2) Otherwise, if the Nth character of CTEXT is an ASCII
;;   character that is different from the Nth character of UTF8,
;;   select UTF8.  This is because an X client may replace unsupported
;;   characters with some ASCII character (typically ` ' or `?') in
;;   CTEXT.
;;
;;   (3) Otherwise, select CTEXT.  This is because legacy charsets are
;;   better for the current Emacs, especially when the selection owner
;;   is also Emacs.

---
Ken'ichi HANDA
handa@m17n.org

*** x-win.el.~1.163.~	Thu Mar 13 15:21:29 2003
--- x-win.el	Tue May 20 21:25:34 2003
***************
*** 2145,2150 ****
--- 2145,2249 ----
      (setq x-last-selected-text-clipboard text))
    )
  
+ (defvar x-select-request-type nil
+   "*Data type request for X selection.
+ The value is nil, one of the following data types, or a list of them:
+   `COMPOUND_TEXT', `UTF8_STRING', `STRING', `TEXT'
+ 
+ If the value is nil, try `COMPOUND_TEXT' and `UTF8_STRING', and
+ use the more appropriate result.  If both fail, try `STRING', and
+ then `TEXT'.
+ 
+ If the value is one of the above symbols, try only the specified
+ type.
+ 
+ If the value is a list of them, try each of them in the specified
+ order until one succeeds.")
+ 
+ ;; Helper function for x-selection-value.  Select UTF8 or CTEXT
+ ;; whichever is more appropriate.  Here, we use these heuristics.
+ ;;
+ ;;   (1) If their lengths are different, select the longer one.  This
+ ;;   is because an X client may just cut off unsupported characters.
+ ;;
+ ;;   (2) Otherwise, if the Nth character of CTEXT is an ASCII
+ ;;   character that is different from the Nth character of UTF8,
+ ;;   select UTF8.  This is because an X client may replace unsupported
+ ;;   characters with some ASCII character (typically ` ' or `?') in
+ ;;   CTEXT.
+ ;;
+ ;;   (3) Otherwise, select CTEXT.  This is because legacy charsets are
+ ;;   better for the current Emacs, especially when the selection owner
+ ;;   is also Emacs.
+ 
+ (defun x-select-utf8-or-ctext (utf8 ctext)
+   (let ((len-utf8 (length utf8))
+ 	(len-ctext (length ctext))
+ 	(selected ctext)
+ 	(i 0)
+ 	char)
+     (if (/= len-utf8 len-ctext)
+ 	(if (> len-utf8 len-ctext) utf8 ctext)
+       (while (< i len-utf8)
+ 	(setq char (aref ctext i))
+ 	(if (and (< char 128) (/= char (aref utf8 i)))
+ 	    (setq selected utf8
+ 		  i len-utf8)
+ 	  (setq i (1+ i))))
+       selected)))
+ 
+ (defun x-selection-value (type)
+   (let (text)
+     (cond ((null x-select-request-type)
+ 	   (let (utf8 ctext utf8-coding)
+ 	     ;; We try both UTF8_STRING and COMPOUND_TEXT, and choose
+ 	     ;; the more appropriate one.  If both fail, try STRING.
+ 
+ 	     ;; At first try UTF8_STRING.
+ 	     (setq utf8 (x-get-selection type 'UTF8_STRING)
+ 		   utf8-coding last-coding-system-used)
+ 	     (if utf8
+ 		 ;; If it is a local selection, choose it.
+ 		 (or (condition-case nil
+ 			 (get-text-property 0 'foreign-selection utf8)
+ 		       (error nil))
+ 		     (setq text utf8)))
+ 	     ;; If not yet decided, try COMPOUND_TEXT.
+ 	     (if (not text)
+ 		 (if (setq ctext (condition-case nil
+ 				     (x-get-selection type 'COMPOUND_TEXT)
+ 				   (error nil)))
+ 		     ;; If UTF8_STRING was also successful, choose the
+ 		     ;; more appropriate one from UTF8 and CTEXT.
+ 		     (if utf8
+ 			 (setq text (x-select-utf8-or-ctext utf8 ctext))
+ 		       ;; Otherwise, choose CTEXT.
+ 		       (setq text ctext))))
+ 	     ;; If not yet decided, try STRING.
+ 	     (or text
+ 		 (setq text (condition-case nil
+ 				(x-get-selection type 'STRING)
+ 			      (error nil))))
+ 	     (if (eq text utf8)
+ 		 (setq last-coding-system-used utf8-coding))))
+ 
+ 	  ((consp x-select-request-type)
+ 	   (let ((tail x-select-request-type))
+ 	     (while (and tail (not text))
+ 	       (condition-case nil
+ 		   (setq text (x-get-selection type (car tail)))
+ 		 (error nil))
+ 	       (setq tail (cdr tail)))))
+ 
+ 	  (t
+ 	   (condition-case nil
+ 	       (setq text (x-get-selection type x-select-request-type))
+ 	     (error nil))))
+ 
+     (if text
+ 	(put-text-property 0 (length text) 'foreign-selection nil text))
+     text))
+       
  ;;; Return the value of the current X selection.
  ;;; Consult the selection, and the cut buffer.  Treat empty strings
  ;;; as if they were unset.
***************
*** 2154,2168 ****
  (defun x-cut-buffer-or-selection-value ()
    (let (clip-text primary-text cut-text)
      (when x-select-enable-clipboard
!       ;; Don't die if x-get-selection signals an error.
!       (if (null clip-text)
! 	  (condition-case c
! 	      (setq clip-text (x-get-selection 'CLIPBOARD 'COMPOUND_TEXT))
! 	    (error nil)))
!       (if (null clip-text)
! 	  (condition-case c
! 	      (setq clip-text (x-get-selection 'CLIPBOARD 'STRING))
! 	    (error nil)))
        (if (string= clip-text "") (setq clip-text nil))
  
        ;; Check the CLIPBOARD selection for 'newness', is it different
--- 2253,2259 ----
  (defun x-cut-buffer-or-selection-value ()
    (let (clip-text primary-text cut-text)
      (when x-select-enable-clipboard
!       (setq clip-text (x-selection-value 'CLIPBOARD))
        (if (string= clip-text "") (setq clip-text nil))
  
        ;; Check the CLIPBOARD selection for 'newness', is it different
***************
*** 2182,2196 ****
  	      (setq x-last-selected-text-clipboard clip-text))))
        )
  
!     ;; Don't die if x-get-selection signals an error.
!     (if (null primary-text)
! 	(condition-case c
! 	    (setq primary-text (x-get-selection 'PRIMARY 'COMPOUND_TEXT))
! 	  (error nil)))
!     (if (null primary-text)
! 	(condition-case c
! 	    (setq primary-text (x-get-selection 'PRIMARY 'STRING))
! 	  (error nil)))
      ;; Check the PRIMARY selection for 'newness', is it different
      ;; from what we remebered them to be last time we did a
      ;; cut/paste operation.
--- 2273,2279 ----
  	      (setq x-last-selected-text-clipboard clip-text))))
        )
  
!     (setq primary-text (x-selection-value 'PRIMARY))
      ;; Check the PRIMARY selection for 'newness', is it different
      ;; from what we remebered them to be last time we did a
      ;; cut/paste operation.
***************
*** 2224,2229 ****
--- 2307,2315 ----
        nil)
       (t
  	    (setq x-last-selected-text-cut cut-text))))
+ 
+     ;; As we have done one selection, clear this now.
+     (setq next-selection-coding-system nil)
  
      ;; At this point we have recorded the current values for the
      ;; selection from clipboard (if we are supposed to) primary,
*** select.el.~1.19.~	Wed Jan 29 22:04:38 2003
--- select.el	Tue May 20 21:37:41 2003
***************
*** 38,44 ****
  TYPE may be `SECONDARY' or `CLIPBOARD', in addition to `PRIMARY'.
  DATA-TYPE is usually `STRING', but can also be one of the symbols
  in `selection-converter-alist', which see."
!   (x-get-selection-internal (or type 'PRIMARY) (or data-type 'STRING)))
  
  (defun x-get-clipboard ()
    "Return text pasted to the clipboard."
--- 38,55 ----
  TYPE may be `SECONDARY' or `CLIPBOARD', in addition to `PRIMARY'.
  DATA-TYPE is usually `STRING', but can also be one of the symbols
  in `selection-converter-alist', which see."
!   (let ((data (x-get-selection-internal (or type 'PRIMARY)
! 					(or data-type 'STRING)))
! 	coding)
!     (when (and data
! 	       (setq data-type (get-text-property 0 'foreign-selection data)))
!       (setq coding (if (eq data-type 'UTF8_STRING)
! 		       'utf-8
! 		     (or next-selection-coding-system
! 			 selection-coding-system))
! 	    data (decode-coding-string data coding))
!       (put-text-property 0 (length data) 'foreign-selection data-type data))
!     data))
  
  (defun x-get-clipboard ()
    "Return text pasted to the clipboard."
*** xselect.c.~1.128.~	Mon Apr  7 11:03:27 2003
--- xselect.c	Tue May 20 11:55:40 2003
***************
*** 29,38 ****
  #include "frame.h"	/* Need this to get the X window of selected_frame */
  #include "blockinput.h"
  #include "buffer.h"
- #include "charset.h"
- #include "coding.h"
  #include "process.h"
- #include "composite.h"
  
  struct prop_location;
  
--- 29,35 ----
***************
*** 114,119 ****
--- 111,118 ----
  /* Coding system for the next communicating with other X clients.  */
  static Lisp_Object Vnext_selection_coding_system;
  
+ static Lisp_Object Qforeign_selection;
+ 
  /* If this is a smaller number than the max-request-size of the display,
     emacs will use INCR selection transfer when the selection is larger
     than this.  The max-request-size is usually around 64k, so if you want
***************
*** 1605,1678 ****
    /* Convert any 8-bit data to a string, for compactness.  */
    else if (format == 8)
      {
!       Lisp_Object str;
!       int require_encoding = 0;
  
!       if (
! #if 1
! 	  1
! #else
! 	  ! NILP (buffer_defaults.enable_multibyte_characters)
! #endif
! 	  )
! 	{
! 	  /* If TYPE is `TEXT' or `COMPOUND_TEXT', we should decode
! 	     DATA to Emacs internal format because DATA may be encoded
! 	     in compound text format.  In addtion, if TYPE is `STRING'
! 	     and DATA contains any 8-bit Latin-1 code, we should also
! 	     decode it.  */
! 	  if (type == dpyinfo->Xatom_TEXT
! 	      || type == dpyinfo->Xatom_COMPOUND_TEXT)
! 	    require_encoding = 1;
! 	  else if (type == XA_STRING)
! 	    {
! 	      int i;
! 	      for (i = 0; i < size; i++)
! 		{
! 		  if (data[i] >= 0x80)
! 		    {
! 		      require_encoding = 1;
! 		      break;
! 		    }
! 		}
! 	    }
! 	}
!       if (!require_encoding)
! 	{
! 	  str = make_unibyte_string ((char *) data, size);
! 	  Vlast_coding_system_used = Qraw_text;
! 	}
        else
! 	{
! 	  int bufsize;
! 	  unsigned char *buf;
! 	  struct coding_system coding;
! 
! 	  if (NILP (Vnext_selection_coding_system))
! 	    Vnext_selection_coding_system = Vselection_coding_system;
! 	  setup_coding_system
! 	    (Fcheck_coding_system(Vnext_selection_coding_system), &coding);
! 	  coding.src_multibyte = 0;
! 	  coding.dst_multibyte = 1;
! 	  Vnext_selection_coding_system = Qnil;
!           coding.mode |= CODING_MODE_LAST_BLOCK;
! 	  /* We explicitely disable composition handling because
! 	     selection data should not contain any composition
! 	     sequence.  */
! 	  coding.composing = COMPOSITION_DISABLED;
! 	  bufsize = decoding_buffer_size (&coding, size);
! 	  buf = (unsigned char *) xmalloc (bufsize);
! 	  decode_coding (&coding, data, buf, size, bufsize);
! 	  str = make_string_from_bytes ((char *) buf,
! 					coding.produced_char, coding.produced);
! 	  xfree (buf);
! 
! 	  if (SYMBOLP (coding.post_read_conversion)
! 	      && !NILP (Ffboundp (coding.post_read_conversion)))
! 	    str = run_pre_post_conversion_on_str (str, &coding, 0);
! 	  Vlast_coding_system_used = coding.symbol;
! 	}
!       compose_chars_in_text (0, SCHARS (str), str);
        return str;
      }
    /* Convert a single atom to a Lisp_Symbol.  Convert a set of atoms to
--- 1604,1622 ----
    /* Convert any 8-bit data to a string, for compactness.  */
    else if (format == 8)
      {
!       Lisp_Object str, lispy_type;
  
!       str = make_unibyte_string ((char *) data, size);
!       /* Indicate that this string is from foreign selection thus the
! 	 caller of x-get-selection-internal has to decode it.  */
!       if (type == dpyinfo->Xatom_COMPOUND_TEXT)
! 	lispy_type = QCOMPOUND_TEXT;
!       else if (type == dpyinfo->Xatom_UTF8_STRING)
! 	lispy_type = QUTF8_STRING;
        else
! 	lispy_type = QSTRING;
!       Fput_text_property (make_number (0), make_number (size),
! 			  Qforeign_selection, lispy_type, str);
        return str;
      }
    /* Convert a single atom to a Lisp_Symbol.  Convert a set of atoms to
***************
*** 2451,2454 ****
--- 2395,2400 ----
    QCUT_BUFFER7 = intern ("CUT_BUFFER7"); staticpro (&QCUT_BUFFER7);
  #endif
  
+   Qforeign_selection = intern ("foreign-selection");
+   staticpro (&Qforeign_selection);
  }



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-20 12:47                         ` Kenichi Handa
@ 2003-05-20 19:42                           ` Jan D.
  2003-05-21 15:31                             ` Richard Stallman
  2003-05-23 12:03                           ` Richard Stallman
  1 sibling, 1 reply; 49+ messages in thread
From: Jan D. @ 2003-05-20 19:42 UTC (permalink / raw)
  Cc: emacs-devel

> In article <E19ANpQ-0003Pb-00@fencepost.gnu.org>, Richard Stallman 
> <rms@gnu.org> writes:
>
>>     Recently, many GTK clients have started supporting UTF8_STRING
>>     without improving their COMPOUND_TEXT support.  This may cause
>>     no problem between GTK clients, because they will request
>>     only the type UTF8_STRING.  But it is too shortsighted an
>>     approach.  :-(
>
>> Is this an issue I should raise with the GTK developers?  Could they,
>> should they, do something to encourage app developers to handle
>> COMPOUND_TEXT properly?
>
> Perhaps app developers are just using some GTK function for
> X selection handling (I don't know whether such a function
> actually exists).  In that case, improving that function would
> solve the problem.

There is a function that converts selection data to a UTF-8 string,
if the format is a known text format.  But mostly widgets take care
of selection handling by themselves (i.e. transparently to an application
that uses GTK).  To get better COMPOUND_TEXT handling, I suspect GTK
as a library must change.

The reason you are seeing more UTF8_STRING is probably not a choice
made by GTK client developers; it is just an effect of porting to
GTK version 2.  UTF8_STRING is not present in GTK 1.x, but is the
preferred format in GTK 2.x.  The reason for this is the standards
work being done by the free desktop people
(see http://www.freedesktop.org/standards/) to improve interoperability
between desktop environments, for example KDE and GNOME.  It seems
UTF8_STRING is the preferred coding.

I am not sure, but I suspect that better COMPOUND_TEXT handling
is a low priority, as the web page above more or less implies that
UTF8_STRING is to replace COMPOUND_TEXT in the future.  Qt (KDE)
also uses UTF8_STRING as a first choice.

	Jan D.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-20 19:42                           ` Jan D.
@ 2003-05-21 15:31                             ` Richard Stallman
  2003-05-21 16:23                               ` Jan D.
  2003-05-24  0:51                               ` Kenichi Handa
  0 siblings, 2 replies; 49+ messages in thread
From: Richard Stallman @ 2003-05-21 15:31 UTC (permalink / raw)
  Cc: handa, emacs-devel, ding, jas

Can you write a self-contained statement addressed to the GTK people
requesting them to provide better COMPOUND_TEXT handling, and
explaining more specifically what we need and why?  We need that,
I think, to make the case to them.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-21 15:31                             ` Richard Stallman
@ 2003-05-21 16:23                               ` Jan D.
  2003-05-22  0:58                                 ` Kenichi Handa
  2003-05-25 16:27                                 ` Dave Love
  2003-05-24  0:51                               ` Kenichi Handa
  1 sibling, 2 replies; 49+ messages in thread
From: Jan D. @ 2003-05-21 16:23 UTC (permalink / raw)
  Cc: emacs-devel, jas, ding, handa

> Can you write a self-contained statement addressed to the GTK people
> requesting them to provide better COMPOUND_TEXT handling, and
> explaining more specifically what we need and why?  We need that,
> I think, to make the case to them.

I do not understand the issues involved in detail, so I do not know what
we need and why.  In my naive reasoning, Emacs would be fine if it
always requested UTF8_STRING first and COMPOUND_TEXT second.  This is
what the free desktop documents seem to recommend.  Or perhaps request
TARGETS first to check for UTF8_STRING, and use that if available and
something else if not (COMPOUND_TEXT, STRING or TEXT in that order?).
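
A minimal sketch of the TARGETS-then-fallback idea above, in plain C
with no Xlib calls.  The function name choose_selection_target and the
representation of target atoms as strings are assumptions made for
illustration; this is not code from Emacs or GTK.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Preference order suggested above: UTF8_STRING first, then
   COMPOUND_TEXT, STRING and TEXT.  Targets are written as plain
   strings here instead of X atoms (an assumption of this sketch).  */
static const char *const preferred_targets[] = {
  "UTF8_STRING", "COMPOUND_TEXT", "STRING", "TEXT"
};

/* Return the most preferred target found in OFFERED (the list the
   selection owner gave in reply to a TARGETS request), or NULL if
   none of the preferred targets is offered.  */
const char *
choose_selection_target (const char *const *offered, size_t n_offered)
{
  size_t n_preferred = sizeof preferred_targets / sizeof preferred_targets[0];
  size_t i, j;

  assert (offered != NULL || n_offered == 0);
  for (i = 0; i < n_preferred; i++)
    for (j = 0; j < n_offered; j++)
      if (strcmp (preferred_targets[i], offered[j]) == 0)
        return preferred_targets[i];
  return NULL;
}
```

The outer loop walks the preference list, so the order in which the
owner lists its targets does not matter.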

But then, I am not using any locales that require more than 8-bit
characters, so I may be off here.

	Jan D.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-21 16:23                               ` Jan D.
@ 2003-05-22  0:58                                 ` Kenichi Handa
  2003-05-22 16:25                                   ` Jan D.
                                                     ` (2 more replies)
  2003-05-25 16:27                                 ` Dave Love
  1 sibling, 3 replies; 49+ messages in thread
From: Kenichi Handa @ 2003-05-22  0:58 UTC (permalink / raw)
  Cc: rms, emacs-devel, jas, ding

In article <8B17870A-8BA8-11D7-8E1F-00039363E640@swipnet.se>, "Jan D." <jan.h.d@swipnet.se> writes:
> I do not understand the issues involved in detail, so I do not know what
> we need and why.  In my naive reasoning, Emacs would be fine if it
> always requested UTF8_STRING first and COMPOUND_TEXT second.  This is
> what the free desktop documents seem to recommend.  Or perhaps request
> TARGETS first to check for UTF8_STRING, and use that if available and
> something else if not (COMPOUND_TEXT, STRING or TEXT in that order?).

The current Emacs still doesn't unify Unicode with the other
legacy charsets (e.g. iso-8859-2, jisx0208, gb2312)
automatically.  So, for instance, if iso-8859-2 characters
arrive at Emacs as UTF8_STRING, they are decoded into the
charset mule-unicode-0100-24ff and treated differently
(e.g. in searching) from characters of the charset
iso-8859-2.
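
A rough sketch of why the two representations fail to match in a
search, and how unification would help.  Everything here is a
simplified assumption for illustration: the struct ichar type, the
charset tags, and the two-entry ISO-8859-2 table are hypothetical, and
real Emacs 21 does not store mule-unicode-0100-24ff characters as a
plain Unicode code point.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical charset tags for this sketch only.  */
enum charset { CS_ISO_8859_2, CS_MULE_UNICODE_0100_24FF };

/* A character as a (charset, code) pair.  For the mule-unicode
   charset, CODE is assumed to be the Unicode code point.  */
struct ichar { enum charset charset; unsigned code; };

/* Comparison without unification: both fields must be identical.  */
bool
ichar_equal_raw (struct ichar a, struct ichar b)
{
  return a.charset == b.charset && a.code == b.code;
}

/* Map a character to Unicode.  Only two ISO-8859-2 codes are
   tabulated; 0 is returned for anything else.  */
unsigned
ichar_to_unicode (struct ichar c)
{
  if (c.charset == CS_MULE_UNICODE_0100_24FF)
    return c.code;
  assert (c.charset == CS_ISO_8859_2);
  switch (c.code)
    {
    case 0xA1: return 0x0104;	/* LATIN CAPITAL LETTER A WITH OGONEK */
    case 0xB1: return 0x0105;	/* LATIN SMALL LETTER A WITH OGONEK */
    default:   return 0;
    }
}

/* Comparison with unification: compare the Unicode values.  */
bool
ichar_equal_unified (struct ichar a, struct ichar b)
{
  unsigned ua = ichar_to_unicode (a), ub = ichar_to_unicode (b);

  return ua != 0 && ua == ub;
}
```

With these definitions, the character {CS_ISO_8859_2, 0xB1} and the
character {CS_MULE_UNICODE_0100_24FF, 0x0105} compare unequal under
ichar_equal_raw but equal under ichar_equal_unified.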

---
Ken'ichi HANDA
handa@m17n.org



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-22  0:58                                 ` Kenichi Handa
@ 2003-05-22 16:25                                   ` Jan D.
  2003-05-23  1:33                                     ` Kenichi Handa
                                                       ` (2 more replies)
  2003-05-23 12:05                                   ` Richard Stallman
  2003-05-25 16:31                                   ` Dave Love
  2 siblings, 3 replies; 49+ messages in thread
From: Jan D. @ 2003-05-22 16:25 UTC (permalink / raw)
  Cc: ding, rms, jas, emacs-devel

> The current Emacs still doesn't unify Unicode and the other
> legacy charsets (e.g. iso-8859-2, jisx0208, gb2312)
> automatically.  So, for instance, if iso-8859-2 characters
> arrive at Emacs with UTF8_STRING, they are decoded into the
> charset mule-unicode-0100-24ff and treated differently
> (e.g. in searching) than the characters of the charset
> iso-8859-2.

Okay, that explains it.  But playing the devil's advocate a bit, does
this not simply point out a problem with Emacs, not GTK?  If UTF8_STRING is
the recommended thing to use, changing GTK would not help much, as there
are other X toolkits out there (Qt, Motif, and so on) that will start
to use UTF8_STRING also (Qt already does).
Isn't this an argument for getting the Unicode Emacs branch released, or
for unifying charsets?

	Jan D.






^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-22 16:25                                   ` Jan D.
@ 2003-05-23  1:33                                     ` Kenichi Handa
  2003-05-23  7:45                                       ` David Kastrup
  2003-05-23 22:48                                     ` Richard Stallman
  2003-05-25 16:32                                     ` Dave Love
  2 siblings, 1 reply; 49+ messages in thread
From: Kenichi Handa @ 2003-05-23  1:33 UTC (permalink / raw)
  Cc: ding, rms, jas, emacs-devel

In article <0F223D16-8C72-11D7-8F50-00039363E640@swipnet.se>, "Jan D." <jan.h.d@swipnet.se> writes:
>>  The current Emacs still doesn't unify Unicode and the other
>>  legacy charsets (e.g. iso-8859-2, jisx0208, gb2312)
>>  automatically.  So, for instance, if iso-8859-2 characters
>>  arrive at Emacs with UTF8_STRING, they are decoded into the
>>  charset mule-unicode-0100-24ff and treated differently
>>  (e.g. in searching) than the characters of the charset
>>  iso-8859-2.

> Okay, that explains it.  But playing the devil's advocate a
> bit, does this not simply point out a problem with Emacs,
> not GTK?

It's surely Emacs' problem that the same iso-8859-2
character is represented in two ways internally.  But
incomplete support of COMPOUND_TEXT is GTK's (or some other
X client's) problem.  As long as a client responds to a request
for COMPOUND_TEXT, it should send the correct data (without
silently cutting off unsupported characters or replacing them
with '?').  Otherwise, it shouldn't respond to that
request.

> If UTF8_STRING is the recommended thing to use,
> changing GTK would not help much, as there are other X
> toolkits out there (Qt, Motif, and so on), that will start
> to use UTF8_STRING also (Qt already does)?  Isn't this an
> argument for getting the Unicode Emacs branch released, or
> unify charsets?

Of course, with Emacs-unicode, there's no such problem, and
I want to release it as soon as possible.

---
Ken'ichi HANDA
handa@m17n.org




^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-23  1:33                                     ` Kenichi Handa
@ 2003-05-23  7:45                                       ` David Kastrup
  0 siblings, 0 replies; 49+ messages in thread
From: David Kastrup @ 2003-05-23  7:45 UTC (permalink / raw)
  Cc: jas

Kenichi Handa <handa@m17n.org> writes:

> In article <0F223D16-8C72-11D7-8F50-00039363E640@swipnet.se>, "Jan D." <jan.h.d@swipnet.se> writes:
> >>  The current Emacs still doesn't unify Unicode and the other
> >>  legacy charsets (e.g. iso-8859-2, jisx0208, gb2312)
> >>  automatically.  So, for instance, if iso-8859-2 characters
> >>  arrive at Emacs with UTF8_STRING, they are decoded into the
> >>  charset mule-unicode-0100-24ff and treated differently
> >>  (e.g. in searching) than the characters of the charset
> >>  iso-8859-2.
> 
> > Okay, that explains it.  But playing the devil's advocate a
> > bit, does this not simply point out a problem with Emacs,
> > not GTK?
> 
> It's surely Emacs' problem that the same iso-8859-2
> character is represented in two ways internally.  But
> incomplete support of COMPOUND_TEXT is GTK's (or some other
> X client's) problem.  As long as a client responds to a request
> for COMPOUND_TEXT, it should send the correct data (without
> silently cutting off unsupported characters or replacing them
> with '?').  Otherwise, it shouldn't respond to that
> request.

Right.  I have found it very annoying to find that I can cut&paste
unicode strings from Emacs to galeon, but get only ? when doing it the
other way round.

> > If UTF8_STRING is the recommended thing to use, changing GTK would
> > not help much, as there are other X toolkits out there (Qt, Motif,
> > and so on), that will start to use UTF8_STRING also (Qt already
> > does)?  Isn't this an argument for getting the Unicode Emacs
> > branch released, or unify charsets?
> 
> Of course, with Emacs-unicode, there's no such problem, and I want
> to release it as soon as possible.

What's the state of it?  Am I right in assuming that we would first
be releasing a full-featured 21.4 (or, if really necessary, another
bug fix 21.4 followed by a full 21.5)?  In that case, probably
another bug fix 21.xx series would have to follow, and one would
probably make something like 22.0 the goal for a Unicode Emacs, with
probably some alpha versions before that?

Sorry to be using the "R" word here.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-20 12:47                         ` Kenichi Handa
  2003-05-20 19:42                           ` Jan D.
@ 2003-05-23 12:03                           ` Richard Stallman
  2003-05-23 12:21                             ` Kenichi Handa
  1 sibling, 1 reply; 49+ messages in thread
From: Richard Stallman @ 2003-05-23 12:03 UTC (permalink / raw)
  Cc: cloos, jas, emacs-devel, ding

    (1) Modify selection_data_to_lisp_data (in xselect.c) to
	simply return a unibyte string generated from selection
	data while putting text property `foreign-selection'
	(value is a symbol of type).  This property is to
	distinguish it from the return string of
	x_get_local_selection.  This property is also used to
	decode it properly.

Would it work to return a cons cell (STRING . CODING-SYSTEM)?
That is a little cleaner, I think, than using a text property.



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-22  0:58                                 ` Kenichi Handa
  2003-05-22 16:25                                   ` Jan D.
@ 2003-05-23 12:05                                   ` Richard Stallman
  2003-05-25 16:31                                   ` Dave Love
  2 siblings, 0 replies; 49+ messages in thread
From: Richard Stallman @ 2003-05-23 12:05 UTC (permalink / raw)
  Cc: jan.h.d, emacs-devel, jas, ding

    The current Emacs still doesn't unify Unicode and the other
    legacy charsets (e.g. iso-8859-2, jisx0208, gb2312)
    automatically.  So, for instance, if iso-8859-2 characters
    arrive at Emacs with UTF8_STRING, they are decoded into the
    charset mule-unicode-0100-24ff and treated differently
    (e.g. in searching) than the characters of the charset
    iso-8859-2.

Can you draw up something for us to send to the GTK people
to request a specific feature that we need?



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-23 12:03                           ` Richard Stallman
@ 2003-05-23 12:21                             ` Kenichi Handa
  2003-05-24 23:18                               ` Richard Stallman
  0 siblings, 1 reply; 49+ messages in thread
From: Kenichi Handa @ 2003-05-23 12:21 UTC (permalink / raw)
  Cc: cloos, jas, emacs-devel, ding

In article <E19JBGV-0006Cz-Cv@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:
>     (1) Modify selection_data_to_lisp_data (in xselect.c) to
> 	simply return a unibyte string generated from selection
> 	data while putting text property `foreign-selection'
> 	(value is a symbol of type).  This property is to
> 	distinguish it from the return string of
> 	x_get_local_selection.  This property is also used to
> 	decode it properly.

> Would it work to return a cons cell (STRING . CODING-SYSTEM)?
> That is a little cleaner, I think, than using a text property.

It works if (STRING . RETURN-TYPE) is returned.  But can we
also change x-get-selection to return that cons?  In my
code, x-selection-value (in x-win.el) calls x-get-selection
and checks whether the returned string is from the local
selection by the existence of that text property.
x-selection-value has to know that in order to suppress another
try with a different request type if the string is from the local
selection.

If we can't change x-get-selection, x-get-selection has to
add that text property anyway.  In that case, I think it is
better that x-get-selection-internal itself returns a string
with that text property instead of a cons.

---
Ken'ichi HANDA
handa@m17n.org

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-22 16:25                                   ` Jan D.
  2003-05-23  1:33                                     ` Kenichi Handa
@ 2003-05-23 22:48                                     ` Richard Stallman
  2003-05-23 23:41                                       ` Jan D.
  2003-05-25 16:38                                       ` Dave Love
  2003-05-25 16:32                                     ` Dave Love
  2 siblings, 2 replies; 49+ messages in thread
From: Richard Stallman @ 2003-05-23 22:48 UTC (permalink / raw)
  Cc: handa, ding, jas, emacs-devel

    Okay, that explains it.  But playing the devil's advocate a bit, does
    this not simply point out a problem with Emacs, not GTK?

It would not be a "problem" if GTK handled this better.

Indeed, people want to make Emacs use Unicode.  There are benefits to
be had.  And Handa is working on it--though it won't be done soon.
But it is unfair to say that not being based on Unicode is a
"problem".




^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-23 22:48                                     ` Richard Stallman
@ 2003-05-23 23:41                                       ` Jan D.
  2003-05-24  0:31                                         ` Miles Bader
  2003-05-25 18:01                                         ` Richard Stallman
  2003-05-25 16:38                                       ` Dave Love
  1 sibling, 2 replies; 49+ messages in thread
From: Jan D. @ 2003-05-23 23:41 UTC (permalink / raw)
  Cc: emacs-devel, jas, ding, handa


On Saturday, 24 May 2003 at 00:48, Richard Stallman wrote:

>     Okay, that explains it.  But playing the devil's advocate a bit,
>     does this not simply point out a problem with Emacs, not GTK?
>
> It would not be a "problem" if GTK handled this better.

There is a bug in GTK if it sends ? for unsupported characters,
that should be fixed IMHO.

> But it is unfair to say that not being based on Unicode is a
> "problem".

If I start a GTK application in a directory with a name in a non-Unicode
charset and then bring up the GTK file selection widget, the application
will crash.  GTK considers this an error by the user for not having file
and directory names in Unicode.  I think this situation is similar.

	Jan D.




^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-23 23:41                                       ` Jan D.
@ 2003-05-24  0:31                                         ` Miles Bader
  2003-05-25 16:40                                           ` Dave Love
  2003-05-25 18:01                                         ` Richard Stallman
  1 sibling, 1 reply; 49+ messages in thread
From: Miles Bader @ 2003-05-24  0:31 UTC (permalink / raw)
  Cc: emacs-devel

On Sat, May 24, 2003 at 01:41:10AM +0200, Jan D. wrote:
> >But it is unfair to say that not being based on Unicode is a
> >"problem".
> 
> If I start a GTK application in a directory with a name in a non-Unicode
> charset and then bring up the GTK file selection widget, the application
> will crash.  GTK considers this an error by the user for not having file
> and directory names in Unicode.  I think this situation is similar.

I assume GTK's returning `?' in compound-text, though highly annoying, is
just laziness and/or cluelessness, but _this_ attitude is completely
braindead... Isn't GTK's big goal these days to be user-centric?!?

-Miles
-- 
"1971 pickup truck; will trade for guns"

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-21 15:31                             ` Richard Stallman
  2003-05-21 16:23                               ` Jan D.
@ 2003-05-24  0:51                               ` Kenichi Handa
  1 sibling, 0 replies; 49+ messages in thread
From: Kenichi Handa @ 2003-05-24  0:51 UTC (permalink / raw)
  Cc: jan.h.d, emacs-devel, ding, jas

In article <E19IVZG-0005PS-La@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:
> Can you write a self-contained statement addressed to the GTK people
> requesting them to provide better COMPOUND_TEXT handling, and
> explaining more specifically what we need and why?  We need that,
> I think, to make the case to them.

How about this?  Please note that I don't know whether this is the
responsibility of GTK, of the underlying GDK, or of each
client program.

Could you send it (after polishing the English) to the proper
person/group?

This is a strong request.

When a selection owner responds to a request of type
COMPOUND_TEXT and the selection data (text) contains a
character that is not encodable in COMPOUND_TEXT, the owner
should not replace such a character with '?' nor strip the
character silently.  Instead, the owner should send back
the XSelectionEvent with the `property' member set to None.
That way, the requester can request a different type, e.g.
UTF8_STRING or TEXT.  Otherwise, the requester has no way to
know whether the received data is correct or not.

This is a less strong request.

When a selection owner has to convert Unicode characters to
some legacy charset (ISO-8859-X, JISX0208, GB2312,
etc.) upon a request of type COMPOUND_TEXT and there are
multiple choices (e.g. some Han character to JISX0208 or to
GB2312), it is better that the owner give the highest priority
to a charset supported by the current locale.
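
The strong request above ("refuse rather than substitute") can be
sketched as follows.  This is only an illustration under stated
assumptions: ISO-8859-1 stands in for "a charset encodable in
COMPOUND_TEXT", and the function name encode_latin1_or_refuse is
hypothetical.  In a real selection owner, a return value of 0 would
correspond to answering with the `property' member set to None.

```c
#include <assert.h>
#include <stddef.h>

/* Try to encode the N Unicode code points in SRC as ISO-8859-1
   bytes in DST, which must have room for at least N bytes.

   Return 1 on success.  Return 0 as soon as a code point cannot be
   encoded; the contents of DST are then unspecified and the caller
   should refuse the conversion instead of sending lossy data.  */
int
encode_latin1_or_refuse (const unsigned long *src, size_t n,
			 unsigned char *dst)
{
  size_t i;

  assert (src != NULL && dst != NULL);
  for (i = 0; i < n; i++)
    {
      if (src[i] > 0xFF)
	return 0;		/* No '?' substitution, no silent drop.  */
      dst[i] = (unsigned char) src[i];
    }
  return 1;
}
```

The point of the design is that the failure is visible to the
requester, which can then retry with another target such as
UTF8_STRING.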

---
Ken'ichi HANDA
handa@m17n.org

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-23 12:21                             ` Kenichi Handa
@ 2003-05-24 23:18                               ` Richard Stallman
  2003-06-13  8:37                                 ` Kenichi Handa
  0 siblings, 1 reply; 49+ messages in thread
From: Richard Stallman @ 2003-05-24 23:18 UTC (permalink / raw)
  Cc: cloos, jas, emacs-devel, ding

    It works if (STRING . RETURN-TYPE) is returned.  But, can we
    also change x-get-selection to return that cons?

That would be an incompatible change, but it is in a low-level function
that I think is not used much by users.  So I think it is ok to make an
incompatible change here.

I am not certain that approach is better, but if you agree it is better,
please do it.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-21 16:23                               ` Jan D.
  2003-05-22  0:58                                 ` Kenichi Handa
@ 2003-05-25 16:27                                 ` Dave Love
  1 sibling, 0 replies; 49+ messages in thread
From: Dave Love @ 2003-05-25 16:27 UTC (permalink / raw)
  Cc: emacs-devel, jas, ding, handa

[I don't have the start of this thread and I don't know what it has to
do with the subject, which I may be able to help with.]

"Jan D." <jan.h.d@swipnet.se> writes:

> In my naive reasoning, Emacs would be fine if it requested
> UTF8_STRING first and COMPOUND_TEXT second always.

That's not so useful for CJK text, at least.

> But then, I am not using any locales that require more than 8-bit
> characters

The locale used shouldn't be relevant (except for choosing encoding
defaults).  That's part of the problem.  The selection clients and the
server may be in different locales with different X implementations.
I often have a mixture of four or five on my desktop.



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-22  0:58                                 ` Kenichi Handa
  2003-05-22 16:25                                   ` Jan D.
  2003-05-23 12:05                                   ` Richard Stallman
@ 2003-05-25 16:31                                   ` Dave Love
  2003-05-30 12:03                                     ` Kenichi Handa
  2 siblings, 1 reply; 49+ messages in thread
From: Dave Love @ 2003-05-25 16:31 UTC (permalink / raw)
  Cc: rms, emacs-devel, jas, ding

Kenichi Handa <handa@m17n.org> writes:

> So, for instance, if iso-8859-2 characters
> arrive at Emacs with UTF8_STRING, they are decoded into the
> charset mule-unicode-0100-24ff and treated differently
> (e.g. in searching) than the characters of the charset
> iso-8859-2.

That's actually customizable if it's a real problem.



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-22 16:25                                   ` Jan D.
  2003-05-23  1:33                                     ` Kenichi Handa
  2003-05-23 22:48                                     ` Richard Stallman
@ 2003-05-25 16:32                                     ` Dave Love
  2003-05-25 19:14                                       ` Jan D.
  2 siblings, 1 reply; 49+ messages in thread
From: Dave Love @ 2003-05-25 16:32 UTC (permalink / raw)
  Cc: ding, rms, jas, emacs-devel

"Jan D." <jan.h.d@swipnet.se> writes:

> If UTF8_STRING is the recommended thing to use,

Recommended by whom, though?  As far as I know, it's registered for X,
but that's all.  Does anything other than recent XFree86 releases even
support it?

> Isn't this an argument for getting the Unicode Emacs branch released, or
> unify charsets?

I'm not convinced there's relevant unification that you can't do now.



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-23 22:48                                     ` Richard Stallman
  2003-05-23 23:41                                       ` Jan D.
@ 2003-05-25 16:38                                       ` Dave Love
  2003-05-25 17:25                                         ` Eli Zaretskii
  2003-05-26 13:49                                         ` Richard Stallman
  1 sibling, 2 replies; 49+ messages in thread
From: Dave Love @ 2003-05-25 16:38 UTC (permalink / raw)
  Cc: handa, ding, jas, emacs-devel

Richard Stallman <rms@gnu.org> writes:

> Indeed, people want to make Emacs use Unicode.

Emacs can/does use Unicode, even if it doesn't have complete support
(what does?).  This is largely orthogonal to the internal encoding, as
I keep trying to point out.  That's a red herring concerning the
handling of X selections.

> But it is unfair to say that not being based on Unicode is a
> "problem".

Indeed.  However, the Unicode branch doesn't actually deal with
compound text properly, because it doesn't do extended segments, and
the released version is at best pretty confused about that and I think
not correct.  (Extended segments are not an extension to compound
text, they're part of the specification, contrary to what the current
code says.)  Of course, something in XFree86 and/or gtk disobeys the
CTEXT spec regarding them anyhow, which is another thing that should
be fixed.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-24  0:31                                         ` Miles Bader
@ 2003-05-25 16:40                                           ` Dave Love
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Love @ 2003-05-25 16:40 UTC (permalink / raw)
  Cc: rms, emacs-devel, jas, handa, ding

Miles Bader <miles@gnu.org> writes:

> I assume GTK's returning `?' in compound-text, though highly annoying, is
> just laziness and/or cluelessness, but _this_ attitude is completely
> braindead... Isn't GTK's big goal these days to be user-centric?!?

Indeed.  It's a pity about utf-8 fundamentalism.



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-25 16:38                                       ` Dave Love
@ 2003-05-25 17:25                                         ` Eli Zaretskii
  2003-05-30  8:39                                           ` Kenichi Handa
  2003-05-30  9:23                                           ` Dave Love
  2003-05-26 13:49                                         ` Richard Stallman
  1 sibling, 2 replies; 49+ messages in thread
From: Eli Zaretskii @ 2003-05-25 17:25 UTC (permalink / raw)
  Cc: jas

> From: Dave Love <d.love@dl.ac.uk>
> Date: Sun, 25 May 2003 17:38:55 +0100
> 
>                        the Unicode branch doesn't actually deal with
> compound text properly, because it doesn't do extended segments, and
> the released version is at best pretty confused about that and I think
> not correct.  (Extended segments are not an extension to compound
> text, they're part of the specification, contrary to what the current
> code says.)

The support for extended segments was implemented as an extension of
ctext to avoid a thorough rewrite of the ctext en/decoder.  As you
know, the ctext encoder and decoder are variants of the iso-2022
en/decoder and are handled by the same code (in C).  At the time,
Handa-san recommended not to touch the iso-2022 code, saying that the
code was tricky and hard to maintain, and that we could inadvertently
break something important in the process.

The general idea of the current implementation (using post-read and
pre-write conversions) was also suggested by Handa-san.  As you might
expect, I gratefully accepted his expert opinions and suggestions.

That said, I personally won't object if someone sets about rewriting
coding.c to have extended segments supported natively by the iso-2022
code.  Please feel free.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-23 23:41                                       ` Jan D.
  2003-05-24  0:31                                         ` Miles Bader
@ 2003-05-25 18:01                                         ` Richard Stallman
  1 sibling, 0 replies; 49+ messages in thread
From: Richard Stallman @ 2003-05-25 18:01 UTC (permalink / raw)
  Cc: emacs-devel, jas, ding, handa

    If I start a GTK application in a directory with a name in a non-Unicode
    charset and then bring up the GTK file selection widget, the application
    will crash.  GTK considers this an error by the user for not having file
    and directory names in Unicode.  I think this situation is similar.

If that is true, it is rather unfriendly on the part of GTK.



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-25 16:32                                     ` Dave Love
@ 2003-05-25 19:14                                       ` Jan D.
  2003-05-30  9:23                                         ` Dave Love
  0 siblings, 1 reply; 49+ messages in thread
From: Jan D. @ 2003-05-25 19:14 UTC (permalink / raw)
  Cc: emacs-devel, rms, ding, jas

> "Jan D." <jan.h.d@swipnet.se> writes:
>
>> If UTF8_STRING is the recommended thing to use,
>
> Recommended by who, though?  As far as I know, it's registered for X,
> but that's all.  Does anything other than recent Xfree86 releases even
> support it?

The free desktop specifications (www.freedesktop.org).  The X server
does not have to support a new selection type.  It is all transparent
to the server, only the clients need to interoperate.  The core Xlib
does not have to support it either.  Clients can freely invent new
selection types without any modification to X servers or their
libraries.

I guess things like xterm in XFree86 are modified to support
UTF8_STRING, and that is what is meant by "support".

	Jan D.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-25 16:38                                       ` Dave Love
  2003-05-25 17:25                                         ` Eli Zaretskii
@ 2003-05-26 13:49                                         ` Richard Stallman
  2003-05-30  9:28                                           ` Dave Love
  1 sibling, 1 reply; 49+ messages in thread
From: Richard Stallman @ 2003-05-26 13:49 UTC (permalink / raw)
  Cc: handa, ding, jas, emacs-devel

      Of course, something in XFree86 and/or gtk disobeys the
    CTEXT spec regarding them anyhow, which is another thing that should
    be fixed.

If someone gives me a suitable explanation of what needs to be fixed
and in which program, I can try asking its maintainers to fix it.



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-25 17:25                                         ` Eli Zaretskii
@ 2003-05-30  8:39                                           ` Kenichi Handa
  2003-05-30  9:23                                           ` Dave Love
  1 sibling, 0 replies; 49+ messages in thread
From: Kenichi Handa @ 2003-05-30  8:39 UTC (permalink / raw)
  Cc: d.love, jas, ding, emacs-devel

In article <2950-Sun25May2003202510+0300-eliz@elta.co.il>, "Eli Zaretskii" <eliz@elta.co.il> writes:
>>  From: Dave Love <d.love@dl.ac.uk> Date: Sun, 25 May 2003
>> 17:38:55 +0100
>>  
>>                         the Unicode branch doesn't
>> actually deal with compound text properly, because it
>> doesn't do extended segments, and the released version is
>> at best pretty confused about that and I think not
>> correct.  (Extended segments are not an extension to
>> compound text, they're part of the specification,
>> contrary to what the current code says.)

> The support for extended segments was implemented as an
> extension of ctext to avoid a thorough rewrite of the
> ctext en/decoder.  As you know, the ctext encoder and
> decoder are variants of the iso-2022 en/decoder and are
> handled by the same code (in C).  At the time, Handa-san
> recommended not to touch the iso-2022 code, saying that
> the code was tricky and hard to maintain, and that we
> could inadvertently break something important in the
> process.

I've found that we must at least modify the iso-2022 decoder so
that it retains ctext extended segments.   Otherwise, we
can't handle this kind of sequence:

  ESC $ A --GB-SEQ-- ESC % / --EX-SEGMENT-- --GB-SEQ-- ESC ( B

Please note that the first ESC $ A must also take effect on
the second GB-SEQ.

So, I've just installed a change in coding.c, in addition to
changes in mule.el that make use of the coding.c change and
rename variables properly (ICCCM is not relevant to the spec
of CTEXT).

I've also installed a similar change in the unicode branch.

---
Ken'ichi HANDA
handa@m17n.org
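
The sequence in the message above can be illustrated with a minimal
sketch.  This is not the coding.c change; it is a small Python model,
assuming well-formed input and the extended-segment layout from the
CTEXT spec (ESC % / F M L, then an encoding name, STX, and the data,
with the length given by (M - 128) * 128 + (L - 128)).  The function
name split_ctext and the tuple layout are hypothetical.

```python
ESC, STX = 0x1B, 0x02

def split_ctext(data: bytes):
    """Split DATA into ("text", g0, bytes) and ("extended", name, bytes) pieces."""
    pieces = []
    g0 = b"(B"            # assumption: ASCII is designated to G0 at the start
    run = bytearray()     # text bytes collected under the current designation
    i = 0
    while i < len(data):
        is_esc = data[i] == ESC
        if is_esc and data[i + 1:i + 3] == b"%/" and i + 6 <= len(data):
            # Extended segment: ESC % / F M L name STX payload.
            if run:
                pieces.append(("text", g0, bytes(run)))
                run = bytearray()
            m, l = data[i + 4], data[i + 5]
            length = (m - 128) * 128 + (l - 128)  # counts name, STX and payload
            body = data[i + 6:i + 6 + length]
            name, _, payload = body.partition(bytes([STX]))
            pieces.append(("extended", name, payload))
            i += 6 + length                       # g0 is deliberately left alone
        elif is_esc and (data[i + 1:i + 3] in (b"$@", b"$A", b"$B")
                         or data[i + 1:i + 2] == b"("):
            # A G0 designation ends the current run and changes the state.
            if run:
                pieces.append(("text", g0, bytes(run)))
                run = bytearray()
            g0 = data[i + 1:i + 3]
            i += 3
        else:
            run.append(data[i])
            i += 1
    if run:
        pieces.append(("text", g0, bytes(run)))
    return pieces
```

The point of the sketch is that the extended segment is skipped by its
declared length and reported as its own piece, while the G0 state set
by the earlier ESC $ A survives it, so the second GB-SEQ is still
tagged with the GB designation.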



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-25 17:25                                         ` Eli Zaretskii
  2003-05-30  8:39                                           ` Kenichi Handa
@ 2003-05-30  9:23                                           ` Dave Love
  2003-05-30 11:36                                             ` Kenichi Handa
  2003-06-01 15:40                                             ` Eli Zaretskii
  1 sibling, 2 replies; 49+ messages in thread
From: Dave Love @ 2003-05-30  9:23 UTC (permalink / raw)
  Cc: ding, jas, emacs-devel

"Eli Zaretskii" <eliz@elta.co.il> writes:

> The support for extended segments was implemented as an extension of
> ctext to avoid a thorough rewrite of the ctext en/decoder.  As you
> know, the ctext encoder and decoder are variants of the iso-2022
> en/decoder and are handled by the same code (in C).

I don't see what this has to do with what I said.  The ctext coding
system needs to support extended segments because they are part of the
specification, not an extension of it as the code either says or
implies.  What was implemented is a different coding system, not an
extended version of ctext.  Also, the relevant specification is not
ICCCM, it's CTEXT, and it seems difficult to argue the case for other
people implementing that spec properly if it looks as though the Emacs
version is in ignorance of it.

> The general idea of the current implementation (using post-read and
> pre-write conversions) was also suggested by Handa-san.

I don't know what that's meant to imply, but I doubt it's fair to
blame him for problems with it.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-25 19:14                                       ` Jan D.
@ 2003-05-30  9:23                                         ` Dave Love
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Love @ 2003-05-30  9:23 UTC (permalink / raw)
  Cc: emacs-devel, rms, ding, jas

"Jan D." <jan.h.d@swipnet.se> writes:

> The X server does not have to support a new selection type.

Sure.

> It is all transparent to the server, only the clients need to
> interoperate.

But presumably most clients in the X11R6.4 world won't know about
UTF8_STRING.

> The core Xlib does not have to support it either.

Thanks.  I thought the support for it was in Xlib.



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-26 13:49                                         ` Richard Stallman
@ 2003-05-30  9:28                                           ` Dave Love
  0 siblings, 0 replies; 49+ messages in thread
From: Dave Love @ 2003-05-30  9:28 UTC (permalink / raw)
  Cc: handa, ding, jas, emacs-devel

Richard Stallman <rms@gnu.org> writes:

>       Of course, something in XFree86 and/or gtk disobeys the
>     CTEXT spec regarding them anyhow, which is another thing that should
>     be fixed.
>
> If someone gives me a suitable explanation of what needs to be fixed
> and in which program, I can try asking its maintainers to fix it.

I don't know exactly what and where, and I doubt it will get changed
as it's already well-known, I think.  However:  Extended segments get
used for standard charsets like Latin-9, contrary to the CTEXT spec:

  Extended segments are not to be used for any character set encoding that can
  be constructed from a GL/GR pair of approved standard encodings. For
  example, it is incorrect to use an extended segment for any of the ISO 8859
  family of encodings.

[The CTEXT spec implies that Emacs is wrong _not_ to use extended
segments for private charsets, but the text of the spec is rather
ambiguous.  I couldn't locate its author to check its intent.]
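
The rule quoted above can be sketched as an encoder-side check.  This
is a minimal Python illustration, not what Emacs or XFree86 does: the
function name make_extended_segment is hypothetical, and matching on
names that begin with "iso8859" is an assumption about how such
encodings are labelled.  The header layout (ESC % / F M L, with F from
"0" to "4" and the length split over two 7-bit octets) follows the
CTEXT spec.

```python
ESC, STX = 0x1B, 0x02

def make_extended_segment(name: bytes, payload: bytes,
                          octets_per_char: int = 0) -> bytes:
    """Wrap PAYLOAD in a CTEXT extended segment labelled NAME.

    OCTETS_PER_CHAR is 0 for a variable-width encoding, else 1 to 4.
    """
    # Assumption: ISO 8859 encodings are recognizable by name alone.
    if name.lower().replace(b"-", b"").startswith(b"iso8859"):
        raise ValueError("ISO 8859 must use standard designations, "
                         "not an extended segment")
    if not 0 <= octets_per_char <= 4:
        raise ValueError("octets_per_char must be between 0 and 4")
    body = name + bytes([STX]) + payload
    if len(body) > 0x3FFF:                  # two 7-bit length octets
        raise ValueError("segment too long")
    m = 0x80 + (len(body) >> 7)             # high 7 bits of the length
    l = 0x80 + (len(body) & 0x7F)           # low 7 bits of the length
    return bytes([ESC, 0x25, 0x2F, 0x30 + octets_per_char, m, l]) + body
```

A caller that hits the ValueError for an ISO 8859 name would fall back
to the ordinary GL/GR designation for that charset, which is what the
spec text asks for.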

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-30  9:23                                           ` Dave Love
@ 2003-05-30 11:36                                             ` Kenichi Handa
  2003-06-04 22:01                                               ` Dave Love
  2003-06-01 15:40                                             ` Eli Zaretskii
  1 sibling, 1 reply; 49+ messages in thread
From: Kenichi Handa @ 2003-05-30 11:36 UTC (permalink / raw)
  Cc: eliz, jas, ding, emacs-devel

In article <rzq4r3cdbzq.fsf@albion.dl.ac.uk>, Dave Love <d.love@dl.ac.uk> writes:

> "Eli Zaretskii" <eliz@elta.co.il> writes:
>>  The support for extended segments was implemented as an
>> extension of ctext to avoid a thorough rewrite of the
>> ctext en/decoder.  As you know, the ctext encoder and
>> decoder are variants of the iso-2022 en/decoder and are
>> handled by the same code (in C).

> I don't see what this has to do with what I said.  The
> ctext coding system needs to support extended segments
> because they are part of the specification, 

The ctext coding system at least should not treat extended
segments as a part of "Standard Character Set Encodings".
So, I committed a change to coding.c.  But, ctext itself
doesn't have to support it, i.e., decode it as the sender's
intention.  It's impossible to know about all possible
encoding names that will be used in the extended segment.

> not an extension of it as the code either says or implies.
> What was implemented is a different coding system, not an
> extended version of ctext.

Surely it's not.  ctext and compound-text-with-extensions
encode text differently.  But, I don't think
compound-text-with-extensions implies an extended version of
ctext.  The "-with-extensions" part of the name just means
"that uses extended segments".  Is it a problem?

> Also, the relevant specification is not ICCCM, it's CTEXT,

Yes.  I fixed the comment and the variable.  I should have
noticed it from the first.

>>  The general idea of the current implementation (using
>> post-read and pre-write conversions) was also suggested
>> by Handa-san.

> I don't know what that's meant to imply, but I doubt it's
> fair to blame him for problems with it.

I don't feel like I'm blamed by anyone, am I too unblushing? :-p
And, I appreciate Eli's original work because it saved my
time when I was extremely busy.   I eventually fixed the
code, but even for that, the existence of the original work
saved lots of my time.

By the way, I noticed this change of yours.

2002-09-11  Dave Love  <fx@gnu.org>

	* international/mule.el (non-standard-designations-alist)
	(ctext-pre-write-conversion): Don't generate invalid extended
	segments for iso8859.

I agree with this change.  I remember we discussed it.  I'm
sorry that I didn't react to it myself.  I've just fixed
emacs-unicode in the same way.  If one really wants to encode
iso-8859-X by using an extended segment, he can modify
non-standard-designations-alist (now the name is
changed to ctext-non-standard-designations-alist).

---
Ken'ichi HANDA
handa@m17n.org

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-25 16:31                                   ` Dave Love
@ 2003-05-30 12:03                                     ` Kenichi Handa
  2003-06-04 21:52                                       ` Dave Love
  0 siblings, 1 reply; 49+ messages in thread
From: Kenichi Handa @ 2003-05-30 12:03 UTC (permalink / raw)
  Cc: rms, emacs-devel, jas, ding

In article <rzqbrxrj8e0.fsf@albion.dl.ac.uk>, Dave Love <d.love@dl.ac.uk> writes:

> Kenichi Handa <handa@m17n.org> writes:
>>  So, for instance, if iso-8859-2 characters arrive at
>> Emacs with UTF8_STRING, they are decoded into the charset
>> mule-unicode-0100-24ff and treated differently (e.g. in
>> searching) than the characters of the charset iso-8859-2.

> That's actually customizable if it's a real problem.

Currently, even if we customize utf-fragment-on-decoding to
t, iso-8859-2 chars encoded in utf-8 can't be decoded into
latin-iso8859-2 charset because utf-fragmentation-table
contains only Greek and Cyrillic chars.

---
Ken'ichi HANDA
handa@m17n.org




^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-30  9:23                                           ` Dave Love
  2003-05-30 11:36                                             ` Kenichi Handa
@ 2003-06-01 15:40                                             ` Eli Zaretskii
  2003-06-04 22:04                                               ` Dave Love
  1 sibling, 1 reply; 49+ messages in thread
From: Eli Zaretskii @ 2003-06-01 15:40 UTC (permalink / raw)
  Cc: ding, jas, emacs-devel

> From: Dave Love <d.love@dl.ac.uk>
> Date: Fri, 30 May 2003 10:23:21 +0100
> 
> "Eli Zaretskii" <eliz@elta.co.il> writes:
> 
> > The support for extended segments was implemented as an extension of
> > ctext to avoid a thorough rewrite of the ctext en/decoder.  As you
> > know, the ctext encoder and decoder are variants of the iso-2022
> > en/decoder and are handled by the same code (in C).
> 
> I don't see what this has to do with what I said.

It was intended to explain why the code was built on top of the
existing ctext en/decoder.  I'm sorry it wasn't helpful, but given the
attitude, perhaps I shouldn't be surprised.



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-30 12:03                                     ` Kenichi Handa
@ 2003-06-04 21:52                                       ` Dave Love
  2003-06-05  1:36                                         ` Kenichi Handa
  0 siblings, 1 reply; 49+ messages in thread
From: Dave Love @ 2003-06-04 21:52 UTC (permalink / raw)
  Cc: rms, emacs-devel, jas, ding

Kenichi Handa <handa@m17n.org> writes:

> Currently, even if we customize utf-fragment-on-decoding to
> t, iso-8859-2 chars encoded in utf-8 can't be decoded into
> latin-iso8859-2 charset because utf-fragmentation-table
> contains only Greek and Cyrillic chars.

But you can add what you like to it; I didn't mean you could currently
use Custom to do it.  One could make a case for the table being
modified according to the locale, for instance.



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-30 11:36                                             ` Kenichi Handa
@ 2003-06-04 22:01                                               ` Dave Love
  2003-06-05  1:16                                                 ` Kenichi Handa
  2003-06-11 12:33                                                 ` Stephen J. Turnbull
  0 siblings, 2 replies; 49+ messages in thread
From: Dave Love @ 2003-06-04 22:01 UTC (permalink / raw)
  Cc: eliz, jas, ding, emacs-devel

Kenichi Handa <handa@m17n.org> writes:

> But, ctext itself doesn't have to support it, i.e., decode it as the
> sender's intention.

But then you might as well ignore extended segments entirely, and I
assume it must decode it if the name for the segment is registered.
However, the CTEXT spec says that you must use extended segments
for private charsets.

> It's impossible to know about all possible
> encoding names that will be used in the extended segment.

Sure.  I was holding off changes in this area until I convinced myself
what is the best way to do the heuristic conversion between external
charset names and Emacs names.  (Sorry, I could have saved you the
work.)  At least you have a chance of interpreting the names, but you
can't know anything about private charset definitions, even if they
were allowed.  Extended segment names are supposed to be registered
and follow font encoding names, of course.

> Surely it's not.  ctext and compound-text-with-extensions
> encode text differently.  But, I don't think
> compound-text-with-extensions implies an extended version of
> ctext.

It does to me, and that was clearly intended.  It has been changed
recently, but in my Emacs it says:

x -- compound-text-with-extensions (alias: x-ctext-with-extensions ctext-with-extensions)
  Compound text encoding with ICCCM Extended Segment extensions.

and the NEWS entry says only some versions of X use extended segments.

Giving the impression of not following the CTEXT spec can't help with
trying to persuade someone else to fix their problems, as I hope you
can do.

Anyhow the point is that whatever's called compound-text should deal
with extended segments.

> 2002-09-11  Dave Love  <fx@gnu.org>
>
> 	* international/mule.el (non-standard-designations-alist)
> 	(ctext-pre-write-conversion): Don't generate invalid extended
> 	segments for iso8859.
>
> I agree with this change.

[It was essential for people in Latin-9 locales not on recent
XFree86/Gtk systems.]

> If one really wants to encode iso-8859-X by using an extended
> segment, he can modify non-standard-designations-alist

But that violates the specification in the same way that xfree86 (or
gtk or whatever it is) does.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-06-01 15:40                                             ` Eli Zaretskii
@ 2003-06-04 22:04                                               ` Dave Love
  2003-06-06 10:55                                                 ` Eli Zaretskii
  0 siblings, 1 reply; 49+ messages in thread
From: Dave Love @ 2003-06-04 22:04 UTC (permalink / raw)
  Cc: ding, jas, emacs-devel

"Eli Zaretskii" <eliz@elta.co.il> writes:

> It was intended to explain why the code was built on top of the
> existing ctext en/decoder.

But that's not at issue.

> I'm sorry it wasn't helpful, but given the attitude, perhaps I
> shouldn't be surprised.

The only attitude involved is trying to do as well as possible for
multilingual support in fairly trying circumstances.



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-06-04 22:01                                               ` Dave Love
@ 2003-06-05  1:16                                                 ` Kenichi Handa
  2003-06-11 12:33                                                 ` Stephen J. Turnbull
  1 sibling, 0 replies; 49+ messages in thread
From: Kenichi Handa @ 2003-06-05  1:16 UTC (permalink / raw)
  Cc: eliz, jas, ding, emacs-devel

In article <rzqwug1ec3p.fsf@albion.dl.ac.uk>, Dave Love <d.love@dl.ac.uk> writes:

> Kenichi Handa <handa@m17n.org> writes:
>>  But, ctext itself doesn't have to support it, i.e., decode it as the
>>  sender's intention.

> But then you might as well ignore extended segments entirely, and I
> assume it must decode it if the name for the segment is registered.
> However, the CTEXT spec says that you must use extended segments
> for private charsets.

>>  It's impossible to know about all possible
>>  encoding names that will be used in the extended segment.

> Sure.  I was holding off changes in this area until I convinced myself
> what is the best way to do the heuristic conversion between external
> charset names and Emacs names.  (Sorry, I could have saved you the
> work.)  At least you have a chance of interpreting the names, but you
> can't know anything about private charset definitions, even if they
> were allowed.  Extended segment names are supposed to be registered
> and follow font encoding names, of course.

I'm sorry, but I can't see how you think the current ctext
and ctext-with-extensions should be changed.   Could you
give me a concrete proposal?

>>  Surely it's not.  ctext and compound-text-with-extensions
>>  encode text differently.  But, I don't think
>>  compound-text-with-extensions implies an extended version of
>>  ctext.

> It does to me, and that was clearly intended.

Perhaps the last part, "-with-extensions", was wrong.  I
thought it could mean "-using-extended-segment".  But, of
course, I'm not a native English speaker, thus ...

> It has been changed recently, but in my Emacs it says:

> x -- compound-text-with-extensions (alias: x-ctext-with-extensions ctext-with-extensions)
>   Compound text encoding with ICCCM Extended Segment extensions.

This has already been changed to:
    Compound text encoding with extended segments.

> and the NEWS entry says only some versions of X use extended segments.

Isn't it correct?

> Giving the impression of not following the CTEXT spec can't help with
> trying to persuade someone else to fix their problems, as I hope you
> can do.

> Anyhow the point is that whatever's called compound-text should deal
> with extended segments.

If "deal with" means "correctly decode as the sender
intended", it's impossible.  If "deal with" just means
"at least don't mangle them", now they do.

---
Ken'ichi HANDA
handa@m17n.org




^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-06-04 21:52                                       ` Dave Love
@ 2003-06-05  1:36                                         ` Kenichi Handa
  0 siblings, 0 replies; 49+ messages in thread
From: Kenichi Handa @ 2003-06-05  1:36 UTC (permalink / raw)
  Cc: rms, emacs-devel, jas, ding

In article <rzqy90hechy.fsf@albion.dl.ac.uk>, Dave Love <d.love@dl.ac.uk> writes:
> Kenichi Handa <handa@m17n.org> writes:
>>  Currently, even if we customize utf-fragment-on-decoding to
>>  t, iso-8859-2 chars encoded in utf-8 can't be decoded into
>>  latin-iso8859-2 charset because utf-fragmentation-table
>>  contains only Greek and Cyrillic chars.

> But you can add what you like to it; I didn't mean you
> could currently use Custom to do it.

I think writing a series of (aset ...) forms is far from
"customizable".  If there were a utf-fragment-charset-list,
people could easily reflect their preferences in
utf-fragment-on-decoding mode by customizing it.

Anyway, it seems that you are suggesting that Emacs should
request UTF8_STRING first when receiving selection data.
Correct?

---
Ken'ichi HANDA
handa@m17n.org



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-06-04 22:04                                               ` Dave Love
@ 2003-06-06 10:55                                                 ` Eli Zaretskii
  0 siblings, 0 replies; 49+ messages in thread
From: Eli Zaretskii @ 2003-06-06 10:55 UTC (permalink / raw)
  Cc: ding, jas, emacs-devel

> From: Dave Love <d.love@dl.ac.uk>
> Date: Wed, 04 Jun 2003 23:04:10 +0100
> 
> "Eli Zaretskii" <eliz@elta.co.il> writes:
> 
> > It was intended to explain why the code was built on top of the
> > existing ctext en/decoder.
> 
> But that's not at issue.

In that case, how about making a point of explaining what's at issue
in a manner that will make it clear to us mere mortals?  Please don't
assume that everybody else can guess what you regard as being ``at
issue'' from a short sentence that disguises more than it reveals.

> > I'm sorry it wasn't helpful, but given the attitude, perhaps I
> > shouldn't be surprised.
> 
> The only attitude involved is trying to do as well as possible for
> multilingual support in fairly trying circumstances.

If that was supposed to be an apology (as in ``sorry, I was
misunderstood''), then I accept it.

[The fact remains that there isn't a piece of Mule-related code in
Emacs I've written during the last few years that was not attacked by
you at some point.  "Carthaginem esse delendam" is an attitude that
comes to mind.]

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-06-04 22:01                                               ` Dave Love
  2003-06-05  1:16                                                 ` Kenichi Handa
@ 2003-06-11 12:33                                                 ` Stephen J. Turnbull
  1 sibling, 0 replies; 49+ messages in thread
From: Stephen J. Turnbull @ 2003-06-11 12:33 UTC (permalink / raw)
  Cc: Kenichi Handa

>>>>> "Dave" == Dave Love <d.love@dl.ac.uk> writes:

    Dave> At least you have a chance of interpreting the names, but
    Dave> you can't know anything about private charset definitions,
    Dave> even if they were allowed.

I don't understand what you mean.  As far as I can see, nothing in the
definition of Compound Text prohibits the use of private charset
definitions in extended segments.  (I don't have a recent X Consortium
version, but the R5 version is word for word identical with the
X11R6.4 version that comes with XFree86 4.2, excepting the exception
for using DOCS for UTF-8, of course.  Aargh.  At least they don't
prohibit use of extended segments for UTF-8.)

Are you referring to use of private charsets via the private final
bytes with regular ISO-2022 designations?

    >> If one really wants to encode iso-8859-X by using an extended
    >> segment, he can modify non-standard-designations-alist

    Dave> But that violates the specification in the same way that
    Dave> xfree86 (or gtk or whatever it is) does.

Right.  We can't prevent it, but we should document it as
"discouraged" in the strongest possible terms.

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-05-24 23:18                               ` Richard Stallman
@ 2003-06-13  8:37                                 ` Kenichi Handa
  2003-06-15 15:59                                   ` Richard Stallman
  0 siblings, 1 reply; 49+ messages in thread
From: Kenichi Handa @ 2003-06-13  8:37 UTC (permalink / raw)
  Cc: jas

Richard Stallman <rms@gnu.org> writes:
>     (1) Modify selection_data_to_lisp_data (in xselect.c)
> to simply return a unibyte string generated from selection
> data while putting text property `foreign-selection'
> (value is a symbol of type).  This property is to
> distinguish it from the return string of
> x_get_local_selection.  This property is also used to
> decode it properly.

> Would it work to return a cons cell (STRING
> . CODING-SYSTEM)?  That is a little cleaner, I think, than
> using a text property.

I've just found that selection_data_to_lisp_data returns a
cons when the selection data is a 32-bit number.  I think it is
better not to give two meanings to a cons type even if we
can distinguish them by checking the type of the car part.
So...

>>     It works if (STRING . RETURN-TYPE) is returned.  But,
>> can we also change x-get-selection to return that cons?

> That would be an incompatible change, but it is in a
> low-level function that I think is not used much by users.
> So I think it is ok to make an incompatible change here.

> I am not certain that approach is better, but if you agree
> it is better, please do it.

My opinion now is that using a text property both in
x-get-selection-internal and x-get-selection is better.  If
you don't have a further counterargument, I'll install a
change along my original line.

---
Ken'ichi HANDA
handa@m17n.org
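
The design argument above can be modelled outside Emacs.  The sketch
below is a Python analogy only: the class TaggedBytes and the function
decode_selection are hypothetical, the attribute stands in for the
`foreign-selection' text property, and the split of a 32-bit number
into a (high . low) pair of 16-bit halves is an assumption about how
the cons is laid out.

```python
class TaggedBytes(bytes):
    """Raw selection bytes carrying a side annotation, like a text property."""

    def __new__(cls, data, foreign_selection=None):
        obj = super().__new__(cls, data)
        obj.foreign_selection = foreign_selection  # e.g. "UTF8_STRING"
        return obj


def decode_selection(value):
    """Interpret a value returned by a selection request."""
    if isinstance(value, tuple):
        # A pair keeps a single meaning: a 32-bit number in two halves.
        high, low = value
        return (high << 16) | low
    if isinstance(value, TaggedBytes) and value.foreign_selection == "UTF8_STRING":
        # The annotation says the bytes came from another client as UTF-8.
        return value.decode("utf-8")
    # Untagged data is passed through unchanged.
    return value
```

Because the tag travels on the string itself, a caller never has to
inspect the first element of a pair to decide whether it holds a
number or a string.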

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-06-13  8:37                                 ` Kenichi Handa
@ 2003-06-15 15:59                                   ` Richard Stallman
  2003-06-17 11:06                                     ` Kenichi Handa
  0 siblings, 1 reply; 49+ messages in thread
From: Richard Stallman @ 2003-06-15 15:59 UTC (permalink / raw)
  Cc: jas, ding, emacs-devel

    I've just found that selection_data_to_lisp_data returns a
    cons when the selection data is a 32-bit number.  I think it is
    better not to give two meanings to a cons type even if we
    can distinguish them by checking the type of the car part.

That is a valid argument.

    My opinion now is that using a text property both in
    x-get-selection-internal and x-get-selection is better.  If
    you don't have a further counterargument, I'll install a
    change along my original line.

Please do.



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: MML charset tag regression
  2003-06-15 15:59                                   ` Richard Stallman
@ 2003-06-17 11:06                                     ` Kenichi Handa
  0 siblings, 0 replies; 49+ messages in thread
From: Kenichi Handa @ 2003-06-17 11:06 UTC (permalink / raw)
  Cc: jas, ding, emacs-devel

In article <E19RZui-0003SC-EG@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:
>     I've just found that selection_data_to_lisp_data
> returns a cons when the selection data is a 32-bit number.  I
> think it is better not to give two meanings to a cons
> type even if we can distinguish them by checking the type
> of the car part.

> That is a valid argument.

>     My opinion now is that using a text property both in
> x-get-selection-internal and x-get-selection is better.
> If you don't have a further counterargument, I'll install
> a change along my original line.

> Please do.

Just done.  It would be better to declare x-select-request-type
with defcustom, but I don't know how to write the :type spec.
Could someone please work on it?

I also added this to etc/NEWS:

Index: NEWS
===================================================================
RCS file: /cvsroot/emacs/emacs/etc/NEWS,v
retrieving revision 1.823
retrieving revision 1.824
diff -u -c -r1.823 -r1.824
cvs server: conflicting specifications of output style
*** NEWS	5 Jun 2003 23:56:32 -0000	1.823
--- NEWS	17 Jun 2003 11:00:53 -0000	1.824
***************
*** 364,369 ****
--- 364,374 ----
  X selections.  If you don't want this support, set
  `selection-coding-system' to `compound-text'.
  
+ ** The new variable `x-select-request-type' controls how Emacs
+ requests X selection.  The default value is nil, which means that
+ Emacs requests X selection with types COMPOUND_TEXT and UTF8_STRING,
+ and uses the more appropriate result.
+ 
  +++
  ** The parameters of automatic hscrolling can now be customized.
  The variable `hscroll-margin' determines how many columns away from

---
Ken'ichi HANDA
handa@m17n.org



^ permalink raw reply	[flat|nested] 49+ messages in thread

