* Cut buffers and character encoding
@ 2006-11-09 7:39 Romain Francoise
2006-11-09 19:10 ` Jan D.
0 siblings, 1 reply; 17+ messages in thread
From: Romain Francoise @ 2006-11-09 7:39 UTC (permalink / raw)
Hi,
I received a bug report about Emacs 22.0.90 stating that Emacs doesn't
do charset conversion when receiving text from a cut buffer. From the
report:
,----
| When I paste the cut buffer in an Emacs window in UTF-8 locales, Emacs
| doesn't do any charset conversion. This problem occurs with both X and
| GTK versions.
|
| To reproduce the problem:
| 1. In UTF-8 locales: emacs -q
| 2. Open an xterm.
| 3. In the xterm, type 'éèê'.
| 4. Select 'éèê' in the xterm.
| 5. Quit the xterm (now, 'éèê' is no longer in the primary selection,
| only in the cut buffer, which Emacs supports).
| 6. Paste in Emacs (middle mouse button).
|
| I get:
|
| \351\350\352
|
| instead of:
|
| éèê
`----
This is in apparent contradiction to what the docstring of the
`selection-coding-system' variable says:
,----[ C-h v selection-coding-system RET ]
| Documentation:
| Coding system for communicating with other X clients.
| When sending or receiving text via cut_buffer, selection, and clipboard,
| the text is encoded or decoded by this coding system.
`----
Using xcutsel to move the cut buffer back to a primary selection shows
that the content itself is fine, so the problem lies with Emacs.
More info here: http://bugs.debian.org/397447
Thanks,
--
Romain Francoise <romain@orebokech.com> | The sea! the sea! the open
it's a miracle -- http://orebokech.com/ | sea! The blue, the fresh, the
| ever free! --Bryan W. Procter
* Re: Cut buffers and character encoding
2006-11-09 7:39 Cut buffers and character encoding Romain Francoise
@ 2006-11-09 19:10 ` Jan D.
2006-11-09 20:56 ` Romain Francoise
2006-11-10 18:41 ` Richard Stallman
0 siblings, 2 replies; 17+ messages in thread
From: Jan D. @ 2006-11-09 19:10 UTC (permalink / raw)
Cc: emacs-devel
Romain Francoise skrev:
> Hi,
>
> I received a bug report about Emacs 22.0.90 stating that Emacs doesn't
> do charset conversion when receiving text from a cut buffer. From the
> report:
>
> ,----
> | When I paste the cut buffer in an Emacs window in UTF-8 locales, Emacs
> | doesn't do any charset conversion. This problem occurs with both X and
> | GTK versions.
> |
> | To reproduce the problem:
> | 1. In UTF-8 locales: emacs -q
> | 2. Open an xterm.
> | 3. In the xterm, type 'éèê'.
> | 4. Select 'éèê' in the xterm.
> | 5. Quit the xterm (now, 'éèê' is no longer in the primary selection,
> | only in the cut buffer, which Emacs supports).
> | 6. Paste in Emacs (middle mouse button).
> |
> | I get:
> |
> | \351\350\352
> |
> | instead of:
> |
> | éèê
> `----
>
> This is in apparent contradiction to what the docstring of the
> `selection-coding-system' variable says:
>
> ,----[ C-h v selection-coding-system RET ]
> | Documentation:
> | Coding system for communicating with other X clients.
> | When sending or receiving text via cut_buffer, selection, and clipboard,
> | the text is encoded or decoded by this coding system.
> `----
>
The text encoding for cut buffers is defined to be ISO-Latin-1, so
selection-coding-system should not have any effect. That said, we could decode
data from cut buffers from Latin-1 and encode to Latin-1 when putting data in
there.
But cut buffers are obsolete anyway, so I vote for just fixing the
documentation and leaving it as is.
Other suggestions?
Jan D.
* Re: Cut buffers and character encoding
2006-11-09 19:10 ` Jan D.
@ 2006-11-09 20:56 ` Romain Francoise
2006-11-10 7:42 ` Jan D.
2006-11-10 18:41 ` Richard Stallman
1 sibling, 1 reply; 17+ messages in thread
From: Romain Francoise @ 2006-11-09 20:56 UTC (permalink / raw)
Cc: emacs-devel
"Jan D." <jan.h.d@swipnet.se> writes:
> The text encoding for cut buffers is defined to be ISO-Latin-1, so
> selection-coding-system should not have any effect. That said, we
> could decode data from cut buffers from Latin-1 and encode to Latin-1
> when putting data in there.
Ah, thanks, you put me on the right track.
Emacs *does* decode the contents of the cut buffer in the
`x-cut-buffer-or-selection-value' function, but it tries to decode them
using `locale-coding-system' which is wrong if the locale is a UTF-8
locale...
The following patch fixes the problem for me, and if cut buffers are
*always* iso-latin-1 then it should be the right thing. WDYT?
Index: lisp/term/x-win.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/term/x-win.el,v
retrieving revision 1.194
diff -c -r1.194 x-win.el
*** lisp/term/x-win.el 18 Oct 2006 14:05:02 -0000 1.194
--- lisp/term/x-win.el 9 Nov 2006 20:54:47 -0000
***************
*** 2346,2353 ****
(t
(setq x-last-selected-text-cut-encoded cut-text
x-last-selected-text-cut
! (decode-coding-string cut-text (or locale-coding-system
! 'iso-latin-1))))))
;; As we have done one selection, clear this now.
(setq next-selection-coding-system nil)
--- 2346,2352 ----
(t
(setq x-last-selected-text-cut-encoded cut-text
x-last-selected-text-cut
! (decode-coding-string cut-text 'iso-latin-1)))))
;; As we have done one selection, clear this now.
(setq next-selection-coding-system nil)
--
Romain Francoise <romain@orebokech.com> | The sea! the sea! the open
it's a miracle -- http://orebokech.com/ | sea! The blue, the fresh, the
| ever free! --Bryan W. Procter
* Re: Cut buffers and character encoding
2006-11-09 20:56 ` Romain Francoise
@ 2006-11-10 7:42 ` Jan D.
2006-11-10 10:24 ` Kenichi Handa
0 siblings, 1 reply; 17+ messages in thread
From: Jan D. @ 2006-11-10 7:42 UTC (permalink / raw)
Cc: emacs-devel
Romain Francoise skrev:
> "Jan D." <jan.h.d@swipnet.se> writes:
>
>> The text encoding for cut buffers is defined to be ISO-Latin-1, so
>> selection-coding-system should not have any effect. That said, we
>> could decode data from cut buffers from Latin-1 and encode to Latin-1
>> when putting data in there.
>
> Ah, thanks, you put me on the right track.
>
> Emacs *does* decode the contents of the cut buffer in the
> `x-cut-buffer-or-selection-value' function, but it tries to decode them
> using `locale-coding-system' which is wrong if the locale is a UTF-8
> locale...
>
> The following patch fixes the problem for me, and if cut buffers are
> *always* iso-latin-1 then it should be the right thing. WDYT?
I've committed this change and the corresponding change for writing to a cut buffer.
I also changed the documentation you pointed out was wrong.
Jan D.
> Index: lisp/term/x-win.el
> ===================================================================
> RCS file: /cvsroot/emacs/emacs/lisp/term/x-win.el,v
> retrieving revision 1.194
> diff -c -r1.194 x-win.el
> *** lisp/term/x-win.el 18 Oct 2006 14:05:02 -0000 1.194
> --- lisp/term/x-win.el 9 Nov 2006 20:54:47 -0000
> ***************
> *** 2346,2353 ****
> (t
> (setq x-last-selected-text-cut-encoded cut-text
> x-last-selected-text-cut
> ! (decode-coding-string cut-text (or locale-coding-system
> ! 'iso-latin-1))))))
>
> ;; As we have done one selection, clear this now.
> (setq next-selection-coding-system nil)
> --- 2346,2352 ----
> (t
> (setq x-last-selected-text-cut-encoded cut-text
> x-last-selected-text-cut
> ! (decode-coding-string cut-text 'iso-latin-1)))))
>
> ;; As we have done one selection, clear this now.
> (setq next-selection-coding-system nil)
>
* Re: Cut buffers and character encoding
2006-11-10 7:42 ` Jan D.
@ 2006-11-10 10:24 ` Kenichi Handa
2006-11-10 13:39 ` Romain Francoise
0 siblings, 1 reply; 17+ messages in thread
From: Kenichi Handa @ 2006-11-10 10:24 UTC (permalink / raw)
Cc: romain, emacs-devel
In article <45542D63.5070402@swipnet.se>, "Jan D." <jan.h.d@swipnet.se> writes:
> Romain Francoise skrev:
> > "Jan D." <jan.h.d@swipnet.se> writes:
> >
>>> The text encoding for cut buffers is defined to be ISO-Latin-1, so
>>> selection-coding-system should not have any effect. That said, we
>>> could decode data from cut buffers from Latin-1 and encode to Latin-1
>>> when putting data in there.
> >
> > Ah, thanks, you put me on the right track.
> >
> > Emacs *does* decode the contents of the cut buffer in the
> > `x-cut-buffer-or-selection-value' function, but it tries to decode them
> > using `locale-coding-system' which is wrong if the locale is a UTF-8
> > locale...
> >
> > The following patch fixes the problem for me, and if cut buffers are
> > *always* iso-latin-1 then it should be the right thing. WDYT?
> I've committed this change and the corresponding change for writing to a cut buffer.
> I also changed the documentation you pointed out was wrong.
I vaguely remember that I changed cut-buffer decoding to use
locale-coding-system (if any) instead of iso-8859-1 upon a
bug report from someone. He claimed that many X
applications store data encoded by the current locale in
cut-buffer (even if that doesn't conform to ICCCM), thus it
is better that Emacs also decodes it by the coding system
specified by the locale. Even xterm, when run under, for
instance, cs_CS.ISO8859-2 locale, stores ISO8859-2
characters as is in cut buffer.
In the case of the original bug report, Emacs actually tried to
decode "éèê" as utf-8, and that failed.
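A minimal sketch of why that decode fails, written in Python rather than Emacs Lisp purely to show the bytes involved (the variable names are illustrative, not Emacs internals). It assumes the xterm stored the text as ISO-8859-1, which is what ICCCM specifies for cut buffers:

```python
# The three characters from the bug report: e-acute, e-grave, e-circumflex.
text = "\u00e9\u00e8\u00ea"

# What a Latin-1 client would place in the cut buffer: one byte per character.
cut_buffer = text.encode("latin-1")

# Render each byte as an octal escape, the way Emacs displays raw bytes.
octal = "".join("\\%o" % b for b in cut_buffer)
print(octal)

# The same bytes are not well-formed UTF-8, so a strict UTF-8 decode fails.
try:
    cut_buffer.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Decoding as Latin-1 recovers the original text.
decoded = cut_buffer.decode("latin-1")
```

The octal rendering matches the \351\350\352 seen in the report: each byte is a valid Latin-1 character but not a complete UTF-8 sequence.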
Anyway, the documentation bug of selection-coding-system
should be fixed. But, it may be good to use
next-selection-coding-system even for cut buffer if it is
set temporarily by C-x RET X.
---
Kenichi Handa
handa@m17n.org
* Re: Cut buffers and character encoding
2006-11-10 10:24 ` Kenichi Handa
@ 2006-11-10 13:39 ` Romain Francoise
2006-11-11 12:40 ` Kenichi Handa
0 siblings, 1 reply; 17+ messages in thread
From: Romain Francoise @ 2006-11-10 13:39 UTC (permalink / raw)
Cc: Jan D., emacs-devel
Kenichi Handa <handa@m17n.org> writes:
> I vaguely remember that I changed cut-buffer decoding to use
> locale-coding-system (if any) instead of iso-8859-1 upon a
> bug report from someone. He claimed that many X
> applications store data encoded by the current locale in
> cut-buffer (even if that doesn't conform to ICCCM), thus it
> is better that Emacs also decodes it by the coding system
> specified by the locale. Even xterm, when run under, for
> instance, cs_CS.ISO8859-2 locale, stores ISO8859-2
> characters as is in cut buffer.
Would it be feasible to use `detect-coding-string' to decide which
encoding to use?
--
Romain Francoise <romain@orebokech.com> | The sea! the sea! the open
it's a miracle -- http://orebokech.com/ | sea! The blue, the fresh, the
| ever free! --Bryan W. Procter
* Re: Cut buffers and character encoding
2006-11-09 19:10 ` Jan D.
2006-11-09 20:56 ` Romain Francoise
@ 2006-11-10 18:41 ` Richard Stallman
1 sibling, 0 replies; 17+ messages in thread
From: Richard Stallman @ 2006-11-10 18:41 UTC (permalink / raw)
Cc: romain, emacs-devel
    The text encoding for cut buffers is defined to be ISO-Latin-1, so
    selection-coding-system should not have any effect. That said, we could decode
    data from cut buffers from Latin-1 and encode to Latin-1 when putting data in
    there.
    But cut buffers are obsolete anyway, so I vote for just fixing the
    documentation and leaving it as is.
Since people are evidently still using them, I think we may as well
fix this.
* Re: Cut buffers and character encoding
2006-11-10 13:39 ` Romain Francoise
@ 2006-11-11 12:40 ` Kenichi Handa
2006-11-12 5:14 ` Richard Stallman
0 siblings, 1 reply; 17+ messages in thread
From: Kenichi Handa @ 2006-11-11 12:40 UTC (permalink / raw)
Cc: jan.h.d, emacs-devel
In article <87irhn1kvk.fsf@pacem.orebokech.com>, Romain Francoise <romain@orebokech.com> writes:
> Kenichi Handa <handa@m17n.org> writes:
> > I vaguely remember that I changed cut-buffer decoding to use
> > locale-coding-system (if any) instead of iso-8859-1 upon a
> > bug report from someone. He claimed that many X
> > applications store data encoded by the current locale in
> > cut-buffer (even if that doesn't conform to ICCCM), thus it
> > is better that Emacs also decodes it by the coding system
> > specified by the locale. Even xterm, when run under, for
> > instance, cs_CS.ISO8859-2 locale, stores ISO8859-2
> > characters as is in cut buffer.
> Would it be feasible to use `detect-coding-string' to decide which
> encoding to use?
I think trying locale-coding-system first, and if that
yields undecoded eight-bit characters, decoding by
iso-8859-1, is good.
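A rough sketch of that two-step strategy, in Python for illustration only; `decode_cut_buffer` and its `locale_coding` parameter are hypothetical names, not Emacs functions:

```python
def decode_cut_buffer(data: bytes, locale_coding: str = "utf-8") -> str:
    """Try the locale's coding system first; fall back to Latin-1.

    Hypothetical helper: a strict decode failure here stands in for
    "yields undecoded eight-bit characters" in Emacs.
    """
    try:
        # Strict decoding raises on any byte sequence the codec rejects.
        return data.decode(locale_coding)
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this cannot fail.
        return data.decode("latin-1")


# Latin-1 bytes under a UTF-8 locale: first attempt fails, fallback is used.
from_latin1 = decode_cut_buffer(b"\xe9\xe8\xea")

# Genuine UTF-8 bytes under a UTF-8 locale: first attempt succeeds.
from_utf8 = decode_cut_buffer("\u00e9\u00e8\u00ea".encode("utf-8"))

# Same Latin-1 bytes under an ISO-8859-2 locale: no error, different text.
from_latin2_locale = decode_cut_buffer(b"\xe9\xe8\xea", "iso-8859-2")
```

With a UTF-8 locale the bytes from the bug report take the fallback path. A single-byte locale such as ISO-8859-2 accepts every byte, so under it the fallback never triggers and the same bytes come out as different characters.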
---
Kenichi Handa
handa@m17n.org
* Re: Cut buffers and character encoding
2006-11-11 12:40 ` Kenichi Handa
@ 2006-11-12 5:14 ` Richard Stallman
2006-11-18 2:23 ` Kenichi Handa
0 siblings, 1 reply; 17+ messages in thread
From: Richard Stallman @ 2006-11-12 5:14 UTC (permalink / raw)
Cc: romain, jan.h.d, emacs-devel
    I think trying locale-coding-system first, and if that
    yields undecoded eight-bit characters, decoding by
    iso-8859-1, is good.
That is not a good idea. locale-coding-system is set based on the
locale, and people set their locales for general reasons, not
specifically about Emacs. It would be undesirable to choose
nonstandard behavior for cut buffers merely because someone normally
uses latin-2, for instance.
It is ok to have a feature to specify nonstandard encoding of
cut-buffers, but it should be a specific feature, which affects
nothing else, and which people will enable only when they specifically
want nonstandard encoding of cut-buffers.
* Re: Cut buffers and character encoding
2006-11-12 5:14 ` Richard Stallman
@ 2006-11-18 2:23 ` Kenichi Handa
2006-11-18 13:18 ` Jan Djärv
2006-11-18 16:05 ` Richard Stallman
0 siblings, 2 replies; 17+ messages in thread
From: Kenichi Handa @ 2006-11-18 2:23 UTC (permalink / raw)
Cc: romain, jan.h.d, emacs-devel
In article <E1Gj7ft-0000j7-Gf@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:
> I think trying locale-coding-system at first, and if that
> yields undecoded eight-bit characters, decoding by
> iso-8859-1 is good.
> That is not a good idea. locale-coding-system is set based on the
> locale, and people set their locales for general reasons, not
> specifically about Emacs. It would be undesirable to choose
> nonstandard behavior for cut buffers merely because someone normally
> uses latin-2, for instance.
> It is ok to have a feature to specify nonstandard encoding of
> cut-buffers, but it should be a specific feature, which affects
> nothing else, and which people will enable only when they specifically
> want nonstandard encoding of cut-buffers.
Then, I think it is quite natural to allow people to use
next-selection-coding-system by C-x RET X for such a
purpose. That variable is not strictly specific to
cut-buffer, but it's just a one-time setting, and doesn't
affect further cut&paste operations.
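As a sketch of what "one-time setting" means here, again in Python with hypothetical names (`SelectionDecoder`, `next_coding`), not the actual x-win.el code: the override is consulted once and then cleared, so later pastes go back to Latin-1.

```python
class SelectionDecoder:
    """Toy model of a one-shot coding-system override for cut buffers."""

    def __init__(self):
        # Stands in for next-selection-coding-system; None means "not set".
        self.next_coding = None

    def decode(self, data: bytes) -> str:
        # Use the override if present, otherwise the ICCCM default.
        coding = self.next_coding or "latin-1"
        # Clear the override: it applies to a single operation only.
        self.next_coding = None
        return data.decode(coding)


decoder = SelectionDecoder()

# Simulate C-x RET X utf-8 before pasting UTF-8 bytes.
decoder.next_coding = "utf-8"
first = decoder.decode("\u00e9\u00e8\u00ea".encode("utf-8"))

# The next paste is decoded as Latin-1 again.
second = decoder.decode(b"\xe9\xe8\xea")
```

Both pastes yield the same three characters, the first through the temporary override and the second through the default.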
---
Kenichi Handa
handa@m17n.org
* Re: Cut buffers and character encoding
2006-11-18 2:23 ` Kenichi Handa
@ 2006-11-18 13:18 ` Jan Djärv
2006-11-20 3:23 ` Katsumi Yamaoka
2006-11-18 16:05 ` Richard Stallman
1 sibling, 1 reply; 17+ messages in thread
From: Jan Djärv @ 2006-11-18 13:18 UTC (permalink / raw)
Cc: romain, rms, emacs-devel
Kenichi Handa skrev:
> Then, I think it is quite natural to allow people to use
> next-selection-coding-system by C-x RET X for such a
> purpose. That variable is not strictly specific to
> cut-buffer, but it's just a one-time setting, and doesn't
> affect the further cut&paste operation.
I've checked in that change.
Jan D.
* Re: Cut buffers and character encoding
2006-11-18 2:23 ` Kenichi Handa
2006-11-18 13:18 ` Jan Djärv
@ 2006-11-18 16:05 ` Richard Stallman
1 sibling, 0 replies; 17+ messages in thread
From: Richard Stallman @ 2006-11-18 16:05 UTC (permalink / raw)
Cc: romain, jan.h.d, emacs-devel
    Then, I think it is quite natural to allow people to use
    next-selection-coding-system by C-x RET X for such a
    purpose. That variable is not strictly specific to
    cut-buffer, but it's just a one-time setting, and doesn't
    affect the further cut&paste operation.
I agree.
* Re: Cut buffers and character encoding
2006-11-18 13:18 ` Jan Djärv
@ 2006-11-20 3:23 ` Katsumi Yamaoka
2006-11-20 7:44 ` Jan Djärv
0 siblings, 1 reply; 17+ messages in thread
From: Katsumi Yamaoka @ 2006-11-20 3:23 UTC (permalink / raw)
Cc: romain, emacs-devel, rms, Kenichi Handa
>>>>> In <455F081B.8030009@swipnet.se> Jan Djärv wrote:
> I've checked in that change.
When killing and yanking even non-Latin text within Emacs, Emacs
seems to encode and decode the text using iso-latin-1 now.
* term/x-win.el (x-cut-buffer-or-selection-value): Decode text from
cut-buffers with next-selection-coding-system if not nil.
* term/x-win.el (x-select-text, x-cut-buffer-or-selection-value):
Encode/decode text to/from cut buffers to/from iso-latin-1 only.
So, Japanese text is all yanked as ??????. Please test it by
copying text in the HELLO buffer and yanking into another buffer.
Regards,
* Re: Cut buffers and character encoding
2006-11-20 3:23 ` Katsumi Yamaoka
@ 2006-11-20 7:44 ` Jan Djärv
2006-11-20 7:57 ` Katsumi Yamaoka
2006-11-20 23:58 ` Richard Stallman
0 siblings, 2 replies; 17+ messages in thread
From: Jan Djärv @ 2006-11-20 7:44 UTC (permalink / raw)
Cc: romain, rms, Kenichi Handa, emacs-devel
Katsumi Yamaoka skrev:
>>>>>> In <455F081B.8030009@swipnet.se> Jan Djärv wrote:
>
>> I've checked in that change.
>
> When killing and yanking even non-Latin text within Emacs, Emacs
> seems to encode and decode the text using iso-latin-1 now.
>
Ah, I misunderstood the check for newness, please try again (don't forget to
recompile x-win.el).
Jan D.
* Re: Cut buffers and character encoding
2006-11-20 7:44 ` Jan Djärv
@ 2006-11-20 7:57 ` Katsumi Yamaoka
2006-11-20 23:58 ` Richard Stallman
1 sibling, 0 replies; 17+ messages in thread
From: Katsumi Yamaoka @ 2006-11-20 7:57 UTC (permalink / raw)
Cc: romain, rms, Kenichi Handa, emacs-devel
>>>>> In <45615CE3.1060207@swipnet.se> Jan Djärv wrote:
>> When killing and yanking even non-Latin text within Emacs, Emacs
>> seems to encode and decode the text using iso-latin-1 now.
> Ah, I misunderstood the check for newness, please try again (don't
> forget to recompile x-win.el).
Good! That works for copying text not only within Emacs but
also between Emacs and other X clients now. Thank you.
* Re: Cut buffers and character encoding
2006-11-20 7:44 ` Jan Djärv
2006-11-20 7:57 ` Katsumi Yamaoka
@ 2006-11-20 23:58 ` Richard Stallman
2006-11-21 12:47 ` Jan Djärv
1 sibling, 1 reply; 17+ messages in thread
From: Richard Stallman @ 2006-11-20 23:58 UTC (permalink / raw)
Cc: yamaoka, romain, handa, emacs-devel
    > When killing and yanking even non-Latin text within Emacs, Emacs
    > seems to encode and decode the text using iso-latin-1 now.
    >
    Ah, I misunderstood the check for newness, please try again (don't forget to
    recompile x-win.el).
Didn't someone say that X11 specifies that the cut buffer is always
supposed to be in latin-1? We changed Emacs so that C-x RET X
would specify another coding system in case you're using one,
but aside from when C-x RET X is used, Emacs should always treat
it as Latin 1. Isn't that so?
* Re: Cut buffers and character encoding
2006-11-20 23:58 ` Richard Stallman
@ 2006-11-21 12:47 ` Jan Djärv
0 siblings, 0 replies; 17+ messages in thread
From: Jan Djärv @ 2006-11-21 12:47 UTC (permalink / raw)
Cc: yamaoka, romain, emacs-devel, handa
Richard Stallman skrev:
> > When killing and yanking even non-Latin text within Emacs, Emacs
> > seems to encode and decode the text using iso-latin-1 now.
> >
>
> Ah, I misunderstood the check for newness, please try again (don't forget to
> recompile x-win.el).
>
> Didn't someone say that X11 specifies that the cut buffer is always
> supposed to be in latin-1?
Yes, it is in ICCCM.
> We changed Emacs so that C-x RET X
> would specify another coding system in case you're using one,
> but aside from when C-x RET X is used, Emacs should always treat
> it as Latin 1. Isn't that so?
Yes, it is so. The problem was that when you select something in Emacs,
regardless of coding, Emacs puts it in the cut buffer. And then when pasting
in the same Emacs, Emacs compares the contents of the cut buffer with the last
thing it put there. If they are the same, Emacs assumes that there is no new
stuff in the cut buffer and uses data from its kill ring instead.
I messed up so that the contents from the cut buffer always looked new to
Emacs. But I've fixed that now.
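The newness check described above can be modelled roughly as follows. This is a Python sketch with hypothetical names (`CutBufferModel`, `last_stored`, `kill_ring`), not the code in x-win.el; it assumes the comparison is made on the encoded bytes that were actually written to the cut buffer:

```python
class CutBufferModel:
    """Toy model of Emacs deciding between the cut buffer and the kill ring."""

    def __init__(self):
        self.last_stored = None  # bytes this Emacs last wrote to the cut buffer
        self.kill_ring = []      # most recent kill first

    def select(self, text: str, cut_buffer: dict) -> None:
        # The kill ring keeps the text unchanged, whatever its script.
        self.kill_ring.insert(0, text)
        # The cut buffer gets Latin-1; unencodable characters become "?".
        encoded = text.encode("latin-1", errors="replace")
        self.last_stored = encoded
        cut_buffer["data"] = encoded

    def paste(self, cut_buffer: dict) -> str:
        data = cut_buffer.get("data")
        # Unchanged since our last write: nothing new, use the kill ring.
        if data is None or data == self.last_stored:
            return self.kill_ring[0] if self.kill_ring else ""
        # Another client wrote something new: decode it as Latin-1.
        return data.decode("latin-1")


model = CutBufferModel()
buf = {}

# Kill Japanese text: the cut buffer holds "???" but the kill ring is intact.
model.select("\u65e5\u672c\u8a9e", buf)
yanked_own = model.paste(buf)

# Another X client overwrites the cut buffer with a Latin-1 byte.
buf["data"] = b"\xe9"
yanked_foreign = model.paste(buf)
```

If the comparison always reported "new", the first paste would return the lossy "???" from the cut buffer instead of the kill-ring text, which is consistent with the ?????? that was reported for Japanese text.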
Jan D.