unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#14368: 24.3.50; Big screw: multibyte characters become unibyte
@ 2013-05-08  8:21 Richard Stallman
  2013-05-11 16:49 ` Richard Stallman
  2013-05-24 14:51 ` Handa Kenichi
  0 siblings, 2 replies; 15+ messages in thread
From: Richard Stallman @ 2013-05-08  8:21 UTC (permalink / raw)
  To: 14368


When I use latin-1-postfix to enter characters such
as i-with-acute-accent and inverse-?, as soon as I type
another character they turn into Latin-1 single byte codes.

For instance, when I type i and ', the input method turns that into
i-with-acute-accent; but then my next keystroke turns the
i-with-acute-accent into \355.

This is horrible!

I can't find what is doing it.  It seems to happen no matter what the
next character is -- even M-x does it.  But I can't see this on
post-command-hook.
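
For reference, the \355 reported above is an octal escape for the single byte 237, which is also the code point of i-with-acute-accent. A minimal sketch of that arithmetic, in plain Python rather than Emacs (the variable names are illustrative only):

```python
# Illustrative only: plain Python, not Emacs internals.
char = "\u00ed"                     # i-with-acute-accent
code = ord(char)                    # its Unicode code point (decimal)
octal_escape = "\\%o" % code        # backslash + octal digits of the code
latin1_byte = char.encode("latin-1")  # single-byte Latin-1 form
utf8_bytes = char.encode("utf-8")     # two-byte UTF-8 form

print(code, octal_escape, latin1_byte, utf8_bytes)
```

The point is only that the character and the raw byte share the number 237; which of the two Emacs displays depends on whether the value is treated as a character or as an undecoded byte.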


In GNU Emacs 24.3.50.1 (mips64el-unknown-linux-gnu, GTK+ Version 2.20.1)
 of 2013-05-01 on chiefs-gnewsense
Bzr revision: 112434 juri@jurta.org-20130501081012-n3c351r92cr17lu5
System Description:	Debian GNU/Linux 6.0.6 (squeeze)

Configured using:
 `configure CFLAGS=-g -O0'

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix
  default enable-multibyte-characters: t

Major mode: Dired by date

Minor modes in effect:
  shell-dirtrack-mode: t
  gpm-mouse-mode: t
  tooltip-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  line-number-mode: t
  transient-mark-mode: t
  abbrev-mode: t

Recent input:
n . C-c C-c d d d x C-x b o u t g TAB RET g C-u C-p 
C-p C-p C-p C-p C-o C-n C-o C-x o C-x o d x y e s RET 
C-x b R TAB RET C-x 1 d u d d d d u d x 3 0 p d d d 
x n x C-a C-u C-u C-n C-u C-n C-n C-n C-n C-@ C-n C-n 
C-n r ESC , RET P u e d e SPC s e r SPC e l SPC 4 ? 
C-a ? / DEL DEL C-\ ? / C-e C-c C-c C-n C-@ C-n C-n 
ESC w C-x b o u t g TAB RET g C-p C-p C-p C-o C-x o 
ESC > RET RET ESC , RET C-o S e r i ' DEL DEL C-\ i 
' a SPC p o s i b l e SPC a d n DEL m i t i r SPC a 
SPC o t r o s SPC s i n SPC ESC DEL g r a t u i t a 
m e n t e SPC y SPC n o SPC e n t r e g a r l e s SPC 
n a d a ? C-a ? / C-c C-c C-g C-x C-s C-g ESC x t o 
g g l e SPC e n a TAB RET ESC x ESC p RET C-x C-s RET 
C-x k RET C-x o e C-x 1 C-u C-u C-n C-n C-n C-d C-x 
C-s C-x k RET ESC x b u g SPC g n u SPC e m a DEL DEL 
DEL ESC DEL ESC DEL r e p o r t SPC e m a c s SPC b 
u g RET

Recent messages:
Wrote /home/rms/outgoing/out-56
Sending...done
Mark set [3 times]
Quit
Saving file /home/rms/outgoing/out-56...
Quit
Saving file /home/rms/outgoing/out-56...
Wrote /home/rms/outgoing/out-56
Saving file /home/rms/outgoing/out-56...
Wrote /home/rms/outgoing/out-56

Load-path shadows:
None found.

Features:
(shadow emacsbug quail cal-move cal-menu calendar cal-loaddefs
dired-aux rmailsum grep compile parse-time vc-cvs sgml-mode shell
pcomplete comint ansi-color ring mule-util qp help-mode rmailout
misearch multi-isearch dabbrev mailalias rmailmm message sendmail
format-spec rfc822 mml easymenu mml-sec mm-decode mm-bodies mm-encode
mailabbrev gmm-utils mailheader mail-parse rfc2231 dired t-mouse
time-date rmailedit rmail rfc2047 rfc2045 ietf-drums mm-util
mail-prsvr mail-utils paren cus-start cus-load nadvice advice help-fns
tooltip ediff-hook vc-hooks lisp-float-type mwheel x-win x-dnd
tool-bar dnd fontset image regexp-opt fringe tabulated-list newcomment
lisp-mode register page menu-bar rfn-eshadow timer select scroll-bar
mouse jit-lock font-lock syntax facemenu font-core frame cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese case-table epa-hook jka-cmpr-hook help simple abbrev
minibuffer loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote make-network-process
dbusbind inotify dynamic-setting system-font-setting
font-render-setting move-toolbar gtk x-toolkit x multi-tty emacs)

-- 
Dr Richard Stallman
President, Free Software Foundation
51 Franklin St
Boston MA 02110
USA
www.fsf.org  www.gnu.org
Skype: No way! That's nonfree (freedom-denying) software.
  Use Ekiga or an ordinary phone call







* bug#14368: 24.3.50; Big screw: multibyte characters become unibyte
  2013-05-08  8:21 bug#14368: 24.3.50; Big screw: multibyte characters become unibyte Richard Stallman
@ 2013-05-11 16:49 ` Richard Stallman
  2013-05-11 17:17   ` Eli Zaretskii
  2013-05-24 14:51 ` Handa Kenichi
  1 sibling, 1 reply; 15+ messages in thread
From: Richard Stallman @ 2013-05-11 16:49 UTC (permalink / raw)
  To: 14368

I did some debugging and found that the mistaken
replacement of multibyte characters with unibyte
occurs in quail-start-translation.

However, the bug is probably not in quail.el, because quail.el has not
changed since the start of the year, and the last change log entry was
long before that.

Can someone who understands quail please investigate this bug?


* bug#14368: 24.3.50; Big screw: multibyte characters become unibyte
  2013-05-11 16:49 ` Richard Stallman
@ 2013-05-11 17:17   ` Eli Zaretskii
  2013-05-11 21:44     ` Richard Stallman
  0 siblings, 1 reply; 15+ messages in thread
From: Eli Zaretskii @ 2013-05-11 17:17 UTC (permalink / raw)
  To: rms; +Cc: 14368

> Date: Sat, 11 May 2013 12:49:07 -0400
> From: Richard Stallman <rms@gnu.org>
> 
> I did some debugging and found that the mistaken
> replacement of multibyte characters with unibyte
> occurs in quail-start-translation.
> 
> However, the bug is probably not in quail.el, because quail.el has not
> changed since the start of the year, and the last change log entry was
> long before that.
> 
> Can someone who understands quail please investigate this bug?

Can you reproduce it starting with "emacs -Q"?  I tried, but couldn't.






* bug#14368: 24.3.50; Big screw: multibyte characters become unibyte
  2013-05-11 17:17   ` Eli Zaretskii
@ 2013-05-11 21:44     ` Richard Stallman
  2013-05-12  2:51       ` Eli Zaretskii
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Stallman @ 2013-05-11 21:44 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 14368

    Can you reproduce it starting with "emacs -Q"?

Yes.  I type

emacs -Q
C-\ latin-1-postfix RET
a ' C-a

and it fails


* bug#14368: 24.3.50; Big screw: multibyte characters become unibyte
  2013-05-11 21:44     ` Richard Stallman
@ 2013-05-12  2:51       ` Eli Zaretskii
  2013-05-12 16:04         ` Eli Zaretskii
  0 siblings, 1 reply; 15+ messages in thread
From: Eli Zaretskii @ 2013-05-12  2:51 UTC (permalink / raw)
  To: rms; +Cc: 14368

> Date: Sat, 11 May 2013 17:44:27 -0400
> From: Richard Stallman <rms@gnu.org>
> CC: 14368@debbugs.gnu.org
> 
>     Can you reproduce it starting with "emacs -Q"?
> 
> Yes.  I type
> 
> emacs -Q
> C-\ latin-1-postfix RET
> a ' C-a
> 
> and it fails

It doesn't fail for me, with yesterday's trunk.  C-a just moves to the
beginning of the line, as expected.

Wait, I can reproduce this in a TTY session (the above was a GUI
session).  I will try to look into it.






* bug#14368: 24.3.50; Big screw: multibyte characters become unibyte
  2013-05-12  2:51       ` Eli Zaretskii
@ 2013-05-12 16:04         ` Eli Zaretskii
  2013-05-13 15:50           ` Stefan Monnier
  2013-05-23 17:28           ` Stefan Monnier
  0 siblings, 2 replies; 15+ messages in thread
From: Eli Zaretskii @ 2013-05-12 16:04 UTC (permalink / raw)
  To: Stefan Monnier, Kenichi Handa; +Cc: 14368, rms

> Date: Sun, 12 May 2013 05:51:35 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 14368@debbugs.gnu.org
> 
> > Date: Sat, 11 May 2013 17:44:27 -0400
> > From: Richard Stallman <rms@gnu.org>
> > CC: 14368@debbugs.gnu.org
> > 
> >     Can you reproduce it starting with "emacs -Q"?
> > 
> > Yes.  I type
> > 
> > emacs -Q
> > C-\ latin-1-postfix RET
> > a ' C-a
> > 
> > and it fails
> 
> It doesn't fail for me, with yesterday's trunk.  C-a just moves to the
> beginning of the line, as expected.
> 
> Wait, I can reproduce this in a TTY session (the above was a GUI
> session).  I will try to look into it.

I found the reason, but I don't know enough about quail or input
decoding to suggest a solution.

The reason seems to be this changeset:

  112000: Stefan Monnier 2013-03-11 * src/keyboard.c: Move keyboard decoding to read_key_sequence.

The problem is that we now decode all input that comes from quail
(read_char calls input-method-function, and then read_decoded_char
decodes the result).

However, quail seems to work by deleting some characters from the
buffer, and then reinserting them, possibly after translation, as
instructed by the additional characters you type.  In this case,
typing "a '" inserts á, and quail then waits for another character.
Typing C-a at this point removes á from the buffer, and then sends as
input 2 events: a self-inserting character whose code is 225 decimal
(that's á), followed by the code 1, which is C-a.  (I don't know if
this is how quail is supposed to work; what I described is what I saw
in the debugger.  Perhaps Handa-san could comment on that.)

What happens next is that read_decoded_char attempts to decode 225,
which will cause different results depending on the current keyboard
encoding: on GNU/Linux, we get an 8-bit raw byte \341 (that's octal
for 225), while on Windows with cp862 as the keyboard encoding, I get
ß.  C-a is executed as expected, but the net result is that á was
replaced by something else.
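
The effect described above can be imitated outside Emacs. The sketch below uses Python's codecs as a stand-in for keyboard decoding; the helper name `decode_like_keyboard` and the octal fallback are assumptions made for illustration, not Emacs's actual code path.

```python
# Illustrative only: Python codecs standing in for Emacs keyboard decoding.
code = 225                 # event code quail sends for a-with-acute
raw = bytes([code])        # the same number, treated as one undecoded byte

def decode_like_keyboard(data: bytes, coding: str) -> str:
    """Decode data; render undecodable bytes as octal escapes (assumed form)."""
    try:
        return data.decode(coding)
    except UnicodeDecodeError:
        return "".join("\\%o" % b for b in data)

utf8_result = decode_like_keyboard(raw, "utf-8")    # lone 0xE1 is not valid UTF-8
cp862_result = decode_like_keyboard(raw, "cp862")   # 0xE1 maps to a real character

print(utf8_result, cp862_result)
```

Under UTF-8 the lone byte cannot be decoded and falls back to \341, while under cp862 it decodes to ß, which matches the two outcomes reported above.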

I'm not sure how to fix this cleanly.  One way would be to get quail
to encode the character events it sends, but then we have problems
with un-encodable characters.  Another way would be to somehow detect
that the character comes from quail and refrain from decoding it,
although I always thought that one of the goals of revision 112000 was
precisely to _allow_ decoding characters coming from quail.
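
The second option can be pictured with a toy model. This is not Emacs code and not the patch that was eventually installed; the event shape `(code, from_input_method)` and the function `read_events` are hypothetical names used only for this sketch.

```python
from typing import List, Tuple

# Hypothetical event shape for this sketch: (code, from_input_method).
Event = Tuple[int, bool]

def read_events(events: List[Event], coding: str, skip_decoded: bool) -> str:
    """Turn events into text, optionally leaving input-method events undecoded."""
    out: List[str] = []
    pending = bytearray()          # raw bytes waiting to be decoded together

    def flush() -> None:
        if pending:
            # Undecodable bytes are kept visible as backslash escapes.
            out.append(bytes(pending).decode(coding, errors="backslashreplace"))
            pending.clear()

    for code, from_input_method in events:
        if from_input_method and skip_decoded:
            flush()
            out.append(chr(code))  # already a character: use it as-is
        else:
            pending.append(code)   # treat as a raw keyboard byte
    flush()
    return "".join(out)

# The two events from the report: a-with-acute (225), then C-a (1).
quail_events = [(225, True), (1, True)]
print(repr(read_events(quail_events, "utf-8", skip_decoded=False)))
print(repr(read_events(quail_events, "utf-8", skip_decoded=True)))
```

With `skip_decoded=False` the model reproduces the symptom (225 is decoded as a stray byte); with `skip_decoded=True` the character survives, while genuine terminal bytes are still decoded as before.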

Stefan, can you take a look, please?







* bug#14368: 24.3.50; Big screw: multibyte characters become unibyte
  2013-05-12 16:04         ` Eli Zaretskii
@ 2013-05-13 15:50           ` Stefan Monnier
  2013-05-23 17:28           ` Stefan Monnier
  1 sibling, 0 replies; 15+ messages in thread
From: Stefan Monnier @ 2013-05-13 15:50 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 14368, rms

> The reason seems to be this changeset:
>   112000: Stefan Monnier 2013-03-11 * src/keyboard.c: Move keyboard
>   decoding to read_key_sequence.
> The problem is that we now decode all input that comes from quail
> (read_char calls input-method-function, and then read_decoded_char
> decodes the result).
[...]
> Stefan, can you take a look, please?

Your analysis makes a lot of sense.  I'll take a look as soon as I can,
but this week is pretty busy.


        Stefan






* bug#14368: 24.3.50; Big screw: multibyte characters become unibyte
  2013-05-12 16:04         ` Eli Zaretskii
  2013-05-13 15:50           ` Stefan Monnier
@ 2013-05-23 17:28           ` Stefan Monnier
  2013-05-23 18:55             ` Eli Zaretskii
  1 sibling, 1 reply; 15+ messages in thread
From: Stefan Monnier @ 2013-05-23 17:28 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 14368, rms

> I found the reason, but I don't know enough about quail or input
> decoding to suggest a solution.

I just installed a patch which should fix it.

During development, I got some weird behavior and a few crashes, but
I was neither able to track them down, nor to reproduce them now, so
I installed the code as is.  Let's hope for the best.


        Stefan






* bug#14368: 24.3.50; Big screw: multibyte characters become unibyte
  2013-05-23 17:28           ` Stefan Monnier
@ 2013-05-23 18:55             ` Eli Zaretskii
  0 siblings, 0 replies; 15+ messages in thread
From: Eli Zaretskii @ 2013-05-23 18:55 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 14368, rms

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Kenichi Handa <handa@gnu.org>,  rms@gnu.org,  14368@debbugs.gnu.org
> Date: Thu, 23 May 2013 13:28:01 -0400
> 
> > I found the reason, but I don't know enough about quail or input
> > decoding to suggest a solution.
> 
> I just installed a patch which should fix it.

Thanks, it seems to work well for me, both on Windows and on
GNU/Linux.

> During development, I got some weird behavior and a few crashes, but
> I was neither able to track them down, nor to reproduce them now, so
> I installed the code as is.  Let's hope for the best.

Didn't crash for me.






* bug#14368: 24.3.50; Big screw: multibyte characters become unibyte
  2013-05-08  8:21 bug#14368: 24.3.50; Big screw: multibyte characters become unibyte Richard Stallman
  2013-05-11 16:49 ` Richard Stallman
@ 2013-05-24 14:51 ` Handa Kenichi
  2013-05-24 15:31   ` Eli Zaretskii
  2013-05-24 15:34   ` bug#14368: 24.3.50; Big screw: multibyte characters become unibyte Stefan Monnier
  1 sibling, 2 replies; 15+ messages in thread
From: Handa Kenichi @ 2013-05-24 14:51 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 14368, rms

I'm very sorry for the late response on this matter.

In article <83a9o09oc1.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org>
writes:
> However, quail seems to work by deleting some characters from the
> buffer, and then reinserting them, possibly after translation, as
> instructed by the additional characters you type.  In this case,
> typing "a '" inserts á, and quail then waits for another character.
> Typing C-a at this point removes á from the buffer, and then sends as
> input 2 events: a self-inserting character whose code is 225 decimal
> (that's á), followed by the code 1, which is C-a.  (I don't know if
> this is how quail is supposed to work; what I described is what I saw
> in the debugger.  Perhaps Handa-san could comment on that.)

Your analysis is correct.  Quail is an event translator.  It
is designed not to insert a character directly but to
generate proper character events.

> I'm not sure how to fix this cleanly.  One way would be to get quail
> to encode the character events it sends, but then we have problems
> with un-encodable characters.

That is possible, but I don't think it is the right thing to do.
Making quail encode characters and then making the caller
re-decode them looks very silly.

> Another way would be to somehow detect
> that the character comes from quail and refrain from decoding it,

It's not only the quail problem.  Currently the handling of
unread-command-events is broken; this does not work correctly on
terminal
   (setq unread-command-events '(?À))

---
Kenichi Handa
handa@gnu.org






* bug#14368: 24.3.50; Big screw: multibyte characters become unibyte
  2013-05-24 14:51 ` Handa Kenichi
@ 2013-05-24 15:31   ` Eli Zaretskii
  2013-05-25  1:05     ` bug#14368: 24.3.50; Big screw: multibyte characters become Handa Kenichi
  2013-05-24 15:34   ` bug#14368: 24.3.50; Big screw: multibyte characters become unibyte Stefan Monnier
  1 sibling, 1 reply; 15+ messages in thread
From: Eli Zaretskii @ 2013-05-24 15:31 UTC (permalink / raw)
  To: Handa Kenichi; +Cc: 14368, rms

> From: Handa Kenichi <handa@gnu.org>
> Cc: monnier@iro.umontreal.ca, rms@gnu.org, 14368@debbugs.gnu.org
> Date: Fri, 24 May 2013 10:51:20 -0400
> 
> Currently the handling of unread-command-events is broken; this does
> not work correctly on terminal
>
>  (setq unread-command-events '(?À))

Did you try with the latest trunk?  If so, please explain what doesn't
work with this, because it seems to work for me after Stefan's changes
yesterday (I get À inserted).







* bug#14368: 24.3.50; Big screw: multibyte characters become unibyte
  2013-05-24 14:51 ` Handa Kenichi
  2013-05-24 15:31   ` Eli Zaretskii
@ 2013-05-24 15:34   ` Stefan Monnier
  1 sibling, 0 replies; 15+ messages in thread
From: Stefan Monnier @ 2013-05-24 15:34 UTC (permalink / raw)
  To: Handa Kenichi; +Cc: 14368, rms

> It's not only the quail problem.  Currently the handling of
> unread-command-events is broken; this does not work correctly on
> terminal
>    (setq unread-command-events '(?À))

It should work now, with the patch I installed yesterday,


        Stefan






* bug#14368: 24.3.50; Big screw: multibyte characters become
  2013-05-24 15:31   ` Eli Zaretskii
@ 2013-05-25  1:05     ` Handa Kenichi
  2013-05-25 19:22       ` Richard Stallman
  0 siblings, 1 reply; 15+ messages in thread
From: Handa Kenichi @ 2013-05-25  1:05 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 14368, rms

In article <838v34s8cj.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org>
writes:
> > Currently the handling of unread-command-events is broken; this does
> > not work correctly on terminal
> >
> >  (setq unread-command-events '(?À))

> Did you try with the latest trunk?

No. I wrote that before reading Stefan's mail.  I've just tried the
latest code and confirmed that it was fixed.  Thank you.


* bug#14368: 24.3.50; Big screw: multibyte characters become
  2013-05-25  1:05     ` bug#14368: 24.3.50; Big screw: multibyte characters become Handa Kenichi
@ 2013-05-25 19:22       ` Richard Stallman
  2013-05-26  0:58         ` Stefan Monnier
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Stallman @ 2013-05-25 19:22 UTC (permalink / raw)
  To: Handa Kenichi; +Cc: 14368

I too observe the bug to be fixed now.  Thanks for fixing it.


* bug#14368: 24.3.50; Big screw: multibyte characters become
  2013-05-25 19:22       ` Richard Stallman
@ 2013-05-26  0:58         ` Stefan Monnier
  0 siblings, 0 replies; 15+ messages in thread
From: Stefan Monnier @ 2013-05-26  0:58 UTC (permalink / raw)
  To: Richard Stallman; +Cc: 14368-done

> I too observe the bug to be fixed now.  Thanks for fixing it.

Thanks for confirming,


        Stefan






