bug#5235: 23.1; Unibyte keyboard input problem

unofficial mirror of bug-gnu-emacs@gnu.org 

* bug#5235: 23.1; Unibyte keyboard input problem
@ 2009-12-16 21:17 Tomasz Zbrożek
  2009-12-17 16:47 ` Jason Rumney
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Tomasz Zbrożek @ 2009-12-16 21:17 UTC (permalink / raw)
  To: bug-gnu-emacs

Hi,
In Emacs 23.1, in unibyte mode (emacs --unibyte) with the windows-1250
coding system, I can't type Polish characters with the right Alt key.  For
example, right Alt + 'a' puts ^E on the screen.  In Emacs 22.3 it works fine
(I see the Polish character 'ą'), but there is a different problem there: the
buffer is displayed in iso-8859 even if I configure the Language Environment
to use windows-1250.  In 23.1 with such a Language Environment (configured to
use cp1250), Polish special characters read from a file are displayed
correctly, but I can't type them using the right Alt key (or even with the
polish-slash input method).

I checked it on GNU/Linux and also on MS Windows XP (pure NT-Emacs and
EmacsW32); the problem is the same on both.

Regards
Tomek

In GNU Emacs 23.1.1 (i686-pc-linux-gnu, GTK+ Version 2.12.9)
 of 2009-08-15 on scianagoryczy
Windowing system distributor `The X.Org Foundation', version 11.0.10400090
configured using `configure  '--with-x-toolkit=gtk''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: pl_PL.UTF-8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default-enable-multibyte-characters: t

Major mode: Lisp Interaction

Minor modes in effect:
  show-paren-mode: t
  gud-tooltip-mode: t
  global-hl-line-mode: t
  global-auto-revert-mode: t
  display-time-mode: t
  auto-insert-mode: t
  yas/minor-mode: t
  tooltip-mode: t
  mouse-wheel-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  global-auto-composition-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <menu-bar> <help-menu> 
<send-emacs-bug-report>

Recent messages:
Loading /home/tomek/emacs/color-theme-6.6.0/themes/color-theme-library.el 
(source)...done
Loading autoinsert...done
Loading time...done
Loading autorevert...done
Loading hl-line...done
Loading gud...done
Loading paren...done
Loading which-func...done
For information about GNU Emacs and the GNU system, type C-h C-a.
call-interactively: Text is read-only








* Re: bug#5235: 23.1; Unibyte keyboard input problem
  2009-12-16 21:17 bug#5235: 23.1; Unibyte keyboard input problem Tomasz Zbrożek
@ 2009-12-17 16:47 ` Jason Rumney
  2009-12-17 19:25   ` Tomasz Zbrożek
  2010-02-26 20:42 ` Tomasz Zbrożek
  2020-09-14 13:59 ` Lars Ingebrigtsen
  2 siblings, 1 reply; 23+ messages in thread
From: Jason Rumney @ 2009-12-17 16:47 UTC (permalink / raw)
  To: Tomasz Zbrożek, 5235; +Cc: bug-gnu-emacs

Tomasz Zbrożek wrote:
> Hi,
> In Emacs 23.1, in unibyte mode (emacs --unibyte)
Does it work as expected if you remove the --unibyte?






* Re: bug#5235: 23.1; Unibyte keyboard input problem
  2009-12-17 16:47 ` Jason Rumney
@ 2009-12-17 19:25   ` Tomasz Zbrożek
  2009-12-24  3:40     ` Stefan Monnier
  0 siblings, 1 reply; 23+ messages in thread
From: Tomasz Zbrożek @ 2009-12-17 19:25 UTC (permalink / raw)
  To: Jason Rumney; +Cc: 5235, bug-gnu-emacs

Thanks for the reply!
In multibyte mode (I mean without --unibyte) Emacs 23.1 works great for me :)
I'll try to explain why I need unibyte mode.  I'm the maintainer of C/C++
source code whose comments are encoded in cp1250 (Polish) but whose string
literals are encoded in cp852.  So I have two different code pages in one
source file.  This is old source code: it was developed on Windows (that's why
the comments are in cp1250) but is compiled to run on MS-DOS (that's why the
strings are in cp852).  Of course, in multibyte mode I can write in either
code page (for example by reloading the file with C-x RET r), but when I
select cp1250 to save the buffer, Emacs often tells me that some cp852-encoded
characters cannot be saved in cp1250 and asks me to choose between raw-text,
no-conversion and emacs-mule.  In this situation I have to enter "cp1250" to
force Emacs to save the buffer in cp1250, and I do not want to type "cp1250"
again and again every time I save.  Additionally, I'm not sure what exactly
happens to the cp852-encoded characters when I force saving in cp1250 (as far
as I can tell, both the cp1250 and the cp852 characters come out OK).
That's why I decided to use unibyte mode.  But, as I described, typing Polish
characters in unibyte mode is a problem in Emacs 23.1.
In fact I want to switch modes while Emacs is running, I mean not with
--unibyte but by calling set-buffer-multibyte with nil when a cpp file is
loaded, but it seems this function does not work correctly, or I do not
understand something.
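
As an illustration of why such a file cannot be decoded with a single coding
system, the following sketch shows the byte values involved.  It assumes a
multibyte Emacs session (23 or later) in which the cp1250, cp852 and
iso-8859-2 coding systems are defined; the byte values in the comments are
those of the published code page tables.

```elisp
;; Illustration only: the same Polish letter is a different byte in each
;; of the code pages mentioned in this thread.
(encode-coding-string "ą" 'cp1250)      ; one byte, #xB9 (octal \271)
(encode-coding-string "ą" 'cp852)       ; one byte, #xA5 (octal \245)
(encode-coding-string "ą" 'iso-8859-2)  ; one byte, #xB1 (octal \261)

;; The byte \210 (#x88) is the letter "ł" when read as cp852.
(decode-coding-string "\210" 'cp852)    ; => "ł"
```

A byte such as \210 inside a string literal therefore only makes sense as
cp852, while the bytes in the comments only make sense as cp1250.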

Here is how I configure Language Environment:
 '(current-language-environment "Polish")
 '(language-info-custom-alist (quote (("Polish" (charset cp1250) 
(coding-system cp1250) (coding-priority cp1250 cp852) (nonascii-translation . 
cp1250) (unibyte-display . cp1250)))))
'(unibyte-display-via-language-environment t)

--
tomek


* bug#5235: 23.1; Unibyte keyboard input problem
  2009-12-17 19:25   ` Tomasz Zbrożek
@ 2009-12-24  3:40     ` Stefan Monnier
  2009-12-24 15:21       ` Jason Rumney
  0 siblings, 1 reply; 23+ messages in thread
From: Stefan Monnier @ 2009-12-24  3:40 UTC (permalink / raw)
  To: Tomasz Zbrożek; +Cc: 5235, bug-gnu-emacs

> In multibyte mode (I mean no --unibyte) Emacs 23.1 works great for me :)

--unibyte is deprecated, so rather than try and "fix" it, we want to fix
the problem that caused you to use --unibyte.

> I'll try to explain why I need unibyte mode. I'm maintainer of a C/C++
> source  code which has comments coded in cp1250 (Polish language) but
> strings in code  are coded in cp852. So I have two different code
> pages in source code file.  This is old source code and it was
> developed in Windows (that's why comments  are in cp1250) but is
> compiled to work on MS-DOS (that's why strings are  coded in cp852).

So what happens if you read those files as binary (i.e. C-x RET
r binary RET)?


        Stefan
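
A minimal sketch of trying this from Lisp, assuming that binding
`coding-system-for-read' around `find-file' is an acceptable stand-in for
C-x RET r binary RET when the file is not yet visited.  The function name
`my-visit-as-binary' is hypothetical, not something Emacs provides.

```elisp
;; Hypothetical helper (not part of Emacs): visit FILE without any
;; character decoding, so the bytes of the file are kept as they are.
(defun my-visit-as-binary (file)
  "Visit FILE with no character decoding."
  (interactive "fVisit file as binary: ")
  (let ((coding-system-for-read 'binary))
    (find-file file)))
```

The built-in command `find-file-literally' is a related option; it also
skips decoding, and additionally avoids other automatic processing of the
file.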







* bug#5235: 23.1; Unibyte keyboard input problem
  2009-12-24  3:40     ` Stefan Monnier
@ 2009-12-24 15:21       ` Jason Rumney
  2009-12-24 19:27         ` Eli Zaretskii
                           ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Jason Rumney @ 2009-12-24 15:21 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Tomasz Zbrożek, 5235, bug-gnu-emacs

Stefan Monnier wrote:
>> I'll try to explain why I need unibyte mode. I'm maintainer of a C/C++
>> source  code which has comments coded in cp1250 (Polish language) but
>> strings in code  are coded in cp852. So I have two different code
>> pages in source code file.  This is old source code and it was
>> developed in Windows (that's why comments  are in cp1250) but is
>> compiled to work on MS-DOS (that's why strings are  coded in cp852).
>>     
>
> So what happens if you read those files as binary (i.e. C-x RET
> r binary RET)?
>   

At best, he'd end up silently screwing up his files even further, with 
cp1250, cp852 and now utf-8 encoded characters in them.  More likely he 
would still get prompted when saving, just as if he'd used cp1250 or 
cp852 to read them.

The problem here is the files, not Emacs.  Basically the reason for 
using unibyte is that it allows the user to bury their head in the sand 
and pretend the problem does not exist.

I work on similar files in my day job, with Japanese comments in 
ShiftJIS and Chinese comments in GB2312. An easy method of fixing such 
files would be nice, but the best I can think of would be to provide a 
recode-region function, which would still be too much manual work to be 
worth it to me given that I can barely make sense of the Japanese 
comments and can't make any sense of the Chinese ones. The original 
poster might be more motivated to make use of such a function if it 
existed though.


* bug#5235: 23.1; Unibyte keyboard input problem
  2009-12-24 15:21       ` Jason Rumney
@ 2009-12-24 19:27         ` Eli Zaretskii
  2009-12-25 11:03         ` Tomasz Zbrożek
  2009-12-29 15:43         ` Stefan Monnier
  2 siblings, 0 replies; 23+ messages in thread
From: Eli Zaretskii @ 2009-12-24 19:27 UTC (permalink / raw)
  To: Jason Rumney, 5235

> Date: Thu, 24 Dec 2009 23:21:41 +0800
> From: Jason Rumney <jasonr@gnu.org>
> Cc: Tomasz Zbrożek <scianagoryczy@wp.pl>,
> 	5235@emacsbugs.donarmstrong.com, bug-gnu-emacs@gnu.org
> 
> The problem here is the files, not Emacs.

I'd say, more accurately: the problem is that Emacs does not support
such use-cases.  It would be nice if we did: having comments in one
encoding and strings in another is not such a corner case.

> I work on similar files in my day job, with Japanese comments in 
> ShiftJIS and Chinese comments in GB2312. An easy method of fixing such 
> files would be nice, but the best I can think of would be to provide a 
> recode-region function

We would also need a way to encode different regions differently.
Perhaps adding special text properties to guide the encoding process
would be a way of doing that (we already have charset properties for
similar reasons).


* bug#5235: 23.1; Unibyte keyboard input problem
  2009-12-24 15:21       ` Jason Rumney
  2009-12-24 19:27         ` Eli Zaretskii
@ 2009-12-25 11:03         ` Tomasz Zbrożek
  2009-12-25 11:23           ` Jason Rumney
  2009-12-25 20:42           ` Eli Zaretskii
  2009-12-29 15:43         ` Stefan Monnier
  2 siblings, 2 replies; 23+ messages in thread
From: Tomasz Zbrożek @ 2009-12-25 11:03 UTC (permalink / raw)
  To: Jason Rumney; +Cc: 5235, bug-gnu-emacs

The multibyte mode and its prompts for the correct code page are not the
problem.  I think that is definitely CORRECT behaviour, and it's not what I
wanted to report.
I think that the solution for the problem of two code pages in one file is
unibyte mode.

I opened this bug to get an answer to the question: why, in unibyte mode,
when I try to type in cp1250, do I get codes like ^E instead of the proper
characters in the buffer?  This behaviour is not correct, even compared to
the previous Emacs version (22.3).  So my question is: how can I fix this
strange keyboard input behaviour in unibyte mode?

--
tomek


* bug#5235: 23.1; Unibyte keyboard input problem
  2009-12-25 11:03         ` Tomasz Zbrożek
@ 2009-12-25 11:23           ` Jason Rumney
  2009-12-25 11:43             ` Tomasz Zbrożek
  2009-12-26 12:45             ` Tomasz Zbrożek
  2009-12-25 20:42           ` Eli Zaretskii
  1 sibling, 2 replies; 23+ messages in thread
From: Jason Rumney @ 2009-12-25 11:23 UTC (permalink / raw)
  To: Tomasz Zbrożek; +Cc: 5235, bug-gnu-emacs

Tomasz Zbrożek wrote:
> I started this bug-case to get the answer to the question: why in unibyte mode 
> when I try to write in cp1250 I get codes like ^E instead of proper chars in 
> buffer ?

Keyboard input on Windows is Unicode in 23.1.  In previous versions it 
was in the system default codepage.
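
One arithmetic observation fits this explanation; it is offered as a
plausible reading, not something confirmed in the thread.  'ą' is Unicode
code point U+0105, and a unibyte buffer can only hold values below 256.  If
only the low eight bits of the code point survive, the result is 5, which
Emacs displays as ^E.

```elisp
;; "ą" is U+0105.  Keeping only the low eight bits of that code point
;; leaves 5, and character 5 is Control-E.
(logand #x105 #xff)          ; => 5
(text-char-description 5)    ; => "^E"
```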



>  This behaviour is not correct even when comparing to previous Emacs 
> version (22.3). So, my question is how to fix this strange keyboard input 
> behaviour in unibyte mode ?
>   

What is "correct" is undefined in unibyte mode, since unibyte deals with 
bytes, not characters.








* bug#5235: 23.1; Unibyte keyboard input problem
  2009-12-25 11:23           ` Jason Rumney
@ 2009-12-25 11:43             ` Tomasz Zbrożek
  2009-12-26 12:45             ` Tomasz Zbrożek
  1 sibling, 0 replies; 23+ messages in thread
From: Tomasz Zbrożek @ 2009-12-25 11:43 UTC (permalink / raw)
  To: Jason Rumney; +Cc: 5235, bug-gnu-emacs

On Friday 25 December 2009 12:23:42 Jason Rumney wrote:
> Tomasz Zbrożek wrote:
> > I started this bug-case to get the answer to the question: why in unibyte
> > mode when I try to write in cp1250 I get codes like ^E instead of proper
> > chars in buffer ?
>
> Keyboard input on Windows is Unicode in 23.1.  In previous versions it
> was in the system default codepage.
Is this why I get the '^E' code instead of 'ą' when I press right Alt + 'a'
in unibyte mode with the code page set to cp1250 (Emacs version 23.1)?
I checked it on Windows and GNU/Linux and it behaves the same on both.

Is there a way to change the Emacs configuration so that I get proper
Polish characters when typing in unibyte mode?



-- 
tomek







* bug#5235: 23.1; Unibyte keyboard input problem
  2009-12-25 11:03         ` Tomasz Zbrożek
  2009-12-25 11:23           ` Jason Rumney
@ 2009-12-25 20:42           ` Eli Zaretskii
  1 sibling, 0 replies; 23+ messages in thread
From: Eli Zaretskii @ 2009-12-25 20:42 UTC (permalink / raw)
  To: Tomasz Zbrożek, 5235

> From: Tomasz Zbrożek <scianagoryczy@wp.pl>
> Date: Fri, 25 Dec 2009 12:03:29 +0100
> Cc: 5235@emacsbugs.donarmstrong.com, bug-gnu-emacs@gnu.org
> 
> The multibyte mode and its prompts for correct codepage is not problem. I 
> think it's definitely CORRECT behaviour and it's not the case I wanted to 
> submit  to you.  
> I think that solution for the problem with two code pages in one file is 
> unibyte mode.

I think Emacs developers are much more motivated to improve the
multibyte mode than to fix the unibyte mode.  I cannot speak for the
head maintainers, but that is certainly my opinion: the unibyte mode
should simply die, as a mode for interactive editing.

You received several suggestions for trying things in multibyte mode.
Perhaps you could try them and see if they allow you to edit your
programs without screwing up the cp852 characters.  If something is
still wrong, please describe the problems here: we are much more
likely to find a solution for multibyte mode editing than for unibyte.


* bug#5235: 23.1; Unibyte keyboard input problem
  2009-12-25 11:23           ` Jason Rumney
  2009-12-25 11:43             ` Tomasz Zbrożek
@ 2009-12-26 12:45             ` Tomasz Zbrożek
  2009-12-26 14:30               ` Eli Zaretskii
  1 sibling, 1 reply; 23+ messages in thread
From: Tomasz Zbrożek @ 2009-12-26 12:45 UTC (permalink / raw)
  To: Jason Rumney; +Cc: 5235

[-- Attachment #1: Type: text/plain, Size: 1660 bytes --]


>I think Emacs developers are much more motivated to improve the
>multibyte mode than to fix the unibyte mode.  I cannot speak for the
>head maintainers, but that is certainly my opinion: the unibyte mode
>should simply die, as a mode for interactive editing.
ok, I will not use unibyte mode :)

>You received several suggestions for trying things in multibyte mode.
>Perhaps you could try them and see if they allow you to edit your
>programs without screwing up the cp852 characters.  If something is
>still wrong, please describe the problems here: we are much more
>likely to find a solution for multibyte mode editing than for unibyte.
So my only problem (in multibyte mode) is the annoying question about a safe
coding system when saving a buffer.  I attach a new screenshot:
- in the top window you see my file, which originally has some cp1250
characters and also some cp852 characters,
- in the middle window you see that I have cp1250 set for saving this buffer,
- and below there is a buffer with the information that the \210 character
(originally cp852) cannot be encoded in cp1250 (cp1250 is my code page for
saving, but of course after saving, this character should still be
cp852-encoded in the file, and it will be when I force cp1250, so this is OK).

I can't find any way to stop Emacs from prompting me to select a code page.
I understand that Emacs treats this as an error (in its opinion the \210
character is wrong), but I would like to tell it somehow that cp1250 is safe.
Neither "-*- coding: cp1250 -*-" nor the modify-coding-system-alist function
is a solution.

My only question is: how can I configure Emacs to skip this code page
selection in such a situation?

I would be thankful for help!

[-- Attachment #2: emacs.png --]
[-- Type: image/png, Size: 160570 bytes --]


* bug#5235: 23.1; Unibyte keyboard input problem
  2009-12-26 12:45             ` Tomasz Zbrożek
@ 2009-12-26 14:30               ` Eli Zaretskii
  0 siblings, 0 replies; 23+ messages in thread
From: Eli Zaretskii @ 2009-12-26 14:30 UTC (permalink / raw)
  To: Tomasz Zbrożek; +Cc: 5235

> From: Tomasz Zbrożek <scianagoryczy@wp.pl>
> Date: Sat, 26 Dec 2009 13:45:53 +0100
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,
>  5235@emacsbugs.donarmstrong.com,
>  Eli Zaretskii <eliz@gnu.org>
> 
> encode \210 char (originally cp852) to cp1250 (because cp1250 is my codepage 
> to save, but of course after saving this char in the file should be cp852 
> coded and it will be when I force cp1250 - this is ok)
> 
> I can't find any way to force emacs not to prompt me with codepage selection, 
> I understand emacs treats it like an error (in his opinion \210 char is 
> wrong) but I would like to set somehow that cp1250 is safe,
> "-*- coding: cp1250 -*-" or modify-coding-system-alist function is not 
> solution
> 
> my only question is: how to configure emacs to omit this codepage selection in 
> such situation? 

Does it help to evaluate the expression below?

   (aset latin-extra-code-table ?\210 t)

Please do that _before_ visiting files which have the \210 character.
Then try to save such a file and see if this helps.

The above only handles the \210 character, so please don't try to use
any other characters whose code is between 128 and 160.  If this
works, it is trivial to cover the entire range, of course.

If the above does not help with cp1250, please try the same with
latin-2 instead (you will have to modify the `coding:' cookie for this
to work).
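
Covering the entire range mentioned above could look like the following
sketch.  It only generalizes the single `aset' call to the codes 128 through
159; whether doing so removes the prompt for cp1250 is not established here.

```elisp
;; Mark every code from 128 up to (but not including) 160 as acceptable
;; in `latin-extra-code-table', instead of only \210.
(let ((code 128))
  (while (< code 160)
    (aset latin-extra-code-table code t)
    (setq code (1+ code))))
```

As with the single-character form, this would need to be evaluated before
visiting the affected files.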








* bug#5235: 23.1; Unibyte keyboard input problem
@ 2009-12-26 17:03 Tomasz Zbrożek
  2009-12-26 17:52 ` Eli Zaretskii
  0 siblings, 1 reply; 23+ messages in thread
From: Tomasz Zbrożek @ 2009-12-26 17:03 UTC (permalink / raw)
  To: Eli Zaretskii, 5235

>Does it help to evaluate the expression below?
>
>  (aset latin-extra-code-table ?\210 t)
>
>Please do that _before_ visiting files which have the \210 character.
>Then try to save such a file and see if this helps.
>
>The above only handles the \210 character, so please don't try to use
>any other characters whose code is between 128 and 160.  If this
>works, it is trivial to cover the entire range, of course.
>
>If the above does not help with cp1250, please try the same with
>latin-2 instead (you will have to modify the `coding:' cookie for this
>to work).

(aset latin-extra-code-table ?\210 t) does not help with cp1250.  And when I
save the buffer in latin-2 there is no need for latin-extra-code-table at
all, because in that case there is no problem encoding the \210 character.








* bug#5235: 23.1; Unibyte keyboard input problem
  2009-12-26 17:03 Tomasz Zbrożek
@ 2009-12-26 17:52 ` Eli Zaretskii
  0 siblings, 0 replies; 23+ messages in thread
From: Eli Zaretskii @ 2009-12-26 17:52 UTC (permalink / raw)
  To: Tomasz Zbrożek; +Cc: 5235

> From: Tomasz Zbrożek <scianagoryczy@wp.pl>
> Date: Sat, 26 Dec 2009 18:03:27 +0100
> 
> >Does it help to evaluate the expression below?
> >
> >  (aset latin-extra-code-table ?\210 t)
> >
> >Please do that _before_ visiting files which have the \210 character.
> >Then try to save such a file and see if this helps.
> >
> >The above only handles the \210 character, so please don't try to use
> >any other characters whose code is between 128 and 160.  If this
> >works, it is trivial to cover the entire range, of course.
> >
> >If the above does not help with cp1250, please try the same with
> >latin-2 instead (you will have to modify the `coding:' cookie for this
> >to work).
> 
> (aset latin-extra-code-table ?\210 t) does not help with cp1250 and when I try 
> to save buffer in latin-2 there is no need to use latin-extra-code-table 
> because in this case there is no problem with \210 char encoding

So does this mean using latin-2 solves your original problem as well?
That is, are you able to edit the source files without the annoying
questions from Emacs when you save the files?








* bug#5235: 23.1; Unibyte keyboard input problem
@ 2009-12-26 19:19 Tomasz Zbrożek
  2009-12-26 21:24 ` Eli Zaretskii
  0 siblings, 1 reply; 23+ messages in thread
From: Tomasz Zbrożek @ 2009-12-26 19:19 UTC (permalink / raw)
  To: Eli Zaretskii, 5235

>So does this mean using latin-2 solves your original problem as well?
>That is, are you able to edit the source files without the annoying
>questions from Emacs when you save the files?

No, latin-2 does not solve my problem :)  I do not want to read/write the
file in latin-2 but in cp1250!
But yes, when saving in latin-2 there is no problem (I mean, no annoying
question) of the kind that exists when saving with cp1250.  I understand
that's because the "conversion tables" are different for these two code
pages, and e.g. the \210 code is converted differently.

Once again: I want to load my file as cp1250, edit it (typing Polish
characters and seeing them properly on screen) and save the buffer as cp1250.
(Of course all the strange characters that come from cp852 should be left
unchanged.)  If I load the file as cp1250 and then save the buffer as
latin-2, all the Polish characters that were originally in cp1250 will now be
in latin-2, and this is not what I want.


* bug#5235: 23.1; Unibyte keyboard input problem
  2009-12-26 19:19 Tomasz Zbrożek
@ 2009-12-26 21:24 ` Eli Zaretskii
  0 siblings, 0 replies; 23+ messages in thread
From: Eli Zaretskii @ 2009-12-26 21:24 UTC (permalink / raw)
  To: Tomasz Zbrożek; +Cc: 5235

> From: Tomasz Zbrożek <scianagoryczy@wp.pl>
> Date: Sat, 26 Dec 2009 20:19:38 +0100
> 
> >So does this mean using latin-2 solves your original problem as well?
> >That is, are you able to edit the source files without the annoying
> >questions from Emacs when you save the files?
> 
> No, latin-2 does not solve my problem:) I do not want to read/write file in 
> latin-2 but cp1250! 

Does the patch below give good results?

You will need to rebuild Emacs or manually load mule-cmds.elc, after
patching and compiling it.  Then set
select-safe-coding-system-respect-auto-coding to a non-nil value, and
see if the annoying question goes away while the files are saved
correctly without screwing up the cp852 characters.

Index: lisp/international/mule-cmds.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/international/mule-cmds.el,v
retrieving revision 1.386
diff -u -r1.386 mule-cmds.el
--- lisp/international/mule-cmds.el	9 Dec 2009 00:55:55 -0000	1.386
+++ lisp/international/mule-cmds.el	26 Dec 2009 21:21:17 -0000
@@ -807,6 +807,9 @@
     (set-window-configuration window-configuration)
     coding-system))
 
+(defvar select-safe-coding-system-respect-auto-coding nil
+  "If non-nil, always use coding system from coding cookies &c if possible.")
+
 (defun select-safe-coding-system (from to &optional default-coding-system
 				       accept-default-p file)
   "Ask a user to select a safe coding system from candidates.
@@ -976,7 +979,14 @@
 		(push (car elt) safe))
 	    (push (car elt) unsafe)))
 	(if safe
-	    (setq coding-system (car safe))))
+	    (setq coding-system (car safe))
+	  ;; If default-coding-system is in unsafe, and the user
+	  ;; insists, use it.
+	  (if (and select-safe-coding-system-respect-auto-coding
+		   default-coding-system
+		   (memq (caar default-coding-system) unsafe))
+	      (setq coding-system (caar default-coding-system)))))
+
 
       ;; If all the defaults failed, ask a user.
       (when (not coding-system)
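
With the patch applied and mule-cmds reloaded, turning the new behaviour on
would be a one-line setting.  This is a sketch that assumes the patched
mule-cmds.el is the one in use; the variable is introduced by the patch above
and is not defined otherwise.

```elisp
;; Only meaningful with the patch above applied: the variable is
;; introduced by that patch.
(setq select-safe-coding-system-respect-auto-coding t)
```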








* bug#5235: 23.1; Unibyte keyboard input problem
@ 2009-12-27 13:30 Tomasz Zbrożek
  2009-12-27 20:07 ` Eli Zaretskii
  0 siblings, 1 reply; 23+ messages in thread
From: Tomasz Zbrożek @ 2009-12-27 13:30 UTC (permalink / raw)
  To: Eli Zaretskii, 5235

Eli, it looks like your patch works OK :D  But...
With the default Polish environment setting (latin-2), when I have
-*- coding: cp1250 -*- in my file and I try to save the file with your
feature enabled, I get this message in the minibuffer:

Selected encoding iso-latin-2-unix disagrees with windows-1250-unix specified 
by file contents.  Really save (else edit coding cookies and try again)? (yes 
or no) 

When I answer yes, the text in the saved file is gibberish.

If I use modify-coding-system-alist instead of "-*- coding:", the result is
the same (gibberish), but there is no question.

But when I change the language environment a little, to:

 '(language-info-custom-alist (quote (("Polish" (coding-priority cp1250)))))

then everything is OK and your patch works fine (no question when saving, and
the encoding is OK)!  In this case I do not need to specify the coding system
with "-*- coding:" or modify-coding-system-alist.


* bug#5235: 23.1; Unibyte keyboard input problem
  2009-12-27 13:30 Tomasz Zbrożek
@ 2009-12-27 20:07 ` Eli Zaretskii
  2009-12-29 15:48   ` Stefan Monnier
  0 siblings, 1 reply; 23+ messages in thread
From: Eli Zaretskii @ 2009-12-27 20:07 UTC (permalink / raw)
  To: Tomasz Zbrożek; +Cc: 5235

> From: Tomasz Zbrożek <scianagoryczy@wp.pl>
> Date: Sun, 27 Dec 2009 14:30:47 +0100
> 
> Eli, it looks your patch works OK

Thanks for testing.

> :D But...
> On the default polish environment setting (latin-2), when I have -*- coding: 
> cp1250 -*- in my file and when I try to save file with your feature I get 
> such a message in minibuffer:
> 
> Selected encoding iso-latin-2-unix disagrees with windows-1250-unix specified 
> by file contents.  Really save (else edit coding cookies and try again)? (yes 
> or no) 

This is expected, I think.  I could make it honor the `coding' cookie
even in that case, but I'd like first to know if this kind of solution
is acceptable (below).

> But when I change a little bit language environment to:
> 
>  '(language-info-custom-alist (quote (("Polish" (coding-priority cp1250)))))
> 
> then everything is OK and your patch works fine (no question when saving and 
> coding is ok)!

Good.

Handa-san, could you please tell if you see anything wrong with this
semi-kludgey solution?

Stefan and Yidong: assuming that Handa-san has no objections, is it
okay to commit the patch I sent yesterday?








* bug#5235: 23.1; Unibyte keyboard input problem
  2009-12-24 15:21       ` Jason Rumney
  2009-12-24 19:27         ` Eli Zaretskii
  2009-12-25 11:03         ` Tomasz Zbrożek
@ 2009-12-29 15:43         ` Stefan Monnier
  2 siblings, 0 replies; 23+ messages in thread
From: Stefan Monnier @ 2009-12-29 15:43 UTC (permalink / raw)
  To: Jason Rumney; +Cc: Tomasz Zbrożek, 5235

>>> I'll try to explain why I need unibyte mode. I'm maintainer of a C/C++
>>> source  code which has comments coded in cp1250 (Polish language) but
>>> strings in code  are coded in cp852. So I have two different code
>>> pages in source code file.  This is old source code and it was
>>> developed in Windows (that's why comments  are in cp1250) but is
>>> compiled to work on MS-DOS (that's why strings are  coded in cp852).
>> So what happens if you read those files as binary (i.e. C-x RET
>> r binary RET)?
> At best, he'd end up silently screwing up his files even further, with
> cp1250, cp852 and now utf-8 encoded characters in them.  More likely he
> would still get prompted when saving, just as if he'd used cp1250 or cp852
> to read them.

That would be a bug: a file visited as `binary' (or as `raw-text')
should be placed in a unibyte buffer, so it should not screw anything up
more than was already the case to start with.

> The problem here is the files, not Emacs.  Basically the reason for using
> unibyte is that it allows the user to bury their head in the sand and
> pretend the problem does not exist.

Of course, but if you start with such files and can't (or don't want to)
recode the parts consistently, we can't do much better.

> I work on similar files in my day job, with Japanese comments in ShiftJIS
> and Chinese comments in GB2312. An easy method of fixing such files would be
> nice, but the best I can think of would be to provide a recode-region
> function, which would still be too much manual work to be worth it to me
> given that I can barely make sense of the Japanese comments and can't make
> any sense of the Chinese ones. The original poster might be more motivated
> to make use of such a function if it existed though.

I'm not sure what would be the best approach in general or in particular
cases, but we could certainly provide a command that recodes comments.
Or another one that looks for invalid byte sequences (i.e. decoded as
eight-bit-bytes) and tries to re-decode them with a secondary coding system.


        Stefan
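
A rough sketch of the second idea, assuming a multibyte buffer in which
bytes the primary coding system could not decode are present as raw
eight-bit characters.  The command name `my-redecode-raw-bytes' is
hypothetical, and decoding one byte at a time is only appropriate for
single-byte code pages such as cp852.

```elisp
;; Hypothetical command (not part of Emacs): replace each raw eight-bit
;; byte in the buffer by its decoding in a secondary coding system.
(defun my-redecode-raw-bytes (coding-system)
  "Decode raw eight-bit bytes in the buffer using CODING-SYSTEM."
  (interactive "zDecode raw bytes as: ")
  (save-excursion
    (goto-char (point-min))
    (while (not (eobp))
      (let ((ch (char-after)))
        (if (not (eq (char-charset ch) 'eight-bit))
            (forward-char 1)
          ;; Convert the raw-byte character back to its byte value,
          ;; decode that single byte, and put the result in its place.
          (let ((text (decode-coding-string
                       (unibyte-string (multibyte-char-to-unibyte ch))
                       coding-system)))
            (delete-char 1)
            (insert text)))))))
```

For the files described in this thread, one would visit the file as cp1250
and then run the command with cp852.  Saving such a buffer as cp1250 would
still run into characters that cp1250 cannot represent, so this helps with
reading and recoding rather than with round-tripping the mixed file.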







* bug#5235: 23.1; Unibyte keyboard input problem
  2009-12-27 20:07 ` Eli Zaretskii
@ 2009-12-29 15:48   ` Stefan Monnier
  0 siblings, 0 replies; 23+ messages in thread
From: Stefan Monnier @ 2009-12-29 15:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 5235

> This is expected, I think.  I could make it honor the `coding' cookie
> even in that case, but I'd like first to know if this kind of solution
> is acceptable (below).

This approach looks OK, actually, yes.  Basically, the user would set

  -*- coding: cp1250; select-safe-coding-system-respect-auto-coding: t -*-

though in such a context a shorter variable name would make sense, like
`coding-cookie-force'.  And then it'd be OK to obey it no matter what
(i.e. regardless of the default coding system, locale, etc.).

        Stefan

* bug#5235: 23.1; Unibyte keyboard input problem
  2009-12-16 21:17 bug#5235: 23.1; Unibyte keyboard input problem Tomasz Zbrożek
  2009-12-17 16:47 ` Jason Rumney
@ 2010-02-26 20:42 ` Tomasz Zbrożek
  2010-02-26 23:42   ` Eli Zaretskii
  2020-09-14 13:59 ` Lars Ingebrigtsen
  2 siblings, 1 reply; 23+ messages in thread
From: Tomasz Zbrożek @ 2010-02-26 20:42 UTC (permalink / raw)
  To: Eli Zaretskii, 5235

Eli,
is anything happening with bug 5235 on the Emacs bug list?
I mean, will your patch (as I remember, it needs some improvement ;) be
included in the current Emacs version?

best regards!
-- 
tomek

* bug#5235: 23.1; Unibyte keyboard input problem
  2010-02-26 20:42 ` Tomasz Zbrożek
@ 2010-02-26 23:42   ` Eli Zaretskii
  0 siblings, 0 replies; 23+ messages in thread
From: Eli Zaretskii @ 2010-02-26 23:42 UTC (permalink / raw)
  To: Tomasz Zbrożek; +Cc: 5235

> From: Tomasz Zbrożek <scianagoryczy@wp.pl>
> Date: Fri, 26 Feb 2010 21:42:34 +0100
> 
> Eli,
> is anything happening with bug 5235 on the Emacs bug list?
> I mean, will your patch (as I remember, it needs some improvement ;) be
> included in the current Emacs version?

I didn't yet have time to work on the improvement, sorry.  So I guess
it will not be in Emacs 23.2.

* bug#5235: 23.1; Unibyte keyboard input problem
  2009-12-16 21:17 bug#5235: 23.1; Unibyte keyboard input problem Tomasz Zbrożek
  2009-12-17 16:47 ` Jason Rumney
  2010-02-26 20:42 ` Tomasz Zbrożek
@ 2020-09-14 13:59 ` Lars Ingebrigtsen
  2 siblings, 0 replies; 23+ messages in thread
From: Lars Ingebrigtsen @ 2020-09-14 13:59 UTC (permalink / raw)
  To: Tomasz Zbrożek; +Cc: 5235

Tomasz Zbrożek <scianagoryczy@wp.pl> writes:

> In Emacs 23.1, in unibyte mode (emacs --unibyte) and with windows-1250
> coding I can't write Polish chars with right Alt key.

The --unibyte switch has been removed, so I can't reproduce the bug in
question here.  I'm going to go ahead and guess that this is no longer
relevant, and I'm closing this bug report.  Although, skimming this bug
report, I'm wondering whether this is still relevant if you explicitly
call (set-buffer-multibyte nil) and then enter text, but...  I'm not
sure?  If it is, please respond to the debbugs address, and we'll reopen.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
