* bug#2354: 23.0.90; Emacs fails to detect utf-8 encoding with language environment Latin-1 @ 2009-02-17 10:35 ` David Engster 2009-02-17 16:45 ` Juanma Barranquero 2009-02-28 12:30 ` bug#2354: marked as done (23.0.90; Emacs fails to detect utf-8 encoding with language environment Latin-1) Emacs bug Tracking System 0 siblings, 2 replies; 41+ messages in thread From: David Engster @ 2009-02-17 10:35 UTC (permalink / raw) To: bug-gnu-emacs This is what I believe to be a regression in CVS Emacs since the 23.0.90 pretest. I'm using a fresh CVS checkout from 2009-02-17, compiled with 'make bootstrap'. You can reproduce it as follows: 1. emacs -Q 2. M-x set-language-environment RET Latin-1 RET 3. In some buffer write: (ucs-insert "2500") 4. Eval it, so that the unicode character is inserted into the buffer. 5. Save the file and choose utf-8 as encoding. 6. Kill the buffer. 7. Load the file you just saved. Result: Emacs displays "â\224\200" for the unicode character. Expected behaviour: Emacs should detect utf-8 encoding and display correct character. Please note that this has worked without problems with the Emacs 23.0.90 pretest, so it must be due to some change(s) since then in CVS. 
In GNU Emacs 23.0.90.1 (i686-pc-linux-gnu, GTK+ Version 2.12.11) of 2009-02-17 on void Windowing system distributor `The X.Org Foundation', version 11.0.10402000 configured using `configure '--prefix=/usr/local/emacs'' Important settings: value of $LC_ALL: nil value of $LC_COLLATE: nil value of $LC_CTYPE: nil value of $LC_MESSAGES: nil value of $LC_MONETARY: nil value of $LC_NUMERIC: nil value of $LC_TIME: nil value of $LANG: nil value of $XMODIFIERS: nil locale-coding-system: nil default-enable-multibyte-characters: t Major mode: Lisp Interaction Minor modes in effect: tooltip-mode: t tool-bar-mode: t mouse-wheel-mode: t menu-bar-mode: t file-name-shadow-mode: t global-font-lock-mode: t font-lock-mode: t blink-cursor-mode: t global-auto-composition-mode: t auto-composition-mode: t auto-encryption-mode: t auto-compression-mode: t line-number-mode: t transient-mark-mode: t Recent input: M-x r e p o <tab> r <tab> C-g M-x s e t - l a n <tab> <return> L a t i n w <backspace> - w <return> <backspace> 1 <return> M-x r e p o <tab> r <tab> <return> Recent messages: For information about GNU Emacs and the GNU system, type C-h C-a. Making completion list... Quit Making completion list... ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#2354: 23.0.90; Emacs fails to detect utf-8 encoding with language environment Latin-1 2009-02-17 10:35 ` bug#2354: 23.0.90; Emacs fails to detect utf-8 encoding with language environment Latin-1 David Engster @ 2009-02-17 16:45 ` Juanma Barranquero 2009-02-17 18:04 ` David Engster 2009-02-28 12:30 ` bug#2354: marked as done (23.0.90; Emacs fails to detect utf-8 encoding with language environment Latin-1) Emacs bug Tracking System 1 sibling, 1 reply; 41+ messages in thread From: Juanma Barranquero @ 2009-02-17 16:45 UTC (permalink / raw) To: David Engster; +Cc: 2354 On Tue, Feb 17, 2009 at 11:35, David Engster <deng@randomsample.de> wrote: > You can reproduce it as follows: > > 1. emacs -Q > 2. M-x set-language-environment RET Latin-1 RET > 3. In some buffer write: > > (ucs-insert "2500") > > 4. Eval it, so that the unicode character is inserted into the buffer. > 5. Save the file and choose utf-8 as encoding. > 6. Kill the buffer. > 7. Load the file you just saved. > > Result: Emacs displays "â\224\200" for the unicode character. I cannot reproduce it on Windows with the current trunk. The file's coding is correctly detected as UTF-8. Juanma ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#2354: 23.0.90; Emacs fails to detect utf-8 encoding with language environment Latin-1 2009-02-17 16:45 ` Juanma Barranquero @ 2009-02-17 18:04 ` David Engster 0 siblings, 0 replies; 41+ messages in thread From: David Engster @ 2009-02-17 18:04 UTC (permalink / raw) To: Juanma Barranquero; +Cc: 2354 Juanma Barranquero <lekktu@gmail.com> writes: > On Tue, Feb 17, 2009 at 11:35, David Engster <deng@randomsample.de> wrote: > >> You can reproduce it as follows: >> >> 1. emacs -Q >> 2. M-x set-language-environment RET Latin-1 RET >> 3. In some buffer write: >> >> (ucs-insert "2500") >> >> 4. Eval it, so that the unicode character is inserted into the buffer. >> 5. Save the file and choose utf-8 as encoding. >> 6. Kill the buffer. >> 7. Load the file you just saved. >> >> Result: Emacs displays "â\224\200" for the unicode character. > > I cannot reproduce it on Windows with the current trunk. The file's > coding is correctly detected as UTF-8. Thank you for looking into this. I tested this now again on a different machine, but also running GNU/Linux (Ubuntu 8.10), with the same result. FWIW, I think I could track down this issue to the following commit for src/coding.c: revision 1.413 date: 2009-02-09 01:42:37 +0100; author: handa; state: Exp; lines: +1 -1; commitid: WAhpeD8cqX926HBt; (detect_coding_charset): Fix previous change. With revision 1.412 of coding.c, the error disappears for me. -David ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#2354: marked as done (23.0.90; Emacs fails to detect utf-8 encoding with language environment Latin-1) 2009-02-17 10:35 ` bug#2354: 23.0.90; Emacs fails to detect utf-8 encoding with language environment Latin-1 David Engster 2009-02-17 16:45 ` Juanma Barranquero @ 2009-02-28 12:30 ` Emacs bug Tracking System 1 sibling, 0 replies; 41+ messages in thread From: Emacs bug Tracking System @ 2009-02-28 12:30 UTC (permalink / raw) To: Eli Zaretskii [-- Attachment #1: Type: text/plain, Size: 907 bytes --] Your message dated Sat, 28 Feb 2009 14:21:08 +0200 with message-id <uzlg6oiq3.fsf@gnu.org> and subject line Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k has caused the Emacs bug report #2354, regarding 23.0.90; Emacs fails to detect utf-8 encoding with language environment Latin-1 to be marked as done. This means that you claim that the problem has been dealt with. If this is not the case it is now your responsibility to reopen the bug report if necessary, and/or fix the problem forthwith. (NB: If you are a system administrator and have no idea what this message is talking about, this may indicate a serious mail system misconfiguration somewhere. Please contact owner@emacsbugs.donarmstrong.com immediately.) -- 2354: http://emacsbugs.donarmstrong.com/cgi-bin/bugreport.cgi?bug=2354 Emacs Bug Tracking System Contact owner@emacsbugs.donarmstrong.com with problems [-- Attachment #2: Type: message/rfc822, Size: 4181 bytes --] From: David Engster <deng@randomsample.de> To: bug-gnu-emacs@gnu.org Subject: 23.0.90; Emacs fails to detect utf-8 encoding with language environment Latin-1 Date: Tue, 17 Feb 2009 11:35:11 +0100 Message-ID: <87y6w5jqqo.fsf@engster.org>
[-- Attachment #3: Type: message/rfc822, Size: 2452 bytes --] From: Eli Zaretskii <eliz@gnu.org> To: 2497-done@emacsbugs.donarmstrong.com, 2354-done@emacsbugs.donarmstrong.com Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k Date: Sat, 28 Feb 2009 14:21:08 +0200 Message-ID: <uzlg6oiq3.fsf@gnu.org> > From: David Engster <deng@randomsample.de> > Date: Fri, 27 Feb 2009 18:46:12 +0100 > Cc: emacs-pretest-bug@gnu.org, 2497@emacsbugs.donarmstrong.com > > Uwe Siart <uwe.siart@tum.de> writes: > > I'm using the windows port of 23.0.91 on Win2k SP4 and I found that it > > fails to read utf-8 encoded files correctly. When visiting a file in > > utf-8 encoding all characters above 255 are screwed up and "C-h C RET" > > indicates iso-latin1-dos for saving the file. This has not been an > > issue in 23.0.90. > > Maybe this is a duplicate of what I reported in > > http://emacsbugs.donarmstrong.com/cgi-bin/bugreport.cgi?bug=2354 > > As I write later in that bug report, I think I could track down this > issue to the change in revision 1.413 of src/coding.c. Maybe you could > try if the same applies to your problem. Should be fixed by this change: 2009-02-28 Eli Zaretskii <eliz@gnu.org> * coding.c (detect_coding_charset): Fix change from 2008-10-21. Also, check iso-latin-*, not only iso-8859-*. ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#2497: 23.0.91; Fails to read UTF-8 on Win2k @ 2009-02-27 14:10 ` Uwe Siart 2009-02-27 16:03 ` Eli Zaretskii ` (4 more replies) 0 siblings, 5 replies; 41+ messages in thread From: Uwe Siart @ 2009-02-27 14:10 UTC (permalink / raw) To: emacs-pretest-bug I'm using the windows port of 23.0.91 on Win2k SP4 and I found that it fails to read utf-8 encoded files correctly. When visiting a file in utf-8 encoding all characters above 255 are screwed up and "C-h C RET" indicates iso-latin1-dos for saving the file. This has not been an issue in 23.0.90. -- Uwe In GNU Emacs 23.0.91.1 (i386-mingw-nt5.0.2195) of 2009-02-27 on SOFT-MJASON Windowing system distributor `Microsoft Corp.', version 5.0.2195 configured using `configure --with-gcc (3.4)' Important settings: value of $LC_ALL: nil value of $LC_COLLATE: nil value of $LC_CTYPE: nil value of $LC_MESSAGES: nil value of $LC_MONETARY: nil value of $LC_NUMERIC: nil value of $LC_TIME: nil value of $LANG: DEU value of $XMODIFIERS: nil locale-coding-system: cp1252 default-enable-multibyte-characters: t Major mode: Lisp Interaction Minor modes in effect: iswitchb-mode: t display-time-mode: t auto-insert-mode: t diff-auto-refine-mode: t delete-selection-mode: t pc-selection-mode: t tooltip-mode: t mouse-wheel-mode: t file-name-shadow-mode: t global-font-lock-mode: t font-lock-mode: t global-auto-composition-mode: t auto-composition-mode: t auto-encryption-mode: t auto-compression-mode: t column-number-mode: t line-number-mode: t transient-mark-mode: t Recent input: M-x r e <tab> p o <tab> r t <tab> <return> Recent messages: Loading time...done Loading iswitchb...done For information about GNU Emacs and the GNU system, type C-h C-a. Making completion list... [2 times] ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#2497: 23.0.91; Fails to read UTF-8 on Win2k 2009-02-27 14:10 ` bug#2497: 23.0.91; Fails to read UTF-8 on Win2k Uwe Siart @ 2009-02-27 16:03 ` Eli Zaretskii 2009-02-27 16:48 ` Uwe Siart 2009-02-27 16:11 ` Juanma Barranquero ` (3 subsequent siblings) 4 siblings, 1 reply; 41+ messages in thread From: Eli Zaretskii @ 2009-02-27 16:03 UTC (permalink / raw) To: uwe.siart, 2497 > Date: Fri, 27 Feb 2009 15:10:19 +0100 > From: Uwe Siart <uwe.siart@tum.de> > Cc: > > I'm using the windows port of 23.0.91 on Win2k SP4 and I found that it > fails to read utf-8 encoded files correctly. When visiting a file in > utf-8 encoding all characters above 255 are screwed up and "C-h C RET" > indicates iso-latin1-dos for saving the file. Does it work with "C-x RET c utf-8 RET" immediately prior to "C-x C-f"? If it does, then the problem is with guessing the encoding, not with decoding it. Also, what is the default value of buffer-file-coding-system, and was it the same in 23.0.90? ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#2497: 23.0.91; Fails to read UTF-8 on Win2k 2009-02-27 16:03 ` Eli Zaretskii @ 2009-02-27 16:48 ` Uwe Siart 2009-02-27 18:19 ` Eli Zaretskii 0 siblings, 1 reply; 41+ messages in thread From: Uwe Siart @ 2009-02-27 16:48 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 2497 Eli Zaretskii <eliz@gnu.org> writes: >> Date: Fri, 27 Feb 2009 15:10:19 +0100 >> From: Uwe Siart <uwe.siart@tum.de> >> Cc: >> >> I'm using the windows port of 23.0.91 on Win2k SP4 and I found that it >> fails to read utf-8 encoded files correctly. When visiting a file in >> utf-8 encoding all characters above 255 are screwed up and "C-h C RET" >> indicates iso-latin1-dos for saving the file. > > Does it work with "C-x RET c utf-8 RET" immediately prior to > "C-x C-f"? It works with "C-x RET c utf-8 RET" immediately prior to "C-x C-f". > If it does, then the problem is with guessing the encoding, not with > decoding it. That's also my impression. > Also, what is the default value of buffer-file-coding-system, and was > it the same in 23.0.90? iso-latin-1-dos in 23.0.90 and in 23.0.91. -- Uwe ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#2497: 23.0.91; Fails to read UTF-8 on Win2k 2009-02-27 16:48 ` Uwe Siart @ 2009-02-27 18:19 ` Eli Zaretskii 2009-02-27 20:35 ` Uwe Siart 2009-02-28 4:40 ` Stefan Monnier 0 siblings, 2 replies; 41+ messages in thread From: Eli Zaretskii @ 2009-02-27 18:19 UTC (permalink / raw) To: uwe.siart; +Cc: 2497 > From: Uwe Siart <uwe.siart@tum.de> > Cc: 2497@emacsbugs.donarmstrong.com > Date: Fri, 27 Feb 2009 17:48:15 +0100 > > It works with "C-x RET c utf-8 RET" immediately prior to "C-x C-f". > > > If it does, then the problem is with guessing the encoding, not with > > decoding it. > > That's also my impression. > > > Also, what is the default value of buffer-file-coding-system, and was > > it the same in 23.0.90? > > iso-latin-1-dos in 23.0.90 and in 23.0.91. Then you shouldn't expect Emacs to guess UTF-8 encoding correctly in every single instance. Distinguishing between UTF-8 and Latin-1 is generally impossible with the current state of the art of coded character sets support in Emacs. It might work in certain cases, but that's sheer luck. One way to work around that in your specific case, without changing your global defaults, is to add a `coding:' cookie to your .gnus.el file. ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#2497: 23.0.91; Fails to read UTF-8 on Win2k 2009-02-27 18:19 ` Eli Zaretskii @ 2009-02-27 20:35 ` Uwe Siart 2009-02-28 4:40 ` Stefan Monnier 1 sibling, 0 replies; 41+ messages in thread From: Uwe Siart @ 2009-02-27 20:35 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 2497 Eli Zaretskii <eliz@gnu.org> writes: >> From: Uwe Siart <uwe.siart@tum.de> >> iso-latin-1-dos in 23.0.90 and in 23.0.91. > > Then you shouldn't expect Emacs to guess UTF-8 encoding correctly in > every single instance. Distinguishing between UTF-8 and Latin-1 is > generally impossible with the current state of the art of coded > character sets support in Emacs. It might work in certain cases, but > that's sheer luck. I do not have the background knowledge to join in this conversation but I just observed that it worked correctly for years now (even with CVS Emacsen prior to the 22.1 release) and that it stopped working in 23.0.91. If it appears that this is not a bug then I will take the measures you suggested and set a utf-8 cookie in all files concerned. -- Uwe ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#2497: 23.0.91; Fails to read UTF-8 on Win2k 2009-02-27 18:19 ` Eli Zaretskii 2009-02-27 20:35 ` Uwe Siart @ 2009-02-28 4:40 ` Stefan Monnier 2009-02-28 8:17 ` Uwe Siart 2009-02-28 10:49 ` Eli Zaretskii 1 sibling, 2 replies; 41+ messages in thread From: Stefan Monnier @ 2009-02-28 4:40 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 2497, uwe.siart >> It works with "C-x RET c utf-8 RET" immediately prior to "C-x C-f". >> > If it does, then the problem is with guessing the encoding, not with >> > decoding it. >> That's also my impression. >> > Also, what is the default value of buffer-file-coding-system, and was >> > it the same in 23.0.90? >> iso-latin-1-dos in 23.0.90 and in 23.0.91. > Then you shouldn't expect Emacs to guess UTF-8 encoding correctly in > every single instance. Distinguishing between UTF-8 and Latin-1 is The guessing shouldn't give priority to buffer-file-coding-system. Instead we have the set-coding-system-priority instead. And IIUC utf-8 should always have a pretty high priority since false positives are fairly rare. So this still looks like a real bug. Stefan ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#2497: 23.0.91; Fails to read UTF-8 on Win2k 2009-02-28 4:40 ` Stefan Monnier @ 2009-02-28 8:17 ` Uwe Siart 2009-02-28 10:14 ` David Engster 2009-02-28 22:00 ` Stefan Monnier 2009-02-28 10:49 ` Eli Zaretskii 1 sibling, 2 replies; 41+ messages in thread From: Uwe Siart @ 2009-02-28 8:17 UTC (permalink / raw) To: Stefan Monnier; +Cc: 2497 Stefan Monnier <monnier@iro.umontreal.ca> writes: > The guessing shouldn't give priority to buffer-file-coding-system. > Instead we have the set-coding-system-priority instead. And IIUC utf-8 > should always have a pretty high priority since false positives are > fairly rare. So this still looks like a real bug. Here I would like to note that I never had false positives in the past (before 23.0.91) but I do have false positives now. Therefore I'm inclined to call it a bug. -- Uwe ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#2497: 23.0.91; Fails to read UTF-8 on Win2k
  2009-02-28  8:17 ` Uwe Siart
@ 2009-02-28 10:14 ` David Engster
  2009-02-28 12:09 ` Eli Zaretskii
  2009-02-28 22:00 ` Stefan Monnier
  1 sibling, 1 reply; 41+ messages in thread
From: David Engster @ 2009-02-28 10:14 UTC (permalink / raw)
  To: uwe.siart; +Cc: 2497

Uwe Siart <uwe.siart@tum.de> writes:
> Stefan Monnier <monnier@iro.umontreal.ca> writes:
>
>> The guessing shouldn't give priority to buffer-file-coding-system.
>> Instead we have the set-coding-system-priority instead. And IIUC utf-8
>> should always have a pretty high priority since false positives are
>> fairly rare. So this still looks like a real bug.
>
> Here I would like to note that I never had false positives in the past
> (before 23.0.91) but I do have false positives now. Therefore I'm
> inclined to call it a bug.

I second this - this has worked for years without problems, and
suddenly it fails to detect UTF-8 with a Latin-1 environment.

I once again confirmed that this behaviour can be tracked down to this
change in detect_coding_charset in coding.c (revision 1.413):

--- coding.c    7 Feb 2009 10:49:39 -0000       1.412
+++ coding.c    9 Feb 2009 00:42:37 -0000       1.413
@@ -5101,7 +5101,7 @@
   valids = AREF (attrs, coding_attr_charset_valids);
   name = CODING_ID_NAME (coding->id);
   if (VECTORP (Vlatin_extra_code_table)
-      && strcmp ((char *) SDATA (SYMBOL_NAME (name)), "iso-8859-"))
+      && strcmp ((char *) SDATA (SYMBOL_NAME (name)), "iso-8859-") == 0)
     check_latin_extra = 1;
   if (! NILP (CODING_ATTR_ASCII_COMPAT (attrs)))
     src += head_ascii;

I'm inclined to say that this change is wrong, since strcmp will only
return 0 if two strings are exactly equal. In this case though, the
string "iso-8859-" is compared to "iso-8859-1" (in my case), so it
returns 1 and therefore check_latin_extra is not set.

-David

^ permalink raw reply [flat|nested] 41+ messages in thread
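The difference between the two revisions comes down to how the return value of strcmp is interpreted. The following is a minimal standalone sketch, not code from coding.c: the function names `test_r1_412` and `test_r1_413` are hypothetical, and plain C strings stand in for `SDATA (SYMBOL_NAME (name))`.

```c
#include <assert.h>
#include <string.h>

/* Condition as written in revision 1.412 (hypothetical wrapper):
   strcmp's result is used directly as a truth value, so the test is
   true whenever NAME is NOT exactly "iso-8859-".  */
int
test_r1_412 (const char *name)
{
  assert (name != NULL);
  return strcmp (name, "iso-8859-") != 0;
}

/* Condition as written in revision 1.413 (hypothetical wrapper):
   true only when NAME is exactly "iso-8859-", which is a bare prefix
   rather than a name such as "iso-8859-1".  */
int
test_r1_413 (const char *name)
{
  assert (name != NULL);
  return strcmp (name, "iso-8859-") == 0;
}
```

Under these assumptions, `test_r1_412 ("iso-8859-1")` and `test_r1_412 ("utf-8")` are both true, while `test_r1_413 ("iso-8859-1")` is false. Note that the C standard only guarantees a positive result from `strcmp ("iso-8859-1", "iso-8859-")`, not specifically the value 1.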
* bug#2497: 23.0.91; Fails to read UTF-8 on Win2k
  2009-02-28 10:14 ` David Engster
@ 2009-02-28 12:09 ` Eli Zaretskii
  2009-02-28 14:16 ` Jason Rumney
  2009-02-28 14:31 ` David Engster
  0 siblings, 2 replies; 41+ messages in thread
From: Eli Zaretskii @ 2009-02-28 12:09 UTC (permalink / raw)
  To: David Engster, 2497; +Cc: uwe.siart

> From: David Engster <deng@randomsample.de>
> Date: Sat, 28 Feb 2009 11:14:16 +0100
> Cc: 2497@emacsbugs.donarmstrong.com
>
> I once again confirmed that this behaviour can be tracked down to this
> change in detect_coding_charset in coding.c (revision 1.413):
>
> --- coding.c    7 Feb 2009 10:49:39 -0000       1.412
> +++ coding.c    9 Feb 2009 00:42:37 -0000       1.413
> @@ -5101,7 +5101,7 @@
>    valids = AREF (attrs, coding_attr_charset_valids);
>    name = CODING_ID_NAME (coding->id);
>    if (VECTORP (Vlatin_extra_code_table)
> -      && strcmp ((char *) SDATA (SYMBOL_NAME (name)), "iso-8859-"))
> +      && strcmp ((char *) SDATA (SYMBOL_NAME (name)), "iso-8859-") == 0)
>      check_latin_extra = 1;
>    if (! NILP (CODING_ATTR_ASCII_COMPAT (attrs)))
>      src += head_ascii;
>
> I'm inclined to say that this change is wrong, since strcmp will only
> return 0 if two strings are exactly equal. In this case though, the
> string "iso-8859-" is compared to "iso-8859-1" (in my case), so it
> returns 1 and therefore check_latin_extra is not set.

You are right. But in my case, it was not enough to test for
"iso-8859-", as the symbol's name was "iso-latin-1", not "iso-8859-1".

I installed the patch below, that does seem to fix the problem with
the OP's .gnus.el, although I don't know how general that problem is,
nor whether Emacs is capable of distinguishing UTF-8 from Latin-N in
general.

2009-02-28  Eli Zaretskii  <eliz@gnu.org>

        * coding.c (detect_coding_charset): Fix change from 2008-10-21.
        Also, check iso-latin-*, not only iso-8859-*.

Index: src/coding.c
===================================================================
RCS file: /cvsroot/emacs/emacs/src/coding.c,v
retrieving revision 1.419
diff -u -r1.419 coding.c
--- src/coding.c        22 Feb 2009 15:48:03 -0000      1.419
+++ src/coding.c        28 Feb 2009 12:01:18 -0000
@@ -5103,7 +5103,10 @@
   valids = AREF (attrs, coding_attr_charset_valids);
   name = CODING_ID_NAME (coding->id);
   if (VECTORP (Vlatin_extra_code_table)
-      && strcmp ((char *) SDATA (SYMBOL_NAME (name)), "iso-8859-") == 0)
+      && (strncmp ((char *) SDATA (SYMBOL_NAME (name)),
+                   "iso-8859-", sizeof ("iso-8859-") - 1) == 0
+          || strncmp ((char *) SDATA (SYMBOL_NAME (name)),
+                      "iso-latin-", sizeof ("iso-latin-") - 1) == 0))
     check_latin_extra = 1;
   if (! NILP (CODING_ATTR_ASCII_COMPAT (attrs)))
     src += head_ascii;

^ permalink raw reply [flat|nested] 41+ messages in thread
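The patch replaces an exact comparison with a prefix comparison against two naming schemes. A minimal standalone sketch of that idea follows. It is an illustration only: `has_prefix` and `is_latin_name` are hypothetical names, and `strlen` is used here where the patch uses `sizeof ("...") - 1` on string literals.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if NAME begins with PREFIX, else 0.  (Hypothetical helper.)  */
int
has_prefix (const char *name, const char *prefix)
{
  assert (name != NULL && prefix != NULL);
  return strncmp (name, prefix, strlen (prefix)) == 0;
}

/* Return 1 if NAME starts with either of the two prefixes the patch
   checks, "iso-8859-" or "iso-latin-", else 0.  (Hypothetical helper.)  */
int
is_latin_name (const char *name)
{
  return has_prefix (name, "iso-8859-") || has_prefix (name, "iso-latin-");
}
```

With this sketch, both `is_latin_name ("iso-8859-1")` and `is_latin_name ("iso-latin-1")` are true, which covers the two symbol names reported in the thread, while `is_latin_name ("utf-8")` is false.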
* bug#2497: 23.0.91; Fails to read UTF-8 on Win2k 2009-02-28 12:09 ` Eli Zaretskii @ 2009-02-28 14:16 ` Jason Rumney 2009-02-28 14:31 ` David Engster 1 sibling, 0 replies; 41+ messages in thread From: Jason Rumney @ 2009-02-28 14:16 UTC (permalink / raw) To: Eli Zaretskii, 2497; +Cc: uwe.siart, David Engster Eli Zaretskii wrote: > You are right. But in my case, it was not enough to test for > "iso-8859-", as the symbol's name was "iso-latin-1", not "iso-8859-1". > > I installed the patch below, that does seem to fix the problem with > the OP's .gnus.el, although I don't know how general that problem is, > nor whether Emacs is capable of distinguishing UTF-8 from Latin-N in > general. > I installed a further change for the case where latin-extra-code-table is not a vector. But I don't understand why we have this table, and why the default value allows the 6 C1 control codes PU1, PU2, STS, CCH, MW and SPA to appear in latin text without breaking the auto detection. Are these control characters really that common? ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#2497: 23.0.91; Fails to read UTF-8 on Win2k 2009-02-28 12:09 ` Eli Zaretskii 2009-02-28 14:16 ` Jason Rumney @ 2009-02-28 14:31 ` David Engster 1 sibling, 0 replies; 41+ messages in thread From: David Engster @ 2009-02-28 14:31 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 2497, uwe.siart Eli Zaretskii <eliz@gnu.org> writes: >> From: David Engster <deng@randomsample.de> >> I'm inclined to say that this change is wrong, since strcmp will only >> return 0 if two strings are exactly equal. In this case though, the >> string "iso-8859-" is compared to "iso-8859-1" (in my case), so it >> returns 1 and therefore check_latin_extra is not set. > > You are right. But in my case, it was not enough to test for > "iso-8859-", as the symbol's name was "iso-latin-1", not "iso-8859-1". > > I installed the patch below, that does seem to fix the problem with > the OP's .gnus.el, although I don't know how general that problem is, > nor whether Emacs is capable of distinguishing UTF-8 from Latin-N in > general. I can confirm this patch fixes my original bug report (#2354). Thanks! -David ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#2497: 23.0.91; Fails to read UTF-8 on Win2k 2009-02-28 8:17 ` Uwe Siart 2009-02-28 10:14 ` David Engster @ 2009-02-28 22:00 ` Stefan Monnier 1 sibling, 0 replies; 41+ messages in thread From: Stefan Monnier @ 2009-02-28 22:00 UTC (permalink / raw) To: uwe.siart; +Cc: 2497 >> The guessing shouldn't give priority to buffer-file-coding-system. >> Instead we have the set-coding-system-priority instead. And IIUC utf-8 >> should always have a pretty high priority since false positives are >> fairly rare. So this still looks like a real bug. > Here I would like to note that I never had false positives in the past > (before 23.0.91) but I do have false positives now. Therefore I'm > inclined to call it a bug. To clear things up: by "false positives" I meant text that Emacs thinks is valid utf-8 whereas it's really using some other coding system. Stefan ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#2497: 23.0.91; Fails to read UTF-8 on Win2k 2009-02-28 4:40 ` Stefan Monnier 2009-02-28 8:17 ` Uwe Siart @ 2009-02-28 10:49 ` Eli Zaretskii 2009-02-28 12:16 ` Uwe Siart ` (2 more replies) 1 sibling, 3 replies; 41+ messages in thread From: Eli Zaretskii @ 2009-02-28 10:49 UTC (permalink / raw) To: Stefan Monnier, Kenichi Handa; +Cc: 2497, uwe.siart > From: Stefan Monnier <monnier@iro.umontreal.ca> > Cc: 2497@emacsbugs.donarmstrong.com, uwe.siart@tum.de > Date: Fri, 27 Feb 2009 23:40:01 -0500 > > >> It works with "C-x RET c utf-8 RET" immediately prior to "C-x C-f". > >> > If it does, then the problem is with guessing the encoding, not with > >> > decoding it. > >> That's also my impression. > >> > Also, what is the default value of buffer-file-coding-system, and was > >> > it the same in 23.0.90? > >> iso-latin-1-dos in 23.0.90 and in 23.0.91. > > Then you shouldn't expect Emacs to guess UTF-8 encoding correctly in > > every single instance. Distinguishing between UTF-8 and Latin-1 is > > The guessing shouldn't give priority to buffer-file-coding-system. > Instead we have the set-coding-system-priority instead. Please give me some credit: I said ``the _default_value_ of buffer-file-coding-system''. That default tells volumes about the coding-system priorities. > And IIUC utf-8 should always have a pretty high priority With today's CVS on a Windows XP machine I get this: M-: (coding-system-priority-list) RET => (iso-latin-1 utf-8 iso-2022-7bit iso-2022-7bit-lock iso-2022-8bit-ss2 emacs-mule raw-text iso-2022-jp in-is13194-devanagari chinese-iso-8bit utf-8-auto utf-8-with-signature utf-16 utf-16be-with-signature utf-16le-with-signature utf-16be utf-16le japanese-shift-jis undecided) So UTF-8 is indeed ``pretty high'', but lower than the locale's default. > So this still looks like a real bug. Perhaps it is, but I didn't know Emacs 23 can reliably distinguish between Latin-1 and UTF-8, even when UTF-8 sequences are present in the text. Can we do that reliably? 
Perhaps Handa-san can shed some light on this. ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#2497: 23.0.91; Fails to read UTF-8 on Win2k 2009-02-28 10:49 ` Eli Zaretskii @ 2009-02-28 12:16 ` Uwe Siart 2009-02-28 22:04 ` Stefan Monnier 2009-03-02 11:43 ` Kenichi Handa 2 siblings, 0 replies; 41+ messages in thread From: Uwe Siart @ 2009-02-28 12:16 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 2497 Eli Zaretskii <eliz@gnu.org> writes: >> From: Stefan Monnier <monnier@iro.umontreal.ca> >> So this still looks like a real bug. > > Perhaps it is, but I didn't know Emacs 23 can reliably distinguish > between Latin-1 and UTF-8, even when UTF-8 sequences are present in > the text. Can we do that reliably? Perhaps Handa-san can shed some > light on this. Finding a solution to do it reliably would of course be the best. Assumed this is not possible right now we should distinguish between »high reliability« and »poor reliability«. From my perception it has been much more reliable earlier so (as a user with limited viewpoint) I vote for reverting the change. -- Uwe ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#2497: 23.0.91; Fails to read UTF-8 on Win2k 2009-02-28 10:49 ` Eli Zaretskii 2009-02-28 12:16 ` Uwe Siart @ 2009-02-28 22:04 ` Stefan Monnier 2009-03-02 11:43 ` Kenichi Handa 2 siblings, 0 replies; 41+ messages in thread From: Stefan Monnier @ 2009-02-28 22:04 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 2497, uwe.siart >> The guessing shouldn't give priority to buffer-file-coding-system. >> Instead we have the set-coding-system-priority instead. > Please give me some credit: I said ``the _default_value_ of > buffer-file-coding-system''. That default tells volumes about the > coding-system priorities. I'm sorry for my bad wording: what I wrote was only meant to describe the way the code is currently expected to work (AFAIK). > M-: (coding-system-priority-list) RET > => (iso-latin-1 utf-8 iso-2022-7bit iso-2022-7bit-lock iso-2022-8bit-ss2 emacs-mule raw-text iso-2022-jp in-is13194-devanagari chinese-iso-8bit utf-8-auto utf-8-with-signature utf-16 utf-16be-with-signature utf-16le-with-signature utf-16be utf-16le japanese-shift-jis undecided) > So UTF-8 is indeed ``pretty high'', but lower than the locale's > default. That seems to be the source of the problem. utf-8 should always come before latin-1 in that list, since utf-8 streams that are valid latin-1 streams are not uncommon, whereas latin-1 streams that are valid utf-8 streams are extremely rare. Stefan ^ permalink raw reply [flat|nested] 41+ messages in thread
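The asymmetry described above (Latin-1 text is rarely well-formed UTF-8) can be illustrated with a small validity check. This is a sketch under stated assumptions, not Emacs's detector: `is_valid_utf8` is a hypothetical function that follows the well-formedness rules of RFC 3629 (no overlong forms, no surrogates, nothing above U+10FFFF).

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if the N bytes at S are well-formed UTF-8, else 0.
   Hypothetical helper for illustration; not taken from coding.c.  */
int
is_valid_utf8 (const unsigned char *s, size_t n)
{
  size_t i = 0;

  assert (s != NULL || n == 0);
  while (i < n)
    {
      unsigned char c = s[i];
      size_t len;         /* total length of this sequence */
      unsigned long cp;   /* code point being decoded */
      unsigned long min;  /* smallest code point allowed for LEN */
      size_t k;

      if (c < 0x80)
        {
          i++;            /* ASCII byte */
          continue;
        }
      else if ((c & 0xE0) == 0xC0)
        { len = 2; cp = c & 0x1F; min = 0x80; }
      else if ((c & 0xF0) == 0xE0)
        { len = 3; cp = c & 0x0F; min = 0x800; }
      else if ((c & 0xF8) == 0xF0)
        { len = 4; cp = c & 0x07; min = 0x10000; }
      else
        return 0;         /* stray continuation or invalid lead byte */

      if (n - i < len)
        return 0;         /* sequence truncated at end of input */
      for (k = 1; k < len; k++)
        {
          if ((s[i + k] & 0xC0) != 0x80)
            return 0;     /* not a continuation byte */
          cp = (cp << 6) | (s[i + k] & 0x3F);
        }
      /* Reject overlong forms, surrogates, and out-of-range values.  */
      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
      i += len;
    }
  return 1;
}
```

For example, the bytes E2 94 80 (U+2500 from the original report) pass this check, whereas the Latin-1 bytes for "café" (63 61 66 E9) fail because E9 announces a three-byte sequence that never arrives. A Latin-1 text passes only if every high byte happens to form such a sequence, which is why checking UTF-8 first tends to be the safer ordering.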
* bug#2497: 23.0.91; Fails to read UTF-8 on Win2k
  2009-02-28 10:49 ` Eli Zaretskii
  2009-02-28 12:16 ` Uwe Siart
  2009-02-28 22:04 ` Stefan Monnier
@ 2009-03-02 11:43 ` Kenichi Handa
  2009-03-02 15:25 ` Stefan Monnier
  2 siblings, 1 reply; 41+ messages in thread
From: Kenichi Handa @ 2009-03-02 11:43 UTC
  To: Eli Zaretskii; +Cc: 2497, uwe.siart

In article <uab86q1ih.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> M-: (coding-system-priority-list) RET
>  => (iso-latin-1 utf-8 iso-2022-7bit iso-2022-7bit-lock
>      iso-2022-8bit-ss2 emacs-mule raw-text iso-2022-jp
>      in-is13194-devanagari chinese-iso-8bit utf-8-auto
>      utf-8-with-signature utf-16 utf-16be-with-signature
>      utf-16le-with-signature utf-16be utf-16le japanese-shift-jis
>      undecided)

> So UTF-8 is indeed ``pretty high'', but lower than the locale's
> default.

> > So this still looks like a real bug.

> Perhaps it is, but I didn't know Emacs 23 can reliably distinguish
> between Latin-1 and UTF-8, even when UTF-8 sequences are present in
> the text.  Can we do that reliably?  Perhaps Handa-san can shed some
> light on this.

The coding system iso-latin-1 is for the character set iso-8859-1, and
the code-space of iso-8859-1 is 0x00..0xFF (without gap, i.e. including
0x80..0x9F) (see /usr/share/i18n/charmaps/ISO-8859-1.gz).  So, if we
follow it strictly, any byte sequence can be a correct iso-8859-1
stream, and it means that when iso-latin-1 has the highest priority,
all files are detected as iso-latin-1.

So, as long as we strictly follow the definition of iso-8859-1...

In article <jwv7i3az0fc.fsf-monnier+emacsbugreports@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:

> That seems to be the source of the problem.  utf-8 should always come
> before latin-1 in that list, since utf-8 streams that are valid latin-1
> streams are not uncommon, whereas latin-1 streams that are valid utf-8
> streams are extremely rare.

I think that is the only solution.

In article <87ab86ah9z.fsf@tum.de>, Uwe Siart <uwe.siart@tum.de> writes:

> Assumed this is not possible right now we should distinguish between
> »high reliability« and »poor reliability«. From my perception it has
> been much more reliable earlier so (as a user with limited viewpoint)
> I vote for reverting the change.

In Emacs 22, the coding system iso-latin-1 was defined as a variant of
an iso-2022-based coding system, and thus 0x80..0x9F were not valid
bytes (except for 0x91 etc. in latin-extra-code-table).  So, some UTF-8
texts were not detected as iso-latin-1.  To recover that behaviour, we
can define iso-latin-1 as before by doing this:

(define-coding-system 'iso-latin-1
  "Emacs 22 iso-latin-1."
  :mnemonic ?1
  :coding-type 'iso-2022
  :charset-list '(ascii latin-iso8859-1)
  :ascii-compatible-p t
  :mime-charset 'iso-8859-1
  :designation [ascii latin-iso8859-1 nil nil])

But, even with that, some valid UTF-8 texts will still be detected as
iso-latin-1.  So I don't think this is a solution for "high
reliability".

---
Kenichi Handa
handa@m17n.org
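As an illustration of the point that any byte sequence is a valid
iso-8859-1 stream, the following sketch decodes every possible byte value
as Latin-1 and checks that nothing is rejected. It is not taken from the
thread: it assumes an Emacs recent enough to provide `unibyte-string`, and
it uses the `iso-latin-1-unix` variant only to keep end-of-line detection
out of the picture.

```elisp
;; Build a unibyte string holding every possible byte value, 0x00..0xFF.
(let* ((all-bytes (apply #'unibyte-string (number-sequence 0 255)))
       ;; Decode as Latin-1.  With a charset covering the whole
       ;; 0x00..0xFF code space, no byte can be invalid.
       (decoded (decode-coding-string all-bytes 'iso-latin-1-unix)))
  (list
   ;; Number of characters produced by the decoder.
   (length decoded)
   ;; Non-nil if each byte decoded to the character with the same code.
   (equal (string-to-list decoded) (number-sequence 0 255))))
```

If the second element is non-nil, the detector has no byte-level evidence
that could ever rule Latin-1 out, which is why its position in the
priority list decides the outcome.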
* bug#2497: 23.0.91; Fails to read UTF-8 on Win2k
  2009-03-02 11:43 ` Kenichi Handa
@ 2009-03-02 15:25 ` Stefan Monnier
  2009-03-02 19:25 ` Eli Zaretskii
  0 siblings, 1 reply; 41+ messages in thread
From: Stefan Monnier @ 2009-03-02 15:25 UTC
  To: Kenichi Handa; +Cc: 2497, uwe.siart

>> That seems to be the source of the problem.  utf-8 should always come
>> before latin-1 in that list, since utf-8 streams that are valid latin-1
>> streams are not uncommon, whereas latin-1 streams that are valid utf-8
>> streams are extremely rare.

> I think that is the only solution.

Not only is it the only solution, but it's a solution on which we
already agreed several years ago.

So, again, the bug is in the ordering, and we have to figure out which
code ends up putting latin-1 before utf-8 in the coding system priority.

        Stefan
* bug#2497: 23.0.91; Fails to read UTF-8 on Win2k
  2009-03-02 15:25 ` Stefan Monnier
@ 2009-03-02 19:25 ` Eli Zaretskii
  2009-03-03 16:34 ` Stefan Monnier
  0 siblings, 1 reply; 41+ messages in thread
From: Eli Zaretskii @ 2009-03-02 19:25 UTC
  To: Stefan Monnier; +Cc: 2497, uwe.siart

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Eli Zaretskii <eliz@gnu.org>, 2497@emacsbugs.donarmstrong.com, uwe.siart@tum.de
> Date: Mon, 02 Mar 2009 10:25:45 -0500
>
> So, again, the bug is in the ordering

Actually, the OP was complaining that, even with this ordering, Emacs
23 did TRT for him, and that a recent change broke that.  That bug is
fixed now, I believe, so you are talking about a more general problem.

> we have to figure out which code ends up putting latin-1 before utf-8 in
> the coding system priority.

Well, I think this is fairly easy: set-locale-environment does it.
Observe:

  (defun set-locale-environment (&optional locale-name frame)
    "Set up multi-lingual environment for using LOCALE-NAME.
  This sets the language environment, the coding system priority,
  the default input method and sometimes other things.
    ...
          (let ((language-name
                 (locale-name-match locale locale-language-names))
                (charset-language-name
                 (locale-name-match locale locale-charset-language-names))
                (default-eol-type (coding-system-eol-type
                                   default-buffer-file-coding-system))
                (coding-system
                 (or (locale-name-match locale locale-preferred-coding-systems)
                     (when locale
                       (if (string-match "\\.\\([^@]+\\)" locale)
                           (locale-charset-to-coding-system
                            (match-string 1 locale)))))))
    ...
            (when (and (not frame)
                       coding-system
                       (not (coding-system-equal coding-system
                                                 locale-coding-system)))
  >>>>>       (prefer-coding-system coding-system)
              ;; Fixme: perhaps prefer-coding-system should set this too.
              ;; But it's not the time to do such a fundamental change.
              (setq default-sendmail-coding-system coding-system)
              (setq locale-coding-system coding-system))))

Even the doc string says that the coding system priority is set
according to the locale's native encoding.
* bug#2497: 23.0.91; Fails to read UTF-8 on Win2k
  2009-03-02 19:25 ` Eli Zaretskii
@ 2009-03-03 16:34 ` Stefan Monnier
  0 siblings, 0 replies; 41+ messages in thread
From: Stefan Monnier @ 2009-03-03 16:34 UTC
  To: Eli Zaretskii; +Cc: 2497, uwe.siart

>> So, again, the bug is in the ordering

> Actually, the OP was complaining that, even with this ordering, Emacs
> 23 did TRT for him, and that a recent change broke that.  That bug is
> fixed now, I believe, so you are talking about a more general problem.

Yes.  I didn't realize that the reason why it worked before is because
we were lucky.

>> we have to figure out which code ends up putting latin-1 before utf-8 in
>> the coding system priority.

> Well, I think this is fairly easy: set-locale-environment does it.
> Observe:

>   (defun set-locale-environment (&optional locale-name frame)
[...]
>>>>>>       (prefer-coding-system coding-system)
[...]
> Even the doc string says that the coding system priority is set
> according to the locale's native encoding.

Indeed, thanks for spotting it.  Can someone change this code so it
doesn't move utf-8 from first to second place?

        Stefan
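One way to picture the requested behaviour is a small wrapper that
prefers the locale's coding system and then moves utf-8 back to the front
of the detection priority list. This is only a sketch under that
assumption: the function name is hypothetical, and nothing in the thread
says this is how the code was eventually changed.

```elisp
;; Hypothetical helper, not the change that went into Emacs.
(defun my-prefer-coding-system-keeping-utf-8 (coding-system)
  "Prefer CODING-SYSTEM, but keep utf-8 first for detection.
CODING-SYSTEM still becomes the default for new files and the like,
because `prefer-coding-system' sets those defaults; only the order
used when guessing the encoding of existing text is adjusted."
  (prefer-coding-system coding-system)
  ;; Raise utf-8 above everything else in the priority list again.
  (set-coding-system-priority 'utf-8))

;; Example call (evaluate in a throwaway session, since it changes
;; global defaults):
;; (my-prefer-coding-system-keeping-utf-8 'iso-latin-1)
```

After such a call, `(coding-system-priority-list)` should list utf-8
ahead of iso-latin-1, which is the ordering argued for above.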
* bug#2497: 23.0.91; Fails to read UTF-8 on Win2k 2009-02-27 14:10 ` bug#2497: 23.0.91; Fails to read UTF-8 on Win2k Uwe Siart 2009-02-27 16:03 ` Eli Zaretskii @ 2009-02-27 16:11 ` Juanma Barranquero 2009-02-27 16:16 ` Juanma Barranquero ` (2 more replies) 2009-02-27 17:46 ` David Engster ` (2 subsequent siblings) 4 siblings, 3 replies; 41+ messages in thread From: Juanma Barranquero @ 2009-02-27 16:11 UTC (permalink / raw) To: uwe.siart; +Cc: 2497 On Fri, Feb 27, 2009 at 15:10, Uwe Siart <uwe.siart@tum.de> wrote: > I'm using the windows port of 23.0.91 on Win2k SP4 and I found that it > fails to read utf-8 encoded files correctly. When visiting a file in > utf-8 encoding all characters above 255 are screwed up and "C-h C RET" > indicates iso-latin1-dos for saving the file. This has not been an > issue in 23.0.90. Do you have a specific example of a UTF-8 coded file that was detected as UTF-8 in 23.0.90 and it is detected as Latin-1 in 23.0.91? For example, I create a UTF-8 file (without UTF-8 byte-order-mark "signature") with just the following contents: cañón And 23.0.90 also thinks it is Latin-1. That said, if you need UTF-8 to be given more priority than Latin-1, etc, you can use `set-coding-system-priority' in your .emacs. Juanma ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#2497: 23.0.91; Fails to read UTF-8 on Win2k 2009-02-27 16:11 ` Juanma Barranquero @ 2009-02-27 16:16 ` Juanma Barranquero 2009-02-27 16:27 ` Uwe Siart 2009-02-27 16:23 ` Uwe Siart 2009-02-27 17:02 ` Leo 2 siblings, 1 reply; 41+ messages in thread From: Juanma Barranquero @ 2009-02-27 16:16 UTC (permalink / raw) To: uwe.siart; +Cc: 2497 On Fri, Feb 27, 2009 at 17:11, Juanma Barranquero <lekktu@gmail.com> wrote: > cañón > > And 23.0.90 also thinks it is Latin-1. Just to be clear: of course "cañón" is Latin-1. What I mean is that emacs 23.0.90 also reads the byte representation of "cañón" in UTF-8, that is: 0000000 63 61 c3 b1 c3 b3 6e and interprets it as Latin-1: cañón Juanma ^ permalink raw reply [flat|nested] 41+ messages in thread
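The byte dump and the mangled rendering described in this message can be
reproduced from Emacs Lisp. The sketch below is an illustration rather
than part of the thread; the word is written with `\u` escapes so that
the snippet does not depend on how the file holding it is encoded.

```elisp
;; "cañón", spelled with escapes for ñ (U+00F1) and ó (U+00F3).
(let ((bytes (encode-coding-string "ca\u00f1\u00f3n" 'utf-8)))
  (list
   ;; Hex dump of the UTF-8 bytes, one two-digit value per byte.
   (mapconcat (lambda (b) (format "%02x" b)) bytes " ")
   ;; Reinterpret the same bytes as Latin-1: every byte becomes one
   ;; character, so each two-byte UTF-8 sequence turns into a pair.
   (decode-coding-string bytes 'iso-latin-1)))
```

The first element should match the dump quoted above (63 61 c3 b1 c3 b3
6e), and the second is the seven-character string that Emacs displays
when it guesses Latin-1 for this file.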
* bug#2497: 23.0.91; Fails to read UTF-8 on Win2k 2009-02-27 16:16 ` Juanma Barranquero @ 2009-02-27 16:27 ` Uwe Siart 2009-02-27 16:32 ` Juanma Barranquero 0 siblings, 1 reply; 41+ messages in thread From: Uwe Siart @ 2009-02-27 16:27 UTC (permalink / raw) To: Juanma Barranquero; +Cc: 2497 Juanma Barranquero <lekktu@gmail.com> writes: > Just to be clear: of course "cañón" is Latin-1. What I mean is that > emacs 23.0.90 also reads the byte representation of "cañón" in UTF-8, > that is: > > 0000000 63 61 c3 b1 c3 b3 6e > > and interprets it as Latin-1: cañón I tried this out in 23.0.90 in the following way: - mark "cañón" from your mail - create empty file with 'touch t.txt' - visit t.txt and yank cañón - save t.txt - visit t.txt and get correct result (cañón not cañón) -- Uwe ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#2497: 23.0.91; Fails to read UTF-8 on Win2k 2009-02-27 16:27 ` Uwe Siart @ 2009-02-27 16:32 ` Juanma Barranquero 0 siblings, 0 replies; 41+ messages in thread From: Juanma Barranquero @ 2009-02-27 16:32 UTC (permalink / raw) To: uwe.siart; +Cc: 2497 On Fri, Feb 27, 2009 at 17:27, Uwe Siart <uwe.siart@tum.de> wrote: > I tried this out in 23.0.90 in the following way: > > - mark "cañón" from your mail > - create empty file with 'touch t.txt' > - visit t.txt and yank cañón > - save t.txt > - visit t.txt > > and get correct result (cañón not cañón) Of course: you've created a file t.txt encoded in Latin-1, not UTF-8. Juanma ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#2497: 23.0.91; Fails to read UTF-8 on Win2k 2009-02-27 16:11 ` Juanma Barranquero 2009-02-27 16:16 ` Juanma Barranquero @ 2009-02-27 16:23 ` Uwe Siart 2009-02-27 16:38 ` Juanma Barranquero 2009-02-27 17:02 ` Leo 2 siblings, 1 reply; 41+ messages in thread From: Uwe Siart @ 2009-02-27 16:23 UTC (permalink / raw) To: Juanma Barranquero; +Cc: 2497 Juanma Barranquero <lekktu@gmail.com> writes: > On Fri, Feb 27, 2009 at 15:10, Uwe Siart <uwe.siart@tum.de> wrote: > >> I'm using the windows port of 23.0.91 on Win2k SP4 and I found that it >> fails to read utf-8 encoded files correctly. When visiting a file in >> utf-8 encoding all characters above 255 are screwed up and "C-h C RET" >> indicates iso-latin1-dos for saving the file. This has not been an >> issue in 23.0.90. > > Do you have a specific example of a UTF-8 coded file that was detected > as UTF-8 in 23.0.90 and it is detected as Latin-1 in 23.0.91? Yes. My .gnus.el: <http://www.siart.de/etc/.gnus.el> I hope, the webserver delivers it in utf-8 encoding. > For example, I create a UTF-8 file (without UTF-8 byte-order-mark > "signature") with just the following contents: > > cañón > > And 23.0.90 also thinks it is Latin-1. Maybe because it can be encoded in latin-1. That would be ok for me. But my .gnus.el contains symbols (arrows for the summary buffer) that are definitely not included in latin-1 but 23.0.91 recognises latin-1. -- Uwe ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#2497: 23.0.91; Fails to read UTF-8 on Win2k 2009-02-27 16:23 ` Uwe Siart @ 2009-02-27 16:38 ` Juanma Barranquero 2009-02-27 18:19 ` Eli Zaretskii 0 siblings, 1 reply; 41+ messages in thread From: Juanma Barranquero @ 2009-02-27 16:38 UTC (permalink / raw) To: uwe.siart; +Cc: 2497 On Fri, Feb 27, 2009 at 17:23, Uwe Siart <uwe.siart@tum.de> wrote: > Yes. My .gnus.el: <http://www.siart.de/etc/.gnus.el> Aha, yes, the bug is reproducible. > I hope, the webserver delivers it in utf-8 encoding. Yes. Emacs 23.0.90 opens it as utf-8, as does Notepad2. > But > my .gnus.el contains symbols (arrows for the summary buffer) that are > definitely not included in latin-1 but 23.0.91 recognises latin-1. Juanma ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#2497: 23.0.91; Fails to read UTF-8 on Win2k 2009-02-27 16:38 ` Juanma Barranquero @ 2009-02-27 18:19 ` Eli Zaretskii 2009-02-27 20:38 ` Juanma Barranquero 2009-02-28 1:29 ` Jason Rumney 0 siblings, 2 replies; 41+ messages in thread From: Eli Zaretskii @ 2009-02-27 18:19 UTC (permalink / raw) To: Juanma Barranquero, 2497; +Cc: uwe.siart > Date: Fri, 27 Feb 2009 17:38:37 +0100 > From: Juanma Barranquero <lekktu@gmail.com> > Cc: 2497@emacsbugs.donarmstrong.com > > On Fri, Feb 27, 2009 at 17:23, Uwe Siart <uwe.siart@tum.de> wrote: > > > Yes. My .gnus.el: <http://www.siart.de/etc/.gnus.el> > > Aha, yes, the bug is reproducible. Which bug? ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#2497: 23.0.91; Fails to read UTF-8 on Win2k 2009-02-27 18:19 ` Eli Zaretskii @ 2009-02-27 20:38 ` Juanma Barranquero 2009-02-28 1:29 ` Jason Rumney 1 sibling, 0 replies; 41+ messages in thread From: Juanma Barranquero @ 2009-02-27 20:38 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 2497, uwe.siart On Fri, Feb 27, 2009 at 19:19, Eli Zaretskii <eliz@gnu.org> wrote: >> Aha, yes, the bug is reproducible. > > Which bug? I mean, the fact that the given .gnus.el file was read as utf-8-dos in 23.0.90 and as iso-latin1-dos in 23.0.91 (with characters that are not latin-1). Juanma ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#2497: 23.0.91; Fails to read UTF-8 on Win2k 2009-02-27 18:19 ` Eli Zaretskii 2009-02-27 20:38 ` Juanma Barranquero @ 2009-02-28 1:29 ` Jason Rumney 1 sibling, 0 replies; 41+ messages in thread From: Jason Rumney @ 2009-02-28 1:29 UTC (permalink / raw) To: Eli Zaretskii, 2497; +Cc: Juanma Barranquero, uwe.siart Eli Zaretskii wrote: >> Date: Fri, 27 Feb 2009 17:38:37 +0100 >> From: Juanma Barranquero <lekktu@gmail.com> >> Cc: 2497@emacsbugs.donarmstrong.com >> >> On Fri, Feb 27, 2009 at 17:23, Uwe Siart <uwe.siart@tum.de> wrote: >> >> >>> Yes. My .gnus.el: <http://www.siart.de/etc/.gnus.el> >>> >> Aha, yes, the bug is reproducible. >> > > Which bug? > The one where the OP's .gnus.el contains characters which were correctly detected as UTF-8 in 23.0.90, but now appear as \200\224 octal escapes, as the file is incorrectly detected as Latin-1. ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#2497: 23.0.91; Fails to read UTF-8 on Win2k 2009-02-27 16:11 ` Juanma Barranquero 2009-02-27 16:16 ` Juanma Barranquero 2009-02-27 16:23 ` Uwe Siart @ 2009-02-27 17:02 ` Leo 2 siblings, 0 replies; 41+ messages in thread From: Leo @ 2009-02-27 17:02 UTC (permalink / raw) To: bug-gnu-emacs On 2009-02-27 16:11 +0000, Juanma Barranquero wrote: > On Fri, Feb 27, 2009 at 15:10, Uwe Siart <uwe.siart@tum.de> wrote: > >> I'm using the windows port of 23.0.91 on Win2k SP4 and I found that it >> fails to read utf-8 encoded files correctly. When visiting a file in >> utf-8 encoding all characters above 255 are screwed up and "C-h C RET" >> indicates iso-latin1-dos for saving the file. This has not been an >> issue in 23.0.90. > > Do you have a specific example of a UTF-8 coded file that was detected > as UTF-8 in 23.0.90 and it is detected as Latin-1 in 23.0.91? > > For example, I create a UTF-8 file (without UTF-8 byte-order-mark > "signature") with just the following contents: > > cañón > > And 23.0.90 also thinks it is Latin-1. > > That said, if you need UTF-8 to be given more priority than Latin-1, > etc, you can use `set-coding-system-priority' in your .emacs. > > Juanma I have the following code in my .emacs when I changed to w32 last June. So the problem might exist longer. ;;; FIXME: find out why GNU/Linux does not need this (prefer-coding-system 'utf-8) I just tested some Chinese files. Without that line, all of them are being opened in latin-1 encoding and are unreadable. Tested in GNU Emacs 23.0.91.1 (i386-mingw-nt5.1.2600) of 2009-02-26 -- .: Leo :. [ sdl.web AT gmail.com ] .: I use Emacs :. ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#2497: 23.0.91; Fails to read UTF-8 on Win2k 2009-02-27 14:10 ` bug#2497: 23.0.91; Fails to read UTF-8 on Win2k Uwe Siart 2009-02-27 16:03 ` Eli Zaretskii 2009-02-27 16:11 ` Juanma Barranquero @ 2009-02-27 17:46 ` David Engster 2009-02-27 21:15 ` Uwe Siart 2009-02-28 1:32 ` Jason Rumney 2009-02-27 23:34 ` bug#2497: 23.0.91; Fails to read UTF-8 on Windows2k Richard M Stallman 2009-02-28 12:30 ` bug#2497: marked as done (23.0.91; Fails to read UTF-8 on Win2k) Emacs bug Tracking System 4 siblings, 2 replies; 41+ messages in thread From: David Engster @ 2009-02-27 17:46 UTC (permalink / raw) To: uwe.siart; +Cc: emacs-pretest-bug, 2497 Uwe Siart <uwe.siart@tum.de> writes: > I'm using the windows port of 23.0.91 on Win2k SP4 and I found that it > fails to read utf-8 encoded files correctly. When visiting a file in > utf-8 encoding all characters above 255 are screwed up and "C-h C RET" > indicates iso-latin1-dos for saving the file. This has not been an > issue in 23.0.90. Maybe this is a duplicate of what I reported in http://emacsbugs.donarmstrong.com/cgi-bin/bugreport.cgi?bug=2354 As I write later in that bug report, I think I could track down this issue to the change in revision 1.413 of src/coding.c. Maybe you could try if the same applies to your problem. -David ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#2497: 23.0.91; Fails to read UTF-8 on Win2k 2009-02-27 17:46 ` David Engster @ 2009-02-27 21:15 ` Uwe Siart 2009-02-28 1:32 ` Jason Rumney 1 sibling, 0 replies; 41+ messages in thread From: Uwe Siart @ 2009-02-27 21:15 UTC (permalink / raw) To: 2497; +Cc: emacs-pretest-bug David Engster <deng@randomsample.de> writes: > Maybe this is a duplicate of what I reported in > > http://emacsbugs.donarmstrong.com/cgi-bin/bugreport.cgi?bug=2354 > > As I write later in that bug report, I think I could track down this > issue to the change in revision 1.413 of src/coding.c. Maybe you could > try if the same applies to your problem. At least I can reproduce it and it seems to be the very same thing that I stumbled across. But due to lack of detailed knowledge about coding recognition I'm unable to join the discussion whether this is a bug or not. It's just that I felt more comfortable about the previous state. So far I got things back to work with ;; -*- coding:utf-8-dos; -*- as the first line of my .gnus.el :-) -- Uwe ^ permalink raw reply [flat|nested] 41+ messages in thread
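To check what Emacs decides for a given file, with or without a coding
cookie such as the one above, a small helper can read the file into a
temporary buffer and report the coding system that was used. The function
name and the example path are assumptions made for this sketch.

```elisp
(defun my-file-decoded-as (file)
  "Return the coding system Emacs uses when reading FILE.
Hypothetical helper for inspecting detection results."
  (with-temp-buffer
    ;; `insert-file-contents' performs the usual detection, including
    ;; any `coding:' cookie, and records its choice in
    ;; `last-coding-system-used'.
    (insert-file-contents file)
    last-coding-system-used))

;; Example (substitute your own file):
;; (my-file-decoded-as "~/.gnus.el")
```

Comparing the result before and after adding the cookie shows whether the
cookie, rather than the priority list, is what selects utf-8.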
* bug#2497: 23.0.91; Fails to read UTF-8 on Win2k 2009-02-27 17:46 ` David Engster 2009-02-27 21:15 ` Uwe Siart @ 2009-02-28 1:32 ` Jason Rumney 2009-02-28 1:35 ` Processed (with 5 errors): " Emacs bug Tracking System 1 sibling, 1 reply; 41+ messages in thread From: Jason Rumney @ 2009-02-28 1:32 UTC (permalink / raw) To: David Engster, 2497 merge 2354 2497 David Engster wrote: > Maybe this is a duplicate of what I reported in > > http://emacsbugs.donarmstrong.com/cgi-bin/bugreport.cgi?bug=2354 > It seems so, yes. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Processed (with 5 errors): Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k 2009-02-28 1:32 ` Jason Rumney @ 2009-02-28 1:35 ` Emacs bug Tracking System 0 siblings, 0 replies; 41+ messages in thread From: Emacs bug Tracking System @ 2009-02-28 1:35 UTC (permalink / raw) To: Jason Rumney; +Cc: Emacs Bugs Processing commands for control@emacsbugs.donarmstrong.com: > merge 2354 2497 bug#2354: 23.0.90; Emacs fails to detect utf-8 encoding with language environment Latin-1 bug#2497: 23.0.91; Fails to read UTF-8 on Win2k Merged 2354 2497. > David Engster wrote: Unknown command or malformed arguments to command. > > Maybe this is a duplicate of what I reported in Unknown command or malformed arguments to command. > > Unknown command or malformed arguments to command. > > http://emacsbugs.donarmstrong.com/cgi-bin/bugreport.cgi?bug=2354 Unknown command or malformed arguments to command. > > Unknown command or malformed arguments to command. Too many unknown commands, stopping here. Please contact me if you need assistance. Don Armstrong (administrator, Emacs bugs database) ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#2497: 23.0.91; Fails to read UTF-8 on Windows2k 2009-02-27 14:10 ` bug#2497: 23.0.91; Fails to read UTF-8 on Win2k Uwe Siart ` (2 preceding siblings ...) 2009-02-27 17:46 ` David Engster @ 2009-02-27 23:34 ` Richard M Stallman 2009-02-28 9:47 ` Uwe Siart 2009-02-28 12:30 ` bug#2497: marked as done (23.0.91; Fails to read UTF-8 on Win2k) Emacs bug Tracking System 4 siblings, 1 reply; 41+ messages in thread From: Richard M Stallman @ 2009-02-27 23:34 UTC (permalink / raw) To: uwe.siart, 2497; +Cc: emacs-pretest-bug Please don't call that system "Win"--that name implies praise. ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#2497: 23.0.91; Fails to read UTF-8 on Windows2k 2009-02-27 23:34 ` bug#2497: 23.0.91; Fails to read UTF-8 on Windows2k Richard M Stallman @ 2009-02-28 9:47 ` Uwe Siart 2009-02-28 18:08 ` Richard M Stallman 0 siblings, 1 reply; 41+ messages in thread From: Uwe Siart @ 2009-02-28 9:47 UTC (permalink / raw) To: rms; +Cc: emacs-pretest-bug, 2497 Richard M Stallman <rms@gnu.org> writes: > Please don't call that system "Win"--that name implies praise. How right you are. Forgive me my trespasses. In my own defence I have to say that I never thought of W2k as the "system". My system is Emacs and I'm very comfortable with it. W2k is its boot loader. The boot loader does not become noticeable too much. I never understood, however, why this boot loader takes up a whole CD. -- Uwe ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#2497: 23.0.91; Fails to read UTF-8 on Windows2k 2009-02-28 9:47 ` Uwe Siart @ 2009-02-28 18:08 ` Richard M Stallman 0 siblings, 0 replies; 41+ messages in thread From: Richard M Stallman @ 2009-02-28 18:08 UTC (permalink / raw) To: uwe.siart, 2497; +Cc: emacs-pretest-bug, 2497 How right you are. Forgive me my trespasses. Only Emacs can forgive you, but I am confident that it will. In my own defence I have to say that I never thought of W2k as the "system". My system is Emacs and I'm very comfortable with it. W2k is its boot loader. Why not switch to a free boot loader then? ^ permalink raw reply [flat|nested] 41+ messages in thread
* bug#2497: marked as done (23.0.91; Fails to read UTF-8 on Win2k) 2009-02-27 14:10 ` bug#2497: 23.0.91; Fails to read UTF-8 on Win2k Uwe Siart ` (3 preceding siblings ...) 2009-02-27 23:34 ` bug#2497: 23.0.91; Fails to read UTF-8 on Windows2k Richard M Stallman @ 2009-02-28 12:30 ` Emacs bug Tracking System 4 siblings, 0 replies; 41+ messages in thread From: Emacs bug Tracking System @ 2009-02-28 12:30 UTC (permalink / raw) To: Eli Zaretskii [-- Attachment #1: Type: text/plain, Size: 865 bytes --] Your message dated Sat, 28 Feb 2009 14:21:08 +0200 with message-id <uzlg6oiq3.fsf@gnu.org> and subject line Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k has caused the Emacs bug report #2354, regarding 23.0.91; Fails to read UTF-8 on Win2k to be marked as done. This means that you claim that the problem has been dealt with. If this is not the case it is now your responsibility to reopen the bug report if necessary, and/or fix the problem forthwith. (NB: If you are a system administrator and have no idea what this message is talking about, this may indicate a serious mail system misconfiguration somewhere. Please contact owner@emacsbugs.donarmstrong.com immediately.) -- 2354: http://emacsbugs.donarmstrong.com/cgi-bin/bugreport.cgi?bug=2354 Emacs Bug Tracking System Contact owner@emacsbugs.donarmstrong.com with problems [-- Attachment #2: Type: message/rfc822, Size: 3281 bytes --] From: Uwe Siart <uwe.siart@tum.de> To: emacs-pretest-bug@gnu.org Subject: 23.0.91; Fails to read UTF-8 on Win2k Date: Fri, 27 Feb 2009 15:10:19 +0100 Message-ID: <877i3c55tg.fsf@tum.de> I'm using the windows port of 23.0.91 on Win2k SP4 and I found that it fails to read utf-8 encoded files correctly. When visiting a file in utf-8 encoding all characters above 255 are screwed up and "C-h C RET" indicates iso-latin1-dos for saving the file. This has not been an issue in 23.0.90. 
-- Uwe In GNU Emacs 23.0.91.1 (i386-mingw-nt5.0.2195) of 2009-02-27 on SOFT-MJASON Windowing system distributor `Microsoft Corp.', version 5.0.2195 configured using `configure --with-gcc (3.4)' Important settings: value of $LC_ALL: nil value of $LC_COLLATE: nil value of $LC_CTYPE: nil value of $LC_MESSAGES: nil value of $LC_MONETARY: nil value of $LC_NUMERIC: nil value of $LC_TIME: nil value of $LANG: DEU value of $XMODIFIERS: nil locale-coding-system: cp1252 default-enable-multibyte-characters: t Major mode: Lisp Interaction Minor modes in effect: iswitchb-mode: t display-time-mode: t auto-insert-mode: t diff-auto-refine-mode: t delete-selection-mode: t pc-selection-mode: t tooltip-mode: t mouse-wheel-mode: t file-name-shadow-mode: t global-font-lock-mode: t font-lock-mode: t global-auto-composition-mode: t auto-composition-mode: t auto-encryption-mode: t auto-compression-mode: t column-number-mode: t line-number-mode: t transient-mark-mode: t Recent input: M-x r e <tab> p o <tab> r t <tab> <return> Recent messages: Loading time...done Loading iswitchb...done For information about GNU Emacs and the GNU system, type C-h C-a. Making completion list... [2 times] [-- Attachment #3: Type: message/rfc822, Size: 2452 bytes --] From: Eli Zaretskii <eliz@gnu.org> To: 2497-done@emacsbugs.donarmstrong.com, 2354-done@emacsbugs.donarmstrong.com Subject: Re: bug#2497: 23.0.91; Fails to read UTF-8 on Win2k Date: Sat, 28 Feb 2009 14:21:08 +0200 Message-ID: <uzlg6oiq3.fsf@gnu.org> > From: David Engster <deng@randomsample.de> > Date: Fri, 27 Feb 2009 18:46:12 +0100 > Cc: emacs-pretest-bug@gnu.org, 2497@emacsbugs.donarmstrong.com > > Uwe Siart <uwe.siart@tum.de> writes: > > I'm using the windows port of 23.0.91 on Win2k SP4 and I found that it > > fails to read utf-8 encoded files correctly. When visiting a file in > > utf-8 encoding all characters above 255 are screwed up and "C-h C RET" > > indicates iso-latin1-dos for saving the file. This has not been an > > issue in 23.0.90. 
> > Maybe this is a duplicate of what I reported in > > http://emacsbugs.donarmstrong.com/cgi-bin/bugreport.cgi?bug=2354 > > As I write later in that bug report, I think I could track down this > issue to the change in revision 1.413 of src/coding.c. Maybe you could > try if the same applies to your problem. Should be fixed by this change: 2009-02-28 Eli Zaretskii <eliz@gnu.org> * coding.c (detect_coding_charset): Fix change from 2008-10-21. Also, check iso-latin-*, not only iso-8859-*. ^ permalink raw reply [flat|nested] 41+ messages in thread
end of thread, other threads:[~2009-03-03 16:34 UTC | newest] Thread overview: 41+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <uzlg6oiq3.fsf@gnu.org> 2009-02-17 10:35 ` bug#2354: 23.0.90; Emacs fails to detect utf-8 encoding with language environment Latin-1 David Engster 2009-02-17 16:45 ` Juanma Barranquero 2009-02-17 18:04 ` David Engster 2009-02-28 12:30 ` bug#2354: marked as done (23.0.90; Emacs fails to detect utf-8 encoding with language environment Latin-1) Emacs bug Tracking System 2009-02-27 14:10 ` bug#2497: 23.0.91; Fails to read UTF-8 on Win2k Uwe Siart 2009-02-27 16:03 ` Eli Zaretskii 2009-02-27 16:48 ` Uwe Siart 2009-02-27 18:19 ` Eli Zaretskii 2009-02-27 20:35 ` Uwe Siart 2009-02-28 4:40 ` Stefan Monnier 2009-02-28 8:17 ` Uwe Siart 2009-02-28 10:14 ` David Engster 2009-02-28 12:09 ` Eli Zaretskii 2009-02-28 14:16 ` Jason Rumney 2009-02-28 14:31 ` David Engster 2009-02-28 22:00 ` Stefan Monnier 2009-02-28 10:49 ` Eli Zaretskii 2009-02-28 12:16 ` Uwe Siart 2009-02-28 22:04 ` Stefan Monnier 2009-03-02 11:43 ` Kenichi Handa 2009-03-02 15:25 ` Stefan Monnier 2009-03-02 19:25 ` Eli Zaretskii 2009-03-03 16:34 ` Stefan Monnier 2009-02-27 16:11 ` Juanma Barranquero 2009-02-27 16:16 ` Juanma Barranquero 2009-02-27 16:27 ` Uwe Siart 2009-02-27 16:32 ` Juanma Barranquero 2009-02-27 16:23 ` Uwe Siart 2009-02-27 16:38 ` Juanma Barranquero 2009-02-27 18:19 ` Eli Zaretskii 2009-02-27 20:38 ` Juanma Barranquero 2009-02-28 1:29 ` Jason Rumney 2009-02-27 17:02 ` Leo 2009-02-27 17:46 ` David Engster 2009-02-27 21:15 ` Uwe Siart 2009-02-28 1:32 ` Jason Rumney 2009-02-28 1:35 ` Processed (with 5 errors): " Emacs bug Tracking System 2009-02-27 23:34 ` bug#2497: 23.0.91; Fails to read UTF-8 on Windows2k Richard M Stallman 2009-02-28 9:47 ` Uwe Siart 2009-02-28 18:08 ` Richard M Stallman 2009-02-28 12:30 ` bug#2497: marked as done (23.0.91; Fails to read UTF-8 on Win2k) Emacs bug Tracking System