unofficial mirror of bug-gnu-emacs@gnu.org 
* Broken charset=utf-16be articles with Gnus and Emacs 21.3
@ 2003-03-29 11:11 Reiner Steib
  2003-03-31  1:51 ` Kenichi Handa
  2003-03-31  4:55 ` Jesper Harder
  0 siblings, 2 replies; 5+ messages in thread
From: Reiner Steib @ 2003-03-29 11:11 UTC
  Cc: bugs

In GNU Emacs 21.3.1 (i686-pc-linux-gnu, X toolkit, Xaw3d scroll bars)
 of 2003-03-19 on ni
configured using `configure  --prefix=/import/xtra/emacs/RC'
Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: POSIX
  value of $LC_CTYPE: en_US.ISO_8859-1
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: C
  locale-coding-system: iso-latin-1
  default-enable-multibyte-characters: t

Gnus v5.9.0
GNU Emacs 21.3.1 (i686-pc-linux-gnu, X toolkit, Xaw3d scroll bars)
 of 2003-03-19 on ni
200 news.uni-ulm.de DNEWS Version  5.6f3,, S0, posting OK 

Please describe exactly what actions triggered the bug
and the precise symptoms of the bug:

Actions to reproduce the bug:

- Start Gnus `M-x gnus RET' and compose an article `a'.

- Switch to TeX input method: `M-x set-input-method RET TeX RET'

- Enter some characters:

  \sigma ä (a with diaeresis) \omega
  (Note: \sigma and \omega (without ä) are not sufficient.)
  
  .. *or* ...

  \sigma \omega \alpha \o \int

- Preview `M-m P' (or `C-u M-m P') or send the message.

  The preview (and the outgoing message) will be encoded with
  "Content-Type: text/plain; charset=utf-16be" and it will not be
  readable with Gnus and most other MUAs or newsreaders.

  I've been told (-> Simon Krahnke) that the result isn't even correct
  UTF-16.
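
As a rough illustration of why the inputs above need a Unicode charset
at all, the following Python sketch checks whether a single legacy
8-bit charset can represent each test string.  It is only a model, not
Emacs code: Gnus chooses the charset through Emacs' own coding-system
machinery, and the candidate list and the helper name
`covering_charsets' are assumptions made for this example.

```python
# Illustration only (Python, not Emacs Lisp): which single 8-bit charsets
# can represent each test string?  The candidate list is an assumption.
CANDIDATES = ["iso8859-1", "iso8859-7"]


def covering_charsets(text):
    """Return the candidate charsets that can encode every character of text."""
    usable = []
    for name in CANDIDATES:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue
        usable.append(name)
    return usable


greek_only = "\u03c3 \u03c9"                          # \sigma \omega
greek_plus_latin = "\u03c3 \u00e4 \u03c9"             # \sigma ä \omega
mixed_symbols = "\u03c3 \u03c9 \u03b1 \u00f8 \u222b"  # \sigma \omega \alpha \o \int

for sample in (greek_only, greek_plus_latin, mixed_symbols):
    print(repr(sample), covering_charsets(sample))
```

In this model the Greek-only string still fits ISO-8859-7, while both
strings from the recipe fit neither candidate.  That would be
consistent with the note that \sigma and \omega alone do not trigger
the problem, but it is an interpretation, not something confirmed in
this report.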

Expected behavior:

- The article should be encoded with
  "Content-Type: text/plain; charset=utf-8".

Note: I'm sending this message with a different version of Gnus, so
don't be confused when looking at my User-Agent header.  FWIW, the
same bug also happens with Oort Gnus from CVS.  With Emacs 21.1 / Gnus
5.9.0 [1] and with Emacs 21.2 / Oort Gnus v0.17 [2] we have the
expected behavior [1].  Therefore I think this is a bug in Emacs 21.3,
but I'm not sure (Cc-ing bugs@gnus.org).

Additional confusion: Replying to [1] (with full citation), I get
utf-8 [3], but copying the non-Latin characters into this buffer
results in utf-16be in the preview.  I also get utf-16be when I reply
to [2] and delete the lines containing the word "Blindtext".

Bye, Reiner.

[1] Reported by Simon Krahnke: <news:8765q3nwm9.fsf@xts.gnuu.de>
    (Newsgroups: de.comm.software.gnus)

[2] Reported by Mark Trettin:
    <news:dcsg.m3of3vcgwr.fsf@beldin.mt743742.dialup.rwth-aachen.de>
    (Newsgroups: de.comm.software.gnus)

[3] <news:v9d6kaig3n.fsf@marauder.physik.uni-ulm.de>
    (Newsgroups: de.comm.software.gnus)
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo--- PGP key available via WWW   http://rsteib.home.pages.de/


* Re: Broken charset=utf-16be articles with Gnus and Emacs 21.3
  2003-03-29 11:11 Broken charset=utf-16be articles with Gnus and Emacs 21.3 Reiner Steib
@ 2003-03-31  1:51 ` Kenichi Handa
  2003-03-31  4:55 ` Jesper Harder
  1 sibling, 0 replies; 5+ messages in thread
From: Kenichi Handa @ 2003-03-31  1:51 UTC
  Cc: overlord

In article <v9adfeig1c.fsf@marauder.physik.uni-ulm.de>, Reiner Steib <4.uce.03.r.s@nurfuerspam.de> writes:
> Actions to reproduce the bug:

> - Start Gnus `M-x gnus RET' and compose an article `a'.

> - Switch to TeX input method: `M-x set-input-method RET TeX RET'

> - Enter some characters:

>   \sigma ä (a with diaeresis) \omega
>   (Note: \sigma and \omega (without ä) are not sufficient.)
  
>   .. *or* ...

>   \sigma \omega \alpha \o \int

> - Preview `M-m P' (or `C-u M-m P') or send the message.

>   The preview (and the outgoing message) will be encoded with
>   "Content-Type: text/plain; charset=utf-16be" and it will not be
>   readable with Gnus and most other MUAs or newsreaders.

>   I've been told (-> Simon Krahnke) that the result isn't even correct
>   UTF-16.

Oops, I've just found that Emacs' coding systems utf-16-le
and utf-16-be produce a BOM (Byte Order Mark), which is a bug
according to their specifications.  I've just installed a
fix.
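
To make the problem concrete: RFC 2781 says that text labeled
UTF-16BE or UTF-16LE must not be prefixed with a BOM, because the
label already fixes the byte order.  The Python sketch below is only
an illustration and does not reproduce Emacs' encoder.  It simulates
BOM-prefixed big-endian output and shows what a receiver sees when it
trusts the "utf-16be" label.

```python
import codecs

sigma = "\u03c3"  # GREEK SMALL LETTER SIGMA

# Python's explicit-endianness codec emits no BOM.
plain = sigma.encode("utf-16-be")

# Simulate the buggy output described above: a BOM followed by the BE data.
with_bom = codecs.BOM_UTF16_BE + plain

# A receiver that trusts the "utf-16be" label decodes the BOM bytes as an
# ordinary character, U+FEFF, at the start of the text.
decoded = with_bom.decode("utf-16-be")

print(plain.hex())
print(with_bom.hex())
print([hex(ord(ch)) for ch in decoded])
```

The stray U+FEFF at the start of the decoded text is one way such a
message can end up not being correct UTF-16BE.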

> Expected behavior:

> - The article should be encoded with
>   "Content-Type: text/plain; charset=utf-8".

I don't know why GNUS prefers utf-16-X to utf-8.  At least,
sort-coding-systems prefers utf-8.

---
Ken'ichi HANDA
handa@m17n.org


* Re: Broken charset=utf-16be articles with Gnus and Emacs 21.3
  2003-03-29 11:11 Broken charset=utf-16be articles with Gnus and Emacs 21.3 Reiner Steib
  2003-03-31  1:51 ` Kenichi Handa
@ 2003-03-31  4:55 ` Jesper Harder
  2003-03-31 13:41   ` Reiner Steib
  1 sibling, 1 reply; 5+ messages in thread
From: Jesper Harder @ 2003-03-31  4:55 UTC
  Cc: Simon Krahnke

Reiner Steib <4.uce.03.r.s@nurfuerspam.de> writes:

>   The preview (and the outgoing message) will be encoded with
>   "Content-Type: text/plain; charset=utf-16be" and it will not be
>
> Note: I'm sending this message with a different version of Gnus, so
> don't be confused when looking at my User-Agent header.  FWIW, the
> same bug also happens with Oort Gnus from CVS.  With Emacs 21.1 / Gnus
> 5.9.0 [1] and with Emacs 21.2 / Oort Gnus v0.17 [2] we have the
> expected behavior [1].  Therefore I think this is a bug in Emacs 21.3,
> but I'm not sure (Cc-ing bugs@gnus.org).

FWIW, Oort Gnus and current CVS Emacs work as expected.  It used to
exhibit the same bug you're describing, but it was fixed earlier this
year -- probably by this change:


2003-01-03  Dave Love  <fx@gnu.org>

	* international/mule-cmds.el (sort-coding-systems):
	Adjust priority of utf-16 and x-ctext.

Didn't it get applied to Emacs 21.3?


* Re: Broken charset=utf-16be articles with Gnus and Emacs 21.3
  2003-03-31  4:55 ` Jesper Harder
@ 2003-03-31 13:41   ` Reiner Steib
  2003-03-31 16:14     ` Mark Trettin
  0 siblings, 1 reply; 5+ messages in thread
From: Reiner Steib @ 2003-03-31 13:41 UTC
  Cc: Kenichi Handa

On Mon, Mar 31 2003, Jesper Harder wrote:

> FWIW, Oort Gnus and current CVS Emacs work as expected.  It used to
> exhibit the same bug you're describing, 

Yes, I remember seeing this on the Ding-List in some of Kai's
articles.  But Kai and I didn't recall whether the problem was in
Emacs CVS-HEAD or in Oort Gnus (and whether it was fixed or not).

> but it was fixed earlier this year -- probably by this change:
>
> 2003-01-03  Dave Love  <fx@gnu.org>
>
> 	* international/mule-cmds.el (sort-coding-systems):
> 	Adjust priority of utf-16 and x-ctext.

Yes, that's it, thanks for pointing this out.  After applying the
following patch to lisp/international/mule-cmds.el from Emacs 21.3.1
and evaluating `sort-coding-systems', I get utf-8 as expected.

--8<---------------cut here---------------start------------->8---
--- mule-cmds.el	26 Dec 2002 17:27:20 -0000	1.216
+++ mule-cmds.el	3 Jan 2003 20:16:11 -0000	1.217
@@ -425,9 +425,18 @@
 		    (let ((base (coding-system-base x)))
 		      (+ (if (eq base most-preferred) 64 0)
 			 (let ((mime (coding-system-get base 'mime-charset)))
+			   ;; Prefer coding systems corresponding to a
+			   ;; MIME charset.
 			   (if mime
-			       (if (string-match "^x-" (symbol-name mime))
-				   16 32)
+			       ;; Lower utf-16 priority so that we
+			       ;; normally prefer utf-8 to it, and put
+			       ;; x-ctext below that.
+			       (cond ((or (eq base 'mule-utf-16-le)
+					  (eq base 'mule-utf-16-be))
+				      16)
+				     ((string-match "^x-" (symbol-name mime))
+				      8)
+				     (t 32))
 			     0))
 			 (if (memq base lang-preferred) 8 0)
 			 (if (string-match "-with-esc$" (symbol-name base))
--8<---------------cut here---------------end--------------->8---
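
The effect of the patch on the MIME-charset term of the score can be
modelled in a few lines.  The sketch below is in Python, not Emacs
Lisp, and covers only the term the patch touches; the other terms
(most-preferred, lang-preferred and so on) are left out.  The function
names and the sample (base, MIME charset) pairs are assumptions made
for illustration.

```python
# Python model of the one term of the sort-coding-systems score that the
# patch changes.  Other terms (most-preferred, lang-preferred, ...) are omitted.
UTF16_BASES = {"mule-utf-16-le", "mule-utf-16-be"}


def mime_score_old(base, mime):
    """Revision 1.216: 16 for x- MIME charsets, 32 for any other, 0 if none."""
    if mime is None:
        return 0
    return 16 if mime.startswith("x-") else 32


def mime_score_new(base, mime):
    """Revision 1.217: utf-16 drops to 16 and x- charsets drop to 8."""
    if mime is None:
        return 0
    if base in UTF16_BASES:
        return 16
    if mime.startswith("x-"):
        return 8
    return 32


# (base coding system, MIME charset) pairs; these names are assumptions
# made for this example.
samples = [
    ("mule-utf-8", "utf-8"),
    ("mule-utf-16-be", "utf-16be"),
    ("compound-text", "x-ctext"),
]
for base, mime in samples:
    print(base, mime_score_old(base, mime), mime_score_new(base, mime))
```

In this model the old term gives utf-8 and utf-16be the same value, so
the order between them is decided elsewhere; with the new term utf-8
scores higher than utf-16be, which in turn scores higher than x-ctext.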

> Didn't it get applied to Emacs 21.3?

No.  If the EMACS_21_1_RC branch is still maintained, it should
probably be applied there as well.

On Mon, Mar 31 2003, Kenichi Handa wrote:

> Oops, I've just found that Emacs' coding systems utf-16-le and
> utf-16-be produce a BOM (Byte Order Mark), which is a bug according to
> their specifications.  I've just installed a fix.

Does it make sense to apply it to EMACS_21_1_RC too?

>> Expected behavior:
>> - The article should be encoded with
>>   "Content-Type: text/plain; charset=utf-8".
> I don't know why GNUS prefers utf-16-X to utf-8.  At least,
> sort-coding-systems prefers utf-8.

Apparently not in Emacs 21.3, unless I misunderstood the
above-mentioned patch to `sort-coding-systems'.

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo--- PGP key available via WWW   http://rsteib.home.pages.de/


* Re: Broken charset=utf-16be articles with Gnus and Emacs 21.3
  2003-03-31 13:41   ` Reiner Steib
@ 2003-03-31 16:14     ` Mark Trettin
  0 siblings, 0 replies; 5+ messages in thread
From: Mark Trettin @ 2003-03-31 16:14 UTC


On Mon, 31 Mar 2003, Reiner Steib verbalised:
> On Mon, Mar 31 2003, Jesper Harder wrote:

[...]

>> but it was fixed earlier this year -- probably by this change:
>>
>> 2003-01-03  Dave Love  <fx@gnu.org>
>>
>> 	* international/mule-cmds.el (sort-coding-systems):
>> 	Adjust priority of utf-16 and x-ctext.
> 
> Yes, that's it, thanks for pointing this out.  After applying the
> following patch to lisp/international/mule-cmds.el from Emacs 21.3.1
> and evaluating `sort-coding-systems', I get utf-8 as expected.

[...]

>> Didn't it get applied to Emacs 21.3?
 
> No.  If the EMACS_21_1_RC branch is still maintained, it should
> probably be applied there as well.

Will there be some kind of "official" patch for Emacs 21.3.1, or do we
have to apply the snipped patch?

[...]

See you

	 Mark
-- 
Mark Trettin ------- *Aachen* -- Where is that? ---> N: 50°46' E: 06°05'
BOFH excuse #317:
Internet exceeded Luser level, please wait until a luser logs off before
attempting to log back on.

