unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
@ 2008-10-15 20:21 ` Frank Schmitt
  2008-10-16  7:32   ` Andreas Schwab
                     ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Frank Schmitt @ 2008-10-15 20:21 UTC (permalink / raw)
  To: emacs-pretest-bug

Gnus in Emacs 23 displays some German umlauts incorrectly, while Gnus
in Emacs 22 displays the same message correctly. It only happens when
my boss uses Thunderbird to reply to UTF-8 mails I sent from Gnus.

In Emacs 23, in the mail from my boss, C-x = on the character gives
Char: \374 (4194300, #o17777774, #x3ffffc, raw-byte) point=212 of 461 (46%) column=4
and the character is displayed as \374.
In another UTF-8 mail, written by me, it is
Char: ü (252, #o374, #xfc, file ...) point=493 of 940 (52%) column=0
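
For reference, the raw-byte code above is consistent with how Emacs 23
represents an undecoded 8-bit byte: bytes 0x80..0xFF get character codes
0x3FFF80..0x3FFFFF, so byte 0xFC (octal 374) shows up as 0x3FFFFC. A
minimal Python sketch of that arithmetic follows; the helper name
raw_byte_char is made up for illustration and is not an Emacs function.

```python
# Offset that Emacs 23+ adds to a raw 8-bit byte (0x80..0xFF) to form its
# internal "raw-byte" character code (0x3FFF80..0x3FFFFF).
RAW_BYTE_OFFSET = 0x3FFF00


def raw_byte_char(byte: int) -> int:
    """Hypothetical helper: character code C-x = reports for a raw byte."""
    if not 0x80 <= byte <= 0xFF:
        raise ValueError("only bytes 0x80..0xFF are raw-byte characters")
    return RAW_BYTE_OFFSET + byte


code = raw_byte_char(0o374)        # 0o374 == 0xFC == 252, the code point of ü
print(code, oct(code), hex(code))  # 4194300 0o17777774 0x3ffffc
```

The working mail reports 252 (#xfc), the real code point of ü; the broken
one reports the same low byte wrapped as a raw-byte character.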


In Emacs 22 when I do C-x = on the same character in the mail from my
boss as above I get
Char: ü (2300, #o4374, #x8fc, file ...) point=212 of 461 (46%) column=4
and in the other mail
Char: ü (2300, #o4374, #x8fc, file ...) point=493 of 940 (52%) column=0

I could reproduce this with CVS head from this evening, using emacs -q
and a minimal .gnus that only specifies the select method.


In GNU Emacs 23.0.60.2 (i686-pc-linux-gnu, GTK+ Version 2.12.8)
 of 2008-10-15 on glasgow.frank-schmitt.net
Windowing system distributor `The X.Org Foundation', version 11.0.10300000
configured using `configure  '--prefix=/opt/emacs23''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: de_DE.UTF-8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default-enable-multibyte-characters: t

Major mode: Group

Minor modes in effect:
  gnus-undo-mode: t
  show-paren-mode: t
  delete-selection-mode: t
  pc-selection-mode: t
  recentf-mode: t
  iswitchb-mode: t
  shell-dirtrack-mode: t
  tooltip-mode: t
  mouse-wheel-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  global-auto-composition-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:
<help-echo> M-x g n u s <return> <down> C-u 1 0 <return> 
<up> <up> <return> t <return> <return> <return> <return> 
<return> <return> <return> <return> <return> <return> 
<return> <return> <return> <return> <return> <return> 
<return> <return> <return> <return> <return> <return> 
<return> <return> <return> <return> <return> <return> 
<return> <return> <return> <return> <return> <return> 
<return> <return> <return> <return> <return> <return> 
<return> <return> <return> <return> <return> <return> 
<return> <return> <return> <return> <return> <return> 
<return> <return> <return> <return> <return> <down> 
<return> <down-mouse-1> <mouse-1> C-x = <help-echo> 
<help-echo> <help-echo> <down-mouse-2> <mouse-2> <help-echo> 
<down-mouse-1> <mouse-1> <right> <left> C-x = <help-echo> 
<down-mouse-1> <drag-mouse-1> <help-echo> <help-echo> 
<down-mouse-1> <mouse-1> <home> <S-end> <C-insert> 
<down> <home> <S-end> <C-insert> <help-echo> <down-mouse-1> 
<mouse-movement> <mouse-1> <help-echo> q <return> <return> 
SPC q L <up> <up> <up> <up> <up> <up> <up> <up> <up> 
<up> <up> <up> <up> <up> <up> <up> <up> <up> <up> <up> 
<up> <up> <up> <up> <up> <up> <up> <up> <up> <up> <up> 
<down> <down> <down> <down> M-x e m a <tab> b u <tab> 
<backspace> <backspace> <backspace> <backspace> <backspace> 
<backspace> <backspace> <backspace> b u g <tab> <backspace> 
<backspace> <backspace> <backspace> <backspace> <backspace> 
<backspace> <backspace> <backspace> <backspace> <backspace> 
<backspace> <backspace> <backspace> r e p <tab> o <tab> 
r <tab> <return>

Recent messages:
IMAP mailbox `Uni' closed... OK
Retrieving newsgroup: nnimap+ltw:Uni...
nnimap: Updating info for nnimap+ltw:Uni...done
Fetching headers for nnimap+ltw:Uni...done
Scoring...done
Generating summary...done
nnimap: Setting marks in Uni...done
No more unread newsgroups
IMAP mailbox `Uni' closed... OK
Making completion list... [2 times]







* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
  2008-10-15 20:21 ` bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23 Frank Schmitt
@ 2008-10-16  7:32   ` Andreas Schwab
  2008-10-16  8:36     ` Frank Schmitt
  2008-12-04 22:05   ` bug#1174: marked as done (23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23) Emacs bug Tracking System
  2009-02-18  8:55   ` Emacs bug Tracking System
  2 siblings, 1 reply; 39+ messages in thread
From: Andreas Schwab @ 2008-10-16  7:32 UTC (permalink / raw)
  To: usenet; +Cc: emacs-pretest-bug, 1174

Frank Schmitt <ich@frank-schmitt.net> writes:

> In Emacs 23 in the mail from my boss I get
> Char: � (4194300, #o17777774, #x3ffffc, raw-byte) point=212 of 461 (46%) column=4

How is that character encoded in the mail?

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."







* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
  2008-10-16  7:32   ` Andreas Schwab
@ 2008-10-16  8:36     ` Frank Schmitt
  2008-10-16 17:39       ` Eli Zaretskii
                         ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Frank Schmitt @ 2008-10-16  8:36 UTC (permalink / raw)
  To: bug-gnu-emacs; +Cc: emacs-pretest-bug

Andreas Schwab <schwab@suse.de> writes:

> Frank Schmitt <ich@frank-schmitt.net> writes:
>
>> In Emacs 23 in the mail from my boss I get
>> Char: � (4194300, #o17777774, #x3ffffc, raw-byte) point=212 of 461 (46%) column=4
>
> How is that character encoded in the mail?

You mean the headers of the mail?

Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

and in the working one 

Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

-- 
Have you ever considered how much text can fit in eighty columns?  Given that a
signature typically contains up to four lines of text, this space allows you to
attach a tremendous amount of valuable information to your messages.  Seize the
opportunity and don't waste your signature on bullshit that nobody cares about.









* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
  2008-10-16  8:36     ` Frank Schmitt
@ 2008-10-16 17:39       ` Eli Zaretskii
  2008-10-16 18:00         ` Frank Schmitt
  2008-10-17  8:28         ` Andreas Schwab
  2008-10-17  5:15       ` Richard M. Stallman
  2008-10-28 22:54       ` James Cloos
  2 siblings, 2 replies; 39+ messages in thread
From: Eli Zaretskii @ 2008-10-16 17:39 UTC (permalink / raw)
  To: Frank Schmitt, 1174; +Cc: emacs-pretest-bug, bug-gnu-emacs

> From: Frank Schmitt <ich@frank-schmitt.net>
> Date: Thu, 16 Oct 2008 10:36:36 +0200
> Cc: emacs-pretest-bug@gnu.org
> 
> Andreas Schwab <schwab@suse.de> writes:
> 
> > Frank Schmitt <ich@frank-schmitt.net> writes:
> >
> >> In Emacs 23 in the mail from my boss I get
> >> Char: � (4194300, #o17777774, #x3ffffc, raw-byte) point=212 of 461 (46%) column=4
> >
> > How is that character encoded in the mail?
> 
> You mean the headers of the mail?

No, Andreas means what sequence of 8-bit bytes was used to encode the
character in the original message?  A utf-8 encoding uses several
bytes to encode a single character; can you please show those bytes
from the body of the original mail message?








* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
  2008-10-16 17:39       ` Eli Zaretskii
@ 2008-10-16 18:00         ` Frank Schmitt
  2008-10-16 18:15           ` Eli Zaretskii
  2008-10-17  8:28         ` Andreas Schwab
  1 sibling, 1 reply; 39+ messages in thread
From: Frank Schmitt @ 2008-10-16 18:00 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-pretest-bug, bug-gnu-emacs, 1174


Eli Zaretskii <eliz@gnu.org> writes:

>> From: Frank Schmitt <ich@frank-schmitt.net>
>> Date: Thu, 16 Oct 2008 10:36:36 +0200
>> Cc: emacs-pretest-bug@gnu.org
>> 
>> Andreas Schwab <schwab@suse.de> writes:
>> 
>> > Frank Schmitt <ich@frank-schmitt.net> writes:
>> >
>> >> In Emacs 23 in the mail from my boss I get
>> >> Char: � (4194300, #o17777774, #x3ffffc, raw-byte) point=212 of 461 (46%) column=4
>> >
>> > How is that character encoded in the mail?
>> 
>> You mean the headers of the mail?
>
> No, Andreas means what sequence of 8-bit bytes was used to encode the
> character in the original message?  A utf-8 encoding uses several
> bytes to encode a single character; can you please show those bytes
> from the body of the original mail message?

But how should I do this? Just yanking? Here we go: ü


* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
  2008-10-16 18:00         ` Frank Schmitt
@ 2008-10-16 18:15           ` Eli Zaretskii
  2008-10-16 20:05             ` Frank Schmitt
  0 siblings, 1 reply; 39+ messages in thread
From: Eli Zaretskii @ 2008-10-16 18:15 UTC (permalink / raw)
  To: Frank Schmitt; +Cc: emacs-pretest-bug, bug-gnu-emacs, 1174

> From: Frank Schmitt <ich@frank-schmitt.net>
> Cc: 1174@emacsbugs.donarmstrong.com,  bug-gnu-emacs@gnu.org,  emacs-pretest-bug@gnu.org
> Date: Thu, 16 Oct 2008 20:00:07 +0200
> 
> > No, Andreas means what sequence of 8-bit bytes was used to encode the
> > character in the original message?  A utf-8 encoding uses several
> > bytes to encode a single character; can you please show those bytes
> > from the body of the original mail message?
> 
> But how should I do this?

Do you still have the original mail text, or can access it?  If so,
something like "od -c" (outside Emacs) on that text or
"M-x find-file-literally" inside Emacs will show you the original
encoding of that character.







* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
  2008-10-16 18:15           ` Eli Zaretskii
@ 2008-10-16 20:05             ` Frank Schmitt
  0 siblings, 0 replies; 39+ messages in thread
From: Frank Schmitt @ 2008-10-16 20:05 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-pretest-bug, bug-gnu-emacs, 1174

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Frank Schmitt <ich@frank-schmitt.net>
>> Cc: 1174@emacsbugs.donarmstrong.com,  bug-gnu-emacs@gnu.org,  emacs-pretest-bug@gnu.org
>> Date: Thu, 16 Oct 2008 20:00:07 +0200
>> 
>> > No, Andreas means what sequence of 8-bit bytes was used to encode the
>> > character in the original message?  A utf-8 encoding uses several
>> > bytes to encode a single character; can you please show those bytes
>> > from the body of the original mail message?
>> 
>> But how should I do this?
>
> Do you still have the original mail text, or can access it?  If so,
> something like "od -c" (outside Emacs) on that text or
> "M-x find-file-literally" inside Emacs will show you the original
> encoding of that character.

I opened the mail again in Gnus, saved it with no-conversion, and ran
od -c on it. Here is the relevant part:

0001100   i   l   i   e   n   .       I   c   h       w 374   s   s   t
0001120   e       a   u   c   h       n   i   c   h   t   ,       w   i

When I save the same buffer from Emacs 22, I get something different:

0001100  \n   B   r   a   s   i   l   i   e   n   .       I   c   h    
0001120   w 303 274   s   s   t   e       a   u   c   h       n   i   c
0001140   h   t   ,       w   i   e       w   i   r       a   u   s    

So I went to my mail server and ran od -c on the raw message there:

0012240      \n   B   r   a   s   i   l   i   e   n   .       I   c   h
0012260       w 303 274   s   s   t   e       a   u   c   h       n   i
0012300   c   h   t   ,       w   i   e       w   i   r       a   u   s

So I guess the last two are correct.
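
The octal dumps can be cross-checked outside Emacs. A small Python
sketch, assuming the character in question is ü (U+00FC): its UTF-8
encoding is the two bytes 303 274 (octal), as in the server copy and the
Emacs 22 file, while the lone 374 in the Emacs 23 file is the code point
written out as a single byte, which is not valid UTF-8. The helper name
octal_bytes is made up for this example.

```python
def octal_bytes(data: bytes) -> str:
    """Render bytes as space-separated 3-digit octal, as od -c does for non-ASCII."""
    return " ".join(f"{b:03o}" for b in data)


u_umlaut = "\u00fc"                             # ü
print(octal_bytes(u_umlaut.encode("utf-8")))    # 303 274
print(octal_bytes(u_umlaut.encode("latin-1")))  # 374

# A lone 0xFC byte is not valid UTF-8, so a strict decoder rejects it.
try:
    b"w\xfcsste".decode("utf-8")
except UnicodeDecodeError as err:
    print("invalid UTF-8:", err.reason)
```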


* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
  2008-10-16  8:36     ` Frank Schmitt
  2008-10-16 17:39       ` Eli Zaretskii
@ 2008-10-17  5:15       ` Richard M. Stallman
  2008-10-28 22:54       ` James Cloos
  2 siblings, 0 replies; 39+ messages in thread
From: Richard M. Stallman @ 2008-10-17  5:15 UTC (permalink / raw)
  To: Frank Schmitt, 1174; +Cc: emacs-pretest-bug, bug-gnu-emacs, bug-submit-list

It seems to me that coding system names should be interpreted in a
case-insensitive manner, always.







* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
  2008-10-16 17:39       ` Eli Zaretskii
  2008-10-16 18:00         ` Frank Schmitt
@ 2008-10-17  8:28         ` Andreas Schwab
  2008-10-17  8:52           ` Frank Schmitt
  2008-10-17  9:12           ` Frank Schmitt
  1 sibling, 2 replies; 39+ messages in thread
From: Andreas Schwab @ 2008-10-17  8:28 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-pretest-bug, bug-gnu-emacs, 1174, Frank Schmitt

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Frank Schmitt <ich@frank-schmitt.net>
>> Date: Thu, 16 Oct 2008 10:36:36 +0200
>> Cc: emacs-pretest-bug@gnu.org
>> 
>> Andreas Schwab <schwab@suse.de> writes:
>> 
>> > Frank Schmitt <ich@frank-schmitt.net> writes:
>> >
>> >> In Emacs 23 in the mail from my boss I get
>> >> Char: � (4194300, #o17777774, #x3ffffc, raw-byte) point=212 of 461 (46%) column=4
>> >
>> > How is that character encoded in the mail?
>> 
>> You mean the headers of the mail?
>
> No, Andreas means what sequence of 8-bit bytes was used to encode the
> character in the original message?

It was actually the right hint.  ISTR there was a problem in Gnus with
8-bit encoded articles, but all I could find is
<http://permalink.gmane.org/gmane.emacs.gnus.general/67112>.

Andreas.


* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
  2008-10-17  8:28         ` Andreas Schwab
@ 2008-10-17  8:52           ` Frank Schmitt
  2008-10-17  9:53             ` Eli Zaretskii
  2008-10-17  9:12           ` Frank Schmitt
  1 sibling, 1 reply; 39+ messages in thread
From: Frank Schmitt @ 2008-10-17  8:52 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: emacs-pretest-bug, 1174, bug-gnu-emacs

Andreas Schwab <schwab@suse.de> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
>
>>> From: Frank Schmitt <ich@frank-schmitt.net>
>>> Date: Thu, 16 Oct 2008 10:36:36 +0200
>>> Cc: emacs-pretest-bug@gnu.org
>>> 
>>> Andreas Schwab <schwab@suse.de> writes:
>>> 
>>> > Frank Schmitt <ich@frank-schmitt.net> writes:
>>> >
>>> >> In Emacs 23 in the mail from my boss I get
>>> >> Char: � (4194300, #o17777774, #x3ffffc, raw-byte) point=212 of 461 (46%) column=4
>>> >
>>> > How is that character encoded in the mail?
>>> 
>>> You mean the headers of the mail?
>>
>> No, Andreas means what sequence of 8-bit bytes was used to encode the
>> character in the original message?
>
> It was actually the right hint.  ISTR there was a problem in Gnus with
> 8-bit encoded articles, but all I could find is
> <http://permalink.gmane.org/gmane.emacs.gnus.general/67112>.

Yes, this seems to be the same problem I am seeing. Any idea on how I
could further debug this? It is a real show-stopper for me.


* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
  2008-10-17  8:28         ` Andreas Schwab
  2008-10-17  8:52           ` Frank Schmitt
@ 2008-10-17  9:12           ` Frank Schmitt
  1 sibling, 0 replies; 39+ messages in thread
From: Frank Schmitt @ 2008-10-17  9:12 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: emacs-pretest-bug, 1174, bug-gnu-emacs

Andreas Schwab <schwab@suse.de> writes:

> Eli Zaretskii <eliz@gnu.org> writes:
>
>>> From: Frank Schmitt <ich@frank-schmitt.net>
>>> Date: Thu, 16 Oct 2008 10:36:36 +0200
>>> Cc: emacs-pretest-bug@gnu.org
>>> 
>>> Andreas Schwab <schwab@suse.de> writes:
>>> 
>>> > Frank Schmitt <ich@frank-schmitt.net> writes:
>>> >
>>> >> In Emacs 23 in the mail from my boss I get
>>> >> Char: � (4194300, #o17777774, #x3ffffc, raw-byte) point=212 of 461 (46%) column=4
>>> >
>>> > How is that character encoded in the mail?
>>> 
>>> You mean the headers of the mail?
>>
>> No, Andreas means what sequence of 8-bit bytes was used to encode the
>> character in the original message?
>
> It was actually the right hint.  ISTR there was a problem in Gnus with
> 8-bit encoded articles, but all I could find is
> <http://permalink.gmane.org/gmane.emacs.gnus.general/67112>.

One more data point: it only happens in mails fetched via nnimap. If I
try to reproduce it with nntp (that is, if I send a news article with 8bit
encoding and UTF-8), everything is displayed correctly.

As stated in the bug report cited by Andreas above, nnimap does have a function

(defsubst imap-disable-multibyte ()
  "Enable multibyte in the current buffer."
  (when (fboundp 'set-buffer-multibyte)
    (set-buffer-multibyte nil)))

which is called for every buffer involved (see e.g. imap-open). Might
this be the problem?


* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
  2008-10-17  8:52           ` Frank Schmitt
@ 2008-10-17  9:53             ` Eli Zaretskii
  2008-10-17 10:02               ` Frank Schmitt
  2008-10-17 10:08               ` Frank Schmitt
  0 siblings, 2 replies; 39+ messages in thread
From: Eli Zaretskii @ 2008-10-17  9:53 UTC (permalink / raw)
  To: Frank Schmitt; +Cc: schwab, bug-gnu-emacs, 1174, emacs-pretest-bug

> From: Frank Schmitt <ich@frank-schmitt.net>
> Cc: Eli Zaretskii <eliz@gnu.org>,  1174@emacsbugs.donarmstrong.com,  emacs-pretest-bug@gnu.org,  bug-gnu-emacs@gnu.org
> Date: Fri, 17 Oct 2008 10:52:15 +0200
> 
> Andreas Schwab <schwab@suse.de> writes:
> 
> > Eli Zaretskii <eliz@gnu.org> writes:
> >
> >>> From: Frank Schmitt <ich@frank-schmitt.net>
> >>> Date: Thu, 16 Oct 2008 10:36:36 +0200
> >>> Cc: emacs-pretest-bug@gnu.org
> >>> 
> >>> Andreas Schwab <schwab@suse.de> writes:
> >>> 
> >>> > Frank Schmitt <ich@frank-schmitt.net> writes:
> >>> >
> >>> >> In Emacs 23 in the mail from my boss I get
> >>> >> Char: � (4194300, #o17777774, #x3ffffc, raw-byte) point=212 of 461 (46%) column=4
> >>> >
> >>> > How is that character encoded in the mail?
> >>> 
> >>> You mean the headers of the mail?
> >>
> >> No, Andreas means what sequence of 8-bit bytes was used to encode the
> >> character in the original message?
> >
> > It was actually the right hint.  ISTR there was a problem in Gnus with
> > 8-bit encoded articles, but all I could find is
> > <http://permalink.gmane.org/gmane.emacs.gnus.general/67112>.
> 
> Yes, this seems to be the same problem I am seeing. Any idea on how I
> could further debug this? It is a real show-stopper for me.

Is it true that this happens when the charset is stated as UTF-8
(upper-case), but works if it is utf-8 (lower-case)?  If so, then
Gnus, or one of the functions it calls, doesn't downcase the charset
name before it intern's it, to get the coding-system.

Or is the problem the "; format=flowed" thingy?
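
Both questions can be checked in isolation. As a point of comparison
only (this uses Python's standard email package, not Gnus), a MIME
parser typically treats the charset name case-insensitively and keeps
format=flowed as a separate Content-Type parameter. The two header
blocks below are the ones quoted earlier in this thread; the message
bodies are left empty.

```python
import codecs
from email import message_from_string

upper = message_from_string(
    "Content-Type: text/plain; charset=UTF-8; format=flowed\n"
    "Content-Transfer-Encoding: 8bit\n\n")
lower = message_from_string(
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: quoted-printable\n\n")

# get_content_charset() lowercases the name, so both headers agree.
print(upper.get_content_charset(), lower.get_content_charset())

# Codec lookup is case-insensitive as well.
print(codecs.lookup("UTF-8").name == codecs.lookup("utf-8").name)

# format=flowed is just another Content-Type parameter.
print(upper.get_param("format"))
```

If a mail reader behaves the same way, the remaining difference between
the two mails is the transfer encoding (8bit versus quoted-printable).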







* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
  2008-10-17  9:53             ` Eli Zaretskii
@ 2008-10-17 10:02               ` Frank Schmitt
  2008-10-17 10:08               ` Frank Schmitt
  1 sibling, 0 replies; 39+ messages in thread
From: Frank Schmitt @ 2008-10-17 10:02 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: schwab, bug-gnu-emacs, 1174, emacs-pretest-bug

Eli Zaretskii <eliz@gnu.org> writes:

> Is it true that this happens when the charset is stated as UTF-8
> (upper-case), but works if it is utf-8 (lower-case)?  If so, then
> Gnus, or one of the functions it calls, doesn't downcase the charset
> name before it intern's it, to get the coding-system.
>
> Or is the problem the "; format=flowed" thingy?

I think it is neither. It must be a problem with raw 8bit-encoded UTF-8
that is specific to nnimap.


* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
  2008-10-17  9:53             ` Eli Zaretskii
  2008-10-17 10:02               ` Frank Schmitt
@ 2008-10-17 10:08               ` Frank Schmitt
  2008-10-17 15:39                 ` Stefan Monnier
  1 sibling, 1 reply; 39+ messages in thread
From: Frank Schmitt @ 2008-10-17 10:08 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: schwab, bug-gnu-emacs, 1174, emacs-pretest-bug

Eli Zaretskii <eliz@gnu.org> writes:

>> > It was actually the right hint.  ISTR there was a problem in Gnus with
>> > 8-bit encoded articles, but all I could find is
>> > <http://permalink.gmane.org/gmane.emacs.gnus.general/67112>.
>> 
>> Yes, this seems to be the same problem I am seeing. Any idea on how I
>> could further debug this? It is a real show-stopper for me.

Got it!

In lisp/gnus/nnimap.el there is 

(defun nnimap-demule (string)
  ;; BEWARE: we used to use string-as-multibyte here which is braindead
  ;; because it will turn accidental emacs-mule-valid byte sequences
  ;; into multibyte chars.  --Stef
  ;; Reverted, braindead got 7.5 out of 10 on imdb, so it can't be
  ;; that bad. --Simon
  (funcall (if (and (fboundp 'string-as-multibyte)
		    (subrp (symbol-function 'string-as-multibyte)))
	       'string-as-multibyte
	     'identity)
	   (or string "")))

which is called in several places, and this is what breaks it. If I change
this function so that it returns the string unchanged, my mails are
displayed correctly.
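
One way to picture why this could break 8bit UTF-8 bodies, sketched in
Python rather than Emacs Lisp and not taken from the Gnus sources: if the
fetched bytes are reinterpreted as already-decoded text (roughly what
string-as-multibyte would do if the internal representation is UTF-8
based, which is an assumption here), the later charset decoding sees the
code point 0xFC where it expected the bytes 303 274. Keeping every byte
as a separate element, modelled below with latin-1 as a byte-preserving
stand-in, lets the later decoding succeed. All function names are
hypothetical.

```python
def reinterpret_as_text(raw: bytes) -> str:
    """Model of string-as-multibyte: valid UTF-8 sequences collapse into characters."""
    return raw.decode("utf-8")


def keep_bytes(raw: bytes) -> str:
    """Model of a byte-preserving conversion: one element per byte, values unchanged."""
    return raw.decode("latin-1")


def decode_body(elements: str) -> str:
    """Model of the later charset=UTF-8 decoding step, which expects raw bytes."""
    as_bytes = bytes(ord(ch) for ch in elements)
    # surrogateescape keeps undecodable bytes visible, much like raw-byte chars
    return as_bytes.decode("utf-8", errors="surrogateescape")


wire = "w\u00fcsste".encode("utf-8")   # b'w\xc3\xbcsste', the bytes as fetched

broken = decode_body(reinterpret_as_text(wire))
fixed = decode_body(keep_bytes(wire))

print(repr(broken))   # 'w\udcfcsste' (byte 0xFC left undecoded)
print(repr(fixed))    # 'wüsste'
```

The broken result matches the "w 374 s s t" seen in the od -c dump of the
Emacs 23 buffer, which is why this reading seems plausible.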


* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
  2008-10-17 10:08               ` Frank Schmitt
@ 2008-10-17 15:39                 ` Stefan Monnier
  2008-10-17 18:19                   ` Reiner Steib
                                     ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Stefan Monnier @ 2008-10-17 15:39 UTC (permalink / raw)
  To: Frank Schmitt; +Cc: schwab, 1174, bug-gnu-emacs, emacs-pretest-bug

>   ;; BEWARE: we used to use string-as-multibyte here which is braindead
>   ;; because it will turn accidental emacs-mule-valid byte sequences
>   ;; into multibyte chars.  --Stef
>   ;; Reverted, braindead got 7.5 out of 10 on imdb, so it can't be
>   ;; that bad. --Simon

Who's this Simon who reverted my change without even explaining why?


        Stefan








* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
  2008-10-17 15:39                 ` Stefan Monnier
@ 2008-10-17 18:19                   ` Reiner Steib
       [not found]                   ` <871vyf6q40.fsf@marauder.physik.uni-ulm.de>
  2008-10-23  7:24                   ` Frank Schmitt
  2 siblings, 0 replies; 39+ messages in thread
From: Reiner Steib @ 2008-10-17 18:19 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 1174, Frank Schmitt, ding, Simon Josefsson

On Fri, Oct 17 2008, Stefan Monnier wrote:

>>   ;; BEWARE: we used to use string-as-multibyte here which is braindead
>>   ;; because it will turn accidental emacs-mule-valid byte sequences
>>   ;; into multibyte chars.  --Stef
>>   ;; Reverted, braindead got 7.5 out of 10 on imdb, so it can't be
>>   ;; that bad. --Simon
>
> Who's this Simon who reverted my change without even explaining why?

The one who's listed as the author of nnimap.el (cc-ed).

>> which is called at several places. And this breaks it. If I change
>> this function so that string is not changed, my mails are displayed
>> correctly.

Does it work correctly when using Stefan's version?
( s/string-as-multibyte/string-to-multibyte/ ...)

(defun nnimap-demule (string)
  ;; BEWARE: we used to use string-as-multibyte here which is braindead
  ;; because it will turn accidental emacs-mule-valid byte sequences
  ;; into multibyte chars.  --Stef
  (funcall (if (and (fboundp 'string-to-multibyte)
		    (subrp (symbol-function 'string-to-multibyte)))
	       'string-to-multibyte
	     'identity)
	   (or string "")))

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/







* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
       [not found]                   ` <871vyf6q40.fsf@marauder.physik.uni-ulm.de>
@ 2008-10-17 18:36                     ` Frank Schmitt
       [not found]                     ` <m3ej2fdq5x.fsf@mid.gehheimdienst.de>
  1 sibling, 0 replies; 39+ messages in thread
From: Frank Schmitt @ 2008-10-17 18:36 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 1174, ding, Simon Josefsson

Reiner Steib <reinersteib+gmane@imap.cc> writes:

> On Fri, Oct 17 2008, Stefan Monnier wrote:
>
>>>   ;; BEWARE: we used to use string-as-multibyte here which is braindead
>>>   ;; because it will turn accidental emacs-mule-valid byte sequences
>>>   ;; into multibyte chars.  --Stef
>>>   ;; Reverted, braindead got 7.5 out of 10 on imdb, so it can't be
>>>   ;; that bad. --Simon
>>
>> Who's this Simon who reverted my change without even explaining why?
>
> The one who's listed as the author of nnimap.el (cc-ed).
>
>>> which is called at several places. And this breaks it. If I change
>>> this function so that string is not changed, my mails are displayed
>>> correctly.
>
> Does it work correctly when using Stefan's version?
> ( s/string-as-multibyte/string-to-multibyte/ ...)
>
> (defun nnimap-demule (string)
>   ;; BEWARE: we used to use string-as-multibyte here which is braindead
>   ;; because it will turn accidental emacs-mule-valid byte sequences
>   ;; into multibyte chars.  --Stef
>   (funcall (if (and (fboundp 'string-to-multibyte)
> 		    (subrp (symbol-function 'string-to-multibyte)))
> 	       'string-to-multibyte
> 	     'identity)
> 	   (or string "")))

Yes, it does.

Frank


* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
  2008-10-17 15:39                 ` Stefan Monnier
  2008-10-17 18:19                   ` Reiner Steib
       [not found]                   ` <871vyf6q40.fsf@marauder.physik.uni-ulm.de>
@ 2008-10-23  7:24                   ` Frank Schmitt
  2008-10-23 13:58                     ` Stefan Monnier
  2 siblings, 1 reply; 39+ messages in thread
From: Frank Schmitt @ 2008-10-23  7:24 UTC (permalink / raw)
  To: emacs-pretest-bug; +Cc: bug-gnu-emacs

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>>   ;; BEWARE: we used to use string-as-multibyte here which is braindead
>>   ;; because it will turn accidental emacs-mule-valid byte sequences
>>   ;; into multibyte chars.  --Stef
>>   ;; Reverted, braindead got 7.5 out of 10 on imdb, so it can't be
>>   ;; that bad. --Simon
>
> Who's this Simon who reverted my change without even explaining why?

So, what's the outcome? We found the place where the bug occurs, and I
can confirm that with string-to-multibyte instead of string-as-multibyte
everything works as expected. What's next?


* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
  2008-10-23  7:24                   ` Frank Schmitt
@ 2008-10-23 13:58                     ` Stefan Monnier
  0 siblings, 0 replies; 39+ messages in thread
From: Stefan Monnier @ 2008-10-23 13:58 UTC (permalink / raw)
  To: Frank Schmitt; +Cc: emacs-pretest-bug, bug-gnu-emacs, 1174

>>> ;; BEWARE: we used to use string-as-multibyte here which is braindead
>>> ;; because it will turn accidental emacs-mule-valid byte sequences
>>> ;; into multibyte chars.  --Stef
>>> ;; Reverted, braindead got 7.5 out of 10 on imdb, so it can't be
>>> ;; that bad. --Simon
>> 
>> Who's this Simon who reverted my change without even explaining why?

> So, what's the outcome? We found the place where the bug occurs, I can
> confirm that with string-to-multibyte instead of string-as-multibyte
> everything works as expected, what's next?

Next is to get this Simon to explain his change, so we can resolve the
conflict between mine and his.


        Stefan







* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
  2008-10-16  8:36     ` Frank Schmitt
  2008-10-16 17:39       ` Eli Zaretskii
  2008-10-17  5:15       ` Richard M. Stallman
@ 2008-10-28 22:54       ` James Cloos
  2 siblings, 0 replies; 39+ messages in thread
From: James Cloos @ 2008-10-28 22:54 UTC (permalink / raw)
  To: 1174

>>>>> "Frank" == Frank Schmitt <ich@frank-schmitt.net> writes:

[Catching up on mail backlog -JimC]

Frank> You mean the headers of the mail?

Frank> Content-Type: text/plain; charset=UTF-8; format=flowed
Frank> Content-Transfer-Encoding: 8bit

Frank> and in the working one 

Frank> Content-Type: text/plain; charset=utf-8
Frank> Content-Transfer-Encoding: quoted-printable

I reported this bug on the ding list some time back.

Shortly after the unicode-2 branch was merged into trunk, changes were
made to Gnus wrt unibyte vs multibyte buffers and/or strings at the
stage where it decodes article bodies.

The change broke much (all?) UTF-8 mail which has an 8bit CTE.

It had worked fine (in the unicode-2 branch) until then.

-JimC
-- 
James Cloos <cloos@jhcloos.com>         OpenPGP: 1024D/ED7DAEA6







* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
       [not found]                     ` <m3ej2fdq5x.fsf@mid.gehheimdienst.de>
@ 2008-11-29 12:08                       ` Reiner Steib
       [not found]                       ` <87k5ampwej.fsf@marauder.physik.uni-ulm.de>
  1 sibling, 0 replies; 39+ messages in thread
From: Reiner Steib @ 2008-11-29 12:08 UTC (permalink / raw)
  To: Simon Josefsson; +Cc: 1174, Frank Schmitt, ding

On Fri, Oct 17 2008, Frank Schmitt wrote:

> Reiner Steib <reinersteib+gmane@imap.cc> writes:
>
>> On Fri, Oct 17 2008, Stefan Monnier wrote:
>>
>>>>   ;; BEWARE: we used to use string-as-multibyte here which is braindead
>>>>   ;; because it will turn accidental emacs-mule-valid byte sequences
>>>>   ;; into multibyte chars.  --Stef
>>>>   ;; Reverted, braindead got 7.5 out of 10 on imdb, so it can't be
>>>>   ;; that bad. --Simon

Simon, could you please clarify why you reverted Stefan's change in
`nnimap-demule'?  It breaks reading UTF-8 articles via nnimap.el in
Emacs 23.

>>> Who's this Simon who reverted my change without even explaining why?
>>
>> The one who's listed as the author of nnimap.el (cc-ed).
>>
>>>> which is called at several places. And this breaks it. If I change
>>>> this function so that string is not changed, my mails are displayed
>>>> correctly.
>>
>> Does it work correctly when using Stefan's version?
>> ( s/string-as-multibyte/string-to-multibyte/ ...)
>>
>> (defun nnimap-demule (string)
>>   ;; BEWARE: we used to use string-as-multibyte here which is braindead
>>   ;; because it will turn accidental emacs-mule-valid byte sequences
>>   ;; into multibyte chars.  --Stef
>>   (funcall (if (and (fboundp 'string-to-multibyte)
>> 		    (subrp (symbol-function 'string-to-multibyte)))
>> 	       'string-to-multibyte
>> 	     'identity)
>> 	   (or string "")))
>
> Yes, it does.

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/







* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
       [not found]                       ` <87k5ampwej.fsf@marauder.physik.uni-ulm.de>
@ 2008-11-29 12:18                         ` Simon Josefsson
  2008-12-01 21:04                         ` Stefan Monnier
                                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 39+ messages in thread
From: Simon Josefsson @ 2008-11-29 12:18 UTC (permalink / raw)
  To: Frank Schmitt; +Cc: 1174, ding

Reiner Steib <reinersteib+gmane@imap.cc> writes:

> On Fri, Oct 17 2008, Frank Schmitt wrote:
>
>> Reiner Steib <reinersteib+gmane@imap.cc> writes:
>>
>>> On Fri, Oct 17 2008, Stefan Monnier wrote:
>>>
>>>>>   ;; BEWARE: we used to use string-as-multibyte here which is braindead
>>>>>   ;; because it will turn accidental emacs-mule-valid byte sequences
>>>>>   ;; into multibyte chars.  --Stef
>>>>>   ;; Reverted, braindead got 7.5 out of 10 on imdb, so it can't be
>>>>>   ;; that bad. --Simon
>
> Simon, could you please clarify why you reverted Stefan's change in
> `nnimap-demule'?  It breaks reading UTF-8 articles via nnimap.el in
> Emacs 23.

I don't recall, but people should notice relatively quickly if there are
problems in this area (wrong display of non-ascii) so feel free to
revert the patch or apply another patch instead.  It needs to be tested
under Emacs 22 too, though, if it is installed in the Gnus CVS.

/Simon







* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
       [not found]                         ` <878wr2ohde.fsf@mocca.josefsson.org>
@ 2008-11-29 15:30                           ` Reiner Steib
       [not found]                           ` <87d4geim7p.fsf@marauder.physik.uni-ulm.de>
  2009-01-12 10:54                           ` Simon Josefsson
  2 siblings, 0 replies; 39+ messages in thread
From: Reiner Steib @ 2008-11-29 15:30 UTC (permalink / raw)
  To: Simon Josefsson; +Cc: ding, Clemens Schueller, 1174, Frank Schmitt

On Sat, Nov 29 2008, Simon Josefsson wrote:

> Reiner Steib <reinersteib+gmane@imap.cc> writes:
>>>>>>   ;; BEWARE: we used to use string-as-multibyte here which is braindead
>>>>>>   ;; because it will turn accidental emacs-mule-valid byte sequences
>>>>>>   ;; into multibyte chars.  --Stef
>>>>>>   ;; Reverted, braindead got 7.5 out of 10 on imdb, so it can't be
>>>>>>   ;; that bad. --Simon
>>
>> Simon, could you please clarify why you reverted Stefan's change in
>> `nnimap-demule'?  It breaks reading UTF-8 articles via nnimap.el in
>> Emacs 23.
>
> I don't recall, but people should notice relatively quickly if there are
> problems in this area (wrong display of non-ascii) 

Hm, both changes happened in 2004, but I don't recall any reports besides
the recent ones in 2008:

  http://thread.gmane.org/gmane.emacs.gnus.general/67112
  (bug#464, reported by James Cloos)
  http://thread.gmane.org/gmane.emacs.bugs/21524
  (bug#1174, reported by Frank Schmitt)
  
  ----------------------------
  revision 7.9
  date: 2004-09-13 13:52:48 +0200;  author: jas;  state: Exp;  lines: +5 -3
  (nnimap-demule): Revert 2004-08-30 change.
  ----------------------------

> so feel free to revert the patch or apply another patch instead.  It
> needs to be tested under Emacs 22 too, though,

I never saw this problem myself.  I cannot see any difference with a
few UTF-8 articles (C-T-E: 8bit, [1]), either with Emacs 22 (with
current Gnus trunk) or with Emacs trunk (Gnus 5.13 from there).  Stefan,
what are the "accidental emacs-mule-valid byte sequences" that trigger
this problem?  It would be good if someone could send me a problematic
article. [2]

I've just checked in (Gnus and Emacs) some code to debug this problem.
I'd like to ask those who saw the bug in Emacs 23 to test the articles
in question:

- With current Emacs 23 (Emacs CVS trunk)

- With Emacs 22 plus current Gnus CVS trunk (No Gnus)

- If you see wrong display: does it display correctly after evaluating
  the following:

  M-x gnus-backlog-shutdown RET

  (setq nnimap-demule-use-string-to-multibyte nil
        gnus-verbose 10)

  Check the *Messages* buffer for messages
  "nnimap-demule-use-string-to-multibyte: nil" to ensure that the
  article is decoded again with this setting.  You may need to
  re-enter the group.

> if it is installed in the Gnus CVS.

I don't want different code in Gnus and Emacs.  If all else fails, we
can make it conditional.

Bye, Reiner.

[1] Cc-s of the following articles (available on Gmane) from Aidan Kehoe:
    <18492.30425.377545.700503@parhasard.net>
    <18518.43672.183610.662699@parhasard.net>
    <18712.43474.265690.792714@parhasard.net>

    Non-ascii characters in the attribution line ("scríobh"), the
    signature ("¿Dónde estará ahora mi sobrino Yoghurtu Nghé ..."),
    and "’", "İ", "ı"

[2] (push '(utf-8 . 8bit) mm-body-charset-encoding-alist)
    --> trying to produce a problematic article here:
    AE-Ä OE-Ö UE-Ü ae-ä oe-ö ue-ü ss-ß

    Should be sent with: 
      Content-Type: text/plain; charset=utf-8
      Content-Transfer-Encoding: 8bit
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/







* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
       [not found]                           ` <87d4geim7p.fsf@marauder.physik.uni-ulm.de>
@ 2008-11-29 21:30                             ` Stefan Monnier
  2008-11-29 22:14                             ` James Cloos
       [not found]                             ` <jwvfxlaz0ov.fsf-monnier+emacsbugreports@gnu.org>
  2 siblings, 0 replies; 39+ messages in thread
From: Stefan Monnier @ 2008-11-29 21:30 UTC (permalink / raw)
  To: Reiner Steib
  Cc: Simon Josefsson, Clemens Schueller, Frank Schmitt, 1174, ding

> I never saw this problem myself.  I cannot see any difference with a
> few UTF-8 articles (C-T-E: 8bit, [1]), neither with Emacs 22 (with
> current Gnus trunk) or Emacs trunk (Gnus 5.13 from there).  Stefan,
> what are the "accidental emacs-mule-valid byte sequences" that trigger
> this problem?  It would be good if someone could send me a problematic
> article. [2]

In Emacs-22, the problem was more difficult to trigger: you had to
receive an email whose undecoded text contained emacs-mule escape
sequences, which is rather uncommon.  With Emacs-23, it's a lot more
common since the internal encoding has changed to a variant of utf-8:
an 8bit body using utf-8 will see its content unwillingly decoded during
nnimap-demule which leads to the bugs we've seen recently.

I'm pretty sure that string-as-multibyte is wrong here in general.
Maybe the problem is that nnimap-demule is used blindly in different
contexts where some need string-to-multibyte and some need
string-as-multibyte.  E.g. maybe Simon's problem was linked to imap
groups with non-ASCII chars in their names, rather than in the
message bodies.
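
The difference between the two conversions can be sketched as follows.
This is only an illustration, assuming Emacs 23 or later (UTF-8 based
internal encoding) and a build where the obsolete string-as-multibyte
is still defined; demo-demule-compare is a hypothetical helper name,
and "\303\274" is simply the UTF-8 byte sequence for ü as it would
appear in an undecoded 8bit body.

```elisp
;; Hypothetical helper, not part of nnimap.el.
(defun demo-demule-compare (raw)
  "Return how unibyte RAW looks after each conversion.
The result is a list (AS-LENGTH TO-LENGTH DECODED)."
  (let ((as (string-as-multibyte raw))   ; bytes reinterpreted as internal text
        (to (string-to-multibyte raw)))  ; bytes kept as raw-byte characters
    (list (length as)
          (length to)
          ;; The raw-byte form can be turned back into the original
          ;; bytes and decoded later with the charset from the MIME info.
          (decode-coding-string (string-to-unibyte to) 'utf-8))))

;; Two UTF-8 bytes: string-as-multibyte collapses them into one
;; character, string-to-multibyte keeps two undecoded bytes.
(demo-demule-compare "\303\274")
```

With string-as-multibyte the body is effectively decoded before the
MIME charset has been looked at, so a later decoding step no longer
sees the original bytes.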


        Stefan







* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
       [not found]                           ` <87d4geim7p.fsf@marauder.physik.uni-ulm.de>
  2008-11-29 21:30                             ` Stefan Monnier
@ 2008-11-29 22:14                             ` James Cloos
       [not found]                             ` <jwvfxlaz0ov.fsf-monnier+emacsbugreports@gnu.org>
  2 siblings, 0 replies; 39+ messages in thread
From: James Cloos @ 2008-11-29 22:14 UTC (permalink / raw)
  To: Reiner Steib
  Cc: Simon Josefsson, Clemens Schueller, Frank Schmitt, 1174, ding

The easiest test cases are utf-8 emails which are 8bit MIME and which
fail to specify their charset, or specify utf-8 in some incorrect way.

The most common example may be commit messages; the scripts usually do
not specify a charset, since the commit messages and src could be in any
encoding, and they send them out as is.

NB that this bug never showed up for me when using the unicode-2 branch
before that was pulled into mainline.  Some unibyte vs multibyte change
was made after that happened which started triggering the bug.

A possibly related issue is that (gnus-article-treat-dumbquotes) also
does not work for me anymore.  I do specify my own gnus-article-dumbquotes-map
in ~/.gnus, but the syntax is the same as what is still (defvar)ed in
gnus-art.el.  (I use utf-8 results rather than ascii fallbacks.)
The symptom is that the strings such as "\221" do not match the relevant
octets in the *Article* buffer, as they used to do.  This also stopped
working at the same time.

My goal would be for gnus to treat messages and mime blocks w/o a
charset as utf-8 rather than ascii by default.  That does work for qp
and base64, just not for 8bitmime.  Unless Stefan's patch is applied.
But it also must be easy to manually tell it to use some other charset
when one recognizes the need.  I can use (gnus-article-view-part-as-charset)
for a mime part (at least w/ Stefan's patch) but haven't managed to make
that work for a message w/o any attachments or inlines.

I'm running w/ Stefan's patch now and it works.  I'll try a new compile
tonight or tomorrow and test the committed code.

-JimC
-- 
James Cloos <cloos@jhcloos.com>         OpenPGP: 1024D/ED7DAEA6









* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
       [not found]                             ` <jwvfxlaz0ov.fsf-monnier+emacsbugreports@gnu.org>
@ 2008-11-30 13:12                               ` Reiner Steib
  0 siblings, 0 replies; 39+ messages in thread
From: Reiner Steib @ 2008-11-30 13:12 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: Simon Josefsson, Clemens Schueller, Frank Schmitt, 1174, ding

On Sat, Nov 29 2008, Stefan Monnier wrote:

>> Stefan, what are the "accidental emacs-mule-valid byte sequences"
>> that trigger this problem?  It would be good if someone could send
>> me a problematic article. [2]
>
> In Emacs-22, the problem was more difficult to trigger: you had to
> receive an email whose undecoded text contained emacs-mule escape
> sequences, which is rather uncommon.  With Emacs-23, it's a lot more
> common since the internal encoding has changed to a variant of utf-8:
> an 8bit body using utf-8 will see its content unwillingly decoded during
> nnimap-demule which leads to the bugs we've seen recently.

Could you send me an article demonstrating the problem?

> I'm pretty sure that string-as-multibyte is wrong here in general.
> Maybe the problem is that nnimap-demule is used blindly in different
> contexts where some need string-to-multibyte and some need
> string-as-multibyte.  E.g. maybe Simon's problem was linked to imap
> groups with non-ASCII chars in their names, rather than in the
> message bodies.

I'm not familiar with the IMAP code, but AFAICS, `nnimap-demule' is
only used when getting headers or body:

| nnimap.el:611:	      headers (nnimap-demule

(defun nnimap-retrieve-headers-progress ()
  "Hook to insert NOV line for current article into `nntp-server-buffer'."

| nnimap.el:951:	 (nnimap-demule

(defun nnimap-callback (article gnus-callback buffer)
  (when (eq article (imap-current-message))
    (remove-hook 'imap-fetch-data-hook
		 (nnimap-make-callback article gnus-callback buffer))
    (with-current-buffer buffer
      (insert
       (with-current-buffer nnimap-server-buffer
	 (nnimap-demule
	  (if (imap-capability 'IMAP4rev1)
	      ;; xxx don't just use car? alist doesn't contain
	      ;; anything else now, but it might...
	      (nth 2 (car (imap-message-get article 'BODYDETAIL)))
	    (imap-message-get article 'RFC822)))))
      (nnheader-ms-strip-cr)
      (funcall gnus-callback t))))

| nnimap.el:977:		(insert (nnimap-demule (if detail

(defun nnimap-request-article-part (article part prop &optional
					    group server to-buffer detail)

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/







* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
       [not found]                       ` <87k5ampwej.fsf@marauder.physik.uni-ulm.de>
  2008-11-29 12:18                         ` Simon Josefsson
@ 2008-12-01 21:04                         ` Stefan Monnier
       [not found]                         ` <jwv7i6j393g.fsf-monnier+emacsbugreports@gnu.org>
       [not found]                         ` <878wr2ohde.fsf@mocca.josefsson.org>
  3 siblings, 0 replies; 39+ messages in thread
From: Stefan Monnier @ 2008-12-01 21:04 UTC (permalink / raw)
  To: Simon Josefsson; +Cc: 1174, Frank Schmitt, ding

> Simon, could you please clarify why you reverted Stefan's change in
> `nnimap-demule'?  It breaks reading UTF-8 articles via nnimap.el in
> Emacs 23.

Having looked at the code again, I'm more than ever confident that
string-to-multibyte is the right thing to use.  Maybe the code I installed
back then failed to fall back to string-as-multibyte when string-to-multibyte
was not available, which caused a bug for Simon?

In any case, the newly committed code has a parenthesis typo that makes
it still use the old code and ignore the new config var
nnimap-demule-use-string-to-multibyte.

Also, I recommend just using the patch below instead.  The first hunk
removes an unnecessary use of nnimap-demule since the output will be
inserted into a unibyte buffer.


        Stefan


--- nnimap.el.~1.50.~	2008-12-01 15:38:55.000000000 -0500
+++ nnimap.el	2008-12-01 15:49:53.000000000 -0500
@@ -608,12 +608,11 @@
       (with-current-buffer nnimap-server-buffer
 	(setq uid imap-current-message
 	      mbx imap-current-mailbox
-	      headers (nnimap-demule
-		       (if (imap-capability 'IMAP4rev1)
+	      headers (if (imap-capability 'IMAP4rev1)
 			   ;; xxx don't just use car? alist doesn't contain
 			   ;; anything else now, but it might...
 			   (nth 2 (car (imap-message-get uid 'BODYDETAIL)))
-			 (imap-message-get uid 'RFC822.HEADER)))
+                        (imap-message-get uid 'RFC822.HEADER))
 	      lines (imap-body-lines (imap-message-body imap-current-message))
 	      chars (imap-message-get imap-current-message 'RFC822.SIZE)))
       (nnheader-insert-nov
@@ -901,40 +900,17 @@
   (when (nnimap-possibly-change-server server)
     (nnoo-status-message 'nnimap server)))
 
-(defvar nnimap-demule-use-string-to-multibyte (fboundp 'string-to-multibyte)
-  "Temporary internal debug variable.
-If you have problems (UTF-8 not decoded correctly on IMAP) with
-the default value, please report it as a bug!")
-;; FIXME: Clarify if we need to make this variable conditional on the Emacs
-;; version (Emacs 22 vs. Emacs 23;Emacs 21 doesn't have `string-to-multibyte'
-;; anyhow).  --rsteib
-;;
-;; http://thread.gmane.org/gmane.emacs.gnus.general/67112
-;; (bug#464, reported by James Cloos)
-;; http://thread.gmane.org/gmane.emacs.bugs/21524
-;; (bug#1174, reported by Frank Schmitt)
-
-(defun nnimap-demule (string)
-  ;; BEWARE: we used to use string-as-multibyte here which is braindead
-  ;; because it will turn accidental emacs-mule-valid byte sequences
-  ;; into multibyte chars.  --Stef
-  ;; Reverted, braindead got 7.5 out of 10 on imdb, so it can't be
-  ;; that bad. --Simon
-  (gnus-message 9 "nnimap-demule-use-string-to-multibyte: %s"
-		nnimap-demule-use-string-to-multibyte)
-  (if nnimap-demule-use-string-to-multibyte
-      ;; Stefan
-      (funcall (if (and (fboundp 'string-to-multibyte)
-			(subrp (symbol-function 'string-to-multibyte)))
-		   'string-to-multibyte
-		 'identity)
-	       (or string "")))
-  ;; Simon
-  (funcall (if (and (fboundp 'string-as-multibyte)
-		    (subrp (symbol-function 'string-as-multibyte)))
-	       'string-as-multibyte
-	     'identity)
-	   (or string "")))
+;; We used to use a string-as-multibyte here, but it is really incorrect.
+;; This function is used when we're about to insert a unibyte string
+;; into a potentially multibyte buffer.  The string is either an article
+;; header or body (or both?), undecoded.  When Emacs is asked to convert
+;; a unibyte string to multibyte, it may either use the equivalent of
+;; nothing (e.g. non-Mule XEmacs), string-make-multibyte (i.e. decode using
+;; locale), string-as-multibyte (decode using emacs-internal coding system)
+;; or string-to-multibyte (keep the data undecoded as a sequence of bytes).
+;; Only the last one preserves the data such that we can reliably later on
+;; decode the text using the mime info.
+(defalias 'nnimap-demule 'mm-string-to-multibyte)
 
 (defun nnimap-make-callback (article gnus-callback buffer)
   "Return a callback function."







* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
       [not found]                         ` <jwv7i6j393g.fsf-monnier+emacsbugreports@gnu.org>
@ 2008-12-01 22:48                           ` Reiner Steib
       [not found]                           ` <87y6yzbjgf.fsf@marauder.physik.uni-ulm.de>
  1 sibling, 0 replies; 39+ messages in thread
From: Reiner Steib @ 2008-12-01 22:48 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 1174, Frank Schmitt, ding, Simon Josefsson

On Mon, Dec 01 2008, Stefan Monnier wrote:

> Having looked at the code again, I'm more than ever confident that
> string-to-multibyte is the right thing to use.  Maybe the code I installed
> back then failed to fall back to string-as-multibyte when string-to-multibyte
> was not available, which caused a bug for Simon?

Yes, it didn't fall back to string-as-multibyte:

--- nnimap.el	17 Aug 2004 14:27:16 -0000	7.7
+++ nnimap.el	30 Aug 2004 18:13:58 -0000	7.8
[...]
@@ -845,9 +847,12 @@
     (nnoo-status-message 'nnimap server)))
 
 (defun nnimap-demule (string)
-  (funcall (if (and (fboundp 'string-as-multibyte)
-		    (subrp (symbol-function 'string-as-multibyte)))
-	       'string-as-multibyte
+  ;; BEWARE: we used to use string-as-multibyte here which is braindead
+  ;; because it will turn accidental emacs-mule-valid byte sequences
+  ;; into multibyte chars.  --Stef
+  (funcall (if (and (fboundp 'string-to-multibyte)
+		    (subrp (symbol-function 'string-to-multibyte)))
+	       'string-to-multibyte
 	     'identity)
 	   (or string "")))

> In any case, the newly committed code has a parenthesis typo that makes
> it still use the old code and ignore the new config var
> nnimap-demule-use-string-to-multibyte.

Oops, stupid me.

> Also, I recommend just using the patch below instead.  The first hunk
> removes an unnecessary use of nnimap-demule since the output will be
> inserted into a unibyte buffer.

Thanks for your analysis. Please install the patch.  I'll pull it into
Gnus CVS ASAP (unless Miles syncs first).

> +;; We used to use a string-as-multibyte here, but it is really incorrect.
> +;; This function is used when we're about to insert a unibyte string
> +;; into a potentially multibyte buffer.  The string is either an article
> +;; header or body (or both?), undecoded.  When Emacs is asked to convert
> +;; a unibyte string to multibyte, it may either use the equivalent of
> +;; nothing (e.g. non-Mule XEmacs), string-make-multibyte (i.e. decode using
> +;; locale), string-as-multibyte (decode using emacs-internal coding system)
> +;; or string-to-multibyte (keep the data undecoded as a sequence of bytes).
> +;; Only the last one preserves the data such that we can reliably later on
> +;; decode the text using the mime info.
> +(defalias 'nnimap-demule 'mm-string-to-multibyte)

In Emacs 21 (which Gnus still aims to be compatible with), we have
string-as-multibyte, but not string-to-multibyte.  So your proposed
code (i.e. mm-string-to-multibyte) runs
  (string-as-multibyte (char-to-string string))
whereas we used to run
  (string-as-multibyte string)
Does char-to-string matter here?

(defalias 'mm-string-to-multibyte
  (cond
   ((featurep 'xemacs)
    'identity)
   ((fboundp 'string-to-multibyte)
    'string-to-multibyte)
   (t
    (lambda (string)
      "Return a multibyte string with the same individual chars as string."
      (mapconcat
       (lambda (ch) (mm-string-as-multibyte (char-to-string ch)))
       string "")))))

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/







* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
       [not found]                           ` <87y6yzbjgf.fsf@marauder.physik.uni-ulm.de>
@ 2008-12-02  7:36                             ` Stefan Monnier
       [not found]                             ` <jwv3ah7av77.fsf-monnier+emacsbugreports@gnu.org>
  1 sibling, 0 replies; 39+ messages in thread
From: Stefan Monnier @ 2008-12-02  7:36 UTC (permalink / raw)
  To: Simon Josefsson; +Cc: 1174, Frank Schmitt, ding

> In Emacs 21 (which Gnus still aims to be compatible with), we have
> string-as-multibyte, but not string-to-multibyte.  So your proposed
> code (i.e. mm-string-to-multibyte) runs
>   (string-as-multibyte (char-to-string string))
> whereas we used to run
>   (string-as-multibyte string)
> Does char-to-string matter here?

> (defalias 'mm-string-to-multibyte
>   (cond
>    ((featurep 'xemacs)
>     'identity)
>    ((fboundp 'string-to-multibyte)
>     'string-to-multibyte)
>    (t
>     (lambda (string)
>       "Return a multibyte string with the same individual chars as string."
>       (mapconcat
>        (lambda (ch) (mm-string-as-multibyte (char-to-string ch)))
>        string "")))))

Oh, that's clever: yes, the mapconcat/char-to-string dance does make it
implement the string-to-multibyte behavior because doing the
string-as-multibyte conversion one byte at a time avoids the
problematic case.  To quote myself from mm-util.el:

     ;; string-as-multibyte often doesn't really do what you think it does.
     ;; Example:
     ;;    (aref (string-as-multibyte "\201") 0) -> 129 (aka ?\201)
     ;;    (aref (string-as-multibyte "\300") 0) -> 192 (aka ?\300)
     ;;    (aref (string-as-multibyte "\300\201") 0) -> 192 (aka ?\300)
     ;;    (aref (string-as-multibyte "\300\201") 1) -> 129 (aka ?\201)
     ;; but
     ;;    (aref (string-as-multibyte "\201\300") 0) -> 2240
     ;;    (aref (string-as-multibyte "\201\300") 1) -> <error>

Basically, when the string passed is made of a single byte,
string-as-multibyte is equal to string-to-multibyte, which is the
property used by the code you quoted above to build a poor man's
string-to-multibyte.
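
The byte-at-a-time property can be checked with a small sketch.  It
assumes Emacs 23 or later, where unibyte-string exists and the
obsolete string-as-multibyte is still defined; the function name
demo-bytewise-to-multibyte is hypothetical.  It is not the Gnus
fallback itself: that one uses char-to-string, which on newer Emacs
returns a decoded character for values above 127, so unibyte-string
stands in for it here.

```elisp
;; Hypothetical helper illustrating the one-byte-at-a-time idea.
(defun demo-bytewise-to-multibyte (raw)
  "Convert unibyte RAW to multibyte one byte at a time.
No multi-byte sequence is ever passed to `string-as-multibyte',
so nothing can be decoded by accident."
  (mapconcat (lambda (byte)
               ;; A one-byte string gives string-as-multibyte nothing
               ;; to combine, so each byte stays a raw byte.
               (string-as-multibyte (unibyte-string byte)))
             raw ""))

;; Compare against the real primitive on a UTF-8 byte pair.
(equal (demo-bytewise-to-multibyte "\303\274")
       (string-to-multibyte "\303\274"))
```

Converting the whole string in one call does not have this property,
which is why the fallback splits it up first.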


        Stefan







* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
       [not found]                             ` <jwv3ah7av77.fsf-monnier+emacsbugreports@gnu.org>
@ 2008-12-04 19:43                               ` Reiner Steib
       [not found]                               ` <87iqpz4tg2.fsf@marauder.physik.uni-ulm.de>
  1 sibling, 0 replies; 39+ messages in thread
From: Reiner Steib @ 2008-12-04 19:43 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 1174, Frank Schmitt, ding, Simon Josefsson

On Tue, Dec 02 2008, Stefan Monnier wrote:

>> In Emacs 21 (which Gnus still aims to be compatible with), we have
>> string-as-multibyte, but not string-to-multibyte.  So your proposed
>> code (i.e. mm-string-to-multibyte) runs
>
>>   (string-as-multibyte (char-to-string string))
>> whereas we used to run
>>   (string-as-multibyte string)
>> Does char-to-string matter here?
[...]
>>     (lambda (string)
>>       "Return a multibyte string with the same individual chars as string."
>>       (mapconcat
>>        (lambda (ch) (mm-string-as-multibyte (char-to-string ch)))
>>        string "")))))
>
> Oh, that's clever: yes, the mapconcat/char-to-string dance does make it
> implement the string-to-multibyte behavior because doing the
> string-as-multibyte conversion one byte at a time avoids the
> problematic case.  

Good.  So I think you can close this bug.  Thanks.

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/







* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
       [not found]                               ` <87iqpz4tg2.fsf@marauder.physik.uni-ulm.de>
@ 2008-12-04 21:43                                 ` Frank Schmitt
  0 siblings, 0 replies; 39+ messages in thread
From: Frank Schmitt @ 2008-12-04 21:43 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 1174, ding, Simon Josefsson

Reiner Steib <reinersteib+gmane@imap.cc> writes:

> On Tue, Dec 02 2008, Stefan Monnier wrote:
>
>>> In Emacs 21 (which Gnus still aims to be compatible with), we have
>>> string-as-multibyte, but not string-to-multibyte.  So your proposed
>>> code (i.e. mm-string-to-multibyte) runs
>>
>>>   (string-as-multibyte (char-to-string string))
>>> whereas we used to run
>>>   (string-as-multibyte string)
>>> Does char-to-string matter here?
> [...]
>>>     (lambda (string)
>>>       "Return a multibyte string with the same individual chars as string."
>>>       (mapconcat
>>>        (lambda (ch) (mm-string-as-multibyte (char-to-string ch)))
>>>        string "")))))
>>
>> Oh, that's clever: yes, the mapconcat/char-to-string dance does make it
>> implement the string-to-multibyte behavior because doing the
>> string-as-multibyte conversion one byte at a time avoids the
>> problematic case.  
>
> Good.  So I think you can close this bug.  Thanks.

Yes, I can confirm the bug is fixed in CVS head.

-- 
Have you ever considered how much text can fit in eighty columns?  Given that a
signature typically contains up to four lines of text, this space allows you to
attach a tremendous amount of valuable information to your messages.  Seize the
opportunity and don't waste your signature on bullshit that nobody cares about.







* bug#1174: marked as done (23.0.60; Some UTF-8 mails displaying  wrongly in Emacs 23)
  2008-10-15 20:21 ` bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23 Frank Schmitt
  2008-10-16  7:32   ` Andreas Schwab
@ 2008-12-04 22:05   ` Emacs bug Tracking System
  2009-02-18  8:55   ` Emacs bug Tracking System
  2 siblings, 0 replies; 39+ messages in thread
From: Emacs bug Tracking System @ 2008-12-04 22:05 UTC (permalink / raw)
  To: Stefan Monnier

[-- Attachment #1: Type: text/plain, Size: 905 bytes --]


Your message dated Thu, 04 Dec 2008 16:57:36 -0500
with message-id <jwviqpz1u4o.fsf-monnier+emacsbugreports@gnu.org>
and subject line Re: bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
has caused the Emacs bug report #1174,
regarding 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact don@donarmstrong.com
immediately.)


-- 
1174: http://emacsbugs.donarmstrong.com/cgi-bin/bugreport.cgi?bug=1174
Emacs Bug Tracking System
Contact don@donarmstrong.com with problems

[-- Attachment #2: Type: message/rfc822, Size: 6763 bytes --]

From: Frank Schmitt <ich@frank-schmitt.net>
To: emacs-pretest-bug@gnu.org
Subject: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
Date: Wed, 15 Oct 2008 22:21:21 +0200
Message-ID: <m37i89boe6.fsf@mid.gehheimdienst.de>

Gnus in Emacs 23 displays some German umlauts incorrectly while Gnus
in Emacs 22 displays the same message correctly. It only happens when
my boss uses thunderbird to reply to my UTF-8 mails from Gnus.

In Emacs 23 in the mail from my boss I get
Char: � (4194300, #o17777774, #x3ffffc, raw-byte) point=212 of 461 (46%) column=4
and the character is displayed as \374
In another UTF-8 mail written by me it is
Char: ü (252, #o374, #xfc, file ...) point=493 of 940 (52%) column=0


In Emacs 22 when I do C-x = on the same character in the mail from my
boss as above I get
Char: ü (2300, #o4374, #x8fc, file ...) point=212 of 461 (46%) column=4
and in the other mail
Char: ü (2300, #o4374, #x8fc, file ...) point=493 of 940 (52%) column=0

I could reproduce with CVS head from this evening, minimal .gnus just
specifying the select method and emacs -q.


In GNU Emacs 23.0.60.2 (i686-pc-linux-gnu, GTK+ Version 2.12.8)
 of 2008-10-15 on glasgow.frank-schmitt.net
Windowing system distributor `The X.Org Foundation', version 11.0.10300000
configured using `configure  '--prefix=/opt/emacs23''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: de_DE.UTF-8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default-enable-multibyte-characters: t

Major mode: Group

Minor modes in effect:
  gnus-undo-mode: t
  show-paren-mode: t
  delete-selection-mode: t
  pc-selection-mode: t
  recentf-mode: t
  iswitchb-mode: t
  shell-dirtrack-mode: t
  tooltip-mode: t
  mouse-wheel-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  global-auto-composition-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:
<help-echo> M-x g n u s <return> <down> C-u 1 0 <return> 
<up> <up> <return> t <return> <return> <return> <return> 
<return> <return> <return> <return> <return> <return> 
<return> <return> <return> <return> <return> <return> 
<return> <return> <return> <return> <return> <return> 
<return> <return> <return> <return> <return> <return> 
<return> <return> <return> <return> <return> <return> 
<return> <return> <return> <return> <return> <return> 
<return> <return> <return> <return> <return> <return> 
<return> <return> <return> <return> <return> <return> 
<return> <return> <return> <return> <return> <down> 
<return> <down-mouse-1> <mouse-1> C-x = <help-echo> 
<help-echo> <help-echo> <down-mouse-2> <mouse-2> <help-echo> 
<down-mouse-1> <mouse-1> <right> <left> C-x = <help-echo> 
<down-mouse-1> <drag-mouse-1> <help-echo> <help-echo> 
<down-mouse-1> <mouse-1> <home> <S-end> <C-insert> 
<down> <home> <S-end> <C-insert> <help-echo> <down-mouse-1> 
<mouse-movement> <mouse-1> <help-echo> q <return> <return> 
SPC q L <up> <up> <up> <up> <up> <up> <up> <up> <up> 
<up> <up> <up> <up> <up> <up> <up> <up> <up> <up> <up> 
<up> <up> <up> <up> <up> <up> <up> <up> <up> <up> <up> 
<down> <down> <down> <down> M-x e m a <tab> b u <tab> 
<backspace> <backspace> <backspace> <backspace> <backspace> 
<backspace> <backspace> <backspace> b u g <tab> <backspace> 
<backspace> <backspace> <backspace> <backspace> <backspace> 
<backspace> <backspace> <backspace> <backspace> <backspace> 
<backspace> <backspace> <backspace> r e p <tab> o <tab> 
r <tab> <return>

Recent messages:
IMAP mailbox `Uni' closed... OK
Retrieving newsgroup: nnimap+ltw:Uni...
nnimap: Updating info for nnimap+ltw:Uni...done
Fetching headers for nnimap+ltw:Uni...done
Scoring...done
Generating summary...done
nnimap: Setting marks in Uni...done
No more unread newsgroups
IMAP mailbox `Uni' closed... OK
Making completion list... [2 times]



[-- Attachment #3: Type: message/rfc822, Size: 2392 bytes --]

From: Stefan Monnier <monnier@IRO.UMontreal.CA>
To: 1174-done@emacsbugs.donarmstrong.com
Subject: Re: bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
Date: Thu, 04 Dec 2008 16:57:36 -0500
Message-ID: <jwviqpz1u4o.fsf-monnier+emacsbugreports@gnu.org>

>> Oh, that's clever: yes, the mapconcat/char-to-string dance does make it
>> implement the string-to-multibyte behavior because doing the
>> string-as-multibyte conversion one byte at a time avoids the
>> problematic case.  
> Good.  So I think you can close this bug.  Thanks.

OK, done,


        Stefan
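
The point quoted above, that converting one byte at a time gives the
string-to-multibyte behavior, can be modelled outside Emacs.  The
following Python sketch is only a model under stated assumptions: it
treats the internal encoding as UTF-8 (roughly the Emacs 23 situation),
maps undecodable bytes to 0x3FFF00 plus the byte, and uses the made-up
names as_multibyte, to_multibyte and per_byte_as_multibyte.  It is not
the code that was installed in Gnus.

```python
RAW_BYTE_OFFSET = 0x3FFF00  # assumed Emacs 23 code offset for raw bytes 0x80..0xFF


def _escaped_to_raw(cp):
    # Python's surrogateescape maps an undecodable byte b to U+DC00 + b;
    # translate that back into the assumed raw-byte character code.
    return RAW_BYTE_OFFSET + (cp - 0xDC00) if 0xDC80 <= cp <= 0xDCFF else cp


def as_multibyte(data):
    """Model of string-as-multibyte: reinterpret the whole byte sequence."""
    text = data.decode("utf-8", errors="surrogateescape")
    return [_escaped_to_raw(ord(ch)) for ch in text]


def to_multibyte(data):
    """Model of string-to-multibyte: every byte becomes exactly one char."""
    return [RAW_BYTE_OFFSET + b if b >= 0x80 else b for b in data]


def per_byte_as_multibyte(data):
    """Apply the as_multibyte model to each byte separately and concatenate."""
    out = []
    for b in data:
        out.extend(as_multibyte(bytes([b])))
    return out


payload = b"a\xc3\xbc"  # two bytes that happen to form a valid sequence
print(as_multibyte(payload))           # the two bytes fuse into one character
print(to_multibyte(payload))           # the two bytes stay separate raw bytes
print(per_byte_as_multibyte(payload))  # same result as to_multibyte
```

In this model a lone byte can never form a valid multi-byte sequence, so
the per-byte variant cannot turn accidental sequences into characters,
which is the problematic case the BEWARE comment in nnimap-demule
describes.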



* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
       [not found]                         ` <878wr2ohde.fsf@mocca.josefsson.org>
  2008-11-29 15:30                           ` Reiner Steib
       [not found]                           ` <87d4geim7p.fsf@marauder.physik.uni-ulm.de>
@ 2009-01-12 10:54                           ` Simon Josefsson
  2009-01-12 11:03                             ` Frank Schmitt
       [not found]                             ` <m3d4es3hxu.fsf@mid.gehheimdienst.de>
  2 siblings, 2 replies; 39+ messages in thread
From: Simon Josefsson @ 2009-01-12 10:54 UTC (permalink / raw)
  To: Frank Schmitt; +Cc: 1174, ding

I have recently upgraded my Gnus installation, and it included this fix.
Now every e-mail I send has non-ASCII characters prefixed with \201.
So the patch installed does not seem to be the right one, or there is
something else wrong with my configuration.  Can anyone else reproduce
this?  I'm including 'åäö' in this e-mail for debugging.

I'll see if I can debug this further, and find the exact part of the
patch that causes the problem.

/Simon

Simon Josefsson <simon@josefsson.org> writes:

> Reiner Steib <reinersteib+gmane@imap.cc> writes:
>
>> On Fri, Oct 17 2008, Frank Schmitt wrote:
>>
>>> Reiner Steib <reinersteib+gmane@imap.cc> writes:
>>>
>>>> On Fri, Oct 17 2008, Stefan Monnier wrote:
>>>>
>>>>>>   ;; BEWARE: we used to use string-as-multibyte here which is braindead
>>>>>>   ;; because it will turn accidental emacs-mule-valid byte sequences
>>>>>>   ;; into multibyte chars.  --Stef
>>>>>>   ;; Reverted, braindead got 7.5 out of 10 on imdb, so it can't be
>>>>>>   ;; that bad. --Simon
>>
>> Simon, could you please clarify why you reverted Stefan's change in
>> `nnimap-demule'?  It breaks reading UTF-8 articles via nnimap.el in
>> Emacs 23.
>
> I don't recall, but people should notice relatively quickly if there are
> problems in this area (wrong display of non-ascii) so feel free to
> revert the patch or apply another patch instead.  It needs to be tested
> under Emacs 22 too, though, if it is installed in the Gnus CVS.
>
> /Simon







* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
  2009-01-12 10:54                           ` Simon Josefsson
@ 2009-01-12 11:03                             ` Frank Schmitt
       [not found]                             ` <m3d4es3hxu.fsf@mid.gehheimdienst.de>
  1 sibling, 0 replies; 39+ messages in thread
From: Frank Schmitt @ 2009-01-12 11:03 UTC (permalink / raw)
  To: Simon Josefsson; +Cc: 1174, ding

Simon Josefsson <simon@josefsson.org> writes:

> I have recently upgraded my Gnus installation, and it included this fix.
> Now every e-mail I send has non-ASCII characters prefixed with \201.
> So the patch installed does not seem to be the right one, or there is
> something else wrong with my configuration.  Can anyone else reproduce
> this?  I'm including '\201å\201ä\201ö' in this e-mail for debugging.
>
> I'll see if I can debug this further, and find the exact part of the
> patch that causes the problem.

I see the \201 in your mail but for me everything is sent and displayed
perfectly since the patch was applied. Maybe the difference is that you
are using Emacs 22? äöüß

-- 
Have you ever considered how much text can fit in eighty columns?  Given that a
signature typically contains up to four lines of text, this space allows you to
attach a tremendous amount of valuable information to your messages.  Seize the
opportunity and don't waste your signature on bullshit that nobody cares about.







* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
       [not found]                             ` <m3d4es3hxu.fsf@mid.gehheimdienst.de>
@ 2009-01-12 11:10                               ` Simon Josefsson
       [not found]                               ` <873afoixtr.fsf@mocca.josefsson.org>
  1 sibling, 0 replies; 39+ messages in thread
From: Simon Josefsson @ 2009-01-12 11:10 UTC (permalink / raw)
  To: Frank Schmitt; +Cc: 1174, ding

Frank Schmitt <ich@frank-schmitt.net> writes:

> Simon Josefsson <simon@josefsson.org> writes:
>
>> I have recently upgraded my Gnus installation, and it included this fix.
>> Now every e-mail I send has non-ASCII characters prefixed with \201.
>> So the patch installed does not seem to be the right one, or there is
>> something else wrong with my configuration.  Can anyone else reproduce
>> this?  I'm including '\201å\201ä\201ö' in this e-mail for debugging.
>>
>> I'll see if I can debug this further, and find the exact part of the
>> patch that causes the problem.
>
> I see the \201 in your mail but for me everything is sent and displayed
> perfectly since the patch was applied. Maybe the difference is that you
> are using Emacs 22? äöüß

I'm testing Emacs 23 now and everything works fine.  So it seems the
problem only occurs in Emacs 22.

However, it seems the problem happens on _sending_, which confuses me.
Maybe the old patch just worked around the problem, and made \201 in
incoming e-mail disappear?  However, I think I would have noticed this
before anyway, because all my posts to mailing lists and news also have
\201 before any non-ASCII character now (with Emacs 22), and I don't
think that was the case before applying this patch.

/Simon
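
The \201 prefix itself has a plausible byte-level reading.  The Python
sketch below is a model only, assuming that the Emacs 22 internal
(emacs-mule) representation stores a Latin-1 character as the leading
byte 0x81 (octal 201) followed by the character's own byte, and that
these internal bytes reach the outgoing message unconverted.  The
helpers emacs_mule_bytes and show are hypothetical names, and the sketch
does not identify where in Gnus the leak would happen.

```python
LATIN1_LEADING_BYTE = 0x81  # octal 201; assumed emacs-mule leading code for Latin-1


def emacs_mule_bytes(text):
    """Model the assumed Emacs 22 internal bytes for ASCII and Latin-1 text."""
    out = bytearray()
    for ch in text:
        code = ord(ch)
        if code < 0x80:
            out.append(code)                         # ASCII is stored as-is
        elif 0xA0 <= code <= 0xFF:
            out += bytes([LATIN1_LEADING_BYTE, code])  # leading byte + char byte
        else:
            raise ValueError("outside the range this sketch models: %r" % ch)
    return bytes(out)


def show(raw):
    """Render bytes as in this thread: octal escapes for 0x80..0x9F, Latin-1 otherwise."""
    return "".join("\\%o" % b if 0x80 <= b < 0xA0 else chr(b) for b in raw)


leaked = emacs_mule_bytes("åäö")
print(show(leaked))
```

With those assumptions the rendering matches the '\201å\201ä\201ö' that
recipients saw, which is consistent with the observation that the
problem shows up on the sending side under Emacs 22 only.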







* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
       [not found]                                 ` <x6ab9u2kxr@gzp>
@ 2009-01-14 20:12                                   ` Reiner Steib
       [not found]                                   ` <877i4xlk97.fsf@marauder.physik.uni-ulm.de>
  1 sibling, 0 replies; 39+ messages in thread
From: Reiner Steib @ 2009-01-14 20:12 UTC (permalink / raw)
  To: Gabor Z. Papp, Simon Josefsson; +Cc: 1174, ding, Frank Schmitt

reopen 1174
quit
----- end of commands for control@emacsbugs, bcc-ed -----

[ Gabor, please don't drop the other recipients, especially the bug
  database 1174@emacsbugs.donarmstrong.com ]

On Wed, Jan 14 2009, Gabor Z. Papp wrote:

> * Simon Josefsson <simon@josefsson.org>:
>
> | I'm testing Emacs 23 now and everything works fine.  So it seems the
> | problem only occurs in Emacs 22.
>
> Same problem here using Emacs 22.
>
> | However, it seems the problem happens on _sending_ which confuses
> | me.
>
> Exactly.
>
> Should I upgrade to Emacs 23 or downgrade gnus cvs to an earlier snapshot?

Neither one.  We need to find a solution that works in all supported
Emacs versions.

Stefan, could you please suggest a fix or give some advice on how to
debug this problem?

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/






* Processed: Re: bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
       [not found]                                   ` <877i4xlk97.fsf@marauder.physik.uni-ulm.de>
@ 2009-01-14 20:20                                     ` Emacs bug Tracking System
  2009-01-14 22:08                                     ` Stefan Monnier
  1 sibling, 0 replies; 39+ messages in thread
From: Emacs bug Tracking System @ 2009-01-14 20:20 UTC (permalink / raw)
  To: Reiner Steib; +Cc: Emacs Bugs

Processing commands for control@emacsbugs.donarmstrong.com:

> reopen 1174
bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
bug reopened, originator not changed.

> quit
Stopping processing here.

Please contact me if you need assistance.

Don Armstrong
(administrator, Emacs bugs database)





* bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
       [not found]                                   ` <877i4xlk97.fsf@marauder.physik.uni-ulm.de>
  2009-01-14 20:20                                     ` Processed: " Emacs bug Tracking System
@ 2009-01-14 22:08                                     ` Stefan Monnier
  1 sibling, 0 replies; 39+ messages in thread
From: Stefan Monnier @ 2009-01-14 22:08 UTC (permalink / raw)
  To: Reiner Steib; +Cc: Simon Josefsson, Gabor Z. Papp, 1174, Frank Schmitt, ding

> Stefan, could you please suggest a fix or give some advice on how to
> debug this problem?

I don't know where the problem is, so I can't suggest a fix.
I guess what Simon should do is:
- make sure he can reliably reproduce the problem.
- change nnimap-demule back to using string-as-multibyte, and make sure
  that removes the problem.
- then try to figure out why/where nnimap-demule is used in such a way
  as to cause the problem.

I must say I can't understand why nnimap-demule would be called on the
sending side.


        Stefan







* bug#1174: marked as done (23.0.60; Some UTF-8 mails displaying  wrongly in Emacs 23)
  2008-10-15 20:21 ` bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23 Frank Schmitt
  2008-10-16  7:32   ` Andreas Schwab
  2008-12-04 22:05   ` bug#1174: marked as done (23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23) Emacs bug Tracking System
@ 2009-02-18  8:55   ` Emacs bug Tracking System
  2 siblings, 0 replies; 39+ messages in thread
From: Emacs bug Tracking System @ 2009-02-18  8:55 UTC (permalink / raw)
  To: Cress Klose

[-- Attachment #1: Type: text/plain, Size: 869 bytes --]


Your message dated Wed, 18 Feb 2009 08:49:41 +0000
with message-id <3902173433.20090218084927@stiki.is>
and subject line Warning! Virus detected
has caused the Emacs bug report #1174,
regarding 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@emacsbugs.donarmstrong.com
immediately.)


-- 
1174: http://emacsbugs.donarmstrong.com/cgi-bin/bugreport.cgi?bug=1174
Emacs Bug Tracking System
Contact owner@emacsbugs.donarmstrong.com with problems

[-- Attachment #2: Type: message/rfc822, Size: 6763 bytes --]

From: Frank Schmitt <ich@frank-schmitt.net>
To: emacs-pretest-bug@gnu.org
Subject: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23
Date: Wed, 15 Oct 2008 22:21:21 +0200
Message-ID: <m37i89boe6.fsf@mid.gehheimdienst.de>


[-- Attachment #3: Type: message/rfc822, Size: 2177 bytes --]

[-- Attachment #3.1.1: Type: text/plain, Size: 304 bytes --]


 A possible virus was found in this message.
	


Poured into the room flashing over the bewildering small
inclosed space. There was no question of hand and arms clasping
parcels, which he began him and so it goes, abe. That feller
would a damsite own work, my name is a strength.' he paused
a.	

[-- Attachment #3.1.2: Type: text/html, Size: 502 bytes --]


end of thread, other threads:[~2009-02-18  8:55 UTC | newest]

Thread overview: 39+ messages
     [not found] <jwviqpz1u4o.fsf-monnier+emacsbugreports@gnu.org>
2008-10-15 20:21 ` bug#1174: 23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23 Frank Schmitt
2008-10-16  7:32   ` Andreas Schwab
2008-10-16  8:36     ` Frank Schmitt
2008-10-16 17:39       ` Eli Zaretskii
2008-10-16 18:00         ` Frank Schmitt
2008-10-16 18:15           ` Eli Zaretskii
2008-10-16 20:05             ` Frank Schmitt
2008-10-17  8:28         ` Andreas Schwab
2008-10-17  8:52           ` Frank Schmitt
2008-10-17  9:53             ` Eli Zaretskii
2008-10-17 10:02               ` Frank Schmitt
2008-10-17 10:08               ` Frank Schmitt
2008-10-17 15:39                 ` Stefan Monnier
2008-10-17 18:19                   ` Reiner Steib
     [not found]                   ` <871vyf6q40.fsf@marauder.physik.uni-ulm.de>
2008-10-17 18:36                     ` Frank Schmitt
     [not found]                     ` <m3ej2fdq5x.fsf@mid.gehheimdienst.de>
2008-11-29 12:08                       ` Reiner Steib
     [not found]                       ` <87k5ampwej.fsf@marauder.physik.uni-ulm.de>
2008-11-29 12:18                         ` Simon Josefsson
2008-12-01 21:04                         ` Stefan Monnier
     [not found]                         ` <jwv7i6j393g.fsf-monnier+emacsbugreports@gnu.org>
2008-12-01 22:48                           ` Reiner Steib
     [not found]                           ` <87y6yzbjgf.fsf@marauder.physik.uni-ulm.de>
2008-12-02  7:36                             ` Stefan Monnier
     [not found]                             ` <jwv3ah7av77.fsf-monnier+emacsbugreports@gnu.org>
2008-12-04 19:43                               ` Reiner Steib
     [not found]                               ` <87iqpz4tg2.fsf@marauder.physik.uni-ulm.de>
2008-12-04 21:43                                 ` Frank Schmitt
     [not found]                         ` <878wr2ohde.fsf@mocca.josefsson.org>
2008-11-29 15:30                           ` Reiner Steib
     [not found]                           ` <87d4geim7p.fsf@marauder.physik.uni-ulm.de>
2008-11-29 21:30                             ` Stefan Monnier
2008-11-29 22:14                             ` James Cloos
     [not found]                             ` <jwvfxlaz0ov.fsf-monnier+emacsbugreports@gnu.org>
2008-11-30 13:12                               ` Reiner Steib
2009-01-12 10:54                           ` Simon Josefsson
2009-01-12 11:03                             ` Frank Schmitt
     [not found]                             ` <m3d4es3hxu.fsf@mid.gehheimdienst.de>
2009-01-12 11:10                               ` Simon Josefsson
     [not found]                               ` <873afoixtr.fsf@mocca.josefsson.org>
     [not found]                                 ` <x6ab9u2kxr@gzp>
2009-01-14 20:12                                   ` Reiner Steib
     [not found]                                   ` <877i4xlk97.fsf@marauder.physik.uni-ulm.de>
2009-01-14 20:20                                     ` Processed: " Emacs bug Tracking System
2009-01-14 22:08                                     ` Stefan Monnier
2008-10-23  7:24                   ` Frank Schmitt
2008-10-23 13:58                     ` Stefan Monnier
2008-10-17  9:12           ` Frank Schmitt
2008-10-17  5:15       ` Richard M. Stallman
2008-10-28 22:54       ` James Cloos
2008-12-04 22:05   ` bug#1174: marked as done (23.0.60; Some UTF-8 mails displaying wrongly in Emacs 23) Emacs bug Tracking System
2009-02-18  8:55   ` Emacs bug Tracking System

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
