unofficial mirror of emacs-devel@gnu.org 
* possible bug in quoted-printable-decode region
@ 2018-06-27  8:09 Uwe Brauer
  2018-06-27  8:40 ` Yuri Khan
  0 siblings, 1 reply; 5+ messages in thread
From: Uwe Brauer @ 2018-06-27  8:09 UTC (permalink / raw)
  To: emacs-devel



Hi

(If needed I can provide a bug report).

Maybe I am missing something elementary here. I came across a file
which still had quoted-printable encoding like this

Buenas d=C3=ADas:


So I used 
(quoted-printable-decode-region (region-beginning) (region-end) nil)

But I obtained the following 

Buenas d­as:

Not sure how these characters will be sent, so I write them literally

Buenas d\303\255as:

Which looks to me like some old pre-Unicode chars.

What went wrong?

Regards

Uwe Brauer





* Re: possible bug in quoted-printable-decode region
  2018-06-27  8:09 possible bug in quoted-printable-decode region Uwe Brauer
@ 2018-06-27  8:40 ` Yuri Khan
  2018-06-28 12:33   ` Uwe Brauer
  0 siblings, 1 reply; 5+ messages in thread
From: Yuri Khan @ 2018-06-27  8:40 UTC (permalink / raw)
  To: Emacs developers

On Wed, Jun 27, 2018 at 3:10 PM Uwe Brauer <oub@mat.ucm.es> wrote:

> Maybe I am missing something elementary here. I came across a file
> which still had quoted-printable encoding like this
>
> Buenas d=C3=ADas:
>
> So I used
> (quoted-printable-decode-region (region-beginning) (region-end) nil)
>
> But I obtained the following
>
> Buenas d\303\255as:

quoted-printable-decode-region gives you a byte string. You are then
supposed to decode that into text using whichever character encoding
you know is used. (Quoted-printable encoding, unlike URL
percent-encoding, does not mandate UTF-8. In RFC 2822 mail, the
character encoding will be given in the Content-Type header’s charset=
parameter. =C3=AD looks like a quoted-printable encoding of UTF-8
encoding of the character U+00ED LATIN SMALL LETTER I WITH ACUTE.)

As a shortcut, you could pass the character encoding as the last
argument to q-p-d-r, but that is described as deprecated in the
docstring.
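
The two steps described above can be sketched at the string level.
This is a minimal illustration, not part of the original message:
`my-qp-to-text' is a hypothetical helper name, and the sketch assumes
the `qp' library that ships with Emacs, which provides
`quoted-printable-decode-string'.

```elisp
(require 'qp)  ; provides quoted-printable-decode-string

(defun my-qp-to-text (qp-string coding-system)
  "Decode QP-STRING from quoted-printable, then decode the bytes as CODING-SYSTEM.
`my-qp-to-text' is a hypothetical helper for illustration only."
  ;; Step 1: undo the =XX escapes; the result is a string of raw bytes.
  (let ((bytes (quoted-printable-decode-string qp-string)))
    ;; Step 2: interpret those bytes as text in the given character encoding.
    (decode-coding-string bytes coding-system)))

;; The sample from this thread, assuming the bytes are UTF-8:
(my-qp-to-text "Buenas d=C3=ADas:" 'utf-8)
```

With 'utf-8 the two bytes C3 AD become the single character U+00ED.
Passing a different coding system, such as 'latin-1, would interpret
the same two bytes as two separate characters instead.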




* Re: possible bug in quoted-printable-decode region
  2018-06-27  8:40 ` Yuri Khan
@ 2018-06-28 12:33   ` Uwe Brauer
  2018-06-28 13:14     ` Eli Zaretskii
  0 siblings, 1 reply; 5+ messages in thread
From: Uwe Brauer @ 2018-06-28 12:33 UTC (permalink / raw)
  To: emacs-devel


>>> "Yuri" == Yuri Khan <yurivkhan@gmail.com> writes:

   > On Wed, Jun 27, 2018 at 3:10 PM Uwe Brauer <oub@mat.ucm.es> wrote:
   >> Maybe I am missing something elementary here. I came across a file
   >> which still had quoted-printable encoding like this
   >> 
   >> Buenas d=C3=ADas:
   >> 
   >> So I used
   >> (quoted-printable-decode-region (region-beginning) (region-end) nil)
   >> 
   >> But I obtained the following
   >> 
   >> Buenas d\303\255as:

   > quoted-printable-decode-region gives you a byte string. You are then
   > supposed to decode that into text using whichever character encoding
   > you know is used. (Quoted-printable encoding, unlike URL
   > percent-encoding, does not mandate UTF-8. In RFC 2822 mail, the
   > character encoding will be given in the Content-Type header’s charset=
   > parameter. =C3=AD looks like a quoted-printable encoding of UTF-8
   > encoding of the character U+00ED LATIN SMALL LETTER I WITH ACUTE.)

   > As a shortcut, you could pass the character encoding as the last
   > argument to q-p-d-r, but that is described as deprecated in the
   > docstring.


Thanks, but now I am confused. I have that file, with 
 Buenas d=C3=ADas:

How can I translate/decode that to latin1 or utf8? Or are you saying
that this is not possible?


Hm, the file is indeed a complete email, which was never sent.

Ah: I found a way to display that text.  In Gnus I use

gnus-group-make-doc-group

give it the name of that file, and then visit that group with Gnus.
Indeed the chars are displayed correctly. Problem solved.

Still don't know what to do if the file had no mail header.

Thanks for clarifying the issue a bit to me.


* Re: possible bug in quoted-printable-decode region
  2018-06-28 12:33   ` Uwe Brauer
@ 2018-06-28 13:14     ` Eli Zaretskii
  2018-06-28 14:18       ` Uwe Brauer
  0 siblings, 1 reply; 5+ messages in thread
From: Eli Zaretskii @ 2018-06-28 13:14 UTC (permalink / raw)
  To: Uwe Brauer; +Cc: emacs-devel

> From: Uwe Brauer <oub@mat.ucm.es>
> Date: Thu, 28 Jun 2018 14:33:56 +0200
> 
>    > quoted-printable-decode-region gives you a byte string. You are then
>    > supposed to decode that into text using whichever character encoding
>    > you know is used. (Quoted-printable encoding, unlike URL
>    > percent-encoding, does not mandate UTF-8. In RFC 2822 mail, the
>    > character encoding will be given in the Content-Type header’s charset=
>    > parameter. =C3=AD looks like a quoted-printable encoding of UTF-8
>    > encoding of the character U+00ED LATIN SMALL LETTER I WITH ACUTE.)
> 
>    > As a shortcut, you could pass the character encoding as the last
>    > argument to q-p-d-r, but that is described as deprecated in the
>    > docstring.
> 
> 
> Thanks, but now I am confused. I have that file, with 
>  Buenas d=C3=ADas:
> 
> How can I translate/decode that to latin1 or utf8? Or are you saying
> that this is not possible?

It's possible: you should follow quoted-printable-decode-region with
decode-coding-region, and pass it a suitable coding-system.
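
As a rough sketch of that sequence on a region (the command name
`my-qp-decode-region-as' is hypothetical, not something Emacs
provides, and the UTF-8 default is only an assumption that happens to
fit the sample in this thread):

```elisp
(require 'qp)  ; provides quoted-printable-decode-region

(defun my-qp-decode-region-as (beg end coding-system)
  "Decode quoted-printable text between BEG and END, then decode it as CODING-SYSTEM.
Hypothetical helper for illustration; not part of Emacs."
  (interactive
   (list (region-beginning) (region-end)
         (read-coding-system "Decode bytes as (default utf-8): " 'utf-8)))
  (save-excursion
    (save-restriction
      ;; Narrow first: the region shrinks as =XX escapes collapse to single
      ;; bytes, so the original END position would no longer be valid.
      (narrow-to-region beg end)
      ;; Step 1: quoted-printable -> raw bytes.
      (quoted-printable-decode-region (point-min) (point-max))
      ;; Step 2: raw bytes -> characters in the chosen coding system.
      (decode-coding-region (point-min) (point-max) coding-system))))
```

Narrowing is used here so that the second step operates on exactly
the text the first step produced; tracking the end with a marker
would work as well.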

> Still don't know what to do if the file had no mail header.

You will have to guess the encoding somehow.  Emacs cannot (and
shouldn't in this case).
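
One way to do such guessing by hand is to try a few candidate coding
systems in order and keep the first one that decodes cleanly.  The
following is only a heuristic sketch under stated assumptions:
`my-decode-bytes-guessing' is a hypothetical name, and "cleanly" is
taken to mean that no undecodable raw bytes remain in the result.

```elisp
(require 'seq)

(defun my-decode-bytes-guessing (bytes candidates)
  "Return BYTES decoded with the first coding system in CANDIDATES that fits.
A decoding \"fits\" here if it leaves no undecodable raw bytes behind.
Return nil if no candidate fits.  Hypothetical helper, a heuristic only."
  (seq-some
   (lambda (coding-system)
     (let ((text (decode-coding-string bytes coding-system)))
       ;; Bytes that cannot be decoded survive as raw-byte characters,
       ;; which Emacs represents internally as #x3FFF80..#x3FFFFF.
       (unless (seq-some (lambda (ch) (>= ch #x3FFF80)) text)
         text)))
   candidates))

;; Try UTF-8 first, then fall back to Latin-1, which accepts any byte.
(my-decode-bytes-guessing "Buenas d\303\255as:" '(utf-8 latin-1))
```

The order of the candidates matters: Latin-1 never fails, so it only
makes sense as a last resort, and text in some other 8-bit encoding
can still be misread without any error.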




* Re: possible bug in quoted-printable-decode region
  2018-06-28 13:14     ` Eli Zaretskii
@ 2018-06-28 14:18       ` Uwe Brauer
  0 siblings, 0 replies; 5+ messages in thread
From: Uwe Brauer @ 2018-06-28 14:18 UTC (permalink / raw)
  To: emacs-devel




   > It's possible: you should follow quoted-printable-decode-region with
   > decode-coding-region, and pass it a suitable coding-system.

Right, thanks. I did that and indeed it was UTF-8.

   > You will have to guess the encoding somehow.  Emacs cannot (and
   > shouldn't in this case).

OK, but it is good to know that it is not obvious and one has to play around.



