unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#74624: 29.4.50; Gnus cannot parse some filenames(UTF8) in an attachment
@ 2024-11-30 15:59 Konstantin
  2024-11-30 16:20 ` Eli Zaretskii
  0 siblings, 1 reply; 5+ messages in thread
From: Konstantin @ 2024-11-30 15:59 UTC (permalink / raw)
  To: 74624

[-- Attachment #1: Type: text/plain, Size: 1508 bytes --]

From time to time I get emails with attachments from my colleagues, which they send from
the "Roundcube" web interface.

Often I cannot open these attachments with =RET= (gnus-article-press-button)
or save them with =o= (gnus-mime-save-part) under the correct name.
(Interestingly, =X-m= (gnus-summary-save-parts) works correctly.)

The reason is that Gnus cannot correctly parse some attachment filenames.

An example of such an attachment (taken from gnus-summary-show-raw-article):

 --=_d38c0abddd645077f401d42fa430d9d5
Content-Transfer-Encoding: base64
Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document;
 name="=?UTF-8?Q?=D0=9E=D0=B1=D0=B7=D0=BE=D1=80_2024_=28=D0=BD=D0=B0_=2Ed?=
 =?UTF-8?Q?ocx?="
Content-Disposition: attachment;
 filename*0*=UTF-8''%D0%9E%D0%B1%D0%B7%D0%BE%D1%80%202024%20%28%D0%BD%D0;
 filename*1*=%B0%20.docx;
 size=10

c2Rmc2FmYXNmCg==
--=_d38c0abddd645077f401d42fa430d9d5--

I have tried to find the cause. As I see it,
the gnus-data for such an attachment is formed incorrectly:

(#<buffer  *mm*-480444>
     ("application/vnd.openxmlformats-officedocument.word..."
     (name . "Обзор 2024 (на .docx"))
     base64 nil
     ("attachment" (size . "10")
     (filename . "Обзор 2024 (н\320")) nil nil nil)

One can see that the filename is broken.
It should be "Обзор 2024 (на .docx" just like the name.

I have attached an example mail (it can be opened with nndoc).


Could you please fix this bug?



[-- Attachment #2: mail.test --]
[-- Type: application/octet-stream, Size: 805 bytes --]

Return-path: <test@test.com>
Envelope-to: test@test.com
Delivery-date: Sat, 30 Nov 2024 11:04:35 +0300
MIME-Version: 1.0
Date: Sat, 30 Nov 2024 11:04:35 +0100
From: reich <test@test.com>
To: test@test.com
Subject: test
Message-ID: <1aac7676a838f3ec7a16820f65e6ff4c@test.com>
Content-Type: multipart/mixed;
 boundary="=_d38c0abddd645077f401d42fa430d9d5"

--=_d38c0abddd645077f401d42fa430d9d5
Content-Transfer-Encoding: base64
Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document;
 name="=?UTF-8?Q?=D0=9E=D0=B1=D0=B7=D0=BE=D1=80_2024_=28=D0=BD=D0=B0_=2Ed?=
 =?UTF-8?Q?ocx?="
Content-Disposition: attachment;
 filename*0*=UTF-8''%D0%9E%D0%B1%D0%B7%D0%BE%D1%80%202024%20%28%D0%BD%D0;
 filename*1*=%B0%20.docx;
 size=10

c2Rmc2FmYXNmCg==
--=_d38c0abddd645077f401d42fa430d9d5--

^ permalink raw reply	[flat|nested] 5+ messages in thread

* bug#74624: 29.4.50; Gnus cannot parse some filenames(UTF8) in an attachment
  2024-11-30 15:59 bug#74624: 29.4.50; Gnus cannot parse some filenames(UTF8) in an attachment Konstantin
@ 2024-11-30 16:20 ` Eli Zaretskii
  2024-12-01  6:24   ` Visuwesh
  0 siblings, 1 reply; 5+ messages in thread
From: Eli Zaretskii @ 2024-11-30 16:20 UTC (permalink / raw)
  To: Konstantin; +Cc: 74624

> From: Konstantin <reich-cv@yandex.ru>
> Date: Sat, 30 Nov 2024 18:59:25 +0300
> 
> I have tried to find the cause. As I see it,
> the gnus-data for such an attachment is formed incorrectly:
> 
> (#<buffer  *mm*-480444>
>      ("application/vnd.openxmlformats-officedocument.word..."
>      (name . "Обзор 2024 (на .docx"))
>      base64 nil
>      ("attachment" (size . "10")
>      (filename . "Обзор 2024 (н\320")) nil nil nil)
> 
> One can see that the filename is broken.
> It should be "Обзор 2024 (на .docx" just like the name.

It looks like Gnus fails to decipher the file name when it is split in
the middle of a UTF-8 sequence.

I don't know Gnus.  If you can help me by showing where the value of
'gnus-data property is calculated, I might be able to find the bug and
suggest a fix.

Thanks.
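
To make the symptom concrete, here is a minimal sketch in Python (not Emacs code) of what happens to the two percent-encoded segments from the header above. The helper names `decode_joined` and `decode_first_only` are hypothetical and exist only for this illustration; the second one models the case where only the first segment is kept, which is an assumption about how the truncated value arises, not a description of Gnus internals.

```python
from urllib.parse import unquote_to_bytes

# The two RFC 2231 continuation segments from the Content-Disposition
# header in this report, with the leading UTF-8'' charset prefix removed.
SEGMENTS = [
    "%D0%9E%D0%B1%D0%B7%D0%BE%D1%80%202024%20%28%D0%BD%D0",  # filename*0*
    "%B0%20.docx",                                            # filename*1*
]


def decode_joined(segments, charset="utf-8"):
    """Percent-decode every segment to bytes, join them, then decode once."""
    raw = b"".join(unquote_to_bytes(s) for s in segments)
    return raw.decode(charset)


def decode_first_only(segments, charset="utf-8"):
    """Model of the failure: only segment 0 is used, so its final byte
    (the first half of a two-byte UTF-8 sequence) cannot be decoded."""
    raw = unquote_to_bytes(segments[0])
    return raw.decode(charset, errors="backslashreplace")


print(decode_joined(SEGMENTS))
print(decode_first_only(SEGMENTS))
```

The first segment ends with the byte 0xD0 and the second starts with 0xB0; together they form one Cyrillic letter, so the value can only be decoded after the segments have been concatenated. Emacs prints the leftover byte in octal as \320, where this sketch shows it as \xd0.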





^ permalink raw reply	[flat|nested] 5+ messages in thread

* bug#74624: 29.4.50; Gnus cannot parse some filenames(UTF8) in an attachment
  2024-11-30 16:20 ` Eli Zaretskii
@ 2024-12-01  6:24   ` Visuwesh
  2024-12-01  7:52     ` Konstantin
  0 siblings, 1 reply; 5+ messages in thread
From: Visuwesh @ 2024-12-01  6:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 74624, Konstantin

[Saturday, November 30, 2024] Eli Zaretskii wrote:

> It looks like Gnus fails to decipher the file name when it is split in
> the middle of a UTF-8 sequence.
>
> I don't know Gnus.  If you can help me by showing where the value of
> 'gnus-data property is calculated, I might be able to find the bug and
> suggest a fix.

The decoding of the filename in the Content-Disposition header is done
in mm-dissect-buffer by calling mail-header-parse-content-disposition.
Specifically, rfc2231-parse-string.  The following patch fixes the issue
on my end:

diff --git a/lisp/mail/rfc2231.el b/lisp/mail/rfc2231.el
index 33324cafb5b..632e270a922 100644
--- a/lisp/mail/rfc2231.el
+++ b/lisp/mail/rfc2231.el
@@ -193,7 +193,7 @@ rfc2231-parse-string
 		     (push (list attribute value encoded) cparams))
 		    ;; Repetition of a part; do nothing.
 		    ((and elem
-			  (null number))
+			  (null part))
 		     )
 		    ;; Concatenate continuation parts.
 		    (t

NUMBER is the variable used during the parsing portion of the function
in the big condition-case form above the cl-loop form which the patch
modifies.  In the header below

    Content-Disposition: attachment;
      filename*0*=UTF-8''%D0%9E%D0%B1%D0%B7%D0%BE%D1%80%202024%20%28%D0%BD%D0;
      filename*1*=%B0%20.docx;
      size=10

the function first parses filename*0* and here NUMBER is 0, then
filename*1* and here NUMBER is 1.  By the time it finishes parsing size,
NUMBER is set to nil.  The loop should use the value of NUMBER pushed to
PARAMETERS as the 3rd element (referred to as `part' by the cl-loop
form) instead of whatever value NUMBER happened to be when we parsed the
last element.
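
The effect described above can be modeled with a short Python sketch. This is a simplified illustration and not the code in rfc2231.el: the function `merge_params`, its `use_stale_number` flag, and the tuple layout are hypothetical names chosen here, and the real function does more (sorting, decoding) than this model shows.

```python
# Simplified model of the merge step; this is NOT the code in rfc2231.el.
# Each parsed parameter is (attribute, value, part); part is None when the
# parameter carries no *N continuation index.
PARSED = [
    ("filename", "UTF-8''%D0%9E%D0%B1%D0%B7%D0%BE%D1%80%202024%20%28%D0%BD%D0", 0),
    ("filename", "%B0%20.docx", 1),
    ("size", "10", None),
]


def merge_params(parsed, use_stale_number=False):
    """Merge continuation parts per attribute.

    With use_stale_number=True the loop consults one shared variable that
    still holds the index of the LAST parsed parameter (the behavior before
    the patch); otherwise it consults each element's own part index.
    """
    stale_number = parsed[-1][2] if parsed else None
    merged = {}
    for attribute, value, part in parsed:
        index = stale_number if use_stale_number else part
        if attribute not in merged:
            merged[attribute] = value      # first occurrence
        elif index is None:
            pass                           # treated as a repetition: dropped
        else:
            merged[attribute] += value     # continuation: concatenated
    return merged


before = merge_params(PARSED, use_stale_number=True)
after = merge_params(PARSED)
print(before["filename"])
print(after["filename"])
```

Because `size` is parsed last and has no index, the shared variable ends up as nil (None here), so the second `filename` segment is mistaken for a repetition and discarded. Checking the per-element index keeps it.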





^ permalink raw reply related	[flat|nested] 5+ messages in thread

* bug#74624: 29.4.50; Gnus cannot parse some filenames(UTF8) in an attachment
  2024-12-01  6:24   ` Visuwesh
@ 2024-12-01  7:52     ` Konstantin
  2024-12-01  8:17       ` Eli Zaretskii
  0 siblings, 1 reply; 5+ messages in thread
From: Konstantin @ 2024-12-01  7:52 UTC (permalink / raw)
  To: Visuwesh; +Cc: Eli Zaretskii, 74624


Visuwesh <visuweshm@gmail.com> writes:

> The decoding of the filename in the Content-Disposition header is done
> in mm-dissect-buffer by calling mail-header-parse-content-disposition.
> Specifically, rfc2231-parse-string.  The following patch fixes the issue
> on my end:
>
> diff --git a/lisp/mail/rfc2231.el b/lisp/mail/rfc2231.el
> index 33324cafb5b..632e270a922 100644
> --- a/lisp/mail/rfc2231.el
> +++ b/lisp/mail/rfc2231.el
> @@ -193,7 +193,7 @@ rfc2231-parse-string
>  		     (push (list attribute value encoded) cparams))
>  		    ;; Repetition of a part; do nothing.
>  		    ((and elem
> -			  (null number))
> +			  (null part))
>  		     )
>  		    ;; Concatenate continuation parts.
>  		    (t
>
> NUMBER is the variable used during the parsing portion of the function
> in the big condition-case form above the cl-loop form which the patch
> modifies.  In the header below
>
>     Content-Disposition: attachment;
>       filename*0*=UTF-8''%D0%9E%D0%B1%D0%B7%D0%BE%D1%80%202024%20%28%D0%BD%D0;
>       filename*1*=%B0%20.docx;
>       size=10
>
> the function first parses filename*0* and here NUMBER is 0, then
> filename*1* and here NUMBER is 1.  By the time it finishes parsing size,
> NUMBER is set to nil.  The loop should use the value of NUMBER pushed to
> PARAMETERS as the 3rd element (referred to as `part' by the cl-loop
> form) instead of whatever value NUMBER happened to be when we parsed the
> last element.

Thank you,

indeed the patch fixes this bug.





^ permalink raw reply	[flat|nested] 5+ messages in thread

* bug#74624: 29.4.50; Gnus cannot parse some filenames(UTF8) in an attachment
  2024-12-01  7:52     ` Konstantin
@ 2024-12-01  8:17       ` Eli Zaretskii
  0 siblings, 0 replies; 5+ messages in thread
From: Eli Zaretskii @ 2024-12-01  8:17 UTC (permalink / raw)
  To: Konstantin; +Cc: 74624-done, visuweshm

> From: Konstantin <reich-cv@yandex.ru>
> Cc: Eli Zaretskii <eliz@gnu.org>,  74624@debbugs.gnu.org
> Date: Sun, 01 Dec 2024 10:52:30 +0300
> 
> 
> Thank you,
> 
> indeed the patch fixes this bug.

Thanks, installed on the emacs-30 branch, and closing the bug.





^ permalink raw reply	[flat|nested] 5+ messages in thread


Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
