From: ozzloy <ozzloy@gmail.com>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: 63941@debbugs.gnu.org, Eli Zaretskii <eliz@gnu.org>
Subject: bug#63941: [PATCH] ; always CRLF before non-first boundary in multipart form
Date: Fri, 21 Jul 2023 02:04:27 -0700
Message-ID: <CACT2Oni9DHqSqT_ODtGu93AHDyMfAiqth1ZcySGoY7MmTm_MuQ@mail.gmail.com>
In-Reply-To: <jwvh6q1rrwy.fsf-monnier+emacs@gnu.org>
[-- Attachment #1.1: Type: text/plain, Size: 2657 bytes --]
Thanks so much for taking the time to review this!
> I'd rather not completely replace the old code with brand new code
> because that makes it hard for me to see what's really changed.
I thought this would be OK, since the existing version is itself a
complete rewrite of the original (so there is precedent for rewriting
a whole function in a bug-fix commit), provided there were tests
showing the behavior to be the same as before except where this bug
is fixed. (Though I see the tests are currently broken.)
Based on your feedback, and with some help from #emacs, I made a patch
that changes the existing code as little as possible, wrote a better
commit message, and attached it here.
The patch removes the =(unless (bolp) ...)= guard around inserting
CRLF. The RFC says the "boundary delimiter MUST occur at the beginning
of a line, i.e., following a CRLF". =(bolp)= is not enough to
guarantee that the boundary is preceded by CRLF: it is also true when
point is right after a bare "\n".
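To make the =(bolp)= point concrete, here is a small sketch. The two
helper names are made up for this example and are not part of
mm-url.el. The first reports what =(bolp)= says after some text has
been inserted, the second checks what the RFC actually asks for.

```elisp
(defun my/bolp-after (text)
  "Return non-nil if `bolp' holds right after inserting TEXT.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    (insert text)
    (bolp)))

(defun my/ends-with-crlf-p (text)
  "Return non-nil if TEXT ends with CRLF.
Hypothetical helper, for illustration only."
  (string-suffix-p "\r\n" text))

;; A file ending in a bare LF satisfies `bolp' but not the CRLF check,
;; so the old guard skips the CRLF that the boundary needs.
(my/bolp-after "file content\n")       ; non-nil
(my/ends-with-crlf-p "file content\n") ; nil
```

With content that does not end in a newline at all, =(bolp)= is nil and
the old code did insert the CRLF, which is why the bug only shows up
for files ending in "\n".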
Because CRLF is now inserted unconditionally after the =cond=, none of
the branches of the =cond= inserts the boundary's CRLF itself.
> when `filedata` is an empty string, this adds an additional \r\n
> compared to the current code. This seems right to me
It seems right to me too, and all the other clients I tested do the same.
> I expect the decoding software will skip the \r\n at the end of the
> header and then look for \r\n<BOUNDARY>, so it's important to have
> two \r\n
What you said is true. In addition, they also accept
"HEADER\r\nfile content\n--BOUNDARY"
as the content "file content", and consider the last "\n" as attached
to the boundary. That's where the file's final "\n" gets lost if the
file's content was initially "file content\n".
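Here is a rough sketch of that lenient behavior. It is not the code of
any particular server. =my/part-body= is a hypothetical decoder that
takes the body of the first part to be everything between the blank
line after the headers and the next delimiter, and accepts either CRLF
or a bare LF in front of the delimiter. It assumes the message contains
a header terminator and does no error handling.

```elisp
(defun my/part-body (message boundary)
  "Return the body of the first part of MESSAGE, delimited by BOUNDARY.
Hypothetical lenient decoder: the line break in front of the
delimiter may be CRLF or a bare LF, and is not part of the body."
  (let* ((body-start (+ (string-match "\r\n\r\n" message) 4))
         (delimiter (concat "\r?\n--" (regexp-quote boundary)))
         (body-end (string-match delimiter message body-start)))
    (substring message body-start body-end)))

;; Old encoding: the file's own "\n" is taken as the delimiter's line
;; break, so the decoded body is "file content".
(my/part-body
 (concat "--BOUNDARY\r\n"
         "Content-Disposition: form-data; name=\"a\"\r\n"
         "\r\n"
         "file content\n"
         "--BOUNDARY--\r\n")
 "BOUNDARY")
```

With the extra "\r\n" in front of the final boundary, the same function
returns "file content\n", so the file's last newline survives.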
> There remain some questions on this patch:
While fixing this bug, I found a few more problems in addition to the
two that you mention here. I have not addressed them yet, since I
thought I should fix one bug per patch.
> I suspect we can also simplify this code by moving the first (insert
> "--" boundary "\r\n") before the loop, and the second into the loop
> so we can make it insert "\r\n--" boundary "\r\n" (and thus remove
> \r\n from the end of each of the preceding branches).
Almost, but not quite, or at least not without some awkward (to my
eye) repositioning of the code that inserts boundaries, "--", and
"\r\n". The final boundary complicates things. It is different from
all the others: it is "--BOUNDARY--" instead of "--BOUNDARY".
Here's what I ended up with when I tried that:
https://git.sr.ht/~ozzloy/emacs-bug-63941/tree/simplify-insert-boundaries-and---/item/mm-url.el#L397
This passes the tests in =emacs/test/lisp/gnus/mm-url-tests.el=.
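For illustration only, and not the code at the link above, this is one
way such a rearrangement could look. =my/encode-pairs= is a
hypothetical name, and the sketch only handles plain (NAME . VALUE)
string pairs, not files or "submit". The opening "--BOUNDARY" goes
before the loop, each part ends with "\r\n--BOUNDARY", and the trailing
"--\r\n" after the loop turns the last delimiter into the final one.

```elisp
(defun my/encode-pairs (pairs boundary)
  "Encode PAIRS, an alist of (NAME . VALUE) strings, using BOUNDARY.
Hypothetical simplification; files and \"submit\" are not handled."
  (with-temp-buffer
    ;; First delimiter, without its line ending yet.
    (insert "--" boundary)
    (dolist (pair pairs)
      (insert "\r\n"                  ; ends the previous delimiter line
              (format "Content-Disposition: form-data; name=%S\r\n\r\n"
                      (car pair))
              (cdr pair)
              "\r\n--" boundary))     ; CRLF belongs to this delimiter
    ;; Whatever delimiter came last becomes the closing "--BOUNDARY--".
    (insert "--\r\n")
    (buffer-string)))

(my/encode-pairs '(("key" . "value")) "BOUNDARY")
```

The delimiter text ends up split across three places, which is the
kind of repositioning I meant.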
[-- Attachment #2: 0001-allow-uploading-files-ending-in-newline-via-EWW.patch --]
[-- Type: text/x-patch, Size: 7181 bytes --]
From b3e2f07367c6e9836b3a7635b86335bf7104b2b9 Mon Sep 17 00:00:00 2001
From: Daniel Watson <ozzloy@gmail.com>
Date: Fri, 21 Jul 2023 00:03:06 -0700
Subject: [PATCH] ; allow uploading files ending in newline via EWW
; Ensure that every boundary in HTTP message is preceded by "\r\n".
; According to RFC 2046, section 5, the "\r\n" preceding the boundary
; is not considered part of the preceding content, and is instead
; attached to the boundary that follows it.
;
; Consider a file named "1nl", consisting only of the single character
; '\n'.
;
; The old version of =mm-url-encode-multipart-form-data= creates the
; following HTTP message:
;
; (concat
; "--BOUNDARY\r\n"
; "Content-Disposition: form-data; name=\"a\"; filename=\"1nl\"\r\n"
; "Content-Transfer-Encoding: binary\r\n"
; "Content-Type: c\r\n"
; "\r\n"
;
; ;; file content
; "\n"
;
; ;; NOTE "\r\n" is absent before the following boundary
; "--BOUNDARY--\r\n")
;
; The new version of =mm-url-encode-multipart-form-data= creates this
; HTTP message:
;
; (concat
; "--BOUNDARY\r\n"
; "Content-Disposition: form-data; name=\"a\"; filename=\"1nl\"\r\n"
; "Content-Transfer-Encoding: binary\r\n"
; "Content-Type: c\r\n"
; "\r\n"
;
; ;; file content
; "\n"
;
; ;; NOTE "\r\n" precedes the boundary
; "\r\n"
; "--BOUNDARY--\r\n")
;
; The new code ensures all boundaries after the one at the very
; beginning are preceded by "\r\n", whether they are the final, or
; other internal boundaries.
---
lisp/gnus/mm-url.el | 5 +-
test/lisp/gnus/mm-url-tests.el | 160 +++++++++++++++++++++++++++++++++
2 files changed, 162 insertions(+), 3 deletions(-)
create mode 100644 test/lisp/gnus/mm-url-tests.el
diff --git a/lisp/gnus/mm-url.el b/lisp/gnus/mm-url.el
index 11847a79f17..5b68b25ec2e 100644
--- a/lisp/gnus/mm-url.el
+++ b/lisp/gnus/mm-url.el
@@ -433,13 +433,12 @@ mm-url-encode-multipart-form-data
(insert (number-to-string filedata))))))
((equal name "submit")
(insert
- "Content-Disposition: form-data; name=\"submit\"\r\n\r\nSubmit\r\n"))
+ "Content-Disposition: form-data; name=\"submit\"\r\n\r\nSubmit"))
(t
(insert (format "Content-Disposition: form-data; name=%S\r\n\r\n"
name))
(insert value)))
- (unless (bolp)
- (insert "\r\n"))))
+ (insert "\r\n")))
(insert "--" boundary "--\r\n")
(buffer-string)))
diff --git a/test/lisp/gnus/mm-url-tests.el b/test/lisp/gnus/mm-url-tests.el
new file mode 100644
index 00000000000..7b8d45b6061
--- /dev/null
+++ b/test/lisp/gnus/mm-url-tests.el
@@ -0,0 +1,160 @@
+;;; mm-url-tests.el --- -*- lexical-binding:t -*-
+
+;; Copyright (C) 2021-2023 Free Software Foundation, Inc.
+
+;; This file is part of GNU Emacs.
+
+;; GNU Emacs is free software: you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation, either version 3 of the License, or
+;; (at your option) any later version.
+
+;; GNU Emacs is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>.
+
+;;; Commentary:
+
+;;; Code:
+
+(require 'ert)
+(require 'mm-url)
+
+
+(ert-deftest test-mm-url-encode-multipart-form-data:nil ()
+ (should
+ (string=
+ (mm-url-encode-multipart-form-data '() "BOUNDARY")
+ "--BOUNDARY--\r\n")))
+
+(ert-deftest test-mm-url-encode-multipart-form-data:name-value ()
+ (should
+ (string=
+ (mm-url-encode-multipart-form-data
+ '(("key" . "value")) "BOUNDARY")
+ (concat "--BOUNDARY\r\n"
+ "Content-Disposition: form-data; name=\"key\"\r\n"
+ "\r\n"
+ "value\r\n"
+ "--BOUNDARY--\r\n"))))
+
+(ert-deftest test-mm-url-encode-multipart-form-data:submit ()
+ (should
+ (string=
+ (mm-url-encode-multipart-form-data '(("submit")) "BOUNDARY")
+ (concat "--BOUNDARY\r\n"
+ "Content-Disposition: form-data; name=\"submit\"\r\n"
+ "\r\n"
+ "Submit\r\n"
+ "--BOUNDARY--\r\n"))))
+
+(ert-deftest test-mm-url-encode-multipart-form-data:file ()
+ (should
+ (string=
+ (mm-url-encode-multipart-form-data
+ '(("file" . (("name" . "a")
+ ("filename" . "b")
+ ("content-type" . "c")
+ ("filedata" . "d\n"))))
+ "BOUNDARY")
+
+ (concat
+ "--BOUNDARY\r\n"
+ "Content-Disposition: form-data; name=\"a\"; filename=\"b\"\r\n"
+ "Content-Transfer-Encoding: binary\r\n"
+ "Content-Type: c\r\n"
+ "\r\n"
+
+ ;; file content
+ "d\n"
+
+ ;; rfc 2046 section 5
+ ;; https://www.rfc-editor.org/rfc/rfc2046#section-5
+ ;; "The boundary delimiter MUST occur at the beginning of a
+ ;; line, i.e., following a CRLF, and the initial CRLF is
+ ;; considered to be attached to the boundary delimiter line
+ ;; rather than part of the preceding part."
+ "\r\n"
+
+ "--BOUNDARY--\r\n"))))
+
+(ert-deftest test-mm-url-encode-multipart-form-data--all-parts ()
+ (should
+ (string=
+ (mm-url-encode-multipart-form-data
+ '(("name" . "value")
+ ("submit")
+ ("file" . (("name" . "a")
+ ("filename" . "b")
+ ("content-type" . "c")
+ ("filedata" . "d"))))
+ "BOUNDARY")
+ (concat
+ "--BOUNDARY\r\n"
+ "Content-Disposition: form-data; name=\"name\"\r\n"
+ "\r\n"
+ "value\r\n"
+ "--BOUNDARY\r\n"
+ "Content-Disposition: form-data; name=\"submit\"\r\n"
+ "\r\n"
+ "Submit\r\n"
+ "--BOUNDARY\r\n"
+ "Content-Disposition: form-data; name=\"a\"; filename=\"b\"\r\n"
+ "Content-Transfer-Encoding: binary\r\n"
+ "Content-Type: c\r\n"
+ "\r\n"
+
+ ;; file content
+ "d"
+
+ ;; rfc 2046 section 5
+ ;; the \r\n is attached to the boundary below it
+ "\r\n"
+ "--BOUNDARY--\r\n"))))
+
+(ert-deftest test-mm-url-encode-multipart-form-data-two-files ()
+ (should
+ (string=
+ (mm-url-encode-multipart-form-data
+ '(("file" . (("name" . "a")
+ ("filename" . "b")
+ ("content-type" . "c")
+ ("filedata" . "d\n")))
+ ("file" . (("name" . "e")
+ ("filename" . "f")
+ ("content-type" . "g")
+ ("filedata" . "h\n"))))
+ "BOUNDARY")
+ (concat
+ "--BOUNDARY\r\n"
+ "Content-Disposition: form-data; name=\"a\"; filename=\"b\"\r\n"
+ "Content-Transfer-Encoding: binary\r\n"
+ "Content-Type: c\r\n"
+ "\r\n"
+
+ ;; file content
+ "d\n"
+
+ ;; rfc2046 section 5
+ ;; the \r\n is attached to the boundary below it
+ "\r\n"
+ "--BOUNDARY\r\n"
+ "Content-Disposition: form-data; name=\"e\"; filename=\"f\"\r\n"
+ "Content-Transfer-Encoding: binary\r\n"
+ "Content-Type: g\r\n"
+ "\r\n"
+
+ ;; file content
+ "h\n"
+
+ ;; rfc 2046 section 5
+ ;; the \r\n is attached to the boundary below it
+ "\r\n"
+ "--BOUNDARY--\r\n"))))
+
+
+;;; mm-url-tests.el ends here
--
2.39.2
Thread overview: 18+ messages
2023-06-07 5:25 bug#63941: [PATCH] ; always CRLF before non-first boundary in multipart form ozzloy
2023-06-07 12:30 ` Eli Zaretskii
2023-06-08 2:48 ` ozzloy
2023-06-08 6:09 ` Eli Zaretskii
2023-06-08 6:43 ` ozzloy
2023-06-08 6:52 ` ozzloy
2023-06-10 9:42 ` Eli Zaretskii
2023-06-11 1:38 ` ozzloy
2023-06-18 23:23 ` ozzloy
2023-06-19 16:13 ` Eli Zaretskii
2023-06-22 16:49 ` ozzloy
2023-06-22 18:25 ` ozzloy
2023-06-22 18:29 ` Eli Zaretskii
2023-06-23 8:22 ` ozzloy
2023-07-18 19:04 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-07-21 9:04 ` ozzloy [this message]
2023-08-29 0:28 ` ozzloy
2023-12-02 15:03 ` ozzloy