unofficial mirror of bug-gnu-emacs@gnu.org 
From: ozzloy <ozzloy@gmail.com>
To: Stefan Monnier <monnier@iro.umontreal.ca>
Cc: 63941@debbugs.gnu.org, Eli Zaretskii <eliz@gnu.org>
Subject: bug#63941: [PATCH] ; always CRLF before non-first boundary in multipart form
Date: Fri, 21 Jul 2023 02:04:27 -0700
Message-ID: <CACT2Oni9DHqSqT_ODtGu93AHDyMfAiqth1ZcySGoY7MmTm_MuQ@mail.gmail.com>
In-Reply-To: <jwvh6q1rrwy.fsf-monnier+emacs@gnu.org>



Thanks so much for taking the time to review this!

> I'd rather not completely replace the old with a brand new code
> because that makes it hard for me to see what's really changed.

I thought this would be OK for two reasons.  The existing version is
itself a complete rewrite of the original, so there's precedent for
completely rewriting a function in a commit that fixes a bug.  And I
expected tests to show that the behavior is the same as before,
except where this bug is fixed.  (Though I see the tests are
currently broken.)

Based on your feedback, and some help from #emacs, I made a patch
that changes the existing code as little as possible, with a better
commit message, and attached it here.

The patch removes the =(unless (bolp) ...)= guard around inserting
CRLF.  RFC 2046, section 5, says the "boundary delimiter MUST occur at
the beginning of a line, i.e., following a CRLF".  =(bolp)= is not
enough to guarantee the boundary is preceded by CRLF: it is also true
when point is right after a bare "\n".
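
To make the difference concrete, here is a small sketch.  The helper
=my/bolp-and-crlf-p= is a made-up name for illustration, not anything
in mm-url.el; it only compares what =(bolp)= reports with whether the
two characters before point really are CRLF.

```elisp
;; Hypothetical helper for illustration; not part of mm-url.el.
(defun my/bolp-and-crlf-p (text)
  "Insert TEXT in a temp buffer and return (BOLP AFTER-CRLF) at its end.
BOLP is what `bolp' reports; AFTER-CRLF is non-nil only when the two
characters before point are exactly \"\\r\\n\"."
  (with-temp-buffer
    (insert text)
    (list (bolp)
          (and (>= (- (point) (point-min)) 2)
               (string= (buffer-substring (- (point) 2) (point))
                        "\r\n")))))

(my/bolp-and-crlf-p "d\n")    ; => (t nil)   bolp is true, but no CRLF
(my/bolp-and-crlf-p "d\r\n")  ; => (t t)
(my/bolp-and-crlf-p "d")      ; => (nil nil)
```

The first call is the case from this bug: file data ending in a bare
"\n" satisfies =(bolp)=, so the old guard skipped the CRLF.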

Because CRLF is now inserted unconditionally after the =cond=, no
branch of the =cond= inserts the boundary's CRLF itself.  That is why
the "submit" branch loses its trailing "\r\n".
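
The resulting shape can be sketched like this.  It is a simplified
model, not the real function: =my/wrap-parts= is a made-up name, and it
assumes each part (headers, blank line, body) has already been built as
a string.

```elisp
;; Simplified model for illustration; not the code in mm-url.el.
(defun my/wrap-parts (parts boundary)
  "Join PARTS into a multipart body delimited by BOUNDARY.
Each element of PARTS is a string holding one part's headers, a blank
line, and its body, with no trailing CRLF of its own."
  (with-temp-buffer
    (dolist (part parts)
      (insert "--" boundary "\r\n" part)
      ;; Unconditional: this CRLF belongs to the boundary that follows,
      ;; whether that is an internal boundary or the final one.
      (insert "\r\n"))
    (insert "--" boundary "--\r\n")
    (buffer-string)))

(my/wrap-parts '("H\r\n\r\nd\n") "BOUNDARY")
;; => "--BOUNDARY\r\nH\r\n\r\nd\n\r\n--BOUNDARY--\r\n"
```

With no parts at all, the model returns just "--BOUNDARY--\r\n", which
is what the attached tests expect for an empty form.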

> when `filedata` is an empty string, this adds an additional \r\n
> compared to the current code.  This seems right to me

Me too, and all the other clients I tested.

> I expect the decoding software will skip the \r\n at the end of the
> header and then look for \r\n<BOUNDARY>, so it's important to have
> two \r\n

What you said is true.  They also accept

"HEADER\r\nfile content\n--BOUNDARY"

as the content "file content", and consider the last "\n" as attached
to the boundary.  That's where the file's final "\n" gets lost if the
file's content was initially "file content\n".
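
Here is a rough model of that behavior, as a sketch only.
=my/first-part-body= is a made-up name and is not taken from any real
decoder; the LENIENT flag stands in for software that accepts a bare
"\n" before the boundary.

```elisp
;; Hypothetical decoder model for illustration only.
(defun my/first-part-body (message boundary &optional lenient)
  "Return the body of the first part of MESSAGE, or nil if none is found.
The body starts after the first blank line and ends just before the
delimiter, which is CRLF (or, if LENIENT, also a bare LF) followed by
\"--\" and BOUNDARY."
  (let* ((case-fold-search nil)
         (header-end (string-match "\r\n\r\n" message))
         (body-start (and header-end (+ header-end 4)))
         (delimiter (concat (if lenient "\r?\n" "\r\n")
                            "--" (regexp-quote boundary)))
         (body-end (and body-start
                        (string-match delimiter message body-start))))
    (and body-end (substring message body-start body-end))))

;; Old encoder output for a file whose content is "file content\n":
(my/first-part-body
 "--BOUNDARY\r\nHEADER\r\n\r\nfile content\n--BOUNDARY--\r\n"
 "BOUNDARY" t)
;; => "file content"       (the final "\n" is lost)

;; Patched encoder output for the same file:
(my/first-part-body
 "--BOUNDARY\r\nHEADER\r\n\r\nfile content\n\r\n--BOUNDARY--\r\n"
 "BOUNDARY" t)
;; => "file content\n"
```

Without LENIENT, the model finds no delimiter at all in the old output
and returns nil, while the patched output decodes the same either way.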

> There remain some questions on this patch:

While fixing this bug, I found a few more problems in addition to the
two that you mention here.  I have not addressed them yet, since I
thought I should fix one bug per patch.

> I suspect we can also simplify this code by moving the first (insert
> "--" boundary "\r\n") before the loop, and the second into the loop
> so we can make it insert "\r\n--" boundary "\r\n" (and thus remove
> \r\n from the end of each of the preceding branches).

Almost, but not quite, or at least not without some awkward (to my
eye) repositioning of the code that inserts boundaries, "--", and
"\r\n".  The final boundary complicates things.  It is different from
all the others: it is "--BOUNDARY--" instead of "--BOUNDARY".

Here's what I ended up with when I tried that:

https://git.sr.ht/~ozzloy/emacs-bug-63941/tree/simplify-insert-boundaries-and---/item/mm-url.el#L397

This passes the tests in =emacs/test/lisp/gnus/mm-url-tests.el=.


[-- Attachment #2: 0001-allow-uploading-files-ending-in-newline-via-EWW.patch --]
[-- Type: text/x-patch, Size: 7181 bytes --]

From b3e2f07367c6e9836b3a7635b86335bf7104b2b9 Mon Sep 17 00:00:00 2001
From: Daniel Watson <ozzloy@gmail.com>
Date: Fri, 21 Jul 2023 00:03:06 -0700
Subject: [PATCH] ; allow uploading files ending in newline via EWW

; Ensure that every boundary in HTTP message is preceded by "\r\n".
; According to RFC 2046, section 5, the "\r\n" preceding the boundary
; is not considered part of the preceding content, and is instead
; attached to the boundary that follows it.
;
; Consider a file named "1nl", consisting only of the single character
; '\n'.
;
; The old version of =mm-url-encode-multipart-form-data= creates the
; following HTTP message:
;
;   (concat
;    "--BOUNDARY\r\n"
;    "Content-Disposition: form-data; name=\"a\"; filename=\"1nl\"\r\n"
;    "Content-Transfer-Encoding: binary\r\n"
;    "Content-Type: c\r\n"
;    "\r\n"
;
;    ;; file content
;    "\n"
;
;    ;; NOTE "\r\n" is absent before the following boundary
;    "--BOUNDARY--\r\n")
;
; The new version of =mm-url-encode-multipart-form-data= creates this
; HTTP message:
;
;   (concat
;    "--BOUNDARY\r\n"
;    "Content-Disposition: form-data; name=\"a\"; filename=\"1nl\"\r\n"
;    "Content-Transfer-Encoding: binary\r\n"
;    "Content-Type: c\r\n"
;    "\r\n"
;
;    ;; file content
;    "\n"
;
;    ;; NOTE "\r\n" precedes the boundary
;    "\r\n"
;    "--BOUNDARY--\r\n")
;
; The new code ensures that every boundary after the very first one
; is preceded by "\r\n", whether it is an internal boundary or the
; final one.
---
 lisp/gnus/mm-url.el            |   5 +-
 test/lisp/gnus/mm-url-tests.el | 160 +++++++++++++++++++++++++++++++++
 2 files changed, 162 insertions(+), 3 deletions(-)
 create mode 100644 test/lisp/gnus/mm-url-tests.el

diff --git a/lisp/gnus/mm-url.el b/lisp/gnus/mm-url.el
index 11847a79f17..5b68b25ec2e 100644
--- a/lisp/gnus/mm-url.el
+++ b/lisp/gnus/mm-url.el
@@ -433,13 +433,12 @@ mm-url-encode-multipart-form-data
 	      (insert (number-to-string filedata))))))
 	 ((equal name "submit")
 	  (insert
-	   "Content-Disposition: form-data; name=\"submit\"\r\n\r\nSubmit\r\n"))
+	   "Content-Disposition: form-data; name=\"submit\"\r\n\r\nSubmit"))
 	 (t
 	  (insert (format "Content-Disposition: form-data; name=%S\r\n\r\n"
 			  name))
 	  (insert value)))
-	(unless (bolp)
-	  (insert "\r\n"))))
+	(insert "\r\n")))
     (insert "--" boundary "--\r\n")
     (buffer-string)))
 
diff --git a/test/lisp/gnus/mm-url-tests.el b/test/lisp/gnus/mm-url-tests.el
new file mode 100644
index 00000000000..7b8d45b6061
--- /dev/null
+++ b/test/lisp/gnus/mm-url-tests.el
@@ -0,0 +1,160 @@
+;;; mm-url-tests.el ---  -*- lexical-binding:t -*-
+
+;; Copyright (C) 2021-2023 Free Software Foundation, Inc.
+
+;; This file is part of GNU Emacs.
+
+;; GNU Emacs is free software: you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation, either version 3 of the License, or
+;; (at your option) any later version.
+
+;; GNU Emacs is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.
+
+;;; Commentary:
+
+;;; Code:
+
+(require 'ert)
+(require 'mm-url)
+
+
+(ert-deftest test-mm-url-encode-multipart-form-data:nil ()
+  (should
+   (string=
+    (mm-url-encode-multipart-form-data '() "BOUNDARY")
+    "--BOUNDARY--\r\n")))
+
+(ert-deftest test-mm-url-encode-multipart-form-data:name-value ()
+  (should
+   (string=
+    (mm-url-encode-multipart-form-data
+     '(("key" . "value")) "BOUNDARY")
+    (concat "--BOUNDARY\r\n"
+	    "Content-Disposition: form-data; name=\"key\"\r\n"
+	    "\r\n"
+	    "value\r\n"
+	    "--BOUNDARY--\r\n"))))
+
+(ert-deftest test-mm-url-encode-multipart-form-data:submit ()
+  (should
+   (string=
+    (mm-url-encode-multipart-form-data '(("submit")) "BOUNDARY")
+    (concat "--BOUNDARY\r\n"
+	    "Content-Disposition: form-data; name=\"submit\"\r\n"
+	    "\r\n"
+	    "Submit\r\n"
+	    "--BOUNDARY--\r\n"))))
+
+(ert-deftest test-mm-url-encode-multipart-form-data:file ()
+  (should
+   (string=
+    (mm-url-encode-multipart-form-data
+     '(("file" . (("name"         . "a")
+		  ("filename"     . "b")
+		  ("content-type" . "c")
+		  ("filedata"     . "d\n"))))
+     "BOUNDARY")
+
+    (concat
+     "--BOUNDARY\r\n"
+     "Content-Disposition: form-data; name=\"a\"; filename=\"b\"\r\n"
+     "Content-Transfer-Encoding: binary\r\n"
+     "Content-Type: c\r\n"
+     "\r\n"
+
+     ;; file content
+     "d\n"
+
+     ;; rfc 2046 section 5
+     ;; https://www.rfc-editor.org/rfc/rfc2046#section-5
+     ;; "The boundary delimiter MUST occur at the beginning of a
+     ;; line, i.e., following a CRLF, and the initial CRLF is
+     ;; considered to be attached to the boundary delimiter line
+     ;; rather than part of the preceding part."
+     "\r\n"
+
+     "--BOUNDARY--\r\n"))))
+
+(ert-deftest test-mm-url-encode-multipart-form-data--all-parts ()
+  (should
+   (string=
+    (mm-url-encode-multipart-form-data
+     '(("name" . "value")
+       ("submit")
+       ("file" . (("name"         . "a")
+		  ("filename"     . "b")
+		  ("content-type" . "c")
+		  ("filedata"     . "d"))))
+     "BOUNDARY")
+    (concat
+     "--BOUNDARY\r\n"
+     "Content-Disposition: form-data; name=\"name\"\r\n"
+     "\r\n"
+     "value\r\n"
+     "--BOUNDARY\r\n"
+     "Content-Disposition: form-data; name=\"submit\"\r\n"
+     "\r\n"
+     "Submit\r\n"
+     "--BOUNDARY\r\n"
+     "Content-Disposition: form-data; name=\"a\"; filename=\"b\"\r\n"
+     "Content-Transfer-Encoding: binary\r\n"
+     "Content-Type: c\r\n"
+     "\r\n"
+
+     ;; file content
+     "d"
+
+     ;; rfc 2046 section 5
+     ;; the \r\n is attached to the boundary below it
+     "\r\n"
+     "--BOUNDARY--\r\n"))))
+
+(ert-deftest test-mm-url-encode-multipart-form-data-two-files ()
+  (should
+   (string=
+    (mm-url-encode-multipart-form-data
+     '(("file" . (("name"         . "a")
+		  ("filename"     . "b")
+		  ("content-type" . "c")
+		  ("filedata"     . "d\n")))
+       ("file" . (("name"         . "e")
+		  ("filename"     . "f")
+		  ("content-type" . "g")
+		  ("filedata"     . "h\n"))))
+     "BOUNDARY")
+    (concat
+     "--BOUNDARY\r\n"
+     "Content-Disposition: form-data; name=\"a\"; filename=\"b\"\r\n"
+     "Content-Transfer-Encoding: binary\r\n"
+     "Content-Type: c\r\n"
+     "\r\n"
+
+     ;; file content
+     "d\n"
+
+     ;; rfc2046 section 5
+     ;; the \r\n is attached to the boundary below it
+     "\r\n"
+     "--BOUNDARY\r\n"
+     "Content-Disposition: form-data; name=\"e\"; filename=\"f\"\r\n"
+     "Content-Transfer-Encoding: binary\r\n"
+     "Content-Type: g\r\n"
+     "\r\n"
+
+     ;; file content
+     "h\n"
+
+     ;; rfc 2046 section 5
+     ;; the \r\n is attached to the boundary below it
+     "\r\n"
+     "--BOUNDARY--\r\n"))))
+
+
+;;; mm-url-tests.el ends here
-- 
2.39.2


