* bug#63941: [PATCH] ; always CRLF before non-first boundary in multipart form @ 2023-06-07 5:25 ozzloy 2023-06-07 12:30 ` Eli Zaretskii 0 siblings, 1 reply; 18+ messages in thread From: ozzloy @ 2023-06-07 5:25 UTC (permalink / raw) To: 63941 [-- Attachment #1.1: Type: text/plain, Size: 3530 bytes --] When I POST a file ending with a newline using EWW, the final newline gets chomped off of the content. This is because mm-url-encode-multipart-form-data inserts CRLF unless it is at the beginning of a line. For file uploads, this behavior is incorrect. To reproduce, 0. Upload a file ending in "\n" using EWW. 1. Observe the POST does not have CRLF between the file content and the boundary. Refer to https://www.rfc-editor.org/rfc/rfc2046#section-5.1.1 for details. I have not tested other kinds of html form posts. I am not familiar with what should be tested. I did include tests for this specific bug though. In GNU Emacs 29.0.91 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.33, cairo version 1.16.0) of 2023-06-06 built on trent-reznor Repository revision: b5eb43ba289519704c6cb0fe456038dcaec172c3 Repository branch: CRLF-before-noninitial-boundary Windowing system distributor 'The X.Org Foundation', version 11.0.12101004 System Description: Ubuntu 22.04.2 LTS Configured features: CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GSETTINGS HARFBUZZ JPEG LIBSELINUX LIBXML2 MODULES NOTIFY INOTIFY PDUMPER PNG SECCOMP SOUND THREADS TIFF TOOLKIT_SCROLL_BARS X11 XDBE XIM XINPUT2 XPM GTK3 ZLIB Important settings: value of $LANG: en_US.UTF-8 value of $XMODIFIERS: @im=ibus locale-coding-system: utf-8-unix Major mode: Fundamental Minor modes in effect: tooltip-mode: t global-eldoc-mode: t show-paren-mode: t electric-indent-mode: t mouse-wheel-mode: t tool-bar-mode: t menu-bar-mode: t file-name-shadow-mode: t global-font-lock-mode: t blink-cursor-mode: t buffer-read-only: t line-number-mode: t indent-tabs-mode: t transient-mark-mode: t auto-composition-mode: t auto-encryption-mode: t auto-compression-mode: t Load-path shadows: None found. Features: (shadow sort mail-extr emacsbug message mailcap yank-media puny dired dired-loaddefs rfc822 mml mml-sec password-cache epa derived epg rfc6068 epg-config gnus-util text-property-search time-date subr-x mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader cl-loaddefs cl-lib sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils rmc iso-transl tooltip cconv eldoc paren electric uniquify ediff-hook vc-hooks lisp-float-type elisp-mode mwheel term/x-win x-win term/common-win x-dnd tool-bar dnd fontset image regexp-opt fringe tabulated-list replace newcomment text-mode lisp-mode prog-mode register page tab-bar menu-bar rfn-eshadow isearch easymenu timer select scroll-bar mouse jit-lock font-lock syntax font-core term/tty-colors frame minibuffer nadvice seq simple cl-generic indonesian philippine cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european ethiopic indian cyrillic chinese composite emoji-zwj charscript charprop case-table epa-hook jka-cmpr-hook help abbrev obarray oclosure cl-preloaded button loaddefs theme-loaddefs faces cus-face macroexp files window text-properties overlay sha1 md5 base64 format env code-pages mule custom widget keymap hashtable-print-readable backquote threads dbusbind inotify dynamic-setting system-font-setting font-render-setting cairo move-toolbar gtk x-toolkit xinput2 x multi-tty make-network-process emacs) Memory information: ((conses 16 36910 11079) (symbols 48 5128 0) (strings 32 13160 1100) (string-bytes 1 372301) (vectors 16 9331) (vector-slots 8 149255 18597) (floats 8 33 21) (intervals 56 326 0) (buffers 976 11)) [-- Attachment #1.2: Type: text/html, Size: 3963 bytes --] [-- Attachment #2: 0001-always-CRLF-before-non-first-boundary-in-multipart-f.patch --] [-- Type: text/x-patch, Size: 3269 bytes --] From 46da4c9d9367aaf4bd3ce2faf118845f3930dabf Mon Sep 17 00:00:00 2001 From: Daniel Watson <ozzloy@gmail.com> Date: Sat, 3 Jun 2023 21:15:25 -0700 Subject: [PATCH] ; always CRLF before non-first boundary in multipart form data ; Insert CRLF after file contents and before boundary, ; in accordance with the syntax description here ; https://www.rfc-editor.org/rfc/rfc2046#section-5.1.1 ; The CRLF is attached to the boundary, and not the preceding part. --- lisp/gnus/mm-url.el | 3 +- test/lisp/gnus/mm-url-tests.el | 53 ++++++++++++++++++++++++++++++++++ 2 files changed, 54 insertions(+), 2 deletions(-) create mode 100644 test/lisp/gnus/mm-url-tests.el diff --git a/lisp/gnus/mm-url.el b/lisp/gnus/mm-url.el index 11847a79f17..022762a7799 100644 --- a/lisp/gnus/mm-url.el +++ b/lisp/gnus/mm-url.el @@ -438,8 +438,7 @@ mm-url-encode-multipart-form-data (insert (format "Content-Disposition: form-data; name=%S\r\n\r\n" name)) (insert value))) - (unless (bolp) - (insert "\r\n")))) + (insert "\r\n"))) (insert "--" boundary "--\r\n") (buffer-string))) diff --git a/test/lisp/gnus/mm-url-tests.el b/test/lisp/gnus/mm-url-tests.el new file mode 100644 index 00000000000..ed51cb7d086 --- /dev/null +++ b/test/lisp/gnus/mm-url-tests.el @@ -0,0 +1,53 @@ +;;; mm-url-tests.el --- -*- lexical-binding:t -*- + +;; Copyright (C) 2021-2023 Free Software Foundation, Inc. + +;; This file is part of GNU Emacs. + +;; GNU Emacs is free software: you can redistribute it and/or modify +;; it under the terms of the GNU General Public License as published by +;; the Free Software Foundation, either version 3 of the License, or +;; (at your option) any later version. + +;; GNU Emacs is distributed in the hope that it will be useful, +;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;; GNU General Public License for more details. + +;; You should have received a copy of the GNU General Public License +;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. + +;;; Commentary: + +;;; Code: + +(require 'ert) +(require 'mm-url) + +(ert-deftest test-mm-url-encode-multipart-form-data () + (letrec + ((boundary "====-=-=") + (make-data (lambda (count) + `(("file" + ("filedata" . ,(make-string count ?\n)) + ("name" . "file") + ("filename" . "g"))))) + (template + (concat + "--" boundary "\r\n" + "Content-Disposition:" + " form-data; name=\"file\"; filename=\"g\"\r\n" + "Content-Transfer-Encoding: binary\r\n" + "Content-Type: text/plain\r\n" + "\r\n" + "%s" ;; here's the file content + "\r\n" ;; \r\n attaches to boundary below, not file content + ;; ref: https://www.rfc-editor.org/rfc/rfc2046#section-5.1.1 + "--" boundary "--" "\r\n"))) + (dotimes (count 3) + (let ((data (funcall make-data count)) + (expected (format template (make-string count ?\n)))) + (should (equal (mm-url-encode-multipart-form-data data boundary) + expected)))))) + +;;; mm-url-tests.el ends here -- 2.34.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* bug#63941: [PATCH] ; always CRLF before non-first boundary in multipart form 2023-06-07 5:25 bug#63941: [PATCH] ; always CRLF before non-first boundary in multipart form ozzloy @ 2023-06-07 12:30 ` Eli Zaretskii 2023-06-08 2:48 ` ozzloy 0 siblings, 1 reply; 18+ messages in thread From: Eli Zaretskii @ 2023-06-07 12:30 UTC (permalink / raw) To: ozzloy; +Cc: 63941 > From: ozzloy <ozzloy@gmail.com> > Date: Tue, 6 Jun 2023 22:25:40 -0700 > > When I POST a file ending with a newline using EWW, the final newline > gets chomped off of the content. This is because > mm-url-encode-multipart-form-data inserts CRLF unless it is at the > beginning of a line. For file uploads, this behavior is incorrect. > > To reproduce, > > 0. Upload a file ending in "\n" using EWW. > > 1. Observe the POST does not have CRLF between the file content and the > boundary. Thanks, but could you please describe in detail what should be done in step 0? how to "post a file using EWW"? We need the details to be sure we reproduce the same problem as you do, and also to test the solution and any alternatives. ^ permalink raw reply [flat|nested] 18+ messages in thread
* bug#63941: [PATCH] ; always CRLF before non-first boundary in multipart form 2023-06-07 12:30 ` Eli Zaretskii @ 2023-06-08 2:48 ` ozzloy 2023-06-08 6:09 ` Eli Zaretskii 2023-07-18 19:04 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 0 siblings, 2 replies; 18+ messages in thread From: ozzloy @ 2023-06-08 2:48 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 63941 [-- Attachment #1: Type: text/plain, Size: 2147 bytes --] > please describe in detail what should be done Sure! I'm not sure what would be too much, and what would be too little explanation. The short version is to run the following code, #+begin_src elisp (require 'mm-url) (let ((data '(("file" ("filedata" . "file content\n") ("name" . "file") ("filename" . "filename")))) (boundary "BOUNDARY")) (mm-url-encode-multipart-form-data data boundary)) #+end_src #+RESULTS: : --BOUNDARY^M : Content-Disposition: form-data; name="file"; filename="filename"^M : Content-Transfer-Encoding: binary^M : Content-Type: text/plain^M : ^M : file content : --BOUNDARY--^M (I've replaced carriage returns with literal '^' and 'M' to avoid email mangling) and observe the absence of CRLF between the file content and the boundary. From https://www.rfc-editor.org/rfc/rfc2046#section-5.1.1 this description, it seems like there should be. Here's a longer set of steps. 0. run http server #+begin_src bash git clone https://git.sr.ht/~ozzloy/emacs-bug-63941 cd emacs-bug-63941 git checkout reproduce-bug-63941 ./server.py #+end_src 1. Then use EWW to browse to localhost:8085 and upload the file =filename=. Here's my result when doing that, first with EWW. #+begin_quote upload_content = b'file content', name = 'filename', size = 12 127.0.0.1 - - [07/Jun/2023 18:55:03] "POST / HTTP/1.1" 200 - #+end_quote The bug is that the file content no longer has the final "\n". 2. To confirm, in a separate shell, I posted using curl. #+begin_src bash curl -F "file=@filename;filename=filename" localhost:8085 #+end_src Here's the output I got #+begin_quote upload_content = b'file content\n', name = 'filename', size = 13 127.0.0.1 - - [07/Jun/2023 18:57:12] "POST / HTTP/1.1" 200 - #+end_quote The file's content ends with "\n", which is the expected behavior. This is the second HTTP file upload server I have used and seen this behavior. The same thing happened for one I wrote in clojure using Aleph. That one is here https://git.sr.ht/~ozzloy/fupload but I made the python one because it's probably less set up for other people to reproduce it. [-- Attachment #2: Type: text/html, Size: 2844 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* bug#63941: [PATCH] ; always CRLF before non-first boundary in multipart form 2023-06-08 2:48 ` ozzloy @ 2023-06-08 6:09 ` Eli Zaretskii 2023-06-08 6:43 ` ozzloy 2023-07-18 19:04 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 1 sibling, 1 reply; 18+ messages in thread From: Eli Zaretskii @ 2023-06-08 6:09 UTC (permalink / raw) To: ozzloy; +Cc: 63941 > From: ozzloy <ozzloy@gmail.com> > Date: Wed, 7 Jun 2023 19:48:29 -0700 > Cc: 63941@debbugs.gnu.org > > 1. Then use EWW to browse to localhost:8085 and upload the file =filename=. This part is exactly what I'm asking about: how do I "upload the file" using EWW? Can you please show what I should type to do that? > Here's my result when doing that, first with EWW. > #+begin_quote > upload_content = b'file content', name = 'filename', size = 12 > 127.0.0.1 - - [07/Jun/2023 18:55:03] "POST / HTTP/1.1" 200 - > #+end_quote Where does one see this result? Is it in some Emacs buffer? In that case, what is the name of that buffer? Thanks. ^ permalink raw reply [flat|nested] 18+ messages in thread
* bug#63941: [PATCH] ; always CRLF before non-first boundary in multipart form 2023-06-08 6:09 ` Eli Zaretskii @ 2023-06-08 6:43 ` ozzloy 2023-06-08 6:52 ` ozzloy 0 siblings, 1 reply; 18+ messages in thread From: ozzloy @ 2023-06-08 6:43 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 63941 [-- Attachment #1: Type: text/plain, Size: 1828 bytes --] Thanks for clarifying! > This part is exactly what I'm asking about: how do I "upload the file" > using EWW? Can you please show what I should type to do that? 0. In a terminal, do the following #+begin_src bash git clone https://git.sr.ht/~ozzloy/emacs-bug-63941 cd emacs-bug-63941 git checkout reproduce-bug-63941 ./server.py #+end_src 1. In an emacs window, do the following M-x eww<Enter> localhost:8085<Enter> Once EWW opens localhost:8085, it will display the web page with a button labeled "Browse". Click that button. 2. It will then ask you to choose a file. Choose the file =.../emacs-bug-63941/filename= in the same directory where =server.py= is. 3. Hit <tab> to go to the "Submit" button, and hit <Enter> to upload that file. > Where does one see this result? Is it in some Emacs buffer? In that > case, what is the name of that buffer? The output will be in the terminal where =./server.py= was run. I ran it in an emacs shell buffer named =$emacs-bug-63941=. On Wed, Jun 7, 2023 at 11:09 PM Eli Zaretskii <eliz@gnu.org> wrote: > > From: ozzloy <ozzloy@gmail.com> > > Date: Wed, 7 Jun 2023 19:48:29 -0700 > > Cc: 63941@debbugs.gnu.org > > > > 1. Then use EWW to browse to localhost:8085 and upload the file > =filename=. > > This part is exactly what I'm asking about: how do I "upload the file" > using EWW? Can you please show what I should type to do that? > > > Here's my result when doing that, first with EWW. > > #+begin_quote > > upload_content = b'file content', name = 'filename', size = 12 > > 127.0.0.1 - - [07/Jun/2023 18:55:03] "POST / HTTP/1.1" 200 - > > #+end_quote > > Where does one see this result? Is it in some Emacs buffer? In that > case, what is the name of that buffer? > > Thanks. > [-- Attachment #2: Type: text/html, Size: 2563 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* bug#63941: [PATCH] ; always CRLF before non-first boundary in multipart form 2023-06-08 6:43 ` ozzloy @ 2023-06-08 6:52 ` ozzloy 2023-06-10 9:42 ` Eli Zaretskii 0 siblings, 1 reply; 18+ messages in thread From: ozzloy @ 2023-06-08 6:52 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 63941 [-- Attachment #1: Type: text/plain, Size: 2300 bytes --] Quick fix. In the last email, I said > Once EWW opens localhost:8085, it will display the web page with a > button labeled "Browse". Click that button. My bad. Clicking does not work for me. What I actually do is put the cursor on the button and hit <Enter>. Then it will bring up the file On Wed, Jun 7, 2023 at 11:43 PM ozzloy <ozzloy@gmail.com> wrote: > > Thanks for clarifying! > > > This part is exactly what I'm asking about: how do I "upload the file" > > using EWW? Can you please show what I should type to do that? > > 0. In a terminal, do the following > #+begin_src bash > git clone https://git.sr.ht/~ozzloy/emacs-bug-63941 > cd emacs-bug-63941 > git checkout reproduce-bug-63941 > ./server.py > #+end_src > > 1. In an emacs window, do the following > > M-x eww<Enter> > localhost:8085<Enter> > > Once EWW opens localhost:8085, it will display the web page with a > button > labeled "Browse". Click that button. > > 2. It will then ask you to choose a file. Choose the file > =.../emacs-bug-63941/filename= in the same directory where =server.py= > is. > > 3. Hit <tab> to go to the "Submit" button, and hit <Enter> to upload that > file. > > > > Where does one see this result? Is it in some Emacs buffer? In that > > case, what is the name of that buffer? > > The output will be in the terminal where =./server.py= was run. I ran it > in an > emacs shell buffer named =$emacs-bug-63941=. > > > On Wed, Jun 7, 2023 at 11:09 PM Eli Zaretskii <eliz@gnu.org> wrote: > >> > From: ozzloy <ozzloy@gmail.com> >> > Date: Wed, 7 Jun 2023 19:48:29 -0700 >> > Cc: 63941@debbugs.gnu.org >> > >> > 1. Then use EWW to browse to localhost:8085 and upload the file >> =filename=. >> >> This part is exactly what I'm asking about: how do I "upload the file" >> using EWW? Can you please show what I should type to do that? >> >> > Here's my result when doing that, first with EWW. >> > #+begin_quote >> > upload_content = b'file content', name = 'filename', size = 12 >> > 127.0.0.1 - - [07/Jun/2023 18:55:03] "POST / HTTP/1.1" 200 - >> > #+end_quote >> >> Where does one see this result? Is it in some Emacs buffer? In that >> case, what is the name of that buffer? >> >> Thanks. >> > [-- Attachment #2: Type: text/html, Size: 3288 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* bug#63941: [PATCH] ; always CRLF before non-first boundary in multipart form 2023-06-08 6:52 ` ozzloy @ 2023-06-10 9:42 ` Eli Zaretskii 2023-06-11 1:38 ` ozzloy 0 siblings, 1 reply; 18+ messages in thread From: Eli Zaretskii @ 2023-06-10 9:42 UTC (permalink / raw) To: ozzloy; +Cc: 63941 > From: ozzloy <ozzloy@gmail.com> > Date: Wed, 7 Jun 2023 23:52:31 -0700 > Cc: 63941@debbugs.gnu.org > > Quick fix. In the last email, I said > > Once EWW opens localhost:8085, it will display the web page with a > > button labeled "Browse". Click that button. > > My bad. Clicking does not work for me. What I actually do is put the > cursor on the button and hit <Enter>. Then it will bring up the file Thanks. Making the change in mm-url.el sounds scary: that code was written many years ago, and who knows where it is used? The reason for testing bolp there is not explained, but I'm guessing they didn't want to craete an empty line? It might be better to make a local change in eww.el: just append "\r\n" to the request data. I'll leave it to HTTP experts to decide what to do about this. ^ permalink raw reply [flat|nested] 18+ messages in thread
* bug#63941: [PATCH] ; always CRLF before non-first boundary in multipart form 2023-06-10 9:42 ` Eli Zaretskii @ 2023-06-11 1:38 ` ozzloy 2023-06-18 23:23 ` ozzloy 0 siblings, 1 reply; 18+ messages in thread From: ozzloy @ 2023-06-11 1:38 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 63941 [-- Attachment #1: Type: text/plain, Size: 2707 bytes --] > Making the change in mm-url.el sounds scary: that code was written > many years ago, and who knows where it is used? Yeah, that makes sense. I'll see what I can find about that, at least within emacs itself. > The reason for testing bolp there is not explained, but I'm guessing > they didn't want to craete an empty line? Not sure. It looks like it was introduced during a patch intended to fix submitting binary data. Would it make sense to CC the two authors of the following commits? I dug into the history a bit to try to find out where =(unless (bolp)= came from #+begin_src bash git log -L '/(unless (bolp)/,+1:lisp/gnus/mm-url.el' > unless-bolp #+end_src It looks like it was introduced in this commit #+begin_quote commit fca2f70380dcb054497470aaf8eda6173063928e Author: Kenjiro Nakayama <nakayamakenjiro@gmail.com> Date: Mon Nov 10 22:33:55 2014 +0100 Allow uploading files from eww #+end_quote and the "\r\n" was included unconditionally before the boundary #+begin_src elisp ;; use the boundary as a separator (concat "\r\n--" boundary "\r\n") #+end_src Then it this commit changed it #+begin_quote commit a6e0188dffc394698d9ffbef50401f14a31c8722 Author: Lars Ingebrigtsen <larsi@gnus.org> Date: Thu Oct 13 21:39:29 2016 +0200 Fix problem with submitting binary data via HTTP forms #+end_quote to the following #+begin_src elisp (unless (bolp) (insert "\r\n")) #+end_src plus a bunch of "\r\n"s scattered throughout the different conditions. It's not clear to me how the behavior of the function changed. It looks like maybe instead of the initial diff I proposed, something like this would produce more accurate output for file uploads, and also change the surrounding code as little as possible, #+begin_quote diff --git a/lisp/gnus/mm-url.el b/lisp/gnus/mm-url.el index 11847a79f17..686dea20b6a 100644 --- a/lisp/gnus/mm-url.el +++ b/lisp/gnus/mm-url.el @@ -430,7 +430,8 @@ mm-url-encode-multipart-form-data (insert filedata)) ;; How can this possibly be useful? ((integerp filedata) - (insert (number-to-string filedata)))))) + (insert (number-to-string filedata))))) + (insert "\r\n")) ((equal name "submit") (insert "Content-Disposition: form-data; name=\"submit\"\r\n\r\nSubmit\r\n")) #+end_quote But it seems like it would be better to do a larger rewrite that consolidates the places where "\r\n" is added before the non-initial boundary. In order to figure that out, I will read the two versions more closely to figure out what the differences are and write some ERT tests showing the difference. Hopefully that will clear things up. [-- Attachment #2: Type: text/html, Size: 3378 bytes --] ^ permalink raw reply related [flat|nested] 18+ messages in thread
* bug#63941: [PATCH] ; always CRLF before non-first boundary in multipart form 2023-06-11 1:38 ` ozzloy @ 2023-06-18 23:23 ` ozzloy 2023-06-19 16:13 ` Eli Zaretskii 0 siblings, 1 reply; 18+ messages in thread From: ozzloy @ 2023-06-18 23:23 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 63941 [-- Attachment #1.1: Type: text/plain, Size: 911 bytes --] > who knows where it is used? As far as I can tell, it is only ever used by =(eww-submit ...)= from =lisp/net/eww.el=. That's the only place it's used in all of emacs, and every reference I can find on the web. I've tested the heck out of it now, and also used firefox, chromium, and curl to generate http messages for comparison. Those tests of the different versions of =mm-url-encode-multipart-form-data=, as well as the http messages generated by firefox, chromium, and curl can be seen here https://git.sr.ht/~ozzloy/emacs-bug-63941/tree/53a7949a5db21c456c1da3b4add29343c3d02137/item/mm-url-tests.el The patch I have attached to this email generates output that matches firefox, chromium, and curl. It also includes a bunch of tests for the included version of =mm-url-encode-multipart-form-data= If there's some change that would make the patch a better fit, let me know. I'm happy to modify it. [-- Attachment #1.2: Type: text/html, Size: 1158 bytes --] [-- Attachment #2: 0001-always-CRLF-before-non-first-boundary-in-mulitpart-f.patch --] [-- Type: text/x-patch, Size: 12293 bytes --] From c73ccda90623519434f8ec2c700adf70ac1d6a00 Mon Sep 17 00:00:00 2001 From: Daniel Watson <ozzloy@gmail.com> Date: Sun, 18 Jun 2023 15:14:32 -0700 Subject: [PATCH] always CRLF before non-first boundary in mulitpart form ; Insert CRLF after file contents and before boundary, ; in accordance with the syntax description here ; rfc2046 section 5.1.1. ; The CRLF is attached to the boundary, and not the preceding part. --- lisp/gnus/mm-url.el | 156 +++++++++++++++++++------- test/lisp/gnus/mm-url-tests.el | 194 +++++++++++++++++++++++++++++++++ 2 files changed, 308 insertions(+), 42 deletions(-) create mode 100644 test/lisp/gnus/mm-url-tests.el diff --git a/lisp/gnus/mm-url.el b/lisp/gnus/mm-url.el index 11847a79f17..f3eecbf18ed 100644 --- a/lisp/gnus/mm-url.el +++ b/lisp/gnus/mm-url.el @@ -394,54 +394,126 @@ mm-url-encode-www-form-urlencoded (autoload 'mml-compute-boundary "mml") +(defun mm-url--encode-multipart-form-file (file-metadata) + "Return a list of lines used to represent an http message with a +file and its contents, as described in FILE-METADATA + +* example +** input + '((\"name\" . \"a\") + (\"filename\" . \"b\") + (\"filedata\" . \"c\") + (\"content-type\" . \"d\")) +** output + '(\"Content-Disposition: form-data; name=\\\"a\\\"; filename=\\\"b\\\"\" + \"Content-Type: d; charset=utf-8\" + \"Content-Transfer-Encoding: binary\" + \"\" ;; completely blank line after all the headers + \"c\") ;; entire file content" + (cl-flet ((get-value (key default) + (alist-get + key file-metadata default nil #'string=))) + (let ((filedata (get-value "filedata" nil)) + (name (get-value "name" "file")) + (filename (get-value "filename" "file")) + (content-type (get-value "content-type" "text/plain"))) + (list + (format "Content-Disposition: form-data; name=%S; filename=%S" + name filename) + (format "Content-Type: %s; charset=utf-8" content-type) + "Content-Transfer-Encoding: binary" + "" + (cl-typecase filedata + (integer (number-to-string filedata)) + (string filedata)))))) + +(defun mm-url--encode-multipart-form-name-value (name value) + "Return NAME and VALUE as a list of lines used for creating +multipart/form-data http message. + +* example +** input + '(\"a\" . \"b\") +** output + '(\"Content-Disposition: form-data; name=\\\"a\\\"\" + \"\" + \"b\")" + (list (format "Content-Disposition: form-data; name=%S" name) + "" + value)) + +(defun mm-url--encode-multipart-form-submit () + "Return list of lines for submit message." + (list "Content-Disposition: form-data; name=\"submit\"" + "" + "Submit")) + +(defun mm-url--encode-multipart-form-datum (separator datum) + "Return list of lines in one segment of a multipart form. + +* example +** input + \"--BOUNDARY\" '(\"submit\") +** output + '(\"--BOUNDARY\" + \"Content-Disposition: form-data; name=\\\"submit\\\"\" + \"\" + \"Submit\")" + (let ((name (car datum)) + (value (cdr datum))) + (cons separator + (pcase name + ("file" (mm-url--encode-multipart-form-file value)) + ("submit" (mm-url--encode-multipart-form-submit)) + (_ (mm-url--encode-multipart-form-name-value + name value)))))) + +(defun mm-url--encode-multipart-form-lines (boundary data) + "return a list of lines for entire multipart form + +* examples +** input 0 + \"BOUNDARY\" '((\"submit\")) +** output 0 + '(\"--BOUNDARY\" + \"Content-Disposition: form-data; name=\\\"submit\\\"\" + \"\" + \"Submit\" + \"--BOUNDARY--\" + \"\") +** input 1 + \"BOUNDARY\" '((\"submit\") (\"a\" . \"b\")) +** output 1 + '(\"--BOUNDARY\" + \"Content-Disposition: form-data; name=\\\"submit\\\"\" + \"\" + \"Submit\" + \"--BOUNDARY\" + \"Content-Disposition: form-data; name=\\\"a\\\"\" + \"\" + \"b\" + \"--BOUNDARY--\" + \"\")" + (let ((separator (concat "--" boundary)) + (ending (concat "--" boundary "--"))) + (append (mapcan (apply-partially 'mm-url--encode-multipart-form-datum + separator) + data) + (list ending "")))) + (defun mm-url-encode-multipart-form-data (data &optional boundary) "Return DATA encoded in multipart/form-data. DATA is a list where the elements can have the following form: (\"NAME\" . \"VALUE\") (\"submit\") - (\"file\" . ((\"name\" . \"NAME\") - (\"filename\" . \"FILENAME\") - (\"content-type\" . \"CONTENT-TYPE\") - (\"filedata\" . \"FILEDATA\"))) + (\"file\" . ((\"name\" . \"NAME\") + (\"filename\" . \"FILENAME\") + (\"content-type\" . \"CONTENT-TYPE\") + (\"filedata\" . \"FILEDATA\"))) Lowercase strings above are literals and uppercase are not." - ;; RFC1867 - ;; Get a good boundary. - (unless boundary - (setq boundary (mml-compute-boundary '()))) - (with-temp-buffer - (set-buffer-multibyte nil) - (dolist (elem data) - (let ((name (car elem)) - (value (cdr elem))) - (insert "--" boundary "\r\n") - (cond - ((equal name "file") - (insert (format - "Content-Disposition: form-data; name=%S; filename=%S\r\n" - (or (cdr (assoc "name" value)) name) - (cdr (assoc "filename" value)))) - (insert "Content-Transfer-Encoding: binary\r\n") - (insert (format "Content-Type: %s\r\n\r\n" - (or (cdr (assoc "content-type" value)) - "text/plain"))) - (let ((filedata (cdr (assoc "filedata" value)))) - (cond - ((stringp filedata) - (insert filedata)) - ;; How can this possibly be useful? - ((integerp filedata) - (insert (number-to-string filedata)))))) - ((equal name "submit") - (insert - "Content-Disposition: form-data; name=\"submit\"\r\n\r\nSubmit\r\n")) - (t - (insert (format "Content-Disposition: form-data; name=%S\r\n\r\n" - name)) - (insert value))) - (unless (bolp) - (insert "\r\n")))) - (insert "--" boundary "--\r\n") - (buffer-string))) + (let* ((boundary (or boundary (mml-compute-boundary '()))) + (lines (mm-url--encode-multipart-form-lines boundary data))) + (mapconcat 'identity lines "\r\n"))) (defun mm-url-remove-markup () "Remove all HTML markup, leaving just plain text." diff --git a/test/lisp/gnus/mm-url-tests.el b/test/lisp/gnus/mm-url-tests.el new file mode 100644 index 00000000000..0c31a86146b --- /dev/null +++ b/test/lisp/gnus/mm-url-tests.el @@ -0,0 +1,194 @@ +;;; mm-url-tests.el --- -*- lexical-binding:t -*- + +;; Copyright (C) 2021-2023 Free Software Foundation, Inc. + +;; This file is part of GNU Emacs. + +;; GNU Emacs is free software: you can redistribute it and/or modify +;; it under the terms of the GNU General Public License as published by +;; the Free Software Foundation, either version 3 of the License, or +;; (at your option) any later version. + +;; GNU Emacs is distributed in the hope that it will be useful, +;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;; GNU General Public License for more details. + +;; You should have received a copy of the GNU General Public License +;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. + +;;; Commentary: + +;;; Code: + +(require 'ert) +(require 'mm-url) + + +(ert-deftest test-mm-url--encode-multipart-form-file () + (should + (equal + (mm-url--encode-multipart-form-file '(("name" . "a") + ("filename" . "b") + ("filedata" . "c\n") + ("content-type" . "d"))) + '("Content-Disposition: form-data; name=\"a\"; filename=\"b\"" + "Content-Type: d; charset=utf-8" + "Content-Transfer-Encoding: binary" + "" ;; completely blank line to separate headers from content + "c\n")))) + +(ert-deftest test-mm-url--encode-multipart-form-name-value () + (should (equal (mm-url--encode-multipart-form-name-value "a" "b") + '("Content-Disposition: form-data; name=\"a\"" + "" + "b")))) + +(ert-deftest test-mm-url--encode-multipart-form-submit () + (should + (equal (mm-url--encode-multipart-form-submit) + (list "Content-Disposition: form-data; name=\"submit\"" + "" + "Submit")))) + +(ert-deftest test-mm-url--encode-multipart-form-datum () + (should + (equal + (mm-url--encode-multipart-form-datum "--BOUNDARY" '("submit")) + '("--BOUNDARY" + "Content-Disposition: form-data; name=\"submit\"" + "" + "Submit")))) + +(ert-deftest test-mm-url-encode-multipart-form-data-nil () + (should + (string= + (mm-url-encode-multipart-form-data '() "BOUNDARY") + "--BOUNDARY--\r\n"))) + +(ert-deftest test-mm-url-encode-multipart-form-data--name-value () + (should + (string= + (mm-url-encode-multipart-form-data + '(("name" . "value")) "BOUNDARY") + (concat "--BOUNDARY\r\n" + "Content-Disposition: form-data; name=\"name\"\r\n" + "\r\n" + "value\r\n" + "--BOUNDARY--\r\n")))) + +(ert-deftest test-mm-url-encode-multipart-form-data--submit () + (should + (string= + (mm-url-encode-multipart-form-data '(("submit")) "BOUNDARY") + (concat "--BOUNDARY\r\n" + "Content-Disposition: form-data; name=\"submit\"\r\n" + "\r\n" + "Submit\r\n" + "--BOUNDARY--\r\n")))) + +(ert-deftest test-mm-url-encode-multipart-form-data--file () + (should + (string= + (mm-url-encode-multipart-form-data + '(("file" . (("name" . "a") + ("filename" . "b") + ("content-type" . "c") + ("filedata" . "d\n")))) + "BOUNDARY") + (concat + "--BOUNDARY\r\n" + "Content-Disposition: form-data; name=\"a\"; filename=\"b\"\r\n" + "Content-Type: c; charset=utf-8\r\n" + "Content-Transfer-Encoding: binary\r\n" + "\r\n" + + ;; file content + "d\n" + + ;; rfc 2046 section 5 + ;; https://www.rfc-editor.org/rfc/rfc2046#section-5 + ;; "The boundary delimiter MUST occur at the beginning of a + ;; line, i.e., following a CRLF, and the initial CRLF is + ;; considered to be attached to the boundary delimiter line + ;; rather than part of the preceding part." + "\r\n" + + "--BOUNDARY--\r\n")))) + +(ert-deftest test-mm-url-encode-multipart-form-data--all-parts () + (should + (string= + (mm-url-encode-multipart-form-data + '(("name" . "value") + ("submit") + ("file" . (("name" . "a") + ("filename" . "b") + ("content-type" . "c") + ("filedata" . "d")))) + "BOUNDARY") + (concat + "--BOUNDARY\r\n" + "Content-Disposition: form-data; name=\"name\"\r\n" + "\r\n" + "value\r\n" + "--BOUNDARY\r\n" + "Content-Disposition: form-data; name=\"submit\"\r\n" + "\r\n" + "Submit\r\n" + "--BOUNDARY\r\n" + "Content-Disposition: form-data; name=\"a\"; filename=\"b\"\r\n" + "Content-Type: c; charset=utf-8\r\n" + "Content-Transfer-Encoding: binary\r\n" + "\r\n" + + ;; file content + "d" + + ;; rfc 2046 section 5 + ;; the \r\n is attached to the boundary below it + "\r\n" + "--BOUNDARY--\r\n")))) + +(ert-deftest test-mm-url-encode-multipart-form-data-two-files () + (should + (string= + (mm-url-encode-multipart-form-data + '(("file" . (("name" . "a") + ("filename" . "b") + ("content-type" . "c") + ("filedata" . "d\n"))) + ("file" . (("name" . "e") + ("filename" . "f") + ("content-type" . "g") + ("filedata" . "h\n")))) + "BOUNDARY") + (concat + "--BOUNDARY\r\n" + "Content-Disposition: form-data; name=\"a\"; filename=\"b\"\r\n" + "Content-Type: c; charset=utf-8\r\n" + "Content-Transfer-Encoding: binary\r\n" + "\r\n" + + ;; file content + "d\n" + + ;; rfc2046 section 5 + ;; the \r\n is attached to the boundary below it + "\r\n" + "--BOUNDARY\r\n" + "Content-Disposition: form-data; name=\"e\"; filename=\"f\"\r\n" + "Content-Type: g; charset=utf-8\r\n" + "Content-Transfer-Encoding: binary\r\n" + "\r\n" + + ;; file content + "h\n" + + ;; rfc 2046 section 5 + ;; the \r\n is attached to the boundary below it + "\r\n" + "--BOUNDARY--\r\n")))) + + +;;; mm-url-tests.el ends here -- 2.34.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* bug#63941: [PATCH] ; always CRLF before non-first boundary in multipart form 2023-06-18 23:23 ` ozzloy @ 2023-06-19 16:13 ` Eli Zaretskii 2023-06-22 16:49 ` ozzloy 0 siblings, 1 reply; 18+ messages in thread From: Eli Zaretskii @ 2023-06-19 16:13 UTC (permalink / raw) To: ozzloy; +Cc: 63941 > From: ozzloy <ozzloy@gmail.com> > Date: Sun, 18 Jun 2023 16:23:23 -0700 > Cc: 63941@debbugs.gnu.org > > > who knows where it is used? > > As far as I can tell, it is only ever used by =(eww-submit ...)= from > =lisp/net/eww.el=. That's the only place it's used in all of emacs, > and every reference I can find on the web. > > I've tested the heck out of it now, and also used firefox, chromium, > and curl to generate http messages for comparison. > > Those tests of the different versions of > =mm-url-encode-multipart-form-data=, as well as the http messages > generated by firefox, chromium, and curl can be seen here > > https://git.sr.ht/~ozzloy/emacs-bug-63941/tree/53a7949a5db21c456c1da3b4add29343c3d02137/item/mm-url-tests.el > > > The patch I have attached to this email generates output that matches > firefox, chromium, and curl. It also includes a bunch of tests for > the included version of =mm-url-encode-multipart-form-data= > > If there's some change that would make the patch a better fit, let me > know. I'm happy to modify it. Thanks. I really need some Gnus/eww expert to look into this and review the patch. ^ permalink raw reply [flat|nested] 18+ messages in thread
* bug#63941: [PATCH] ; always CRLF before non-first boundary in multipart form 2023-06-19 16:13 ` Eli Zaretskii @ 2023-06-22 16:49 ` ozzloy 2023-06-22 18:25 ` ozzloy 0 siblings, 1 reply; 18+ messages in thread From: ozzloy @ 2023-06-22 16:49 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 63941 [-- Attachment #1: Type: text/plain, Size: 711 bytes --] > I really need some Gnus/eww expert to look into this and review the > patch. ok. While I was writing this patch, I found some other things to patch. For example, the multipart message will currently only be generated if there is a file to send. Instead, it should be used whenever there is a form that has the attribute "enctype" with its value set to "mutlipart/form-data". Is the patch ok other than needing review from an expert in gnus/eww? Is the formatting good, are the comments all right, did I do the comment block at the top of the new file =mm-url-tests.el= correctly? I'm writing and submitting more patches, so if there's something like that I should fix, feel free to let me know. thanks! [-- Attachment #2: Type: text/html, Size: 857 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* bug#63941: [PATCH] ; always CRLF before non-first boundary in multipart form 2023-06-22 16:49 ` ozzloy @ 2023-06-22 18:25 ` ozzloy 2023-06-22 18:29 ` Eli Zaretskii 0 siblings, 1 reply; 18+ messages in thread From: ozzloy @ 2023-06-22 18:25 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 63941 [-- Attachment #1: Type: text/plain, Size: 379 bytes --] > I really need some Gnus/eww expert to look into this and review the patch. Just checking, is this something I should do? Perhaps by reaching out to Lars and/or Kenjiro, the two most recent authors on that function? I made shorter notes on what the bug is and how to reproduce it here https://git.sr.ht/~ozzloy/emacs-bug-63941/tree/reproduce/item/readme.org thanks again! [-- Attachment #2: Type: text/html, Size: 554 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* bug#63941: [PATCH] ; always CRLF before non-first boundary in multipart form 2023-06-22 18:25 ` ozzloy @ 2023-06-22 18:29 ` Eli Zaretskii 2023-06-23 8:22 ` ozzloy 0 siblings, 1 reply; 18+ messages in thread From: Eli Zaretskii @ 2023-06-22 18:29 UTC (permalink / raw) To: ozzloy; +Cc: 63941 > From: ozzloy <ozzloy@gmail.com> > Date: Thu, 22 Jun 2023 11:25:05 -0700 > Cc: 63941@debbugs.gnu.org > > > I really need some Gnus/eww expert to look into this and review the patch. > > Just checking, is this something I should do? Perhaps by reaching out to Lars > and/or Kenjiro, the two most recent authors on that function? You could try, yes. ^ permalink raw reply [flat|nested] 18+ messages in thread
* bug#63941: [PATCH] ; always CRLF before non-first boundary in multipart form 2023-06-22 18:29 ` Eli Zaretskii @ 2023-06-23 8:22 ` ozzloy 0 siblings, 0 replies; 18+ messages in thread From: ozzloy @ 2023-06-23 8:22 UTC (permalink / raw) To: Eli Zaretskii, Kenjiro Nakayama, Lars Ingebrigtsen; +Cc: 63941 [-- Attachment #1: Type: text/plain, Size: 1120 bytes --] >> From: ozzloy <ozzloy@gmail.com> >> Date: Thu, 22 Jun 2023 11:25:05 -0700 >> Cc: 63941@debbugs.gnu.org >> >>> I really need some Gnus/eww expert to look into this and review the patch. >> >> Just checking, is this something I should do? Perhaps by reaching out to Lars >> and/or Kenjiro, the two most recent authors on that function? > > You could try, yes. oh, wow, i'm glad i asked. i thought maybe you were doing that and i should just wait until you were done. Hi Lars and Kenjiro! Could you review this patch? Or suggest someone who would be good to ask? https://git.sr.ht/~ozzloy/emacs/commit/c73ccda90623519434f8ec2c700adf70ac1d6a00 It fixes a problem where files uploaded through EWW have a newline at EOF chomped. Here's a set of instructions to reproduce the problem https://git.sr.ht/~ozzloy/emacs-bug-63941/tree/reproduce/item/readme.org and a bunch of tests showing the http messages generated by different clients and characterizing the behavior of prior versions of =mm-url-encode-multipart-form-data= https://git.sr.ht/~ozzloy/emacs-bug-63941/tree/reproduce/item/mm-url-tests.el thanks! [-- Attachment #2: Type: text/html, Size: 1729 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* bug#63941: [PATCH] ; always CRLF before non-first boundary in multipart form 2023-06-08 2:48 ` ozzloy 2023-06-08 6:09 ` Eli Zaretskii @ 2023-07-18 19:04 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2023-07-21 9:04 ` ozzloy 1 sibling, 1 reply; 18+ messages in thread From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-07-18 19:04 UTC (permalink / raw) To: ozzloy; +Cc: 63941, Eli Zaretskii OK, I'm not super familiar with this code, and while I really appreciate your tests, I'd rather not completely replace the old with a brand new code because that makes it hard for me to see what's really changed. So, as a first step I took the existing code, and did the following: - "Inline" the (unless (bolp) (insert "\r\n")) into every branch of the preceding `cond`. We know this makes no difference. - Remove the (unless (bolp) (insert "\r\n")) in the "submit" branch since it follows an insertion that ends in "n" and hence doesn't do anything. - Remove the `(unless (bolp)` test from the "file" branch since that's what this bug report is about: - when `filedata` is a number, this makes no difference (we just inserted that number without a trailing \n). - when `filedata` is a string that ends in non-\n (a case that currently works right) this makes no difference. - when it does end in \n this does make a difference which fixes this bug. - when `filedata` is an empty string, this add an additional \r\n compared to the current code. This seems right to me (I expect the decoding software will skip the \r\n at the of the header and then look for \r\n<BOUNDARY>, so it's important to have two \r\n). - In the default case, I left the code as is, but the `(unless (bolp)` test is probably not right. There remain some questions on this patch: 1- When `filedata` is neither a number nor a string this is treated the same as an empty string. Is that right? Or should it just never happen (in which case we should probably catch this case and signal an error). 2- Should we also remove the `(unless (bolp)` in the default case? I think we do (and my reading of your tests agrees, tho I couldn't run your test suite on this version of the code, among other things because it contains multiple tests with the same name, so it gives me things like (error "Test `test-mm-url-encode-multipart-form-data-A-ab-c' redefined")). I suspect we can also simply this code by moving the first (insert "--" boundary "\r\n") before the loop, and the second into the loop so we can make it insert "\r\n--" boundary "\r\n" (and thus remove \r\n from the end of each of the preceding branches). Stefan diff --git a/lisp/gnus/mm-url.el b/lisp/gnus/mm-url.el index 11847a79f17..3041ad16264 100644 --- a/lisp/gnus/mm-url.el +++ b/lisp/gnus/mm-url.el @@ -430,16 +430,17 @@ mm-url-encode-multipart-form-data (insert filedata)) ;; How can this possibly be useful? ((integerp filedata) - (insert (number-to-string filedata)))))) + (insert (number-to-string filedata))))) + (insert "\r\n")) ((equal name "submit") (insert "Content-Disposition: form-data; name=\"submit\"\r\n\r\nSubmit\r\n")) (t (insert (format "Content-Disposition: form-data; name=%S\r\n\r\n" name)) - (insert value))) - (unless (bolp) - (insert "\r\n")))) + (insert value) + (unless (bolp) + (insert "\r\n")))))) (insert "--" boundary "--\r\n") (buffer-string))) ^ permalink raw reply related [flat|nested] 18+ messages in thread
* bug#63941: [PATCH] ; always CRLF before non-first boundary in multipart form 2023-07-18 19:04 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-07-21 9:04 ` ozzloy 2023-08-29 0:28 ` ozzloy 0 siblings, 1 reply; 18+ messages in thread From: ozzloy @ 2023-07-21 9:04 UTC (permalink / raw) To: Stefan Monnier; +Cc: 63941, Eli Zaretskii [-- Attachment #1.1: Type: text/plain, Size: 2657 bytes --] Thanks so much for taking the time to review this! > I'd rather not completely replace the old with a brand new code > because that makes it hard for me to see what's really changed. I thought this would be ok, since the existing version is a complete rewrite of the original (so there's precedent for complete rewrite of a function in a commit to fix a bug), and if there were tests showing the behavior to be the same as before, except where this bug is fixed. (Though I see the tests are currently broken). Based on your feedback, and some help from #emacs, I made a patch that is very minimal to the existing code, with better commit message, and attached it here. The patch removes the =(unless (bolp) ...)= guarding inserting CRLF. The RFC says the "boundary delimiter MUST occur at the beginning of a line, i.e., following a CRLF". =(bolp)= is not enough to guarantee the boundary is preceded by CRLF. It can be true when the point is after just "\n". Because CRLF is inserted unconditionally after the =cond=, the code does not include the boundary's CRLF in each branch of the =cond=. > when `filedata` is an empty string, this add an additional \r\n > compared to the current code. This seems right to me Me too, and all the other clients I tested. > I expect the decoding software will skip the \r\n at the of the > header and then look for \r\n<BOUNDARY>, so it's important to have > two \r\n What you said is true. In addition, they also accept "HEADER\r\nfile content\n--BOUNDARY" as the content "file content", and consider the last "\n" as attached to the boundary. That's where the file's final "\n" gets lost if the file's content was initially "file content\n". > There remain some questions on this patch: While fixing this bug, I found a few more problems in addition to the two that you mention here. I was not addressing them yet, since I thought I should fix one bug per patch. > I suspect we can also simply this code by moving the first (insert > "--" boundary "\r\n") before the loop, and the second into the loop > so we can make it insert "\r\n--" boundary "\r\n" (and thus remove > \r\n from the end of each of the preceding branches). Almost, but not quite, or at least not without some awkward (to my eye) repositioning of inserting boundaries, "--", and "\r\n". The final boundary complicates things. It is different from all the others, it is "--BOUNDARY--" instead of "--BOUNDARY" Here's what I ended up with when I tried that, https://git.sr.ht/~ozzloy/emacs-bug-63941/tree/simplify-insert-boundaries-and---/item/mm-url.el#L397 This passes the tests in =emacs/tests/lisp/gnus/mm-url-tests.el=. [-- Attachment #1.2: Type: text/html, Size: 3306 bytes --] [-- Attachment #2: 0001-allow-uploading-files-ending-in-newline-via-EWW.patch --] [-- Type: text/x-patch, Size: 7181 bytes --] From b3e2f07367c6e9836b3a7635b86335bf7104b2b9 Mon Sep 17 00:00:00 2001 From: Daniel Watson <ozzloy@gmail.com> Date: Fri, 21 Jul 2023 00:03:06 -0700 Subject: [PATCH] ; allow uploading files ending in newline via EWW ; Ensure that every boundary in HTTP message is preceded by "\r\n". ; According to RFC 2046, section 5, the "\r\n" preceding the boundary ; is not considered part of the preceding content, and is instead ; attached to the boundary that follows it. ; ; Consider a file named "1nl", consisting only of the single character ; '\n'. ; ; The old version of =mm-url-encode-multipart-form-data= creates the ; following HTTP message: ; ; (concat ; "--BOUNDARY\r\n" ; "Content-Disposition: form-data; name=\"a\"; filename=\"1nl\"\r\n" ; "Content-Transfer-Encoding: binary\r\n" ; "Content-Type: c\r\n" ; "\r\n" ; ; ;; file content ; "\n" ; ; ;; NOTE "\r\n" is absent before the following boundary ; "--BOUNDARY--\r\n") ; ; the new version of =mm-url-encode-multipart-form-data= creates this ; HTTP message: ; ; (concat ; "--BOUNDARY\r\n" ; "Content-Disposition: form-data; name=\"a\"; filename=\"1nl\"\r\n" ; "Content-Transfer-Encoding: binary\r\n" ; "Content-Type: c\r\n" ; "\r\n" ; ; ;; file content ; "\n" ; ; ;; NOTE "\r\n" precedes the boundary ; "\r\n" ; "--BOUNDARY--\r\n") ; ; The new code ensures all boundaries after the one at the very ; beginning are preceded by "\r\n", whether they are the final, or ; other internal boundaries. --- lisp/gnus/mm-url.el | 5 +- test/lisp/gnus/mm-url-tests.el | 160 +++++++++++++++++++++++++++++++++ 2 files changed, 162 insertions(+), 3 deletions(-) create mode 100644 test/lisp/gnus/mm-url-tests.el diff --git a/lisp/gnus/mm-url.el b/lisp/gnus/mm-url.el index 11847a79f17..5b68b25ec2e 100644 --- a/lisp/gnus/mm-url.el +++ b/lisp/gnus/mm-url.el @@ -433,13 +433,12 @@ mm-url-encode-multipart-form-data (insert (number-to-string filedata)))))) ((equal name "submit") (insert - "Content-Disposition: form-data; name=\"submit\"\r\n\r\nSubmit\r\n")) + "Content-Disposition: form-data; name=\"submit\"\r\n\r\nSubmit")) (t (insert (format "Content-Disposition: form-data; name=%S\r\n\r\n" name)) (insert value))) - (unless (bolp) - (insert "\r\n")))) + (insert "\r\n"))) (insert "--" boundary "--\r\n") (buffer-string))) diff --git a/test/lisp/gnus/mm-url-tests.el b/test/lisp/gnus/mm-url-tests.el new file mode 100644 index 00000000000..7b8d45b6061 --- /dev/null +++ b/test/lisp/gnus/mm-url-tests.el @@ -0,0 +1,160 @@ +;;; mm-url-tests.el --- -*- lexical-binding:t -*- + +;; Copyright (C) 2021-2023 Free Software Foundation, Inc. + +;; This file is part of GNU Emacs. + +;; GNU Emacs is free software: you can redistribute it and/or modify +;; it under the terms of the GNU General Public License as published by +;; the Free Software Foundation, either version 3 of the License, or +;; (at your option) any later version. + +;; GNU Emacs is distributed in the hope that it will be useful, +;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;; GNU General Public License for more details. + +;; You should have received a copy of the GNU General Public License +;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. + +;;; Commentary: + +;;; Code: + +(require 'ert) +(require 'mm-url) + + +(ert-deftest test-mm-url-encode-multipart-form-data:nil () + (should + (string= + (mm-url-encode-multipart-form-data '() "BOUNDARY") + "--BOUNDARY--\r\n"))) + +(ert-deftest test-mm-url-encode-multipart-form-data:name-value () + (should + (string= + (mm-url-encode-multipart-form-data + '(("key" . "value")) "BOUNDARY") + (concat "--BOUNDARY\r\n" + "Content-Disposition: form-data; name=\"key\"\r\n" + "\r\n" + "value\r\n" + "--BOUNDARY--\r\n")))) + +(ert-deftest test-mm-url-encode-multipart-form-data:submit () + (should + (string= + (mm-url-encode-multipart-form-data '(("submit")) "BOUNDARY") + (concat "--BOUNDARY\r\n" + "Content-Disposition: form-data; name=\"submit\"\r\n" + "\r\n" + "Submit\r\n" + "--BOUNDARY--\r\n")))) + +(ert-deftest test-mm-url-encode-multipart-form-data:file () + (should + (string= + (mm-url-encode-multipart-form-data + '(("file" . (("name" . "a") + ("filename" . "b") + ("content-type" . "c") + ("filedata" . "d\n")))) + "BOUNDARY") + + (concat + "--BOUNDARY\r\n" + "Content-Disposition: form-data; name=\"a\"; filename=\"b\"\r\n" + "Content-Transfer-Encoding: binary\r\n" + "Content-Type: c\r\n" + "\r\n" + + ;; file content + "d\n" + + ;; rfc 2046 section 5 + ;; https://www.rfc-editor.org/rfc/rfc2046#section-5 + ;; "The boundary delimiter MUST occur at the beginning of a + ;; line, i.e., following a CRLF, and the initial CRLF is + ;; considered to be attached to the boundary delimiter line + ;; rather than part of the preceding part." + "\r\n" + + "--BOUNDARY--\r\n")))) + +(ert-deftest test-mm-url-encode-multipart-form-data--all-parts () + (should + (string= + (mm-url-encode-multipart-form-data + '(("name" . "value") + ("submit") + ("file" . (("name" . "a") + ("filename" . "b") + ("content-type" . "c") + ("filedata" . "d")))) + "BOUNDARY") + (concat + "--BOUNDARY\r\n" + "Content-Disposition: form-data; name=\"name\"\r\n" + "\r\n" + "value\r\n" + "--BOUNDARY\r\n" + "Content-Disposition: form-data; name=\"submit\"\r\n" + "\r\n" + "Submit\r\n" + "--BOUNDARY\r\n" + "Content-Disposition: form-data; name=\"a\"; filename=\"b\"\r\n" + "Content-Transfer-Encoding: binary\r\n" + "Content-Type: c\r\n" + "\r\n" + + ;; file content + "d" + + ;; rfc 2046 section 5 + ;; the \r\n is attached to the boundary below it + "\r\n" + "--BOUNDARY--\r\n")))) + +(ert-deftest test-mm-url-encode-multipart-form-data-two-files () + (should + (string= + (mm-url-encode-multipart-form-data + '(("file" . (("name" . "a") + ("filename" . "b") + ("content-type" . "c") + ("filedata" . "d\n"))) + ("file" . (("name" . "e") + ("filename" . "f") + ("content-type" . "g") + ("filedata" . "h\n")))) + "BOUNDARY") + (concat + "--BOUNDARY\r\n" + "Content-Disposition: form-data; name=\"a\"; filename=\"b\"\r\n" + "Content-Transfer-Encoding: binary\r\n" + "Content-Type: c\r\n" + "\r\n" + + ;; file content + "d\n" + + ;; rfc2046 section 5 + ;; the \r\n is attached to the boundary below it + "\r\n" + "--BOUNDARY\r\n" + "Content-Disposition: form-data; name=\"e\"; filename=\"f\"\r\n" + "Content-Transfer-Encoding: binary\r\n" + "Content-Type: g\r\n" + "\r\n" + + ;; file content + "h\n" + + ;; rfc 2046 section 5 + ;; the \r\n is attached to the boundary below it + "\r\n" + "--BOUNDARY--\r\n")))) + + +;;; mm-url-tests.el ends here -- 2.39.2 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* bug#63941: [PATCH] ; always CRLF before non-first boundary in multipart form 2023-07-21 9:04 ` ozzloy @ 2023-08-29 0:28 ` ozzloy 2023-12-02 15:03 ` ozzloy 0 siblings, 1 reply; 18+ messages in thread From: ozzloy @ 2023-08-29 0:28 UTC (permalink / raw) To: Stefan Monnier; +Cc: 63941, Eli Zaretskii [-- Attachment #1.1: Type: text/plain, Size: 240 bytes --] i've modified the commit with a couple goals. + make the bug fixing part of the diff as small as possible. + made the new tests look more like existing ones. + rebased onto the head of the emacs-29 branch pulled in earlier today. [-- Attachment #1.2: Type: text/html, Size: 310 bytes --] [-- Attachment #2: 0001-upload-newline-terminated-files-via-EWW-Bug-63941.patch --] [-- Type: text/x-patch, Size: 6327 bytes --] From c6c42e4a72fc9c26086d7e9f0bcd70999a1bc213 Mon Sep 17 00:00:00 2001 From: Daniel Watson <ozzloy@gmail.com> Date: Fri, 21 Jul 2023 00:03:06 -0700 Subject: [PATCH] ; upload newline terminated files via EWW (Bug#63941) ; Ensure that every boundary in HTTP message is preceded by ; "\r\n". According to RFC 2046, section 5, the "\r\n" ; preceding the boundary is not considered part of the ; preceding content, and is instead attached to the boundary ; that follows it. ; Consider a file named "1nl", consisting only of the single ; character '\n'. ; The prior version of =mm-url-encode-multipart-form-data= ; creates the following HTTP message: ; (concat ; "--BOUNDARY\r\n" ; "Content-Disposition: form-data; name=\"a\"; filename=\"1nl\"\r\n" ; "Content-Transfer-Encoding: binary\r\n" ; "Content-Type: c\r\n" ; "\r\n" ; ; ;; file content ; "\n" ; ; ;; NOTE "\r\n" is absent here before the following boundary ; "--BOUNDARY--\r\n") ; this version of =mm-url-encode-multipart-form-data= creates ; this HTTP message: ; (concat ; "--BOUNDARY\r\n" ; "Content-Disposition: form-data; name=\"a\"; filename=\"1nl\"\r\n" ; "Content-Transfer-Encoding: binary\r\n" ; "Content-Type: c\r\n" ; "\r\n" ; ; ;; file content ; "\n" ; ; ;; NOTE "\r\n" preceding the boundary ; "\r\n" ; "--BOUNDARY--\r\n") ; The new code ensures all boundaries after the one at the very ; beginning are preceded by "\r\n", whether they are the final, ; or other internal boundaries. --- lisp/gnus/mm-url.el | 5 +- test/lisp/gnus/mm-url-tests.el | 131 +++++++++++++++++++++++++++++++++ 2 files changed, 133 insertions(+), 3 deletions(-) create mode 100644 test/lisp/gnus/mm-url-tests.el diff --git a/lisp/gnus/mm-url.el b/lisp/gnus/mm-url.el index 11847a79f17..5b68b25ec2e 100644 --- a/lisp/gnus/mm-url.el +++ b/lisp/gnus/mm-url.el @@ -433,13 +433,12 @@ mm-url-encode-multipart-form-data (insert (number-to-string filedata)))))) ((equal name "submit") (insert - "Content-Disposition: form-data; name=\"submit\"\r\n\r\nSubmit\r\n")) + "Content-Disposition: form-data; name=\"submit\"\r\n\r\nSubmit")) (t (insert (format "Content-Disposition: form-data; name=%S\r\n\r\n" name)) (insert value))) - (unless (bolp) - (insert "\r\n")))) + (insert "\r\n"))) (insert "--" boundary "--\r\n") (buffer-string))) diff --git a/test/lisp/gnus/mm-url-tests.el b/test/lisp/gnus/mm-url-tests.el new file mode 100644 index 00000000000..44efba1867c --- /dev/null +++ b/test/lisp/gnus/mm-url-tests.el @@ -0,0 +1,131 @@ +;;; mm-url-tests.el --- -*- lexical-binding:t -*- + +;; Copyright (C) 2021-2023 Free Software Foundation, Inc. + +;; This file is part of GNU Emacs. + +;; GNU Emacs is free software: you can redistribute it and/or modify +;; it under the terms of the GNU General Public License as published by +;; the Free Software Foundation, either version 3 of the License, or +;; (at your option) any later version. + +;; GNU Emacs is distributed in the hope that it will be useful, +;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;; GNU General Public License for more details. + +;; You should have received a copy of the GNU General Public License +;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. + +;;; Commentary: + +;;; Code: + +(require 'ert) +(require 'mm-url) + + +(ert-deftest mm-url-encode-multipart-form-data () + ;; nil + (should + (string= + (mm-url-encode-multipart-form-data '() "BOUNDARY") + "--BOUNDARY--\r\n")) + + ;; key value pair + (should + (string= + (mm-url-encode-multipart-form-data + '(("key" . "value")) "BOUNDARY") + (concat "--BOUNDARY\r\n" + "Content-Disposition: form-data; name=\"key\"\r\n" + "\r\n" + "value\r\n" + "--BOUNDARY--\r\n"))) + + ;; submit + (should + (string= + (mm-url-encode-multipart-form-data '(("submit")) "BOUNDARY") + (concat "--BOUNDARY\r\n" + "Content-Disposition: form-data; name=\"submit\"\r\n" + "\r\n" + "Submit\r\n" + "--BOUNDARY--\r\n"))) + + ;; file ending in newline + (should + (string= + (mm-url-encode-multipart-form-data + '(("file" . (("name" . "a") + ("filename" . "b") + ("content-type" . "c") + ("filedata" . "d\n")))) + "BOUNDARY") + (concat + "--BOUNDARY\r\n" + "Content-Disposition: form-data; name=\"a\"; filename=\"b\"\r\n" + "Content-Transfer-Encoding: binary\r\n" + "Content-Type: c\r\n" + "\r\n" + "d\n\r\n" + "--BOUNDARY--\r\n"))) + + ;; stress test combining parts: key-value, submit, file + (should + (string= + (mm-url-encode-multipart-form-data + '(("name" . "value") + ("submit") + ("file" . (("name" . "a") + ("filename" . "b") + ("content-type" . "c") + ("filedata" . "d")))) + "BOUNDARY") + (concat + "--BOUNDARY\r\n" + "Content-Disposition: form-data; name=\"name\"\r\n" + "\r\n" + "value\r\n" + "--BOUNDARY\r\n" + "Content-Disposition: form-data; name=\"submit\"\r\n" + "\r\n" + "Submit\r\n" + "--BOUNDARY\r\n" + "Content-Disposition: form-data; name=\"a\"; filename=\"b\"\r\n" + "Content-Transfer-Encoding: binary\r\n" + "Content-Type: c\r\n" + "\r\n" + "d\r\n" + "--BOUNDARY--\r\n"))) + + ;; two files, newline at EOF, before final and non-final BOUNDARY + (should + (string= + (mm-url-encode-multipart-form-data + '(("file" . (("name" . "a") + ("filename" . "b") + ("content-type" . "c") + ("filedata" . "d\n"))) + ("file" . (("name" . "e") + ("filename" . "f") + ("content-type" . "g") + ("filedata" . "h\n")))) + "BOUNDARY") + (concat + "--BOUNDARY\r\n" + "Content-Disposition: form-data; name=\"a\"; filename=\"b\"\r\n" + "Content-Transfer-Encoding: binary\r\n" + "Content-Type: c\r\n" + "\r\n" + "d\n\r\n" + "--BOUNDARY\r\n" + "Content-Disposition: form-data; name=\"e\"; filename=\"f\"\r\n" + "Content-Transfer-Encoding: binary\r\n" + "Content-Type: g\r\n" + "\r\n" + "h\n\r\n" + "--BOUNDARY--\r\n")))) + + +;;; mm-url-tests.el ends here -- 2.39.2 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* bug#63941: [PATCH] ; always CRLF before non-first boundary in multipart form 2023-08-29 0:28 ` ozzloy @ 2023-12-02 15:03 ` ozzloy 0 siblings, 0 replies; 18+ messages in thread From: ozzloy @ 2023-12-02 15:03 UTC (permalink / raw) To: 63941; +Cc: Stefan Kangas [-- Attachment #1: Type: text/plain, Size: 336 bytes --] bump On Mon, Aug 28, 2023 at 5:28 PM ozzloy <ozzloy@gmail.com> wrote: > i've modified the commit with a couple goals. > + make the bug fixing part of the diff as small as possible. > + made the new tests look more like existing ones. > + rebased onto the head of the emacs-29 branch pulled in > earlier today. > [-- Attachment #2: Type: text/html, Size: 664 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2023-12-02 15:03 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-06-07 5:25 bug#63941: [PATCH] ; always CRLF before non-first boundary in multipart form ozzloy 2023-06-07 12:30 ` Eli Zaretskii 2023-06-08 2:48 ` ozzloy 2023-06-08 6:09 ` Eli Zaretskii 2023-06-08 6:43 ` ozzloy 2023-06-08 6:52 ` ozzloy 2023-06-10 9:42 ` Eli Zaretskii 2023-06-11 1:38 ` ozzloy 2023-06-18 23:23 ` ozzloy 2023-06-19 16:13 ` Eli Zaretskii 2023-06-22 16:49 ` ozzloy 2023-06-22 18:25 ` ozzloy 2023-06-22 18:29 ` Eli Zaretskii 2023-06-23 8:22 ` ozzloy 2023-07-18 19:04 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors 2023-07-21 9:04 ` ozzloy 2023-08-29 0:28 ` ozzloy 2023-12-02 15:03 ` ozzloy
Code repositories for project(s) associated with this public inbox https://git.savannah.gnu.org/cgit/emacs.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).