From: Nacho Barrientos <nacho.barrientos@cern.ch>
To: 54989@debbugs.gnu.org
Subject: bug#54989: 28.1; url-http.el: chunked response: wait for the last CRLF
Date: Sun, 17 Apr 2022 10:02:18 +0200 [thread overview]
Message-ID: <87zgkkuo4i.fsf@cern.ch> (raw)
[-- Attachment #1: Type: text/plain, Size: 2787 bytes --]
Hi,
This is a bug report for url.el, more specifically for url-http.el in
the context of processing responses with Transfer-Encoding set to
"chunked".
As per [0], the last chunk of 0 bytes is always accompanied by a last
CRLF that signals the end of the message:
chunked-body = *chunk
last-chunk
trailer-part
CRLF
^ this one
chunk = chunk-size [ chunk-ext ] CRLF
chunk-data CRLF
chunk-size = 1*HEXDIG
last-chunk = 1*("0") [ chunk-ext ] CRLF
chunk-data = 1*OCTET ; a sequence of chunk-size octets
`url-http-chunked-encoding-after-change-function' is able to process
(and remove) that terminator IF AVAILABLE in the buffer when
processing the response, however it won't wait for it if it's not yet
there.
In other words:
| Bottom of the response buffer | Bottom of the full response |
| (visible to url-http) | (to be delivered to Emacs) |
| ------------------------------+-----------------------------|
| 0\r\n | 0\r\n |
| | \r\n |
If the last chunk is processed when the bottom of the response buffer
is as above (note that the whole response has not yet been delivered
to Emacs), url-http will call the user callback without waiting for
the final terminator to be read from the socket.
This is normally not an issue when doing one-shot requests, but it's
problematic when the connection is reused immediately. As there are 2
bytes from the request N that have not been dealt with, they'll be
considered as part of the response of the request N+1. On top, it turns
out that when processing the headers of request N+1,
`url-http-wait-for-headers-change-function' will consider the request a
"headerless malformed response" (due to the leading \r\n) delivering it
broken to the caller.
I'm attaching a patch that I've got applied locally that prevents the
problem from happening by implementing an extra state in which
`url-http-chunked-encoding-after-change-function` properly waits for the
very last element of the message.
Unfortunately my copyright papers are stuck on the FSF for months now so
I'm unable to submit a patch directly myself so feel free to use the
attached diff as a PoC to illustrate the nature of the problem and a
possible fix. A more experienced Elisp developer will surely find a more
elegant solution.
Please note that the patch sits on top of the emacs-28 branch.
For additional context, this bug was found when debugging
magit/ghub (see [1] for details).
Hope this helps.
[0] https://datatracker.ietf.org/doc/html/rfc7230#section-4.1
[1] https://github.com/magit/ghub/issues/81
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-url-http.el-chunked-response-wait-for-the-last-CRLF.patch --]
[-- Type: text/x-patch, Size: 4829 bytes --]
From a11454602f7c4af3af5729b9e504130c660319f3 Mon Sep 17 00:00:00 2001
From: Nacho Barrientos <nacho.barrientos@cern.ch>
Date: Sat, 16 Apr 2022 18:00:39 +0200
Subject: [PATCH] url-http.el: chunked response: wait for the last CRLF.
As per [0], the last chunk of 0 bytes is always accompanied by a last
CRLF that signals the end of the message:
chunked-body = *chunk
last-chunk
trailer-part
CRLF
^ this one
chunk = chunk-size [ chunk-ext ] CRLF
chunk-data CRLF
chunk-size = 1*HEXDIG
last-chunk = 1*("0") [ chunk-ext ] CRLF
chunk-data = 1*OCTET ; a sequence of chunk-size octets
`url-http-chunked-encoding-after-change-function' is able to process
(and remove) that terminator IF AVAILABLE in the buffer when
processing the response, however it won't wait for it if it's not yet
there.
In other words:
| Bottom of the response buffer | Bottom of the full response |
| (visible to url-http) | (to be delivered to Emacs) |
| ------------------------------+-----------------------------|
| 0\r\n | 0\r\n |
| | \r\n |
If the last chunk is processed when the bottom of the response buffer
is as above (note that the whole response has not yet been delivered
to Emacs), url-http will call the user callback without waiting for
the final terminator to be read from the socket.
This is normally not an issue when doing one-shot requests, but it's
problematic when the connection is reused immediately. As there are 2
bytes from the request N that have not been dealt with, they'll be
considered as part of the response of the request N+1. On top, it
turns out that when processing the headers of request N+1,
`url-http-wait-for-headers-change-function' will consider the request
a "headerless malformed response" delivering it broken to the caller.
The proposed fix implements a state in which
`url-http-chunked-encoding-after-change-function` properly waits for
the very last element of the message preventing the problem explained
above from happening.
For additional context, this bug was found when debugging
magit/ghub (see [1] for details).
[0] https://datatracker.ietf.org/doc/html/rfc7230#section-4.1
[1] https://github.com/magit/ghub/issues/81
---
lisp/url/url-http.el | 27 ++++++++++++++++++++-------
1 file changed, 20 insertions(+), 7 deletions(-)
diff --git a/lisp/url/url-http.el b/lisp/url/url-http.el
index 16c3a6a1e6..dfdc03cc6c 100644
--- a/lisp/url/url-http.el
+++ b/lisp/url/url-http.el
@@ -36,6 +36,7 @@
(defvar url-current-object)
(defvar url-http-after-change-function)
(defvar url-http-chunked-counter)
+(defvar url-http-chunked-last-crlf-missing nil)
(defvar url-http-chunked-length)
(defvar url-http-chunked-start)
(defvar url-http-connection-opened)
@@ -1068,7 +1069,16 @@ the callback to be triggered."
Cannot give a sophisticated percentage, but we need a different
function to look for the special 0-length chunk that signifies
the end of the document."
- (save-excursion
+ (if url-http-chunked-last-crlf-missing
+ (progn
+ (goto-char url-http-chunked-last-crlf-missing)
+ (if (not (looking-at "\r\n"))
+ (url-http-debug "Still spinning for the terminator of last chunk...")
+ (url-http-debug "Saw the last CRLF.")
+ (delete-region (match-beginning 0) (match-end 0))
+ (if (url-http-parse-headers)
+ (url-http-activate-callback))))
+ (save-excursion
(goto-char st)
(let ((read-next-chunk t)
(case-fold-search t)
@@ -1145,13 +1155,16 @@ the end of the document."
(url-display-percentage nil nil)
;; Every chunk, even the last 0-length one, is
;; terminated by CRLF. Skip it.
- (when (looking-at "\r?\n")
+ (if (not (looking-at "\r?\n"))
+ (progn
+ (url-http-debug "Spinning for the terminator of last chunk...")
+ (setq-local url-http-chunked-last-crlf-missing (point)))
(url-http-debug "Removing terminator of last chunk")
- (delete-region (match-beginning 0) (match-end 0)))
- (if (re-search-forward "^\r?\n" nil t)
- (url-http-debug "Saw end of trailers..."))
- (if (url-http-parse-headers)
- (url-http-activate-callback))))))))))
+ (delete-region (match-beginning 0) (match-end 0))
+ (if (re-search-forward "^\r?\n" nil t)
+ (url-http-debug "Saw end of trailers..."))
+ (if (url-http-parse-headers)
+ (url-http-activate-callback))))))))))))
(defun url-http-wait-for-headers-change-function (_st nd _length)
;; This will wait for the headers to arrive and then splice in the
--
2.35.3
[-- Attachment #3: Type: text/plain, Size: 9892 bytes --]
In GNU Emacs 28.1 (build 2, x86_64-pc-linux-gnu, GTK+ Version 3.24.33, cairo version 1.17.6)
of 2022-04-09 built on frederik
Windowing system distributor 'The X.Org Foundation', version 11.0.12101003
System Description: Arch Linux
Configured using:
'configure --with-native-compilation --without-compress-install
--sysconfdir=/etc --prefix=/usr --libexecdir=/usr/lib
--localstatedir=/var --with-cairo --with-harfbuzz --with-modules
--with-wide-int --with-x-toolkit=gtk3 --with-xft 'CFLAGS=-march=x86-64
-mtune=generic -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2
-Wformat -Werror=format-security -fstack-clash-protection
-fcf-protection -g -ffile-prefix-map=/build/emacs/src=/usr/src/debug
-flto=auto' 'LDFLAGS=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now
-flto=auto''
Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
JSON LCMS2 LIBOTF LIBSYSTEMD LIBXML2 M17N_FLT MODULES NATIVE_COMP NOTIFY
INOTIFY PDUMPER PNG RSVG SECCOMP SOUND THREADS TIFF TOOLKIT_SCROLL_BARS
X11 XDBE XIM XPM GTK3 ZLIB
Important settings:
value of $LANG: en_US.UTF-8
locale-coding-system: utf-8-unix
Major mode: ELisp/l
Minor modes in effect:
recentf-mode: t
bug-reference-prog-mode: t
global-git-commit-mode: t
magit-auto-revert-mode: t
goto-address-prog-mode: t
erc-services-mode: t
erc-networks-mode: t
erc-spelling-mode: t
flyspell-mode: t
display-time-mode: t
engine-mode: t
mu4e-marker-icons-mode: t
outline-minor-mode: t
beginend-global-mode: t
beginend-prog-mode: t
auto-revert-mode: t
editorconfig-mode: t
csv-field-index-mode: t
doom-modeline-mode: t
yas-global-mode: t
yas-minor-mode: t
all-the-icons-ivy-rich-mode: t
ivy-rich-project-root-cache-mode: t
ivy-rich-mode: t
ivy-posframe-mode: t
ivy-mode: t
shell-dirtrack-mode: t
whole-line-or-region-global-mode: t
whole-line-or-region-local-mode: t
global-whitespace-mode: t
override-global-mode: t
global-eldoc-mode: t
eldoc-mode: t
show-paren-mode: t
electric-indent-mode: t
mouse-wheel-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
blink-cursor-mode: t
window-divider-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
line-number-mode: t
auto-fill-function: yas--auto-fill
transient-mark-mode: t
hs-minor-mode: t
Load-path shadows:
/home/nacho/.emacs.d/elpa/transient-20220413.2332/transient hides /usr/share/emacs/28.1/lisp/transient
Features:
(shadow magit-patch magit-subtree magit-gitignore magit-ediff ediff
bicycle crux pulse expand-region yaml-mode-expansions
text-mode-expansions cc-mode-expansions the-org-mode-expansions
ruby-mode-expansions js-mode-expansions web-mode-expansions
er-basic-expansions expand-region-core expand-region-custom ielm rg
rg-info-hack rg-menu rg-ibuffer rg-result wgrep-rg wgrep rg-history
rg-header ibuf-ext ibuffer ibuffer-loaddefs grep cus-edit smiley
gnus-cite qp mm-archive mail-extr cl-print debug cus-start dired-aux
emacsbug mwim hippie-exp pcase counsel recentf tree-widget swiper
network-stream url-cache secrets forge-list forge-commands forge-semi
forge-bitbucket buck forge-gogs gogs forge-gitea gtea forge-gitlab glab
forge-github ghub-graphql treepy gsexp ghub gnutls forge-notify
forge-revnote forge-pullreq forge-issue forge-topic yaml bug-reference
forge-post markdown-mode forge-repo forge forge-core forge-db closql
emacsql-sqlite emacsql emacsql-compiler url-http url-auth url-gw nsm
magit-extras magit-bookmark magit-submodule magit-obsolete magit-blame
magit-stash magit-reflog magit-bisect magit-push magit-pull magit-fetch
magit-clone magit-remote magit-commit magit-sequence magit-notes
magit-worktree magit-tag magit-merge magit-branch magit-reset
magit-files magit-refs magit-status magit magit-repos magit-apply
magit-wip magit-log which-func magit-diff smerge-mode diff git-commit
log-edit pcvs-util add-log magit-core magit-autorevert magit-margin
magit-transient magit-process with-editor magit-mode transient magit-git
magit-base magit-section crm em-unix em-term term ehelp em-script
em-prompt em-hist em-pred em-glob em-cmpl em-basic em-banner em-alias
hideshow goto-addr erc-services erc-join erc-networks erc-track
erc-match erc-spelling erc-goodies erc erc-backend erc-loaddefs quail
flyspell ispell org-element ol-eww ol-rmail ol-mhe ol-irc ol-info
ol-gnus nnselect gnus-search eieio-opt speedbar ezimage dframe
ol-docview ol-bibtex ol-bbdb ol-w3m ol-doi org-link-doi org-agenda appt
diary-lib diary-loaddefs notifications org-tree-slide org-timer
org-clock org-capture org-refile org-protocol desktop-environment time
exwm-systemtray xcb-systemtray xcb-xembed exwm exwm-input xcb-keysyms
xcb-xkb exwm-manage exwm-floating xcb-cursor xcb-render exwm-layout
exwm-workspace exwm-core xcb-ewmh xcb-icccm xcb xcb-xproto xcb-types
xcb-debug engine-mode eww xdg mm-url eradio elfeed-show elfeed-search
vc-mtn vc-hg vc-git diff-mode vc-bzr vc-src vc-sccs vc-svn vc-cvs vc-rcs
vc vc-dispatcher editorconfig-core editorconfig-core-handle
editorconfig-fnmatch elfeed-csv elfeed elfeed-curl elfeed-log elfeed-db
elfeed-lib avl-tree url-queue xml-query mu4e-marker-icons cus-load mu4e
mu4e-org mu4e-main mu4e-view mu4e-view-gnus gnus-art mm-uu mml2015
mm-view mml-smime smime dig gnus-sum gnus-group gnus-undo gnus-start
gnus-dbus dbus gnus-cloud nnimap nnmail mail-source utf7 netrc nnoo
gnus-spec gnus-int gnus-range gnus-win gnus nnheader wid-edit
mu4e-view-common mu4e-headers mu4e-compose mu4e-context mu4e-draft
mu4e-actions ido rfc2368 smtpmail sendmail mu4e-mark mu4e-proc
mu4e-utils doc-view jka-compr image-mode exif mu4e-lists mu4e-message
shr kinsoku svg xml flow-fill org ob ob-tangle ob-ref ob-lob ob-table
ob-exp org-macro org-footnote org-src ob-comint org-pcomplete org-list
org-faces org-entities outline-minor-faces noutline outline org-version
ob-emacs-lisp ob-core ob-eval org-table oc-basic bibtex ol org-keys oc
org-compat org-macs org-loaddefs cal-menu calendar cal-loaddefs
mule-util hl-line mu4e-vars message rmc puny rfc822 mml mml-sec epa
derived epg rfc6068 epg-config gnus-util rmail rmail-loaddefs mm-decode
mm-bodies mm-encode mailabbrev gmm-utils mu4e-meta helpful trace edebug
backtrace info-look help-fns radix-tree elisp-refs beginend git-modes
gitignore-mode gitconfig-mode conf-mode gitattributes-mode ediff-merg
ediff-mult ediff-wind ediff-diff ediff-help ediff-init ediff-util
git-link eshell-prompt-extras em-dirs esh-var esh-mode em-ls
eshell-bookmark bookmark pp eshell package-lint let-alist finder
find-func lisp-mnt mail-parse rfc2231 rfc2047 rfc2045 mm-util ietf-drums
mail-prsvr powerthesaurus request mailheader autorevert filenotify
mail-utils dom virtualenvwrapper gitignore-templates editorconfig
sqlformat reformatter sql view jq-mode csv-mode sort web-mode advice
disp-table sh-script executable archive-rpm bindat archive-cpio
rpm-spec-mode yaml-mode json-mode json-snatcher js imenu cc-mode
cc-fonts cc-guess cc-menus cc-cmds cc-styles cc-align cc-engine cc-vars
cc-defs go-dlv gud go-mode find-file ffap thingatpt etags fileloop
generator xref project rspec-mode compile text-property-search
puppet-mode align ruby-mode smie doom-modeline doom-modeline-segments
doom-modeline-env doom-modeline-core shrink-path f dash
doom-themes-ext-visual-bell face-remap doom-tomorrow-night-theme
doom-themes doom-themes-base yasnippet-snippets yasnippet amx comp
comp-cstr warnings s all-the-icons-ivy-rich ivy-rich ivy-posframe
posframe all-the-icons-ivy ivy delsel ivy-faces ivy-overlay colir color
all-the-icons all-the-icons-faces data-material data-weathericons
data-octicons data-fileicons data-faicons data-alltheicons dired
dired-loaddefs pinentry tramp-cmds em-tramp esh-cmd esh-ext esh-opt
esh-module esh-groups esh-proc esh-io esh-arg esh-util tramp-sh tramp
tramp-loaddefs trampver tramp-integration files-x tramp-compat shell
pcomplete comint ansi-color ring parse-time iso8601 time-date ls-lisp
format-spec whole-line-or-region whitespace cl-extra help-mode
use-package use-package-ensure use-package-delight use-package-diminish
use-package-bind-key bind-key easy-mmode use-package-core finder-inf
edmacro kmacro avoid server rx info package browse-url url url-proxy
url-privacy url-expand url-methods url-history url-cookie url-domsuf
url-util mailcap url-handlers url-parse auth-source cl-seq eieio
eieio-core cl-macs eieio-loaddefs password-cache json subr-x map
url-vars seq byte-opt gv bytecomp byte-compile cconv cl-loaddefs cl-lib
iso-transl tooltip eldoc paren electric uniquify ediff-hook vc-hooks
lisp-float-type elisp-mode mwheel term/x-win x-win term/common-win x-dnd
tool-bar dnd fontset image regexp-opt fringe tabulated-list replace
newcomment text-mode lisp-mode prog-mode register page tab-bar menu-bar
rfn-eshadow isearch easymenu timer select scroll-bar mouse jit-lock
font-lock syntax font-core term/tty-colors frame minibuffer cl-generic
cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese composite emoji-zwj charscript
charprop case-table epa-hook jka-cmpr-hook help simple abbrev obarray
cl-preloaded nadvice button loaddefs faces cus-face macroexp files
window text-properties overlay sha1 md5 base64 format env code-pages
mule custom widget hashtable-print-readable backquote threads dbusbind
inotify lcms2 dynamic-setting system-font-setting font-render-setting
cairo move-toolbar gtk x-toolkit x multi-tty make-network-process
native-compile emacs)
Memory information:
((conses 16 1609065 256669)
(symbols 48 89566 7)
(strings 32 380963 47984)
(string-bytes 1 15303133)
(vectors 16 156871)
(vector-slots 8 3659229 445952)
(floats 8 1948 1629)
(intervals 56 33966 19519)
(buffers 992 308))
--
bye
Nacho
http://cern.ch/nacho
next reply other threads:[~2022-04-17 8:02 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-04-17 8:02 Nacho Barrientos [this message]
2022-04-17 11:00 ` bug#54989: 28.1; url-http.el: chunked response: wait for the last CRLF Lars Ingebrigtsen
2022-04-17 11:34 ` Nacho Barrientos
2022-04-17 11:42 ` Lars Ingebrigtsen
2022-04-18 9:13 ` Nacho Barrientos
2022-04-18 9:37 ` Lars Ingebrigtsen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.gnu.org/software/emacs/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87zgkkuo4i.fsf@cern.ch \
--to=nacho.barrientos@cern.ch \
--cc=54989@debbugs.gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).