unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#54989: 28.1; url-http.el: chunked response: wait for the last CRLF
@ 2022-04-17  8:02 Nacho Barrientos
  2022-04-17 11:00 ` Lars Ingebrigtsen
  0 siblings, 1 reply; 6+ messages in thread
From: Nacho Barrientos @ 2022-04-17  8:02 UTC (permalink / raw)
  To: 54989

[-- Attachment #1: Type: text/plain, Size: 2787 bytes --]

Hi,

This is a bug report for url.el, more specifically for url-http.el in
the context of processing responses with Transfer-Encoding set to
"chunked".

As per [0], the last chunk of 0 bytes is always accompanied by a last
CRLF that signals the end of the message:

     chunked-body   = *chunk
                      last-chunk
                      trailer-part
                      CRLF
                      ^ this one

     chunk          = chunk-size [ chunk-ext ] CRLF
                      chunk-data CRLF
     chunk-size     = 1*HEXDIG
     last-chunk     = 1*("0") [ chunk-ext ] CRLF

     chunk-data     = 1*OCTET ; a sequence of chunk-size octets

`url-http-chunked-encoding-after-change-function' is able to process
(and remove) that terminator IF AVAILABLE in the buffer when
processing the response, however it won't wait for it if it's not yet
there.

In other words:

| Bottom of the response buffer | Bottom of the full response |
|    (visible to url-http)      | (to be delivered to Emacs)  |
| ------------------------------+-----------------------------|
| 0\r\n                         | 0\r\n                       |
|                               | \r\n                        |

If the last chunk is processed when the bottom of the response buffer
is as above (note that the whole response has not yet been delivered
to Emacs), url-http will call the user callback without waiting for
the final terminator to be read from the socket.

This is normally not an issue when doing one-shot requests, but it's
problematic when the connection is reused immediately. As there are 2
bytes from the request N that have not been dealt with, they'll be
considered as part of the response of the request N+1. On top, it turns
out that when processing the headers of request N+1,
`url-http-wait-for-headers-change-function' will consider the request a
"headerless malformed response" (due to the leading \r\n) delivering it
broken to the caller.

I'm attaching a patch that I've got applied locally that prevents the
problem from happening by implementing an extra state in which
`url-http-chunked-encoding-after-change-function` properly waits for the
very last element of the message.

Unfortunately my copyright papers are stuck on the FSF for months now so
I'm unable to submit a patch directly myself so feel free to use the
attached diff as a PoC to illustrate the nature of the problem and a
possible fix. A more experienced Elisp developer will surely find a more
elegant solution.

Please note that the patch sits on top of the emacs-28 branch.

For additional context, this bug was found when debugging
magit/ghub (see [1] for details).

Hope this helps.

[0] https://datatracker.ietf.org/doc/html/rfc7230#section-4.1
[1] https://github.com/magit/ghub/issues/81


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-url-http.el-chunked-response-wait-for-the-last-CRLF.patch --]
[-- Type: text/x-patch, Size: 4829 bytes --]

From a11454602f7c4af3af5729b9e504130c660319f3 Mon Sep 17 00:00:00 2001
From: Nacho Barrientos <nacho.barrientos@cern.ch>
Date: Sat, 16 Apr 2022 18:00:39 +0200
Subject: [PATCH] url-http.el: chunked response: wait for the last CRLF.

As per [0], the last chunk of 0 bytes is always accompanied by a last
CRLF that signals the end of the message:

     chunked-body   = *chunk
                      last-chunk
                      trailer-part
                      CRLF
                      ^ this one

     chunk          = chunk-size [ chunk-ext ] CRLF
                      chunk-data CRLF
     chunk-size     = 1*HEXDIG
     last-chunk     = 1*("0") [ chunk-ext ] CRLF

     chunk-data     = 1*OCTET ; a sequence of chunk-size octets

`url-http-chunked-encoding-after-change-function' is able to process
(and remove) that terminator IF AVAILABLE in the buffer when
processing the response, however it won't wait for it if it's not yet
there.

In other words:

| Bottom of the response buffer | Bottom of the full response |
|    (visible to url-http)      | (to be delivered to Emacs)  |
| ------------------------------+-----------------------------|
| 0\r\n                         | 0\r\n                       |
|                               | \r\n                        |

If the last chunk is processed when the bottom of the response buffer
is as above (note that the whole response has not yet been delivered
to Emacs), url-http will call the user callback without waiting for
the final terminator to be read from the socket.

This is normally not an issue when doing one-shot requests, but it's
problematic when the connection is reused immediately. As there are 2
bytes from the request N that have not been dealt with, they'll be
considered as part of the response of the request N+1. On top, it
turns out that when processing the headers of request N+1,
`url-http-wait-for-headers-change-function' will consider the request
a "headerless malformed response" delivering it broken to the caller.

The proposed fix implements a state in which
`url-http-chunked-encoding-after-change-function` properly waits for
the very last element of the message preventing the problem explained
above from happening.

For additional context, this bug was found when debugging
magit/ghub (see [1] for details).

[0] https://datatracker.ietf.org/doc/html/rfc7230#section-4.1
[1] https://github.com/magit/ghub/issues/81
---
 lisp/url/url-http.el | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/lisp/url/url-http.el b/lisp/url/url-http.el
index 16c3a6a1e6..dfdc03cc6c 100644
--- a/lisp/url/url-http.el
+++ b/lisp/url/url-http.el
@@ -36,6 +36,7 @@
 (defvar url-current-object)
 (defvar url-http-after-change-function)
 (defvar url-http-chunked-counter)
+(defvar url-http-chunked-last-crlf-missing nil)
 (defvar url-http-chunked-length)
 (defvar url-http-chunked-start)
 (defvar url-http-connection-opened)
@@ -1068,7 +1069,16 @@ the callback to be triggered."
 Cannot give a sophisticated percentage, but we need a different
 function to look for the special 0-length chunk that signifies
 the end of the document."
-  (save-excursion
+  (if url-http-chunked-last-crlf-missing
+      (progn
+        (goto-char url-http-chunked-last-crlf-missing)
+        (if (not (looking-at "\r\n"))
+	    (url-http-debug "Still spinning for the terminator of last chunk...")
+          (url-http-debug "Saw the last CRLF.")
+          (delete-region (match-beginning 0) (match-end 0))
+          (if (url-http-parse-headers)
+	         (url-http-activate-callback))))
+    (save-excursion
     (goto-char st)
     (let ((read-next-chunk t)
 	  (case-fold-search t)
@@ -1145,13 +1155,16 @@ the end of the document."
 		  (url-display-percentage nil nil)
 		  ;; Every chunk, even the last 0-length one, is
 		  ;; terminated by CRLF.  Skip it.
-		  (when (looking-at "\r?\n")
+		  (if (not (looking-at "\r?\n"))
+                      (progn
+	                (url-http-debug "Spinning for the terminator of last chunk...")
+                        (setq-local url-http-chunked-last-crlf-missing (point)))
 		    (url-http-debug "Removing terminator of last chunk")
-		    (delete-region (match-beginning 0) (match-end 0)))
-		  (if (re-search-forward "^\r?\n" nil t)
-		      (url-http-debug "Saw end of trailers..."))
-		  (if (url-http-parse-headers)
-		      (url-http-activate-callback))))))))))
+		    (delete-region (match-beginning 0) (match-end 0))
+		    (if (re-search-forward "^\r?\n" nil t)
+		        (url-http-debug "Saw end of trailers..."))
+		    (if (url-http-parse-headers)
+		        (url-http-activate-callback))))))))))))
 
 (defun url-http-wait-for-headers-change-function (_st nd _length)
   ;; This will wait for the headers to arrive and then splice in the
-- 
2.35.3


[-- Attachment #3: Type: text/plain, Size: 9892 bytes --]



In GNU Emacs 28.1 (build 2, x86_64-pc-linux-gnu, GTK+ Version 3.24.33, cairo version 1.17.6)
 of 2022-04-09 built on frederik
Windowing system distributor 'The X.Org Foundation', version 11.0.12101003
System Description: Arch Linux

Configured using:
 'configure --with-native-compilation --without-compress-install
 --sysconfdir=/etc --prefix=/usr --libexecdir=/usr/lib
 --localstatedir=/var --with-cairo --with-harfbuzz --with-modules
 --with-wide-int --with-x-toolkit=gtk3 --with-xft 'CFLAGS=-march=x86-64
 -mtune=generic -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2
 -Wformat -Werror=format-security -fstack-clash-protection
 -fcf-protection -g -ffile-prefix-map=/build/emacs/src=/usr/src/debug
 -flto=auto' 'LDFLAGS=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now
 -flto=auto''

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
JSON LCMS2 LIBOTF LIBSYSTEMD LIBXML2 M17N_FLT MODULES NATIVE_COMP NOTIFY
INOTIFY PDUMPER PNG RSVG SECCOMP SOUND THREADS TIFF TOOLKIT_SCROLL_BARS
X11 XDBE XIM XPM GTK3 ZLIB

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: ELisp/l

Minor modes in effect:
  recentf-mode: t
  bug-reference-prog-mode: t
  global-git-commit-mode: t
  magit-auto-revert-mode: t
  goto-address-prog-mode: t
  erc-services-mode: t
  erc-networks-mode: t
  erc-spelling-mode: t
  flyspell-mode: t
  display-time-mode: t
  engine-mode: t
  mu4e-marker-icons-mode: t
  outline-minor-mode: t
  beginend-global-mode: t
  beginend-prog-mode: t
  auto-revert-mode: t
  editorconfig-mode: t
  csv-field-index-mode: t
  doom-modeline-mode: t
  yas-global-mode: t
  yas-minor-mode: t
  all-the-icons-ivy-rich-mode: t
  ivy-rich-project-root-cache-mode: t
  ivy-rich-mode: t
  ivy-posframe-mode: t
  ivy-mode: t
  shell-dirtrack-mode: t
  whole-line-or-region-global-mode: t
  whole-line-or-region-local-mode: t
  global-whitespace-mode: t
  override-global-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  window-divider-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  auto-fill-function: yas--auto-fill
  transient-mark-mode: t
  hs-minor-mode: t

Load-path shadows:
/home/nacho/.emacs.d/elpa/transient-20220413.2332/transient hides /usr/share/emacs/28.1/lisp/transient

Features:
(shadow magit-patch magit-subtree magit-gitignore magit-ediff ediff
bicycle crux pulse expand-region yaml-mode-expansions
text-mode-expansions cc-mode-expansions the-org-mode-expansions
ruby-mode-expansions js-mode-expansions web-mode-expansions
er-basic-expansions expand-region-core expand-region-custom ielm rg
rg-info-hack rg-menu rg-ibuffer rg-result wgrep-rg wgrep rg-history
rg-header ibuf-ext ibuffer ibuffer-loaddefs grep cus-edit smiley
gnus-cite qp mm-archive mail-extr cl-print debug cus-start dired-aux
emacsbug mwim hippie-exp pcase counsel recentf tree-widget swiper
network-stream url-cache secrets forge-list forge-commands forge-semi
forge-bitbucket buck forge-gogs gogs forge-gitea gtea forge-gitlab glab
forge-github ghub-graphql treepy gsexp ghub gnutls forge-notify
forge-revnote forge-pullreq forge-issue forge-topic yaml bug-reference
forge-post markdown-mode forge-repo forge forge-core forge-db closql
emacsql-sqlite emacsql emacsql-compiler url-http url-auth url-gw nsm
magit-extras magit-bookmark magit-submodule magit-obsolete magit-blame
magit-stash magit-reflog magit-bisect magit-push magit-pull magit-fetch
magit-clone magit-remote magit-commit magit-sequence magit-notes
magit-worktree magit-tag magit-merge magit-branch magit-reset
magit-files magit-refs magit-status magit magit-repos magit-apply
magit-wip magit-log which-func magit-diff smerge-mode diff git-commit
log-edit pcvs-util add-log magit-core magit-autorevert magit-margin
magit-transient magit-process with-editor magit-mode transient magit-git
magit-base magit-section crm em-unix em-term term ehelp em-script
em-prompt em-hist em-pred em-glob em-cmpl em-basic em-banner em-alias
hideshow goto-addr erc-services erc-join erc-networks erc-track
erc-match erc-spelling erc-goodies erc erc-backend erc-loaddefs quail
flyspell ispell org-element ol-eww ol-rmail ol-mhe ol-irc ol-info
ol-gnus nnselect gnus-search eieio-opt speedbar ezimage dframe
ol-docview ol-bibtex ol-bbdb ol-w3m ol-doi org-link-doi org-agenda appt
diary-lib diary-loaddefs notifications org-tree-slide org-timer
org-clock org-capture org-refile org-protocol desktop-environment time
exwm-systemtray xcb-systemtray xcb-xembed exwm exwm-input xcb-keysyms
xcb-xkb exwm-manage exwm-floating xcb-cursor xcb-render exwm-layout
exwm-workspace exwm-core xcb-ewmh xcb-icccm xcb xcb-xproto xcb-types
xcb-debug engine-mode eww xdg mm-url eradio elfeed-show elfeed-search
vc-mtn vc-hg vc-git diff-mode vc-bzr vc-src vc-sccs vc-svn vc-cvs vc-rcs
vc vc-dispatcher editorconfig-core editorconfig-core-handle
editorconfig-fnmatch elfeed-csv elfeed elfeed-curl elfeed-log elfeed-db
elfeed-lib avl-tree url-queue xml-query mu4e-marker-icons cus-load mu4e
mu4e-org mu4e-main mu4e-view mu4e-view-gnus gnus-art mm-uu mml2015
mm-view mml-smime smime dig gnus-sum gnus-group gnus-undo gnus-start
gnus-dbus dbus gnus-cloud nnimap nnmail mail-source utf7 netrc nnoo
gnus-spec gnus-int gnus-range gnus-win gnus nnheader wid-edit
mu4e-view-common mu4e-headers mu4e-compose mu4e-context mu4e-draft
mu4e-actions ido rfc2368 smtpmail sendmail mu4e-mark mu4e-proc
mu4e-utils doc-view jka-compr image-mode exif mu4e-lists mu4e-message
shr kinsoku svg xml flow-fill org ob ob-tangle ob-ref ob-lob ob-table
ob-exp org-macro org-footnote org-src ob-comint org-pcomplete org-list
org-faces org-entities outline-minor-faces noutline outline org-version
ob-emacs-lisp ob-core ob-eval org-table oc-basic bibtex ol org-keys oc
org-compat org-macs org-loaddefs cal-menu calendar cal-loaddefs
mule-util hl-line mu4e-vars message rmc puny rfc822 mml mml-sec epa
derived epg rfc6068 epg-config gnus-util rmail rmail-loaddefs mm-decode
mm-bodies mm-encode mailabbrev gmm-utils mu4e-meta helpful trace edebug
backtrace info-look help-fns radix-tree elisp-refs beginend git-modes
gitignore-mode gitconfig-mode conf-mode gitattributes-mode ediff-merg
ediff-mult ediff-wind ediff-diff ediff-help ediff-init ediff-util
git-link eshell-prompt-extras em-dirs esh-var esh-mode em-ls
eshell-bookmark bookmark pp eshell package-lint let-alist finder
find-func lisp-mnt mail-parse rfc2231 rfc2047 rfc2045 mm-util ietf-drums
mail-prsvr powerthesaurus request mailheader autorevert filenotify
mail-utils dom virtualenvwrapper gitignore-templates editorconfig
sqlformat reformatter sql view jq-mode csv-mode sort web-mode advice
disp-table sh-script executable archive-rpm bindat archive-cpio
rpm-spec-mode yaml-mode json-mode json-snatcher js imenu cc-mode
cc-fonts cc-guess cc-menus cc-cmds cc-styles cc-align cc-engine cc-vars
cc-defs go-dlv gud go-mode find-file ffap thingatpt etags fileloop
generator xref project rspec-mode compile text-property-search
puppet-mode align ruby-mode smie doom-modeline doom-modeline-segments
doom-modeline-env doom-modeline-core shrink-path f dash
doom-themes-ext-visual-bell face-remap doom-tomorrow-night-theme
doom-themes doom-themes-base yasnippet-snippets yasnippet amx comp
comp-cstr warnings s all-the-icons-ivy-rich ivy-rich ivy-posframe
posframe all-the-icons-ivy ivy delsel ivy-faces ivy-overlay colir color
all-the-icons all-the-icons-faces data-material data-weathericons
data-octicons data-fileicons data-faicons data-alltheicons dired
dired-loaddefs pinentry tramp-cmds em-tramp esh-cmd esh-ext esh-opt
esh-module esh-groups esh-proc esh-io esh-arg esh-util tramp-sh tramp
tramp-loaddefs trampver tramp-integration files-x tramp-compat shell
pcomplete comint ansi-color ring parse-time iso8601 time-date ls-lisp
format-spec whole-line-or-region whitespace cl-extra help-mode
use-package use-package-ensure use-package-delight use-package-diminish
use-package-bind-key bind-key easy-mmode use-package-core finder-inf
edmacro kmacro avoid server rx info package browse-url url url-proxy
url-privacy url-expand url-methods url-history url-cookie url-domsuf
url-util mailcap url-handlers url-parse auth-source cl-seq eieio
eieio-core cl-macs eieio-loaddefs password-cache json subr-x map
url-vars seq byte-opt gv bytecomp byte-compile cconv cl-loaddefs cl-lib
iso-transl tooltip eldoc paren electric uniquify ediff-hook vc-hooks
lisp-float-type elisp-mode mwheel term/x-win x-win term/common-win x-dnd
tool-bar dnd fontset image regexp-opt fringe tabulated-list replace
newcomment text-mode lisp-mode prog-mode register page tab-bar menu-bar
rfn-eshadow isearch easymenu timer select scroll-bar mouse jit-lock
font-lock syntax font-core term/tty-colors frame minibuffer cl-generic
cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese composite emoji-zwj charscript
charprop case-table epa-hook jka-cmpr-hook help simple abbrev obarray
cl-preloaded nadvice button loaddefs faces cus-face macroexp files
window text-properties overlay sha1 md5 base64 format env code-pages
mule custom widget hashtable-print-readable backquote threads dbusbind
inotify lcms2 dynamic-setting system-font-setting font-render-setting
cairo move-toolbar gtk x-toolkit x multi-tty make-network-process
native-compile emacs)

Memory information:
((conses 16 1609065 256669)
 (symbols 48 89566 7)
 (strings 32 380963 47984)
 (string-bytes 1 15303133)
 (vectors 16 156871)
 (vector-slots 8 3659229 445952)
 (floats 8 1948 1629)
 (intervals 56 33966 19519)
 (buffers 992 308))

--
 bye
 Nacho
 http://cern.ch/nacho

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-04-18  9:37 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-04-17  8:02 bug#54989: 28.1; url-http.el: chunked response: wait for the last CRLF Nacho Barrientos
2022-04-17 11:00 ` Lars Ingebrigtsen
2022-04-17 11:34   ` Nacho Barrientos
2022-04-17 11:42     ` Lars Ingebrigtsen
2022-04-18  9:13       ` Nacho Barrientos
2022-04-18  9:37         ` Lars Ingebrigtsen

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).