From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: Mark H Weaver Newsgroups: gmane.lisp.guile.bugs Subject: bug#32528: http-post breaks with XML response payload containing boundary Date: Tue, 28 Aug 2018 23:28:19 -0400 Message-ID: <875zztg8bw.fsf@netris.org> References: <874lfiltkg.fsf@elephly.net> <87bm9mf9d9.fsf@netris.org> NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-Trace: blaine.gmane.org 1535514344 16340 195.159.176.226 (29 Aug 2018 03:45:44 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Wed, 29 Aug 2018 03:45:44 +0000 (UTC) User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux) Cc: 32528@debbugs.gnu.org To: Ricardo Wurmus Original-X-From: bug-guile-bounces+guile-bugs=m.gmane.org@gnu.org Wed Aug 29 05:45:40 2018 Return-path: Envelope-to: guile-bugs@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1furQJ-00047h-7x for guile-bugs@m.gmane.org; Wed, 29 Aug 2018 05:45:39 +0200 Original-Received: from localhost ([::1]:41019 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1furSP-0002sX-2F for guile-bugs@m.gmane.org; Tue, 28 Aug 2018 23:47:49 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:57164) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1furSB-0002p4-ES for bug-guile@gnu.org; Tue, 28 Aug 2018 23:47:38 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1furCB-0005qY-Nu for bug-guile@gnu.org; Tue, 28 Aug 2018 23:31:07 -0400 Original-Received: from debbugs.gnu.org ([208.118.235.43]:59464) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1furCA-0005oY-RE for bug-guile@gnu.org; Tue, 28 Aug 2018 23:31:03 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1furCA-0002mJ-IP for bug-guile@gnu.org; Tue, 28 Aug 2018 23:31:02 -0400 X-Loop: help-debbugs@gnu.org Resent-From: Mark H Weaver Original-Sender: "Debbugs-submit" Resent-CC: bug-guile@gnu.org Resent-Date: Wed, 29 Aug 2018 03:31:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 32528 X-GNU-PR-Package: guile X-GNU-PR-Keywords: Original-Received: via spool by 32528-submit@debbugs.gnu.org id=B32528.153551340410551 (code B ref 32528); Wed, 29 Aug 2018 03:31:02 +0000 Original-Received: (at 32528) by debbugs.gnu.org; 29 Aug 2018 03:30:04 +0000 Original-Received: from localhost ([127.0.0.1]:36248 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1furBD-0002k4-60 for submit@debbugs.gnu.org; Tue, 28 Aug 2018 23:30:04 -0400 Original-Received: from world.peace.net ([64.112.178.59]:36746) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1furBA-0002j9-TR for 32528@debbugs.gnu.org; Tue, 28 Aug 2018 23:30:01 -0400 Original-Received: from mhw by world.peace.net with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89) (envelope-from ) id 1furB4-0007hg-7j; Tue, 28 Aug 2018 23:29:54 -0400 In-Reply-To: <87bm9mf9d9.fsf@netris.org> (Mark H. Weaver's message of "Tue, 28 Aug 2018 17:51:14 -0400") X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 208.118.235.43 X-BeenThere: bug-guile@gnu.org List-Id: "Bug reports for GUILE, GNU's Ubiquitous Extension Language" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-guile-bounces+guile-bugs=m.gmane.org@gnu.org Original-Sender: "bug-guile" Xref: news.gmane.org gmane.lisp.guile.bugs:9139 Archived-At: --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Mark H Weaver writes: > Ricardo Wurmus writes: > >> I=E2=80=99m having a problem with http-post and I think it might be a bu= g. I=E2=80=99m >> talking to a Debbugs SOAP service over HTTP by sending (via POST) an XML >> request. The Debbugs SOAP service responds with a string of XML. [...] > The problem is simply that our Content-Type header parser is broken. > It's very simplistic and merely splits the string wherever ';' is found, > and then checks to make sure there's only one '=3D' in each parameter, > without taking into account that quoted strings in the parameters might > include those characters. > > I'll work on a proper parser for Content-Type headers. I've attached preliminary patches to fix the Content-Type header parser, and also to fix the parsing of response header lines to support continuation lines. With these patches applied, I'm able to fetch and decode the SOAP response that you fetched with your 'wget' example, as follows: --8<---------------cut here---------------start------------->8--- mhw@jojen ~/guile-stable-2.2 [env]$ meta/guile GNU Guile 2.2.4.10-4c91d Copyright (C) 1995-2017 Free Software Foundation, Inc. Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'. This program is free software, and you are welcome to redistribute it under certain conditions; type `,show c' for details. Enter `,help' for help. scheme@(guile-user)> (use-modules (web http) (web uri) (web client) (sxml s= imple) (ice-9 receive)) scheme@(guile-user)> ,pp (let ((req-xml "32= 514")) (receive (response body-port) (http-post "https://debbugs.gnu.org/cgi/soap= .cgi" #:streaming? #t #:body req-xml #:headers `((content-type . (text/xml)) (content-length . ,(string-leng= th req-xml)))) (set-port-encoding! body-port "UTF-8") (xml->sxml body-port #:trim-whitespace? #t))) $1 =3D (*TOP* (*PI* xml "version=3D\"1.0\" encoding=3D\"UTF-8\"") (http://schemas.xmlsoap.org/soap/envelope/:Envelope (@ (http://schemas.xmlsoap.org/soap/envelope/:encodingStyle "http://schemas.xmlsoap.org/soap/encoding/")) (http://schemas.xmlsoap.org/soap/envelope/:Body (urn:Debbugs/SOAP:get_bug_logResponse (http://schemas.xmlsoap.org/soap/encoding/:Array (@ (http://www.w3.org/1999/XMLSchema-instance:type "soapenc:Array") (http://schemas.xmlsoap.org/soap/encoding/:arrayType "xsd:ur-type[4]")) (urn:Debbugs/SOAP:item (urn:Debbugs/SOAP:header (@ (http://www.w3.org/1999/XMLSchema-instance:type "xsd:string")) "Received: (at submit) by debbugs.gnu.org; 23 Aug 2018 2= 0:17:46 +0000\nFrom debbugs-submit-bounces@debbugs.gnu.org [...] [...] --8<---------------cut here---------------end--------------->8--- Note that I needed to make two other changes to your preliminary code, namely: * I passed "#:streaming? #t" to 'http-post', to ask for a port to read the response body instead of reading it eagerly. * I explicitly set the port encoding to "UTF-8" on that port before using 'xml->sxml' to read it. Otherwise, the entire 'body' response will be returned as a bytevector, because the response Content-Type is not recognized as a textual type. The HTTP Content-Type is "multipart/related", with a parameter: type=3D"text/xml". I'm not sure if we should be automatically interpreting that as a textual type or not. There's no 'charset' parameter in the Content-Type header, but the XML internally specifies: encoding=3D"UTF-8". Anyway, here are the preliminary patches. Mark --=-=-= Content-Type: text/x-patch; charset=utf-8 Content-Disposition: inline; filename=0001-web-Add-support-for-HTTP-header-continuation-lines.patch Content-Transfer-Encoding: quoted-printable Content-Description: [PATCH 1/2] web: Add support for HTTP header continuation lines >From 41764d60dba80126b3c97f883d0225510b55f3fa Mon Sep 17 00:00:00 2001 From: Mark H Weaver Date: Tue, 28 Aug 2018 18:39:34 -0400 Subject: [PATCH 1/2] web: Add support for HTTP header continuation lines. * module/web/http.scm (spaces-and-tabs, space-or-tab?): New variables. (read-header-line): After reading a header, if a space or tab follows, then read the continuation lines and append them all together. --- module/web/http.scm | 31 ++++++++++++++++++++++++------- 1 file changed, 24 insertions(+), 7 deletions(-) diff --git a/module/web/http.scm b/module/web/http.scm index de61c9495..15f173173 100644 --- a/module/web/http.scm +++ b/module/web/http.scm @@ -1,6 +1,6 @@ ;;; HTTP messages =20 -;; Copyright (C) 2010-2017 Free Software Foundation, Inc. +;; Copyright (C) 2010-2018 Free Software Foundation, Inc. =20 ;; This library is free software; you can redistribute it and/or ;; modify it under the terms of the GNU Lesser General Public @@ -152,18 +152,35 @@ The default writer will call =E2=80=98put-string=E2= =80=99." (lambda (val port) (put-string port val))))) =20 +(define spaces-and-tabs + (char-set #\space #\tab)) + +(define (space-or-tab? c) + (case c + ((#\space #\tab) #t) + (else #f))) + (define (read-header-line port) - "Read an HTTP header line and return it without its final CRLF or LF. -Raise a 'bad-header' exception if the line does not end in CRLF or LF, -or if EOF is reached." + "Read an HTTP header line, including any continuation lines, and +return the combined string without its final CRLF or LF. Raise a +'bad-header' exception if the line does not end in CRLF or LF, or if EOF +is reached." (match (%read-line port) (((? string? line) . #\newline) ;; '%read-line' does not consider #\return a delimiter; so if it's ;; there, remove it. We are more tolerant than the RFC in that we ;; tolerate LF-only endings. - (if (string-suffix? "\r" line) - (string-drop-right line 1) - line)) + (let ((line (if (string-suffix? "\r" line) + (string-drop-right line 1) + line))) + ;; If the next character is a space or tab, then there's at least + ;; one continuation line. Read the continuation lines by calling + ;; 'read-header-line' recursively, and append them to this header + ;; line, folding the leading spaces and tabs to a single space. + (if (space-or-tab? (lookahead-char port)) + (string-append line " " (string-trim (read-header-line port) + spaces-and-tabs)) + line))) ((line . _) ;EOF or missing delimiter (bad-header 'read-header-line line)))) =20 --=20 2.18.0 --=-=-= Content-Type: text/x-patch Content-Disposition: inline; filename=0002-PRELIMINARY-web-Fix-parsing-of-HTTP-Content-Type-hea.patch Content-Description: [PATCH 2/2] PRELIMINARY: web: Fix parsing of HTTP Content-Type header >From 6af35a3997887fe24620fc7448ded3649e04b82b Mon Sep 17 00:00:00 2001 From: Mark H Weaver Date: Tue, 28 Aug 2018 23:15:36 -0400 Subject: [PATCH 2/2] PRELIMINARY: web: Fix parsing of HTTP Content-Type header. --- module/web/http.scm | 109 +++++++++++++++++++++++++++++++++++--------- 1 file changed, 88 insertions(+), 21 deletions(-) diff --git a/module/web/http.scm b/module/web/http.scm index 15f173173..6ccd853c1 100644 --- a/module/web/http.scm +++ b/module/web/http.scm @@ -290,16 +290,94 @@ as an ordered alist." (define (write-opaque-string val port) (put-string port val)) -(define separators-without-slash - (string->char-set "[^][()<>@,;:\\\"?= \t]")) -(define (validate-media-type str) - (let ((idx (string-index str #\/))) - (and idx (= idx (string-rindex str #\/)) - (not (string-index str separators-without-slash))))) +(define separators + (string->char-set "()<>@,;:\\\"/[]?={} \t")) + +(define (ascii-char? c) + (char-set-contains? char-set:ascii c)) + +(define valid-token-chars + (char-set-difference char-set:ascii + char-set:iso-control + separators)) + +(define (valid-token? str) + (and (not (string-null? str)) + (string-every valid-token-chars str))) + +(define (string-skip* s pred i) + (or (string-skip s pred i) + (string-length s))) + +(define (parse-token str i) + (let* ((i (string-skip* str spaces-and-tabs i)) + (end (string-skip* str valid-token-chars i))) + (and (< i end) + (cons end (substring str i end))))) + +(define valid-text-chars + (char-set-adjoin (char-set-difference (ucs-range->char-set 0 256) + char-set:iso-control) + #\space #\tab)) + +(define (text-char? c) + (char-set-contains? valid-text-chars c)) + +(define (parse-quoted-string str i) + (let ((len (string-length str)) + (i (string-skip* str spaces-and-tabs i))) + (and (< i len) + (eqv? #\" (string-ref str i)) + (let loop ((i (+ i 1)) + (accum '())) + (and (< i len) + (match (string-ref str i) + (#\" (cons (+ i 1) (reverse-list->string accum))) + (#\\ (and (< (+ i 1) len) + (let ((c (string-ref str (+ i 1)))) + (and (ascii-char? c) + (loop (+ i 2) (cons c accum)))))) + (c (and (text-char? c) + (loop (+ i 1) (cons c accum)))))))))) + +(define (parse-parameter str i) + (let* ((eq (string-index str #\= i)) + (attribute (string-trim-both (substring str i eq) + spaces-and-tabs))) + (and (valid-token? attribute) + (match (or (parse-token str (+ eq 1)) + (parse-quoted-string str (+ eq 1))) + ((i . val) (cons i (cons (string->symbol attribute) val))) + (#f #f))))) + +(define (parse-parameter-list str i) + (let ((len (string-length str)) + (i (string-skip* str spaces-and-tabs i))) + (if (= i len) + '() + (and (< i len) + (eqv? #\; (string-ref str i)) + (match (parse-parameter str (+ i 1)) + (#f #f) + ((i . p) (match (parse-parameter-list str i) + (#f #f) + (lst (cons p lst))))))))) + (define (parse-media-type str) - (unless (validate-media-type str) - (bad-header-component 'media-type str)) - (string->symbol str)) + (let* ((i (or (string-index str #\;) + (string-length str))) + (params (parse-parameter-list str i))) + (or (match (string-split (substring str 0 i) #\/) + ((type* subtype*) + (let ((type (string-trim-both type* spaces-and-tabs)) + (subtype (string-trim-both subtype* spaces-and-tabs))) + (and (valid-token? type) + (valid-token? subtype) + params + (cons (string->symbol (string-append type "/" subtype)) + params)))) + (_ #f)) + (bad-header 'content-type str)))) (define* (skip-whitespace str #:optional (start 0) (end (string-length str))) (let lp ((i start)) @@ -1617,18 +1695,7 @@ treated specially, and is just returned as a plain string." ;; Content-Type = media-type ;; (declare-header! "Content-Type" - (lambda (str) - (let ((parts (string-split str #\;))) - (cons (parse-media-type (car parts)) - (map (lambda (x) - (let ((eq (string-index x #\=))) - (unless (and eq (= eq (string-rindex x #\=))) - (bad-header 'content-type str)) - (cons - (string->symbol - (string-trim x char-set:whitespace 0 eq)) - (string-trim-right x char-set:whitespace (1+ eq))))) - (cdr parts))))) + parse-media-type (lambda (val) (match val (((? symbol?) ((? symbol?) . (? string?)) ...) #t) -- 2.18.0 --=-=-=--