unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
From: Skyler <skyvine@protonmail.com>
To: "guile-devel@gnu.org" <guile-devel@gnu.org>
Subject: Re: [PATCH] Make HTTP header symbols case-insensitive
Date: Sun, 10 Sep 2023 22:46:32 +0000	[thread overview]
Message-ID: <ukIfc02Uk6ZMa5P_3J5EG5Rst2qQaizvMZPkXbiktBKJsNubVd9v9oaFkyf4IoUt3gNMyL3IMyxMVNH5SgeT93iPbwNrgvohFfNC3_raWr0=@protonmail.com> (raw)
In-Reply-To: <ua8ZKLJQY8AZjq-JzcisRQGtms3Iw-ljjFRNlEeestQzF6hr1i40J5STuz7pHw3RV-7f0f7hcoFydku42tuiUGtaocTd9Vi9JUAIk1v5Q6w=@protonmail.com>

[-- Attachment #1: Type: text/plain, Size: 9026 bytes --]

Hello,

I just wanted to add some more context beyond what seemed appropriate to include in a commit message. Also, I don't think I've emailed the Guile list before but I'm not really sure what to do as an introduction. I'm just enjoy using guile, this seems like a problem, and I have a patch that fixes it. I would seem rude not to send it.

You can see the problem alluded to in the commit by running a socat listener on one terminal:

```
socat - TCP-LISTEN:65388
``

And from another run this Guile code:

```
(use-modules (web client))
(http-request "http://127.0.0.1:65388"
#:body "<some-tag></some-tag>"
#:headers '((Content-Type . "application/xml")
(Content-Length . "1000000000")))
```

The socat terminal will show that Guile sent this data:

```
GET / HTTP/1.1
Content-Length: 33
Content-Type: text/plain;charset=utf-8
Host: 127.0.0.1:65388
Connection: close
Content-Type: application/xml
Content-Length: 1000000000

<some-tag></some-tag>
```

Note that there are 2 instances of both Content-Length and Content-Type, which are singleton fields. The first set were generated by Guile while the second set contained the user values.

To be fair, the Guile code is incorrect. Neither the Content-Type nor the Content-Length headers take in strings as their types, and the validators would have thrown an error if they were run on the user-given values. However, due to the case sensitivity differences between Scheme and HTTP, the web module does not sanitize these headers but the remote server still processes them. I think it would be helpful if Guile was more robust in this regard; one of the purposes of high-level APIs like `http-request` (which calls `sanitize-request`) is to detect and prevent user errors (for example, it currently double-checks the provided Content-Length, if any).

The patch in the previous email updates the build-request and build-response functions to convert all header names to lowercase. I considered 2 alternatives:

1. Throw an error if header names contained uppercase letters. This did not seem appropriate as HTTP is case-insensitive.
2. Update the request/response accessors to be case-insensitive. This seems likely to cause performance issues.

After applying the patch, the web module threw errors for the incorrect types, then the incorrect length, then once all problems were corrected it sent a well-formed request:

```
(use-modules (web client))
(http-request "http://127.0.0.1:65388"
#:body "<some-tag></some-tag>"
#:headers '((Content-Type . (application/xml))
(Content-Length . 28)))
```

```
GET / HTTP/1.1
Host: 127.0.0.1:65388
Connection: close
Content-Type: application/xml;charset=utf-8
Content-Length: 28

<some-tag></some-tag>
```

Note that there is now only a single copy each of Content-Type and Content-Length. Guile appended the charset to the user-supplied Content-Type and the Content-Length field contains an accurate value.

A similar test can be performed for responses:

```
(use-modules (web response) (web server))

(define (handler request request-body)
(format #t "~a~%" request)
(when request-body
(format #t "~a~%" request-body))
(values (build-response #:headers '((Content-Length . "5454"))
#:code 500
#:reason-phrase "This server does not know how to process literally any request, no matter how well-formed.")
#f))

(run-server handler 'http '(#:port 65388))
```

If you run this script in one terminal and send a well-formed http request with the previous script in another, Guile will currently send a malformed response and the sending script will print a parsing error. After applying the patch, the error will occur in the listen script when the validator runs on the Content-Length header.

Thanks,
Skyler

------- Original Message -------
On Sunday, September 10th, 2023 at 3:34 PM, Skyler <skyvine@protonmail.com> wrote:

> The web modules contain logic for operating on headers, including logic
> to convert Scheme data to header values, validate the values of
> well-known headers, and automatically add missing headers. When working
> with headers internally, they are stored as lowercase symbols as is
> idiomatic for Scheme code. When writing out the requests, the headers
> are converted to title-case, as is idiomatic to HTTP (although HTTP
> header names are technically defined to be case-insensitive, unlike
> Scheme symbols).
>
> This can cause issues if titlecase symbols are used in the header lists
> of functions such as http-request from the (web client) module. In this
> case, the headers will not be recognized as well-defined headers, and
> they will be validated with the generic validators instead of the
> specific ones.
>
> This commit converts header symbols to be all lowercase in the
> build-request and build-response functions, to make sure that the system
> handles the headers consistently.
> ---
> doc/ref/web.texi | 4 ++++
> module/web/http.scm | 8 ++++++++
> module/web/request.scm | 9 +++++----
> module/web/response.scm | 27 ++++++++++++++-------------
> 4 files changed, 31 insertions(+), 17 deletions(-)
>
> diff --git a/doc/ref/web.texi b/doc/ref/web.texi
> index 607c855..4572b5a 100644
> --- a/doc/ref/web.texi
> +++ b/doc/ref/web.texi
> @@ -471,6 +471,10 @@ Return a true value if @var{val} is a valid Scheme value for the header
> with name @var{sym}, or @code{#f} otherwise.
> @end deffn
>
> +@deffn {Scheme Procedure} canonicalize-headers headers
> +Ensure that the headers are in a canonical Scheme format, in particular
> +this converts all header names to lowercase.
> +
> Now that we have a generic interface for reading and writing headers, we
> do just that.
>
> diff --git a/module/web/http.scm b/module/web/http.scm
> index 24a4312..9b241dc 100644
> --- a/module/web/http.scm
> +++ b/module/web/http.scm
> @@ -40,6 +40,7 @@
> #:use-module (web uri)
> #:export (string->header
> header->string
> + canonicalize-headers
>
> declare-header!
> declare-opaque-header!
> @@ -80,6 +81,13 @@
> (define (put-non-negative-integer port i)
> (put-string port (number->string i)))
>
> +(define (canonicalize-headers headers)
> + "Ensure that the symbolic header name is in lowercase."
> + (map (lambda (header)
> + (cons (string->symbol (string-downcase (symbol->string (car header))))
> + (cdr header)))
> + headers))
> +
> (define (string->header name)
> "Parse NAME to a symbolic header name."
> (string->symbol (string-downcase name)))
> diff --git a/module/web/request.scm b/module/web/request.scm
> index ff4b944..c7366da 100644
> --- a/module/web/request.scm
> +++ b/module/web/request.scm
> @@ -162,7 +162,8 @@
> (validate-headers? #t))
> "Construct an HTTP request object. If VALIDATE-HEADERS? is true,
> the headers are each run through their respective validators."
> - (let ((needs-host? (and (equal? version '(1 . 1))
> + (let ((canonicalized-headers (canonicalize-headers headers))
> + (needs-host? (and (equal? version '(1 . 1))
> (not (assq-ref headers 'host)))))
> (cond
> ((not (and (pair? version)
> @@ -180,12 +181,12 @@ the headers are each run through their respective validators."
> uri))
> (else
> (if validate-headers?
> - (validate-headers headers))))
> + (validate-headers canonicalized-headers))))
> (make-request method uri version
> (if needs-host?
> (acons 'host (cons (uri-host uri) (uri-port uri))
> - headers)
> - headers)
> + canonicalized-headers)
> + canonicalized-headers)
> meta port)))
>
> (define* (read-request port #:optional (meta '()))
> diff --git a/module/web/response.scm b/module/web/response.scm
> index 4ac4d74..9cf9dbd 100644
> --- a/module/web/response.scm
> +++ b/module/web/response.scm
> @@ -111,19 +111,20 @@
> (headers '()) port (validate-headers? #t))
> "Construct an HTTP response object. If VALIDATE-HEADERS? is true,
> the headers are each run through their respective validators."
> - (cond
> - ((not (and (pair? version)
> - (non-negative-integer? (car version))
> - (non-negative-integer? (cdr version))))
> - (bad-response "Bad version: ~a" version))
> - ((not (and (non-negative-integer? code) (< code 600)))
> - (bad-response "Bad code: ~a" code))
> - ((and reason-phrase (not (string? reason-phrase)))
> - (bad-response "Bad reason phrase" reason-phrase))
> - (else
> - (if validate-headers?
> - (validate-headers headers))))
> - (make-response version code reason-phrase headers port))
> + (let ((canonicalized-headers (canonicalize-headers headers)))
> + (cond
> + ((not (and (pair? version)
> + (non-negative-integer? (car version))
> + (non-negative-integer? (cdr version))))
> + (bad-response "Bad version: ~a" version))
> + ((not (and (non-negative-integer? code) (< code 600)))
> + (bad-response "Bad code: ~a" code))
> + ((and reason-phrase (not (string? reason-phrase)))
> + (bad-response "Bad reason phrase" reason-phrase))
> + (else
> + (if validate-headers?
> + (validate-headers canonicalized-headers))))
> + (make-response version code reason-phrase canonicalized-headers port)))
>
> (define *reason-phrases*
> '((100 . "Continue")
>
> base-commit: f31819b6b179429a617c8bd881dbb61219823e39
> --
> 2.41.0

[-- Attachment #2: Type: text/html, Size: 17087 bytes --]

      reply	other threads:[~2023-09-10 22:46 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-10 22:34 [PATCH] Make HTTP header symbols case-insensitive Skyler
2023-09-10 22:46 ` Skyler [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='ukIfc02Uk6ZMa5P_3J5EG5Rst2qQaizvMZPkXbiktBKJsNubVd9v9oaFkyf4IoUt3gNMyL3IMyxMVNH5SgeT93iPbwNrgvohFfNC3_raWr0=@protonmail.com' \
    --to=skyvine@protonmail.com \
    --cc=guile-devel@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).