unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* [PATCH] Make HTTP header symbols case-insensitive
@ 2023-09-10 22:34 Skyler
  2023-09-10 22:46 ` Skyler
  0 siblings, 1 reply; 2+ messages in thread
From: Skyler @ 2023-09-10 22:34 UTC (permalink / raw)
  To: guile-devel@gnu.org

[-- Attachment #1: Type: text/plain, Size: 4938 bytes --]

The web modules contain logic for operating on headers, including logic
to convert Scheme data to header values, validate the values of
well-known headers, and automatically add missing headers. When working
with headers internally, they are stored as lowercase symbols as is
idiomatic for Scheme code. When writing out the requests, the headers
are converted to title-case, as is idiomatic to HTTP (although HTTP
header names are technically defined to be case-insensitive, unlike
Scheme symbols).

This can cause issues if titlecase symbols are used in the header lists
of functions such as http-request from the (web client) module. In this
case, the headers will not be recognized as well-defined headers, and
they will be validated with the generic validators instead of the
specific ones.

This commit converts header symbols to be all lowercase in the
build-request and build-response functions, to make sure that the system
handles the headers consistently.
---
doc/ref/web.texi | 4 ++++
module/web/http.scm | 8 ++++++++
module/web/request.scm | 9 +++++----
module/web/response.scm | 27 ++++++++++++++-------------
4 files changed, 31 insertions(+), 17 deletions(-)

diff --git a/doc/ref/web.texi b/doc/ref/web.texi
index 607c855..4572b5a 100644
--- a/doc/ref/web.texi
+++ b/doc/ref/web.texi
@@ -471,6 +471,10 @@ Return a true value if @var{val} is a valid Scheme value for the header
with name @var{sym}, or @code{#f} otherwise.
@end deffn

+@deffn {Scheme Procedure} canonicalize-headers headers
+Ensure that the headers are in a canonical Scheme format, in particular
+this converts all header names to lowercase.
+
Now that we have a generic interface for reading and writing headers, we
do just that.

diff --git a/module/web/http.scm b/module/web/http.scm
index 24a4312..9b241dc 100644
--- a/module/web/http.scm
+++ b/module/web/http.scm
@@ -40,6 +40,7 @@
#:use-module (web uri)
#:export (string->header
header->string
+ canonicalize-headers

declare-header!
declare-opaque-header!
@@ -80,6 +81,13 @@
(define (put-non-negative-integer port i)
(put-string port (number->string i)))

+(define (canonicalize-headers headers)
+ "Ensure that the symbolic header name is in lowercase."
+ (map (lambda (header)
+ (cons (string->symbol (string-downcase (symbol->string (car header))))
+ (cdr header)))
+ headers))
+
(define (string->header name)
"Parse NAME to a symbolic header name."
(string->symbol (string-downcase name)))
diff --git a/module/web/request.scm b/module/web/request.scm
index ff4b944..c7366da 100644
--- a/module/web/request.scm
+++ b/module/web/request.scm
@@ -162,7 +162,8 @@
(validate-headers? #t))
"Construct an HTTP request object. If VALIDATE-HEADERS? is true,
the headers are each run through their respective validators."
- (let ((needs-host? (and (equal? version '(1 . 1))
+ (let ((canonicalized-headers (canonicalize-headers headers))
+ (needs-host? (and (equal? version '(1 . 1))
(not (assq-ref headers 'host)))))
(cond
((not (and (pair? version)
@@ -180,12 +181,12 @@ the headers are each run through their respective validators."
uri))
(else
(if validate-headers?
- (validate-headers headers))))
+ (validate-headers canonicalized-headers))))
(make-request method uri version
(if needs-host?
(acons 'host (cons (uri-host uri) (uri-port uri))
- headers)
- headers)
+ canonicalized-headers)
+ canonicalized-headers)
meta port)))

(define* (read-request port #:optional (meta '()))
diff --git a/module/web/response.scm b/module/web/response.scm
index 4ac4d74..9cf9dbd 100644
--- a/module/web/response.scm
+++ b/module/web/response.scm
@@ -111,19 +111,20 @@
(headers '()) port (validate-headers? #t))
"Construct an HTTP response object. If VALIDATE-HEADERS? is true,
the headers are each run through their respective validators."
- (cond
- ((not (and (pair? version)
- (non-negative-integer? (car version))
- (non-negative-integer? (cdr version))))
- (bad-response "Bad version: ~a" version))
- ((not (and (non-negative-integer? code) (< code 600)))
- (bad-response "Bad code: ~a" code))
- ((and reason-phrase (not (string? reason-phrase)))
- (bad-response "Bad reason phrase" reason-phrase))
- (else
- (if validate-headers?
- (validate-headers headers))))
- (make-response version code reason-phrase headers port))
+ (let ((canonicalized-headers (canonicalize-headers headers)))
+ (cond
+ ((not (and (pair? version)
+ (non-negative-integer? (car version))
+ (non-negative-integer? (cdr version))))
+ (bad-response "Bad version: ~a" version))
+ ((not (and (non-negative-integer? code) (< code 600)))
+ (bad-response "Bad code: ~a" code))
+ ((and reason-phrase (not (string? reason-phrase)))
+ (bad-response "Bad reason phrase" reason-phrase))
+ (else
+ (if validate-headers?
+ (validate-headers canonicalized-headers))))
+ (make-response version code reason-phrase canonicalized-headers port)))

(define *reason-phrases*
'((100 . "Continue")

base-commit: f31819b6b179429a617c8bd881dbb61219823e39
--
2.41.0

[-- Attachment #2: Type: text/html, Size: 10376 bytes --]

^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH] Make HTTP header symbols case-insensitive
  2023-09-10 22:34 [PATCH] Make HTTP header symbols case-insensitive Skyler
@ 2023-09-10 22:46 ` Skyler
  0 siblings, 0 replies; 2+ messages in thread
From: Skyler @ 2023-09-10 22:46 UTC (permalink / raw)
  To: guile-devel@gnu.org

[-- Attachment #1: Type: text/plain, Size: 9026 bytes --]

Hello,

I just wanted to add some more context beyond what seemed appropriate to include in a commit message. Also, I don't think I've emailed the Guile list before but I'm not really sure what to do as an introduction. I'm just enjoy using guile, this seems like a problem, and I have a patch that fixes it. I would seem rude not to send it.

You can see the problem alluded to in the commit by running a socat listener on one terminal:

```
socat - TCP-LISTEN:65388
``

And from another run this Guile code:

```
(use-modules (web client))
(http-request "http://127.0.0.1:65388"
#:body "<some-tag></some-tag>"
#:headers '((Content-Type . "application/xml")
(Content-Length . "1000000000")))
```

The socat terminal will show that Guile sent this data:

```
GET / HTTP/1.1
Content-Length: 33
Content-Type: text/plain;charset=utf-8
Host: 127.0.0.1:65388
Connection: close
Content-Type: application/xml
Content-Length: 1000000000

<some-tag></some-tag>
```

Note that there are 2 instances of both Content-Length and Content-Type, which are singleton fields. The first set were generated by Guile while the second set contained the user values.

To be fair, the Guile code is incorrect. Neither the Content-Type nor the Content-Length headers take in strings as their types, and the validators would have thrown an error if they were run on the user-given values. However, due to the case sensitivity differences between Scheme and HTTP, the web module does not sanitize these headers but the remote server still processes them. I think it would be helpful if Guile was more robust in this regard; one of the purposes of high-level APIs like `http-request` (which calls `sanitize-request`) is to detect and prevent user errors (for example, it currently double-checks the provided Content-Length, if any).

The patch in the previous email updates the build-request and build-response functions to convert all header names to lowercase. I considered 2 alternatives:

1. Throw an error if header names contained uppercase letters. This did not seem appropriate as HTTP is case-insensitive.
2. Update the request/response accessors to be case-insensitive. This seems likely to cause performance issues.

After applying the patch, the web module threw errors for the incorrect types, then the incorrect length, then once all problems were corrected it sent a well-formed request:

```
(use-modules (web client))
(http-request "http://127.0.0.1:65388"
#:body "<some-tag></some-tag>"
#:headers '((Content-Type . (application/xml))
(Content-Length . 28)))
```

```
GET / HTTP/1.1
Host: 127.0.0.1:65388
Connection: close
Content-Type: application/xml;charset=utf-8
Content-Length: 28

<some-tag></some-tag>
```

Note that there is now only a single copy each of Content-Type and Content-Length. Guile appended the charset to the user-supplied Content-Type and the Content-Length field contains an accurate value.

A similar test can be performed for responses:

```
(use-modules (web response) (web server))

(define (handler request request-body)
(format #t "~a~%" request)
(when request-body
(format #t "~a~%" request-body))
(values (build-response #:headers '((Content-Length . "5454"))
#:code 500
#:reason-phrase "This server does not know how to process literally any request, no matter how well-formed.")
#f))

(run-server handler 'http '(#:port 65388))
```

If you run this script in one terminal and send a well-formed http request with the previous script in another, Guile will currently send a malformed response and the sending script will print a parsing error. After applying the patch, the error will occur in the listen script when the validator runs on the Content-Length header.

Thanks,
Skyler

------- Original Message -------
On Sunday, September 10th, 2023 at 3:34 PM, Skyler <skyvine@protonmail.com> wrote:

> The web modules contain logic for operating on headers, including logic
> to convert Scheme data to header values, validate the values of
> well-known headers, and automatically add missing headers. When working
> with headers internally, they are stored as lowercase symbols as is
> idiomatic for Scheme code. When writing out the requests, the headers
> are converted to title-case, as is idiomatic to HTTP (although HTTP
> header names are technically defined to be case-insensitive, unlike
> Scheme symbols).
>
> This can cause issues if titlecase symbols are used in the header lists
> of functions such as http-request from the (web client) module. In this
> case, the headers will not be recognized as well-defined headers, and
> they will be validated with the generic validators instead of the
> specific ones.
>
> This commit converts header symbols to be all lowercase in the
> build-request and build-response functions, to make sure that the system
> handles the headers consistently.
> ---
> doc/ref/web.texi | 4 ++++
> module/web/http.scm | 8 ++++++++
> module/web/request.scm | 9 +++++----
> module/web/response.scm | 27 ++++++++++++++-------------
> 4 files changed, 31 insertions(+), 17 deletions(-)
>
> diff --git a/doc/ref/web.texi b/doc/ref/web.texi
> index 607c855..4572b5a 100644
> --- a/doc/ref/web.texi
> +++ b/doc/ref/web.texi
> @@ -471,6 +471,10 @@ Return a true value if @var{val} is a valid Scheme value for the header
> with name @var{sym}, or @code{#f} otherwise.
> @end deffn
>
> +@deffn {Scheme Procedure} canonicalize-headers headers
> +Ensure that the headers are in a canonical Scheme format, in particular
> +this converts all header names to lowercase.
> +
> Now that we have a generic interface for reading and writing headers, we
> do just that.
>
> diff --git a/module/web/http.scm b/module/web/http.scm
> index 24a4312..9b241dc 100644
> --- a/module/web/http.scm
> +++ b/module/web/http.scm
> @@ -40,6 +40,7 @@
> #:use-module (web uri)
> #:export (string->header
> header->string
> + canonicalize-headers
>
> declare-header!
> declare-opaque-header!
> @@ -80,6 +81,13 @@
> (define (put-non-negative-integer port i)
> (put-string port (number->string i)))
>
> +(define (canonicalize-headers headers)
> + "Ensure that the symbolic header name is in lowercase."
> + (map (lambda (header)
> + (cons (string->symbol (string-downcase (symbol->string (car header))))
> + (cdr header)))
> + headers))
> +
> (define (string->header name)
> "Parse NAME to a symbolic header name."
> (string->symbol (string-downcase name)))
> diff --git a/module/web/request.scm b/module/web/request.scm
> index ff4b944..c7366da 100644
> --- a/module/web/request.scm
> +++ b/module/web/request.scm
> @@ -162,7 +162,8 @@
> (validate-headers? #t))
> "Construct an HTTP request object. If VALIDATE-HEADERS? is true,
> the headers are each run through their respective validators."
> - (let ((needs-host? (and (equal? version '(1 . 1))
> + (let ((canonicalized-headers (canonicalize-headers headers))
> + (needs-host? (and (equal? version '(1 . 1))
> (not (assq-ref headers 'host)))))
> (cond
> ((not (and (pair? version)
> @@ -180,12 +181,12 @@ the headers are each run through their respective validators."
> uri))
> (else
> (if validate-headers?
> - (validate-headers headers))))
> + (validate-headers canonicalized-headers))))
> (make-request method uri version
> (if needs-host?
> (acons 'host (cons (uri-host uri) (uri-port uri))
> - headers)
> - headers)
> + canonicalized-headers)
> + canonicalized-headers)
> meta port)))
>
> (define* (read-request port #:optional (meta '()))
> diff --git a/module/web/response.scm b/module/web/response.scm
> index 4ac4d74..9cf9dbd 100644
> --- a/module/web/response.scm
> +++ b/module/web/response.scm
> @@ -111,19 +111,20 @@
> (headers '()) port (validate-headers? #t))
> "Construct an HTTP response object. If VALIDATE-HEADERS? is true,
> the headers are each run through their respective validators."
> - (cond
> - ((not (and (pair? version)
> - (non-negative-integer? (car version))
> - (non-negative-integer? (cdr version))))
> - (bad-response "Bad version: ~a" version))
> - ((not (and (non-negative-integer? code) (< code 600)))
> - (bad-response "Bad code: ~a" code))
> - ((and reason-phrase (not (string? reason-phrase)))
> - (bad-response "Bad reason phrase" reason-phrase))
> - (else
> - (if validate-headers?
> - (validate-headers headers))))
> - (make-response version code reason-phrase headers port))
> + (let ((canonicalized-headers (canonicalize-headers headers)))
> + (cond
> + ((not (and (pair? version)
> + (non-negative-integer? (car version))
> + (non-negative-integer? (cdr version))))
> + (bad-response "Bad version: ~a" version))
> + ((not (and (non-negative-integer? code) (< code 600)))
> + (bad-response "Bad code: ~a" code))
> + ((and reason-phrase (not (string? reason-phrase)))
> + (bad-response "Bad reason phrase" reason-phrase))
> + (else
> + (if validate-headers?
> + (validate-headers canonicalized-headers))))
> + (make-response version code reason-phrase canonicalized-headers port)))
>
> (define *reason-phrases*
> '((100 . "Continue")
>
> base-commit: f31819b6b179429a617c8bd881dbb61219823e39
> --
> 2.41.0

[-- Attachment #2: Type: text/html, Size: 17087 bytes --]

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2023-09-10 22:46 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-10 22:34 [PATCH] Make HTTP header symbols case-insensitive Skyler
2023-09-10 22:46 ` Skyler

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).