unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* Custom HTTP methods in web module
@ 2024-03-20 22:14 Ryan Raymond
  2024-03-23 12:49 ` Maxime Devos
  2024-03-23 18:50 ` Maxime Devos
  0 siblings, 2 replies; 4+ messages in thread
From: Ryan Raymond @ 2024-03-20 22:14 UTC (permalink / raw)
  To: guile-devel

[-- Attachment #1: Type: text/plain, Size: 1458 bytes --]

Hello, all.
Please consider the following code snippet from /module/web/http.scm:

(define* (parse-http-method str #:optional (start 0) (end (string-length
str)))
  "Parse an HTTP method from STR.  The result is an upper-case
symbol, like ‘GET’."
  (cond
   ((string= str "GET" start end) 'GET)
   ((string= str "HEAD" start end) 'HEAD)
   ((string= str "POST" start end) 'POST)
   ((string= str "PUT" start end) 'PUT)
   ((string= str "DELETE" start end) 'DELETE)
   ((string= str "OPTIONS" start end) 'OPTIONS)
   ((string= str "TRACE" start end) 'TRACE)
   ((string= str "CONNECT" start end) 'CONNECT)
   ((string= str "PATCH" start end) 'PATCH)
   (else (bad-request "Invalid method: ~a" (substring str start end)))))

I do not understand why it is necessary to throw an error on non-standard
http methods. I am working on an application where I will use (for example)
"WAIT" and "SUBSCRIBE". Why not replace the code with, for example:

(define* (parse-http-method str #:optional (start 0) (end (string-length
str)))
  "Parse an HTTP method from STR.  The result is an upper-case
symbol, like ‘GET’."
   (string->symbol (string-upcase (substring str start end))))

This has many advantages. It is probably faster, allows for lowercase
methods (maybe useful?), it has far fewer LOC,
and most importantly it allows us to have custom methods, which is
extremely useful.
Please weigh in with your opinions.

Ryan

[-- Attachment #2: Type: text/html, Size: 1853 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE: Custom HTTP methods in web module
  2024-03-20 22:14 Custom HTTP methods in web module Ryan Raymond
@ 2024-03-23 12:49 ` Maxime Devos
  2024-03-23 18:50 ` Maxime Devos
  1 sibling, 0 replies; 4+ messages in thread
From: Maxime Devos @ 2024-03-23 12:49 UTC (permalink / raw)
  To: Ryan Raymond, guile-devel@gnu.org

[-- Attachment #1: Type: text/plain, Size: 2165 bytes --]

While I agree it has advantages, it is also buggy.

From https://www.rfc-editor.org/rfc/rfc9110.html#name-methods:

* “the method token is case-sensitive”.

So, you need to remove ‘string-upcase’.

* Additional methods, outside the scope of this specification, have been specified for use in HTTP. All such methods ought to be registered within the "Hypertext Transfer Protocol (HTTP) Method Registry", as described in Section 16.1.

(Just a reminder, not a bug with the proposed new parse-http-method.)

If it’s something application-specific, I propose naming the method something like
“APP-METHOD” (where APP = name of APP, METHOD = WAIT / SUBSCRIBE / ...), to avoid potential conflicts.

* “method = token”

“token” is defined in https://www.rfc-editor.org/rfc/rfc9110.html#name-tokens.

So, there are some limitations on what strings are valid method names that you need to verify. For example, suppose that someone uses “GET\0sneaky” as header, and the Guile code places some limits (in terms of 403 forbidden) of what can be read. “GET\0sneaky” passes through because it’s not “GET”, “HEAD” etc.. Let’s say the actual response is constructed by C code, so Guile passes the method to the C-code as a 0-terminated string (as is conventional) and the C-code then interprets “GET\0sneaky” as “GET\0” and sends the response while it shouldn’t.

(I don’t know if this particular situation can happen because perhaps parse-http-method is always used in a context where the string is for some reason already a token, but that’s not something to rely upon)

* A recipient SHOULD parse a received protocol element defensively, with only marginal expectations that the element will conform to its ABNF grammar and fit within a reasonable buffer size.

(quoting this for the “conform to grammar” part, not the “buffer size part”)

(With regards to the first point) this paragraph SHOULD be ignored, this way leads to continued violations of the grammar and potential security problems (see example in previous point). Postel’s law is a mistake.

Best regards,
Maxime Devos

[-- Attachment #2: Type: text/html, Size: 8765 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE: Custom HTTP methods in web module
  2024-03-20 22:14 Custom HTTP methods in web module Ryan Raymond
  2024-03-23 12:49 ` Maxime Devos
@ 2024-03-23 18:50 ` Maxime Devos
       [not found]   ` <CAGvJ-HS5Laqd7=v=WCn4-2zUurXVZcKDFA2+MmNPO-cZO6iUJg@mail.gmail.com>
  1 sibling, 1 reply; 4+ messages in thread
From: Maxime Devos @ 2024-03-23 18:50 UTC (permalink / raw)
  To: Ryan Raymond, guile-devel@gnu.org

[-- Attachment #1: Type: text/plain, Size: 835 bytes --]


   (else (bad-request "Invalid method: ~a" (substring str start end)))))

Another problem with the old implementation – AFAICT, this does a  “400 Bad Request”. However, at least when the syntax/grammar/... is correct, it should be a 501 instead. 

>An origin server that receives a request method that is unrecognized or not implemented SHOULD respond with the 501 (Not Implemented) status code. An origin server that receives a request method that is recognized and implemented, but not allowed for the target resource, SHOULD respond with the 405 (Method Not Allowed) status code.

Given the proposal to (in this procedure) allow all methods (at least all grammatical methods) in a new implementation, it seems this won’t be this method’s responsibility anymore, though.

Best regards,
Maxime Devos


[-- Attachment #2: Type: text/html, Size: 3207 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE: Custom HTTP methods in web module
       [not found]   ` <CAGvJ-HS5Laqd7=v=WCn4-2zUurXVZcKDFA2+MmNPO-cZO6iUJg@mail.gmail.com>
@ 2024-03-24 13:57     ` Maxime Devos
  0 siblings, 0 replies; 4+ messages in thread
From: Maxime Devos @ 2024-03-24 13:57 UTC (permalink / raw)
  To: Ryan Raymond,
	Jonas Hahnfeld via Developers list for Guile,the GNU extensibility library

[-- Attachment #1: Type: text/plain, Size: 4054 bytes --]

>The character-set you're referring to, is it US-ASCII? I am not particularly familiar with how Guile handles characters. If string-filter is not sufficient, can you suggest another method?
>
>For example, perhaps we need to go to where "str" is read and set the port encoding to US-ASCII. Right now it's Iso Latin which is a superset of US-ASCII, and therefore improper.

I meant “character set” in the sense as used in Guile, not character encoding.
Very literally, it means a “set of characters”, where ‘set’ is used in the mathematical sense. ‘Character’ means any character in Unicode (not counting those special pairs used for UTF-16, they aren’t characters).

Given you mentioned char-set:graphic, I thought you already knew.

So, the answer is, no, it’s not ASCII (the character set), it’s a subset of US-ASCII defined in the HTTP spec. IIRC, I referred to:

➢ https://www.rfc-editor.org/rfc/rfc9110.html#name-tokens

(in particular see ‘tchar’) which I think is pretty clearly not all of ASCII but rather a subset. Explicitly, the character set I’m referring to is the ‘tchar’ mentioned in the RFC.

On string-filter: I suppose you could use that, (string=? (string-filter the-char-set ...) original-string), to check things, but it seems more efficient and simpler to use the predicate string-every instead.

That said, it might be worth looking at how the caller(s) of  the method parsing procedure uses the method parsing procedure. It might be the case that they use something to (string-index s everything-except-tchar begin end) to locate the end of the method name. In that case, the argument passes to the method parsing procedure is correct by construction (assuming length>0), so then that procedure doesn’t need to do any checks and can leave (with a docstring) that responsibility to the caller.

>For example, perhaps we need to go to where "str" is read and set the port encoding to US-ASCII. Right now it's Iso Latin which is a superset of US-ASCII, and therefore improper.

Eh, while HTTP might look like text, it’s more like a mix of text and octets/bytes:

Field values are usually constrained to the range of US-ASCII characters [USASCII]. __Fields needing a greater range of characters can use an encoding__, such as the one defined in [RFC8187]. Historically, HTTP allowed field content with text in the __ISO-8859-1__ charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. Specifications for newly defined fields SHOULD limit their values to visible US-ASCII octets (VCHAR), SP, and HTAB. __A recipient SHOULD treat other allowed octets in field content (i.e., obs-text) as opaque data__.

(emphasis added)

I interpret this as “HTTP prefers only US-ASCII(see SHOULD), but it’s not strictly required (depending on the field), and sometimes it doesn’t even have any meaning as characters and instead is only raw bytes(*)”.  Also see the bit about ISO-8559-1, it appears that in at least some case, the ISO-8559-1 encoding should be recognised.

(I might be misinterpreting this though, perhaps it is referring to %-encoding.)

Also, using ISO Latin 1 (or another ASCII (the character encoding)-compatible 8-bit encoding) is convenient for handling octets and US-ASCII characters together.

Maybe separating the US-ASCII from the extra octets might make the code more proper in some aesthetical sense, but I don’t think it would make things more proper in a RFC-compliant sense (though neither would it make things worse, I suppose).

(There might be bugs w.r.t. character encoding in the Guile implementation, but I don’t think this is one of them.)

>That being said, the best form for this function is:
>(string->symbol (substring str start end) )
>With additional logic added to other functions?

I am not familiar enough with the Guile implementation to tell if the extra logic is best done in this function or in its caller. It just needs to be done _somewhere_.

Best regards,
Maxime Devos

[-- Attachment #2: Type: text/html, Size: 9098 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2024-03-24 13:57 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-20 22:14 Custom HTTP methods in web module Ryan Raymond
2024-03-23 12:49 ` Maxime Devos
2024-03-23 18:50 ` Maxime Devos
     [not found]   ` <CAGvJ-HS5Laqd7=v=WCn4-2zUurXVZcKDFA2+MmNPO-cZO6iUJg@mail.gmail.com>
2024-03-24 13:57     ` Maxime Devos

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).