RE: Custom HTTP methods in web module

unofficial mirror of guile-devel@gnu.org 

From: Maxime Devos <maximedevos@telenet.be>
To: Ryan Raymond <rjraymond@oakland.edu>,
	 "Jonas Hahnfeld via Developers list for Guile,the GNU
	extensibility library" <guile-devel@gnu.org>
Subject: RE: Custom HTTP methods in web module
Date: Sun, 24 Mar 2024 14:57:42 +0100
Message-ID: <20240324145741.2dxh2C0015DtEJR06dxhWR@albert.telenet-ops.be>
In-Reply-To: <CAGvJ-HS5Laqd7=v=WCn4-2zUurXVZcKDFA2+MmNPO-cZO6iUJg@mail.gmail.com>


>The character-set you're referring to, is it US-ASCII? I am not particularly familiar with how Guile handles characters. If string-filter is not sufficient, can you suggest another method?
>
>For example, perhaps we need to go to where "str" is read and set the port encoding to US-ASCII. Right now it's Iso Latin which is a superset of US-ASCII, and therefore improper.

I meant “character set” in the sense used in Guile, not character encoding.
Very literally, it means a “set of characters”, where ‘set’ is used in the mathematical sense. ‘Character’ means any character in Unicode (not counting the surrogate code points that UTF-16 uses in pairs; those aren’t characters).

Given you mentioned char-set:graphic, I thought you already knew.

So, the answer is, no, it’s not ASCII (the character set), it’s a subset of US-ASCII defined in the HTTP spec. IIRC, I referred to:

➢ https://www.rfc-editor.org/rfc/rfc9110.html#name-tokens

(in particular see ‘tchar’) which I think is pretty clearly not all of ASCII but rather a subset. Explicitly, the character set I’m referring to is the ‘tchar’ mentioned in the RFC.
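
For concreteness, here is a minimal sketch of that ‘tchar’ set as a Guile character set. The name char-set:http-tchar is made up for illustration (I am not claiming the web module defines anything under that name); the members are the ones listed for ‘tchar’ in the RFC: fifteen punctuation characters plus DIGIT and ALPHA.

```scheme
(use-modules (srfi srfi-14))

;; Hypothetical name, chosen for this example only.
;; tchar per RFC 9110: "!#$%&'*+-.^_`|~", DIGIT and ALPHA.
(define char-set:http-tchar
  (string->char-set
   (string-append "!#$%&'*+-.^_`|~"
                  "0123456789"
                  "abcdefghijklmnopqrstuvwxyz"
                  "ABCDEFGHIJKLMNOPQRSTUVWXYZ")))

(char-set-contains? char-set:http-tchar #\-)                 ; #t, a tchar
(char-set-contains? char-set:http-tchar #\space)             ; #f, ASCII but not a tchar
(char-set-contains? char-set:http-tchar (integer->char 233)) ; #f, not ASCII at all
```

Note that this is a set of characters and says nothing about how those characters are encoded on a port.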

On string-filter: I suppose you could use that, as in (string=? (string-filter the-char-set ...) original-string), to check things, but it seems simpler and more efficient to use string-every with the character set as its predicate argument instead.
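
A rough sketch of both variants, to show the difference. The names char-set:http-tchar, valid-method-name? and valid-method-name/filter? are hypothetical, invented for this example. One detail that is easy to miss: string-every is true for the empty string, so the non-empty check has to be explicit.

```scheme
(use-modules (srfi srfi-13) (srfi srfi-14))

;; Hypothetical name; tchar per RFC 9110.
(define char-set:http-tchar
  (string->char-set
   (string-append "!#$%&'*+-.^_`|~0123456789"
                  "abcdefghijklmnopqrstuvwxyz"
                  "ABCDEFGHIJKLMNOPQRSTUVWXYZ")))

;; string-every stops at the first offending character and allocates nothing.
(define (valid-method-name? s)
  (and (not (string-null? s))
       (string-every char-set:http-tchar s)))

;; The string-filter variant builds a filtered copy and then compares it.
(define (valid-method-name/filter? s)
  (and (not (string-null? s))
       (string=? (string-filter char-set:http-tchar s) s)))

(valid-method-name? "PURGE")    ; true
(valid-method-name? "GET /")    ; #f, space and slash are not tchars
(valid-method-name? "")         ; #f, a token needs at least one character
```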

That said, it might be worth looking at how the caller(s) of the method parsing procedure use it. It might be the case that they use something like (string-index s everything-except-tchar begin end) to locate the end of the method name. In that case, the argument passed to the method parsing procedure is correct by construction (assuming length>0), so the procedure doesn’t need to do any checks itself and can leave that responsibility to the caller (documented in a docstring).
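
To illustrate the “correct by construction” idea, here is a sketch of what such a caller could look like. I have not checked that the actual caller in Guile works this way; read-method-from-line and the other names are hypothetical, and the request line is just an example input.

```scheme
(use-modules (srfi srfi-13) (srfi srfi-14))

;; Hypothetical names; tchar per RFC 9110 and its complement.
(define char-set:http-tchar
  (string->char-set
   (string-append "!#$%&'*+-.^_`|~0123456789"
                  "abcdefghijklmnopqrstuvwxyz"
                  "ABCDEFGHIJKLMNOPQRSTUVWXYZ")))
(define char-set:not-http-tchar
  (char-set-complement char-set:http-tchar))

(define (parse-method str start end)
  "Return the method in STR between START and END as a symbol.
The caller must ensure that range is a non-empty run of tchars."
  (string->symbol (substring str start end)))

(define (read-method-from-line line)
  ;; The method ends at the first non-tchar, or at the end of the line.
  (let ((end (or (string-index line char-set:not-http-tchar)
                 (string-length line))))
    (and (> end 0)                     ; reject an empty method name
         (parse-method line 0 end))))

(read-method-from-line "PURGE /cache HTTP/1.1")  ; the symbol PURGE
(read-method-from-line " GET / HTTP/1.1")        ; #f, starts with a non-tchar
```

Here parse-method does no validation at all; the range it receives can only contain tchars because of how the end index was computed.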

>For example, perhaps we need to go to where "str" is read and set the port encoding to US-ASCII. Right now it's Iso Latin which is a superset of US-ASCII, and therefore improper.

Eh, while HTTP might look like text, it’s more like a mix of text and octets/bytes:

Field values are usually constrained to the range of US-ASCII characters [USASCII]. __Fields needing a greater range of characters can use an encoding__, such as the one defined in [RFC8187]. Historically, HTTP allowed field content with text in the __ISO-8859-1__ charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. Specifications for newly defined fields SHOULD limit their values to visible US-ASCII octets (VCHAR), SP, and HTAB. __A recipient SHOULD treat other allowed octets in field content (i.e., obs-text) as opaque data__.

(emphasis added)

I interpret this as “HTTP prefers only US-ASCII (see SHOULD), but it’s not strictly required (depending on the field), and sometimes it doesn’t even have any meaning as characters and instead is only raw bytes(*)”. Also see the bit about ISO-8859-1: it appears that in at least some cases, the ISO-8859-1 encoding should be recognised.

(I might be misinterpreting this though, perhaps it is referring to %-encoding.)

Also, using ISO Latin 1 (or another 8-bit encoding that is compatible with ASCII, the character encoding) is convenient for handling octets and US-ASCII characters together.
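
A small sketch of why that is convenient, using (ice-9 iconv) on a made-up bytevector rather than a real port: decoding as ISO-8859-1 turns every octet into exactly one character with the same numeric value, so the US-ASCII part can be handled as text and the other octets survive a round trip unchanged.

```scheme
(use-modules (rnrs bytevectors) (ice-9 iconv))

;; "GET " followed by two octets outside US-ASCII (0xE9 and 0xFF).
(define octets (u8-list->bytevector '(71 69 84 32 233 255)))

;; Under ISO-8859-1, octet N decodes to the character with code point N.
(define decoded (bytevector->string octets "ISO-8859-1"))

(string-length decoded)                  ; 6, one character per octet
(substring decoded 0 3)                  ; "GET", the ASCII part reads as text
(char->integer (string-ref decoded 4))   ; 233, the octet value is preserved

;; Encoding again gives back the original octets.
(bytevector=? octets (string->bytevector decoded "ISO-8859-1"))  ; #t
```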

Maybe separating the US-ASCII from the extra octets might make the code more proper in some aesthetic sense, but I don’t think it would make things more proper in an RFC-compliant sense (though neither would it make things worse, I suppose).

(There might be bugs w.r.t. character encoding in the Guile implementation, but I don’t think this is one of them.)

>That being said, the best form for this function is:
>(string->symbol (substring str start end) )
>With additional logic added to other functions?

I am not familiar enough with the Guile implementation to tell if the extra logic is best done in this function or in its caller. It just needs to be done _somewhere_.

Best regards,
Maxime Devos

