>The character-set you're referring to, is it US-ASCII? I am not particularly familiar with how Guile handles characters. If string-filter is not sufficient, can you suggest another method?
>
>For example, perhaps we need to go to where "str" is read and set the port encoding to US-ASCII. Right now it's Iso Latin which is a superset of US-ASCII, and therefore improper.

I meant “character set” in the sense used in Guile, not character encoding.
Very literally, it means a “set of characters”, where ‘set’ is used in the mathematical sense. ‘Character’ means any character in Unicode (not counting the surrogate code points that UTF-16 uses in pairs; those aren’t characters).

Given you mentioned char-set:graphic, I thought you already knew.

So, the answer is, no, it’s not ASCII (the character set), it’s a subset of US-ASCII defined in the HTTP spec. IIRC, I referred to:

➢ https://www.rfc-editor.org/rfc/rfc9110.html#name-tokens

(in particular see ‘tchar’) which I think is pretty clearly not all of ASCII but rather a subset. Explicitly, the character set I’m referring to is the ‘tchar’ mentioned in the RFC.

On string-filter: I suppose you could use that, (string=? (string-filter the-char-set ...) original-string), to check things, but it seems more efficient and simpler to use the predicate string-every instead.
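
To make the string-every suggestion concrete, here is a rough sketch. The names char-set:tchar and token? are my own for illustration (they are not existing Guile bindings), and the list of punctuation is copied from ‘tchar’ in RFC 9110, so please double-check it against the RFC:

```scheme
(use-modules (srfi srfi-13) (srfi srfi-14))

;; 'tchar' as listed in RFC 9110: a few punctuation marks, DIGIT and ALPHA.
;; char-set:tchar is a hypothetical name, not something Guile defines.
(define char-set:tchar
  (char-set-union
   (string->char-set "!#$%&'*+-.^_`|~")
   (string->char-set "0123456789")
   (string->char-set "abcdefghijklmnopqrstuvwxyz")
   (string->char-set "ABCDEFGHIJKLMNOPQRSTUVWXYZ")))

;; Hypothetical predicate: true if STR is a non-empty run of tchar.
;; string-every is true for the empty string, hence the explicit check.
(define (token? str)
  (and (not (string-null? str))
       (string-every char-set:tchar str)))
```

With this, (token? "GET") holds, while (token? "GE T") and (token? "") do not.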

That said, it might be worth looking at how the caller(s) of the method parsing procedure use it. It might be the case that they use something like (string-index s everything-except-tchar begin end) to locate the end of the method name. In that case, the argument passed to the method parsing procedure is correct by construction (assuming length>0), so that procedure doesn’t need to do any checks and can leave that responsibility to the caller (and say so in a docstring).
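
As an illustration of the “correct by construction” idea, a caller could look roughly like the sketch below. I have not checked what the actual callers in Guile do; parse-method and read-method are hypothetical names, and char-set:tchar is again my own definition of ‘tchar’:

```scheme
(use-modules (srfi srfi-13) (srfi srfi-14))

;; Hypothetical definition of 'tchar' (RFC 9110), repeated so that this
;; sketch stands on its own.
(define char-set:tchar
  (char-set-union
   (string->char-set "!#$%&'*+-.^_`|~")
   (string->char-set "0123456789")
   (string->char-set "abcdefghijklmnopqrstuvwxyz")
   (string->char-set "ABCDEFGHIJKLMNOPQRSTUVWXYZ")))

(define char-set:not-tchar (char-set-complement char-set:tchar))

(define (parse-method str start end)
  "Return the method in STR between START and END as a symbol.
The caller must ensure that this range is non-empty and consists of
tchar only; no checks are done here."
  (string->symbol (substring str start end)))

;; Hypothetical caller: the method ends at the first non-tchar character
;; (or at the end of the line), so the range handed to parse-method is
;; valid by construction. Returns #f if the method would be empty.
(define (read-method line)
  (let ((end (or (string-index line char-set:not-tchar)
                 (string-length line))))
    (and (> end 0)
         (parse-method line 0 end))))
```

Here (read-method "GET /index.html HTTP/1.1") gives the symbol GET, and a line starting with a space gives #f.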

>For example, perhaps we need to go to where "str" is read and set the port encoding to US-ASCII. Right now it's Iso Latin which is a superset of US-ASCII, and therefore improper.

Eh, while HTTP might look like text, it’s more like a mix of text and octets/bytes:

Field values are usually constrained to the range of US-ASCII characters [USASCII]. __Fields needing a greater range of characters can use an encoding__, such as the one defined in [RFC8187]. Historically, HTTP allowed field content with text in the __ISO-8859-1__ charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. Specifications for newly defined fields SHOULD limit their values to visible US-ASCII octets (VCHAR), SP, and HTAB. __A recipient SHOULD treat other allowed octets in field content (i.e., obs-text) as opaque data__.

(emphasis added)

I interpret this as “HTTP prefers only US-ASCII (see SHOULD), but it’s not strictly required (depending on the field), and sometimes it doesn’t even have any meaning as characters and instead is only raw bytes(*)”. Also see the bit about ISO-8859-1: it appears that in at least some cases, the ISO-8859-1 encoding should be recognised.

(I might be misinterpreting this though, perhaps it is referring to %-encoding.)

Also, using ISO Latin 1 (or another ASCII (the character encoding)-compatible 8-bit encoding) is convenient for handling octets and US-ASCII characters together.
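
To show what I mean by “convenient”: in ISO-8859-1 every octet decodes to the character with the same code point, so no octet is rejected or lost and the original bytes can be recovered. A small sketch using (ice-9 iconv); the helper names octets->latin1 and latin1->octets are made up for this example:

```scheme
(use-modules (rnrs bytevectors) (ice-9 iconv))

;; Hypothetical helpers wrapping the (ice-9 iconv) conversions.
(define (octets->latin1 bv)
  (bytevector->string bv "ISO-8859-1"))

(define (latin1->octets str)
  (string->bytevector str "ISO-8859-1"))

;; "GET " followed by two non-ASCII octets (233 and 255).
(define octets (u8-list->bytevector '(71 69 84 32 233 255)))
(define str (octets->latin1 octets))

;; Each character has the same number as the octet it came from ...
(define code-points (map char->integer (string->list str)))

;; ... and encoding again yields the original octets.
(define round-trip (latin1->octets str))
```

The ASCII part can then be handled with the usual string procedures, while the other octets are carried along unchanged.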

Maybe separating the US-ASCII from the extra octets might make the code more proper in some aesthetic sense, but I don’t think it would make things more proper in an RFC-compliant sense (though neither would it make things worse, I suppose).

(There might be bugs w.r.t. character encoding in the Guile implementation, but I don’t think this is one of them.)

>That being said, the best form for this function is:
>(string->symbol (substring str start end) )
>With additional logic added to other functions?

I am not familiar enough with the Guile implementation to tell if the extra logic is best done in this function or in its caller. It just needs to be done _somewhere_.

Best regards,
Maxime Devos