From: Maxime Devos
Newsgroups: gmane.lisp.guile.devel
Subject: RE: Custom HTTP methods in web module
Date: Sun, 24 Mar 2024 14:57:42 +0100
Message-ID: <20240324145741.2dxh2C0015DtEJR06dxhWR@albert.telenet-ops.be>
References: <20240323195006.2Jq52C00K5DtEJR01Jq6FE@laurent.telenet-ops.be>
To: Ryan Raymond, "Jonas Hahnfeld via Developers list for Guile, the GNU extensibility library"

> The character-set you're referring to, is it US-ASCII? I am not particularly familiar with how Guile handles characters. If string-filter is not sufficient, can you suggest another method?
>
> For example, perhaps we need to go to where "str" is read and set the port encoding to US-ASCII. Right now it's Iso Latin which is a superset of US-ASCII, and therefore improper.

I meant “character set” in the sense used in Guile, not a character encoding.

Quite literally, it means a “set of characters”, where ‘set’ is used in the mathematical sense. ‘Character’ means any Unicode character (not counting the surrogate code points used to form UTF-16 pairs; those aren’t characters).

Since you mentioned char-set:graphic, I assumed you already knew.
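To make the distinction concrete, here is a small sketch for a plain Guile 3.0 session, where the SRFI-14 character-set procedures are available without any use-modules. It checks a few sample characters against char-set:graphic. Because that set is defined over Unicode, it contains non-ASCII letters as well as delimiters such as “(”. It is therefore neither ASCII-only nor the right set for HTTP method names.

```scheme
;; char-set:graphic is a set of Unicode characters, not an encoding.
;; Print whether each sample character is a member of it.
(for-each
 (lambda (c)
   (format #t "~s in char-set:graphic? ~a~%"
           c (char-set-contains? char-set:graphic c)))
 (list #\G #\é #\λ #\( #\space))
```

Of these samples, only the space falls outside the set. “é”, “λ” and “(” are all members, and none of them is allowed in an HTTP token.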

 

So the answer is no: it’s not ASCII (the character set), but a subset of US-ASCII defined in the HTTP specification. IIRC, I referred to:

➢ https://www.rfc-editor.org/rfc/rfc9110.html#name-tokens

(in particular, see ‘tchar’), which I think is clearly not all of ASCII but rather a subset of it. To be explicit, the character set I’m referring to is ‘tchar’ as defined in the RFC.
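Here is a sketch of that set as a Guile character set. The name char-set:tchar is my own choice for illustration. The members come from the tchar rule in RFC 9110, section 5.6.2. DIGIT and ALPHA mean ASCII digits and letters only, so the sketch uses explicit code-point ranges. It avoids char-set:digit and char-set:letter, which in Guile's Unicode-aware implementation also contain non-ASCII characters.

```scheme
;; Hypothetical definition of RFC 9110's tchar (section 5.6.2):
;;   tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "."
;;         / "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
;; ucs-range->char-set takes a half-open range [lower, upper).
(define char-set:tchar
  (char-set-union
   (string->char-set "!#$%&'*+-.^_`|~")
   (ucs-range->char-set #x30 #x3A)    ; DIGIT: 0-9
   (ucs-range->char-set #x41 #x5B)    ; ALPHA: A-Z
   (ucs-range->char-set #x61 #x7B)))  ; ALPHA: a-z

;; 15 symbols + 10 digits + 52 letters = 77 characters, all within US-ASCII.
(display (char-set-size char-set:tchar)) (newline)
(display (char-set<= char-set:tchar char-set:ascii)) (newline)
```

Delimiters such as “(”, “/”, “:” and the space are not members. For example, (char-set-contains? char-set:tchar #\/) is #f.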

 

On string-filter: I suppose you could use it to check things, as in (string=? (string-filter the-char-set ...) original-string), but the predicate string-every seems simpler and more efficient.
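A sketch of both checks, assuming a hypothetical char-set:tchar built from the RFC's tchar rule. The set is defined inside the block so the example runs on its own.

Two details are easy to miss:

- RFC 9110 defines token = 1*tchar, but string-every returns true for the empty string, so emptiness has to be rejected separately.
- In current Guile, string-filter takes the predicate (here a char set) as its first argument.

```scheme
;; Hypothetical char-set for RFC 9110 tchar (section 5.6.2).
(define char-set:tchar
  (char-set-union (string->char-set "!#$%&'*+-.^_`|~")
                  (ucs-range->char-set #x30 #x3A)    ; 0-9
                  (ucs-range->char-set #x41 #x5B)    ; A-Z
                  (ucs-range->char-set #x61 #x7B)))  ; a-z

;; token = 1*tchar.  string-every is true for "", so check emptiness too.
;; string-every stops at the first character that is not in the set.
(define (http-token? s)
  (and (not (string-null? s))
       (string-every char-set:tchar s)))

;; Same answer via string-filter, but this allocates a filtered copy
;; and always walks the whole string before comparing.
(define (http-token?/filter s)
  (and (not (string-null? s))
       (string=? (string-filter char-set:tchar s) s)))

(for-each (lambda (s)
            (format #t "~s: ~a ~a~%" s (http-token? s) (http-token?/filter s)))
          '("GET" "M-SEARCH" "GET /" ""))
```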

 

That said, it might be worth looking at how the caller(s) use the method-parsing procedure. They might use something like (string-index s everything-except-tchar begin end) to locate the end of the method name. In that case, the argument passed to the method-parsing procedure is correct by construction (assuming its length is > 0). That procedure then doesn’t need to do any checks and can leave that responsibility to the caller, documenting it in its docstring.
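Here is a sketch of that caller-side arrangement. The names request-line-method and parse-method-name are hypothetical; they are not Guile's actual (web http) internals. The caller uses string-index with the complement of a tchar set (again a hypothetical char-set:tchar) to find where the method ends. The parsing procedure then only converts the substring and documents its precondition.

```scheme
;; Hypothetical char-set for RFC 9110 tchar.
(define char-set:tchar
  (char-set-union (string->char-set "!#$%&'*+-.^_`|~")
                  (ucs-range->char-set #x30 #x3A)    ; 0-9
                  (ucs-range->char-set #x41 #x5B)    ; A-Z
                  (ucs-range->char-set #x61 #x7B)))  ; a-z

(define (parse-method-name str start end)
  "Return STR[START, END) as a symbol.  The caller must guarantee that the
range is non-empty and contains only tchar characters."
  (string->symbol (substring str start end)))

(define (request-line-method line)
  ;; The method ends at the first character that is not a tchar
  ;; (normally the SP before the request target), or at the end of LINE.
  (let ((end (or (string-index line (char-set-complement char-set:tchar))
                 (string-length line))))
    ;; The one check left for the caller: the token must be non-empty.
    (if (zero? end)
        (error "request line does not start with a method" line)
        (parse-method-name line 0 end))))

(display (request-line-method "PROPFIND /calendars/ HTTP/1.1")) (newline)
```

This only extracts the token. Whatever parses the rest of the request line still has to check that the method is followed by exactly one SP.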

> For example, perhaps we need to go to where "str" is read and set the port encoding to US-ASCII. Right now it's Iso Latin which is a superset of US-ASCII, and therefore improper.

Eh, while HTTP might look like text, it’s really a mix of text and octets (bytes):

Field values are usually constrained to the range of US-ASCII characters [USASCII]. __Fields needing a greater range of characters can use an encoding__, such as the one defined in [RFC8187]. Historically, HTTP allowed field content with text in the __ISO-8859-1__ charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. Specifications for newly defined fields SHOULD limit their values to visible US-ASCII octets (VCHAR), SP, and HTAB. __A recipient SHOULD treat other allowed octets in field content (i.e., obs-text) as opaque data__.

(RFC 9110, section 5.5; emphasis added)
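To make those SHOULDs concrete, here is a sketch of a character-level classifier for a field value. It assumes the header bytes were decoded as ISO-8859-1, so octet N became the character with code point N. The procedure name, the char-set names and the three result symbols are made up for illustration. The sketch also ignores the grammar's rules about leading and trailing whitespace.

```scheme
;; Assumes octet N was decoded as the character with code point N (ISO-8859-1).
(define char-set:vchar (ucs-range->char-set #x21 #x7F))     ; VCHAR: visible US-ASCII
(define char-set:obs-text (ucs-range->char-set #x80 #x100)) ; obs-text: %x80-FF

;; What newly defined fields SHOULD limit themselves to: VCHAR, SP and HTAB.
(define char-set:preferred
  (char-set-union char-set:vchar (char-set #\space #\tab)))

(define (classify-field-value value)
  "Return 'us-ascii if VALUE only uses VCHAR, SP and HTAB; 'obs-text if it
also contains octets #x80-#xFF (to be treated as opaque data); and
'invalid otherwise, e.g. for control characters."
  (cond ((string-every char-set:preferred value) 'us-ascii)
        ((string-every (char-set-union char-set:preferred char-set:obs-text)
                       value)
         'obs-text)
        (else 'invalid)))

(for-each (lambda (v) (format #t "~s -> ~a~%" v (classify-field-value v)))
          (list "text/html; charset=utf-8"
                (string #\c #\a #\f (integer->char #xE9))
                (string #\a (integer->char 0) #\b)))
```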

 

I interpret this as “HTTP prefers US-ASCII only (see SHOULD), but it’s not strictly required (depending on the field). Sometimes the content doesn’t even have any meaning as characters and is only raw bytes”. Also see the bit about ISO-8859-1: it appears that, in at least some cases, the ISO-8859-1 encoding should be recognised.

(I might be misinterpreting this, though; perhaps it is referring to %-encoding.)

Also, using ISO Latin-1 (or another 8-bit encoding that is compatible with ASCII, the character encoding) is convenient for handling octets and US-ASCII characters together. With ISO-8859-1 in particular, every octet decodes to the character with the same code point, so no octet is lost or rejected.
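Here is a small sketch of that convenience using Guile's (ice-9 iconv) module; the sample bytes are made up. Decoding arbitrary octets as ISO-8859-1 never fails, and each octet maps to the character with the same code point. The ASCII part can then be handled as ordinary text, and the remaining octets can be recovered unchanged.

```scheme
(use-modules (ice-9 iconv))   ; bytevector->string, string->bytevector

;; "X: " followed by two octets outside US-ASCII (#xE9 and #xFF).
(define octets #vu8(88 58 32 233 255))

;; ISO-8859-1 assigns a character to every octet, so decoding cannot fail.
(define text (bytevector->string octets "ISO-8859-1"))

;; Each character's code point equals the original octet.
(display (map char->integer (string->list text))) (newline)

;; The US-ASCII prefix can be treated as ordinary text ...
(display (substring text 0 1)) (newline)

;; ... and re-encoding restores the exact original octets.
(display (equal? (string->bytevector text "ISO-8859-1") octets)) (newline)
```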

 

Separating the US-ASCII characters from the extra octets might make the code cleaner in some aesthetic sense. I don’t think it would make it any more RFC-compliant, though (neither would it make it any less compliant, I suppose).

(There might be bugs w.r.t. character encoding in the Guile implementation, but I don’t think this is one of them.)

> That being said, the best form for this function is:
> (string->symbol (substring str start end))
> With additional logic added to other functions?

I am not familiar enough with the Guile implementation to tell whether the extra logic is best done in this function or in its caller. It just needs to be done _somewhere_.
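For comparison, here is a sketch of doing the check inside the method-parsing procedure itself rather than in its caller. parse-http-method/checked is a hypothetical name, and char-set:tchar is a hypothetical RFC 9110 tchar set. string-every accepts optional start and end indices, so the range can be validated before any substring is copied.

```scheme
;; Hypothetical char-set for RFC 9110 tchar.
(define char-set:tchar
  (char-set-union (string->char-set "!#$%&'*+-.^_`|~")
                  (ucs-range->char-set #x30 #x3A)    ; 0-9
                  (ucs-range->char-set #x41 #x5B)    ; A-Z
                  (ucs-range->char-set #x61 #x7B)))  ; a-z

(define* (parse-http-method/checked str #:optional (start 0)
                                    (end (string-length str)))
  "Return STR[START, END) as a symbol, or signal an error unless that range
is a non-empty HTTP token (1*tchar)."
  (if (and (< start end)
           (string-every char-set:tchar str start end))
      (string->symbol (substring str start end))
      (error "invalid HTTP method" (substring str start end))))

(display (parse-http-method/checked "REPORT")) (newline)
```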

 

Best regards,

Maxime Devos
