From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: Maxime Devos <maximedevos@telenet.be>
Newsgroups: gmane.lisp.guile.devel
Subject: RE: Custom HTTP methods in web module
Date: Sun, 24 Mar 2024 14:57:42 +0100
Message-ID: <20240324145741.2dxh2C0015DtEJR06dxhWR@albert.telenet-ops.be>
References: <CAGvJ-HSOn7npYqfMOb1shRMf5jJz5xeCxeFH_4iUZi9KmdMvPQ@mail.gmail.com>
 <20240323195006.2Jq52C00K5DtEJR01Jq6FE@laurent.telenet-ops.be>
 <CAGvJ-HS5Laqd7=v=WCn4-2zUurXVZcKDFA2+MmNPO-cZO6iUJg@mail.gmail.com>
Mime-Version: 1.0
Content-Type: multipart/alternative;
 boundary="_C24A2D57-9CC8-4DB2-A0F3-56202FB898D1_"
Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214";
	logging-data="11140"; mail-complaints-to="usenet@ciao.gmane.io"
To: Ryan Raymond <rjraymond@oakland.edu>, 
 "Jonas Hahnfeld via Developers list for Guile,the GNU extensibility library"
 <guile-devel@gnu.org>
Original-X-From: guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org Sun Mar 24 14:58:18 2024
Return-path: <guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org>
Envelope-to: guile-devel@m.gmane-mx.org
Original-Received: from lists.gnu.org ([209.51.188.17])
	by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
	(Exim 4.92)
	(envelope-from <guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org>)
	id 1roOMc-0002jY-5i
	for guile-devel@m.gmane-mx.org; Sun, 24 Mar 2024 14:58:18 +0100
Original-Received: from localhost ([::1] helo=lists1p.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.90_1)
	(envelope-from <guile-devel-bounces@gnu.org>)
	id 1roOMC-0005un-Hf; Sun, 24 Mar 2024 09:57:52 -0400
Original-Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <maximedevos@telenet.be>)
 id 1roOMA-0005uC-3W
 for guile-devel@gnu.org; Sun, 24 Mar 2024 09:57:50 -0400
Original-Received: from albert.telenet-ops.be ([2a02:1800:110:4::f00:1a])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128)
 (Exim 4.90_1) (envelope-from <maximedevos@telenet.be>)
 id 1roOM7-0003hw-An
 for guile-devel@gnu.org; Sun, 24 Mar 2024 09:57:49 -0400
Original-Received: from [IPv6:2a02:1811:8c0e:ef00:e9e7:2ef1:7a03:a748]
 ([IPv6:2a02:1811:8c0e:ef00:e9e7:2ef1:7a03:a748])
 by albert.telenet-ops.be with bizsmtp
 id 2dxh2C0015DtEJR06dxhWR; Sun, 24 Mar 2024 14:57:41 +0100
Importance: normal
X-Priority: 3
In-Reply-To: <CAGvJ-HS5Laqd7=v=WCn4-2zUurXVZcKDFA2+MmNPO-cZO6iUJg@mail.gmail.com>
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=telenet.be; s=r24;
 t=1711288661; bh=2Sk/Fi9RH8d2rOocgVU7moU8mM90IFYIUkulzZNZoQ8=;
 h=To:From:Subject:Date:In-Reply-To:References;
 b=ZLaKxfAHpWNqb6n3Wa+Rj5MkF0HlDSb0v8b9UQE38anZChnrkgIt6gk0y9GO7tjdP
 UfVn6D+zqUfwGwIl+jr6xXf5LXV6GoEKL7Ugqf7mfxOxadSXE7+xOEP6iIXGyWRtM2
 ZEJSVSMvA7mUvnac1+8GSwc4XqfxHEgrN0bZWDH6QC4hQkSnYhzPh90gHtFpzkq7nl
 yh5gwlZ/VDby5lHCwiBcRLbYfjYKSzF9HXufqu01BE1yiBGQIHwcnxU9eTSWSed17M
 cjU3ZfOyShdM/gb9ectTxIiX/YYh5WxpxRXcVUE+v3448C/8gzxS2EKgHMLx7TCSAT
 ykFGlF5P0OqhQ==
Received-SPF: pass client-ip=2a02:1800:110:4::f00:1a;
 envelope-from=maximedevos@telenet.be; helo=albert.telenet-ops.be
X-Spam_score_int: -27
X-Spam_score: -2.8
X-Spam_bar: --
X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
 DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001,
 HTML_MESSAGE=0.001, RCVD_IN_DNSWL_LOW=-0.7, SPF_HELO_NONE=0.001,
 SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-BeenThere: guile-devel@gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Developers list for Guile,
 the GNU extensibility library" <guile-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/guile-devel>,
 <mailto:guile-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <https://lists.gnu.org/archive/html/guile-devel>
List-Post: <mailto:guile-devel@gnu.org>
List-Help: <mailto:guile-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/guile-devel>,
 <mailto:guile-devel-request@gnu.org?subject=subscribe>
Errors-To: guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org
Original-Sender: guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org
Xref: news.gmane.io gmane.lisp.guile.devel:22366
Archived-At: <http://permalink.gmane.org/gmane.lisp.guile.devel/22366>

--_C24A2D57-9CC8-4DB2-A0F3-56202FB898D1_
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

>The character-set you're referring to, is it US-ASCII? I am not particular=
ly familiar with how Guile handles characters. If string-filter is not suff=
icient, can you suggest another method?
>
>For example, perhaps we need to go to where "str" is read and set the port=
 encoding to US-ASCII. Right now it's Iso Latin which is a superset of US-A=
SCII, and therefore improper.

I meant =E2=80=9Ccharacter set=E2=80=9D in the sense used in Guile, not =
character encoding.
Very literally, it means a =E2=80=9Cset of characters=E2=80=9D, where =E2=
=80=98set=E2=80=99 is used in the mathematical sense. =E2=80=98Character=E2=
=80=99 means any character in Unicode (not counting the surrogate code =
points used by UTF-16; those aren=E2=80=99t characters).

Given you mentioned char-set:graphic, I thought you already knew.

So, the answer is, no, it=E2=80=99s not ASCII (the character set), it=E2=80=
=99s a subset of US-ASCII defined in the HTTP spec. IIRC, I referred to:

=E2=9E=A2 https://www.rfc-editor.org/rfc/rfc9110.html#name-tokens

(in particular see =E2=80=98tchar=E2=80=99) which I think is pretty clearly=
 not all of ASCII but rather a subset. Explicitly, the character set I=E2=
=80=99m referring to is the =E2=80=98tchar=E2=80=99 mentioned in the RFC.
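
A minimal sketch of that character set in Guile, built from the tchar
rule in RFC 9110. The name char-set:tchar is hypothetical, chosen for
this example; it is not claimed to be something Guile exports.

```scheme
;; Hypothetical name: char-set:tchar is an assumption of this sketch.
;; tchar per RFC 9110: some punctuation characters, DIGIT and ALPHA.
(define char-set:tchar
  (char-set-union
   (string->char-set "!#$%&'*+-.^_`|~")
   (string->char-set "0123456789")
   (string->char-set "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
   (string->char-set "abcdefghijklmnopqrstuvwxyz")))

;; Separators such as space and colon are ASCII but not tchars.
(char-set-contains? char-set:tchar #\G)     ; true
(char-set-contains? char-set:tchar #\space) ; false
(char-set-contains? char-set:tchar #\:)     ; false
```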

On string-filter: I suppose you could use that, (string=3D? (string-filter =
the-char-set ...) original-string), to check things, but it seems more effi=
cient and simpler to use the predicate string-every instead.
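
To make the comparison concrete, here is a small sketch of both
checks. The names tchars, all-tchars? and all-tchars/filter? are
hypothetical helpers for this example only.

```scheme
;; Hypothetical helper: the tchar characters listed in RFC 9110.
(define tchars
  (string->char-set
   (string-append "!#$%&'*+-.^_`|~0123456789"
                  "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                  "abcdefghijklmnopqrstuvwxyz")))

;; string-every stops at the first offending character.
(define (all-tchars? str)
  (string-every tchars str))

;; The string-filter variant builds a filtered string, then compares.
(define (all-tchars/filter? str)
  (equal? (string-filter tchars str) str))

(all-tchars? "PROPFIND")        ; true
(all-tchars? "GET /")           ; false
(all-tchars/filter? "PROPFIND") ; true
(all-tchars/filter? "GET /")    ; false
```

Both variants accept the empty string, so a separate length check is
still needed wherever an empty method name must be rejected.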

That said, it might be worth looking at how the caller(s) of the =
method-parsing procedure use it. It might be the case that they use =
something like (string-index s everything-except-tchar begin end) to =
locate the end of the method name. In that case, the argument passed =
to the method-parsing procedure is correct by construction (assuming =
length>0), so that procedure doesn=E2=80=99t need to do any checks and =
can leave (with a docstring) that responsibility to the caller.
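
A sketch of what such a caller could look like, assuming it scans a
request line for the first character that is not a tchar. The names
tchars, not-tchars and token-end are hypothetical; this is not the
code in Guile's web module.

```scheme
;; Hypothetical helper: the tchar characters listed in RFC 9110.
(define tchars
  (string->char-set
   (string-append "!#$%&'*+-.^_`|~0123456789"
                  "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                  "abcdefghijklmnopqrstuvwxyz")))

;; Everything that is not a tchar.
(define not-tchars (char-set-complement tchars))

;; Index of the first non-tchar at or after START, or the length of
;; LINE when the token runs to the end of the string.
(define (token-end line start)
  (or (string-index line not-tchars start)
      (string-length line)))

(define line "PROPFIND /dav/ HTTP/1.1")
(define end (token-end line 0))
end                                     ; 8
(string->symbol (substring line 0 end)) ; the symbol PROPFIND
```

If token-end returns the start index itself, the token is empty, and
the caller still has to reject that case.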

>For example, perhaps we need to go to where "str" is read and set the port=
 encoding to US-ASCII. Right now it's Iso Latin which is a superset of US-A=
SCII, and therefore improper.

Eh, while HTTP might look like text, it=E2=80=99s more like a mix of text a=
nd octets/bytes:

Field values are usually constrained to the range of US-ASCII characters [U=
SASCII]. __Fields needing a greater range of characters can use an encoding=
__, such as the one defined in [RFC8187]. Historically, HTTP allowed field =
content with text in the __ISO-8859-1__ charset [ISO-8859-1], supporting ot=
her charsets only through use of [RFC2047] encoding. Specifications for new=
ly defined fields SHOULD limit their values to visible US-ASCII octets (VCH=
AR), SP, and HTAB. __A recipient SHOULD treat other allowed octets in field=
 content (i.e., obs-text) as opaque data__.

(emphasis added)

I interpret this as =E2=80=9CHTTP prefers only US-ASCII (see SHOULD), but =
it=E2=80=99s not strictly required (depending on the field), and =
sometimes it doesn=E2=80=99t even have any meaning as characters and =
instead is only raw bytes(*)=E2=80=9D. Also see the bit about =
ISO-8859-1: it appears that in at least some cases, the ISO-8859-1 =
encoding should be recognised.

(I might be misinterpreting this though, perhaps it is referring to %-encod=
ing.)

Also, using ISO Latin 1 (or another 8-bit encoding compatible with ASCII, =
the character encoding) is convenient for handling octets and US-ASCII =
characters together.
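
A short sketch of why that is convenient, assuming the (ice-9 iconv)
and (rnrs bytevectors) modules: under ISO-8859-1 every octet decodes
to exactly one character with the same numeric value, so opaque
octets survive a round trip through a string. The variable names are
chosen for this example.

```scheme
(use-modules (ice-9 iconv) (rnrs bytevectors))

;; Every possible octet value, 0 through 255.
(define octets (u8-list->bytevector (iota 256)))

;; Decoding as ISO-8859-1 gives one character per octet ...
(define text (bytevector->string octets "ISO-8859-1"))
(string-length text)                  ; 256
(char->integer (string-ref text 233)) ; 233

;; ... and encoding again recovers the original octets.
(equal? octets (string->bytevector text "ISO-8859-1")) ; true
```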

Maybe separating the US-ASCII from the extra octets might make the code =
more proper in some aesthetic sense, but I don=E2=80=99t think it would =
make things more proper in an RFC-compliant sense (though neither would =
it make things worse, I suppose).

(There might be bugs w.r.t. character encoding in the Guile implementation,=
 but I don=E2=80=99t think this is one of them.)

>That being said, the best form for this function is:
>(string->symbol (substring str start end) )
>With additional logic added to other functions?

I am not familiar enough with the Guile implementation to tell if the extra=
 logic is best done in this function or in its caller. It just needs to be =
done _somewhere_.
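
For the variant where the check lives in the parsing procedure
itself, a minimal sketch could look like this. The procedure name
parse-method/checked, the helper tchars and the error message are all
hypothetical; this is not the code in Guile's web module.

```scheme
;; Hypothetical helper: the tchar characters listed in RFC 9110.
(define tchars
  (string->char-set
   (string-append "!#$%&'*+-.^_`|~0123456789"
                  "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                  "abcdefghijklmnopqrstuvwxyz")))

;; Hypothetical procedure: accept any non-empty run of tchars between
;; START and END and return it as a symbol; signal an error otherwise.
(define* (parse-method/checked str #:optional (start 0)
                               (end (string-length str)))
  (if (and (< start end)
           (string-every tchars str start end))
      (string->symbol (substring str start end))
      (error "invalid HTTP method token" str start end)))

(parse-method/checked "PURGE")              ; the symbol PURGE
(parse-method/checked "GET / HTTP/1.1" 0 3) ; the symbol GET
```

If the caller already guarantees a non-empty run of tchars, the check
is redundant and the body reduces to the string->symbol call.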

Best regards,
Maxime Devos

--_C24A2D57-9CC8-4DB2-A0F3-56202FB898D1_--