From: Maxime Devos
To: Eli Zaretskii
Cc: jean@abou-samra.fr, rlb@defaultvalue.org, guile-devel@gnu.org
Newsgroups: gmane.lisp.guile.devel
Subject: RE: Improving the handling of system data (env, users, paths, ...)
Date: Sun, 7 Jul 2024 16:59:10 +0200
Message-ID: <20240707165910.kez92C00H4hwdlW01ezAlE@andre.telenet-ops.be>
In-Reply-To: <8634ol2sal.fsf@gnu.org>

>> >> Guile is a Scheme implementation, bound by Scheme standards and compatibility
>> >> with other Scheme implementations (and backwards compatibility too).
>> >
>> >Yes, I understand that.
>>
>> Going by what you are saying below, I think you don’t.
>
>Thank you for your vote of confidence.

That was not a vote of confidence; if anything, it’s the contrary.

> I’m pretty sure that they weren’t intending to get the 0xb5 byte. Rather, they were using the equivalent of ‘string-ref’ (i.e., ‘aref’) and demonstrating that the result is bogus in Scheme. In Scheme, ‘(string-ref ...)’ needs to return a character, and there exists no (Unicode) character with codepoint 4194229, so what Emacs returns here would be bogus for (Guile) Scheme.

>aref in Emacs and string-ref in Guile are not the same, and if Guile needs to produce a raw byte in this scenario, it can be easily arranged. In Emacs we have other goals.

It is the opposite. In Guile, string-ref does not need to produce bytes, but characters – just like aref (modulo difference in how Scheme and Emacs define ‘byte’).

>IOW, I think this argument is pointless, since it is easy to adapt the mechanism to what Guile needs.

No – the argument is about how it is impossible to adapt the mechanism to Guile, since bytes aren’t characters in Unicode.

> >From the Emacs manual:
>
> >For example, you can access individual characters in a string using the function aref (see Functions that Operate on Arrays).
>
> Thus, (aref the-string index) is the equivalent of (string-ref the-string index).

>No, because a raw byte is not a character.

Yes, because characters are characters. Both string-ref and aref return characters. This is documented in both the Emacs and Guile manuals:

Again, from the Emacs manual:

> A string is a fixed sequence of characters. [...]
> Since strings are arrays, and therefore sequences as well, you can operate on them with the general array and sequence functions documented in Sequences, Arrays, and Vectors. For example, you can access individual characters in a string using the function aref (see Functions that Operate on Arrays).

Hence, (aref the-string index) returns (Emacs) characters.

Likewise, from the Guile manual:

> Scheme Procedure: string-ref str k
> C Function: scm_string_ref (str, k)
> Return character k of str using zero-origin indexing. k must be a valid index of str.

Clearly, these are equivalent (modulo difference in the meaning of ‘characters’).

>If Guile restricts itself to Unicode characters and only them, it will lack important features. So my suggestion is not to have this restriction.

Guile restricting strings to Unicode _is_ an important feature (simplicity, and compatibility).

Guile extending strings beyond Unicode is a _limitation_ (compatibility and other trickiness for applications).

I could imagine that in the far future there might be too few codepoints left in Unicode, in which case the range of what Guile (and more generally, Scheme and Unicode) considers characters needs to be extended (even if that has some compatibility implications), but that time hasn’t arrived yet.

The important feature of this thread is supporting file names (and getenv stuff, etc.) that don’t fit properly in the ‘string’ model. As mentioned earlier (in the initial message, even), there are solutions to that which do not impose the ‘let characters go beyond Unicode’ limitation.

>I think the fact that this discussion is held, and that Rob suggested to use Latin-1 for the purpose of supporting raw bytes is a clear indication that Guile, too, needs to deal with "character-like" data that does not fit the Unicode framework.

True, and I never claimed otherwise.

> So I think saying that strings in Guile can only hold Unicode characters will not give you what this discussion attempts to give.

Sure, and I wasn’t trying to. What I (and IIUC, the other person as well) was doing was mentioning how Emacs’s thing isn’t a solution either. (Whether because of backwards compatibility, or whether because of not _wanting_ to conflate bytes with characters (and not wanting to go beyond Unicode) with all the consequences this conflation would imply for applications.)

> In particular, how will you handle the situations described by Rob where a file has a name that is not a valid UTF-8 sequence (thus not "characters" as long as you interpret text as UTF-8)?

Scheme does not interpret text as UTF-8; that’s an internal implementation detail and a matter of things like locales. Instead, to Scheme, text is (Unicode) characters.

I have outlined a solution (that does not conflate characters with bytes) in another response. IIRC, it was in a response to Rob. I would propose actually, you know, reading it. I’m not sure, but IIRC Rob also mentioned another solution (i.e., just accept bytevectors in some locations, or do Latin-1).

Also, this structure makes no sense. Even if I did not provide an alternative solution of my own, that wouldn’t mean Emacs’s thing is the answer. (Negative) criticism can be valid without providing alternatives.

Best regards,
Maxime Devos.
