From: Maxime Devos
Newsgroups: gmane.lisp.guile.devel
Subject: RE: [PATCH v2] rdelim: Add new procedure `for-line-in-file`.
Date: Mon, 16 Dec 2024 11:52:06 +0100
Message-ID: <20241216115207.pNs52D00j1dDhme01Ns6nW@baptiste.telenet-ops.be>
References: <0c0bc19c-9b46-4da7-b200-3de0355949fa@disroot.org> <20241216111722.pNHK2D00E1dDhme01NHKfE@andre.telenet-ops.be>
To: Nala Ginrut
Cc: Adam Faiz, "guile-devel@gnu.org", Ricardo Wurmus

>You raised this topic from string line reading to more general case. 😄

 

Yes.

>If so, the best way could be providing a general function to wrap rdelim for for-each-seg-delim, users may pass a delimiter to decide how to delim (even for bytevectors), and implement for-line-a-file base on it with unicode encoding.

No, this is worse. This limits the procedure to delimiting.

>However, personally I dare to doubt if we really need such general function, since general parsing may require looking backwards. This implies the char based checking delimiter will be more general.
>If we don't consider this, it's better to just consider strings with proper encoding to avoid over engineering.

‘general parsing may require looking backwards’ does not imply ‘don’t need such a general function’, and ‘don’t need such a general function’ doesn’t imply ‘such a general function wouldn’t be useful’. Many parsers don’t need to look backwards. E.g., see all the examples I mentioned in my mail.

Also, your ‘general’ isn’t my ‘general’. Nowhere did I limit the procedure to delimiters and character-based things.

For an example of how this could be used: in Scheme-GNUnet, a fiber waits for messages on some (stream) socket (SOCK_STREAM, not SOCK_DGRAM). Each message consists of ‘message type field + packet size field + information’. So, the fiber effectively iterates over all messages on the stream. The parser first reads the type and size (no looking backwards needed), then reads the remaining information and passes it to the type-specific parser, whose result can then be acted upon.

(IIRC, the control flow is technically a bit different (let loop with arguments, to avoid mutation, not that it really matters since there is state anyway in the ports), but it could have been implemented like this instead.)
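
To make the shape of such a loop concrete, here is a minimal sketch. It is not the Scheme-GNUnet code. The wire layout (2-byte big-endian type, 2-byte big-endian payload size, then the payload) and the names ‘read-message’ and ‘for-each-message’ are assumptions made up for illustration.

```scheme
(use-modules (ice-9 binary-ports)
             (rnrs bytevectors))

;; Hypothetical wire format (an assumption, not GNUnet's actual layout):
;; 2-byte big-endian type, 2-byte big-endian payload size, then the payload.
(define (read-message port)
  "Read one message from PORT.  Return (TYPE . PAYLOAD), or the EOF object."
  (let ((header (get-bytevector-n port 4)))
    (cond ((eof-object? header) header)   ; clean end of stream
          ((< (bytevector-length header) 4)
           (error "truncated message header"))
          (else
           (let* ((type (bytevector-u16-ref header 0 (endianness big)))
                  (size (bytevector-u16-ref header 2 (endianness big)))
                  ;; Only forward reads: no looking backwards is needed.
                  (payload (if (zero? size)
                               (make-bytevector 0)
                               (get-bytevector-n port size))))
             (when (or (eof-object? payload)
                       (< (bytevector-length payload) size))
               (error "truncated message payload"))
             (cons type payload))))))

;; Hypothetical helper: call PROC on the type and payload of each message.
(define (for-each-message proc port)
  (let loop ()
    (let ((message (read-message port)))
      (unless (eof-object? message)
        (proc (car message) (cdr message))
        (loop)))))

;; Usage: two messages, type 1 with payload "hi" and type 2 with no payload.
(define port
  (open-bytevector-input-port #vu8(0 1 0 2 104 105 0 2 0 0)))
(define seen '())
(for-each-message
 (lambda (type payload)
   (set! seen (cons (cons type payload) seen)))
 port)
(reverse seen) ; => ((1 . #vu8(104 105)) (2 . #vu8()))
```

In a real program, PROC would dispatch on the type to a type-specific parser instead of collecting the raw payloads.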

Copy of the list of examples that don’t need to look backwards:

This is overly specific to reading lines, and to reading lines with rdelim. If you replace ‘read-line’ by an argument, the procedure becomes more general. For example, by passing ‘get-char’ you can act on each character; with ‘get-line’, I’m not sure what the difference would be, but apparently it’s not ‘read-line’ (?); if you give it a JSON reading+parsing procedure, you iterate over all JSON objects; with ‘get-u8’ you iterate over bytes; etc.
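
A minimal sketch of what ‘replace read-line by an argument’ could look like. The name ‘for-each-datum’ and its argument order are hypothetical, not what the patch proposes. The only assumption about the reader is that it takes a port and returns the EOF object at the end of input.

```scheme
(use-modules (ice-9 rdelim)         ; read-line
             (ice-9 textual-ports)  ; get-char
             (ice-9 binary-ports))  ; get-u8, open-bytevector-input-port

;; Hypothetical generalisation: READ-DATUM is any procedure that takes a
;; port and returns the next datum, or the EOF object at end of input.
(define (for-each-datum proc read-datum port)
  (let loop ()
    (let ((datum (read-datum port)))
      (unless (eof-object? datum)
        (proc datum)
        (loop)))))

;; Iterate over lines.
(call-with-input-string "one\ntwo\n"
  (lambda (port)
    (for-each-datum (lambda (line) (display line) (newline))
                    read-line port)))

;; Iterate over characters.
(call-with-input-string "abc"
  (lambda (port)
    (for-each-datum write-char get-char port)))

;; Iterate over bytes.
(for-each-datum (lambda (byte) (display byte) (newline))
                get-u8
                (open-bytevector-input-port #vu8(1 2 3)))
```

A line-specific procedure then becomes a thin wrapper that opens the file and passes ‘read-line’ as the reader.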

Best regards,
Maxime Devos
