From: Nala Ginrut
Newsgroups: gmane.lisp.guile.devel
Subject: Re: [PATCH v2] rdelim: Add new procedure `for-line-in-file`.
Date: Mon, 16 Dec 2024 20:06:41 +0900
To: Maxime Devos
Cc: Adam Faiz, "guile-devel@gnu.org", Ricardo Wurmus
In-Reply-To: <20241216115207.pNs52D00j1dDhme01Ns6nW@baptiste.telenet-ops.be>

I'm not going to persuade anyone here, just sharing my opinion about a patch for line parsing in a text file. 😄

Yes, some parsers don't need to look backwards, but that also doesn't mean other parsers have priority to occupy a general API.

In my experience, a parser author like me prefers to write their own parser from scratch, for many reasons, rather than finding and adapting to a general function buried deep in the documentation.

Here's my opinion: as a parser writer, I have no interest in using it, but I can say it still looks beautiful from a functional programming perspective. Beauty is still a value worth pursuing. I can agree with this. But I wouldn't use this function.

Of course I speak only for myself as a potential existing user.

So, if we can get back to the original topic: this is a patch for a text-line parsing function that most people can easily understand and review. I believe the author's effort and my suggestions are good enough to go. 😄

This doesn't mean I'm against your idea.

Best regards.

On Mon, Dec 16, 2024, 19:52 Maxime Devos <maximedevos@telenet.be> wrote:
>You raised this topic from string line reading to more general case. 😄


Yes.

>If so, the best way could be to provide a general function, for-each-seg-delim, that wraps rdelim; users may pass a delimiter to decide how to split (even for bytevectors), and for-line-in-file can be implemented on top of it with Unicode encoding.


No, this is worse. This limits the procedure to delimiting.

>However, personally I dare to doubt whether we really need such a general function, since general parsing may require looking backwards. This implies that char-based delimiter checking will be more general.
If we don't consider this, it's better to just consider strings with proper encoding to avoid over-engineering.

‘general parsing may require looking backwards’ does not imply ‘don’t need such general function’, and ‘don’t need such general function’ doesn’t imply ‘such a general function wouldn’t be useful’. Many parsers don’t need to look backwards. E.g., see all examples I mentioned in my mail.

Also, your ‘general’ isn’t my ‘general’. Nowhere did I limit the procedure to delimiters and character-based things.

For an example of how this could be used: in Scheme-GNUnet, a fiber is waiting on messages on some (stream) socket (SOCK_STREAM, not SOCK_DGRAM). Each message consists of ‘message type field + packet size field + information’. So, the fiber effectively iterates over all messages on the stream. The parser first reads the type and size (no looking backwards needed), then reads the remaining information and passes it to the type-specific parser, whose result can then be acted upon.

(IIRC, the control flow is technically a bit different (let loop with arguments, to avoid mutation, not that it really matters since there is state anyway in the ports), but it could have been implemented like this instead.)
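
A minimal sketch of the framing loop described above, assuming Guile. The wire layout (a 16-bit big-endian type, then a 16-bit big-endian payload size) and the names `read-message` and `for-each-message` are made up for illustration; this is not Scheme-GNUnet's actual code or message format.

```scheme
(use-modules (ice-9 binary-ports)   ; get-bytevector-n, open-bytevector-input-port
             (rnrs bytevectors))    ; bytevector-u16-ref, endianness

;; Hypothetical reader: returns (type . payload), or the end-of-file object
;; when the stream ends cleanly before a new header starts.
;; Assumed layout: 2 bytes type, 2 bytes payload size, then the payload.
(define (read-message port)
  (let ((header (get-bytevector-n port 4)))
    (cond ((eof-object? header) header)
          ((< (bytevector-length header) 4)
           (error "truncated message header"))
          (else
           (let* ((type (bytevector-u16-ref header 0 (endianness big)))
                  (size (bytevector-u16-ref header 2 (endianness big)))
                  (payload (if (zero? size)
                               (make-bytevector 0)
                               (get-bytevector-n port size))))
             (when (or (eof-object? payload)
                       (< (bytevector-length payload) size))
               (error "truncated message payload"))
             (cons type payload))))))

;; Call HANDLE with the type and payload of every message on PORT.
(define (for-each-message handle port)
  (let loop ((message (read-message port)))
    (unless (eof-object? message)
      (handle (car message) (cdr message))
      (loop (read-message port)))))

;; Usage: two messages, type 1 with two payload bytes and type 2 with none.
(for-each-message
 (lambda (type payload)
   (format #t "type ~a, ~a payload bytes~%" type (bytevector-length payload)))
 (open-bytevector-input-port #vu8(0 1 0 2 10 20 0 2 0 0)))
```

Each step only reads forward: the header says how many bytes follow, so the reader never has to look backwards.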


Copy of the list of examples that don’t need to look backwards:

This is overly specific to reading lines, and reading lines with rdelim. If you replace ‘read-line’ by an argument, the procedure becomes more general. For example, by passing ‘get-char’ you can act on each character; with ‘get-line’ I’m not sure what the difference would be, but apparently it’s not ‘read-line’ (?); if you give it a JSON reading+parsing procedure you iterate over all JSON objects; with get-u8 you iterate over bytes, etc.
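
A minimal sketch of that generalisation, assuming Guile. `for-each-datum` is a hypothetical name, not the procedure from the patch; the only requirement on the reader argument is that it takes a port and returns the end-of-file object when the input is exhausted.

```scheme
(use-modules (ice-9 rdelim)         ; read-line
             (ice-9 textual-ports)  ; get-char
             (ice-9 binary-ports))  ; get-u8, open-bytevector-input-port

;; Hypothetical helper: call PROC on each datum READER returns from PORT,
;; stopping when READER returns the end-of-file object.
(define (for-each-datum proc reader port)
  (let loop ((datum (reader port)))
    (unless (eof-object? datum)
      (proc datum)
      (loop (reader port)))))

;; Lines, via read-line.
(call-with-input-string "first\nsecond\n"
  (lambda (port)
    (for-each-datum (lambda (line) (display line) (newline))
                    read-line port)))

;; Characters, via get-char.
(call-with-input-string "abc"
  (lambda (port)
    (for-each-datum write-char get-char port)))
(newline)

;; Bytes, via get-u8.
(for-each-datum (lambda (byte) (display byte) (newline))
                get-u8
                (open-bytevector-input-port #vu8(1 2 3)))
```

The loop itself knows nothing about lines or delimiters; all of that lives in the reader that is passed in.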

Best regards,
Maxime Devos
