From: Nala Ginrut
Newsgroups: gmane.lisp.guile.devel
Subject: Re: [PATCH v2] rdelim: Add new procedure `for-line-in-file`.
Date: Mon, 16 Dec 2024 19:29:15 +0900
In-Reply-To: <20241216111722.pNHK2D00E1dDhme01NHKfE@andre.telenet-ops.be>
To: Maxime Devos
Cc: Adam Faiz, "guile-devel@gnu.org", Ricardo Wurmus
List-Id: "Developers list for Guile, the GNU extensibility library"

You have broadened this topic from reading lines of text to a more general case. 😄

If so, the best way could be to provide a general function that wraps rdelim, say for-each-seg-delim: users pass a delimiter to decide how the input is split (even for bytevectors), and for-line-in-file is implemented on top of it with a Unicode encoding.
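
To make this concrete, here is a rough sketch of such a wrapper. Only the name for-each-seg-delim comes from the paragraph above; its signature is an assumption, and the sketch covers textual ports only (not bytevectors), relying on read-delimited from (ice-9 rdelim).

```scheme
(use-modules (ice-9 rdelim))   ; read-delimited

;; Hypothetical signature: call PROC on each segment read from PORT.
;; Segments are separated by any character of the string DELIMS.
(define (for-each-seg-delim proc delims port)
  (let loop ((segment (read-delimited delims port)))
    (unless (eof-object? segment)
      (proc segment)
      (loop (read-delimited delims port)))))

;; Line iteration as a special case: newline is the only delimiter.
(define (for-line-in-port proc port)
  (for-each-seg-delim proc "\n" port))

;; Usage: split on either a comma or a semicolon.
(call-with-input-string "a,b;c"
  (lambda (port)
    (for-each-seg-delim (lambda (segment) (display segment) (newline))
                        ",;"
                        port)))
```

The text encoding is whatever the port was opened with, so the wrapper itself does not need an encoding parameter.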

However, I personally doubt that we really need such a general function, since general parsing may require looking backwards. This implies that character-based delimiter checking would be the more general approach.
If we don't take that into account, it's better to handle only strings with a proper encoding, to avoid over-engineering.

Best regards.

On Mon, Dec 16, 2024, 19:17 Maxime Devos <maximedevos@telenet.be> wrote:

This is overly specific to reading lines, and reading lines with rdelim. If you replace ‘read-line’ by an argument, the procedure becomes more general. For example, by passing ‘get-char’ you can act on each character; with ‘get-line’ I’m not sure what the difference would be, but apparently it’s not ‘read-line’ (?); if you give it a JSON reading+parsing procedure you iterate over all JSON objects; with get-u8 you iterate over bytes, etc.

You could then define ‘for-line-in-file’ (for-line-in-port?) as a special case of the more general procedure.
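
A minimal sketch of that generalisation, for illustration only. The name for-each-datum-in-port and its argument order are assumptions, not part of the patch; read-line, get-char and get-u8 are the existing Guile procedures.

```scheme
(use-modules (ice-9 rdelim)          ; read-line
             (ice-9 textual-ports)   ; get-char
             (ice-9 binary-ports))   ; get-u8, open-bytevector-input-port

;; Hypothetical name: call PROC on each datum that READER returns
;; from PORT, stopping at the first end-of-file object.
(define (for-each-datum-in-port proc reader port)
  (let loop ((datum (reader port)))
    (unless (eof-object? datum)
      (proc datum)
      (loop (reader port)))))

;; The line-oriented procedure as a special case of the general one.
(define (for-line-in-port proc port)
  (for-each-datum-in-port proc read-line port))

;; Small helper for the examples: gather everything READER yields.
(define (collect reader port)
  (let ((acc '()))
    (for-each-datum-in-port (lambda (x) (set! acc (cons x acc))) reader port)
    (reverse acc)))

(collect read-line (open-input-string "one\ntwo\n"))       ; ⇒ ("one" "two")
(collect get-char (open-input-string "ab"))                ; ⇒ (#\a #\b)
(collect get-u8 (open-bytevector-input-port #vu8(1 2 3)))  ; ⇒ (1 2 3)
```

Any procedure that takes a port and eventually returns the end-of-file object fits the READER slot, which is what would let a parser be plugged in the same way.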

This generalisation also allows for setting ‘handle-delim’, which currently you are not allowing (the user still shouldn’t choose ‘split’ though).
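
For instance, with a reader argument the caller can pick a handle-delim value by wrapping read-line in a lambda. This is only a sketch; for-each-datum-in-port is a hypothetical name for the general procedure, repeated here so the example is self-contained.

```scheme
(use-modules (ice-9 rdelim))   ; read-line

;; Hypothetical general procedure (name and signature are assumptions).
(define (for-each-datum-in-port proc reader port)
  (let loop ((datum (reader port)))
    (unless (eof-object? datum)
      (proc datum)
      (loop (reader port)))))

;; 'concat keeps the trailing newline on each line that has one.
(define kept '())
(call-with-input-string "a\nb"
  (lambda (port)
    (for-each-datum-in-port
     (lambda (line) (set! kept (cons line kept)))
     (lambda (p) (read-line p 'concat))
     port)))

(reverse kept)   ; ⇒ ("a\n" "b")
```

With 'split, read-line returns a pair instead of a string, so the eof-object? test in this loop would not be the right stopping condition; that is presumably why ‘split’ should not be chosen here.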

I’m not sure where the general port interface should be, maybe in https://www.gnu.org/software/guile/manual/html_node/Ports.html?

(Slightly more general is to also move eof-object? into an argument, but that seems too much generalisation. OTOH, it allows for ‘split’.)
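
A sketch of that further generalisation, under the assumption (based on how read-line is documented) that (read-line port 'split) returns a pair whose car is the end-of-file object once the input is exhausted. The name for-each-datum-in-port and the optional done? argument are hypothetical.

```scheme
(use-modules (ice-9 rdelim))   ; read-line

;; Hypothetical variant: DONE? decides when to stop and defaults to
;; eof-object?, so ordinary readers need no extra argument.
(define* (for-each-datum-in-port proc reader port
                                 #:optional (done? eof-object?))
  (let loop ((datum (reader port)))
    (unless (done? datum)
      (proc datum)
      (loop (reader port)))))

;; With 'split every result is a (line . delimiter) pair, so the
;; end-of-file test has to look inside the pair.
(define lines '())
(call-with-input-string "a\nb"
  (lambda (port)
    (for-each-datum-in-port
     (lambda (pair) (set! lines (cons (car pair) lines)))
     (lambda (p) (read-line p 'split))
     port
     (lambda (pair) (eof-object? (car pair))))))

(reverse lines)   ; ⇒ ("a" "b")
```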

Also, I’d rather keep opening files out of the procedure – avoids the text encoding issues (mentioned by Nala Ginrut), also convenient for sandboxed environments that don’t want to give access to the file system (or, at least, only use a special file opening procedure that does additional checks), and avoids conflation of file names with files and files with ports. (As written, it’s for file ports, but as implemented, it can be meaningfully used for other ports as well (e.g. networking sockets).) (Also the user might want to set CLOEXEC or other flags, or uncompress input, …)
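
To illustrate the port-based interface, a short sketch in which the caller opens the file and chooses the encoding. for-line-in-port is the hypothetical port-taking variant, and the file name is made up for the demonstration; call-with-input-file and its #:encoding keyword are existing Guile features.

```scheme
(use-modules (ice-9 rdelim))   ; read-line

;; Hypothetical port-based variant: it never opens anything itself.
(define (for-line-in-port proc port)
  (let loop ((line (read-line port)))
    (unless (eof-object? line)
      (proc line)
      (loop (read-line port)))))

;; Hypothetical demo file, written here only so the sketch is runnable.
(define demo-file "/tmp/for-line-in-port-demo.txt")
(call-with-output-file demo-file
  (lambda (port) (display "héllo\nwörld\n" port))
  #:encoding "UTF-8")

;; The caller opens the file and decides on the encoding.
(define from-file '())
(call-with-input-file demo-file
  (lambda (port)
    (for-line-in-port (lambda (l) (set! from-file (cons l from-file))) port))
  #:encoding "UTF-8")

;; The same procedure works on a port that is not backed by a file.
(define from-string '())
(call-with-input-string "no\nfile\n"
  (lambda (port)
    (for-line-in-port (lambda (l) (set! from-string (cons l from-string))) port)))
```

A sandboxed caller could replace call-with-input-file with its own checked opener without any change to for-line-in-port.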

Also, documentation is missing.

Best regards,
Maxime Devos
