unofficial mirror of guile-devel@gnu.org 
* [PATCH v2] rdelim: Add new procedure `for-line-in-file`.
@ 2024-12-16  6:14 Adam Faiz
  2024-12-16  7:41 ` Nala Ginrut
  2024-12-16 10:17 ` Maxime Devos
  0 siblings, 2 replies; 8+ messages in thread
From: Adam Faiz @ 2024-12-16  6:14 UTC (permalink / raw)
  To: guile-devel; +Cc: Nala Ginrut, Ricardo Wurmus, Maxime Devos

From c8a9904f1b1c09d148de1ec23dc2eb0d433b3141 Mon Sep 17 00:00:00 2001
From: AwesomeAdam54321 <adam.faiz@disroot.org>
Date: Sun, 15 Dec 2024 23:48:30 +0800
Subject: [PATCH v2] rdelim: Add new procedure `for-line-in-file`.

* module/ice-9/rdelim.scm (for-line-in-file): Add it.

This procedure makes it convenient to do per-line processing of a text
file.
---
 module/ice-9/rdelim.scm | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/module/ice-9/rdelim.scm b/module/ice-9/rdelim.scm
index d2cd081d7..9b20c99cb 100644
--- a/module/ice-9/rdelim.scm
+++ b/module/ice-9/rdelim.scm
@@ -23,7 +23,8 @@
 ;;; similar to (scsh rdelim) but somewhat incompatible.
 
 (define-module (ice-9 rdelim)
-  #:export (read-line
+  #:export (for-line-in-file
+            read-line
             read-line!
             read-delimited
             read-delimited!
@@ -206,3 +207,20 @@ characters to read.  By default, there is no limit."
 	      line)
       (else
        (error "unexpected handle-delim value: " handle-delim)))))
+
+(define (for-line-in-file file proc)
+  "Call PROC for every line in FILE until the eof-object is reached.
+FILE can either be a filename string or an already opened input port.
+The corresponding port is closed upon completion.
+
+The line provided to PROC is guaranteed to be a string."
+  (let ((port
+        (if (input-port? file)
+            file
+            (open-input-file file))))
+    (let loop ((line (read-line port)))
+      (cond ((eof-object? line)
+             (close-port port))
+            (else
+             (proc line)
+             (loop (read-line port)))))))
-- 
2.41.0




* Re: [PATCH v2] rdelim: Add new procedure `for-line-in-file`.
  2024-12-16  6:14 [PATCH v2] rdelim: Add new procedure `for-line-in-file` Adam Faiz
@ 2024-12-16  7:41 ` Nala Ginrut
  2024-12-16 10:17 ` Maxime Devos
  1 sibling, 0 replies; 8+ messages in thread
From: Nala Ginrut @ 2024-12-16  7:41 UTC (permalink / raw)
  To: Adam Faiz; +Cc: guile-devel, Ricardo Wurmus, Maxime Devos

[-- Attachment #1: Type: text/plain, Size: 352 bytes --]

Hi Adam!
For string manipulation, the text encoding has to be considered. It would
be better to accept the [#:guess-encoding=#f] [#:encoding=#f] keyword
arguments as well.
You may take a look at call-with-input-file in the manual:
https://www.gnu.org/software/guile/manual/html_node/File-Ports.html

BTW, in your case, you can ignore the binary mode.
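
A minimal sketch of what this could look like, assuming the procedure simply forwards the two keyword arguments to `call-with-input-file`. The name `for-line-in-file*` is hypothetical and is not part of the patch:

```scheme
(use-modules (ice-9 rdelim))

;; Hypothetical variant, not the patch itself: open FILENAME with the
;; same encoding keywords as `call-with-input-file' and call PROC on
;; every line.  `call-with-input-file' closes the port when the inner
;; procedure returns.
(define* (for-line-in-file* filename proc
                            #:key (guess-encoding #f) (encoding #f))
  (call-with-input-file filename
    (lambda (port)
      (let loop ((line (read-line port)))
        (unless (eof-object? line)
          (proc line)
          (loop (read-line port)))))
    #:guess-encoding guess-encoding
    #:encoding encoding))
```

A caller could then write something like `(for-line-in-file* "notes.txt" display #:encoding "UTF-8")`, where the file name is only an example.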

Best regards.


* RE: [PATCH v2] rdelim: Add new procedure `for-line-in-file`.
  2024-12-16  6:14 [PATCH v2] rdelim: Add new procedure `for-line-in-file` Adam Faiz
  2024-12-16  7:41 ` Nala Ginrut
@ 2024-12-16 10:17 ` Maxime Devos
  2024-12-16 10:29   ` Nala Ginrut
  1 sibling, 1 reply; 8+ messages in thread
From: Maxime Devos @ 2024-12-16 10:17 UTC (permalink / raw)
  To: Adam Faiz, guile-devel@gnu.org; +Cc: Nala Ginrut, Ricardo Wurmus

[-- Attachment #1: Type: text/plain, Size: 1683 bytes --]


This is overly specific to reading lines, and to reading lines with rdelim. If you replace ‘read-line’ by an argument, the procedure becomes more general. For example, by passing ‘get-char’ you can act on each character; with ‘get-line’ you act on each line (I’m not sure what the difference would be, but apparently it’s not ‘read-line’?); if you give it a JSON reading+parsing procedure you iterate over all JSON objects; with ‘get-u8’ you iterate over bytes; etc.

You could then define ‘for-line-in-file’ (for-line-in-port?) as a special case of the more general procedure.
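
A rough sketch of this idea, with the reading procedure passed as an argument and the line-based procedure defined as a special case. The names `for-each-in-port` and `for-line-in-port` and the argument order are assumptions made for illustration; nothing in the patch fixes them:

```scheme
(use-modules (ice-9 rdelim)          ; read-line
             (ice-9 textual-ports))  ; get-char

;; Hypothetical general procedure: call PROC on every object that
;; READER returns from PORT, stopping at the eof-object.  The port is
;; left open; closing it is the caller's business.
(define (for-each-in-port reader proc port)
  (let loop ((obj (reader port)))
    (unless (eof-object? obj)
      (proc obj)
      (loop (reader port)))))

;; The line-based procedure as a special case of the general one.
(define (for-line-in-port proc port)
  (for-each-in-port read-line proc port))

;; Usage on a string port: display each line, then each character.
(call-with-input-string "ab\ncd\n"
  (lambda (port)
    (for-line-in-port (lambda (line) (display line) (newline)) port)))
(call-with-input-string "xy"
  (lambda (port)
    (for-each-in-port get-char write-char port)))
```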

This generalisation also allows for setting ‘handle-delim’, which currently you are not allowing (the user still shouldn’t choose ‘split’ though).

I’m not sure where the general port interface should be, maybe in https://www.gnu.org/software/guile/manual/html_node/Ports.html?

(Slightly more general is to also move eof-object? into an argument, but that seems too much generalisation. OTOH, it allows for ‘split’.)
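
To illustrate why ‘split’ needs the end test as an argument: with ‘split’, `read-line` returns a (LINE . DELIMITER) pair, and as far as I understand the implementation, at end of input the pair holds the eof-object rather than being the eof-object itself, so a plain `eof-object?` test never fires. A sketch with a hypothetical optional `done?` argument:

```scheme
(use-modules (ice-9 rdelim))

;; Hypothetical general procedure with the end-of-input test as an
;; optional argument (defaulting to `eof-object?').
(define* (for-each-in-port reader proc port
                           #:optional (done? eof-object?))
  (let loop ((obj (reader port)))
    (unless (done? obj)
      (proc obj)
      (loop (reader port)))))

;; With 'split the reader yields pairs, so the end test looks at the
;; car of the pair instead of at the pair itself.
(define (for-line+delim-in-port proc port)
  (for-each-in-port (lambda (port) (read-line port 'split))
                    proc
                    port
                    (lambda (pair) (eof-object? (car pair)))))
```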

Also, I’d rather keep opening files out of the procedure. This avoids the text encoding issues (mentioned by Nala Ginrut), is convenient for sandboxed environments that don’t want to give access to the file system (or, at least, only use a special file-opening procedure that does additional checks), and avoids conflating file names with files and files with ports. (As written, it’s for file ports, but as implemented, it can be meaningfully used for other ports as well, e.g. networking sockets.) (Also, the user might want to set CLOEXEC or other flags, or uncompress input, …)
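
A small sketch of that division of labour, assuming a port-only procedure (hypothetically named `for-line-in-port`): the caller decides how the port is opened, with which encoding, and when it is closed, and the same procedure works on non-file ports.

```scheme
(use-modules (ice-9 rdelim))

;; Hypothetical port-only variant: it never opens or closes anything.
(define (for-line-in-port proc port)
  (let loop ((line (read-line port)))
    (unless (eof-object? line)
      (proc line)
      (loop (read-line port)))))

;; Example consumer: count the lines available on PORT.
(define (count-lines port)
  (let ((n 0))
    (for-line-in-port (lambda (line) (set! n (+ n 1))) port)
    n))

;; The caller opens the file and picks the encoding ...
(define (count-lines-in-file filename)
  (call-with-input-file filename count-lines #:encoding "UTF-8"))

;; ... or passes a port that is not a file port at all.
(define (count-lines-in-string str)
  (call-with-input-string str count-lines))
```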

Also, documentation is missing.

Best regards,
Maxime Devos


* Re: [PATCH v2] rdelim: Add new procedure `for-line-in-file`.
  2024-12-16 10:17 ` Maxime Devos
@ 2024-12-16 10:29   ` Nala Ginrut
  2024-12-16 10:52     ` Maxime Devos
  0 siblings, 1 reply; 8+ messages in thread
From: Nala Ginrut @ 2024-12-16 10:29 UTC (permalink / raw)
  To: Maxime Devos; +Cc: Adam Faiz, guile-devel@gnu.org, Ricardo Wurmus

[-- Attachment #1: Type: text/plain, Size: 2535 bytes --]

You raised this topic from string line reading to a more general case. 😄

If so, the best way could be to provide a general function wrapping rdelim,
say for-each-seg-delim: users pass a delimiter to decide how to split the
input (even for bytevectors), and for-line-in-file is implemented on top of
it with Unicode encoding.

However, personally I doubt that we really need such a general function,
since general parsing may require looking backwards. This implies that
checking delimiters character by character would be more general.
If we don't consider this, it's better to consider only strings with proper
encoding, to avoid over-engineering.
Best regards.


* RE: [PATCH v2] rdelim: Add new procedure `for-line-in-file`.
  2024-12-16 10:29   ` Nala Ginrut
@ 2024-12-16 10:52     ` Maxime Devos
  2024-12-16 11:06       ` Nala Ginrut
  0 siblings, 1 reply; 8+ messages in thread
From: Maxime Devos @ 2024-12-16 10:52 UTC (permalink / raw)
  To: Nala Ginrut; +Cc: Adam Faiz, guile-devel@gnu.org, Ricardo Wurmus

[-- Attachment #1: Type: text/plain, Size: 2412 bytes --]

>You raised this topic from string line reading to a more general case. 😄

Yes.

>If so, the best way could be to provide a general function wrapping rdelim, say for-each-seg-delim: users pass a delimiter to decide how to split the input (even for bytevectors), and for-line-in-file is implemented on top of it with Unicode encoding.

No, this is worse. This limits the procedure to delimiting.

>However, personally I doubt that we really need such a general function, since general parsing may require looking backwards. This implies that checking delimiters character by character would be more general.
>If we don't consider this, it's better to consider only strings with proper encoding, to avoid over-engineering.

‘General parsing may require looking backwards’ does not imply ‘don’t need such general function’, and ‘don’t need such general function’ doesn’t imply ‘such a general function wouldn’t be useful’. Many parsers don’t need to look backwards. E.g., see all the examples I mentioned in my mail.

Also, your ‘general’ isn’t my ‘general’. Nowhere did I limit the procedure to delimiters and character-based things.

For an example of how this could be used: in Scheme-GNUnet, a fiber is waiting on messages on some (stream) socket (SOCK_STREAM, not SOCK_DGRAM). Each message consists of ‘message type field + packet size field + information’. So, the fiber effectively iterates over all messages on the stream. The parser first reads the type and size (no looking backwards needed), then reads the remaining information and passes this information to the type-specific parser, whose result can then be acted upon.
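
As an illustration only, here is how such a message reader could be repeated over a port. The wire layout used here (2-byte big-endian type, 2-byte big-endian payload size, then the payload) and the names `read-message` and `for-each-message` are assumptions for the sketch, not the actual Scheme-GNUnet code or format:

```scheme
(use-modules (rnrs bytevectors)
             (ice-9 binary-ports))

;; Hypothetical framing: type (u16, big endian), payload size (u16, big
;; endian), payload.  Returns (TYPE . PAYLOAD), or the eof-object when
;; the stream ends (a truncated message is also treated as the end).
(define (read-message port)
  (let ((header (get-bytevector-n port 4)))
    (if (or (eof-object? header) (< (bytevector-length header) 4))
        (eof-object)
        (let* ((type (bytevector-u16-ref header 0 (endianness big)))
               (size (bytevector-u16-ref header 2 (endianness big)))
               (payload (if (zero? size)
                            (make-bytevector 0)
                            (get-bytevector-n port size))))
          (if (or (eof-object? payload)
                  (< (bytevector-length payload) size))
              (eof-object)
              (cons type payload))))))

;; Repeat the message reader over the whole port; no looking backwards.
(define (for-each-message proc port)
  (let loop ((message (read-message port)))
    (unless (eof-object? message)
      (proc message)
      (loop (read-message port)))))
```

A type-specific parser would then be selected inside `proc`, based on the car of each message.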

(IIRC, the control flow is technically a bit different (let loop with arguments, to avoid mutation, not that it really matters since there is state anyway in the ports), but it could have been implemented like this instead.)

Copy of the list of examples that don’t need looking backwards:

This is overly specific to reading lines, and to reading lines with rdelim. If you replace ‘read-line’ by an argument, the procedure becomes more general. For example, by passing ‘get-char’ you can act on each character; with ‘get-line’ you act on each line (I’m not sure what the difference would be, but apparently it’s not ‘read-line’?); if you give it a JSON reading+parsing procedure you iterate over all JSON objects; with ‘get-u8’ you iterate over bytes; etc.

Best regards,
Maxime Devos


* Re: [PATCH v2] rdelim: Add new procedure `for-line-in-file`.
  2024-12-16 10:52     ` Maxime Devos
@ 2024-12-16 11:06       ` Nala Ginrut
  2024-12-16 12:00         ` Maxime Devos
  0 siblings, 1 reply; 8+ messages in thread
From: Nala Ginrut @ 2024-12-16 11:06 UTC (permalink / raw)
  To: Maxime Devos; +Cc: Adam Faiz, guile-devel@gnu.org, Ricardo Wurmus

[-- Attachment #1: Type: text/plain, Size: 3654 bytes --]

I'm not going to persuade anyone here, just sharing my opinion about a patch
for line parsing in a text file.😄

Yes, some parsers don't need to look backwards, but that doesn't mean
other parsers have priority to occupy a general API.

In my experience, a parser author like me prefers to write their own parser
from scratch, for many reasons, rather than finding and adapting to a
general function buried deep in the documentation.

Here's my opinion: as a parser writer, I have no interest in using it, but I
can say it still looks beautiful from a functional programming perspective.
Beauty is still a value worth pursuing. I can agree with this. But I
wouldn't use this function.

Of course I speak only for myself as a potential user.

So, if we can get back to the original topic: a patch for a text line
parsing function that can be understood and reviewed easily by most people.
I believe the author's effort and my suggestions are well enough to go. 😄

This doesn't mean I am against your idea.

Best regards.


* RE: [PATCH v2] rdelim: Add new procedure `for-line-in-file`.
  2024-12-16 11:06       ` Nala Ginrut
@ 2024-12-16 12:00         ` Maxime Devos
  2024-12-16 12:33           ` Nala Ginrut
  0 siblings, 1 reply; 8+ messages in thread
From: Maxime Devos @ 2024-12-16 12:00 UTC (permalink / raw)
  To: Nala Ginrut; +Cc: Adam Faiz, guile-devel@gnu.org, Ricardo Wurmus

[-- Attachment #1: Type: text/plain, Size: 3554 bytes --]

>I'm not going to persuade anyone here, just sharing my opinion about a patch for line parsing in a text file.😄

Then maybe you should stop it with the faces (“😄”) and strawmen. The faces just make things worse.

>Yes, some parsers don't need to look backwards, but that doesn't mean other parsers have priority to occupy a general API.

Nowhere did I propose ‘letting other parsers have priority to occupy a general API’. There is no prioritisation here, only an option for choice. Maybe you could even let ‘read-line’ be the default (with optional arguments). (Depending on where you put the procedure, if it’s in (ice-9 ports) there would probably be some inconvenient dependency issue, so I don’t expect this side-remark would work out well.)

>In my experience, a parser author like me prefers to write their own parser from scratch, for many reasons, rather than finding and adapting to a general function buried deep in the documentation.

The specific function is buried deep: it’s undocumented, so you have to read the implementation of (ice-9 rdelim) or things like that to discover its existence.

Nowhere did I propose burying the general function. In fact, I proposed a location for it that isn’t at all ‘deep’, and surely better than the somewhat obscure (ice-9 rdelim). At the very least, it’s better than being undocumented.

Nowhere did I propose removing the special case. The special case could still be defined in (ice-9 rdelim), but this time implemented in terms of the general function.

Also, the general function is as much a parser as the specific function: it just repeats a single parser (read-line in the specific case, the passed procedure in the general case) over the whole port.

>Here's my opinion: as a parser writer, I have no interest in using it, but I can say it still looks beautiful from a functional programming perspective. Beauty is still a value worth pursuing. I can agree with this. But I wouldn't use this function.
>Of course I speak only for myself as a potential user.

Nowhere did I say it has to be done because of beauty. The argument was on usefulness, and (implicitly) on how straightforward it is to generalise it (making it more useful, usable by more people, and avoiding the need to implement other variants since the general version already covers them).

As a parser writer, I have little interest in using the ‘read-line’ specific variant. Most of the parsing I do is not based on lines.

>So, if we can get back to the original topic: a patch for a text line parsing function that can be understood and reviewed easily by most people. I believe the author's effort and my suggestions are well enough to go. 😄

There is nothing to go back to. The original patch did not parse lines(*); it only read them and left the actual parsing to ‘proc’/‘body’. Also, iterating over lines in a file is a special case of iterating over more general things, and the method of doing the generalisation is trivial, so this is entirely on topic. It’s the same topic, just broader. Also, the general version can also be understood and easily reviewed by most people, and I haven’t seen evidence to the contrary.

I think they _aren’t_ well enough to go, since I haven’t heard a good argument yet for _not_ generalising it.

(*) I.e., while it technically does parse something (extracting a line from text), it doesn’t parse the line itself (which in many cases will need to happen), and ‘recognising a line as a line’ is kind of trivial.

Best regards,
Maxime Devos



* Re: [PATCH v2] rdelim: Add new procedure `for-line-in-file`.
  2024-12-16 12:00         ` Maxime Devos
@ 2024-12-16 12:33           ` Nala Ginrut
  0 siblings, 0 replies; 8+ messages in thread
From: Nala Ginrut @ 2024-12-16 12:33 UTC (permalink / raw)
  To: Maxime Devos; +Cc: Adam Faiz, guile-devel@gnu.org, Ricardo Wurmus

[-- Attachment #1: Type: text/plain, Size: 4639 bytes --]

The face icons are part of my freedom of speech, as part of the text. I
grew up under a dictatorship, and even there nobody made me stop typing
faces 😁 and no one told me that smiley faces make things worse. It is
interesting to hear it for the first time in my life.

“Well enough to go” means that enough effort has been shown, and that it
should be respected by sticking to the original purpose. The parser
arguments could go in another thread, for people who are interested. I’m
one of the people interested in the original line-delimited reader, which
may not be complete enough to be called a regular parser.

It doesn’t mean it’s good enough to be merged, of course. Arguments about
the original topic are always welcome, and they respect the time of the
people involved in the topic.

Best regards.


