On Sunday, 22 November 2020 19:48:24 CET Zelphir Kaltstahl wrote: > Hello Guile Users! > > I have a question about data structures. > > Recently I read a file and the lines in the file would become a list in > my Guile program. The file was not super big or anything. However, I > usually try to avoid having to use `append` or `reverse`, whenever > possible, considering, that they are O(n) operations and in principle I > do not want to write code, that uses lists in inefficient ways, when > there is a more efficient way. That said, I hit a little snag: > > When I am reading a file and do not know how many lines there are in the > file, I can use a normal list and construct it using recursive calls > like `(cons line (iter ...))` where `iter` is the recursive call and. > Could also be called `process-next-line` or simply `next`. Since I am > building a recursive data structure, it is OK to have a non-tail > position recursive call. Then I would return that list and work with it. > However, what if I ever need to add a list entry and the order of list > entry matters? I would either have to use `append`, to add it to the > end, which would be O(n), or I would have to initially construct the > list of lines in reversed order from initially, so that I can add by > simply using `(cons new-entry lines)`. However, when I use the list in > reverse and ever need to output the lines in the list in their original > order, I would first need to `reverse` the list again. > > OK the whole reversing would not be a problem, if I used a vector to > store the lines. However, then I would need to know the number of lines > in the file ahead of time or look at the file once for counting the > lines, then create the vector of that length and then store the lines in > it. This seems inelegant again, because I look at the lines of the file > twice. I could read it in as a list and then use `list->vector`, but > that will also be an additional O(n) for converting every entry to > vector element. > > If I now think about Python, then I have Python lists (actually arrays?) > and those can be easily appended to in O(1) and random access is also > O(1) according to https://wiki.python.org/moin/TimeComplexity. This > means I do not need to think much about how I will use the lines of a > file, when reading in the file. A Python list will be appropriate. > > So in Guile I sometimes feel some mental overhead of thinking about the > choice of data structure, which I do not feel in Python. Generally I > like Guile a lot more, so I would like to know how others deal with > this. Here are some ideas for it: > > 1. I could create some abstraction layer for the "sequence of read > lines", which I use inside the procedure, which reads in the lines and > all other code, that works with the lines, so that I can rather easily > later exchange the data structure. A data abstraction. However, that > might only hide the complexities of some operations behind the > abstraction and not reduce them. > > 2. Perhaps I want Guile's arrays? (Can they be expanded in O(1)? I do > seem to remember reading, that Guile vectors are only a special case of > Guile arrays, so that would mean they are not expandable in O(1).) > > 3. Just convert list to vector after reading in the file until I hit a > problem, where that additional O(n) really becomes a problem. (In my > current project it is not realistically a problem.) But this does not > satisfy me. I should learn how to solve the problem in general and use > the best way to do things. > > 4. Perhaps I need to write another data structure, that creates a new > vector, when the previous one is full, managing the expansion myself. > > How do you approach this problem? Is it a problem at all? > > Best regards, > Zelphir > > Hey Zelphir, If you want to use a FIFO data structure, you may want to check out queues. They're already in ice-9, under (ice-9 q). Mandatory manual reference: https://www.gnu.org/software/guile/manual/html_node/Queues.html#Queues Alternatively, if you want constant-time random access you could try using Vlists, although they aren't thread-safe. https://www.gnu.org/software/guile/manual/html_node/VLists.html#VLists Finally you could implement either a doubly-linked list or array list type depending on your needs. If I understand your requirements correctly I would recommend queues. They are easy to work with. Sincerely yours, - Tim