unofficial mirror of guile-user@gnu.org 
 help / color / mirror / Atom feed
* Question about data structures
@ 2020-11-22 18:48 Zelphir Kaltstahl
  2020-11-22 19:45 ` divoplade
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Zelphir Kaltstahl @ 2020-11-22 18:48 UTC (permalink / raw)
  To: Guile User

Hello Guile Users!

I have a question about data structures.

Recently I read a file and the lines in the file would become a list in
my Guile program. The file was not super big or anything. However, I
usually try to avoid having to use `append` or `reverse`, whenever
possible, considering, that they are O(n) operations and in principle I
do not want to write code, that uses lists in inefficient ways, when
there is a more efficient way. That said, I hit a little snag:

When I am reading a file and do not know how many lines there are in the
file, I can use a normal list and construct it using recursive calls
like `(cons line (iter ...))` where `iter` is the recursive call and.
Could also be called `process-next-line` or simply `next`. Since I am
building a recursive data structure, it is OK to have a non-tail
position recursive call. Then I would return that list and work with it.
However, what if I ever need to add a list entry and the order of list
entry matters? I would either have to use `append`, to add it to the
end, which would be O(n), or I would have to initially construct the
list of lines in reversed order from initially, so that I can add by
simply using `(cons new-entry lines)`. However, when I use the list in
reverse and ever need to output the lines in the list in their original
order, I would first need to `reverse` the list again.

OK the whole reversing would not be a problem, if I used a vector to
store the lines. However, then I would need to know the number of lines
in the file ahead of time or look at the file once for counting the
lines, then create the vector of that length and then store the lines in
it. This seems inelegant again, because I look at the lines of the file
twice. I could read it in as a list and then use `list->vector`, but
that will also be an additional O(n) for converting every entry to
vector element.

If I now think about Python, then I have Python lists (actually arrays?)
and those can be easily appended to in O(1) and random access is also
O(1) according to https://wiki.python.org/moin/TimeComplexity. This
means I do not need to think much about how I will use the lines of a
file, when reading in the file. A Python list will be appropriate.

So in Guile I sometimes feel some mental overhead of thinking about the
choice of data structure, which I do not feel in Python. Generally I
like Guile a lot more, so I would like to know how others deal with
this. Here are some ideas for it:

1. I could create some abstraction layer for the "sequence of read
lines", which I use inside the procedure, which reads in the lines and
all other code, that works with the lines, so that I can rather easily
later exchange the data structure. A data abstraction. However, that
might only hide the complexities of some operations behind the
abstraction and not reduce them.

2. Perhaps I want Guile's arrays? (Can they be expanded in O(1)? I do
seem to remember reading, that Guile vectors are only a special case of
Guile arrays, so that would mean they are not expandable in O(1).)

3. Just convert list to vector after reading in the file until I hit a
problem, where that additional O(n) really becomes a problem. (In my
current project it is not realistically a problem.) But this does not
satisfy me. I should learn how to solve the problem in general and use
the best way to do things.

4. Perhaps I need to write another data structure, that creates a new
vector, when the previous one is full, managing the expansion myself.

How do you approach this problem? Is it a problem at all?

Best regards,
Zelphir

-- 
repositories: https://notabug.org/ZelphirKaltstahl




^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-11-28  2:47 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-22 18:48 Question about data structures Zelphir Kaltstahl
2020-11-22 19:45 ` divoplade
2020-11-22 20:24   ` Zelphir Kaltstahl
2020-11-22 21:00     ` divoplade
2020-11-28  2:47       ` Zelphir Kaltstahl
2020-11-22 20:58   ` kwright
2020-11-22 22:50 ` Tim Van den Langenbergh
2020-11-23  3:42 ` Taylan Kammer
2020-11-23  4:42   ` John Cowan
2020-11-23 14:54   ` [EXT] " Thompson, David
2020-11-23  9:52 ` Neil Jerram
2020-11-23 12:24 ` Dr. Arne Babenhauserheide

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).