unofficial mirror of guile-user@gnu.org 
 help / color / mirror / Atom feed
From: Neil Jerram <neiljerram@gmail.com>
To: Zelphir Kaltstahl <zelphirkaltstahl@posteo.de>
Cc: Guile User <guile-user@gnu.org>
Subject: Re: Question about data structures
Date: Mon, 23 Nov 2020 09:52:19 +0000	[thread overview]
Message-ID: <CAKuG=vuKLJRRRKBkQnBs9pc9p8H9qbEdkJNgFvmNcQDAgGZD6A@mail.gmail.com> (raw)
In-Reply-To: <d8df761e-0f1f-d57d-1180-ab23c928ee82@posteo.de>

On Sun, 22 Nov 2020 at 18:49, Zelphir Kaltstahl <zelphirkaltstahl@posteo.de>
wrote:

> Hello Guile Users!
>
> I have a question about data structures.
>
> Recently I read a file and the lines in the file would become a list in
> my Guile program. The file was not super big or anything. However, I
> usually try to avoid having to use `append` or `reverse`, whenever
> possible, considering, that they are O(n) operations and in principle I
> do not want to write code, that uses lists in inefficient ways, when
> there is a more efficient way. That said, I hit a little snag:
>
> When I am reading a file and do not know how many lines there are in the
> file, I can use a normal list and construct it using recursive calls
> like `(cons line (iter ...))` where `iter` is the recursive call and.
> Could also be called `process-next-line` or simply `next`. Since I am
> building a recursive data structure, it is OK to have a non-tail
> position recursive call. Then I would return that list and work with it.
> However, what if I ever need to add a list entry and the order of list
> entry matters? I would either have to use `append`, to add it to the
> end, which would be O(n), or I would have to initially construct the
> list of lines in reversed order from initially, so that I can add by
> simply using `(cons new-entry lines)`. However, when I use the list in
> reverse and ever need to output the lines in the list in their original
> order, I would first need to `reverse` the list again.
>
> OK the whole reversing would not be a problem, if I used a vector to
> store the lines. However, then I would need to know the number of lines
> in the file ahead of time or look at the file once for counting the
> lines, then create the vector of that length and then store the lines in
> it. This seems inelegant again, because I look at the lines of the file
> twice. I could read it in as a list and then use `list->vector`, but
> that will also be an additional O(n) for converting every entry to
> vector element.
>
> If I now think about Python, then I have Python lists (actually arrays?)
> and those can be easily appended to in O(1) and random access is also
> O(1) according to https://wiki.python.org/moin/TimeComplexity. This
> means I do not need to think much about how I will use the lines of a
> file, when reading in the file. A Python list will be appropriate.
>
> So in Guile I sometimes feel some mental overhead of thinking about the
> choice of data structure, which I do not feel in Python. Generally I
> like Guile a lot more, so I would like to know how others deal with
> this. Here are some ideas for it:
>
> 1. I could create some abstraction layer for the "sequence of read
> lines", which I use inside the procedure, which reads in the lines and
> all other code, that works with the lines, so that I can rather easily
> later exchange the data structure. A data abstraction. However, that
> might only hide the complexities of some operations behind the
> abstraction and not reduce them.
>
> 2. Perhaps I want Guile's arrays? (Can they be expanded in O(1)? I do
> seem to remember reading, that Guile vectors are only a special case of
> Guile arrays, so that would mean they are not expandable in O(1).)
>
> 3. Just convert list to vector after reading in the file until I hit a
> problem, where that additional O(n) really becomes a problem. (In my
> current project it is not realistically a problem.) But this does not
> satisfy me. I should learn how to solve the problem in general and use
> the best way to do things.
>
> 4. Perhaps I need to write another data structure, that creates a new
> vector, when the previous one is full, managing the expansion myself.
>
> How do you approach this problem? Is it a problem at all?


My standard pattern:

 (let loop ((input input) (output '()))
  (if (null? input)
      (reverse output)
      (loop (cdr input) (cons (process (car input)) output))))


  parent reply	other threads:[~2020-11-23  9:52 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-22 18:48 Question about data structures Zelphir Kaltstahl
2020-11-22 19:45 ` divoplade
2020-11-22 20:24   ` Zelphir Kaltstahl
2020-11-22 21:00     ` divoplade
2020-11-28  2:47       ` Zelphir Kaltstahl
2020-11-22 20:58   ` kwright
2020-11-22 22:50 ` Tim Van den Langenbergh
2020-11-23  3:42 ` Taylan Kammer
2020-11-23  4:42   ` John Cowan
2020-11-23 14:54   ` [EXT] " Thompson, David
2020-11-23  9:52 ` Neil Jerram [this message]
2020-11-23 12:24 ` Dr. Arne Babenhauserheide

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKuG=vuKLJRRRKBkQnBs9pc9p8H9qbEdkJNgFvmNcQDAgGZD6A@mail.gmail.com' \
    --to=neiljerram@gmail.com \
    --cc=guile-user@gnu.org \
    --cc=zelphirkaltstahl@posteo.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).