unofficial mirror of guile-user@gnu.org 
 help / color / mirror / Atom feed
From: Taylan Kammer <taylan.kammer@gmail.com>
To: Zelphir Kaltstahl <zelphirkaltstahl@posteo.de>,
	Guile User <guile-user@gnu.org>
Subject: Re: Question about data structures
Date: Mon, 23 Nov 2020 04:42:37 +0100	[thread overview]
Message-ID: <a5317e0c-9fe0-1fb9-c6b5-d3065d179285@gmail.com> (raw)
In-Reply-To: <d8df761e-0f1f-d57d-1180-ab23c928ee82@posteo.de>

On 22.11.2020 19:48, Zelphir Kaltstahl wrote:
> Hello Guile Users!
> 
> I have a question about data structures.
> 
> [...]
> 
> How do you approach this problem? Is it a problem at all?

First of all, be cautious about premature optimization.  In many cases 
it's best to just write the code the most straightforward way possible 
with the tools at hand, and not bother with optimization unless it 
actually proves to be an issue.  Are you going to be processing files 
with millions of lines?  Thousands of lines but on a very weak CPU? 
Does it matter if your program takes 0.1 seconds or 2 seconds to run?


Now the actual answer, in case you need to optimize, or just want to 
learn more:


All data structures that offer a sequential list of elements have to 
make some trade-offs between the performance of various operations, as 
well as the implementation complexity.  Linked lists (i.e. "lists" in 
Scheme) are very simple, and a few operations are cheap as well, but 
they have the shortcomings you've described plus some more.


Since your main concern seems to be appending, you could simply use a 
linked list where you keep a reference to the last cons pair (tail) of 
the list, so appending is simply a matter of a 'set-cdr!' operation on 
the tail.


Python lists, JDK's ArrayList, and .NET ArrayList, among probably many 
other "list" or "array" data structures in popular languages nowadays 
use a relatively straightforward data structure that is backed by an 
actual array which can have empty slots (e.g. your Python list with 3 
elements might be backed by an array of size 10), and is reallocated 
whenever there's no space left.  This means that appending an element at 
the end is usually dirt cheap, until there's no space left, at which 
point the append operation is much heavier for one call, then the 
following calls are dirt cheap again, until it's full again...

Inserting an element at the beginning or middle is also relatively 
expensive with those implementations, since all elements need to be 
shifted forward to make space for the new element.  (Although this might 
be done with an operation like C's memcpy which is still actually very 
fast.)

It's called a "dynamic array" by Wikipedia:

     https://en.wikipedia.org/wiki/Dynamic_array

If you want to go on an adventure, you could implement a Scheme data 
structure called DVector that implements this strategy, using plain 
Scheme vectors for the backing array.


The VList has also been mentioned in this thread, but from what I can 
tell it doesn't seem to offer a very efficient append operation.


- Taylan



  parent reply	other threads:[~2020-11-23  3:42 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-22 18:48 Question about data structures Zelphir Kaltstahl
2020-11-22 19:45 ` divoplade
2020-11-22 20:24   ` Zelphir Kaltstahl
2020-11-22 21:00     ` divoplade
2020-11-28  2:47       ` Zelphir Kaltstahl
2020-11-22 20:58   ` kwright
2020-11-22 22:50 ` Tim Van den Langenbergh
2020-11-23  3:42 ` Taylan Kammer [this message]
2020-11-23  4:42   ` John Cowan
2020-11-23 14:54   ` [EXT] " Thompson, David
2020-11-23  9:52 ` Neil Jerram
2020-11-23 12:24 ` Dr. Arne Babenhauserheide

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a5317e0c-9fe0-1fb9-c6b5-d3065d179285@gmail.com \
    --to=taylan.kammer@gmail.com \
    --cc=guile-user@gnu.org \
    --cc=zelphirkaltstahl@posteo.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).