unofficial mirror of help-gnu-emacs@gnu.org
 help / color / mirror / Atom feed
* Size and length limits for Emacs primitive types and etc data?
@ 2013-01-22 22:06 Oleksandr Gavenko
  2013-02-03 13:56 ` Aurélien Aptel
  0 siblings, 1 reply; 14+ messages in thread
From: Oleksandr Gavenko @ 2013-01-22 22:06 UTC (permalink / raw)
  To: help-gnu-emacs

during search I found these sources of information about limits of Emacs runtime:

  (info "(elisp)Programming Types")
                Programming Types
  http://www.emacswiki.org/emacs/EmacsFileSizeLimit
                EmacsFileSizeLimit
  http://article.gmane.org/gmane.emacs.devel/139119
                 Re: stack overflow limit
                 The value of re_max_failures we use now needs 4MB of stack on
                 a 32-but machine, twice as much on a 64-bit machine. We also
                 need stack space for GC.

From official docs:

For integers: 28bit + sign.
For chars: 22-bit.

Next types have unknown or undefined size limits in manual but:

================================================================

For float: Emacs uses the IEEE floating point standard where possible. But
which precision exactly (half/single/double
http://en.wikipedia.org/wiki/IEEE_754#Basic_formats)?

/* Lisp floating point type.  */
struct Lisp_Float  /* src/lisp.h */
  {
    union
    {
      double data;
      struct Lisp_Float *chain;
    } u;
  };

Seems it uses 64-bit (double precision) IEEE 754 on most of 32-bit platforms.

Any function in runtime that return digits and exponent width for float?

================================================================

For list: I think their length unlimited at all.

================================================================

But how many bytes take symbol? For example 'foo'?

From src/lisp.h:

typedef struct { EMACS_INT i; } Lisp_Object;

struct Lisp_Symbol
{
  unsigned gcmarkbit : 1;
  ENUM_BF (symbol_redirect) redirect : 3;
  unsigned constant : 2;
  unsigned interned : 2;
  unsigned declared_special : 1;
  Lisp_Object name;
  union {
    Lisp_Object value;
    struct Lisp_Symbol *alias;
    struct Lisp_Buffer_Local_Value *blv;
    union Lisp_Fwd *fwd;
  } val;
  Lisp_Object function;
  Lisp_Object plist;
  struct Lisp_Symbol *next;
};

For 32-bit arch I count 4*6=24 bytes.

Seems that Lisp_Object is index in hash table to actual values (like actual
name or function code...).

================================================================

How many memory takes cons cell?

struct Lisp_Cons
  {
    Lisp_Object car;
    union
    {
      Lisp_Object cdr;
      struct Lisp_Cons *chain;
    } u;
  };

For 32-bit arch I count 4*2=8 bytes.

================================================================

How many takes plist for storing single property?

From:

DEFUN ("plist-put", Fplist_put, Splist_put, 3, 3, 0,
  (Lisp_Object plist, register Lisp_Object prop, Lisp_Object val)
{
  register Lisp_Object tail, prev;
  Lisp_Object newcell;
  prev = Qnil;
  for (tail = plist; CONSP (tail) && CONSP (XCDR (tail));
       tail = XCDR (XCDR (tail)))

seems that 2 cons... or 8*2=16 bytes.

================================================================

How many memory takes string (which is buffer strings and symbols names)?

typedef struct interval *INTERVAL;
struct Lisp_String
  {
    ptrdiff_t size;
    ptrdiff_t size_byte;
    INTERVAL intervals;		/* Text properties in this string.  */
    unsigned char *data;
  };

Seems that 3*4 + lengthOf(data) bytes.

Manual say that "strings really contain integers" and "strings are arrays, and
therefore sequences as well".

So each char (in data) uses 4 bytes? Seem doesn't. As

     To conserve memory, Emacs does not hold fixed-length 22-bit numbers that
  are codepoints of text characters within buffers and strings. Rather, Emacs
  uses a variable-length internal representation of characters, that stores
  each character as a sequence of 1 to 5 8-bit bytes, depending on the
  magnitude of its codepoint.

and:

  Encoded text is not really text, as far as Emacs is concerned, but rather a
  sequence of raw 8-bit bytes. We call buffers and strings that hold encoded
  text "unibyte" buffers and strings, because Emacs treats them as a sequence
  of individual bytes.

With unibyte I understand that it is easy to get char by index.

But with multibyte I don't understand. And don't understand why in this case
string are array, is it an inefficient array?

Seems that buffer text == string:

struct buffer_text   /* from src/buffer.h */
  {
    unsigned char *beg;
    ptrdiff_t gpt;		/* Char pos of gap in buffer.  */
    ptrdiff_t z;		/* Char pos of end of buffer.  */
    ptrdiff_t gpt_byte;		/* Byte pos of gap in buffer.  */
    ptrdiff_t z_byte;		/* Byte pos of end of buffer.  */
    ptrdiff_t gap_size;		/* Size of buffer's gap.  */
    EMACS_INT modiff;		/* This counts buffer-modification events
    EMACS_INT chars_modiff;	/* This is modified with character change
    EMACS_INT save_modiff;	/* Previous value of modiff, as of last
    EMACS_INT overlay_modiff;	/* Counts modifications to overlays.  */
    EMACS_INT compact;		/* Set to modiff each time when compact_buffer
    ptrdiff_t beg_unchanged;
    ptrdiff_t end_unchanged;
    EMACS_INT unchanged_modified;
    EMACS_INT overlay_unchanged_modified;
    INTERVAL intervals;
    struct Lisp_Marker *markers;
    bool inhibit_shrinking;
  };

So opening 10 KiB Russian file in cp1251 actually take 2*10 KiB for buffer as
each Russian chars in multibyte string take 2 bytes... (just type C-u C-x =
and look to "buffer code: #xD0 #x91").

I think that string have no length limit (except limit in 28-bit for index on
32-bit platform).

================================================================

Seems that arrays/vectors also have no limits for length (except limit in
28-bit for index on 32-bit platform):

/* Regular vector is just a header plus array of Lisp_Objects.  */
struct Lisp_Vector   /* src/lisp.h */
  {
    struct vectorlike_header header;
    Lisp_Object contents[1];
  };

/* A boolvector is a kind of vectorlike, with contents are like a string.  */
struct Lisp_Bool_Vector
  {
    struct vectorlike_header header;
    /* This is the size in bits.  */
    EMACS_INT size;
    /* This contains the actual bits, packed into bytes.  */
    unsigned char data[1];
  };

================================================================

Hash tables are harder data type and I don't understand limitations on count
of key-values pairs from:

struct Lisp_Hash_Table
{
  struct vectorlike_header header;
  Lisp_Object weak;
  Lisp_Object rehash_size;
  Lisp_Object rehash_threshold;
  Lisp_Object hash;
  Lisp_Object next;
  Lisp_Object next_free;
  Lisp_Object index;
  ptrdiff_t count;
  Lisp_Object key_and_value;
  struct hash_table_test test;
  struct Lisp_Hash_Table *next_weak;
};

================================================================

Please correct me and answer the questions...

-- 
Best regards!




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Size and length limits for Emacs primitive types and etc data?
  2013-01-22 22:06 Size and length limits for Emacs primitive types and etc data? Oleksandr Gavenko
@ 2013-02-03 13:56 ` Aurélien Aptel
  2013-02-03 19:16   ` Eli Zaretskii
  0 siblings, 1 reply; 14+ messages in thread
From: Aurélien Aptel @ 2013-02-03 13:56 UTC (permalink / raw)
  To: Oleksandr Gavenko; +Cc: help-gnu-emacs

On Tue, Jan 22, 2013 at 11:06 PM, Oleksandr Gavenko <gavenkoa@gmail.com> wrote:
> Please correct me and answer the questions...

I'm just bumping this thread hoping someone knowledgeable can answer
because I'm also interested :)



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Size and length limits for Emacs primitive types and etc data?
  2013-02-03 13:56 ` Aurélien Aptel
@ 2013-02-03 19:16   ` Eli Zaretskii
  2013-02-04 12:38     ` Aurélien Aptel
  0 siblings, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2013-02-03 19:16 UTC (permalink / raw)
  To: help-gnu-emacs

> Date: Sun, 3 Feb 2013 14:56:34 +0100
> From: Aurélien Aptel <aurelien.aptel+emacs@gmail.com>
> Cc: help-gnu-emacs@gnu.org
> 
> On Tue, Jan 22, 2013 at 11:06 PM, Oleksandr Gavenko <gavenkoa@gmail.com> wrote:
> > Please correct me and answer the questions...
> 
> I'm just bumping this thread hoping someone knowledgeable can answer
> because I'm also interested :)

Everything was already said in the OP.  And since no one jumped in to
correct that...




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Size and length limits for Emacs primitive types and etc data?
  2013-02-03 19:16   ` Eli Zaretskii
@ 2013-02-04 12:38     ` Aurélien Aptel
  2013-02-04 15:57       ` Eli Zaretskii
  0 siblings, 1 reply; 14+ messages in thread
From: Aurélien Aptel @ 2013-02-04 12:38 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: help-gnu-emacs

On Sun, Feb 3, 2013 at 8:16 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> Everything was already said in the OP.  And since no one jumped in to
> correct that...

What about the hash table?



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Size and length limits for Emacs primitive types and etc data?
  2013-02-04 12:38     ` Aurélien Aptel
@ 2013-02-04 15:57       ` Eli Zaretskii
  2013-02-05  9:41         ` Oleksandr Gavenko
  0 siblings, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2013-02-04 15:57 UTC (permalink / raw)
  To: help-gnu-emacs

> Date: Mon, 4 Feb 2013 13:38:23 +0100
> From: Aurélien Aptel <aurelien.aptel+emacs@gmail.com>
> Cc: help-gnu-emacs@gnu.org
> 
> On Sun, Feb 3, 2013 at 8:16 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> > Everything was already said in the OP.  And since no one jumped in to
> > correct that...
> 
> What about the hash table?

What about it?




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Size and length limits for Emacs primitive types and etc data?
  2013-02-04 15:57       ` Eli Zaretskii
@ 2013-02-05  9:41         ` Oleksandr Gavenko
  2013-02-05 18:14           ` Eli Zaretskii
                             ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Oleksandr Gavenko @ 2013-02-05  9:41 UTC (permalink / raw)
  To: help-gnu-emacs

On 2013-02-04, Eli Zaretskii wrote:

>> Date: Mon, 4 Feb 2013 13:38:23 +0100
>> From: Aurélien Aptel <aurelien.aptel+emacs@gmail.com>
>> Cc: help-gnu-emacs@gnu.org
>> 
>> On Sun, Feb 3, 2013 at 8:16 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>> > Everything was already said in the OP.  And since no one jumped in to
>> > correct that...
>> 
>> What about the hash table?
>
> What about it?
>
In my original post I can't analyse how many memory take hash tables...

And also it is interesting to know how much key-value pairs can hold hash
table.

================================================================

Also interesting thing about elisp that any related data stored in distinct
memory locations (ever any internal structs hold link to actual data, like
string for symbol name), non-together.

I think linked nature of elisp data structure cause very high rate of CPU
cache miss (but I don't actually run any AMD/Intel CPU profilers).

-- 
Best regards!




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Size and length limits for Emacs primitive types and etc data?
  2013-02-05  9:41         ` Oleksandr Gavenko
@ 2013-02-05 18:14           ` Eli Zaretskii
  2013-02-05 20:17             ` Oleksandr Gavenko
       [not found]           ` <mailman.19079.1360088047.855.help-gnu-emacs@gnu.org>
  2013-02-05 22:25           ` Peter Dyballa
  2 siblings, 1 reply; 14+ messages in thread
From: Eli Zaretskii @ 2013-02-05 18:14 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Oleksandr Gavenko <gavenkoa@gmail.com>
> Date: Tue, 05 Feb 2013 11:41:55 +0200
> 
> In my original post I can't analyse how many memory take hash tables...

Depends on the size of the table, obviously.

> And also it is interesting to know how much key-value pairs can hold hash
> table.

most-positive-fixnum, I guess.  (Why is that interesting?)

> I think linked nature of elisp data structure cause very high rate of CPU
> cache miss (but I don't actually run any AMD/Intel CPU profilers).

Prove it.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Size and length limits for Emacs primitive types and etc data?
       [not found]           ` <mailman.19079.1360088047.855.help-gnu-emacs@gnu.org>
@ 2013-02-05 19:06             ` Burton Samograd
  2013-02-05 20:04               ` Oleksandr Gavenko
  2013-02-05 21:28               ` Eli Zaretskii
  0 siblings, 2 replies; 14+ messages in thread
From: Burton Samograd @ 2013-02-05 19:06 UTC (permalink / raw)
  To: help-gnu-emacs

Eli Zaretskii <eliz@gnu.org> writes:

>> I think linked nature of elisp data structure cause very high rate of CPU
>> cache miss (but I don't actually run any AMD/Intel CPU profilers).
>
> Prove it.

Here's an analysis of cache misses with pointer based data structures
(linked list and trees):

  http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.25.9669

It's common knowledge that linked lists are not cache friendly, leading
to creation of structures like Unrolled Linked Lists
(http://en.wikipedia.org/wiki/Unrolled_linked_list) to help.

--
Burton Samograd


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Size and length limits for Emacs primitive types and etc data?
  2013-02-05 19:06             ` Burton Samograd
@ 2013-02-05 20:04               ` Oleksandr Gavenko
  2013-02-05 21:28               ` Eli Zaretskii
  1 sibling, 0 replies; 14+ messages in thread
From: Oleksandr Gavenko @ 2013-02-05 20:04 UTC (permalink / raw)
  To: help-gnu-emacs

On 2013-02-05, Burton Samograd wrote:

> Eli Zaretskii <eliz@gnu.org> writes:
>
>>> I think linked nature of elisp data structure cause very high rate of CPU
>>> cache miss (but I don't actually run any AMD/Intel CPU profilers).
>>
>> Prove it.
>
I don't familiar with appropriated CPU profilers:

  http://developer.amd.com/tools/heterogeneous-computing/amd-codeanalyst-performance-analyzer/
                CodeAnalyst Performance Analyzer
  http://developer.amd.com/tools/heterogeneous-computing/codexl/
                GPU debugging, comprehensive GPU and CPU profiling, and static
                OpenCL™ kernel analysis capabilities

  http://software.intel.com/en-us/intel-vtune-amplifier-xe
                Intel® VTune™ Amplifier XE 2013 is the premier profiler for C,
                C++, C#, Fortran, Assembly and Java.

Main reason that it is hard to interpret right what they show (modern CPU have
too complicate execution architecture).

> Here's an analysis of cache misses with pointer based data structures
> (linked list and trees):
>
>   http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.25.9669
>
> It's common knowledge that linked lists are not cache friendly, leading
> to creation of structures like Unrolled Linked Lists
> (http://en.wikipedia.org/wiki/Unrolled_linked_list) to help.
>

I work with Isabelle proof assistant (written in ML) on my old laptop with 512
MiB of RAM (in 64-bit userspace). On large theories my laptop froze on disk
(swap) operations. While non-functional programs like Firefox and OpenOffice
operated with minor slowness in IU when out of 512 MiB.

It is possible to say that I run into 100% RAM misses!!

After installing additional memory module (up to 1 GiB) any disk operation go
away on those large theories!

Also interesting result we get when work on optimisation of GOST cipher. It
have permutation table and in order to increase speed we expand this table.
Usually larger precomputation give better performance... but in our case as
soon table out of CPU cache we lose speed in 2 or more times...

Deduce from theoretical positions (look to Burton Samograd link) it is
possible frequently cache misses in Elisp. As I don't know GNU Emacs internals
I ask about Elisp and CPU cache misses...

-- 
Best regards!




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Size and length limits for Emacs primitive types and etc data?
  2013-02-05 18:14           ` Eli Zaretskii
@ 2013-02-05 20:17             ` Oleksandr Gavenko
  2013-02-05 21:35               ` Eli Zaretskii
  2013-02-06 18:46               ` Stefan Monnier
  0 siblings, 2 replies; 14+ messages in thread
From: Oleksandr Gavenko @ 2013-02-05 20:17 UTC (permalink / raw)
  To: help-gnu-emacs

On 2013-02-05, Eli Zaretskii wrote:

>> From: Oleksandr Gavenko <gavenkoa@gmail.com>
>> Date: Tue, 05 Feb 2013 11:41:55 +0200
>> 
>> In my original post I can't analyse how many memory take hash tables...
>
> Depends on the size of the table, obviously.
>
I didn't visit CS courses. Does hash table take more then Coef*N memory space
from N keys? Like N*log_2(N) or similar?

Just looking to

  http://en.wikipedia.org/wiki/Hash_table

shown that there are exist implementations with N^2 memory requirement...

>> And also it is interesting to know how much key-value pairs can hold hash
>> table.
>
> most-positive-fixnum, I guess.  (Why is that interesting?)
>
I am start implementing ASN1 parser in Elisp. ASN1 data format designed to
store any theoretically possible data length (2^128 bytes...). But any
implementation have memory limits. I want to push this limits up to Elisp
abilities and document this in docs... So need concrete numbers...

-- 
Best regards!




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Size and length limits for Emacs primitive types and etc data?
  2013-02-05 19:06             ` Burton Samograd
  2013-02-05 20:04               ` Oleksandr Gavenko
@ 2013-02-05 21:28               ` Eli Zaretskii
  1 sibling, 0 replies; 14+ messages in thread
From: Eli Zaretskii @ 2013-02-05 21:28 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Burton Samograd <burton@samograd.ca>
> Date: Tue, 05 Feb 2013 12:06:35 -0700
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> I think linked nature of elisp data structure cause very high rate of CPU
> >> cache miss (but I don't actually run any AMD/Intel CPU profilers).
> >
> > Prove it.
> 
> Here's an analysis of cache misses with pointer based data structures
> (linked list and trees):
> 
>   http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.25.9669
> 
> It's common knowledge that linked lists are not cache friendly, leading
> to creation of structures like Unrolled Linked Lists
> (http://en.wikipedia.org/wiki/Unrolled_linked_list) to help.

The relevant thing here is how Emacs Lisp programs fare in this
regard.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Size and length limits for Emacs primitive types and etc data?
  2013-02-05 20:17             ` Oleksandr Gavenko
@ 2013-02-05 21:35               ` Eli Zaretskii
  2013-02-06 18:46               ` Stefan Monnier
  1 sibling, 0 replies; 14+ messages in thread
From: Eli Zaretskii @ 2013-02-05 21:35 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Oleksandr Gavenko <gavenkoa@gmail.com>
> Date: Tue, 05 Feb 2013 22:17:47 +0200
> 
> I didn't visit CS courses. Does hash table take more then Coef*N memory space
> from N keys? Like N*log_2(N) or similar?

Look at 'struct Lisp_Hash_Table' definition, it's all there.
Basically, there's a vector of hash values, and for each value a chain
of key-value pairs.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Size and length limits for Emacs primitive types and etc data?
  2013-02-05  9:41         ` Oleksandr Gavenko
  2013-02-05 18:14           ` Eli Zaretskii
       [not found]           ` <mailman.19079.1360088047.855.help-gnu-emacs@gnu.org>
@ 2013-02-05 22:25           ` Peter Dyballa
  2 siblings, 0 replies; 14+ messages in thread
From: Peter Dyballa @ 2013-02-05 22:25 UTC (permalink / raw)
  To: Oleksandr Gavenko; +Cc: help-gnu-emacs


Am 05.02.2013 um 10:41 schrieb Oleksandr Gavenko:

> I think linked nature of elisp data structure cause very high rate of CPU
> cache miss (but I don't actually run any AMD/Intel CPU profilers).

It's also possible that this new hardware is a bit different compared to older ones. At least this is my explanation why on my Mac I see all the time 0.0 % cache hits…

--
Greetings

  Pete

There are very few jobs that actually require a penis or vagina. All other jobs should be open to everybody.
				– Florynce Kennedy




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Size and length limits for Emacs primitive types and etc data?
  2013-02-05 20:17             ` Oleksandr Gavenko
  2013-02-05 21:35               ` Eli Zaretskii
@ 2013-02-06 18:46               ` Stefan Monnier
  1 sibling, 0 replies; 14+ messages in thread
From: Stefan Monnier @ 2013-02-06 18:46 UTC (permalink / raw)
  To: help-gnu-emacs

> shown that there are exist implementations with N^2 memory requirement...

I don't see any obvious N^2 memory use in that page.
All normal implementations I know have a memory use of O(N).
Anything different is probably limited to very special application areas
(perfect hashing, maybe?).

The way hash tables work in the current Emacs implementation is
with a table of size N containing (upto) N objects (and N pointers used
to reference the next element in a chain), plus a bucket table of size
(N / rehash-threshold).


        Stefan




^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2013-02-06 18:46 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-01-22 22:06 Size and length limits for Emacs primitive types and etc data? Oleksandr Gavenko
2013-02-03 13:56 ` Aurélien Aptel
2013-02-03 19:16   ` Eli Zaretskii
2013-02-04 12:38     ` Aurélien Aptel
2013-02-04 15:57       ` Eli Zaretskii
2013-02-05  9:41         ` Oleksandr Gavenko
2013-02-05 18:14           ` Eli Zaretskii
2013-02-05 20:17             ` Oleksandr Gavenko
2013-02-05 21:35               ` Eli Zaretskii
2013-02-06 18:46               ` Stefan Monnier
     [not found]           ` <mailman.19079.1360088047.855.help-gnu-emacs@gnu.org>
2013-02-05 19:06             ` Burton Samograd
2013-02-05 20:04               ` Oleksandr Gavenko
2013-02-05 21:28               ` Eli Zaretskii
2013-02-05 22:25           ` Peter Dyballa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).