* Size and length limits for Emacs primitive types and etc data?
From: Oleksandr Gavenko @ 2013-01-22 22:06 UTC (permalink / raw)
  To: help-gnu-emacs

While searching, I found these sources of information about the limits of the Emacs runtime:

  (info "(elisp)Programming Types")
                Programming Types
  http://www.emacswiki.org/emacs/EmacsFileSizeLimit
                EmacsFileSizeLimit
  http://article.gmane.org/gmane.emacs.devel/139119
                 Re: stack overflow limit
                 The value of re_max_failures we use now needs 4MB of stack on
                 a 32-bit machine, twice as much on a 64-bit machine. We also
                 need stack space for GC.

From the official docs:

For integers: 28 bits + sign.
For chars: 22 bits.

The following types have no documented size limits in the manual, but:

================================================================

For floats: Emacs uses the IEEE floating-point standard where possible. But
which precision exactly (half, single, or double; see
http://en.wikipedia.org/wiki/IEEE_754#Basic_formats)?

/* Lisp floating point type.  */
struct Lisp_Float  /* src/lisp.h */
  {
    union
    {
      double data;
      struct Lisp_Float *chain;
    } u;
  };

It seems to use 64-bit (double-precision) IEEE 754 on most 32-bit platforms.

Is there a runtime function that returns the digit count and exponent width
of a float?
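
I do not know of a Lisp-level answer, but since Lisp_Float wraps a C double,
the C-level parameters can at least be read from <float.h>. A minimal sketch,
assuming a hosted C99 compiler; the names double_format and describe_double
are made up for this example and are not Emacs functions:

```c
#include <float.h>
#include <stddef.h>

/* Parameters of the C `double` type on the build platform.  */
struct double_format
{
  size_t size_bytes;    /* sizeof (double) */
  int mantissa_bits;    /* significand bits, counting the implicit leading bit */
  int decimal_digits;   /* decimal digits that survive a round trip */
  int max_binary_exp;   /* DBL_MAX_EXP: largest e with 2^(e-1) still finite */
};

/* Collect the <float.h> constants that describe `double`.  */
struct double_format
describe_double (void)
{
  struct double_format f;
  f.size_bytes = sizeof (double);
  f.mantissa_bits = DBL_MANT_DIG;
  f.decimal_digits = DBL_DIG;
  f.max_binary_exp = DBL_MAX_EXP;
  return f;
}
```

On a platform where double is IEEE 754 binary64, I would expect this to report
8 bytes, 53 mantissa bits, 15 decimal digits and a maximum exponent of 1024.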

================================================================

For lists: I think their length is not limited at all.

================================================================

But how many bytes does a symbol take? For example, 'foo'?

From src/lisp.h:

typedef struct { EMACS_INT i; } Lisp_Object;

struct Lisp_Symbol
{
  unsigned gcmarkbit : 1;
  ENUM_BF (symbol_redirect) redirect : 3;
  unsigned constant : 2;
  unsigned interned : 2;
  unsigned declared_special : 1;
  Lisp_Object name;
  union {
    Lisp_Object value;
    struct Lisp_Symbol *alias;
    struct Lisp_Buffer_Local_Value *blv;
    union Lisp_Fwd *fwd;
  } val;
  Lisp_Object function;
  Lisp_Object plist;
  struct Lisp_Symbol *next;
};

On a 32-bit arch I count 4*6=24 bytes (one word for the bit-fields plus five
4-byte fields).
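
To double-check the count without a 32-bit build, the layout can be modelled
with fixed-width fields. This is only a sketch under my own assumptions: every
pointer and Lisp_Object is replaced by a 4-byte integer, the union is replaced
by a single word, and the names word32 and symbol_model32 are invented for the
example:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit model: every pointer and Lisp_Object is 4 bytes.  */
typedef uint32_t word32;

struct symbol_model32
{
  unsigned gcmarkbit : 1;
  unsigned redirect : 3;
  unsigned constant : 2;
  unsigned interned : 2;
  unsigned declared_special : 1;   /* 9 bits in total, one 4-byte unit */
  word32 name;
  word32 val;                      /* stands in for the 4-byte union */
  word32 function;
  word32 plist;
  word32 next;
};

/* Size of the model in bytes, as laid out by the compiler.  */
size_t
symbol_model32_size (void)
{
  return sizeof (struct symbol_model32);
}
```

With gcc or clang on common ABIs this model comes out at 24 bytes. It counts
only the symbol structure itself, not the string that holds the name.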

It seems that a Lisp_Object is an index into a hash table of the actual values
(such as the actual name or function code...).

================================================================

How much memory does a cons cell take?

struct Lisp_Cons
  {
    Lisp_Object car;
    union
    {
      Lisp_Object cdr;
      struct Lisp_Cons *chain;
    } u;
  };

On a 32-bit arch I count 4*2=8 bytes.

================================================================

How much memory does a plist take to store a single property?

From:

DEFUN ("plist-put", Fplist_put, Splist_put, 3, 3, 0,
  (Lisp_Object plist, register Lisp_Object prop, Lisp_Object val)
{
  register Lisp_Object tail, prev;
  Lisp_Object newcell;
  prev = Qnil;
  for (tail = plist; CONSP (tail) && CONSP (XCDR (tail));
       tail = XCDR (XCDR (tail)))

it seems to take 2 conses, or 8*2=16 bytes.
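
The arithmetic above can be written down as a small cost model. This is a
sketch of my own reasoning, not Emacs code: it assumes a cons cell is exactly
two machine words and counts only the cells, not the property and value
objects they point to. The function names are made up:

```c
#include <stddef.h>

/* Assumed cost model: one cons cell is two machine words (car + cdr).  */
size_t
cons_bytes (size_t word_size)
{
  return 2 * word_size;
}

/* A plist (PROP1 VAL1 PROP2 VAL2 ...) is a flat list, so it needs one
   cons per element, that is, two conses per property.  */
size_t
plist_bytes (size_t n_props, size_t word_size)
{
  return 2 * n_props * cons_bytes (word_size);
}
```

For example, plist_bytes (1, 4) gives the 16 bytes counted above, and the same
single property would cost 32 bytes with 8-byte words.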

================================================================

How much memory does a string take (as used for buffer strings and symbol names)?

typedef struct interval *INTERVAL;
struct Lisp_String
  {
    ptrdiff_t size;
    ptrdiff_t size_byte;
    INTERVAL intervals;		/* Text properties in this string.  */
    unsigned char *data;
  };

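On a 32-bit arch it seems to be 4*4 + lengthOf(data) bytes (four 4-byte
fields, including the data pointer, plus the data itself).

The manual says that "strings really contain integers" and "strings are
arrays, and therefore sequences as well".

So does each char (in data) use 4 bytes? It seems not, because: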

     To conserve memory, Emacs does not hold fixed-length 22-bit numbers that
  are codepoints of text characters within buffers and strings. Rather, Emacs
  uses a variable-length internal representation of characters, that stores
  each character as a sequence of 1 to 5 8-bit bytes, depending on the
  magnitude of its codepoint.

and:

  Encoded text is not really text, as far as Emacs is concerned, but rather a
  sequence of raw 8-bit bytes. We call buffers and strings that hold encoded
  text "unibyte" buffers and strings, because Emacs treats them as a sequence
  of individual bytes.

With unibyte strings I understand that it is easy to get a char by index.

But with multibyte strings I don't understand how that works, nor why a
string is called an array in this case. Is it an inefficient array?
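
To make the concern concrete: with a variable-length encoding, finding the
N-th character means scanning from the start. Here is a sketch of the naive
scan, assuming a UTF-8-style encoding where continuation bytes look like
10xxxxxx. The function name char_to_byte is my own, and I am not claiming
Emacs does it this way (it may well cache positions to avoid the scan):

```c
#include <stddef.h>

/* Return the byte offset of character number CHAR_INDEX (0-based) in a
   UTF-8-style buffer S of NBYTES bytes, or NBYTES if the buffer holds
   fewer characters.  Cost is linear in the offset found.  */
size_t
char_to_byte (const unsigned char *s, size_t nbytes, size_t char_index)
{
  size_t chars_seen = 0;
  for (size_t i = 0; i < nbytes; i++)
    {
      /* Any byte that is not 10xxxxxx starts a new character.  */
      if ((s[i] & 0xC0) != 0x80)
        {
          if (chars_seen == char_index)
            return i;
          chars_seen++;
        }
    }
  return nbytes;
}
```

So indexing still works, but a plain lookup is O(n) instead of O(1), which is
what I mean by "inefficient array".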

Seems that buffer text == string:

struct buffer_text   /* from src/buffer.h */
  {
    unsigned char *beg;
    ptrdiff_t gpt;		/* Char pos of gap in buffer.  */
    ptrdiff_t z;		/* Char pos of end of buffer.  */
    ptrdiff_t gpt_byte;		/* Byte pos of gap in buffer.  */
    ptrdiff_t z_byte;		/* Byte pos of end of buffer.  */
    ptrdiff_t gap_size;		/* Size of buffer's gap.  */
    EMACS_INT modiff;		/* This counts buffer-modification events ...  */
    EMACS_INT chars_modiff;	/* This is modified with character change ...  */
    EMACS_INT save_modiff;	/* Previous value of modiff, as of last ...  */
    EMACS_INT overlay_modiff;	/* Counts modifications to overlays.  */
    EMACS_INT compact;		/* Set to modiff each time when compact_buffer ...  */
    ptrdiff_t beg_unchanged;
    ptrdiff_t end_unchanged;
    EMACS_INT unchanged_modified;
    EMACS_INT overlay_unchanged_modified;
    INTERVAL intervals;
    struct Lisp_Marker *markers;
    bool inhibit_shrinking;
  };

So opening a 10 KiB Russian file in cp1251 actually takes up to 2*10 KiB for
the buffer, as each Russian char in a multibyte string takes 2 bytes (just
type C-u C-x = and look at "buffer code: #xD0 #x91").
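
The #xD0 #x91 pair is what the UTF-8 pattern gives for U+0411. A sketch that
reproduces it, assuming the internal encoding follows UTF-8 for small code
points and extends the same pattern up to 5 bytes as the manual text quoted
above suggests; the exact thresholds and both function names are my
assumptions, not taken from the Emacs sources:

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for code point CP under a UTF-8 pattern extended to
   5 bytes.  The thresholds are an assumption for illustration.  */
size_t
encoded_len (uint32_t cp)
{
  if (cp <= 0x7F)
    return 1;
  if (cp <= 0x7FF)
    return 2;
  if (cp <= 0xFFFF)
    return 3;
  if (cp <= 0x1FFFFF)
    return 4;
  return 5;
}

/* Encode a code point of at most 0x7FF into OUT.  Returns the number
   of bytes written (1 or 2), or 0 if CP is out of range.  */
size_t
encode_upto_2_bytes (uint32_t cp, unsigned char out[2])
{
  if (cp <= 0x7F)
    {
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp <= 0x7FF)
    {
      out[0] = (unsigned char) (0xC0 | (cp >> 6));    /* 110xxxxx */
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));  /* 10xxxxxx */
      return 2;
    }
  return 0;
}
```

Encoding 0x411 this way yields the bytes 0xD0 0x91. ASCII characters such as
spaces and newlines still take 1 byte, so 2x is an upper bound for such a
file rather than the exact size.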

I think that strings have no length limit (except the 28-bit limit on the
index on a 32-bit platform).

================================================================

It seems that arrays/vectors also have no length limit (except the 28-bit
limit on the index on a 32-bit platform):

/* Regular vector is just a header plus array of Lisp_Objects.  */
struct Lisp_Vector   /* src/lisp.h */
  {
    struct vectorlike_header header;
    Lisp_Object contents[1];
  };

/* A boolvector is a kind of vectorlike, with contents are like a string.  */
struct Lisp_Bool_Vector
  {
    struct vectorlike_header header;
    /* This is the size in bits.  */
    EMACS_INT size;
    /* This contains the actual bits, packed into bytes.  */
    unsigned char data[1];
  };
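
For the bool vector, "bits packed into bytes" suggests the data size is the
bit count rounded up to whole bytes. A sketch of that idea, assuming bit 0 is
the least significant bit of the first byte (the bit order is my guess) and
using made-up function names:

```c
#include <limits.h>
#include <stddef.h>

/* Bytes needed to hold NBITS bits, rounded up to a whole byte.  */
size_t
bool_vector_data_bytes (size_t nbits)
{
  return (nbits + CHAR_BIT - 1) / CHAR_BIT;
}

/* Read bit I from DATA; assumes least-significant-bit-first packing.  */
int
bool_vector_get (const unsigned char *data, size_t i)
{
  return (data[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1;
}

/* Set bit I in DATA to VALUE (zero or non-zero).  */
void
bool_vector_set (unsigned char *data, size_t i, int value)
{
  unsigned char mask = (unsigned char) (1u << (i % CHAR_BIT));
  if (value)
    data[i / CHAR_BIT] |= mask;
  else
    data[i / CHAR_BIT] &= (unsigned char) ~mask;
}
```

Under this model a bool vector of 9 elements needs 2 data bytes, compared
with 9 words for a regular vector of the same length.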

================================================================

Hash tables are a more complex data type, and I don't understand what limits
the number of key-value pairs from:

struct Lisp_Hash_Table
{
  struct vectorlike_header header;
  Lisp_Object weak;
  Lisp_Object rehash_size;
  Lisp_Object rehash_threshold;
  Lisp_Object hash;
  Lisp_Object next;
  Lisp_Object next_free;
  Lisp_Object index;
  ptrdiff_t count;
  Lisp_Object key_and_value;
  struct hash_table_test test;
  struct Lisp_Hash_Table *next_weak;
};

================================================================

Please correct me and answer the questions...

-- 
Best regards!



