unofficial mirror of help-gnu-emacs@gnu.org
 help / color / mirror / Atom feed
From: Oleksandr Gavenko <gavenkoa@gmail.com>
To: help-gnu-emacs@gnu.org
Subject: Size and length limits for Emacs primitive types and etc data?
Date: Wed, 23 Jan 2013 00:06:04 +0200	[thread overview]
Message-ID: <87sj5s50vn.fsf@gavenkoa.example.com> (raw)

during search I found these sources of information about limits of Emacs runtime:

  (info "(elisp)Programming Types")
                Programming Types
  http://www.emacswiki.org/emacs/EmacsFileSizeLimit
                EmacsFileSizeLimit
  http://article.gmane.org/gmane.emacs.devel/139119
                 Re: stack overflow limit
                 The value of re_max_failures we use now needs 4MB of stack on
                 a 32-but machine, twice as much on a 64-bit machine. We also
                 need stack space for GC.

From official docs:

For integers: 28bit + sign.
For chars: 22-bit.

Next types have unknown or undefined size limits in manual but:

================================================================

For float: Emacs uses the IEEE floating point standard where possible. But
which precision exactly (half/single/double
http://en.wikipedia.org/wiki/IEEE_754#Basic_formats)?

/* Lisp floating point type.  */
struct Lisp_Float  /* src/lisp.h */
  {
    union
    {
      double data;
      struct Lisp_Float *chain;
    } u;
  };

Seems it uses 64-bit (double precision) IEEE 754 on most of 32-bit platforms.

Any function in runtime that return digits and exponent width for float?

================================================================

For list: I think their length unlimited at all.

================================================================

But how many bytes take symbol? For example 'foo'?

From src/lisp.h:

typedef struct { EMACS_INT i; } Lisp_Object;

struct Lisp_Symbol
{
  unsigned gcmarkbit : 1;
  ENUM_BF (symbol_redirect) redirect : 3;
  unsigned constant : 2;
  unsigned interned : 2;
  unsigned declared_special : 1;
  Lisp_Object name;
  union {
    Lisp_Object value;
    struct Lisp_Symbol *alias;
    struct Lisp_Buffer_Local_Value *blv;
    union Lisp_Fwd *fwd;
  } val;
  Lisp_Object function;
  Lisp_Object plist;
  struct Lisp_Symbol *next;
};

For 32-bit arch I count 4*6=24 bytes.

Seems that Lisp_Object is index in hash table to actual values (like actual
name or function code...).

================================================================

How many memory takes cons cell?

struct Lisp_Cons
  {
    Lisp_Object car;
    union
    {
      Lisp_Object cdr;
      struct Lisp_Cons *chain;
    } u;
  };

For 32-bit arch I count 4*2=8 bytes.

================================================================

How many takes plist for storing single property?

From:

DEFUN ("plist-put", Fplist_put, Splist_put, 3, 3, 0,
  (Lisp_Object plist, register Lisp_Object prop, Lisp_Object val)
{
  register Lisp_Object tail, prev;
  Lisp_Object newcell;
  prev = Qnil;
  for (tail = plist; CONSP (tail) && CONSP (XCDR (tail));
       tail = XCDR (XCDR (tail)))

seems that 2 cons... or 8*2=16 bytes.

================================================================

How many memory takes string (which is buffer strings and symbols names)?

typedef struct interval *INTERVAL;
struct Lisp_String
  {
    ptrdiff_t size;
    ptrdiff_t size_byte;
    INTERVAL intervals;		/* Text properties in this string.  */
    unsigned char *data;
  };

Seems that 3*4 + lengthOf(data) bytes.

Manual say that "strings really contain integers" and "strings are arrays, and
therefore sequences as well".

So each char (in data) uses 4 bytes? Seem doesn't. As

     To conserve memory, Emacs does not hold fixed-length 22-bit numbers that
  are codepoints of text characters within buffers and strings. Rather, Emacs
  uses a variable-length internal representation of characters, that stores
  each character as a sequence of 1 to 5 8-bit bytes, depending on the
  magnitude of its codepoint.

and:

  Encoded text is not really text, as far as Emacs is concerned, but rather a
  sequence of raw 8-bit bytes. We call buffers and strings that hold encoded
  text "unibyte" buffers and strings, because Emacs treats them as a sequence
  of individual bytes.

With unibyte I understand that it is easy to get char by index.

But with multibyte I don't understand. And don't understand why in this case
string are array, is it an inefficient array?

Seems that buffer text == string:

struct buffer_text   /* from src/buffer.h */
  {
    unsigned char *beg;
    ptrdiff_t gpt;		/* Char pos of gap in buffer.  */
    ptrdiff_t z;		/* Char pos of end of buffer.  */
    ptrdiff_t gpt_byte;		/* Byte pos of gap in buffer.  */
    ptrdiff_t z_byte;		/* Byte pos of end of buffer.  */
    ptrdiff_t gap_size;		/* Size of buffer's gap.  */
    EMACS_INT modiff;		/* This counts buffer-modification events
    EMACS_INT chars_modiff;	/* This is modified with character change
    EMACS_INT save_modiff;	/* Previous value of modiff, as of last
    EMACS_INT overlay_modiff;	/* Counts modifications to overlays.  */
    EMACS_INT compact;		/* Set to modiff each time when compact_buffer
    ptrdiff_t beg_unchanged;
    ptrdiff_t end_unchanged;
    EMACS_INT unchanged_modified;
    EMACS_INT overlay_unchanged_modified;
    INTERVAL intervals;
    struct Lisp_Marker *markers;
    bool inhibit_shrinking;
  };

So opening 10 KiB Russian file in cp1251 actually take 2*10 KiB for buffer as
each Russian chars in multibyte string take 2 bytes... (just type C-u C-x =
and look to "buffer code: #xD0 #x91").

I think that string have no length limit (except limit in 28-bit for index on
32-bit platform).

================================================================

Seems that arrays/vectors also have no limits for length (except limit in
28-bit for index on 32-bit platform):

/* Regular vector is just a header plus array of Lisp_Objects.  */
struct Lisp_Vector   /* src/lisp.h */
  {
    struct vectorlike_header header;
    Lisp_Object contents[1];
  };

/* A boolvector is a kind of vectorlike, with contents are like a string.  */
struct Lisp_Bool_Vector
  {
    struct vectorlike_header header;
    /* This is the size in bits.  */
    EMACS_INT size;
    /* This contains the actual bits, packed into bytes.  */
    unsigned char data[1];
  };

================================================================

Hash tables are harder data type and I don't understand limitations on count
of key-values pairs from:

struct Lisp_Hash_Table
{
  struct vectorlike_header header;
  Lisp_Object weak;
  Lisp_Object rehash_size;
  Lisp_Object rehash_threshold;
  Lisp_Object hash;
  Lisp_Object next;
  Lisp_Object next_free;
  Lisp_Object index;
  ptrdiff_t count;
  Lisp_Object key_and_value;
  struct hash_table_test test;
  struct Lisp_Hash_Table *next_weak;
};

================================================================

Please correct me and answer the questions...

-- 
Best regards!




             reply	other threads:[~2013-01-22 22:06 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-01-22 22:06 Oleksandr Gavenko [this message]
2013-02-03 13:56 ` Size and length limits for Emacs primitive types and etc data? Aurélien Aptel
2013-02-03 19:16   ` Eli Zaretskii
2013-02-04 12:38     ` Aurélien Aptel
2013-02-04 15:57       ` Eli Zaretskii
2013-02-05  9:41         ` Oleksandr Gavenko
2013-02-05 18:14           ` Eli Zaretskii
2013-02-05 20:17             ` Oleksandr Gavenko
2013-02-05 21:35               ` Eli Zaretskii
2013-02-06 18:46               ` Stefan Monnier
     [not found]           ` <mailman.19079.1360088047.855.help-gnu-emacs@gnu.org>
2013-02-05 19:06             ` Burton Samograd
2013-02-05 20:04               ` Oleksandr Gavenko
2013-02-05 21:28               ` Eli Zaretskii
2013-02-05 22:25           ` Peter Dyballa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87sj5s50vn.fsf@gavenkoa.example.com \
    --to=gavenkoa@gmail.com \
    --cc=help-gnu-emacs@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).