From: Oleksandr Gavenko <gavenkoa@gmail.com>
To: help-gnu-emacs@gnu.org
Subject: Size and length limits for Emacs primitive types and etc data?
Date: Wed, 23 Jan 2013 00:06:04 +0200
Message-ID: <87sj5s50vn.fsf@gavenkoa.example.com>
While searching, I found these sources of information about the limits of the
Emacs runtime:
(info "(elisp)Programming Types")
Programming Types
http://www.emacswiki.org/emacs/EmacsFileSizeLimit
EmacsFileSizeLimit
http://article.gmane.org/gmane.emacs.devel/139119
Re: stack overflow limit
The value of re_max_failures we use now needs 4MB of stack on
a 32-bit machine, twice as much on a 64-bit machine. We also
need stack space for GC.
From the official docs:
For integers: 28 bits + sign.
For chars: 22 bits.
The following types have unknown or undefined size limits in the manual:
================================================================
For floats: Emacs uses the IEEE floating point standard where possible. But
which precision exactly (half/single/double, see
http://en.wikipedia.org/wiki/IEEE_754#Basic_formats)?
/* Lisp floating point type. */
struct Lisp_Float /* src/lisp.h */
{
union
{
double data;
struct Lisp_Float *chain;
} u;
};
It seems that Emacs uses 64-bit (double precision) IEEE 754 on most 32-bit
platforms.
Is there any function in the runtime that returns the digit and exponent
widths of a float?
================================================================
For lists: I think their length is not limited at all.
================================================================
But how many bytes does a symbol take? For example, 'foo'?
From src/lisp.h:
typedef struct { EMACS_INT i; } Lisp_Object;
struct Lisp_Symbol
{
unsigned gcmarkbit : 1;
ENUM_BF (symbol_redirect) redirect : 3;
unsigned constant : 2;
unsigned interned : 2;
unsigned declared_special : 1;
Lisp_Object name;
union {
Lisp_Object value;
struct Lisp_Symbol *alias;
struct Lisp_Buffer_Local_Value *blv;
union Lisp_Fwd *fwd;
} val;
Lisp_Object function;
Lisp_Object plist;
struct Lisp_Symbol *next;
};
On a 32-bit architecture I count 4*6 = 24 bytes (one word for the bit fields
plus five word-sized members).
It seems that Lisp_Object is an index into a hash table of the actual values
(like the actual name or function code...).
================================================================
How much memory does a cons cell take?
struct Lisp_Cons
{
Lisp_Object car;
union
{
Lisp_Object cdr;
struct Lisp_Cons *chain;
} u;
};
On a 32-bit architecture I count 4*2 = 8 bytes.
================================================================
How much memory does a plist take to store a single property?
From:
DEFUN ("plist-put", Fplist_put, Splist_put, 3, 3, 0,
(Lisp_Object plist, register Lisp_Object prop, Lisp_Object val)
{
register Lisp_Object tail, prev;
Lisp_Object newcell;
prev = Qnil;
for (tail = plist; CONSP (tail) && CONSP (XCDR (tail));
tail = XCDR (XCDR (tail)))
it seems that each property takes 2 conses (one for the property name, one for
the value), or 8*2 = 16 bytes.
================================================================
How much memory does a string take (this covers buffer strings and symbol
names)?
typedef struct interval *INTERVAL;
struct Lisp_String
{
ptrdiff_t size;
ptrdiff_t size_byte;
INTERVAL intervals; /* Text properties in this string. */
unsigned char *data;
};
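It seems that a string takes 4*4 + lengthOf(data) bytes on a 32-bit
architecture (four word-sized fields plus the character data).
The manual says that "strings really contain integers" and "strings are arrays,
and therefore sequences as well".
So does each char (in data) use 4 bytes? It seems not, because: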
To conserve memory, Emacs does not hold fixed-length 22-bit numbers that
are codepoints of text characters within buffers and strings. Rather, Emacs
uses a variable-length internal representation of characters, that stores
each character as a sequence of 1 to 5 8-bit bytes, depending on the
magnitude of its codepoint.
and:
Encoded text is not really text, as far as Emacs is concerned, but rather a
sequence of raw 8-bit bytes. We call buffers and strings that hold encoded
text "unibyte" buffers and strings, because Emacs treats them as a sequence
of individual bytes.
With unibyte I understand that it is easy to get a char by index.
But with multibyte I don't understand how that works. And I don't understand
why strings are arrays in this case; is it an inefficient array?
It seems that buffer text is stored like a string:
struct buffer_text /* from src/buffer.h */
{
unsigned char *beg;
ptrdiff_t gpt; /* Char pos of gap in buffer. */
ptrdiff_t z; /* Char pos of end of buffer. */
ptrdiff_t gpt_byte; /* Byte pos of gap in buffer. */
ptrdiff_t z_byte; /* Byte pos of end of buffer. */
ptrdiff_t gap_size; /* Size of buffer's gap. */
EMACS_INT modiff; /* This counts buffer-modification events ... */
EMACS_INT chars_modiff; /* This is modified with character change ... */
EMACS_INT save_modiff; /* Previous value of modiff, as of last ... */
EMACS_INT overlay_modiff; /* Counts modifications to overlays. */
EMACS_INT compact; /* Set to modiff each time when compact_buffer ... */
ptrdiff_t beg_unchanged;
ptrdiff_t end_unchanged;
EMACS_INT unchanged_modified;
EMACS_INT overlay_unchanged_modified;
INTERVAL intervals;
struct Lisp_Marker *markers;
bool inhibit_shrinking;
};
So opening a 10 KiB Russian file in cp1251 actually takes about 2*10 KiB for
the buffer, as each Russian char in a multibyte string takes 2 bytes... (just
type C-u C-x = and look at "buffer code: #xD0 #x91").
I think that strings have no length limit (except the 28-bit limit on the
index on a 32-bit platform).
================================================================
It seems that arrays/vectors also have no length limit (except the 28-bit
limit on the index on a 32-bit platform):
/* Regular vector is just a header plus array of Lisp_Objects. */
struct Lisp_Vector /* src/lisp.h */
{
struct vectorlike_header header;
Lisp_Object contents[1];
};
/* A boolvector is a kind of vectorlike, with contents are like a string. */
struct Lisp_Bool_Vector
{
struct vectorlike_header header;
/* This is the size in bits. */
EMACS_INT size;
/* This contains the actual bits, packed into bytes. */
unsigned char data[1];
};
================================================================
Hash tables are a more complex data type, and I don't understand the limits on
the number of key-value pairs from:
struct Lisp_Hash_Table
{
struct vectorlike_header header;
Lisp_Object weak;
Lisp_Object rehash_size;
Lisp_Object rehash_threshold;
Lisp_Object hash;
Lisp_Object next;
Lisp_Object next_free;
Lisp_Object index;
ptrdiff_t count;
Lisp_Object key_and_value;
struct hash_table_test test;
struct Lisp_Hash_Table *next_weak;
};
================================================================
Please correct me and answer the questions...
--
Best regards!