From: Oleksandr Gavenko
Newsgroups: gmane.emacs.help
Subject: Size and length limits for Emacs primitive types and etc data?
Date: Wed, 23 Jan 2013 00:06:04 +0200
Message-ID: <87sj5s50vn.fsf@gavenkoa.example.com>
To: help-gnu-emacs@gnu.org

While searching, I found these sources of information about the limits of the Emacs runtime:

  (info "(elisp)Programming Types")
    Programming Types
  http://www.emacswiki.org/emacs/EmacsFileSizeLimit
    EmacsFileSizeLimit
  http://article.gmane.org/gmane.emacs.devel/139119
    Re: stack overflow limit

    > The value of re_max_failures we use now needs 4MB of stack on a
    > 32-bit machine, twice as much on a 64-bit machine. We also need
    > stack space for GC.

From the official docs:

  For integers: 28 bits + sign.
  For chars: 22 bits.

The following types have unknown or undefined size limits in the manual, but:

================================================================

For floats the manual says: "Emacs uses the IEEE floating point standard where possible." But which precision exactly (half/single/double, see http://en.wikipedia.org/wiki/IEEE_754#Basic_formats)?

  /* Lisp floating point type.  */
  struct Lisp_Float   /* src/lisp.h */
  {
    union
    {
      double data;
      struct Lisp_Float *chain;
    } u;
  };

It seems it uses 64-bit (double precision) IEEE 754 on most 32-bit platforms. Is there any function in the runtime that returns the number of digits and the exponent width of a float?

================================================================

For lists: I think their length is not limited at all.

================================================================

But how many bytes does a symbol take, for example 'foo'? From src/lisp.h:

  typedef struct { EMACS_INT i; } Lisp_Object;

  struct Lisp_Symbol
  {
    unsigned gcmarkbit : 1;
    ENUM_BF (symbol_redirect) redirect : 3;
    unsigned constant : 2;
    unsigned interned : 2;
    unsigned declared_special : 1;
    Lisp_Object name;
    union {
      Lisp_Object value;
      struct Lisp_Symbol *alias;
      struct Lisp_Buffer_Local_Value *blv;
      union Lisp_Fwd *fwd;
    } val;
    Lisp_Object function;
    Lisp_Object plist;
    struct Lisp_Symbol *next;
  };

On a 32-bit arch I count 4*6=24 bytes: one word for the 9 bits of bit-fields plus five word-sized fields (name, val, function, plist, next). It seems that a Lisp_Object is an index into a hash table of the actual values (such as the actual name or function code...).

================================================================

How much memory does a cons cell take?

  struct Lisp_Cons
  {
    Lisp_Object car;
    union
    {
      Lisp_Object cdr;
      struct Lisp_Cons *chain;
    } u;
  };

On a 32-bit arch I count 4*2=8 bytes.

================================================================

How much does a plist take to store a single property? From src/fns.c:

  DEFUN ("plist-put", Fplist_put, Splist_put, 3, 3, 0,
    (Lisp_Object plist, register Lisp_Object prop, Lisp_Object val)
  {
    register Lisp_Object tail, prev;
    Lisp_Object newcell;
    prev = Qnil;
    for (tail = plist;
         CONSP (tail) && CONSP (XCDR (tail));
         tail = XCDR (XCDR (tail)))
    ...

It seems that one property takes 2 cons cells (one for PROP, one for VAL), i.e. 8*2=16 bytes.

================================================================

How much memory does a string take (as used for buffer strings and symbol names)?

  typedef struct interval *INTERVAL;

  struct Lisp_String
  {
    ptrdiff_t size;
    ptrdiff_t size_byte;
    INTERVAL intervals;   /* Text properties in this string.  */
    unsigned char *data;
  };

It seems to be 4*4 bytes (size, size_byte, intervals and the data pointer) plus lengthOf(data) bytes.

The manual says that "strings really contain integers" and "strings are arrays, and therefore sequences as well". So does each character (in data) use 4 bytes? It seems not, as the manual also says:

    To conserve memory, Emacs does not hold fixed-length 22-bit numbers
    that are codepoints of text characters within buffers and strings.
    Rather, Emacs uses a variable-length internal representation of
    characters, that stores each character as a sequence of 1 to 5 8-bit
    bytes, depending on the magnitude of its codepoint.

and:

    Encoded text is not really text, as far as Emacs is concerned, but
    rather a sequence of raw 8-bit bytes. We call buffers and strings that
    hold encoded text "unibyte" buffers and strings, because Emacs treats
    them as a sequence of individual bytes.

With unibyte strings I understand that it is easy to get a character by index. But with multibyte strings I don't understand how that works, and I don't understand why such strings are still arrays. Is it an inefficient array?

It seems that buffer text is stored the same way as a string:

  struct buffer_text   /* from src/buffer.h */
  {
    unsigned char *beg;
    ptrdiff_t gpt;              /* Char pos of gap in buffer.  */
    ptrdiff_t z;                /* Char pos of end of buffer.  */
    ptrdiff_t gpt_byte;         /* Byte pos of gap in buffer.  */
    ptrdiff_t z_byte;           /* Byte pos of end of buffer.  */
    ptrdiff_t gap_size;         /* Size of buffer's gap.  */
    EMACS_INT modiff;           /* This counts buffer-modification events ... */
    EMACS_INT chars_modiff;     /* This is modified with character change ... */
    EMACS_INT save_modiff;      /* Previous value of modiff, as of last ... */
    EMACS_INT overlay_modiff;   /* Counts modifications to overlays.  */
    EMACS_INT compact;          /* Set to modiff each time when compact_buffer ... */
    ptrdiff_t beg_unchanged;
    ptrdiff_t end_unchanged;
    EMACS_INT unchanged_modified;
    EMACS_INT overlay_unchanged_modified;
    INTERVAL intervals;
    struct Lisp_Marker *markers;
    bool inhibit_shrinking;
  };

So opening a 10 KiB Russian file in cp1251 actually takes 2*10 KiB for the buffer, as each Russian character in a multibyte buffer takes 2 bytes.
You can check this by typing C-u C-x = on a Russian character and looking at "buffer code: #xD0 #x91".

I think strings have no length limit (except the 28-bit index limit on 32-bit platforms).

================================================================

It seems that arrays/vectors also have no length limit (except the 28-bit index limit on 32-bit platforms):

  /* Regular vector is just a header plus array of Lisp_Objects.  */
  struct Lisp_Vector   /* src/lisp.h */
  {
    struct vectorlike_header header;
    Lisp_Object contents[1];
  };

  /* A boolvector is a kind of vectorlike, with contents are like a string.  */
  struct Lisp_Bool_Vector
  {
    struct vectorlike_header header;
    /* This is the size in bits.  */
    EMACS_INT size;
    /* This contains the actual bits, packed into bytes.  */
    unsigned char data[1];
  };

================================================================

Hash tables are a more complex data type, and I can't work out the limit on the number of key-value pairs from:

  struct Lisp_Hash_Table
  {
    struct vectorlike_header header;
    Lisp_Object weak;
    Lisp_Object rehash_size;
    Lisp_Object rehash_threshold;
    Lisp_Object hash;
    Lisp_Object next;
    Lisp_Object next_free;
    Lisp_Object index;
    ptrdiff_t count;
    Lisp_Object key_and_value;
    struct hash_table_test test;
    struct Lisp_Hash_Table *next_weak;
  };

================================================================

Please correct me and answer the questions...

-- 
Best regards!