From: Dmitry Antipov <dmantipov@yandex.ru>
To: Emacs development discussions <emacs-devel@gnu.org>
Subject: [corrector(s) needed] doc/lispref/internals.texi tweaks
Date: Mon, 12 Nov 2012 15:20:47 +0400
Message-ID: <50A0DB8F.1090008@yandex.ru>
[-- Attachment #1: Type: text/plain, Size: 105 bytes --]
Here are some fixes for doc/lispref/internals.texi, which looks a bit
outdated and incomplete...
Dmitry
[-- Attachment #2: internals.patch --]
[-- Type: text/plain, Size: 12095 bytes --]
=== modified file 'doc/lispref/internals.texi'
--- doc/lispref/internals.texi 2012-06-27 05:21:15 +0000
+++ doc/lispref/internals.texi 2012-11-12 11:18:55 +0000
@@ -226,12 +226,11 @@
Beyond the basic vector, a lot of objects like window, buffer, and
frame are managed as if they were vectors. The corresponding C data
structures include the @code{struct vectorlike_header} field whose
-@code{next} field points to the next object in the chain:
-@code{header.next.buffer} points to the next buffer (which could be
-a killed buffer), and @code{header.next.vector} points to the next
-vector in a free list. If a vector is small (smaller than or equal to
-@code{VBLOCK_BYTES_MAX} bytes, see @file{alloc.c}), then
-@code{header.next.nbytes} contains the vector size in bytes.
+@code{size} member contains the subtype enumerated by @code{enum pvec_type}
+and information about how many @code{Lisp_Object} fields this structure
+contains and what the size of the remaining data is. This information is
+needed to calculate the memory footprint of an object, and is used
+by the vector allocation code while iterating over the vector blocks.
@cindex garbage collection
It is quite common to use some storage for a while, then release it
@@ -284,89 +283,35 @@
spontaneously if you use more than @code{gc-cons-threshold} bytes of
Lisp data since the previous garbage collection.)
-@code{garbage-collect} returns a list containing the following
-information:
-
-@example
-@group
-((@var{used-conses} . @var{free-conses})
- (@var{used-syms} . @var{free-syms})
-@end group
- (@var{used-miscs} . @var{free-miscs})
- @var{used-string-chars}
- @var{used-vector-slots}
- (@var{used-floats} . @var{free-floats})
- (@var{used-intervals} . @var{free-intervals})
- (@var{used-strings} . @var{free-strings}))
-@end example
-
-Here is an example:
-
-@example
-@group
-(garbage-collect)
- @result{} ((106886 . 13184) (9769 . 0)
- (7731 . 4651) 347543 121628
- (31 . 94) (1273 . 168)
- (25474 . 3569))
-@end group
-@end example
-
-Here is a table explaining each element:
-
-@table @var
-@item used-conses
-The number of cons cells in use.
-
-@item free-conses
-The number of cons cells for which space has been obtained from the
-operating system, but that are not currently being used.
-
-@item used-syms
-The number of symbols in use.
-
-@item free-syms
-The number of symbols for which space has been obtained from the
-operating system, but that are not currently being used.
-
-@item used-miscs
-The number of miscellaneous objects in use. These include markers and
-overlays, plus certain objects not visible to users.
-
-@item free-miscs
-The number of miscellaneous objects for which space has been obtained
-from the operating system, but that are not currently being used.
-
-@item used-string-chars
-The total size of all strings, in characters.
-
-@item used-vector-slots
-The total number of elements of existing vectors.
-
-@item used-floats
-The number of floats in use.
-
-@item free-floats
-The number of floats for which space has been obtained from the
-operating system, but that are not currently being used.
-
-@item used-intervals
-The number of intervals in use. Intervals are an internal
-data structure used for representing text properties.
-
-@item free-intervals
-The number of intervals for which space has been obtained
-from the operating system, but that are not currently being used.
-
-@item used-strings
-The number of strings in use.
-
-@item free-strings
-The number of string headers for which the space was obtained from the
-operating system, but which are currently not in use. (A string
-object consists of a header and the storage for the string text
-itself; the latter is only allocated when the string is created.)
-@end table
+@code{garbage-collect} returns a list with information on the amount of space
+in use, where each entry has the form @samp{(name size used free)}. In the
+entry, @samp{name} is a symbol describing the kind of objects this entry
+represents, @samp{size} is the number of bytes used by each one, @samp{used}
+is the number of those objects that were found live in the heap, and
+@samp{free} is the number of those objects that are not live but that Emacs
+keeps around for future allocations. Here is an example:
+
+@example
+((@var{conses} 16 50589 8907) (@var{symbols} 48 14759 0)
+ (@var{miscs} 40 37 119) (@var{strings} 32 3610 4481)
+ (@var{string-bytes} 1 96823) (@var{vectors} 16 7471)
+ (@var{vector-slots} 8 344767 27849) (@var{floats} 8 76 111)
+ (@var{intervals} 56 49 18) (@var{buffers} 944 9)
+ (@var{heap} 1024 14654 2363))
+@end example
+
+The first entry means that the internal size of a cons cell is 16 bytes,
+there are 50589 cons cells in use, and 8907 conses are on the free list.
+Likewise for symbols, floats and intervals. Freed buffers aren't kept on a free
+list, so the corresponding entry has just two numbers (the internal size of
+@code{struct buffer} and the number of buffers in the @code{all_buffers} list).
+Miscellaneous objects, counted by @var{miscs}, include markers and overlays
+plus certain objects not visible to users. Since a string object consists of
+a header and the storage for the string text itself, there are two entries
+for strings: @var{strings} counts headers and @var{string-bytes} counts
+the total number of bytes in the strings. The same applies to vectors,
+with @var{vectors} and @var{vector-slots}. Finally, the last entry
+means that the total heap size is 14654 KB, of which 2363 KB are free.
If there was overflow in pure space (@pxref{Pure Storage}),
@code{garbage-collect} returns @code{nil}, because a real garbage
@@ -639,7 +584,12 @@
the number of Lisp arguments, it must have exactly two C arguments:
the first is the number of Lisp arguments, and the second is the
address of a block containing their values. These have types
-@code{int} and @w{@code{Lisp_Object *}} respectively.
+@code{int} and @w{@code{Lisp_Object *}} respectively. Since
+@code{Lisp_Object} can hold any Lisp object of any data type, you
+can determine the actual data type only at run time; so if you want
+a primitive to accept only a certain type of argument, you must check
+the type explicitly using a suitable predicate (@pxref{Type Predicates}).
+@cindex type checking internals
@cindex @code{GCPRO} and @code{UNGCPRO}
@cindex protect C variables from garbage collection
@@ -820,23 +770,69 @@
@section Object Internals
@cindex object internals
-@c FIXME Is this still true? Does --with-wide-int affect anything?
- GNU Emacs Lisp manipulates many different types of data. The actual
-data are stored in a heap and the only access that programs have to it
-is through pointers. Each pointer is 32 bits wide on 32-bit machines,
-and 64 bits wide on 64-bit machines; three of these bits are used for
-the tag that identifies the object's type, and the remainder are used
-to address the object.
-
- Because Lisp objects are represented as tagged pointers, it is always
-possible to determine the Lisp data type of any object. The C data type
-@code{Lisp_Object} can hold any Lisp object of any data type. Ordinary
-variables have type @code{Lisp_Object}, which means they can hold any
-type of Lisp value; you can determine the actual data type only at run
-time. The same is true for function arguments; if you want a function
-to accept only a certain type of argument, you must check the type
-explicitly using a suitable predicate (@pxref{Type Predicates}).
-@cindex type checking internals
+ Emacs Lisp provides a rich set of data types. Some of them, like cons
+cells, integers and strings, are common to nearly all Lisp dialects. Some
+others, like markers and buffers, are quite special and are needed to provide
+the basic support for writing editor commands in Lisp. To implement such
+a variety of object types and provide an efficient way to pass objects between
+the subsystems of the interpreter, there is a set of C data structures and
+a special type to represent pointers to all of them, which is known as
+a tagged pointer.
+
+ In C, the tagged pointer is an object of type @code{Lisp_Object}. Any
+initialized variable of this type always holds the value of one of the
+following basic data types: integer, symbol, string, cons cell, float,
+vectorlike or miscellaneous object. Each of these data types has a
+corresponding tag value. All tags are enumerated by @code{enum Lisp_Type}
+and placed into a 3-bit bitfield of the @code{Lisp_Object}. The remaining bits
+of @code{Lisp_Object} are the value itself. Integer values are immediate, i.e.
+directly represented by the remaining bits, and all other objects are
+represented by C pointers to a corresponding object allocated from the heap.
+The width of @code{Lisp_Object} is platform- and configuration-dependent:
+usually it equals the width of the underlying platform pointer (e.g. 32-bit
+on a 32-bit machine and 64-bit on a 64-bit one), but there is also a special
+configuration where @code{Lisp_Object} is 64-bit but all pointers are 32-bit.
+This trick was designed to overcome the limited range of Lisp integers on
+a 32-bit system by using the 64-bit @code{long long} type for @code{Lisp_Object}.
+
+ The following C data structures are defined in @file{lisp.h} to represent
+the basic data types beyond integers:
+
+@table @code
+@item struct Lisp_Cons
+Cons cell, an object used to construct lists.
+
+@item struct Lisp_String
+String, the basic object to represent a sequence of characters.
+
+@item struct Lisp_Vector
+Array, a fixed-size set of Lisp objects which may be accessed by index.
+
+@item struct Lisp_Symbol
+Symbol, a uniquely named entity commonly used as an identifier.
+
+@item struct Lisp_Float
+Floating point value.
+
+@item union Lisp_Misc
+Miscellaneous kinds of objects which don't fit into any of the above.
+@end table
+
+ These types are the first-class citizens of the internal type system.
+Since the tag space is limited, all other types are subtypes of either
+@code{Lisp_Vectorlike} or @code{Lisp_Misc}. Vector subtypes are enumerated
+by @code{enum pvec_type}, and nearly all complex objects like windows, buffers,
+frames, and processes fall into this category. The rest of the special types,
+including markers and overlays, are enumerated by @code{enum Lisp_Misc_Type}
+and form the set of subtypes of @code{Lisp_Misc}.
+
+ Below is a description of a few subtypes of @code{Lisp_Vectorlike}.
+A buffer object represents the text to display and edit. A window is the part
+of the display structure which shows a buffer or is used as a container to
+recursively place other windows on the same frame. (Do not confuse an Emacs Lisp
+window object with a window as an entity managed by a user interface
+system like X; in Emacs terminology, the latter is called a frame.) Finally,
+a process object is used to manage subprocesses.
@menu
* Buffer Internals:: Components of a buffer structure.
@@ -912,12 +908,8 @@
@table @code
@item header
-A @code{struct vectorlike_header} structure where @code{header.next}
-points to the next buffer, in the chain of all buffers (including
-killed buffers). This chain is used only for garbage collection, in
-order to collect killed buffers properly. Note that vectors, and most
-kinds of objects allocated as vectors, are all on one chain, but
-buffers are on a separate chain of their own.
+A header of type @code{struct vectorlike_header} is common to all
+vectorlike objects.
@item own_text
A @code{struct buffer_text} structure that ordinarily holds the buffer
@@ -928,6 +920,11 @@
ordinary buffer, this is the @code{own_text} field above. In an
indirect buffer, this is the @code{own_text} field of the base buffer.
+@item next
+A pointer to the next buffer, in the chain of all buffers, including
+killed buffers. This chain is used only for allocation and garbage
+collection, in order to collect killed buffers properly.
+
@item pt
@itemx pt_byte
The character and byte positions of point in a buffer.
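
For readers who want to see the tagging scheme from the Object Internals
hunk in concrete terms, here is a minimal standalone C sketch. It is not
the code from lisp.h: every name in it (toy_object, toy_type, toy_make_int
and so on) is hypothetical, and putting the 3-bit tag in the low bits of
the word is an assumption made for the sketch. It only shows the idea that
an integer is stored immediately in the non-tag bits while every other
object is a pointer to an aligned heap structure with the tag folded in.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the code from lisp.h.  */

/* One tag per basic type, mirroring the list in the patch text.  */
enum toy_type
{
  TOY_INT = 0, TOY_SYMBOL, TOY_STRING, TOY_CONS,
  TOY_FLOAT, TOY_VECTORLIKE, TOY_MISC
};

#define TOY_TAG_BITS 3
#define TOY_TAG_MASK ((uintptr_t) ((1u << TOY_TAG_BITS) - 1))

/* One machine word: a 3-bit tag (assumed here to sit in the low bits)
   plus either an immediate integer or the address of a heap object.  */
typedef uintptr_t toy_object;

/* Forcing 8-byte alignment keeps the low 3 address bits zero,
   which leaves room for the tag.  */
struct toy_cons
{
  _Alignas (8) toy_object car;
  toy_object cdr;
};

/* Integers are immediate: the value lives in the non-tag bits.  */
static toy_object
toy_make_int (intptr_t n)
{
  assert (n >= INTPTR_MIN / 8 && n <= INTPTR_MAX / 8);
  return ((toy_object) n << TOY_TAG_BITS) | TOY_INT;
}

static intptr_t
toy_int_value (toy_object obj)
{
  /* Converting back to a signed type assumes the usual two's
     complement behavior; the tag is 0, so the division is exact.  */
  return (intptr_t) obj / (1 << TOY_TAG_BITS);
}

/* Every other type is a pointer with the tag stored in its low bits.  */
static toy_object
toy_make_pointer (void *p, enum toy_type type)
{
  uintptr_t bits = (uintptr_t) p;
  assert ((bits & TOY_TAG_MASK) == 0);
  return bits | (uintptr_t) type;
}

static enum toy_type
toy_type_of (toy_object obj)
{
  return (enum toy_type) (obj & TOY_TAG_MASK);
}

static void *
toy_pointer (toy_object obj)
{
  return (void *) (obj & ~TOY_TAG_MASK);
}
```

With these helpers, toy_type_of (toy_make_int (-42)) yields TOY_INT and
toy_int_value gives back -42, while a struct toy_cons wrapped with
toy_make_pointer (&cell, TOY_CONS) can be recovered with toy_pointer.
Spending three bits on the tag is what narrows the integer range, which
is the limitation the wide Lisp_Object configuration in the patch text
is meant to relieve.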