* Re: [Emacs-diffs] master 515afc9 6/6: Fix crash if user test munges hash table
2019-07-21 9:57 ` [Emacs-diffs] master 515afc9 6/6: Fix crash if user test munges hash table Pip Cet
@ 2019-07-21 18:39 ` Paul Eggert
0 siblings, 0 replies; 4+ messages in thread
From: Paul Eggert @ 2019-07-21 18:39 UTC (permalink / raw)
To: Pip Cet, emacs-devel; +Cc: emacs-diffs
[-- Attachment #1: Type: text/plain, Size: 1399 bytes --]
Pip Cet wrote:
> I don't think this eassert () is completely safe. pure_alloc will
> return uninitialized memory if pure space has overflowed, so it's
> possible the new table is marked as mutable.
Good point. I wonder what other code has similar problems? Rather than spend
time looking for that, I installed the first attached patch, which clears that
uninitialized memory. If pure space is going away anyway I'm not sure it's worth
the trouble to audit the code for similar problems elsewhere.
> Is it really necessary to cater to code such as this? I thought the
> general line was that it was okay for bad Lisp code to crash Emacs in
> exceptional circumstances, such as by building bad bytecode objects or
> by doing silly things in a user-defined hash function...
Generally speaking Emacs should not crash. There is currently an exception for
bad bytecode objects, but we really should fix that (presumably by verifying the
bytecode before executing it, to avoid slowdown during execution).
While we're on the topic, we should better document how user-defined hash
functions should behave. I installed the second attached patch to do that. It
also clarifies that hash codes are fixnums.
By code inspection I found some problems that I recently introduced, involving
very large hash tables and/or 32-bit platforms --with-wide-int, and fixed them
with the third attached patch.
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-pure_alloc-returns-cleared-memory.patch --]
[-- Type: text/x-patch; name="0001-pure_alloc-returns-cleared-memory.patch", Size: 2100 bytes --]
From 4a1507b88e813e3d54614f4cb59211234e05334a Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sun, 21 Jul 2019 11:20:07 -0700
Subject: [PATCH 1/3] pure_alloc returns cleared memory
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
* src/alloc.c (pure_alloc): Clear any heap-allocated storage.
This is simpler than auditing all the callers to make sure
they don’t assume pure memory is cleared memory, and the
performance implication is nonexistent except when Emacs
is misconfigured. Also, add an assertion to catch
caller misuse when pure space is exhausted.
---
src/alloc.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/src/alloc.c b/src/alloc.c
index 1718ce0faf..b7ba886482 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -5086,7 +5086,13 @@ valid_lisp_object_p (Lisp_Object obj)
/* Allocate room for SIZE bytes from pure Lisp storage and return a
pointer to it. TYPE is the Lisp type for which the memory is
allocated. TYPE < 0 means it's not used for a Lisp object,
- and that the result should have an alignment of -TYPE. */
+ and that the result should have an alignment of -TYPE.
+
+ The bytes are initially zero.
+
+ If pure space is exhausted, allocate space from the heap. This is
+ merely an expedient to let Emacs warn that pure space was exhausted
+ and that Emacs should be rebuilt with a larger pure space. */
static void *
pure_alloc (size_t size, int type)
@@ -5119,8 +5125,10 @@ pure_alloc (size_t size, int type)
/* Don't allocate a large amount here,
because it might get mmap'd and then its address
might not be usable. */
- purebeg = xmalloc (10000);
- pure_size = 10000;
+ int small_amount = 10000;
+ eassert (size <= small_amount - LISP_ALIGNMENT);
+ purebeg = xzalloc (small_amount);
+ pure_size = small_amount;
pure_bytes_used_before_overflow += pure_bytes_used - size;
pure_bytes_used = 0;
pure_bytes_used_lisp = pure_bytes_used_non_lisp = 0;
--
2.17.1
[-- Attachment #3: 0002-Improve-doc-for-hash-tables.patch --]
[-- Type: text/x-patch, Size: 6291 bytes --]
From cf285946bee56912286f75e4d1215214bc7c5b4b Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sun, 21 Jul 2019 11:20:07 -0700
Subject: [PATCH 2/3] Improve doc for hash tables
* doc/lispref/hash.texi (Creating Hash, Defining Hash):
* src/fns.c (Fsxhash_eq, Fsxhash_eql, Fsxhash_equal):
Say that hashes are fixnums.
(Fmake_hash_table): Say that that an integer rehash-size
should be a fixnum.
* doc/lispref/hash.texi (Defining Hash): Say that hash and
comparison functions should be consistent and pure, and should
return quickly.
---
doc/lispref/hash.texi | 27 ++++++++++++++++-----------
src/fns.c | 8 ++++----
2 files changed, 20 insertions(+), 15 deletions(-)
diff --git a/doc/lispref/hash.texi b/doc/lispref/hash.texi
index 9b900e6309..051531491c 100644
--- a/doc/lispref/hash.texi
+++ b/doc/lispref/hash.texi
@@ -132,7 +132,7 @@ Creating Hash
it grows automatically. This value specifies how to make the hash table
larger, at that time.
-If @var{rehash-size} is an integer, it should be positive, and the hash
+If @var{rehash-size} is a fixnum, it should be positive and the hash
table grows by adding approximately that much to the nominal size. If
@var{rehash-size} is floating point, it had better be greater
than 1, and the hash table grows by multiplying the old size by
@@ -239,14 +239,19 @@ Defining Hash
You can think of a hash table conceptually as a large array of many
slots, each capable of holding one association. To look up a key,
-@code{gethash} first computes an integer, the hash code, from the key.
-It reduces this integer modulo the length of the array, to produce an
+@code{gethash} first computes a fixnum, the hash code, from the key.
+It reduces this fixnum modulo the length of the array, to produce an
index in the array. Then it looks in that slot, and if necessary in
other nearby slots, to see if it has found the key being sought.
Thus, to define a new method of key lookup, you need to specify both a
function to compute the hash code from a key, and a function to compare
-two keys directly.
+two keys directly. The two functions should be consistent with each
+other: that is, two keys' hash codes should be the same if the keys
+compare as equal. Also, since the two functions can be called at any
+time (such as by the garbage collector), the functions should be free
+of side effects and should return quickly, and their behavior should
+depend on only on properties of the keys that do not change.
@defun define-hash-table-test name test-fn hash-fn
This function defines a new hash table test, named @var{name}.
@@ -260,9 +265,9 @@ Defining Hash
return non-@code{nil} if they are considered the same.
The function @var{hash-fn} should accept one argument, a key, and return
-an integer that is the hash code of that key. For good results, the
-function should use the whole range of integers for hash codes,
-including negative integers.
+a fixnum that is the hash code of that key. For good results, the
+function should use the whole range of fixnums for hash codes,
+including negative fixnums.
The specified functions are stored in the property list of @var{name}
under the property @code{hash-table-test}; the property value's form is
@@ -271,12 +276,12 @@ Defining Hash
@defun sxhash-equal obj
This function returns a hash code for Lisp object @var{obj}.
-This is an integer which reflects the contents of @var{obj}
+This is a fixnum that reflects the contents of @var{obj}
and the other Lisp objects it points to.
If two objects @var{obj1} and @var{obj2} are @code{equal}, then
@code{(sxhash-equal @var{obj1})} and @code{(sxhash-equal @var{obj2})}
-are the same integer.
+are the same fixnum.
If the two objects are not @code{equal}, the values returned by
@code{sxhash-equal} are usually different, but not always; once in a
@@ -294,7 +299,7 @@ Defining Hash
If two objects @var{obj1} and @var{obj2} are @code{eq}, then
@code{(sxhash-eq @var{obj1})} and @code{(sxhash-eq @var{obj2})} are
-the same integer.
+the same fixnum.
@end defun
@defun sxhash-eql obj
@@ -305,7 +310,7 @@ Defining Hash
If two objects @var{obj1} and @var{obj2} are @code{eql}, then
@code{(sxhash-eql @var{obj1})} and @code{(sxhash-eql @var{obj2})} are
-the same integer.
+the same fixnum.
@end defun
This example creates a hash table whose keys are strings that are
diff --git a/src/fns.c b/src/fns.c
index d7e123122d..819eaec7c7 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -4700,7 +4700,7 @@ sxhash (Lisp_Object obj, int depth)
***********************************************************************/
DEFUN ("sxhash-eq", Fsxhash_eq, Ssxhash_eq, 1, 1, 0,
- doc: /* Return an integer hash code for OBJ suitable for `eq'.
+ doc: /* Return a fixnum hash code for OBJ suitable for `eq'.
If (eq A B), then (= (sxhash-eq A) (sxhash-eq B)).
Hash codes are not guaranteed to be preserved across Emacs sessions. */)
@@ -4710,7 +4710,7 @@ Hash codes are not guaranteed to be preserved across Emacs sessions. */)
}
DEFUN ("sxhash-eql", Fsxhash_eql, Ssxhash_eql, 1, 1, 0,
- doc: /* Return an integer hash code for OBJ suitable for `eql'.
+ doc: /* Return a fixnum hash code for OBJ suitable for `eql'.
If (eql A B), then (= (sxhash-eql A) (sxhash-eql B)).
Hash codes are not guaranteed to be preserved across Emacs sessions. */)
@@ -4720,7 +4720,7 @@ Hash codes are not guaranteed to be preserved across Emacs sessions. */)
}
DEFUN ("sxhash-equal", Fsxhash_equal, Ssxhash_equal, 1, 1, 0,
- doc: /* Return an integer hash code for OBJ suitable for `equal'.
+ doc: /* Return a fixnum hash code for OBJ suitable for `equal'.
If (equal A B), then (= (sxhash-equal A) (sxhash-equal B)).
Hash codes are not guaranteed to be preserved across Emacs sessions. */)
@@ -4744,7 +4744,7 @@ keys. Default is `eql'. Predefined are the tests `eq', `eql', and
Default is 65.
:rehash-size REHASH-SIZE - Indicates how to expand the table when it
-fills up. If REHASH-SIZE is an integer, increase the size by that
+fills up. If REHASH-SIZE is a fixnum, increase the size by that
amount. If it is a float, it must be > 1.0, and the new size is the
old size multiplied by that factor. Default is 1.5.
--
2.17.1
[-- Attachment #4: 0003-Avoid-integer-overflow-in-hash-table-size.patch --]
[-- Type: text/x-patch, Size: 2597 bytes --]
From c72e6328b408805953a5adf832b5c5cc9f3a75e7 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sun, 21 Jul 2019 11:20:07 -0700
Subject: [PATCH 3/3] Avoid integer overflow in hash table size
* src/fns.c (INDEX_SIZE_BOUND): Use a tighter bound.
(maybe_resize_hash_table): Avoid integer overflow when
checking for hash table size overflow. Fix confusion
between INDEX_SIZE_BOUND (which is for the index vector)
and hash table size. Fix typo in debugging message
when ENABLE_CHECKING.
---
src/fns.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/src/fns.c b/src/fns.c
index 819eaec7c7..8eefa7c72b 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -4042,9 +4042,14 @@ allocate_hash_table (void)
}
/* An upper bound on the size of a hash table index. It must fit in
- ptrdiff_t and be a valid Emacs fixnum. */
+ ptrdiff_t and be a valid Emacs fixnum. This is an upper bound on
+ VECTOR_ELTS_MAX (see alloc.c) and gets as close as we can without
+ violating modularity. */
#define INDEX_SIZE_BOUND \
- ((ptrdiff_t) min (MOST_POSITIVE_FIXNUM, PTRDIFF_MAX / word_size))
+ ((ptrdiff_t) min (MOST_POSITIVE_FIXNUM, \
+ ((min (PTRDIFF_MAX, SIZE_MAX) \
+ - header_size - GCALIGNMENT) \
+ / word_size)))
/* Create and initialize a new hash table.
@@ -4169,11 +4174,13 @@ maybe_resize_hash_table (struct Lisp_Hash_Table *h)
else
{
double float_new_size = old_size * (rehash_size + 1);
- if (float_new_size < INDEX_SIZE_BOUND + 1)
+ if (float_new_size < EMACS_INT_MAX)
new_size = float_new_size;
else
- new_size = INDEX_SIZE_BOUND + 1;
+ new_size = EMACS_INT_MAX;
}
+ if (PTRDIFF_MAX < new_size)
+ new_size = PTRDIFF_MAX;
if (new_size <= old_size)
new_size = old_size + 1;
@@ -4181,7 +4188,7 @@ maybe_resize_hash_table (struct Lisp_Hash_Table *h)
avoid problems if memory is exhausted. larger_vecalloc
finishes computing the size of the replacement vectors. */
Lisp_Object next = larger_vecalloc (h->next, new_size - old_size,
- INDEX_SIZE_BOUND / 2);
+ PTRDIFF_MAX / 2);
ptrdiff_t next_size = ASIZE (next);
for (ptrdiff_t i = old_size; i < next_size - 1; i++)
gc_aset (next, i, make_fixnum (i + 1));
@@ -4216,7 +4223,7 @@ maybe_resize_hash_table (struct Lisp_Hash_Table *h)
#ifdef ENABLE_CHECKING
if (HASH_TABLE_P (Vpurify_flag) && XHASH_TABLE (Vpurify_flag) == h)
- message ("Growing hash table to: %"pD"d", new_size);
+ message ("Growing hash table to: %"pD"d", next_size);
#endif
}
}
--
2.17.1
^ permalink raw reply related [flat|nested] 4+ messages in thread