unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#25743: rehash-size ignored
@ 2017-02-15 19:18 Stefan Monnier
  2019-07-26 13:55 ` Lars Ingebrigtsen
  0 siblings, 1 reply; 3+ messages in thread
From: Stefan Monnier @ 2017-02-15 19:18 UTC (permalink / raw)
  To: 25743

Package: Emacs
Version: 26.0.50


Looking accidentally at maybe_resize_hash_table, I noticed that we don't
obey resize_hash, although we do all the work needed for that.
Worse, we make dangerous assumptions about the behavior of
larger_vector:

maybe_resize_hash_table takes `old_size' from `ASIZE (h->next)' and then
uses rehash_size to compute the desired new_size.  The problem comes
here:

      set_hash_key_and_value (h, larger_vector (h->key_and_value,
						2 * (new_size - old_size), -1));
      set_hash_next (h, larger_vector (h->next, new_size - old_size, -1));

This says, that h->next and h->key_and_value are replaced by new vectors
that are larger than the previous one so that they are large enough to
accomodate new_size.

The problem is that larger_vector's promise is to return a vector at
least as large as requested, but sometimes larger.  In practice the new
vector is twice the size of the old vector.  This means that rehash_size
is ignored when it means to something smaller than twice the size.
It *also* means that h->next may end up larger than planned.  But `ASIZE
(h->next)' is the definition of the hash-table's size, so if it's larger
than planned, then it's indispensible to make sure that all other
related tables are equally larger: h->key_and_value must always be at
least twice the size of h->next.  But the call above only makes sure the
new h->key_and_value is at least twice the desired new_size but not
twice the size of the new h->next.

I think the right fix is to change the calls to larger_vector so that
instead of requesting vectors "at least as large as new_size", we should
request vectors "of size exactly new_size".  That will fix both problems
at the same time.  The patch below implements it.

Any objection?


        Stefan


diff --git a/src/fns.c b/src/fns.c
index 760a017dedf..62c834fac60 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -3536,19 +3535,19 @@ get_key_arg (Lisp_Object key, ptrdiff_t nargs, Lisp_Object *args, char *used)
 Lisp_Object
 larger_vector (Lisp_Object vec, ptrdiff_t incr_min, ptrdiff_t nitems_max)
 {
+  eassert (VECTORP (vec));
   struct Lisp_Vector *v;
-  ptrdiff_t incr, incr_max, old_size, new_size;
+  ptrdiff_t old_size = ASIZE (vec);
   ptrdiff_t C_language_max = min (PTRDIFF_MAX, SIZE_MAX) / sizeof *v->contents;
-  ptrdiff_t n_max = (0 <= nitems_max && nitems_max < C_language_max
+  ptrdiff_t n_max = (nitems_max == 0 ? old_size + incr_min
+                     : 0 < nitems_max && nitems_max < C_language_max
 		     ? nitems_max : C_language_max);
-  eassert (VECTORP (vec));
   eassert (0 < incr_min && -1 <= nitems_max);
-  old_size = ASIZE (vec);
-  incr_max = n_max - old_size;
-  incr = max (incr_min, min (old_size >> 1, incr_max));
+  ptrdiff_t incr_max = n_max - old_size;
+  ptrdiff_t incr = max (incr_min, min (old_size >> 1, incr_max));
   if (incr_max < incr)
     memory_full (SIZE_MAX);
-  new_size = old_size + incr;
+  ptrdiff_t new_size = old_size + incr;
   v = allocate_vector (new_size);
   memcpy (v->contents, XVECTOR (vec)->contents, old_size * sizeof *v->contents);
   memclear (v->contents + old_size, incr * word_size);
@@ -3828,10 +3827,10 @@ maybe_resize_hash_table (struct Lisp_Hash_Table *h)
 	message ("Growing hash table to: %"pI"d", new_size);
 #endif
 
-      set_hash_key_and_value (h, larger_vector (h->key_and_value,
-						2 * (new_size - old_size), -1));
-      set_hash_next (h, larger_vector (h->next, new_size - old_size, -1));
-      set_hash_hash (h, larger_vector (h->hash, new_size - old_size, -1));
+      set_hash_key_and_value (h, larger_vector (h->key_and_value,
+						2 * (new_size - old_size), 0));
+      set_hash_next (h, larger_vector (h->next, new_size - old_size, 0));
+      set_hash_hash (h, larger_vector (h->hash, new_size - old_size, 0));
       set_hash_index (h, Fmake_vector (make_number (index_size), Qnil));
 
       /* Update the free list.  Do it so that new entries are added at



In GNU Emacs 26.0.50.1 (x86_64-unknown-linux-gnu, X toolkit, Xaw3d scroll bars)
 of 2017-01-30 built on lechazo
Repository revision: 5ddca31bf707686aafaf891c216702b64657c516
Windowing system distributor 'The X.Org Foundation', version 11.0.11900000
System Description:	Debian GNU/Linux 9.0 (stretch)

Recent messages:
Wrote /home/monnier/src/emacs/work/src/fns.c
Saving file /home/monnier/src/emacs/work/src/fns.c...
Wrote /home/monnier/src/emacs/work/src/fns.c
previous-line: Beginning of buffer [2 times]
Hunk already applied
Mark set
Saving file /home/monnier/src/emacs/work/src/fns.c...
Wrote /home/monnier/src/emacs/work/src/fns.c
Saving file /home/monnier/src/emacs/work/src/fns.c...
Wrote /home/monnier/src/emacs/work/src/fns.c

Configured using:
 'configure -C --enable-checking --enable-check-lisp-object-type
 'CFLAGS=-Wall -g3 -Og -Wno-pointer-sign'
 PKG_CONFIG_PATH=/home/monnier/lib/pkgconfig'

Configured features:
XAW3D XPM JPEG TIFF GIF PNG SOUND GPM DBUS GSETTINGS NOTIFY GNUTLS
LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB TOOLKIT_SCROLL_BARS LUCID X11

Important settings:
  value of $LANG: fr_CH.UTF-8
  locale-coding-system: utf-8-unix

Major mode: InactiveMinibuffer

Minor modes in effect:
  csv-field-index-mode: t
  diff-auto-refine-mode: t
  shell-dirtrack-mode: t
  electric-pair-mode: t
  global-reveal-mode: t
  reveal-mode: t
  auto-insert-mode: t
  savehist-mode: t
  minibuffer-electric-default-mode: t
  global-compact-docstrings-mode: t
  url-handler-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  global-prettify-symbols-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
/home/monnier/src/emacs/elpa/packages/svg/svg hides /home/monnier/src/emacs/work/lisp/svg
/home/monnier/src/emacs/elpa/packages/ada-mode/ada-xref hides /home/monnier/src/emacs/work/lisp/progmodes/ada-xref
/home/monnier/src/emacs/elpa/packages/ada-mode/ada-stmt hides /home/monnier/src/emacs/work/lisp/progmodes/ada-stmt
/home/monnier/src/emacs/elpa/packages/ada-mode/ada-prj hides /home/monnier/src/emacs/work/lisp/progmodes/ada-prj
/home/monnier/src/emacs/elpa/packages/ada-mode/ada-mode hides /home/monnier/src/emacs/work/lisp/progmodes/ada-mode
/home/monnier/src/emacs/elpa/packages/landmark/landmark hides /home/monnier/src/emacs/work/lisp/obsolete/landmark
/home/monnier/src/emacs/elpa/packages/crisp/crisp hides /home/monnier/src/emacs/work/lisp/obsolete/crisp

Features:
(shadow mail-extr emacsbug message puny rfc822 mml mml-sec epa derived
epg mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev
gmm-utils mailheader sendmail grep csv-mode smerge-mode whitespace gud
debug completion tar-mode dabbrev vc vc-dispatcher caml tuareg_indent
tuareg caml-help caml-types caml-emacs map sm-c-mode smie ox-latex
ox-icalendar ox-html ox-ascii ox-publish ox org-protocol org-mouse
org-mobile org-agenda org-indent org-feed org-crypt org-capture
org-attach org-id org-element org-rmail org-mhe org-irc org-info
org-gnus gnus-util rmail rmail-loaddefs rfc2047 rfc2045 ietf-drums
mm-util mail-prsvr mail-utils org-docview org-bibtex org-bbdb org-w3m
org org-macro org-footnote org-pcomplete org-list org-faces org-entities
org-version ob-emacs-lisp ob ob-tangle ob-ref ob-lob ob-table ob-exp
org-src ob-keys ob-comint ob-core ob-eval org-compat org-macs
org-loaddefs bibtex-style bibtex rect skeleton eieio-opt speedbar
sb-image ezimage dframe find-func help-fns radix-tree html5-schema
rng-xsd xsd-regexp rng-cmpct rng-nxml rng-valid rng-loc rng-uri
rng-parse nxml-parse rng-match rng-dt rng-util rng-pttrn nxml-ns
nxml-mode nxml-outln nxml-rap sgml-mode dom nxml-util nxml-enc xmltok
sort mpc view cal-china lunar solar cal-dst cal-bahai cal-islam
cal-hebrew holidays hol-loaddefs warnings cal-french diary-lib
diary-loaddefs cal-move cal-menu calendar cal-loaddefs misearch
multi-isearch cus-edit cus-start cus-load wid-edit autorevert filenotify
doc-view subr-x jka-compr image-mode dired dired-loaddefs format-spec
executable copyright vc-git diff-mode filecache reftex-dcr reftex
reftex-loaddefs reftex-vars tex-mode compile shell pcomplete comint
ansi-color ring latexenc server time-date noutline outline easy-mmode
flyspell ispell checkdoc thingatpt load-dir elec-pair reveal autoinsert
proof-site proof-autoloads cl pg-vars savehist minibuf-eldef disp-table
compact-docstrings kotl-loaddefs advice info realgud-recursive-autoloads
finder-inf url-auth package epg-config url-handlers url-parse
auth-source eieio eieio-core cl-macs eieio-loaddefs password-cache
url-vars seq byte-opt gv bytecomp byte-compile cl-extra help-mode
easymenu cconv cl-loaddefs pcase cl-lib bbdb-loaddefs mule-util tooltip
eldoc electric uniquify ediff-hook vc-hooks lisp-float-type mwheel
term/x-win x-win term/common-win x-dnd tool-bar dnd fontset image
regexp-opt fringe tabulated-list replace newcomment text-mode elisp-mode
lisp-mode prog-mode register page menu-bar rfn-eshadow isearch timer
select scroll-bar mouse jit-lock font-lock syntax font-core
term/tty-colors frame cl-generic cham georgian utf-8-lang misc-lang
vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms cp51932
hebrew greek romanian slovak czech european ethiopic indian cyrillic
chinese composite charscript case-table epa-hook jka-cmpr-hook help
simple abbrev obarray minibuffer cl-preloaded nadvice loaddefs button
faces cus-face macroexp files text-properties overlay sha1 md5 base64
format env code-pages mule custom widget hashtable-print-readable
backquote dbusbind inotify dynamic-setting system-font-setting
font-render-setting x-toolkit x multi-tty make-network-process emacs)

Memory information:
((conses 8 523925 99482)
 (symbols 24 37808 0) (miscs 20 5334 863) (strings 16 112634 17461)
 (string-bytes 1 4377921)
 (vectors 8 68327) (vector-slots 4 2220644 87010) (floats 8 1387 1336)
 (intervals 28 30482 0)
 (buffers 520 88))





^ permalink raw reply related	[flat|nested] 3+ messages in thread

* bug#25743: rehash-size ignored
  2017-02-15 19:18 bug#25743: rehash-size ignored Stefan Monnier
@ 2019-07-26 13:55 ` Lars Ingebrigtsen
  2019-07-26 17:16   ` Stefan Monnier
  0 siblings, 1 reply; 3+ messages in thread
From: Lars Ingebrigtsen @ 2019-07-26 13:55 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 25743

Stefan Monnier <monnier@IRO.UMontreal.CA> writes:

> Looking accidentally at maybe_resize_hash_table, I noticed that we don't
> obey resize_hash, although we do all the work needed for that.
> Worse, we make dangerous assumptions about the behavior of
> larger_vector:
>
> maybe_resize_hash_table takes `old_size' from `ASIZE (h->next)' and then
> uses rehash_size to compute the desired new_size.  The problem comes
> here:
>
>       set_hash_key_and_value (h, larger_vector (h->key_and_value,
> 						2 * (new_size - old_size), -1));
>       set_hash_next (h, larger_vector (h->next, new_size - old_size, -1));
>
> This says, that h->next and h->key_and_value are replaced by new vectors
> that are larger than the previous one so that they are large enough to
> accomodate new_size.

I did not follow the recent thread about hash table resizing closely,
but I do seem to remember somebody saying that they'd fixed something in
the hash resizing code, and the commits in fns.c seem to back that up:

commit 49e80e765b693736a8bb97ae5bfa341d25bf4f02
Author: Paul Eggert <eggert@cs.ucla.edu>
Date:   Sat Jul 20 23:21:14 2019 -0700

    Tweak recent hash-table fix
    
    * src/fns.c (maybe_resize_hash_table): Completely initialize the
    new ‘next’ vector before allocating more vectors, as this
    preserves locality a bit better and it’s safer not to leave an
    uninitialized Lisp object around.  Use next_size instead of
    new_size to compute new index size.

So is the issue discussed in this bug report fixed now?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





^ permalink raw reply	[flat|nested] 3+ messages in thread

* bug#25743: rehash-size ignored
  2019-07-26 13:55 ` Lars Ingebrigtsen
@ 2019-07-26 17:16   ` Stefan Monnier
  0 siblings, 0 replies; 3+ messages in thread
From: Stefan Monnier @ 2019-07-26 17:16 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: 25743-done

>     * src/fns.c (maybe_resize_hash_table): Completely initialize the
>     new ‘next’ vector before allocating more vectors, as this
>     preserves locality a bit better and it’s safer not to leave an
>     uninitialized Lisp object around.  Use next_size instead of
>     new_size to compute new index size.
>
> So is the issue discussed in this bug report fixed now?

This patch fixes the most severe problem (which was that the table was
larger than requested and we didn't take it into account in the rest of
the code).

But it doesn't fix the problem in the title: by default rehash-size is
1.5 yet in practice the code just doubles the size (because
larger_vecalloc only allocates something smaller than twice the
original size when we're reaching the maximum possible size).

So I installed the patch below to finish it.
Thanks,


        Stefan


diff --git a/src/fns.c b/src/fns.c
index f4f3b95ac6..c45f455646 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -4190,7 +4190,7 @@ maybe_resize_hash_table (struct Lisp_Hash_Table *h)
 	 avoid problems if memory is exhausted.  larger_vecalloc
 	 finishes computing the size of the replacement vectors.  */
       Lisp_Object next = larger_vecalloc (h->next, new_size - old_size,
-					  PTRDIFF_MAX / 2);
+					  new_size);
       ptrdiff_t next_size = ASIZE (next);
       for (ptrdiff_t i = old_size; i < next_size - 1; i++)
 	gc_aset (next, i, make_fixnum (i + 1));






^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2019-07-26 17:16 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-02-15 19:18 bug#25743: rehash-size ignored Stefan Monnier
2019-07-26 13:55 ` Lars Ingebrigtsen
2019-07-26 17:16   ` Stefan Monnier

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).