unofficial mirror of bug-gnu-emacs@gnu.org 
From: Eli Zaretskii <eliz@gnu.org>
To: Paul Eggert <eggert@cs.ucla.edu>
Cc: 8794@debbugs.gnu.org
Subject: bug#8794: cons_to_long fixes; making 64-bit EMACS_INT the default
Date: Fri, 03 Jun 2011 22:43:55 +0300
Message-ID: <83hb86em4k.fsf@gnu.org>
In-Reply-To: <4DE91FB3.80601@cs.ucla.edu>

> Date: Fri, 03 Jun 2011 10:53:55 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: 8794@debbugs.gnu.org
> 
>   int
>   main (void)
>   {
>     int big = 536870913;
>     int *p = malloc (big * sizeof *p);
>     if (!p)
>       return 1;
>     memset (p, 0xef, big * sizeof *p);
>     printf ("%x %x\n", p[0], p[big - 1]);
>     return 0;
>   }
> 
> On my RHEL 5.6 host, built as a 32-bit executable, this outputs:
> 
>   $ gcc -m32 t.c
>   $ ./a.out
>   efefefef efefefef

How does this work on the machine code level?  Doesn't the code need
to load a pointer to p into a 32-bit register, in order to reference
the array?  On Windows, I see that the GCC-produced code does this:

  movl   $0x20000001,0xfffffffc(%ebp)
  ...
  mov    0xfffffffc(%ebp),%eax
  shl    $0x2,%eax

and then uses EAX to reference the array elements.

That last left shift by 2 bits will surely overflow for values of
`big' that are larger than 0x3fffffff (not 0x20000001, the value you
used).  So maybe 2GB is not the limit, but 4GB surely is.  You promise
much more.

> Perhaps you're thinking of pointer subtraction?  That often stops working on
> arrays larger than 2 GiB.  But this is easy to program around.

Well, then we need to program around that, _before_ we promise buffers
larger than 2GB on 32-bit hosts.  E.g., look how we address characters
in buffers:

  /* Address of beginning of buffer.  */
  #define BUF_BEG_ADDR(buf) ((buf)->text->beg)

  /* Return character code of multi-byte form at byte position POS in BUF.
     If POS doesn't point the head of valid multi-byte form, only the byte at
     POS is returned.  No range checking.  */

  #define BUF_FETCH_MULTIBYTE_CHAR(buf, pos)				\
    (_fetch_multibyte_char_p						\
       = (((pos) >= BUF_GPT_BYTE (buf) ? BUF_GAP_SIZE (buf) : 0)	\
	  + (pos) + BUF_BEG_ADDR (buf) - BEG_BYTE),			\
     STRING_CHAR (_fetch_multibyte_char_p))

The pointer arithmetic will wrap around on 32-bit hosts here, because
a pointer is loaded into a 32-bit register before it's dereferenced.
Am I missing something?

> And anyway, even if we assume buffers and strings are all smaller
> than 2 GiB, an EMACS_INT wider than 32 bits is still needed for
> large buffers and strings, due to the tag bits.

I wasn't saying a 64-bit EMACS_INT wasn't an advantage.  It is.  But I
very much doubt that we could have buffers and strings larger than 4GB
on 32-bit hosts.  Your changes to the docs seem to promise much larger
buffers, which I don't think is feasible.

> > The *_MAX macros need limits.h, but I don't see it being included by
> > data.c.  Did I miss something?
> 
> Those are OK because lisp.h includes inttypes.h.  INTMAX_MAX and
> UINTMAX_MAX are defined by inttypes.h (actually, stdint.h, but
> inttypes.h includes stdint.h).

What about ULONG_MAX in this patch to xselect.c:

> -      *data_ret = (unsigned char *) xmalloc (sizeof (long) + 1);
> -      (*data_ret) [sizeof (long)] = 0;
> -      (*(unsigned long **) data_ret) [0] = cons_to_long (obj);
> +      *data_ret = (unsigned char *) xmalloc (sizeof (unsigned long) + 1);
> +      (*data_ret) [sizeof (unsigned long)] = 0;
> +      (*(unsigned long **) data_ret) [0] = cons_to_unsigned (obj, ULONG_MAX);

?  There are also USHRT_MAX, LONG_MAX, CHAR_MAX, and SHRT_MAX there,
but I see no limits.h being included.  How did that compile for you?





