From: Eli Zaretskii <eliz@gnu.org>
To: Paul Eggert <eggert@cs.ucla.edu>
Cc: 8794@debbugs.gnu.org
Subject: bug#8794: cons_to_long fixes; making 64-bit EMACS_INT the default
Date: Fri, 03 Jun 2011 22:43:55 +0300
Message-ID: <83hb86em4k.fsf@gnu.org>
In-Reply-To: <4DE91FB3.80601@cs.ucla.edu>
> Date: Fri, 03 Jun 2011 10:53:55 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: 8794@debbugs.gnu.org
>
> int
> main (void)
> {
> int big = 536870913;
> int *p = malloc (big * sizeof *p);
> if (!p)
> return 1;
> memset (p, 0xef, big * sizeof *p);
> printf ("%x %x\n", p[0], p[big - 1]);
> return 0;
> }
>
> On my RHEL 5.6 host, built as a 32-bit executable, this outputs:
>
> $ gcc -m32 t.c
> $ ./a.out
> efefefef efefefef

How does this work at the machine-code level? Doesn't the code need
to load a pointer to p into a 32-bit register in order to reference
the array? On Windows, I see that the GCC-produced code does this:

    movl   $0x20000001,0xfffffffc(%ebp)
    ...
    mov    0xfffffffc(%ebp),%eax
    shl    $0x2,%eax

and then uses EAX to reference the array elements.

That last left shift by 2 bits will surely overflow for values of
`big' that are larger than 0x3fffffff (not 0x20000001, the value you
used). So maybe 2GB is not the limit, but 4GB surely is. You promise
much more.

> Perhaps you're thinking of pointer subtraction? That often stops working on
> arrays larger than 2 GiB. But this is easy to program around.

Well, then we need to program around that, _before_ we promise buffers
larger than 2GB on 32-bit hosts. E.g., look how we address characters
in buffers:

    /* Address of beginning of buffer. */
    #define BUF_BEG_ADDR(buf) ((buf)->text->beg)

    /* Return character code of multi-byte form at byte position POS in BUF.
       If POS doesn't point the head of valid multi-byte form, only the byte at
       POS is returned. No range checking. */
    #define BUF_FETCH_MULTIBYTE_CHAR(buf, pos) \
      (_fetch_multibyte_char_p \
       = (((pos) >= BUF_GPT_BYTE (buf) ? BUF_GAP_SIZE (buf) : 0) \
          + (pos) + BUF_BEG_ADDR (buf) - BEG_BYTE), \
       STRING_CHAR (_fetch_multibyte_char_p))

The pointer arithmetic will wrap around on 32-bit hosts here, because
a pointer is loaded into a 32-bit register before it's dereferenced.
Am I missing something?

> And anyway, even if we assume buffers and strings are all smaller
> than 2 GiB, an EMACS_INT wider than 32 bits is still needed for
> large buffers and strings, due to the tag bits.
I wasn't saying a 64-bit EMACS_INT wasn't an advantage. It is. But I
very much doubt that we could have buffers and strings larger than 4GB
on 32-bit hosts. Your changes to the docs seem to promise much larger
buffers, which I don't think is feasible.
> > The *_MAX macros need limits.h, but I don't see it being included by
> > data.c. Did I miss something?
>
> Those are OK because lisp.h includes inttypes.h. INTMAX_MAX and
> UINTMAX_MAX are defined by inttypes.h (actually, stdint.h, but
> inttypes.h includes stdint.h).
What about ULONG_MAX in this patch to xselect.c:
> - *data_ret = (unsigned char *) xmalloc (sizeof (long) + 1);
> - (*data_ret) [sizeof (long)] = 0;
> - (*(unsigned long **) data_ret) [0] = cons_to_long (obj);
> + *data_ret = (unsigned char *) xmalloc (sizeof (unsigned long) + 1);
> + (*data_ret) [sizeof (unsigned long)] = 0;
> + (*(unsigned long **) data_ret) [0] = cons_to_unsigned (obj, ULONG_MAX);
? There are also USHRT_MAX, LONG_MAX, CHAR_MAX, and SHRT_MAX there,
but I see no limits.h being included. How did that compile for you?