* reproducible .pdmp generation
From: Bernhard M. Wiedemann @ 2024-10-01 15:08 UTC
To: emacs-devel
Hi,
I tried to get emacs builds to produce bit-reproducible .pdmp files.
I reduced it to this reproducer:
> cd ~/rpmbuild/BUILD/emacs-29.4/src && setarch -R taskset 1 valgrind --leak-check=full --track-origins=yes ./temacs "--__aslr-disabled" "-batch" "--no-build-details" "-l" "loadup" "--temacs=pdump" "--bin-dest" "/usr/bin/" "--eln-dest" "/usr/lib64/emacs/29.4/" && md5sum emacs.pdmp
Here, valgrind reports several uses of uninitialized memory,
which could explain part of the variation.
Many come from lmalloc and lrealloc in alloc.c,
but there might be other issues involved.
One pair of .pdmp files had this noteworthy hunk in the diff:
> 009ca3a0 73 20 74 6f 20 64 6f 63 20 73 74 72 69 6e 67 73 |s to doc strings|
> 009ca3b0 2e 2e 2e 64 6f 6e 65 0a 50 75 72 65 2d 68 61 73 |...done.Pure-has|
> 009ca3c0 68 65 64 3a 20 31 35 39 35 37 20 73 74 72 69 6e |hed: 15957 strin|
> -009ca3d0 67 73 2c 20 31 39 35 33 20 76 65 63 74 6f 72 73 |gs, 1953 vectors|
> +009ca3d0 67 73 2c 20 31 39 34 37 20 76 65 63 74 6f 72 73 |gs, 1947 vectors|
> 009ca3e0 2c 20 34 39 35 33 31 20 63 6f 6e 73 65 73 2c 20 |, 49531 conses, |
> 009ca3f0 31 32 33 33 20 62 79 74 65 63 6f 64 65 73 2c 20 |1233 bytecodes, |
> 009ca400 33 37 31 20 6f 74 68 65 72 73 0a 44 75 6d 70 69 |371 others.Dumpi|
Is there a tool to convert .pdmp files into some readable text form, so
that I can better understand what differs there?
What other approaches could help to track down the source of
non-determinism here?
Are these files actually needed to use emacs?
Ciao
Bernhard M.
* Re: reproducible .pdmp generation
From: Eli Zaretskii @ 2024-10-01 16:04 UTC
To: Bernhard M. Wiedemann, Daniel Colascione; +Cc: emacs-devel
> Date: Tue, 1 Oct 2024 17:08:30 +0200
> From: "Bernhard M. Wiedemann" <bernhardout@lsmod.de>
>
> Is there a tool to convert .pdmp files into some readable text form, so
> that I can better understand what differs there?
Not that I know of. Maybe Daniel (CC'ed) does.
> What other approaches could help to track down the source of
> non-determinism here?
Ditto.
> Are these files actually needed to use emacs?
You mean, the .pdmp files? Without them, you can only start temacs
and load all the preloaded Lisp files from scratch, before you can
have a fully-functional Emacs.
If you meant other files, please name them.
* Re: reproducible .pdmp generation
From: Daniel Colascione @ 2024-10-01 16:32 UTC
To: Eli Zaretskii, Bernhard M. Wiedemann; +Cc: emacs-devel
On October 1, 2024 9:04:26 AM PDT, Eli Zaretskii <eliz@gnu.org> wrote:
>> Is there a tool to convert .pdmp files into some readable text form, so
>> that I can better understand what differs there?
>
>Not that I know of. Maybe Daniel (CC'ed) does.
>
>> What other approaches could help to track down the source of
>> non-determinism here?
>
>Ditto.
>
>> Are these files actually needed to use emacs?
>
>You mean, the .pdmp files? Without them, you can only start temacs
>and load all the preloaded Lisp files from scratch, before you can
>have a fully-functional Emacs.
>
>If you meant other files, please name them.
pdmp files are basically packed Emacs heap images with embedded relocations. If you want to make them deterministic, make the temacs run that produces them deterministic. For example, you can try turning off ASLR and making sure that if anything is using a PRNG, it's seeded. Above, it seems like the string table is differing somewhat between two runs. I'd start with tracking down the origins of the strings.
* Re: reproducible .pdmp generation
From: Bernhard M. Wiedemann @ 2024-10-08 6:46 UTC
To: Daniel Colascione, Eli Zaretskii; +Cc: emacs-devel
On 01/10/2024 18.32, Daniel Colascione wrote:
> pdmp files are basically packed Emacs heap images with embedded relocations. If you want to make them deterministic, make the temacs run that produces them deterministic. For example, you can try turning off ASLR and making sure that if anything is using a PRNG, it's seeded. Above, it seems like the string table is differing somewhat between two runs. I'd start with tracking down the origins of the strings.
ASLR and parallelism were already disabled in my reproducer via
setarch -R and taskset 1.
I tried to make it deterministic with code patches, but even in the best
(random) case, there were still 15 random bytes left in 4 locations.
See the attached patch for what I tried.
Something like objdump -D would be nice to have for .pdmp files to
better understand where it differs.
I uploaded two such .pdmp files from emacs-29.4 to
http://rb.zq1.de/other/emacs/
in case someone wants to have a closer look.
Ciao
Bernhard M.
[-- Attachment #1.1.2: test.patch --]
[-- Type: text/x-patch, Size: 2545 bytes --]
diff --git a/src/alloc.c b/src/alloc.c
index b02f13e..4dcdb0d 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -1345,6 +1345,7 @@ laligned (void *p, size_t size)
static void *
lmalloc (size_t size, bool clearit)
{
+ clearit = true;
#ifdef USE_ALIGNED_ALLOC
if (! MALLOC_IS_LISP_ALIGNED && size % LISP_ALIGNMENT == 0)
{
@@ -1375,11 +1376,15 @@ lmalloc (size_t size, bool clearit)
static void *
lrealloc (void *p, size_t size)
{
+ size_t oldsize = p ? malloc_usable_size (p) : 0;
while (true)
{
p = realloc (p, size);
- if (laligned (p, size) && (size || p))
+ if (laligned (p, size) && (size || p)) {
+ if (size > oldsize)
+ memset ((char *) p + oldsize, 0, size - oldsize);
return p;
+ }
size_t bigger = size + LISP_ALIGNMENT;
if (size < bigger)
size = bigger;
@@ -7452,7 +7457,7 @@ respective remote host. */)
struct sysinfo si;
uintmax_t units;
- if (sysinfo (&si))
+ if (1)
return Qnil;
#ifdef LINUX_SYSINFO_UNIT
units = si.mem_unit;
diff --git a/src/fns.c b/src/fns.c
index 0a90295..34f3d2d 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -86,11 +86,13 @@ With a string argument, set the seed based on the string's contents.
See Info node `(elisp)Random Numbers' for more details. */)
(Lisp_Object limit)
{
- if (EQ (limit, Qt))
+ if (EQ (limit, Qt)) {
init_random ();
- else if (STRINGP (limit))
+ seed_random("foo", 3);
+ } else if (STRINGP (limit)) {
seed_random (SSDATA (limit), SBYTES (limit));
- else if (FIXNUMP (limit))
+ seed_random("foo", 3);
+ } else if (FIXNUMP (limit))
{
EMACS_INT lim = XFIXNUM (limit);
if (lim <= 0)
@@ -5720,7 +5722,7 @@ extract_data_from_object (Lisp_Object spec,
char *lim = SSDATA (object) + start_hold;
for (char *p = SSDATA (object); p < lim; p++)
{
- ssize_t gotten = getrandom (p, lim - p, 0);
+ ssize_t gotten = lim - p; memset (p, 0, gotten); /* was: getrandom (p, lim - p, 0) */
if (0 <= gotten)
p += gotten;
else if (errno != EINTR)
diff --git a/src/sysdep.c b/src/sysdep.c
index ef2dc12..d77de0e 100644
--- a/src/sysdep.c
+++ b/src/sysdep.c
@@ -2179,7 +2179,7 @@ init_random (void)
success = w32_init_random (&v, sizeof v) == 0;
#else
verify (sizeof v <= 256);
- success = getrandom (&v, sizeof v, 0) == sizeof v;
+ success = 1; //getrandom (&v, sizeof v, 0) == sizeof v;
#endif
/* If that didn't work, just use the current time value and PID.
@@ -2190,6 +2190,7 @@ init_random (void)
v = getpid () ^ t.tv_sec ^ t.tv_nsec;
}
+ v=42;
set_random_seed (v);
}