unofficial mirror of emacs-devel@gnu.org 
* reproducible .pdmp generation
@ 2024-10-01 15:08 Bernhard M. Wiedemann
  2024-10-01 16:04 ` Eli Zaretskii
  0 siblings, 1 reply; 4+ messages in thread
From: Bernhard M. Wiedemann @ 2024-10-01 15:08 UTC (permalink / raw)
  To: emacs-devel



Hi,

I tried to get emacs builds to produce bit-reproducible .pdmp files.

I reduced it to this reproducer:

> cd ~/rpmbuild/BUILD/emacs-29.4/src && setarch -R taskset 1 valgrind --leak-check=full --track-origins=yes ./temacs "--__aslr-disabled" "-batch" "--no-build-details" "-l" "loadup" "--temacs=pdump" "--bin-dest" "/usr/bin/" "--eln-dest" "/usr/lib64/emacs/29.4/" && md5sum emacs.pdmp
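Checking a reproducer like this by hand gets tedious, so the run-and-hash step can be scripted. The following is a minimal sketch, not part of the Emacs build: `hash_file`, `build_digests` and `is_reproducible` are hypothetical helper names, and the command and output path are whatever the caller passes in. It uses SHA-256 by default; pass `algo="md5"` to get values comparable with `md5sum`.

```python
import hashlib
import subprocess
from pathlib import Path


def hash_file(path, algo="sha256"):
    """Return the hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_digests(cmd, output, runs=2, cwd=None):
    """Run `cmd` `runs` times and hash `output` after each run.

    A copy of the output is kept as <output>.run0, <output>.run1, ...
    so that differing builds can be compared afterwards.
    """
    digests = []
    for i in range(runs):
        subprocess.run(cmd, cwd=cwd, check=True)
        out = Path(cwd or ".") / output
        saved = out.with_name(f"{out.name}.run{i}")
        saved.write_bytes(out.read_bytes())
        digests.append(hash_file(saved))
    return digests


def is_reproducible(digests):
    """True when every run produced the same digest."""
    return len(set(digests)) == 1
```

For example, `cmd` could be the `setarch -R taskset 1 ./temacs` invocation above as an argument list, with `output` set to `emacs.pdmp`; when `is_reproducible` returns `False`, the per-run copies remain on disk for comparison.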


Here, valgrind reports several uses of uninitialized memory,
which could explain part of the variation.
Many of them come from lmalloc and lrealloc in alloc.c.

But there might be other issues involved.
One pair of .pdmp files had this noteworthy hunk in the diff:

>  009ca3a0  73 20 74 6f 20 64 6f 63  20 73 74 72 69 6e 67 73  |s to doc strings|
>  009ca3b0  2e 2e 2e 64 6f 6e 65 0a  50 75 72 65 2d 68 61 73  |...done.Pure-has|
>  009ca3c0  68 65 64 3a 20 31 35 39  35 37 20 73 74 72 69 6e  |hed: 15957 strin|
> -009ca3d0  67 73 2c 20 31 39 35 33  20 76 65 63 74 6f 72 73  |gs, 1953 vectors|
> +009ca3d0  67 73 2c 20 31 39 34 37  20 76 65 63 74 6f 72 73  |gs, 1947 vectors|
>  009ca3e0  2c 20 34 39 35 33 31 20  63 6f 6e 73 65 73 2c 20  |, 49531 conses, |
>  009ca3f0  31 32 33 33 20 62 79 74  65 63 6f 64 65 73 2c 20  |1233 bytecodes, |
>  009ca400  33 37 31 20 6f 74 68 65  72 73 0a 44 75 6d 70 69  |371 others.Dumpi|

Is there a tool to convert .pdmp files into some readable text form, so 
that I can better understand what differs there?

What other approaches could help to track down the source of 
non-determinism here?

Are these files actually needed to use emacs?

Ciao
Bernhard M.



* Re: reproducible .pdmp generation
  2024-10-01 15:08 reproducible .pdmp generation Bernhard M. Wiedemann
@ 2024-10-01 16:04 ` Eli Zaretskii
  2024-10-01 16:32   ` Daniel Colascione
  0 siblings, 1 reply; 4+ messages in thread
From: Eli Zaretskii @ 2024-10-01 16:04 UTC (permalink / raw)
  To: Bernhard M. Wiedemann, Daniel Colascione; +Cc: emacs-devel

> Date: Tue, 1 Oct 2024 17:08:30 +0200
> From: "Bernhard M. Wiedemann" <bernhardout@lsmod.de>
> 
> But there might be other issues involved.
> One pair of .pdmp files had this noteworthy hunk in the diff:
> 
> >  009ca3a0  73 20 74 6f 20 64 6f 63  20 73 74 72 69 6e 67 73  |s to doc strings|
> >  009ca3b0  2e 2e 2e 64 6f 6e 65 0a  50 75 72 65 2d 68 61 73  |...done.Pure-has|
> >  009ca3c0  68 65 64 3a 20 31 35 39  35 37 20 73 74 72 69 6e  |hed: 15957 strin|
> > -009ca3d0  67 73 2c 20 31 39 35 33  20 76 65 63 74 6f 72 73  |gs, 1953 vectors|
> > +009ca3d0  67 73 2c 20 31 39 34 37  20 76 65 63 74 6f 72 73  |gs, 1947 vectors|
> >  009ca3e0  2c 20 34 39 35 33 31 20  63 6f 6e 73 65 73 2c 20  |, 49531 conses, |
> >  009ca3f0  31 32 33 33 20 62 79 74  65 63 6f 64 65 73 2c 20  |1233 bytecodes, |
> >  009ca400  33 37 31 20 6f 74 68 65  72 73 0a 44 75 6d 70 69  |371 others.Dumpi|
> 
> Is there a tool to convert .pdmp files into some readable text form, so 
> that I can better understand what differs there?

Not that I know of.  Maybe Daniel (CC'ed) does.

> What other approaches could help to track down the source of 
> non-determinism here?

Ditto.

> Are these files actually needed to use emacs?

You mean, the .pdmp files?  Without them, you can only start temacs
and load all the preloaded Lisp files from scratch, before you can
have a fully-functional Emacs.

If you meant other files, please name them.




* Re: reproducible .pdmp generation
  2024-10-01 16:04 ` Eli Zaretskii
@ 2024-10-01 16:32   ` Daniel Colascione
  2024-10-08  6:46     ` Bernhard M. Wiedemann
  0 siblings, 1 reply; 4+ messages in thread
From: Daniel Colascione @ 2024-10-01 16:32 UTC (permalink / raw)
  To: Eli Zaretskii, Bernhard M. Wiedemann; +Cc: emacs-devel



pdmp files are basically packed Emacs heap images with embedded relocations. If you want to make them deterministic, make the temacs run that produces them deterministic. For example, you can try turning off ASLR and making sure that if anything is using a PRNG, it's seeded. Above, it seems like the string table is differing somewhat between two runs. I'd start with tracking down the origins of the strings.
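To track down where a differing string comes from, it may help to recover the full text around a differing offset and then search for that text in the Lisp and C sources. The following is a minimal sketch under that assumption; `printable_run` is a hypothetical helper that treats the dump as opaque bytes and assumes nothing about the .pdmp layout.

```python
def printable_run(data, offset):
    """Return (start, text) for the printable-ASCII run containing `offset`.

    Tab and newline count as printable.  Returns None when `offset` is out
    of range or the byte there is not printable.
    """
    def ok(b):
        return 0x20 <= b < 0x7F or b in (0x09, 0x0A)

    if not (0 <= offset < len(data)) or not ok(data[offset]):
        return None
    start = offset
    while start > 0 and ok(data[start - 1]):
        start -= 1
    end = offset + 1
    while end < len(data) and ok(data[end]):
        end += 1
    return start, data[start:end].decode("ascii")
```

In the hunk quoted earlier in the thread, the two differing bytes sit at offsets 0x009ca3d6 and 0x009ca3d7, so calling `printable_run` on the dump contents with either offset would return the whole surrounding message text rather than a 16-byte window.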







* Re: reproducible .pdmp generation
  2024-10-01 16:32   ` Daniel Colascione
@ 2024-10-08  6:46     ` Bernhard M. Wiedemann
  0 siblings, 0 replies; 4+ messages in thread
From: Bernhard M. Wiedemann @ 2024-10-08  6:46 UTC (permalink / raw)
  To: Daniel Colascione, Eli Zaretskii; +Cc: emacs-devel





On 01/10/2024 18.32, Daniel Colascione wrote:
> pdmp files are basically packed Emacs heap images with embedded relocations. If you want to make them deterministic, make the temacs run that produces them deterministic. For example, you can try turning off ASLR and making sure that if anything is using a PRNG, it's seeded. Above, it seems like the string table is differing somewhat between two runs. I'd start with tracking down the origins of the strings.

ASLR and parallelism were already disabled in my example reproducer via 
setarch -R and taskset 1.
I tried to make it deterministic with code patches, but even in the best 
(random) case, there were still 15 random bytes left in 4 locations.

See the attached patch for what I tried.

Something like objdump -D would be nice to have for .pdmp files to 
better understand where it differs.

I uploaded two such .pdmp files from emacs-29.4 to
http://rb.zq1.de/other/emacs/
in case someone wants to have a closer look.
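Short of an objdump-like tool, a first step is to reduce two dumps to the list of byte ranges where they differ, so that a handful of random bytes shows up as a handful of runs. The following is a minimal sketch; `diff_runs` and `describe` are hypothetical helpers that compare the files as opaque bytes and do not interpret the .pdmp format.

```python
def diff_runs(a, b, merge_gap=0):
    """Return half-open (start, end) ranges where bytes `a` and `b` differ.

    Runs separated by at most `merge_gap` equal bytes are merged.  If the
    lengths differ, the extra tail is reported as one final run.
    """
    runs = []
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] != b[i]:
            if runs and i - runs[-1][1] <= merge_gap:
                runs[-1][1] = i + 1  # extend the current run
            else:
                runs.append([i, i + 1])
    if len(a) != len(b):
        runs.append([n, max(len(a), len(b))])
    return [tuple(r) for r in runs]


def describe(a, b, runs):
    """Format each run as 'offset+length: old-hex -> new-hex'."""
    return [
        f"{start:#010x}+{end - start}: {a[start:end].hex()} -> {b[start:end].hex()}"
        for start, end in runs
    ]
```

Both inputs can be loaded with `pathlib.Path(...).read_bytes()`. The byte-by-byte loop is slow but adequate for files of a few megabytes; the resulting offsets can then be compared across several builds to see whether the same locations vary each time.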

Ciao
Bernhard M.

[-- Attachment #1.1.2: test.patch --]
[-- Type: text/x-patch, Size: 2545 bytes --]

diff --git a/src/alloc.c b/src/alloc.c
index b02f13e..4dcdb0d 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -1345,6 +1345,7 @@ laligned (void *p, size_t size)
 static void *
 lmalloc (size_t size, bool clearit)
 {
+  clearit = true;
 #ifdef USE_ALIGNED_ALLOC
   if (! MALLOC_IS_LISP_ALIGNED && size % LISP_ALIGNMENT == 0)
     {
@@ -1375,11 +1376,15 @@ lmalloc (size_t size, bool clearit)
 static void *
 lrealloc (void *p, size_t size)
 {
+  size_t oldsize = malloc_usable_size (p);
   while (true)
     {
       p = realloc (p, size);
-      if (laligned (p, size) && (size || p))
+      if (laligned (p, size) && (size || p)) {
+        if (p && size > oldsize)
+           memset ((char *) p + oldsize, 0, size - oldsize);
 	return p;
+      }
       size_t bigger = size + LISP_ALIGNMENT;
       if (size < bigger)
 	size = bigger;
@@ -7452,7 +7457,7 @@ respective remote host.  */)
   struct sysinfo si;
   uintmax_t units;
 
-  if (sysinfo (&si))
+  if (1)
     return Qnil;
 #ifdef LINUX_SYSINFO_UNIT
   units = si.mem_unit;
diff --git a/src/fns.c b/src/fns.c
index 0a90295..34f3d2d 100644
--- a/fns.c
+++ b/src/fns.c
@@ -86,11 +86,13 @@ With a string argument, set the seed based on the string's contents.
 See Info node `(elisp)Random Numbers' for more details.  */)
   (Lisp_Object limit)
 {
-  if (EQ (limit, Qt))
+  if (EQ (limit, Qt)) {
     init_random ();
-  else if (STRINGP (limit))
+    seed_random("foo", 3);
+  } else if (STRINGP (limit)) {
     seed_random (SSDATA (limit), SBYTES (limit));
-  else if (FIXNUMP (limit))
+    seed_random("foo", 3);
+  } else if (FIXNUMP (limit))
     {
       EMACS_INT lim = XFIXNUM (limit);
       if (lim <= 0)
@@ -5720,7 +5722,7 @@ extract_data_from_object (Lisp_Object spec,
 	  char *lim = SSDATA (object) + start_hold;
 	  for (char *p = SSDATA (object); p < lim; p++)
 	    {
-	      ssize_t gotten = getrandom (p, lim - p, 0);
+	      ssize_t gotten = lim - p; memset (p, 0, gotten); /* was: getrandom (p, lim - p, 0) */
 	      if (0 <= gotten)
 		p += gotten;
 	      else if (errno != EINTR)
diff --git a/src/sysdep.c b/src/sysdep.c
index ef2dc12..d77de0e 100644
--- a/sysdep.c
+++ b/src/sysdep.c
@@ -2179,7 +2179,7 @@ init_random (void)
   success = w32_init_random (&v, sizeof v) == 0;
 #else
   verify (sizeof v <= 256);
-  success = getrandom (&v, sizeof v, 0) == sizeof v;
+  success = 1; //getrandom (&v, sizeof v, 0) == sizeof v;
 #endif
 
   /* If that didn't work, just use the current time value and PID.
@@ -2190,6 +2190,7 @@ init_random (void)
       v = getpid () ^ t.tv_sec ^ t.tv_nsec;
     }
 
+  v=42;
   set_random_seed (v);
 }
 


