unofficial mirror of emacs-devel@gnu.org 
* Finding the dump (redux)
@ 2021-04-15 19:38 Ali Bahrami
  2021-04-17 18:45 ` Eli Zaretskii
  0 siblings, 1 reply; 26+ messages in thread
From: Ali Bahrami @ 2021-04-15 19:38 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 6177 bytes --]

Hi,

     I'm a bit late to the pdump party, as we had just updated
the emacs delivered with Solaris shortly before the version
with pdumper arrived, and we don't tend to update more than once
every year or two. It's overdue though, and I've been playing with
emacs 27.2 on Solaris this week. It's great, but I find myself wanting
to request a small tweak to the search for the pdmp file.

First though, I want to say that pdumper is awesome, and has
exceeded my hopes and expectations. It seems just as fast,
and it is so nice to be out of the unexec game. Who knows,
we might someday even be able to get rid of the special dldump()
function that was written for emacs decades ago. :-) In any event,
I'm running 27.2 on my desktop as a position independent executable
(PIE) and with ASLR enabled, and everything seems to be working as it
should. Thank you, Daniel, for persisting, and for this big step
forward!

The issue I'm hitting is similar to one discussed over 2
years ago:

      "Finding the dump"
      https://lists.gnu.org/archive/html/emacs-devel/2019-01/msg00558.html

At that time, Eli made a change to improve the way pdmp
files are found in PATH_EXEC, using the basename(argv[0])
to find matching dump files. It's useful, but not quite
enough to handle the way we deliver emacs on Solaris.

Like others, we deliver the 3 UI variants of emacs so that
users have their choice. Along with that is a "mediator" symlink,
provided by the packaging system, that provides the generic
'emacs' name, pointing at the default version. Mediators are
a Solaris feature, and there's a packaging command by which
the system owner can change them to point at their own favored
default. On my desktop, it looks like this:

      % cd /usr/bin
      % ls -alFh emacs*
      lrwxrwxrwx   1 root   root     9 Apr 14 22:15 emacs -> emacs-gtk*
      -r-xr-xr-x   2 root   bin  7.05M Apr 14 22:15 emacs-gtk*
      -r-xr-xr-x   2 root   bin  7.05M Apr 14 22:15 emacs-gtk-27.2*
      -r-xr-xr-x   2 root   bin  6.09M Apr 14 22:15 emacs-nox*
      -r-xr-xr-x   2 root   bin  6.09M Apr 14 22:15 emacs-nox-27.2*
      -r-xr-xr-x   2 root   bin  7.07M Apr 14 22:15 emacs-x*
      -r-xr-xr-x   2 root   bin  7.07M Apr 14 22:15 emacs-x-27.2*
      -r-xr-xr-x   1 root   bin    47K Apr 14 22:15 emacsclient*

And the pdmp files are similarly named, to match the
emacs executables:

      % cd /usr/lib/emacs/27.2/x86_64-pc-solaris2.11
      % ls -alhF *.pdmp
      -r--r--r--   2 root   bin  10.2M Apr 14 22:15 emacs-gtk-27.2.pdmp
      -r--r--r--   2 root   bin  10.2M Apr 14 22:15 emacs-gtk.pdmp
      -r--r--r--   2 root   bin  9.64M Apr 14 22:15 emacs-nox-27.2.pdmp
      -r--r--r--   2 root   bin  9.64M Apr 14 22:15 emacs-nox.pdmp
      -r--r--r--   2 root   bin  10.1M Apr 14 22:15 emacs-x-27.2.pdmp
      -r--r--r--   2 root   bin  10.1M Apr 14 22:15 emacs-x.pdmp

As noted, the default is emacs-gtk, but I can change that:

      # pkg set-mediator -I emacs-x emacs
      <...pkg output elided...>
      # ls -alFh /usr/bin/emacs
      lrwxrwxrwx   1 root  root   7 Apr 15 11:47 /usr/bin/emacs -> emacs-x*

Like a lot of GNU/Linux distributions, we used to use a
shell script for this, but we moved to the mediator years
ago. With the new emacs, this works well for the explicitly
named versions (emacs-gtk, emacs-nox, and emacs-x). But, when
run via the 'emacs' symlink (the usual case), the binary is
unable to find its pdmp file. It still runs, but ends up
loading all the separate elc files, which is undesirable.

I can solve this without changing emacs, superficially, by just
adding a corresponding mediator symlink for emacs.pdmp in the
PATH_EXEC directory. But then, if an end user were to make their
own symlink to one of these emacs variants, they'll still face
this problem:

      % ln -s /usr/bin/emacs-gtk ~/bin/myemacs

Sure, that user could specify the pdmp file on the command
line, or wrap it in a shell script, but for default behavior,
I don't think that should be necessary. Why shouldn't an emacs,
found by whatever path, be able to find its default pdmp file
in PATH_EXEC? I believe that the search done by emacs is just
missing a check that would solve this. The problem is that
load_pdump() in src/emacs.c only searches PATH_EXEC for
emacs.pdmp, and then for the pdmp file with the name given by:

      basename(argv[0])

If it were to also search for

      basename(realpath(argv[0]))

then it would find pdmp files that match either the given, or
"absolute" names, rather than just the given name, and symlinks
that point at emacs executables would always find the default
pdmp file that goes with them, whether we deliver them, or the
user makes their own.
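
As a rough illustration of the difference between the two names, here
is a minimal standalone sketch. It is not the Emacs code: the helper
names path_basename and real_basename are made up for this example,
it assumes a POSIX system where realpath() accepts a NULL buffer, and
it hardcodes '/' where Emacs would use IS_DIRECTORY_SEP.

```c
#define _XOPEN_SOURCE 700   /* for realpath() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the last component of PATH.
   No allocation; the result points into PATH itself.  */
static const char *
path_basename (const char *path)
{
  assert (path);
  const char *slash = strrchr (path, '/');
  return slash ? slash + 1 : path;
}

/* Hypothetical helper: resolve every symlink in PATH and copy the
   basename of the result into OUT (of size OUTSIZE).  Return 0 on
   success, -1 if PATH cannot be resolved or the name does not fit.  */
static int
real_basename (const char *path, char *out, size_t outsize)
{
  assert (path && out);
  char *resolved = realpath (path, NULL);   /* malloc'd by realpath */
  if (!resolved)
    return -1;
  const char *base = path_basename (resolved);
  int fits = strlen (base) < outsize;
  if (fits)
    strcpy (out, base);
  free (resolved);
  return fits ? 0 : -1;
}
```

With the Solaris layout shown earlier, path_basename ("/usr/bin/emacs")
gives "emacs", while real_basename on the same path would be expected
to give "emacs-gtk" (or whatever the mediator currently points at).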

I note that the discussion in 2019 spent a lot of time discussing
how realpath(argv[0]) can be unreliable, and that applications can
change argv[0]. That's true, but assuming that this is understood,
it is still useful most of the time. Note that the code in load_pdump()
already calls realpath(). It uses the result to locate the corresponding
pdmp file in the directory containing the executable. The only thing
missing is to take the next step, and apply the same information to
the search of PATH_EXEC.

It's not necessary to add more code to do this. It suffices to refactor
the existing code into a couple of functions that can be used to cover
the 2 existing cases, plus this new one. I've attached to this message a
patch against Emacs 27.2 that does that. Note, however, that I don't have a contributor
agreement in place, and it might be difficult for me to do that any time
soon. I'm hoping that this patch might still qualify, on the basis that
it is just a rearrangement of the existing code, and not at all novel.
Or, perhaps someone else will be willing to take the ideas and quickly
put together an official fix independently. Those ideas are:

      - Move the repeated code into functions, to clean up load_pdump().

      - Save the basename from the result of calling realpath() during the
        search of the executable directory.

      - During the PATH_EXEC stage, use the saved realpath basename to
        add a check for that name.

Thanks for your consideration.

- Ali


[-- Attachment #2: mediator.patch --]
[-- Type: text/plain, Size: 5111 bytes --]

# Enhance emacs to also look for the pdmp file in PATH_EXEC using the
# name seen in realpath(argv[0]). Emacs is typically run via the mediated
# /usr/bin/emacs symlink, which points at one of the emacs variants
# (emacs-gtk, emacs-x, emacs-nox). Adding the check for the name found
# via realpath() allows the actual emacs binary to find its pdump file.
#
--- emacs-27.2.orig/src/emacs.c	2021-01-28 10:52:38.000000000 +0000
+++ emacs-27.2/src/emacs.c	2021-04-14 13:12:46.732537010 +0000
@@ -766,6 +766,55 @@
 #endif	/* !WINDOWSNT */
 }
 
+/*
+ * basename() for pdump paths
+ */
+static char *
+load_pdump_basename(char *path)
+{
+  char *p, *last_sep = NULL;
+
+  for (p = path; *p; p++)
+    {
+      if (IS_DIRECTORY_SEP (*p))
+	 last_sep = p;
+    }
+
+    return last_sep ? last_sep + 1 : path;
+}
+
+/*
+ * Given a basename for the running emacs, attempt to open
+ * the corresponding pdump file in the directory given by path.
+ */
+static int
+load_pdump_from_path(const char *path, const char *name, const char *suffix,
+		     char **dump_file, ptrdiff_t *bufsize)
+{
+  ptrdiff_t needed = (strlen (path)
+		      + 1
+		      + strlen (name)
+		      + strlen (suffix)
+		      + 1);
+  if (*bufsize < needed)
+    {
+      xfree (*dump_file);
+      *dump_file = xmalloc (needed);
+      *bufsize = needed;
+    }
+#ifdef DOS_NT
+  ptrdiff_t name_len = strlen (name);
+  if (name_len >= 4
+      && c_strcasecmp (name + name_len - 4, ".exe") == 0)
+    sprintf (*dump_file, "%s%c%.*s%s", path, DIRECTORY_SEP,
+	     (int)(name_len - 4), name, suffix);
+  else
+#endif
+    sprintf (*dump_file, "%s%c%s%s",
+	     path, DIRECTORY_SEP, name, suffix);
+  return pdumper_load (*dump_file);
+}
+
 static void
 load_pdump (int argc, char **argv)
 {
@@ -817,6 +866,7 @@
      encoding the system natively uses for filesystem access, so
      there's no need for character set conversion.  */
   ptrdiff_t bufsize;
+  char *argv0_realbase = NULL;
   dump_file = load_pdump_find_executable (argv[0], &bufsize);
 
   /* If we couldn't find our executable, go straight to looking for
@@ -838,6 +888,9 @@
 #ifndef WINDOWSNT
       bufsize = exenamelen + 1;
 #endif
+      /* Save the realpath basename for possible use below */
+      argv0_realbase = xstrdup (load_pdump_basename (real_exename));
+
       if (strip_suffix)
         {
 	  ptrdiff_t strip_suffix_length = strlen (strip_suffix);
@@ -869,55 +922,23 @@
   /* Look for "emacs.pdmp" in PATH_EXEC.  We hardcode "emacs" in
      "emacs.pdmp" so that the Emacs binary still works if the user
      copies and renames it.  */
-  const char *argv0_base = "emacs";
-  ptrdiff_t needed = (strlen (path_exec)
-                      + 1
-                      + strlen (argv0_base)
-                      + strlen (suffix)
-                      + 1);
-  if (bufsize < needed)
+  result = load_pdump_from_path (path_exec, "emacs", suffix,
+				 &dump_file, &bufsize);
+
+  /* Finally, look for basename(argv0)+".pdmp" in PATH_EXEC, using both
+     absolute (realpath) and given forms. This way, they can access emacs
+     via a symlink of a different name, or rename both the executable and
+     its pdump file in PATH_EXEC, and have several Emacs configurations
+     in the same versioned libexec subdirectory.  */
+  if ((result == PDUMPER_LOAD_FILE_NOT_FOUND) && (argv0_realbase != NULL))
     {
-      xfree (dump_file);
-      dump_file = xpalloc (NULL, &bufsize, needed - bufsize, -1, 1);
+      result = load_pdump_from_path (path_exec, argv0_realbase, suffix,
+				     &dump_file, &bufsize);
     }
-  sprintf (dump_file, "%s%c%s%s",
-           path_exec, DIRECTORY_SEP, argv0_base, suffix);
-  result = pdumper_load (dump_file);
-
   if (result == PDUMPER_LOAD_FILE_NOT_FOUND)
     {
-      /* Finally, look for basename(argv0)+".pdmp" in PATH_EXEC.
-	 This way, they can rename both the executable and its pdump
-	 file in PATH_EXEC, and have several Emacs configurations in
-	 the same versioned libexec subdirectory.  */
-      char *p, *last_sep = NULL;
-      for (p = argv[0]; *p; p++)
-	{
-	  if (IS_DIRECTORY_SEP (*p))
-	    last_sep = p;
-	}
-      argv0_base = last_sep ? last_sep + 1 : argv[0];
-      ptrdiff_t needed = (strlen (path_exec)
-			  + 1
-			  + strlen (argv0_base)
-			  + strlen (suffix)
-			  + 1);
-      if (bufsize < needed)
-	{
-	  xfree (dump_file);
-	  dump_file = xmalloc (needed);
-	}
-#ifdef DOS_NT
-      ptrdiff_t argv0_len = strlen (argv0_base);
-      if (argv0_len >= 4
-	  && c_strcasecmp (argv0_base + argv0_len - 4, ".exe") == 0)
-	sprintf (dump_file, "%s%c%.*s%s", path_exec, DIRECTORY_SEP,
-		 (int)(argv0_len - 4), argv0_base, suffix);
-      else
-#endif
-      sprintf (dump_file, "%s%c%s%s",
-	       path_exec, DIRECTORY_SEP, argv0_base, suffix);
-      result = pdumper_load (dump_file);
+      result = load_pdump_from_path (path_exec, load_pdump_basename (argv[0]),
+				     suffix, &dump_file, &bufsize);
     }
 
   if (result != PDUMPER_LOAD_SUCCESS)
@@ -929,6 +950,8 @@
 
  out:
   xfree (dump_file);
+  if (argv0_realbase)
+	  xfree (argv0_realbase);
 }
 #endif /* HAVE_PDUMPER */
 


* Re: Finding the dump (redux)
  2021-04-15 19:38 Finding the dump (redux) Ali Bahrami
@ 2021-04-17 18:45 ` Eli Zaretskii
  2021-04-18  0:15   ` Ali Bahrami
  0 siblings, 1 reply; 26+ messages in thread
From: Eli Zaretskii @ 2021-04-17 18:45 UTC (permalink / raw)
  To: Ali Bahrami; +Cc: emacs-devel

> From: Ali Bahrami <ali_gnu2@emvision.com>
> Date: Thu, 15 Apr 2021 13:38:37 -0600
> 
>       % cd /usr/bin
>       % ls -alFh emacs*
>       lrwxrwxrwx   1 root   root     9 Apr 14 22:15 emacs -> emacs-gtk*
>       -r-xr-xr-x   2 root   bin  7.05M Apr 14 22:15 emacs-gtk*
>       -r-xr-xr-x   2 root   bin  7.05M Apr 14 22:15 emacs-gtk-27.2*
>       -r-xr-xr-x   2 root   bin  6.09M Apr 14 22:15 emacs-nox*
>       -r-xr-xr-x   2 root   bin  6.09M Apr 14 22:15 emacs-nox-27.2*
>       -r-xr-xr-x   2 root   bin  7.07M Apr 14 22:15 emacs-x*
>       -r-xr-xr-x   2 root   bin  7.07M Apr 14 22:15 emacs-x-27.2*
>       -r-xr-xr-x   1 root   bin    47K Apr 14 22:15 emacsclient*
> 
> And the pdmp files are similarly named, to match the
> emacs executables:
> 
>       % cd /usr/lib/emacs/27.2/x86_64-pc-solaris2.11
>       % ls -alhF *.pdmp
>       -r--r--r--   2 root   bin  10.2M Apr 14 22:15 emacs-gtk-27.2.pdmp
>       -r--r--r--   2 root   bin  10.2M Apr 14 22:15 emacs-gtk.pdmp
>       -r--r--r--   2 root   bin  9.64M Apr 14 22:15 emacs-nox-27.2.pdmp
>       -r--r--r--   2 root   bin  9.64M Apr 14 22:15 emacs-nox.pdmp
>       -r--r--r--   2 root   bin  10.1M Apr 14 22:15 emacs-x-27.2.pdmp
>       -r--r--r--   2 root   bin  10.1M Apr 14 22:15 emacs-x.pdmp
> 
> As noted, the default is emacs-gtk, but I can change that:
> 
>       # pkg set-mediator -I emacs-x emacs
>       <...pkg output elided...>
>       # ls -alFh /usr/bin/emacs
>       lrwxrwxrwx   1 root  root   7 Apr 15 11:47 /usr/bin/emacs -> emacs-x*
> 
> Like a lot of GNU/Linux distributions, we used to use a
> shell script for this, but we moved to the mediator years
> ago. With the new emacs, this works well for the explicitly
> named versions (emacs-gtk, emacs-nox, and emacs-x). But, when
> run via the 'emacs' symlink (the usual case), the binary is
> unable to find its pdmp file. It still runs, but ends up
> loading all the separate elc files, which is undesirable.
> 
> I can solve this without changing emacs, superficially, by just
> adding a corresponding mediator symlink for emacs.pdmp in the
> PATH_EXEC directory. But then, if an end user were to make their
> own symlink to one of these emacs variants, they'll still face
> this problem:
> 
>       % ln -s /usr/bin/emacs-gtk ~/bin/myemacs

Why can't you put the *.pdmp files in the same directory where you
keep the emacs-* executables (/usr/bin, AFAIU)?  That is already
supported by the current code, I think.

> Sure, that user could specify the pdmp file on the command
> line, or wrap it in a shell script, but for default behavior,
> I don't think that should be necessary. Why shouldn't an emacs,
> found by whatever path, not be able to find its default pdmp file
> in PATH_EXEC? I believe that the search done by emacs is just
> missing a check that would solve this. The problem is that
> load_pdump() in src/emacs.c only searches PATH_EXEC for
> emacs.pdmp, and then for the pdmp file with the name given by:
> 
>       basename(argv[0])
> 
> If it were to also search for
> 
>       basename(realpath(argv[0]))
> 
> then it would find pdmp files that match either the given, or
> "absolute" names, rather than just the given name, and symlinks
> that point at emacs executables would always find the default
> pdmp file that goes with them, whether we deliver them, or the
> user makes their own.

But realpath(argv[0]) can resolve to a file in another directory,
because realpath expands all the symlinks, not just that of the
basename.  Does it make sense to look up the .pdmp file in the
directory of the original argv[0] when it is a symlink?

>       - Move the repeated code into functions, to clean up load_pdump().
> 
>       - Save the basename from the result of calling realpath() during the
>         search of the executable directory.
> 
>       - During the PATH_EXEC stage, use the saved realpath basename to
>         add a check for that name.

This is not enough, if we want to support *.pdmp files that have
arbitrary names.  For example, when Emacs is invoked as "../emacs" (or
any other relative file name which includes slashes), we currently
don't expand symlinks, so with your proposal "emacs" and "../emacs"
will behave differently.

IOW, supporting arbitrarily-named *.pdmp files requires more thorough
revision of the logic in load_pdump than just some simple refactoring.
Especially as Emacs 28 will have the native-compilation feature,
whereby it also needs to find at startup the directory with the *.eln
files that correspond to the preloaded Lisp files; see bug#44128.




* Re: Finding the dump (redux)
  2021-04-17 18:45 ` Eli Zaretskii
@ 2021-04-18  0:15   ` Ali Bahrami
  2021-04-18  7:55     ` Eli Zaretskii
  0 siblings, 1 reply; 26+ messages in thread
From: Ali Bahrami @ 2021-04-18  0:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Hi Eli,

    Thanks for giving your time on a Saturday to
look at this.

On 4/17/21 12:45 PM, Eli Zaretskii wrote:
>> From: Ali Bahrami <ali_gnu2@emvision.com>
>> I can solve this without changing emacs, superficially, by just
>> adding a corresponding mediator symlink for emacs.pdmp in the
>> PATH_EXEC directory. But then, if an end user were to make their
>> own symlink to one of these emacs variants, they'll still face
>> this problem:
>>
>>        % ln -s /usr/bin/emacs-gtk ~/bin/myemacs
> 
> Why can't you put the *.pdmp files in the same directory where you
> keep the emacs-* executables (/usr/bin, AFAIU)?  That is already
> supported by the current code, I think.

I know the current code does support that, but it doesn't fit the
long-lived rules we have for what belongs in /usr/bin. For something to
be in /usr/bin, it needs to be directly executable by the end user, and
of general utility. Anything else, including private helper executables,
goes into an application-specific area, usually under /usr/lib, so using
PATH_EXEC for this is a better fit.

The question is bigger than emacs. It dilutes the meaning of
/usr/bin if it starts collecting other types of files, and
we'd like to hold that line if we can.

A related idea that's been floated before would be for the
executable to carry the default dump data within itself. That
would also work (effectively like unexec did), but I'm
99% happy with the current scheme, and not looking for
that sort of big change.

> But realpath(argv[0]) can produce to a file in another directory,
> because realpath expands all the symlinks, not just that of the
> basename.  Does it make sense to look up the .pdmp file in the
> directory of the original argv[0] when it is a symlink?

It's an interesting question, and I think can be argued
either way.

It's simple to say "no", and I doubt it would bother many
users. In that case, we'd stat the argv[0], and if it's a
symlink, we'd just switch the existing PATH_EXEC code to
use the "real" basename. That would work for my purposes,
but I had imagined that it was written as it is for some
purpose, perhaps as a way to provide options for using
other pdmp files (see below), so I was trying not to change
those aspects, just to augment them.

I can imagine a scenario where it might be useful to
say "yes". It might offer a pretty slick way for end users
to create arbitrary pdmp files and associate them to specific
purposes. Suppose for instance that I want to use a special 'X'
dump file when working on "Project X" code. I could create a special
name for that emacs variant as a symlink to the basic emacs-gtk
in my personal bin:

    % ln -s /usr/bin/emacs-gtk ~/bin/emacsX

Then, if I were to create ~/bin/emacsX.pdmp, and if emacs were
willing to see it as a pdmp file to be loaded, then I could
run my special emacsX, and get the standard emacs (from the
symlink) using my specialized X pdmp.

There are other ways to solve this (shell scripts, shell
aliases, typing an option on the emacs command line), so I
don't know how important this is, but it does offer a pretty
simple way to get that effect.  If we did want this, my patch
checks the 2 names in the wrong order for it to work, so we
should flip them to check the "given" name first, and then the
"real" one.
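
To make the ordering question concrete, here is a minimal sketch of
how the PATH_EXEC candidates could be listed with the "given" name
ahead of the "real" one. It is not the attached patch: the names
pdmp_candidates and PDMP_NAME_MAX are hypothetical, the ".pdmp" suffix
and '/' separator are hardcoded for brevity, and it only builds the
names rather than trying to load anything.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PDMP_NAME_MAX 256   /* hypothetical limit for this sketch */

/* Fill OUT with the candidate dump file names under PATH_EXEC, in the
   order they would be tried: "emacs", then GIVEN_BASE, then REAL_BASE.
   REAL_BASE may be NULL.  Names that repeat an earlier candidate are
   skipped, so no file is looked up twice.  Return the number of
   candidates stored.  */
static int
pdmp_candidates (const char *path_exec, const char *given_base,
                 const char *real_base, char out[3][PDMP_NAME_MAX])
{
  assert (path_exec && given_base);
  const char *names[3] = { "emacs", given_base, real_base };
  int n = 0;

  for (int i = 0; i < 3; i++)
    {
      if (!names[i])
        continue;

      /* Skip a name that was already listed earlier.  */
      int duplicate = 0;
      for (int j = 0; j < i; j++)
        if (names[j] && strcmp (names[j], names[i]) == 0)
          duplicate = 1;
      if (duplicate)
        continue;

      int len = snprintf (out[n], PDMP_NAME_MAX, "%s/%s.pdmp",
                          path_exec, names[i]);
      if (len < 0 || len >= PDMP_NAME_MAX)
        continue;               /* too long for this sketch; skip it */
      n++;
    }
  return n;
}
```

In the common case, where emacs is run through the plain 'emacs'
symlink, the given name duplicates the hardcoded one, so only one
extra name (the real one) is ever tried.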

The reverse question is, what harm does it do to look in PATH_EXEC
for both names? PATH_EXEC is a controlled namespace, laid out by
the emacs install package, so there's little risk of a name collision
with user named dump files. We should never reach the point of looking
in PATH_EXEC unless we've already failed to find a user specified pdmp
file, and are looking for the default one. It seems harmless to look
for both names at that point.

> 
>>        - Move the repeated code into functions, to clean up load_pdump().
>>
>>        - Save the basename from the result of calling realpath() during the
>>          search of the executable directory.
>>
>>        - During the PATH_EXEC stage, use the saved realpath basename to
>>          add a check for that name.
> 
> This is not enough, if we want to support *.pdmp files that have
> arbitrary names.  For example, when Emacs is invoked as "../emacs" (or
> any other relative file name which includes slashes), we currently
> don't expand symlinks, so with your proposal "emacs" and "../emacs"
> will behave differently.

I'm not sure I understand. I have the proposed bits installed
on my desktop right now, and this does work as I expect.

     % cd /usr/bin
     % ../bin/emacs

As does

     % emacs

I can tell in both cases that the emacs-gtk.pdmp from
PATH_EXEC was loaded. It does seem to be solving the
problem I set out to fix.

I don't see any code in load_pdump() that special cases
the case that includes slashes --- it seems like realpath()
is always called, so perhaps I might not be understanding your
point correctly.


> IOW, supporting arbitrarily-named *.pdmp files requires more thorough
> revision of the logic in load_pdump than just some simple refactoring.
> Especially as Emacs 28 will have the native-compilation feature,
> whereby it also needs to find at startup the directory with the *.eln
> files that correspond to the preloaded Lisp files; see bug#44128.
> 

I think you're right that the above isn't enough for arbitrary names,
but it's not trying to be. As long as it doesn't interfere with that
later effort, there shouldn't be a conflict. My goal is just to make
sure that out of the box default behavior works, without the sort of
mysterious failure (to a naive user) that symlinks cause today. I would
expect processing arbitrary pdmp files would happen before we reached the
stage of looking for the default pdmp files in PATH_EXEC, and that
if arbitrary files were seen, that we either won't look in PATH_EXEC
at all, or that we can look with modified logic that makes sense for
that case.

- Ali




* Re: Finding the dump (redux)
  2021-04-18  0:15   ` Ali Bahrami
@ 2021-04-18  7:55     ` Eli Zaretskii
  2021-04-18  8:18       ` Andreas Schwab
  2021-04-19  4:01       ` Ali Bahrami
  0 siblings, 2 replies; 26+ messages in thread
From: Eli Zaretskii @ 2021-04-18  7:55 UTC (permalink / raw)
  To: Ali Bahrami; +Cc: emacs-devel

> Cc: emacs-devel@gnu.org
> From: Ali Bahrami <ali_gnu2@emvision.com>
> Date: Sat, 17 Apr 2021 18:15:13 -0600
> 
> >>        % ln -s /usr/bin/emacs-gtk ~/bin/myemacs
> > 
> > Why can't you put the *.pdmp files in the same directory where you
> > keep the emacs-* executables (/usr/bin, AFAIU)?  That is already
> > supported by the current code, I think.
> 
> I know the current code does support that, but it doesn't fit the
> long lived rules we have for what belongs in /usr/bin.

Then place a symlink emacs.pdmp there, and have the actual pdumper
file where you want it, under any name you want.  Or move/copy the
emacs-* executables to another directory, make the 'emacs' symlink
resolve to those emacs-* files, and have the pdumper files in the same
place.  Or configure with a different value of $libdir when you build
each emacs-* variant, and then have the corresponding emacs.pdmp file
in the directory under /usr/lib that is private to that variant.  Or
use the --dump-file command-line option.

There are many possible solutions that already work, so why insist on
something that doesn't work?  That it happened to work with unexec is
just sheer luck: the upstream Emacs project never explicitly supported
such configurations.

> A related idea that's been floated before would be for the
> executable to carry the default dump data within itself.

That idea didn't fly because it meant we again need to comply with
various binary formats, which change with time out of our control.
We'd eventually get into the same trouble as with unexec: the
corresponding developers will refuse supporting the tricks we play for
that to work, exactly as glibc dropped support for malloc hooks we
needed to support unexec.

More generally, that doesn't solve the general problem of how Emacs
finds files it needs to start.  Even if the dump data is in the
executable itself, there could be other files that are similarly
needed at startup.  We already have that with the native-compilation
feature: the *.eln files produced from the preloaded Lisp packages
need to be located at startup, otherwise Emacs will be unable to
start.  We cannot possibly put everything inside the Emacs executable,
even if we wanted to.

> > But realpath(argv[0]) can produce to a file in another directory,
> > because realpath expands all the symlinks, not just that of the
> > basename.  Does it make sense to look up the .pdmp file in the
> > directory of the original argv[0] when it is a symlink?
> 
> It's an interesting question, and I think can be argued
> either way.

Exactly.  So who's to say which way is TRT?  Whatever we decide, there
could be another distro out there which will argue that the opposite
makes sense because "it worked for them until now".

And if you are thinking about trying both, then (a) there's still the
question of order (which could affect the correctness), and (b) it
makes the startup slower, and soon enough people will start
complaining about that.

> I can imagine a scenario where it might be useful to
> say "yes". It might offer a pretty slick way for end users
> to create arbitrary pdmp files and associate them to specific
> purposes. Suppose for instance that I want to use a special 'X'
> dump file when working on "Project X" code. I could create a special
> name for that emacs variant as a symlink to the basic emacs-gtk
> in my personal bin:
> 
>     % ln -s /usr/bin/emacs-gtk ~/bin/emacsX
> 
> Then, if I were to create ~/bin/emacsX.pdmp, and if emacs were
> willing to see it as a pdmp file to be loaded, then I could
> run my special emacsX, and get the standard emacs (from the
> symlink) using my specialized X pdmp.

We support the --dump-file command-line option for this purpose: using
that you can have the pdumper file under any file name you want, all
you need is a shell script or an alias that would add that option.

> The reverse question is, what harm does it do to look in PATH_EXEC
> for both names?

See above: it makes startup slower, and also runs the risk of picking
up the wrong pdumper file and failing the startup altogether.

> >>        - Move the repeated code into functions, to clean up load_pdump().
> >>
> >>        - Save the basename from the result of calling realpath() during the
> >>          search of the executable directory.
> >>
> >>        - During the PATH_EXEC stage, use the saved realpath basename to
> >>          add a check for that name.
> > 
> > This is not enough, if we want to support *.pdmp files that have
> > arbitrary names.  For example, when Emacs is invoked as "../emacs" (or
> > any other relative file name which includes slashes), we currently
> > don't expand symlinks, so with your proposal "emacs" and "../emacs"
> > will behave differently.
> 
> I'm not sure I understand. I have the proposed bits installed
> on my desktop right now, and this does work as I expect.
> 
>      % cd /usr/bin
>      % ../bin/emacs
> 
> As does
> 
>      % emacs

That's because you are running Emacs installed, so it looks for the
pdumper file in the hardcoded place under PATH_EXEC, no matter what.
I was alluding to the case that you run Emacs uninstalled, when the
pdumper file is in the same directory where the Emacs binary lives.

> I don't see any code in load_pdump() that special cases
> the case that includes slashes

Look in load_pdump_find_executable, and you will see it.
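
The distinction can be sketched roughly as follows. This is a
simplified illustration and not the actual
load_pdump_find_executable: the function name find_executable_sketch
is made up, PATH is passed in as a parameter rather than read from the
environment, and '/' and ':' are hardcoded as separators.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not Emacs code.  Return a malloc'd file name
   for ARGV0, or NULL if none is found.  PATH is a colon-separated
   directory list such as the value of $PATH, and may be NULL.  */
static char *
find_executable_sketch (const char *argv0, const char *path)
{
  assert (argv0);

  /* A name that contains a slash is taken exactly as given:
     no PATH search is done for it.  */
  if (strchr (argv0, '/'))
    {
      char *copy = malloc (strlen (argv0) + 1);
      if (copy)
        strcpy (copy, argv0);
      return copy;
    }

  if (!path)
    return NULL;

  /* Otherwise try each PATH element in turn.  */
  while (*path)
    {
      size_t dirlen = strcspn (path, ":");
      /* An empty PATH element means the current directory.  */
      const char *dir = dirlen ? path : ".";
      size_t dlen = dirlen ? dirlen : 1;
      size_t size = dlen + 1 + strlen (argv0) + 1;
      char *candidate = malloc (size);
      if (!candidate)
        return NULL;
      snprintf (candidate, size, "%.*s/%s", (int) dlen, dir, argv0);
      /* Simplification: access() also succeeds for directories.  */
      if (access (candidate, X_OK) == 0)
        return candidate;
      free (candidate);
      path += dirlen;
      if (*path == ':')
        path++;
    }
  return NULL;
}
```

So "emacs" and "../bin/emacs" reach the later dump-file lookup by
different routes, which is why the two invocations need to be
considered separately.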

> > IOW, supporting arbitrarily-named *.pdmp files requires more thorough
> > revision of the logic in load_pdump than just some simple refactoring.
> > Especially as Emacs 28 will have the native-compilation feature,
> > whereby it also needs to find at startup the directory with the *.eln
> > files that correspond to the preloaded Lisp files; see bug#44128.
> 
> I think you're right that the above isn't enough for arbitrary names,
> but it's not trying to be. As long as it doesn't interfere with that
> later effort, there shouldn't be a conflict. My goal is just to make
> sure that out of the box default behavior works, without the sort of
> mysterious failure (to a naive user) that symlinks cause today.

But in this case _you_ are the distro, so you determine how OOTB
works.  It isn't carved in stone, and the arrangements you used in the
past don't need to be repeated verbatim in the future, because "times
are a-changing".

Having said all of the above, since we are currently working on
related issues on the native-compilation branch, it is possible that
we eventually will teach Emacs to support also the arrangement you
want to work in your case.  But I make no promises, and in any case
this will not hit the street before Emacs 28.1, which is probably
still a year or more in the future.  We don't expect another 27.x
release, and even if there is such a release, it will probably be to
fix some very grave bug, so unsuitable for extending existing
features.  So it's your call whether to wait for Emacs 28 in the hope
that maybe it fixes your problem, or redesign your deployment now to
use some arrangement that already works.




* Re: Finding the dump (redux)
  2021-04-18  7:55     ` Eli Zaretskii
@ 2021-04-18  8:18       ` Andreas Schwab
  2021-04-18 16:05         ` Glenn Morris
  2021-04-19  4:53         ` Richard Stallman
  2021-04-19  4:01       ` Ali Bahrami
  1 sibling, 2 replies; 26+ messages in thread
From: Andreas Schwab @ 2021-04-18  8:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Ali Bahrami, emacs-devel

On Apr 18 2021, Eli Zaretskii wrote:

> See above: it makes startup slower, and also runs the risk of picking
> up the wrong pdumper file and failing the startup altogether.

That problem wouldn't exist if the pdumper file were looked up by its
fingerprint.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."




* Re: Finding the dump (redux)
  2021-04-18  8:18       ` Andreas Schwab
@ 2021-04-18 16:05         ` Glenn Morris
  2021-04-19  4:53         ` Richard Stallman
  1 sibling, 0 replies; 26+ messages in thread
From: Glenn Morris @ 2021-04-18 16:05 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Eli Zaretskii, Ali Bahrami, emacs-devel

Andreas Schwab wrote:

> That problem wouldn't exist if the pdumper file were looked up by its
> fingerprint.

Obligatory feeble request for help with https://debbugs.gnu.org/42790#43

(I actually don't see a need to fall back to less specific names any more.)
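
For illustration only, a fingerprint-based name could be built along
these lines. This is a sketch under assumptions: it supposes the
build fingerprint is available as a byte array, and the
"emacs-<hex>.pdmp" naming and the function name fingerprint_dump_name
are invented for this example rather than taken from this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "emacs-<hex digits of FP>.pdmp" into OUT
   (of size OUTSIZE).  Return 0 on success, -1 if OUT is too small.  */
static int
fingerprint_dump_name (const unsigned char *fp, size_t fplen,
                       char *out, size_t outsize)
{
  static const char prefix[] = "emacs-";
  static const char suffix[] = ".pdmp";
  size_t prefix_len = sizeof prefix - 1;
  /* Two hex digits per byte; sizeof suffix counts the trailing NUL.  */
  size_t needed = prefix_len + 2 * fplen + sizeof suffix;

  assert (fp && out);
  if (outsize < needed)
    return -1;

  memcpy (out, prefix, prefix_len);
  char *p = out + prefix_len;
  for (size_t i = 0; i < fplen; i++, p += 2)
    sprintf (p, "%02x", (unsigned int) fp[i]);
  memcpy (p, suffix, sizeof suffix);
  return 0;
}
```

Because the name is derived from the binary itself, it does not depend
on argv[0] or on how many symlinks sit between the user and the
executable.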




* Re: Finding the dump (redux)
  2021-04-18  7:55     ` Eli Zaretskii
  2021-04-18  8:18       ` Andreas Schwab
@ 2021-04-19  4:01       ` Ali Bahrami
  2021-04-19 13:26         ` Stefan Monnier
  1 sibling, 1 reply; 26+ messages in thread
From: Ali Bahrami @ 2021-04-19  4:01 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Hi Eli,

    Your message about not doing anything now, and
possibly, doing something later, is heard loud and clear.
As I said up front, I have a couple of ways to fix this
that don't require anyone else, so I'm fine with that.
I brought this up because I think the way this works
is slightly broken, and could be fixed. I thought it was
worth a shot to see if we can't do something at the source,
rather than having various actors like myself apply one-off
hacks. I'm sure we'll get there.

I am however, going to continue on and respond to some
of your comments, because I'm not convinced by some of
them, and I'd like to explain why, so that if something is
done later, this stuff would have been discussed.

I also have a question: I'm warming to your suggestion
about how we might just refuse to look for the pdmp files
for symlinks, and instead use only the realpath basename
for those cases. I think that could be a nice simplification
that might be smaller and safer than what is currently on
the table, and which does not add an additional load, hence
no slowdown. If I were to put together a patch to do that,
would you have any interest in looking at it?


On 4/18/21 1:55 AM, Eli Zaretskii wrote:
> Then place a symlink emacs.pdmp there, and have the actual pdumper
> file where you want it, under any name you want.  Or move/copy the
> emacs-* executables to another directory, make the 'emacs' symlink
> resolve to those emacs-* files, and have the pdumper files in the same
> place.  Or configure with a different value of $libdir when you build
> each emacs-* variant, and then have the corresponding emacs.pdmp file
> in the directory under /usr/lib that is private to that variant.  Or
> use the --dump-file command-line option.
> 
> There are many possible solutions that already work, so why insist on
> something that doesn't work?  That it happened to work with unexec is
> just sheer luck: the upstream Emacs project never explicitly supported
> such configurations.

We seem to have a basic difference of opinion about /usr/bin.
I don't think it's OK for programs to drop their data files there,
and I can't think of any significant other examples of programs that
do. You say this as if it's a normal answer, but it seems odd and
atypical to me. So I'm not going to put anything in /usr/bin other
than executables, or symlinks that point at executables.

To be honest, I'm a bit surprised to be the first person to
bring this up, so either I'm alone in thinking it's wrong to
put data files in /usr/bin, or I'm just early. Time will tell
I suppose. I do think putting the pdmp files next to the executable
is a fine answer for other places, particularly in the emacs build
tree. But it doesn't make sense for /usr/bin, to me.

About the idea of moving the binaries out of /usr/bin, where we
could add the pdmp files, the problem there is that we want users
to have all those names in their PATH. Let me explain, illustrated
by this excerpt from the original message:

      % cd /usr/bin
      % ls -alFh emacs*
      lrwxrwxrwx   1 root   root     9 Apr 14 22:15 emacs -> emacs-gtk*
      -r-xr-xr-x   2 root   bin  7.05M Apr 14 22:15 emacs-gtk*
      -r-xr-xr-x   2 root   bin  7.05M Apr 14 22:15 emacs-gtk-27.2*
      -r-xr-xr-x   2 root   bin  6.09M Apr 14 22:15 emacs-nox*
      -r-xr-xr-x   2 root   bin  6.09M Apr 14 22:15 emacs-nox-27.2*
      -r-xr-xr-x   2 root   bin  7.07M Apr 14 22:15 emacs-x*
      -r-xr-xr-x   2 root   bin  7.07M Apr 14 22:15 emacs-x-27.2*
      -r-xr-xr-x   1 root   bin    47K Apr 14 22:15 emacsclient*


The intent here is that users who explicitly want the GTK version
will type 'emacs-gtk' or 'emacs-gtk-27.2'. The story is similar for
the Lucid (emacs-x), or pure tty (emacs-nox) versions. Users who
don't care, and just want to run something reasonable run 'emacs'.
Moving those binaries elsewhere would let us put pdmp files next
to them, yes, but since they won't be in anyone's PATH, it's not
very useful.


> 
>> A related idea that's been floated before would be for the
>> executable to carry the default dump data within itself.
> 
> That idea didn't fly because it meant we again need to comply to
> various binary formats, which change with time out of our control.
> We'd eventually get into the same trouble as with unexec: the
> corresponding developers will refuse supporting the tricks we play for
> that to work, exactly as glibc dropped support for malloc hooks we
> needed to support unexec.
> 
> More generally, that doesn't solve the general problem of how Emacs
> finds files it needs to start.  Even if the dump data is in the
> executable itself, there could be other files that are similarly
> needed at startup.  We already have that with the native-compilation
> feature: the *.eln files produced from the preloaded Lisp packages
> need to be located at startup, otherwise Emacs will be unable to
> start.  We cannot possibly put everything inside the Emacs executable,
> even if we wanted to.

Well, I did say that I wasn't suggesting we go there, and
I agree that we don't want to. It's not a given though that
things must become a mess like unexec. Unexec was a mess
because the approach is inherently messy.


>>> But realpath(argv[0]) can produce to a file in another directory,
>>> because realpath expands all the symlinks, not just that of the
>>> basename.  Does it make sense to look up the .pdmp file in the
>>> directory of the original argv[0] when it is a symlink?
>>
>> It's an interesting question, and I think can be argued
>> either way.
> 
> Exactly.  So who's to say which way is TRT?  Whatever we decide, there
> could be another distro out there which will argue that the opposite
> makes sense because "it worked for them until now".


I'd say, people at the top, like yourself ultimately decide, just
as you do with many other things that various folks might second
guess later. My point was that I really don't think it matters in this
case, because both outcomes are defensible. Just pick the one you prefer
and document it.

As I mentioned above though, I'm really warming to
this "no" option, as it solves the problem, is simple
to explain, and doesn't add any additional loads.


>> I can imagine a scenario where it might be useful to
>> say "yes". It might offer a pretty slick way for end users
>> to create arbitrary pdmp files and associate them to specific
>> purposes. Suppose for instance that I want to use a special 'X'
>> dump file when working on "Project X" code. I could create a special
>> name for that emacs variant as a symlink to the basic emacs-gtk
>> in my personal bin:
>>
>>      % ln -s /usr/bin/emacs-gtk ~/bin/emacsX
>>
>> Then, if I were to create ~/bin/emacsX.pdmp, and if emacs were
>> willing to see it as a pdmp file to be loaded, then I could
>> run my special emacsX, and get the standard emacs (from the
>> symlink) using my specialized X pdmp.
> 
> We support the --dump-file command-line option for this purpose: using
> that you can have the pdumper file under any file name you want, all
> you need is a shell script or an alias that would add that option.

I think that's a good answer. And, it's also possibly how we might
settle the "Who's to say" question posed above. If we decide not
to load a pdmp file based on the name of a symlink, then the fix
becomes a matter of simply looking for the realpath basename in
PATH_EXEC, where we currently look for the given basename.
The number of possible loads remains the same as before,
and debates about slowdowns become moot.
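
To make that concrete, here is a minimal sketch in C of deriving the
dump name from the realpath basename. The helper name
dump_name_from_realpath and its exec_dir parameter (standing in for
PATH_EXEC) are assumptions made for illustration only; this is not
the code in load_pdump().

```c
#define _XOPEN_SOURCE 700   /* for realpath() with a NULL buffer */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not taken from emacs.c.
   Build "<exec_dir>/<basename of realpath(invoked)>.pdmp" in OUT.
   Return 0 on success, -1 if INVOKED cannot be resolved or the
   result does not fit in OUTSZ bytes.  */
static int
dump_name_from_realpath (const char *invoked, const char *exec_dir,
                         char *out, size_t outsz)
{
  assert (invoked && exec_dir && out);

  /* realpath() expands every symlink in the path, so a mediator
     symlink such as 'emacs' resolves to the real binary name.  */
  char *resolved = realpath (invoked, NULL);
  if (!resolved)
    return -1;

  const char *slash = strrchr (resolved, '/');
  const char *base = slash ? slash + 1 : resolved;

  int n = snprintf (out, outsz, "%s/%s.pdmp", exec_dir, base);
  free (resolved);
  return (n < 0 || (size_t) n >= outsz) ? -1 : 0;
}
```

With a symlink /usr/bin/emacs pointing at emacs-gtk, a call with
"/usr/bin/emacs" would be expected to produce a name ending in
emacs-gtk.pdmp, so only one name is ever probed in that directory.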


 > And if you are thinking about trying both, then (a) there's still the
 > question of order (which could affect the correctness), and (b) it
 > makes the startup slower, and soon enough people will start
 > complaining about that.
...
> 
>> The reverse question is, what harm does it do to look in PATH_EXEC
>> for both names?
> 
> See above: it makes startup slower, and also runs the risk of picking
> up the wrong pdumper file and failing the startup altogether.
> 

I'm not buying that this makes startup slower, and there
are 2 layers to my reasoning.

The first layer is that operating systems put a lot of effort
into making stat() on local files cheap. Anything that does
path searching like shells, or like emacs when it searches
for lisp files, relies on this. Certainly, there's often a
cache involved as well, but those cases do many lookups,
rather than the 2 we currently do, or the additional one
(making 3) that I'm suggesting. You can measure the cost of
this added stat(), but you'll never feel it.

The second layer is that we're talking about the stage where
we start looking at PATH_EXEC. The PATH_EXEC stage is a backstop
that is only run when the --dump-file command-line option was
not used, and no pdmp file is found next to the executable. So
in the world that follows your advice of using those features,
the PATH_EXEC stage never runs, and costs 0.

If we do reach the PATH_EXEC stage, and we fail to find a pdmp
file, then the next thing that happens is that emacs will
proceed to search for, compile, and load, numerous elisp files,
spewing their names to stdout as it goes. The cost of this is
definitely felt, unlike the attempt to open the realpath basename
version of the pdmp file, which if successful, will prevent this
expensive outcome.

So now, let's think about the issue of finding the wrong
pdumper file. I'm not sure I see how this can happen. The
PATH_EXEC directory isn't a place where emacs users put
arbitrary content. The names found here correspond to the
names that emacs is installed under on the system. If the
user invents their own emacs name (e.g. myemacs), then there
will be no file in PATH_EXEC for them to accidentally load.
And if they run emacs under one of the installed names, then
they're going to find the right file.

One point I'd make here is that your suggestion that
we not chase pdmp files for symlinks used to run emacs
really simplifies this, because then the only names we'll
ever look for in PATH_EXEC are those of the actual
installed binaries, and assuming the binary names and
pdmp names match, there can be no mixups.


>>> This is not enough, if we want to support *.pdmp files that have
>>> arbitrary names.  For example, when Emacs is invoked as "../emacs" (or
>>> any other relative file name which includes slashes), we currently
>>> don't expand symlinks, so with your proposal "emacs" and "../emacs"
>>> will behave differently.
>>
>> I'm not sure I understand. I have the proposed bits installed
>> on my desktop right now, and this does work as I expect.
>>
>>       % cd /usr/bin
>>       % ../bin/emacs
>>
>> As does
>>
>>       % emacs
> 
> That's because you are running Emacs installed, so it looks for the
> pdumper file in the hardcoded place under PATH_EXEC, no matter what.
> I was alluding to the case that you run Emacs uninstalled, when the
> pdumper file is in the same directory where the Emacs binary lives.

In the case where emacs is uninstalled and the pdumper file is
next to it, we never look in PATH_EXEC, so my patch, which
alters that code, is irrelevant.


>> I don't see any code in load_pdump() that special cases
>> the case that includes slashes
> 
> Look in load_pdump_find_executable, and you will see it.

I do see it, thanks. But note that load_pdump() calls
realpath() on the result from load_pdump_find_executable(),
and so, both 'emacs' and '../bin/emacs' yield the same
absolute path (e.g. /usr/bin/emacs-gtk) in either case,
and my patch sees the same string in either case.


> Having said all of the above, since we are currently working on
> related issues on the native-compilation branch, it is possible that
> we eventually will teach Emacs to support also the arrangement you
> want to work in your case.  But I make no promises, and in any case
> this will not hit the street before Emacs 28.1, which is probably
> still a year or more in the future.  We don't expect another 27.x
> release, and even if there is such a release, it will probably be to
> fix some very grave bug, so unsuitable for extending existing
> features.  So it's your call whether to wait for Emacs 28 in the hope
> that maybe it fixes your problem, or redesign your deployment now to
> use some arrangement that already works.
> 

OK, sounds good. Thanks.

- Ali




* Re: Finding the dump (redux)
  2021-04-18  8:18       ` Andreas Schwab
  2021-04-18 16:05         ` Glenn Morris
@ 2021-04-19  4:53         ` Richard Stallman
  2021-04-19  8:35           ` Andreas Schwab
  1 sibling, 1 reply; 26+ messages in thread
From: Richard Stallman @ 2021-04-19  4:53 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: eliz, ali_gnu2, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > See above: it makes startup slower, and also runs the risk of picking
  > > up the wrong pdumper file and failing the startup altogether.

  > That problem wouldn't exist if the pdumper file were looked up by its
  > fingerprint.

Could you explain concretely what that means?

-- 
Dr Richard Stallman
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)






* Re: Finding the dump (redux)
  2021-04-19  4:53         ` Richard Stallman
@ 2021-04-19  8:35           ` Andreas Schwab
  2021-04-19 13:00             ` Eli Zaretskii
  2021-04-19 13:04             ` Ali Bahrami
  0 siblings, 2 replies; 26+ messages in thread
From: Andreas Schwab @ 2021-04-19  8:35 UTC (permalink / raw)
  To: Richard Stallman; +Cc: eliz, ali_gnu2, emacs-devel

On Apr 19 2021, Richard Stallman wrote:

> [[[ To any NSA and FBI agents reading my email: please consider    ]]]
> [[[ whether defending the US Constitution against all enemies,     ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
>
>   > > See above: it makes startup slower, and also runs the risk of picking
>   > > up the wrong pdumper file and failing the startup altogether.
>
>   > That problem wouldn't exist if the pdumper file were looked up by its
>   > fingerprint.
>
> Could you explain concretely what that means?

Name the pdumper file emacs-FINGERPRINT.pdmp (where FINGERPRINT is the
hex string of the unique fingerprint).  Then the file can be put in a
fixed, shared directory without conflicts, and there is no need to use
an elaborate search strategy.
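
A minimal sketch of that naming scheme in C follows. The function name
fingerprint_dump_name is hypothetical, and the fingerprint is treated
as an opaque byte array of caller-supplied length; how Emacs stores
its fingerprint internally is not assumed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "emacs-<hex of FP>.pdmp" into OUT.
   Return 0 on success, -1 if OUT is too small.  */
static int
fingerprint_dump_name (const unsigned char *fp, size_t fplen,
                       char *out, size_t outsz)
{
  static const char prefix[] = "emacs-";
  static const char suffix[] = ".pdmp";
  /* Two hex digits per byte, plus prefix, suffix and the final NUL.  */
  size_t need = (sizeof prefix - 1) + 2 * fplen + (sizeof suffix - 1) + 1;

  assert (fp && out);
  if (outsz < need)
    return -1;

  char *p = out;
  memcpy (p, prefix, sizeof prefix - 1);
  p += sizeof prefix - 1;
  for (size_t i = 0; i < fplen; i++)
    {
      snprintf (p, 3, "%02x", (unsigned) fp[i]);
      p += 2;
    }
  memcpy (p, suffix, sizeof suffix);   /* copies the terminating NUL too */
  return 0;
}
```

Because the name is derived from the binary's own fingerprint, several
builds could share one directory without their dump files colliding.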

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."




* Re: Finding the dump (redux)
  2021-04-19  8:35           ` Andreas Schwab
@ 2021-04-19 13:00             ` Eli Zaretskii
  2021-04-19 13:04             ` Ali Bahrami
  1 sibling, 0 replies; 26+ messages in thread
From: Eli Zaretskii @ 2021-04-19 13:00 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: emacs-devel, rms, ali_gnu2

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: eliz@gnu.org,  ali_gnu2@emvision.com,  emacs-devel@gnu.org
> Date: Mon, 19 Apr 2021 10:35:20 +0200
> 
> Name the pdumper file emacs-FINGERPRINT.pdmp (where FINGERPRINT is the
> hex string of the unique fingerprint).  Then the file can be put in a
> fixed, shared directory without conflicts, and there is no need to use
> an elaborate search strategy.                  ^^^^^^^^^^^^^^^^^^^^^^^
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The underlined part is not entirely accurate.  We will still need to
look for the Emacs executable along PATH, resolve symlinks in the
result, then look in 2 possible places relative to that.

To elaborate, we currently do the following:

  . if argv[0] doesn't have leading directories, look for it along PATH
  . resolve symlinks in the result to find the real place of the executable
  . look in the same directory for BASENAME.pdmp, where BASENAME is the
    basename of the resolved executable
  . if there's no BASENAME.pdmp there, look for emacs.pdmp in the
    directory specified by PATH_EXEC
  . if that fails as well, look for BASE.pdmp in PATH_EXEC, where BASE
    is the basename of the original argv[0]

In each place where we look, if the file by that name exists, we try
to load it; if the load fails, we quit (and don't search in the rest
of the places).  IOW, we only continue to the next candidate place if
the file by the name we looked for doesn't exist there.
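
The rule in the last paragraph (stop at the first file that exists,
even if loading it fails) can be sketched in C as below. All names
here (search_dump, dump_loader, demo_loader, the DUMP_* values) are
hypothetical and exist only for illustration; the real logic lives in
load_pdump and is more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum dump_result { DUMP_LOADED, DUMP_NOT_FOUND, DUMP_LOAD_FAILED };

/* Hypothetical loader callback: tries to load NAME and reports
   whether the file was missing, loaded, or present but unusable.  */
typedef enum dump_result (*dump_loader) (const char *name);

/* Try CANDIDATES in order.  Only a missing file moves the search on;
   a file that exists but fails to load ends the search at once.  */
static enum dump_result
search_dump (const char *const *candidates, size_t n, dump_loader load,
             const char **used)
{
  assert (candidates && load);
  for (size_t i = 0; i < n; i++)
    {
      enum dump_result r = load (candidates[i]);
      if (r == DUMP_NOT_FOUND)
        continue;
      if (used)
        *used = candidates[i];
      return r;
    }
  return DUMP_NOT_FOUND;
}

/* Stand-in loader for demonstration: pretends that emacs.pdmp exists
   but belongs to another binary, while emacs-gtk.pdmp is the right one.  */
static enum dump_result
demo_loader (const char *name)
{
  if (strcmp (name, "exec/emacs.pdmp") == 0)
    return DUMP_LOAD_FAILED;
  if (strcmp (name, "exec/emacs-gtk.pdmp") == 0)
    return DUMP_LOADED;
  return DUMP_NOT_FOUND;
}
```

With the candidates "bin/emacs-gtk.pdmp", "exec/emacs.pdmp" and
"exec/emacs-gtk.pdmp" in that order, the search ends at the second
entry with a load failure and never reaches the correct third file.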

Thus, using emacs-FINGERPRINT.pdmp can simplify this process:

  . it will eliminate the need for the last step above
  . it will avoid the danger that a file by the name we look for does
    exist in some place, but fails the fingerprint test, and thus
    prevents us from finding the correct file in one of the other
    places

But we'd still need to decide on a search strategy.  For example, note
that the current strategy is the reverse of the one we use for looking
for Lisp files: there we first look in the installation directory and
only if that fails look in the source tree.




* Re: Finding the dump (redux)
  2021-04-19  8:35           ` Andreas Schwab
  2021-04-19 13:00             ` Eli Zaretskii
@ 2021-04-19 13:04             ` Ali Bahrami
  2021-04-19 13:14               ` Eli Zaretskii
  1 sibling, 1 reply; 26+ messages in thread
From: Ali Bahrami @ 2021-04-19 13:04 UTC (permalink / raw)
  To: Andreas Schwab, Richard Stallman; +Cc: eliz, emacs-devel

On 4/19/21 2:35 AM, Andreas Schwab wrote:
> Name the pdumper file emacs-FINGERPRINT.pdmp (where FINGERPRINT is the
> hex string of the unique fingerprint).  Then the file can be put in a
> fixed, shared directory without conflicts, and there is no need to use
> an elaborate search strategy.

I like this answer a lot, but it does present a different
sort of problem.

One of the things I do, in preparing a new emacs
for deployment, is to create a manifest listing every
file that the package delivers. We rebuild the emacs
packages constantly, but the manifest only needs to be
updated when updating to a new emacs.

I assume that the fingerprint changes each time emacs is
built? That complicates the manifest, since it would have
to be modified each time that happens. The package building
process would need to get the fingerprint out of emacs
and update the manifest list on every build. I'm sure there
are solutions to that, but it seemed worth noting, since
people might find files with continually changing names
difficult to deal with.

- Ali





* Re: Finding the dump (redux)
  2021-04-19 13:04             ` Ali Bahrami
@ 2021-04-19 13:14               ` Eli Zaretskii
  2021-04-19 13:34                 ` Stefan Kangas
  0 siblings, 1 reply; 26+ messages in thread
From: Eli Zaretskii @ 2021-04-19 13:14 UTC (permalink / raw)
  To: Ali Bahrami; +Cc: schwab, rms, emacs-devel

> Cc: eliz@gnu.org, emacs-devel@gnu.org
> From: Ali Bahrami <ali_gnu2@emvision.com>
> Date: Mon, 19 Apr 2021 07:04:20 -0600
> 
> I assume that the fingerprint changes each time emacs is
> built? That complicates the manifest, since it would have
> to be modified each time that happens.

That's "paradise lost": you will have the same problem with the *.eln
files when native-compilation lands.




* Re: Finding the dump (redux)
  2021-04-19  4:01       ` Ali Bahrami
@ 2021-04-19 13:26         ` Stefan Monnier
  2021-04-19 14:28           ` Eli Zaretskii
  0 siblings, 1 reply; 26+ messages in thread
From: Stefan Monnier @ 2021-04-19 13:26 UTC (permalink / raw)
  To: Ali Bahrami; +Cc: Eli Zaretskii, emacs-devel

> We seem to have a basic difference of opinion about /usr/bin.
> I don't think it's OK for programs to drop their data files there,
> and I can't think of any significant other examples of programs that
> do.

I tend to agree.

> About the idea of moving the binaries out of /usr/bin, where we
> could add the pdmp files, the problem there is that we want users
> to have all those names in their PATH.

That reminds me: I think ideally, we should see the pdmp files as the
executables and the temacs file as a runtime library (after all,
a single temacs file can be used with several different pdmp files).

If we could find a nice&portable way to turn the pdmp files into
executables, then that would be The Right Way.


        Stefan





* Re: Finding the dump (redux)
  2021-04-19 13:14               ` Eli Zaretskii
@ 2021-04-19 13:34                 ` Stefan Kangas
  2021-04-19 14:39                   ` Eli Zaretskii
  0 siblings, 1 reply; 26+ messages in thread
From: Stefan Kangas @ 2021-04-19 13:34 UTC (permalink / raw)
  To: Eli Zaretskii, Ali Bahrami; +Cc: schwab, rms, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> I assume that the fingerprint changes each time emacs is
>> built? That complicates the manifest, since it would have
>> to be modified each time that happens.
>
> That's "paradise lost": you will have the same problem with the *.eln
> files when native-compilation lands.

I would have hoped that we tried to ensure that builds are reproducible.
IOW, if the source hasn't changed, the fingerprint shouldn't change
either.




* Re: Finding the dump (redux)
  2021-04-19 13:26         ` Stefan Monnier
@ 2021-04-19 14:28           ` Eli Zaretskii
  2021-04-19 14:50             ` Stefan Monnier
  2021-04-19 15:43             ` Ali Bahrami
  0 siblings, 2 replies; 26+ messages in thread
From: Eli Zaretskii @ 2021-04-19 14:28 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: ali_gnu2, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Eli Zaretskii <eliz@gnu.org>,  emacs-devel@gnu.org
> Date: Mon, 19 Apr 2021 09:26:57 -0400
> 
> That reminds me: I think ideally, we should see the pdmp files as the
> executables and the temacs file as a runtime library (after all,
> a single temacs file can be used with several different pdmp files).
> 
> If we could find a nice&portable way to turn the pdmp files into
> executables, then that would be The Right Way.

I'd rather we didn't mess with binary files and didn't produce
executables except by running the system linker on object files
created by some system compiler.  Otherwise we will slip back towards
the kind of problems we had with unexec.




* Re: Finding the dump (redux)
  2021-04-19 13:34                 ` Stefan Kangas
@ 2021-04-19 14:39                   ` Eli Zaretskii
  2021-04-19 15:41                     ` Ali Bahrami
  0 siblings, 1 reply; 26+ messages in thread
From: Eli Zaretskii @ 2021-04-19 14:39 UTC (permalink / raw)
  To: Stefan Kangas; +Cc: emacs-devel, schwab, ali_gnu2, rms

> From: Stefan Kangas <stefankangas@gmail.com>
> Date: Mon, 19 Apr 2021 08:34:19 -0500
> Cc: schwab@linux-m68k.org, rms@gnu.org, emacs-devel@gnu.org
> 
> I would have hoped that we tried to ensure that builds are reproducible.
> IOW, if the source hasn't changed, the fingerprint shouldn't change
> either.

I don't see what reproducible builds have to do with this.  Ali
wasn't talking about rebuilding without changes.




* Re: Finding the dump (redux)
  2021-04-19 14:28           ` Eli Zaretskii
@ 2021-04-19 14:50             ` Stefan Monnier
  2021-04-19 15:43             ` Ali Bahrami
  1 sibling, 0 replies; 26+ messages in thread
From: Stefan Monnier @ 2021-04-19 14:50 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: ali_gnu2, emacs-devel

>> That reminds me: I think ideally, we should see the pdmp files as the
>> executables and the temacs file as a runtime library (after all,
>> a single temacs file can be used with several different pdmp files).
>> 
>> If we could find a nice&portable way to turn the pdmp files into
>> executables, then that would be The Right Way.
>
> I'd rather we didn't mess with binary files and didn't produce
> executables except by running the system linker on object files
> created by some system compiler.  Otherwise we will slip back towards
> the kind of problems we had with unexec.

100% agreed.  That was included in "nice&portable".


        Stefan





* Re: Finding the dump (redux)
  2021-04-19 14:39                   ` Eli Zaretskii
@ 2021-04-19 15:41                     ` Ali Bahrami
  2021-04-19 15:58                       ` Eli Zaretskii
  0 siblings, 1 reply; 26+ messages in thread
From: Ali Bahrami @ 2021-04-19 15:41 UTC (permalink / raw)
  To: Eli Zaretskii, Stefan Kangas; +Cc: schwab, rms, emacs-devel

On 4/19/21 8:39 AM, Eli Zaretskii wrote:
>> From: Stefan Kangas <stefankangas@gmail.com>
>> Date: Mon, 19 Apr 2021 08:34:19 -0500
>> Cc: schwab@linux-m68k.org, rms@gnu.org, emacs-devel@gnu.org
>>
>> I would have hoped that we tried to ensure that builds are reproducible.
>> IOW, if the source hasn't changed, the fingerprint shouldn't change
>> either.
> 
> I don't see what reproducible builds have to do with this.  Ali
> wasn't talking about rebuilding without changes.
> 

    That's true, but we do value reproducible builds. It's
hard to know up front sometimes, what information would be
good to share, and what would widen the discussion in
an unwanted way. It just hadn't come up yet.

When we update our systems, the packaging system only
updates the bits that have changed, so reproducible builds
mean that less data travels over the wire, and less data
has to be put in place on the updated system. The savings
are substantial, especially for a package like emacs, where
in principle, the bits only change when we update to
a new release, once every year or two. I'm not an expert,
but I think GNU/Linux installers are doing similar things,
for the same reasons. Distros don't want the bits to change
unless the source does.

It would also really help with that manifest problem.
Perhaps it is "paradise lost", but "paradise lost on
source change" would be a big upgrade.

- Ali




* Re: Finding the dump (redux)
  2021-04-19 14:28           ` Eli Zaretskii
  2021-04-19 14:50             ` Stefan Monnier
@ 2021-04-19 15:43             ` Ali Bahrami
  2021-04-19 16:06               ` Eli Zaretskii
  1 sibling, 1 reply; 26+ messages in thread
From: Ali Bahrami @ 2021-04-19 15:43 UTC (permalink / raw)
  To: Eli Zaretskii, Stefan Monnier; +Cc: emacs-devel

On 4/19/21 8:28 AM, Eli Zaretskii wrote:
>> From: Stefan Monnier <monnier@iro.umontreal.ca>
>> Cc: Eli Zaretskii <eliz@gnu.org>,  emacs-devel@gnu.org
>> Date: Mon, 19 Apr 2021 09:26:57 -0400
>>
>> That reminds me: I think ideally, we should see the pdmp files as the
>> executables and the temacs file as a runtime library (after all,
>> a single temacs file can be used with several different pdmp files).
>>
>> If we could find a nice&portable way to turn the pdmp files into
>> executables, then that would be The Right Way.
> 
> I'd rather we didn't mess with binary files and didn't produce
> executables except by running the system linker on object files
> created by some system compiler.  Otherwise we will slip back towards
> the kind of problems we had with unexec.
> 

    I almost floated this yesterday in my reply, but I
dropped it, as things were already getting long, and that
discussion didn't need another digression. However,
I think I see a way to get this data into the binary
itself without doing anything clever with linking, and
only using vanilla C. I saw someone else allude to this
idea here before, so I make no claim to invention, but
let me sketch it out and see if you think it might be viable.

The idea is to get the pdump data into the executable, not
by unexec-like methods, but as a simple C array, compiled
by the C compiler, and linked into emacs like a normal
object. Since we have to build emacs before we generate the
pdump data, this needs 2 stages, much like the temacs/emacs
division as it existed in the unexec days.

Imagine that emacs has an array for this, and a variable
giving the size of the array. A size of 0 means that there
is no pdump content. When emacs starts, it still looks at
the command line option, or for a pdmp file sitting next to
the executable, but failing that, it will fall back to the
array, if present, for its pdump content.

Start with a stub C file containing these variables:

     unsigned char default_pdmp_data[1];
     const size_t  default_pdmp_data_size = 0;

Build temacs with that stub, and then run temacs to
generate a pdmp file, as it does today. temacs will
see that default_pdmp_data[] is empty, and will ignore it.

Then, use a script, or even temacs itself, to generate
a source file that creates a version of default_pdmp_data[]
initialized with the pdmp data from the generated file, and
with a non-zero value for default_pdmp_data_size giving the
size.

And finally, relink emacs once more, using the generated
object in place of the stub.
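
As a rough illustration of the generation step, a small C routine
could turn the dump bytes into such a source file. The function name
emit_pdmp_array is hypothetical; the variable names match the stub
above, and the single-array layout is the simplification already
described, not a statement about how pdumper lays out its data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical generator: write a C source file to OUT that defines
   default_pdmp_data[] initialized from DATA, along with
   default_pdmp_data_size.  Return 0 on success, -1 on a write error.  */
static int
emit_pdmp_array (const unsigned char *data, size_t len, FILE *out)
{
  assert (data && out && len > 0);

  fprintf (out, "#include <stddef.h>\n\n");
  fprintf (out, "unsigned char default_pdmp_data[%zu] = {", len);
  for (size_t i = 0; i < len; i++)
    /* Start a new line every 12 bytes to keep the output readable.  */
    fprintf (out, "%s0x%02x,", i % 12 == 0 ? "\n  " : " ",
             (unsigned) data[i]);
  fprintf (out, "\n};\n");
  fprintf (out, "const size_t default_pdmp_data_size = %zu;\n", len);

  return ferror (out) ? -1 : 0;
}
```

The generated file would then be compiled and linked in place of the
stub. A multi-megabyte initializer of this kind can be slow to
compile, which is one practical cost of the approach.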

I notice that there are actually multiple mappings from
today's pdmp file in the process, so the above is probably
over-simplified, and perhaps we need more than one such
array. I'm sure a pdump-knowledgeable person would know
what to do about it:

     ali@rtld% pmap `pgrep emacs` | grep pdmp
     00007FE834C00000   7088K rw-----  /usr/lib/emacs/27.2/x86_64-pc-solaris2.11/emacs-gtk.pdmp
     00007FE8352EC000    120K -------  /usr/lib/emacs/27.2/x86_64-pc-solaris2.11/emacs-gtk.pdmp
     00007FE83530A000   3188K rw-----  /usr/lib/emacs/27.2/x86_64-pc-solaris2.11/emacs-gtk.pdmp

(I'm not sure what to make of the mapping with no
access bits set).

I'll guess it also changes the fingerprint, between the
2 stages, but perhaps that's OK, since there's no need
for a fingerprint validation if the data is held in the binary,
or perhaps there's an easy way to finesse that. Perhaps the
fingerprint from temacs gets propagated to the final emacs,
instead of being recomputed?

The overall point of the above is that it only uses
basic C, so it should be robust and portable. Since
we already know that we can read and use the pdump data,
it should work in the same way when seen in an in-memory
array, rather than mapped from a data file.

- Ali




* Re: Finding the dump (redux)
  2021-04-19 15:41                     ` Ali Bahrami
@ 2021-04-19 15:58                       ` Eli Zaretskii
  2021-04-19 16:08                         ` Ali Bahrami
  0 siblings, 1 reply; 26+ messages in thread
From: Eli Zaretskii @ 2021-04-19 15:58 UTC (permalink / raw)
  To: Ali Bahrami; +Cc: emacs-devel, schwab, stefankangas, rms

> Cc: schwab@linux-m68k.org, rms@gnu.org, emacs-devel@gnu.org
> From: Ali Bahrami <ali_gnu2@emvision.com>
> Date: Mon, 19 Apr 2021 09:41:17 -0600
> 
> It would also really help with that manifest problem.
> Perhaps it is "paradise lost", but "paradise lost on
> source change" would be a big upgrade.

The "paradise lost" I alluded to is that with native-compilation, when
you modify Emacs and rebuild, some of the hashes in the *.eln files'
names could legitimately change, and thus cause you to update the
manifest.




* Re: Finding the dump (redux)
  2021-04-19 15:43             ` Ali Bahrami
@ 2021-04-19 16:06               ` Eli Zaretskii
  2021-04-19 16:39                 ` Ali Bahrami
  0 siblings, 1 reply; 26+ messages in thread
From: Eli Zaretskii @ 2021-04-19 16:06 UTC (permalink / raw)
  To: Ali Bahrami; +Cc: monnier, emacs-devel

> Cc: emacs-devel@gnu.org
> From: Ali Bahrami <ali_gnu2@emvision.com>
> Date: Mon, 19 Apr 2021 09:43:46 -0600
> 
> The idea is to get the pdump data into the executable, not
> by unexec-like methods, but as a simple C array, compiled
> by the C compiler, and linked into emacs like a normal
> object.

This has been considered back when the portable dumping ideas were
discussed.  One reason why it was rejected is because it would require
end users to have a C development toolchain if they want to re-dump
Emacs (with some of their own code added).  Support for re-dumping is
a goal in Emacs development, and although we are not there yet, doing
something that would prevent it is a non-starter.  We want users to be
able to re-dump Emacs using just Emacs and nothing else.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Finding the dump (redux)
  2021-04-19 15:58                       ` Eli Zaretskii
@ 2021-04-19 16:08                         ` Ali Bahrami
  2021-04-19 17:09                           ` Eli Zaretskii
  0 siblings, 1 reply; 26+ messages in thread
From: Ali Bahrami @ 2021-04-19 16:08 UTC (permalink / raw)
  To: Eli Zaretskii, Ali Bahrami; +Cc: emacs-devel, schwab, stefankangas, rms

On 4/19/21 9:58 AM, Eli Zaretskii wrote:
>> Cc: schwab@linux-m68k.org, rms@gnu.org, emacs-devel@gnu.org
>> From: Ali Bahrami <ali_gnu2@emvision.com>
>> Date: Mon, 19 Apr 2021 09:41:17 -0600
>>
>> It would also really help with that manifest problem.
>> Perhaps it is "paradise lost", but "paradise lost on
>> source change" would be a big upgrade.
> 
> The "paradise lost" I alluded to is that with native-compilation, when
> you modify Emacs and rebuild, some of the hashes in the *.eln files'
> names could legitimately change, and thus cause you to update the
> manifest.
> 

When you say "modify emacs", that's a source change,
and not just a rebuild, right? If so, that's not
a problem for me, because I create a new manifest
in that case already.

What I consider paradise lost is if the fingerprint
changed just because I rebuild the unchanged sources.

- Ali



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Finding the dump (redux)
  2021-04-19 16:06               ` Eli Zaretskii
@ 2021-04-19 16:39                 ` Ali Bahrami
  2021-04-19 17:19                   ` Eli Zaretskii
  0 siblings, 1 reply; 26+ messages in thread
From: Ali Bahrami @ 2021-04-19 16:39 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: monnier, emacs-devel

On 4/19/21 10:06 AM, Eli Zaretskii wrote:
>> Cc: emacs-devel@gnu.org
>> From: Ali Bahrami <ali_gnu2@emvision.com>
>> Date: Mon, 19 Apr 2021 09:43:46 -0600
>>
>> The idea is to get the pdump data into the executable, not
>> by unexec-like methods, but as a simple C array, compiled
>> by the C compiler, and linked into emacs like a normal
>> object.
> 
> This has been considered back when the portable dumping ideas were
> discussed.  One reason why it was rejected is because it would require
> end users to have a C development toolchain if they want to re-dump
> Emacs (with some of their own code added).  Support for re-dumping is
> a goal in Emacs development, and although we are not there yet, doing
> something that would prevent it is a non-starter.  We want users to be
> able to re-dump Emacs using just Emacs and nothing else.
> 

    Is it really a conflict? Can't we do both?

We would still have support for putting a pdump file
next to the binary, or of using the --dump-file option.
We could even retain the PATH_EXEC support if that helped.
I don't think we need to take anything away, in terms of
re-dumping Emacs using just Emacs.

What we'd be doing, is to provide the final default (backstop)
dump internally, rather than on disk. If a dump is present through
any other method, that one would be used. But if not, as I think would
be the case for most emacs users who get it via some distro package,
emacs would 'just work', without their having to even know about
pdump. Most distros wouldn't ship any separate pdmp files, but
end users could add what they want.
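
As a rough sketch of that lookup order (hypothetical names throughout,
not the existing Emacs code, and the relative order of the on-disk
candidates is my assumption), the built-in image would only be
chosen after every on-disk search has come up empty:

```c
#include <assert.h>
#include <stddef.h>

/* Where the dump came from.  All names here are hypothetical.  */
enum dump_source
{
  DUMP_FROM_OPTION,     /* --dump-file given by the user */
  DUMP_NEXT_TO_BINARY,  /* pdmp file found beside the executable */
  DUMP_FROM_PATH_EXEC,  /* pdmp file found in PATH_EXEC */
  DUMP_BUILTIN          /* backstop: image linked into the binary */
};

/* Results of the on-disk searches; NULL means "not found".  */
struct dump_candidates
{
  const char *from_option;
  const char *next_to_binary;
  const char *in_path_exec;
};

/* Pick the dump to load.  Any on-disk dump wins; the built-in image
   is used only when none was found.  *FILE is set to the chosen file
   name, or to NULL for the built-in image.  */
enum dump_source
choose_dump (const struct dump_candidates *c, const char **file)
{
  assert (c != NULL && file != NULL);
  if (c->from_option)
    {
      *file = c->from_option;
      return DUMP_FROM_OPTION;
    }
  if (c->next_to_binary)
    {
      *file = c->next_to_binary;
      return DUMP_NEXT_TO_BINARY;
    }
  if (c->in_path_exec)
    {
      *file = c->in_path_exec;
      return DUMP_FROM_PATH_EXEC;
    }
  *file = NULL;
  return DUMP_BUILTIN;
}
```

Someone who re-dumps and drops a pdmp file in one of the usual places
gets that file; everyone else falls through to the last case.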

- Ali



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Finding the dump (redux)
  2021-04-19 16:08                         ` Ali Bahrami
@ 2021-04-19 17:09                           ` Eli Zaretskii
  0 siblings, 0 replies; 26+ messages in thread
From: Eli Zaretskii @ 2021-04-19 17:09 UTC (permalink / raw)
  To: Ali Bahrami; +Cc: emacs-devel, schwab, ali_gnu2, rms, stefankangas

> Cc: stefankangas@gmail.com, schwab@linux-m68k.org, rms@gnu.org,
>         emacs-devel@gnu.org
> From: Ali Bahrami <ali@emvision.com>
> Date: Mon, 19 Apr 2021 10:08:49 -0600
> 
> > The "paradise lost" I alluded to is that with native-compilation, when
> > you modify Emacs and rebuild, some of the hashes in the *.eln files'
> > names could legitimately change, and thus cause you to update the
> > manifest.
> > 
> 
> When you say "modify emacs", that's a source change,
> and not just a rebuild, right?

Yes.

> If so, that's not a problem for me, because I create a new manifest
> in that case already.

Then you shouldn't have problems with emacs-FINGERPRINT.pdmp names,
either, and the whole discussion of this sub-issue was based on a
misunderstanding.

> What I consider paradise lost is if the fingerprint
> changed just because I rebuild the unchanged sources.

It doesn't.
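
For illustration only, a fingerprint-based file name could be formed
along these lines.  This is a sketch, not the code in Emacs: the
function name is hypothetical, and the fingerprint is taken as an
arbitrary byte string rendered in lowercase hex.  As long as the
fingerprint bytes are unchanged, the resulting name is unchanged too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write "emacs-<hex of FP>.pdmp" into BUF.  Return 0 on success,
   -1 if BUF is too small.  Hypothetical helper for illustration.  */
int
fingerprint_pdmp_name (const unsigned char *fp, size_t fp_len,
                       char *buf, size_t buflen)
{
  static const char prefix[] = "emacs-";
  static const char suffix[] = ".pdmp";
  /* Prefix + two hex digits per byte + suffix + terminating NUL.  */
  size_t need = (sizeof prefix - 1) + 2 * fp_len + (sizeof suffix - 1) + 1;

  assert (fp != NULL && buf != NULL);
  if (buflen < need)
    return -1;

  char *p = buf;
  memcpy (p, prefix, sizeof prefix - 1);
  p += sizeof prefix - 1;
  for (size_t i = 0; i < fp_len; i++)
    {
      /* Two hex digits plus a NUL that the next step overwrites.  */
      snprintf (p, 3, "%02x", (unsigned int) fp[i]);
      p += 2;
    }
  memcpy (p, suffix, sizeof suffix);  /* Copies the trailing NUL too.  */
  return 0;
}
```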



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Finding the dump (redux)
  2021-04-19 16:39                 ` Ali Bahrami
@ 2021-04-19 17:19                   ` Eli Zaretskii
  2021-04-19 18:03                     ` Ali Bahrami
  0 siblings, 1 reply; 26+ messages in thread
From: Eli Zaretskii @ 2021-04-19 17:19 UTC (permalink / raw)
  To: Ali Bahrami; +Cc: monnier, emacs-devel

> Cc: monnier@iro.umontreal.ca, emacs-devel@gnu.org
> From: Ali Bahrami <ali_gnu2@emvision.com>
> Date: Mon, 19 Apr 2021 10:39:17 -0600
> 
> > This has been considered back when the portable dumping ideas were
> > discussed.  One reason why it was rejected is because it would require
> > end users to have a C development toolchain if they want to re-dump
> > Emacs (with some of their own code added).  Support for re-dumping is
> > a goal in Emacs development, and although we are not there yet, doing
> > something that would prevent it is a non-starter.  We want users to be
> > able to re-dump Emacs using just Emacs and nothing else.
> > 
> 
>     Is it really a conflict? Can't we do both?
> 
> We would still have support for putting a pdump file
> next to the binary, or of using the --dump-file option.
> We could even retain the PATH_EXEC support if that helped.
> I don't think we need to take anything away, in terms of
> re-dumping Emacs using just Emacs.
> 
> What we'd be doing, is to provide the final default (backstop)
> dump internally, rather than on disk. If a dump is present through
> any other method, that one would be used. But if not, as I think would
> be the case for most emacs users who get it via some distro package,
> emacs would 'just work', without their having to even know about
> pdump. Most distros wouldn't ship any separate pdmp files, but
> end users could add what they want.

I don't understand: if the user re-dumps Emacs, then under your
suggestion the user will have the same problem with Emacs locating the
pdumper file as we have today.  So what did we fix, exactly?



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Finding the dump (redux)
  2021-04-19 17:19                   ` Eli Zaretskii
@ 2021-04-19 18:03                     ` Ali Bahrami
  0 siblings, 0 replies; 26+ messages in thread
From: Ali Bahrami @ 2021-04-19 18:03 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: monnier, emacs-devel

On 4/19/21 11:19 AM, Eli Zaretskii wrote:
> 
> I don't understand: if the user re-dumps Emacs, then under your
> suggestion the user will have the same problem with Emacs locating the
> pdumper file as we have today.  So what did we fix, exactly?
> 

Yeah, it's gotten a bit sprawling, so let's try to recap.

The guy who re-dumps emacs doesn't have a problem, and
I'm not trying to do anything for him other than to not
break what he does. Most of this conversation has been your
pointing out ways in which I might be causing a problem for
that guy, and my trying to provide good answers for how
we might avoid that. So we've fixed nothing for him, but
importantly, have also broken exactly nothing.

I started this by noting that a symlink pointing at an
emacs binary fails to find its pdump file. That's
what I want to fix. And we've now got 3 offers on the
table for how we might do that:

     - Use the basename from the realpath() when searching
       PATH_EXEC. My first patch looked at both the given
       and realpath() versions, but I now think just using
       the realpath() one is the better, simpler, answer.

     - Use the fingerprint for the PATH_EXEC pdmp files, rather
       than the executable name.

     - Put the data in the executable, but avoiding
       the unexec pitfalls, which are well known.
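
The first option can be sketched roughly as follows.  This is not the
patch itself, just an illustration with hypothetical helper names, and
it assumes argv[0] has already been turned into a path that realpath()
can resolve (a bare command name would first need a PATH lookup):

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* For realpath() with a NULL buffer.  */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write "<basename of PATH>.pdmp" into BUF.  Return 0 on success,
   -1 if PATH has no basename or BUF is too small.  */
int
pdmp_name_from_path (const char *path, char *buf, size_t buflen)
{
  assert (path != NULL && buf != NULL);
  const char *slash = strrchr (path, '/');
  const char *base = slash ? slash + 1 : path;
  if (*base == '\0')
    return -1;
  int n = snprintf (buf, buflen, "%s.pdmp", base);
  return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}

/* Resolve symlinks in ARGV0 first, so that a link such as
   /usr/bin/emacs -> emacs-gtk yields "emacs-gtk.pdmp".  */
int
pdmp_name_for_argv0 (const char *argv0, char *buf, size_t buflen)
{
  char *resolved = realpath (argv0, NULL);  /* malloc'd by realpath */
  if (resolved == NULL)
    return -1;
  int r = pdmp_name_from_path (resolved, buf, buflen);
  free (resolved);
  return r;
}
```

The resulting name would then be looked up in PATH_EXEC as it is today;
only the basename changes.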

In this, I represent the many people who run emacs from
prebuilt packages, and who will never dump anything, or
even probably ever know what dumping is. What they do want,
is to be able to run emacs via an arbitrary symlink name,
and have it just work. Any of the above options fixes that.

And in particular, we put such a symlink in /usr/bin,
to provide a generic 'emacs' to users who don't care
about the UI variants. pdump has broken that use, and
I want to fix it, without dropping data files in /usr/bin.

- Ali



^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2021-04-19 18:03 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-15 19:38 Finding the dump (redux) Ali Bahrami
2021-04-17 18:45 ` Eli Zaretskii
2021-04-18  0:15   ` Ali Bahrami
2021-04-18  7:55     ` Eli Zaretskii
2021-04-18  8:18       ` Andreas Schwab
2021-04-18 16:05         ` Glenn Morris
2021-04-19  4:53         ` Richard Stallman
2021-04-19  8:35           ` Andreas Schwab
2021-04-19 13:00             ` Eli Zaretskii
2021-04-19 13:04             ` Ali Bahrami
2021-04-19 13:14               ` Eli Zaretskii
2021-04-19 13:34                 ` Stefan Kangas
2021-04-19 14:39                   ` Eli Zaretskii
2021-04-19 15:41                     ` Ali Bahrami
2021-04-19 15:58                       ` Eli Zaretskii
2021-04-19 16:08                         ` Ali Bahrami
2021-04-19 17:09                           ` Eli Zaretskii
2021-04-19  4:01       ` Ali Bahrami
2021-04-19 13:26         ` Stefan Monnier
2021-04-19 14:28           ` Eli Zaretskii
2021-04-19 14:50             ` Stefan Monnier
2021-04-19 15:43             ` Ali Bahrami
2021-04-19 16:06               ` Eli Zaretskii
2021-04-19 16:39                 ` Ali Bahrami
2021-04-19 17:19                   ` Eli Zaretskii
2021-04-19 18:03                     ` Ali Bahrami

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
