unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* Finding the dump (redux)
@ 2021-04-15 19:38 Ali Bahrami
  2021-04-17 18:45 ` Eli Zaretskii
  0 siblings, 1 reply; 26+ messages in thread
From: Ali Bahrami @ 2021-04-15 19:38 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 6177 bytes --]

Hi,

     I'm a bit late to the pdump party, as we had just updated
the emacs delivered with Solaris shortly before the version
with pdumper arrived, and we don't tend to update more than once
every year or two. It's overdue though, and I've been playing with
emacs 27.2 on Solaris this week. It's great, but I find myself wanting
to request a small tweak to the search for the pdmp file.

First though, I want to say that pdumper is awesome, and has
exceeded my hopes and expectations. It seems just as fast,
and it is so nice to be out of the unexec game. Who knows,
we might someday even be able to get rid of the special dldump()
function that was written for emacs decades ago. :-) In any event,
I'm running 27.2 on my desktop as a position independent executable
(PIE) and with ASLR enabled, and everything seems to working as it
should. Thank You Daniel for persisting, and for this big step
forward!

The issue I'm hitting is similar to one discussed over 2
years ago:

      "Finding the dump"
      https://lists.gnu.org/archive/html/emacs-devel/2019-01/msg00558.html

At that time, Eli made a change to improve the way pdmp
files are found in PATH_EXEC, using the basename(argv[0])
to find matching dump files. It's useful, but not quite
enough to handle the way we deliver emacs on Solaris.

Like others, we deliver the 3 UI variants of emacs so that
users have their choice. Along with that is a "mediator" symlink,
provided by the packaging system, that provides the generic
'emacs' name, pointing at the default version. Mediators are
a Solaris feature, and there's a packaging command by which
the system owner can change them to point at their own favored
default. On my desktop, it looks like this:

      % cd /usr/bin
      % ls -alFh emacs*
      lrwxrwxrwx   1 root   root     9 Apr 14 22:15 emacs -> emacs-gtk*
      -r-xr-xr-x   2 root   bin  7.05M Apr 14 22:15 emacs-gtk*
      -r-xr-xr-x   2 root   bin  7.05M Apr 14 22:15 emacs-gtk-27.2*
      -r-xr-xr-x   2 root   bin  6.09M Apr 14 22:15 emacs-nox*
      -r-xr-xr-x   2 root   bin  6.09M Apr 14 22:15 emacs-nox-27.2*
      -r-xr-xr-x   2 root   bin  7.07M Apr 14 22:15 emacs-x*
      -r-xr-xr-x   2 root   bin  7.07M Apr 14 22:15 emacs-x-27.2*
      -r-xr-xr-x   1 root   bin    47K Apr 14 22:15 emacsclient*

And the pdmp files are similarly named, to match the
emacs executables:

      % cd /usr/lib/emacs/27.2/x86_64-pc-solaris2.11
      % ls -alhF *.pdmp
      -r--r--r--   2 root   bin  10.2M Apr 14 22:15 emacs-gtk-27.2.pdmp
      -r--r--r--   2 root   bin  10.2M Apr 14 22:15 emacs-gtk.pdmp
      -r--r--r--   2 root   bin  9.64M Apr 14 22:15 emacs-nox-27.2.pdmp
      -r--r--r--   2 root   bin  9.64M Apr 14 22:15 emacs-nox.pdmp
      -r--r--r--   2 root   bin  10.1M Apr 14 22:15 emacs-x-27.2.pdmp
      -r--r--r--   2 root   bin  10.1M Apr 14 22:15 emacs-x.pdmp

As noted, the default is emacs-gtk, but I can change that:

      # pkg set-mediator -I emacs-x emacs
      <...pkg output elided...>
      # ls -alFh /usr/bin/emacs
      rwxrwxrwx   1 root  root   7 Apr 15 11:47 /usr/bin/emacs -> emacs-x*

Like a lot of GNU/Linux distributions, we used to use a
shell script for this, but we moved to the mediator years
ago. With the new emacs, this works well for the explicitly
named versions (emacs-gtk, emacs-nox, and emacs-x). But, when
run via the 'emacs' symlink (the usual case), the binary is
unable to find its pdmp file. It still runs, but ends up
loading all the separate elc files, which is undesirable.

I can solve this without changing emacs, superficially, by just
adding a corresponding mediator symlink for emacs.pdmp in the
PATH_EXEC directory. But then, if an end user were to make their
own symlink to one of these emacs variants, they'll still face
this problem:

      % ln -s /usr/bin/emacs-gtk ~/bin/myemacs

Sure, that user could specify the pdmp file on the command
line, or wrap it in a shell script, but for default behavior,
I don't think that should be necessary. Why shouldn't an emacs,
found by whatever path, not be able to find its default pdmp file
in PATH_EXEC? I believe that the search done by emacs is just
missing a check that would solve this. The problem is that
load_pdump() in src/emacs.c only searches PATH_EXEC for
emacs.pdmp, and then for the pdmp file with the name given by:

      basename(argv[0])

If it were to also search for

      basename(realpath(argv[0]))

then it would find pdmp files that match either the given, or
"absolute" names, rather than just the given name, and symlinks
that point at emacs executables would always find the default
pdmp file that goes with them, whether we deliver them, or the
user makes their own.

I note that the discussion in 2019 spent a lot of time discussing
how realpath(argv[0])) can be unreliable, and that applications can
change argv[0]. That's true, but assuming that this is understood,
it is still useful most of the time. Note that the code in load_pdump()
already calls realpath(). It uses the result to locate the corresponding
pdmp file in the directory containing the executable. The only thing
missing is to take the next step, and apply the same information to
the search of PATH_EXEC.

It's not necessary to add more code to do this. It suffices to refactor
the existing code into a couple of functions that can be used to cover
the 2 existing cases, plus this new one. I've attached a patch to Emacs 27.2
that does that, to this message. Note however that I don't have a contributor
agreement in place, and it might be difficult for me to do that any time
soon. I'm hoping that this patch might still qualify, on the basis that
it is just a rearrangement of the existing code, and not at all novel.
Or, perhaps someone else will be willing to take the ideas and quickly
put together an official fix independently. Those ideas are:

      - Move the repeated code into functions, to clean up load_pdump().

      - Save the basename from the result of calling realpath() during the
        search of the executable directory.

      - During the PATH_EXEC stage, use the saved realpath basename to
        add a check for that name.

Thanks for your consideration.

- Ali


[-- Attachment #2: mediator.patch --]
[-- Type: text/plain, Size: 5111 bytes --]

# Enhance emacs to also look for the pdmp file in PATH_EXEC using the
# name seen in realpath(argv[0]). Emacs is typically run via the mediated
# /usr/bin/emacs symlink, which points at one of the emacs variants
# (emacs-gtk, emacs-x, emacs-nox). Adding the check for the name found
# via realpath() allows the actual emacs binary to find its pdump file.
#
--- emacs-27.2.orig/src/emacs.c	2021-01-28 10:52:38.000000000 +0000
+++ emacs-27.2/src/emacs.c	2021-04-14 13:12:46.732537010 +0000
@@ -766,6 +766,55 @@
 #endif	/* !WINDOWSNT */
 }
 
+/*
+ * basename() for pdump paths
+ */
+static char *
+load_pdump_basename(char *path)
+{
+  char *p, *last_sep = NULL;
+
+  for (p = path; *p; p++)
+    {
+      if (IS_DIRECTORY_SEP (*p))
+	 last_sep = p;
+    }
+
+    return last_sep ? last_sep + 1 : path;
+}
+
+/*
+ * Given a basename for the running emacs, attempt to open
+ * the corresponding pdump file in the directory given by path.
+ */
+static int
+load_pdump_from_path(const char *path, const char *name, const char *suffix,
+		     char **dump_file, ptrdiff_t *bufsize)
+{
+  ptrdiff_t needed = (strlen (path)
+		      + 1
+		      + strlen (name)
+		      + strlen (suffix)
+		      + 1);
+  if (*bufsize < needed)
+    {
+      xfree (*dump_file);
+      *dump_file = xmalloc (needed);
+      *bufsize = needed;
+    }
+#ifdef DOS_NT
+  ptrdiff_t name_len = strlen (name);
+  if (name_len >= 4
+      && c_strcasecmp (name + name_len - 4, ".exe") == 0)
+    sprintf (*dump_file, "%s%c%.*s%s", path, DIRECTORY_SEP,
+	     (int)(name_len - 4), name, suffix);
+  else
+#endif
+    sprintf (*dump_file, "%s%c%s%s",
+	     path, DIRECTORY_SEP, name, suffix);
+  return pdumper_load (*dump_file);
+}
+
 static void
 load_pdump (int argc, char **argv)
 {
@@ -817,6 +866,7 @@
      encoding the system natively uses for filesystem access, so
      there's no need for character set conversion.  */
   ptrdiff_t bufsize;
+  char *argv0_realbase = NULL;
   dump_file = load_pdump_find_executable (argv[0], &bufsize);
 
   /* If we couldn't find our executable, go straight to looking for
@@ -838,6 +888,9 @@
 #ifndef WINDOWSNT
       bufsize = exenamelen + 1;
 #endif
+      /* Save the realpath basename for possible use below */
+      argv0_realbase = xstrdup (load_pdump_basename (real_exename));
+
       if (strip_suffix)
         {
 	  ptrdiff_t strip_suffix_length = strlen (strip_suffix);
@@ -869,55 +922,23 @@
   /* Look for "emacs.pdmp" in PATH_EXEC.  We hardcode "emacs" in
      "emacs.pdmp" so that the Emacs binary still works if the user
      copies and renames it.  */
-  const char *argv0_base = "emacs";
-  ptrdiff_t needed = (strlen (path_exec)
-                      + 1
-                      + strlen (argv0_base)
-                      + strlen (suffix)
-                      + 1);
-  if (bufsize < needed)
+  result = load_pdump_from_path (path_exec, "emacs", suffix,
+				 &dump_file, &bufsize);
+
+  /* Finally, look for basename(argv0)+".pdmp" in PATH_EXEC, using both
+     absolute (realpath) and given forms. This way, they can access emacs
+     via a symlink of a different name, or rename both the executable and
+     its pdump file in PATH_EXEC, and have several Emacs configurations
+     in the same versioned libexec subdirectory.  */
+  if ((result == PDUMPER_LOAD_FILE_NOT_FOUND) && (argv0_realbase != NULL))
     {
-      xfree (dump_file);
-      dump_file = xpalloc (NULL, &bufsize, needed - bufsize, -1, 1);
+      result = load_pdump_from_path (path_exec, argv0_realbase, suffix,
+				     &dump_file, &bufsize);
     }
-  sprintf (dump_file, "%s%c%s%s",
-           path_exec, DIRECTORY_SEP, argv0_base, suffix);
-  result = pdumper_load (dump_file);
-
   if (result == PDUMPER_LOAD_FILE_NOT_FOUND)
     {
-      /* Finally, look for basename(argv0)+".pdmp" in PATH_EXEC.
-	 This way, they can rename both the executable and its pdump
-	 file in PATH_EXEC, and have several Emacs configurations in
-	 the same versioned libexec subdirectory.  */
-      char *p, *last_sep = NULL;
-      for (p = argv[0]; *p; p++)
-	{
-	  if (IS_DIRECTORY_SEP (*p))
-	    last_sep = p;
-	}
-      argv0_base = last_sep ? last_sep + 1 : argv[0];
-      ptrdiff_t needed = (strlen (path_exec)
-			  + 1
-			  + strlen (argv0_base)
-			  + strlen (suffix)
-			  + 1);
-      if (bufsize < needed)
-	{
-	  xfree (dump_file);
-	  dump_file = xmalloc (needed);
-	}
-#ifdef DOS_NT
-      ptrdiff_t argv0_len = strlen (argv0_base);
-      if (argv0_len >= 4
-	  && c_strcasecmp (argv0_base + argv0_len - 4, ".exe") == 0)
-	sprintf (dump_file, "%s%c%.*s%s", path_exec, DIRECTORY_SEP,
-		 (int)(argv0_len - 4), argv0_base, suffix);
-      else
-#endif
-      sprintf (dump_file, "%s%c%s%s",
-	       path_exec, DIRECTORY_SEP, argv0_base, suffix);
-      result = pdumper_load (dump_file);
+      result = load_pdump_from_path (path_exec, load_pdump_basename (argv[0]),
+				     suffix, &dump_file, &bufsize);
     }
 
   if (result != PDUMPER_LOAD_SUCCESS)
@@ -929,6 +950,8 @@
 
  out:
   xfree (dump_file);
+  if (argv0_realbase)
+	  xfree (argv0_realbase);
 }
 #endif /* HAVE_PDUMPER */
 

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2021-04-19 18:03 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-15 19:38 Finding the dump (redux) Ali Bahrami
2021-04-17 18:45 ` Eli Zaretskii
2021-04-18  0:15   ` Ali Bahrami
2021-04-18  7:55     ` Eli Zaretskii
2021-04-18  8:18       ` Andreas Schwab
2021-04-18 16:05         ` Glenn Morris
2021-04-19  4:53         ` Richard Stallman
2021-04-19  8:35           ` Andreas Schwab
2021-04-19 13:00             ` Eli Zaretskii
2021-04-19 13:04             ` Ali Bahrami
2021-04-19 13:14               ` Eli Zaretskii
2021-04-19 13:34                 ` Stefan Kangas
2021-04-19 14:39                   ` Eli Zaretskii
2021-04-19 15:41                     ` Ali Bahrami
2021-04-19 15:58                       ` Eli Zaretskii
2021-04-19 16:08                         ` Ali Bahrami
2021-04-19 17:09                           ` Eli Zaretskii
2021-04-19  4:01       ` Ali Bahrami
2021-04-19 13:26         ` Stefan Monnier
2021-04-19 14:28           ` Eli Zaretskii
2021-04-19 14:50             ` Stefan Monnier
2021-04-19 15:43             ` Ali Bahrami
2021-04-19 16:06               ` Eli Zaretskii
2021-04-19 16:39                 ` Ali Bahrami
2021-04-19 17:19                   ` Eli Zaretskii
2021-04-19 18:03                     ` Ali Bahrami

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).