From: Ali Bahrami
Newsgroups: gmane.emacs.devel
Subject: Re: Finding the dump (redux)
Date: Sun, 18 Apr 2021 22:01:51 -0600
Message-ID: <3079facf-9607-0069-cd49-d2a28fcc0d4d@emvision.com>
References: <5decf0e7-8f26-3fc7-7094-1bfdb211eefc@emvision.com> <83k0p0vjgn.fsf@gnu.org> <0aa226dd-50e3-ec62-e0ac-2b9194c3d90d@emvision.com> <831rb8uiv0.fsf@gnu.org>
In-Reply-To: <831rb8uiv0.fsf@gnu.org>
To: Eli Zaretskii
Cc: emacs-devel@gnu.org

Hi Eli,

Your message about not doing anything now, and possibly, doing
something later, is heard loud and clear. As I said up front, I have
a couple of ways to fix this that don't require anyone else, so I'm
fine with that. I brought this up because I think the way this works
is slightly broken, and could be fixed. I thought it was worth a shot
to see if we can't do something at the source, rather than having
various actors like myself apply one-off hacks. I'm sure we'll get
there.

I am however, going to continue on and respond to some of your
comments, because I'm not convinced by some of them, and I'd like to
explain why, so that if something is done later, this stuff would
have been discussed.

I also have a question: I'm warming to your suggestion about how we
might just refuse to look for the pdmp files for symlinks, and
instead use only the realpath basename for those cases. I think that
could be a nice simplification that might be smaller and safer than
what is currently on the table, and which does not add an additional
load, hence no slowdown. If I were to put together a patch to do
that, would you have any interest in looking at it?

On 4/18/21 1:55 AM, Eli Zaretskii wrote:
> Then place a symlink emacs.pdmp there, and have the actual pdumper
> file where you want it, under any name you want. Or move/copy the
> emacs-* executables to another directory, make the 'emacs' symlink
> resolve to those emacs-* files, and have the pdumper files in the same
> place. Or configure with a different value of $libdir when you build
> each emacs-* variant, and then have the corresponding emacs.pdmp file
> in the directory under /usr/lib that is private to that variant. Or
> use the --dump-file command-line option.
>
> There are many possible solutions that already work, so why insist on
> something that doesn't work? That it happened to work with unexec is
> just sheer luck: the upstream Emacs project never explicitly supported
> such configurations.

We seem to have a basic difference of opinion about /usr/bin. I don't
think it's OK for programs to drop their data files there, and I
can't think of any other significant examples of programs that do.
You say this as if it's a normal answer, but it seems odd and
atypical to me. So I'm not going to put anything in /usr/bin other
than executables, or symlinks that point at executables.

To be honest, I'm a bit surprised to be the first person to bring
this up, so either I'm alone in thinking it's wrong to put data files
in /usr/bin, or I'm just early. Time will tell, I suppose.

I do think putting the pdmp files next to the executable is a fine
answer for other places, particularly in the emacs build tree.
But it doesn't make sense for /usr/bin, to me.

About the idea of moving the binaries out of /usr/bin, to a place
where we could add the pdmp files: the problem there is that we want
users to have all those names in their PATH. Let me explain,
illustrated by this excerpt from the original message:

    % cd /usr/bin
    % ls -alFh emacs*
    lrwxrwxrwx 1 root root     9 Apr 14 22:15 emacs -> emacs-gtk*
    -r-xr-xr-x 2 root bin  7.05M Apr 14 22:15 emacs-gtk*
    -r-xr-xr-x 2 root bin  7.05M Apr 14 22:15 emacs-gtk-27.2*
    -r-xr-xr-x 2 root bin  6.09M Apr 14 22:15 emacs-nox*
    -r-xr-xr-x 2 root bin  6.09M Apr 14 22:15 emacs-nox-27.2*
    -r-xr-xr-x 2 root bin  7.07M Apr 14 22:15 emacs-x*
    -r-xr-xr-x 2 root bin  7.07M Apr 14 22:15 emacs-x-27.2*
    -r-xr-xr-x 1 root bin    47K Apr 14 22:15 emacsclient*

The intent here is that users who explicitly want the GTK version
will type 'emacs-gtk' or 'emacs-gtk-27.2'. The story is similar for
the Lucid (emacs-x), or pure tty (emacs-nox) versions. Users who
don't care, and just want to run something reasonable, run 'emacs'.

Moving those binaries elsewhere would let us put pdmp files next to
them, yes, but since they won't be in anyone's PATH, it's not very
useful.

>
>> A related idea that's been floated before would be for the
>> executable to carry the default dump data within itself.
>
> That idea didn't fly because it meant we again need to comply to
> various binary formats, which change with time out of our control.
> We'd eventually get into the same trouble as with unexec: the
> corresponding developers will refuse supporting the tricks we play for
> that to work, exactly as glibc dropped support for malloc hooks we
> needed to support unexec.
>
> More generally, that doesn't solve the general problem of how Emacs
> finds files it needs to start. Even if the dump data is in the
> executable itself, there could be other files that are similarly
> needed at startup.
> We already have that with the native-compilation
> feature: the *.eln files produced from the preloaded Lisp packages
> need to be located at startup, otherwise Emacs will be unable to
> start. We cannot possibly put everything inside the Emacs executable,
> even if we wanted to.

Well, I did say that I wasn't suggesting we go there, and I agree
that we don't want to. It's not a given, though, that things must
become a mess like unexec. Unexec was a mess because the approach is
inherently messy.

>>> But realpath(argv[0]) can produce to a file in another directory,
>>> because realpath expands all the symlinks, not just that of the
>>> basename. Does it make sense to look up the .pdmp file in the
>>> directory of the original argv[0] when it is a symlink?
>>
>> It's an interesting question, and I think can be argued
>> either way.
>
> Exactly. So who's to say which way is TRT? Whatever we decide, there
> could be another distro out there which will argue that the opposite
> makes sense because "it worked for them until now".

I'd say people at the top, like yourself, ultimately decide, just as
you do with many other things that various folks might second guess
later. My point was that I really don't think it matters in this
case, because both outcomes are defensible. Just pick the one you
prefer and document it.

As I mentioned above though, I'm really warming to this "no" option,
as it solves the problem, is simple to explain, and doesn't add any
additional loads.

>> I can imagine a scenario where it might be useful to
>> say "yes". It might offer a pretty slick way for end users
>> to create arbitrary pdmp files and associate them to specific
>> purposes. Suppose for instance that I want to use a special 'X'
>> dump file when working on "Project X" code.
>> I could create a special
>> name for that emacs variant as a symlink to the basic emacs-gtk
>> in my personal bin:
>>
>>     % ln -s /usr/bin/emacs-gtk ~/bin/emacsX
>>
>> Then, if I were to create ~/bin/emacsX.pdmp, and if emacs were
>> willing to see it as a pdmp file to be loaded, then I could
>> run my special emacsX, and get the standard emacs (from the
>> symlink) using my specialized X pdmp.
>
> We support the --dump-file command-line option for this purpose: using
> that you can have the pdumper file under any file name you want, all
> you need is a shell script or an alias that would add that option.

I think that's a good answer. And, it's also possibly how we might
settle the "Who's to say" question posed above. If we decide not to
load a pdmp file based on the name of a symlink, then the fix becomes
a matter of simply looking for the realpath basename in PATH_EXEC,
where we currently look for the given basename. The number of
possible loads remains the same as before, and debates about
slowdowns become moot.

> And if you are thinking about trying both, then (a) there's still the
> question of order (which could affect the correctness), and (b) it
> makes the startup slower, and soon enough people will start
> complaining about that.
...
>
>> The reverse question is, what harm does it do to look in PATH_EXEC
>> for both names?
>
> See above: it makes startup slower, and also runs the risk of picking
> up the wrong pdumper file and failing the startup altogether.
>

I'm not buying that this makes startup slower, and there are 2 layers
to my reasoning.

The first layer is that operating systems put a lot of effort into
making stat() on local files cheap. Anything that does path
searching, like shells, or like emacs when it searches for lisp
files, relies on this. Certainly, there's often a cache involved as
well, but those cases do many lookups, rather than the 2 we currently
do, or the additional one (making 3) that I'm suggesting.
You can measure the cost of this added stat(), but you'll never feel
it.

The second layer is that we're talking about the stage where we start
looking at PATH_EXEC. The PATH_EXEC stage is a backstop that is only
run when the --dump-file command-line option was not used, and no
pdmp file is found next to the executable. So in the world that
follows your advice of using those features, the PATH_EXEC stage
never runs, and costs 0.

If we do reach the PATH_EXEC stage, and we fail to find a pdmp file,
then the next thing that happens is that emacs will proceed to search
for, compile, and load, numerous elisp files, spewing their names to
stdout as it goes. The cost of this is definitely felt, unlike the
attempt to open the realpath basename version of the pdmp file, which
if successful, will prevent this expensive outcome.

So now, let's think about the issue of finding the wrong pdumper
file. I'm not sure I see how this can happen. The PATH_EXEC directory
isn't a place where emacs users put arbitrary content. The names
found here correspond to the names that emacs is installed under on
the system. If the user invents their own emacs name (e.g. myemacs),
then there will be no file in PATH_EXEC for them to accidentally
load. And if they run emacs under one of the installed names, then
they're going to find the right file.

One point I'd make here is that your suggestion that we not chase
pdmp files for symlinks used to run emacs really simplifies this,
because then the only names we'll ever look for in PATH_EXEC are
those of the actual installed binaries, and assuming the binary names
and pdmp names match, there can be no mixups.

>>> This is not enough, if we want to support *.pdmp files that have
>>> arbitrary names. For example, when Emacs is invoked as "../emacs" (or
>>> any other relative file name which includes slashes), we currently
>>> don't expand symlinks, so with your proposal "emacs" and "../emacs"
>>> will behave differently.
>>
>> I'm not sure I understand.
>> I have the proposed bits installed
>> on my desktop right now, and this does work as I expect.
>>
>>     % cd /usr/bin
>>     % ../bin/emacs
>>
>> As does
>>
>>     % emacs
>
> That's because you are running Emacs installed, so it looks for the
> pdumper file in the hardcoded place under PATH_EXEC, no matter what.
> I was alluding to the case that you run Emacs uninstalled, when the
> pdumper file is in the same directory where the Emacs binary lives.

In the case where emacs is uninstalled and the pdumper file is next
to it, we never look in PATH_EXEC, so my patch, which alters that
code, is irrelevant.

>> I don't see any code in load_pdump() that special cases
>> the case that includes slashes
>
> Look in load_pdump_find_executable, and you will see it.

I do see it, thanks. But note that load_pdump() calls realpath() on
the result from load_pdump_find_executable(), so both 'emacs' and
'../bin/emacs' yield the same absolute path (e.g. /usr/bin/emacs-gtk),
and my patch sees the same string in either case.

> Having said all of the above, since we are currently working on
> related issues on the native-compilation branch, it is possible that
> we eventually will teach Emacs to support also the arrangement you
> want to work in your case. But I make no promises, and in any case
> this will not hit the street before Emacs 28.1, which is probably
> still a year or more in the future. We don't expect another 27.x
> release, and even if there is such a release, it will probably be to
> fix some very grave bug, so unsuitable for extending existing
> features. So it's your call whether to wait for Emacs 28 in the hope
> that maybe it fixes your problem, or redesign your deployment now to
> use some arrangement that already works.
>

OK, sounds good.

Thanks.

- Ali
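
For readers of the archive, the "realpath basename" lookup discussed
in this message can be illustrated with a short sketch. It is written
in Python purely for illustration; it is not the patch under
discussion and not the C code in load_pdump(). The function name
`pdmp_candidate` and the `path_exec` argument (a stand-in for the
PATH_EXEC directory) are hypothetical, and the sketch assumes the dump
file is named after the installed binary with a `.pdmp` suffix.

```python
import os


def pdmp_candidate(invoked_path, path_exec):
    """Return the one dump file name to try in path_exec.

    Hypothetical helper: symlinks are resolved first, so the name is
    always built from the real installed binary, never from the name
    of a symlink that was used to start the program.
    """
    # Resolve every symlink in the path, e.g.
    # /usr/bin/emacs -> /usr/bin/emacs-gtk
    real = os.path.realpath(invoked_path)
    # Keep only the final component, e.g. "emacs-gtk"
    base = os.path.basename(real)
    # A single candidate, so the number of lookups does not grow
    return os.path.join(path_exec, base + ".pdmp")
```

Under these assumptions, a personal symlink such as ~/bin/emacsX that
points at /usr/bin/emacs-gtk would map to emacs-gtk.pdmp in the
PATH_EXEC stand-in, never to emacsX.pdmp, which matches the "no"
option described above.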