unofficial mirror of guix-devel@gnu.org 
* Alternative solution to stat storm problem
@ 2022-01-03 20:05 Farid Zakaria
  2022-01-08 21:22 ` Ludovic Courtès
  0 siblings, 1 reply; 7+ messages in thread
From: Farid Zakaria @ 2022-01-03 20:05 UTC (permalink / raw)
  To: guix-devel; +Cc: Scogland, Tom, Carlos Maltzahn

Hi!

I was very inspired by the blog post on a per-application
ld.so.conf.cache to solve the stat-storm problem[1].

I wanted to share another approach that I am pursuing and eventually
hope to merge into NixOS. I thought starting a discussion here on Guix
would be fruitful, since this is where the idea originated.

I have written a tool _shrinkwrap_ [2] that takes all transitive
dynamic shared object dependencies (only those listed in DT_NEEDED)
and turns them into an absolute path.

This has the same result as caching the entries and avoids the
unnecessary failed attempts at trying each RUNPATH entry.
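
To make the idea concrete, here is a minimal sketch in Python of the resolution step described above. It does not parse ELF files or call PatchELF: the `Obj` type, the `resolve` and `shrinkwrap_needed` functions, and the toy store layout are all hypothetical, and the real tool may well work differently.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    """Toy stand-in for an ELF object: its DT_NEEDED names and RUNPATH dirs."""
    needed: list = field(default_factory=list)
    runpath: list = field(default_factory=list)


def resolve(name, runpath, files):
    """Return the first RUNPATH candidate that exists, or None."""
    if "/" in name:
        # Names containing a slash are treated as paths, not searched for.
        return name if name in files else None
    for directory in runpath:
        candidate = f"{directory}/{name}"
        if candidate in files:
            return candidate
    return None


def shrinkwrap_needed(root, files):
    """Breadth-first walk of DT_NEEDED; returns absolute paths, each once."""
    out, seen, queue = [], set(), [root]
    while queue:
        obj = files[queue.pop(0)]
        for name in obj.needed:
            target = resolve(name, obj.runpath, files)
            if target is None:
                raise FileNotFoundError(name)
            if target not in seen:
                seen.add(target)
                out.append(target)
                queue.append(target)
    return out


# Hypothetical layout: one directory per package, as in Nix or Guix.
FILES = {
    "/store/app/bin/app": Obj(["libb.so", "liba.so"],
                              ["/store/a/lib", "/store/b/lib"]),
    "/store/a/lib/liba.so": Obj(["libc.so"], ["/store/c/lib"]),
    "/store/b/lib/libb.so": Obj(),
    "/store/c/lib/libc.so": Obj(),
}

for path in shrinkwrap_needed("/store/app/bin/app", FILES):
    print(path)
```

The resulting list is what would be written back as NEEDED entries, so that at run time each library is opened directly instead of being probed for in every RUNPATH directory.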

Using the same demo application _emacs_ shows as much as well:

$ strace -e openat,stat -c ./emacs_stamped --version
GNU Emacs 27.2
Copyright (C) 2021 Free Software Foundation, Inc.
GNU Emacs comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GNU Emacs
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING.
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.000950           9       104         1 openat
------ ----------- ----------- --------- --------- ----------------
100.00    0.000950           9       104         1 total

$ strace -e openat,stat -c \
    /nix/store/vvxcs4f8x14gyahw50ssff3sk2dij2b3-emacs-27.2/bin/.emacs-27.2-wrapped \
    --version
GNU Emacs 27.2
Copyright (C) 2021 Free Software Foundation, Inc.
GNU Emacs comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GNU Emacs
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING.
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.034121          18      1823      1720 openat
------ ----------- ----------- --------- --------- ----------------
100.00    0.034121          18      1823      1720 total
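
For readers who want the comparison spelled out, the short Python snippet below derives a few ratios from the two summaries. The numbers are copied from the tables above; the variable and function names are only illustrative, and since each figure comes from a single run the timing ratio in particular should be read as indicative.

```python
# Figures copied from the two `strace -c` summaries above.
stamped = {"calls": 104, "errors": 1, "seconds": 0.000950}
original = {"calls": 1823, "errors": 1720, "seconds": 0.034121}


def failed_fraction(run):
    """Share of openat calls that returned an error."""
    return run["errors"] / run["calls"]


call_ratio = original["calls"] / stamped["calls"]
time_ratio = original["seconds"] / stamped["seconds"]

print(f"failed openat, original: {failed_fraction(original):.1%}")
print(f"failed openat, stamped:  {failed_fraction(stamped):.1%}")
print(f"openat calls: {call_ratio:.1f}x fewer")
print(f"time spent in openat: {time_ratio:.1f}x lower")
```

In other words, roughly 94% of the original binary's openat calls are failed lookups, and the stamped binary makes about 17.5 times fewer openat calls overall.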

Happy to hear some thoughts on this approach.

[1] https://guix.gnu.org/blog/2021/taming-the-stat-storm-with-a-loader-cache/
[2] https://github.com/fzakaria/shrinkwrap



* Re: Alternative solution to stat storm problem
  2022-01-03 20:05 Alternative solution to stat storm problem Farid Zakaria
@ 2022-01-08 21:22 ` Ludovic Courtès
  2022-01-09  3:00   ` Farid Zakaria
  0 siblings, 1 reply; 7+ messages in thread
From: Ludovic Courtès @ 2022-01-08 21:22 UTC (permalink / raw)
  To: Farid Zakaria; +Cc: guix-devel, Scogland, Tom, Carlos Maltzahn

Hi Farid,

Farid Zakaria <fmzakari@ucsc.edu> skribis:

> I have written a tool _shrinkwrap_ [2] that takes all transitive
> dynamic shared object dependencies (only those listed in DT_NEEDED)
> and turns them into an absolute path.
>
> This has the same result as caching the entries and avoids the
> unnecessary failed attempts at trying each RUNPATH entry.
>
> Using the same demo application _emacs_ shows as much as well:

Nice!  I think that’s another interesting way to address the problem.

I guess the advantage is that you don’t need the ld.so patch.  The
downside is that PatchELF needs to be able to write longer NEEDED
strings in the dynamic section, which it may not always be successful at
(I think?).

Also, I wonder if the absolute file names in NEEDED interfere with uses
of $LD_LIBRARY_PATH (making it impossible to force use of another
libxyz.so than the one that would be found in RUNPATH.)

Thoughts?

Thanks for sharing!

Ludo’.



* Re: Alternative solution to stat storm problem
  2022-01-08 21:22 ` Ludovic Courtès
@ 2022-01-09  3:00   ` Farid Zakaria
  2022-01-09  3:05     ` Farid Zakaria
  2022-01-18 13:56     ` Ludovic Courtès
  0 siblings, 2 replies; 7+ messages in thread
From: Farid Zakaria @ 2022-01-09  3:00 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel, Scogland, Tom, Carlos Maltzahn

Hi Ludovic,

On Sat, Jan 8, 2022 at 1:22 PM Ludovic Courtès <ludo@gnu.org> wrote:
>
> Hi Farid,
>
> Farid Zakaria <fmzakari@ucsc.edu> skribis:
>
> > I have written a tool _shrinkwrap_ [2] that takes all transitive
> > dynamic shared object dependencies (only those listed in DT_NEEDED)
> > and turns them into an absolute path.
> >
> > This has the same result as caching the entries and avoids the
> > unnecessary failed attempts at trying each RUNPATH entry.
> >
> > Using the same demo application _emacs_ shows as much as well:
>
> Nice!  I think that’s another interesting way to address the problem.
>
> I guess the advantage is that you don’t need the ld.so patch.  The
> downside is that PatchELF needs to be able to write longer NEEDED
> strings in the dynamic section, which it may not always be successful at
> (I think?).

I can't claim to be an ELF specification guru, but I have not
encountered a case where longer NEEDED strings caused a failure.
The emacs example is a pretty good test case because the transitive
closure of all NEEDED libraries is quite large, and all of them seem to
be added successfully to the dynamic section.

The benefits, as I see them:
1 - it does not need a glibc patch to work (although other libcs such
as musl might need one; see
https://www.openwall.com/lists/musl/2021/12/21/1)
2 - understanding the dependencies of an application becomes simpler
3 - there are esoteric cases where libraries might in fact link to the
wrong libraries at run time (although they were correct at build time)
given a RUNPATH/RPATH, since there are subtleties with the inheritance
model.

I'm also researching ways to improve (3), under Tom Scogland's
mentorship, by exploring alternative ways to do linking:
- RUNPATH per NEEDED
- the ability to specify whether a RUNPATH should be inherited or not
to downstream dependencies
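
The inheritance subtlety behind point 3 can be illustrated with a simplified model of the behaviour documented in ld.so(8): DT_RPATH applies to the whole dependency subtree, whereas DT_RUNPATH applies only to the direct DT_NEEDED entries of the object that carries it. The Python sketch below is a toy model with hypothetical names (`Obj`, `search_dirs`); it ignores the loader cache and default directories, and assumes each object has either an RPATH or a RUNPATH, not both.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    name: str
    rpath: list = field(default_factory=list)
    runpath: list = field(default_factory=list)


def search_dirs(chain, ld_library_path=()):
    """Directories tried when resolving a dependency of chain[-1].

    `chain` is the load chain from the executable down to the requester.
    """
    requester = chain[-1]
    dirs = []
    if not requester.runpath:
        # DT_RPATH is inherited: requester first, then each loader above it.
        for obj in reversed(chain):
            dirs.extend(obj.rpath)
    dirs.extend(ld_library_path)
    # DT_RUNPATH is not inherited: only the requester's own entries apply.
    dirs.extend(requester.runpath)
    return dirs


exe_runpath = Obj("app", runpath=["/store/a/lib", "/store/c/lib"])
exe_rpath = Obj("app", rpath=["/store/a/lib", "/store/c/lib"])
liba = Obj("liba.so")  # carries no RPATH or RUNPATH of its own

print(search_dirs([exe_runpath, liba]))  # nothing inherited from RUNPATH
print(search_dirs([exe_rpath, liba]))    # RPATH reaches liba.so's lookups
```

With RUNPATH on the executable, the lookups made on behalf of liba.so get no directories from the executable at all, which is one way a dependency that resolved correctly at build time can resolve differently, or not at all, at run time.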

> Also, I wonder if the absolute file names in NEEDED interfere with uses
> of $LD_LIBRARY_PATH (making it impossible to force use of another
> libxyz.so than the one that would be found in RUNPATH.)

Correct. For a system with reproducibility in mind this can perhaps be
a desired feature.
It is a current limitation of the proposal.
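
The reason is documented in ld.so(8): a dependency name that contains a slash is used as a path, and no directory search takes place. A small Python model of that rule follows; `candidates` is a hypothetical helper that covers only LD_LIBRARY_PATH and DT_RUNPATH, and leaves out DT_RPATH, the ld.so cache and the default directories.

```python
def candidates(needed, runpath, ld_library_path=""):
    """Paths the loader would try, in order, for one DT_NEEDED entry.

    Simplified model: empty LD_LIBRARY_PATH components are skipped here.
    """
    if "/" in needed:
        # A name with a slash is used as a path; no search happens.
        return [needed]
    dirs = [d for d in ld_library_path.split(":") if d] + list(runpath)
    return [f"{d}/{needed}" for d in dirs]


# A bare soname can be redirected by LD_LIBRARY_PATH...
print(candidates("libxyz.so", ["/store/xyz/lib"], "/tmp/override"))
# ...whereas an absolute NEEDED entry cannot.
print(candidates("/store/xyz/lib/libxyz.so", ["/store/xyz/lib"], "/tmp/override"))
```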

In fact, Carlos brought up a great philosophical question:
"Is linking to libraries through a content-addressable value allowed
for LGPL software?"
What if the linked address also forced the content-address by having
it resolve to something on IPFS?

> Thoughts?
>
> Thanks for sharing!
>
> Ludo’.



* Re: Alternative solution to stat storm problem
  2022-01-09  3:00   ` Farid Zakaria
@ 2022-01-09  3:05     ` Farid Zakaria
  2022-01-10 18:13       ` Tom Scogland
  2022-01-18 13:56     ` Ludovic Courtès
  1 sibling, 1 reply; 7+ messages in thread
From: Farid Zakaria @ 2022-01-09  3:05 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel, Scogland, Tom, Carlos Maltzahn

I forgot to mention, regarding LD_LIBRARY_PATH, that you can still make
use of LD_PRELOAD. I am also thinking about using something like
dlopen-resolver[1] to further expand the NEEDED section.

[1] https://github.com/Mic92/dlopen-resolver


* Re: Alternative solution to stat storm problem
  2022-01-09  3:05     ` Farid Zakaria
@ 2022-01-10 18:13       ` Tom Scogland
  2022-01-18 14:00         ` Ludovic Courtès
  0 siblings, 1 reply; 7+ messages in thread
From: Tom Scogland @ 2022-01-10 18:13 UTC (permalink / raw)
  To: Farid Zakaria; +Cc: Ludovic Courtès, guix-devel, Carlos Maltzahn

Hi Ludovic, thanks for your thoughts.

You’re right, LD_LIBRARY_PATH will not change the loading order, but
LD_PRELOAD will, by the same mechanism we’re using: pre-filling the
cache with a library at the same soname.

As part of other explorations we’re doing around tweaking or wrapping
the loader, it may be possible to get semantics like LD_LIBRARY_PATH in
other ways. At the moment, though, the goal is to make a program that
will correctly load all the dependencies it would have loaded were it
run in the same environment it was built in, despite LD_LIBRARY_PATH,
RUNPATH in dependencies, or similar.

Making a little tool that would override libraries the same way
LD_LIBRARY_PATH would have would be relatively straightforward, though.
Do you think that would be worth exploring?

-Tom


* Re: Alternative solution to stat storm problem
  2022-01-09  3:00   ` Farid Zakaria
  2022-01-09  3:05     ` Farid Zakaria
@ 2022-01-18 13:56     ` Ludovic Courtès
  1 sibling, 0 replies; 7+ messages in thread
From: Ludovic Courtès @ 2022-01-18 13:56 UTC (permalink / raw)
  To: Farid Zakaria; +Cc: guix-devel, Scogland, Tom, Carlos Maltzahn

Hi Farid,

Farid Zakaria <fmzakari@ucsc.edu> skribis:

[...]

>> I guess the advantage is that you don’t need the ld.so patch.  The
>> downside is that PatchELF needs to be able to write longer NEEDED
>> strings in the dynamic section, which it may not always be successful at
>> (I think?).
>
> I can't claim to be a ELF specification guru but I have not
> encountered that longer NEEDED strings to be a cause for failure.
> The emacs example is a pretty good test case because the transitive
> closure of all NEEDED libraries is quite large, which all seem to be
> added successfully to the ELF header.

Well, we’d need a closer look, but I think PatchELF may need to enlarge
the relevant string table, and that may not always be possible.
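
For context on why growth matters: each DT_NEEDED entry stores an offset into the dynamic string table (.dynstr), where names are kept as NUL-terminated strings. The Python sketch below models only that layout, using hypothetical helpers (`add_string`, `read_string`) and a made-up store path; it says nothing about how PatchELF actually resizes or relocates sections.

```python
def add_string(strtab: bytearray, s: str) -> int:
    """Append a NUL-terminated string; return its offset (the DT_NEEDED value)."""
    offset = len(strtab)
    strtab.extend(s.encode() + b"\0")
    return offset


def read_string(strtab, offset: int) -> str:
    """Read the NUL-terminated string that starts at `offset`."""
    end = strtab.index(b"\0", offset)
    return bytes(strtab[offset:end]).decode()


strtab = bytearray(b"\0")  # ELF string tables begin with a NUL byte
short = add_string(strtab, "libxyz.so")
size_before = len(strtab)

# Hypothetical absolute name; the elided hash is illustrative only.
long_name = "/gnu/store/...-xyz/lib/libxyz.so"
absolute = add_string(strtab, long_name)
growth = len(strtab) - size_before

print(read_string(strtab, short), "->", read_string(strtab, absolute))
print("string table grew by", growth, "bytes")
```

A longer name cannot overwrite the shorter one in place without clobbering whatever string follows it, so the table has to grow, and whether that is always possible in an already-linked file is the open question here.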

> The benefit to me seems:
> 1 - does not need a glibc patch for functionality (although for other
> libc such as musl it might in this case
> https://www.openwall.com/lists/musl/2021/12/21/1)
> 2 - understanding the dependencies of an application become simpler
> 3 - there are esoteric cases where in fact libraries might link to the
> wrong libraries (although they are correct at build time) given a
> RUNPATH/RPATH since there are subtleties with the inheritance model.
>
> I'm actually researching ways to improve (3) as well through
> mentorship with Tom Scogland by researching alternative ways to do
> linking:
> - RUNPATH per NEEDED
> - the ability to specify whether a RUNPATH should be inherited or not
> to downstream dependencies

OK.

>> Also, I wonder if the absolute file names in NEEDED interfere with uses
>> of $LD_LIBRARY_PATH (making it impossible to force use of another
>> libxyz.so than the one that would be found in RUNPATH.)
>
> Correct. For a system with reproducibility in mind this can perhaps be
> a desired feature.
> It is the current limitation of the proposal.

I think it’s still useful to allow users to bypass normal mechanisms, be
it via LD_LIBRARY_PATH or LD_PRELOAD.

> In fact, Carlos brought up a great philosophical question:
> "Is linking to libraries through a content-addressable value allowed
> for LGPL software?"
> What if the linked address also forced the content-address by having
> it resolve to something on IPFS ?

Oh, you mean it could be thought of as static linking, conceptually?
Good question.

Ludo’.



* Re: Alternative solution to stat storm problem
  2022-01-10 18:13       ` Tom Scogland
@ 2022-01-18 14:00         ` Ludovic Courtès
  0 siblings, 0 replies; 7+ messages in thread
From: Ludovic Courtès @ 2022-01-18 14:00 UTC (permalink / raw)
  To: Tom Scogland; +Cc: Farid Zakaria, guix-devel, Carlos Maltzahn

Hi,

Tom Scogland <scogland1@llnl.gov> skribis:

> You’re right, the LD_LIBRARY_PATH will not change the loading order,
> but using LD_PRELOAD will by the same mechanism we’re using,
> pre-filling the cache with a library at the same soname.  As part of
> other explorations we’re doing around tweaking or wrapping the loader,
> it may be possible to get semantics like LD_LIBRARY_PATH other ways,
> but at the moment the goal is to make a program that will correctly
> load all the dependencies it would have loaded were it run in the same
> environment it was built in, despite LD_LIBRARY_PATH or RUNPATH in
> dependencies or similar.  Making a little tool that would override the
> same way LD_LIBRARY_PATH would have would be relatively
> straightforward though, would that be worth exploring do you think?

Sure, why not.

My approach was: take the loader and its mechanisms as they exist, and
make the minimal changes needed to adapt them to the
one-directory-per-package layout, whereas they currently assume the FHS.

The approach you describe is more about keeping the loader unchanged and
working around its FHS assumptions “from the outside”.  In that spirit,
a separate mechanism for LD_LIBRARY_PATH-kind of user overrides might
make sense.
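
One possible shape for such a separate override mechanism, sketched under heavy assumptions: a list of override directories is consulted by basename before the absolute NEEDED entry is used. The `apply_overrides` function and its parameters are invented for illustration and are not part of shrinkwrap, glibc, or any existing loader.

```python
import os.path


def apply_overrides(needed, override_dirs, exists=os.path.exists):
    """Replace absolute NEEDED entries whose basename exists in an override dir.

    The first matching directory wins, mirroring how a search path is read.
    """
    result = []
    for path in needed:
        base = os.path.basename(path)
        for directory in override_dirs:
            candidate = os.path.join(directory, base)
            if exists(candidate):
                result.append(candidate)
                break
        else:
            # No override found: keep the pinned absolute path.
            result.append(path)
    return result


# Usage with a fake file system, so the example does not touch the disk.
fake_files = {"/tmp/override/libxyz.so"}
print(apply_overrides(
    ["/store/a/lib/libxyz.so", "/store/b/lib/libq.so"],
    ["/tmp/override"],
    exists=fake_files.__contains__,
))
```

Keeping the override outside the loader fits the "from the outside" spirit described above: the pinned paths stay the default, and the user opts in to replacing specific libraries.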

Thanks,
Ludo’.

