Hi Hartmut,

Awesome analysis! Thank you for taking point on this. I will offer some
feedback. I hope it is useful.

The short version is: I think Python should let us explicitly tell it
where its system site directory is. If Python provided such a feature,
then I think we could use it and avoid all these problems. I think this
would be better than modifying the heuristics that Python uses for
finding its system site during start-up (although I think that is a good
back-up plan), since those heuristics are complicated and difficult to
control. It would just be simpler if we could explicitly tell Python
where its site directory is, instead of indirectly arranging for Python
to find its site directory via its module-lookup Rube-Goldberg machine.

Hartmut Goebel writes:

> This python interpreter does not find the site-packages in GUIX_PROFILE
> since site-packages are search relative to "sys.base_prefix" (which is
> the same as "sys.prefix" except in virtual environments).
> "sys.base_prefix" is determined based on the executable's path (argv[0])
> by resolving all symlinks.

I am familiar with this problem. Any time you want to deploy Python and
its libraries by building up a symlink tree, and you put Python in a
part of the file system that lives far away from the libraries
themselves, Python will punish you cruelly with this behavior. It is no
fun at all. :-( You always have to come up with silly hacks to work
around it, and those hacks don't work generally in every case.

Question: Why does Python insist on canonicalizing its executable path?
It always seemed to me like if Python just used the original path, these
problems would not occur. People who use symlink trees to deploy Python
would be happy. Perhaps I am missing some information. What is the
intent behind Python's decision to canonicalize the executable path?
What problems occur if Python doesn't do that?
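
To make the canonicalization issue concrete, here is a minimal sketch in
Python. It is only a simplified model, not Python's actual start-up
code, which is considerably more involved. The directory names and the
helpers prefix_from_executable and build_demo_tree are hypothetical, and
the model assumes the prefix sits two levels above the executable
(<prefix>/bin/python).

```python
import os
import tempfile


def prefix_from_executable(executable, resolve_symlinks=True):
    """Simplified model: derive a prefix from <prefix>/bin/python.

    With resolve_symlinks=True the path is canonicalized first, which
    mimics the behavior discussed above. This is an illustration only.
    """
    if resolve_symlinks:
        path = os.path.realpath(executable)
    else:
        path = os.path.abspath(executable)
    return os.path.dirname(os.path.dirname(path))


def build_demo_tree():
    """Build a tiny store-like and profile-like tree in a temp directory."""
    # realpath() so the temp directory itself contains no symlinks.
    root = os.path.realpath(tempfile.mkdtemp())
    store_bin = os.path.join(root, "store", "python-X.Y", "bin")
    profile_bin = os.path.join(root, "profile", "bin")
    os.makedirs(store_bin)
    os.makedirs(profile_bin)

    # Stand-in for the real interpreter binary in the store.
    real_python = os.path.join(store_bin, "python")
    open(real_python, "w").close()

    # The profile only holds a symlink pointing into the store.
    link = os.path.join(profile_bin, "python")
    os.symlink(real_python, link)

    resolved = prefix_from_executable(link, resolve_symlinks=True)
    unresolved = prefix_from_executable(link, resolve_symlinks=False)
    return root, resolved, unresolved


if __name__ == "__main__":
    root, resolved, unresolved = build_demo_tree()
    print("canonicalized prefix:", resolved)
    print("original-path prefix:", unresolved)
```

In this model, the canonicalized prefix ends up in the store-like
directory, while the original-path prefix stays at the profile-like
directory, which is where the symlink tree of libraries would live.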

> The python interpreter assumes "site-packages" to be relative to "where
> python is installed" - called "sys.base_prefix" (which is the same as
> "sys.prefix" except in virtual environments). "sys.base_prefix" is
> determined based on the executable's path (argv[0]) by resolving all
> symlinks. For Guix this means: "sys.base_prefix" will always point to
> /gnu/store/…-python-X.Y, not to GUIX_PROFILE. Thus the site-packages
> installed into the guix profile will not be found.

Yes. This is a problem. As you know, this heuristic fails spectacularly
when you try to deploy Python in a symlink tree.

Question: Why does Python not supply a way to "inject" the system site
directory? In Guix-deployed systems, we are the masters of reality. We
control ALL the paths. We can tell Python exactly where its "system
site" is - we can build a symlink tree of its system site in the store
and then tell Python to use that site specifically. For example, if
Python would let us specify this path via a PYTHON_SYSTEM_SITE
environment variable, then I think it would solve many (all?) of our
problems. Perhaps this is similar to what you are suggesting regarding
GUIX_PYTHON_X.Y_SITE_PACKAGES and GUIX_PYTHONHOME_X.Y.

> This is why we currently (mis-) use PYTHONPATH: To make the
> site-packages installed into the guix profile available.

I agree that this is a mis-use. People do it because Python doesn't
provide any better way. And then people find out about all its terrible
down-sides, like for example the fact that .pth files will not be
processed if they appear on the PYTHONPATH. And then they do stuff like
hack site.py to walk the PYTHONPATH and evaluate all the .pth files,
which is gross but sort of works. Just thinking about the pain I have
experienced with this stuff makes my blood boil.

> no. 2
> suggests using a mechanism already implemented in python: Setting
> "PYTHONHOME" will make the interpreter to use this as "sys.base_prefix"
> unconditionally.
> Again there is only one PYTHONHOME variable for all
> versions of python (designed by upstream). We could work around this
> easily (while keeping upstream compatibility) by using
> GUIX-PYTHONHOME-X.Y, to be evaluated just after PYTHONHOME.

Are there legitimate use cases where a user wants to set their own
PYTHONHOME? If so, would our use of PYTHONHOME prevent them from doing
that? If so, that seems bad.

In the past, I have used PYTHONUSERBASE (or maybe it was PYTHONUSERSITE,
I can't remember exactly which) to make Python find libraries in a
symlink tree. However, because that is intended for users to use, I
don't think it's a good solution for us here. If we co-opt these
environment variables, then users would not be able to use them.

> The drawback is: This is implemented using an environment variable,
> which might not give the expected results in all cases. E.g. running
> /gnu/store/…-profile/bin/python will not load the site-packages of that
> profile. Also there might be issues implementing virtual environments.
> (Thinking about this, I'm quite sure there will. Ouch!)

I wouldn't be surprised if that's true, but right now, I can't think of
any specific virtualenv-related problems that would occur by using
PYTHONHOME.

> no.3
> suggests changing the way the python interpreter is resolving symlinks
> when searching for "sys.base_prefix". The idea is to stop "at the profile".
>
> The hard part of this is to determine "at the profile". Also this needs
> a larger patch. But if we manage to implement this, it would be perfect.
> I could contribute a draft for this implemented in Python. The
> C-implementation needs to be done by some C programmer.

This seems a little tricky, mainly because it's going to rely again on
heuristics that may not always be accurate. As I mentioned above, in
Guix we are the masters of reality, so why can't we just tell Python
exactly where its system site path is?
If Python needs to be taught how to be informed of such things, perhaps
that is the patch we should write: a patch that enables us to tell
Python exactly where its system site directory will be found.

> Which way should we go?

I think we should figure out a way to tell Python EXACTLY where its
system site directory is. If that isn't viable, then I think the next
best thing will be to adjust the site-finding heuristics (your proposal
No. 3).

Hartmut Goebel writes:

> As it stands now, the venv-hack is not a valid solution. It may be the basis
> for another solution, tough.

I agree. We need a solution that allows users to use virtualenv the way
they would normally on any other foreign distro, if they want to.

> 1. How could GUIX-PYTHON-X.Y-SITE-PACKAGES be implemented?
> =============================================================
>
> [...]
>
> 2. How could GUIX-PYTHONHOME-X.Y be implemented?
> =================================================

How do these two methods (GUIX-PYTHON-X.Y-SITE-PACKAGES
vs. GUIX-PYTHONHOME-X.Y) differ? They seem to serve basically the same
purpose.

> 3. How to avoid GUIX-PYTHONHOME[23]?
> =========================================
>
> We could avoid GUIX-PYTHONHOME[23] if we stop resolving the symlinks at
> the correct point in iteration.
>
> [...]
>
> Drawbacks:
>
> - More complicated patch.
>
> - More comparison within a look, this will slow down start-up a bit.
>
> Open questions:
>
> - Which are the correct paths to check to stop iteration?
> - How to handle the "pythonX" -> "pythonX.Y" link?
> - How to handle "python-wrapper", which links python -> python3

Instead of modifying Python's heuristics for finding its site, it'd be
better if Python just exposed a way for us to explicitly tell it where
its site directory is. However, if we really want to modify the
heuristics, I can think of some possible ideas for how to do it:

* Don't canonicalize the path in the first place.
* Stop just before the first path that is in the store.
* Stop at the first path that is in the store.
* Stop at a path that matches a special pattern that we control, like
  "guix-python-site" or something. We could create

> 4. Path-handling in Python's start-up sequence

As you've shown, the way Python handles paths when it starts up is quite
complicated. This is another reason why I would prefer not to change
the heuristics, but instead to expose a way for us to explicitly tell
Python where its site is.

-- 
Chris