From: Hartmut Goebel
Subject: PYTHONPATH issue analysis - part 3 (was: PYTHONPATH woes)
Date: Sun, 11 Mar 2018 22:47:17 +0100
References: <87371tqbyb.fsf@elephly.net> <20180223165953.GA6088@thebird.nl>
To: Pjotr Prins
Cc: guix-devel@gnu.org

Hi,

here is my third part of the analysis:

Result
==========

We can avoid all of the problems related to how Guix is using PYTHONPATH
quite simply. This will work for virtual environments, too.

Preliminary Proposal
=======================

To be able to install different minor versions of Python in the same
profile, any environment variable should contain the minor version, too.
E.g. …-3.5.

Option 2 (GUIX-PYTHONHOME-X.Y) should be implemented since it is simple.

If we can get option 3 (stop resolving symlinks at the correct
iteration) to work, this might be a better solution.

Rationale
===========

1.
   Is setting GUIX-PYTHON-X.Y-SITE-PACKAGES enough?

   This should work for most cases, but might break Python applications
   expecting site-packages to be below sys.prefix. Thus setting
   (GUIX-)PYTHONHOME-X.Y is a better solution.

2. Would GUIX-PYTHONHOME-2.7, …-3.4, …-3.5 work?

   Yes, this would work, but an environment variable would still be
   required.

3. Can we get by without any environment variable?

   Yes, if we manage to stop resolving symlinks at the correct
   iteration. This might be complicated to achieve.

4. How does path-handling in Python's start-up sequence work?

   See detailed analysis below.


1. How could GUIX-PYTHON-X.Y-SITE-PACKAGES be implemented?
=============================================================

Given the analysis below, it would be possible to patch site.py to make
it use these environment variables. I did not look at the details yet,
but since the site-package paths are only set by site.py, this should
not be much of an issue.

To be able to install different minor versions of Python in the same
profile, the variables should contain the minor version, too. E.g.
…-3.5.

Drawbacks:

- sys.prefix and sys.exec_prefix would still point to the store, not to
  the profile. This might break Python applications expecting
  site-packages to be below sys.prefix.


2. How could GUIX-PYTHONHOME-X.Y be implemented?
=================================================

Given the analysis below, it should be okay to implement
GUIX-PYTHONHOME-X.Y like this:

- In Py_GetPythonHome() (Python/pythonrun.c), after checking for
  PYTHONHOME, check for GUIX-PYTHONHOME-X.Y.

  This will affect step 2 below for non-venvs and step 3 for venvs.

  This should be safe for virtual environments, too, since pyvenv.cfg
  is searched based on argv0_path.
  This should be safe for stacked virtual environments, too, since
  pyvenv.cfg will still point to the prior venv and sys.prefix and
  sys.exec_prefix will be set correctly in site.py.

- Implement a "search-path" GUIX-PYTHONHOME-X.Y

- To be able to install different minor versions of Python in the same
  profile, the variables should contain the minor version, too. E.g.
  …-3.5.

Drawbacks:

- We need to ensure GUIX-PYTHONHOME-X.Y is a single path, not a list of
  paths. Or we split the variable in Py_GetPythonHome().

- Requires GUIX-PYTHONHOME-X.Y to be set in the respective environment.


3. How to avoid GUIX-PYTHONHOME[23]?
=========================================

We could avoid GUIX-PYTHONHOME[23] if we stop resolving the symlinks at
the correct point in the iteration. Something like this:

    import os

    SEP = os.sep

    # progpath and GUIX_PYTHON_OUTPUT are given; GUIX_PYTHON_OUTPUT is
    # out + "/bin/python", fixed at compile time.

    # resolve the symlink
    last = progpath
    while True:
        if (last.startswith('/gnu/store/') and
                last[43:52] == '-profile/'):
            # links to a profile
            break
        try:
            target = os.readlink(last)
        except OSError:
            # not a link anymore
            break
        if not target.startswith(SEP):
            # interpret relative to last
            target = os.path.join(os.path.dirname(last), target)
        if target == GUIX_PYTHON_OUTPUT:
            # "target" points to the python binary of the guix output
            break
        last = target
    # getpath.c strips the last path component afterwards
    argv0_path = last

Drawbacks:

- More complicated patch.

- More comparisons within the loop; this will slow down start-up a bit.

Open questions:

- Which are the correct paths to check to stop the iteration?

- How to handle the "pythonX" -> "pythonX.Y" link?

- How to handle "python-wrapper", which links python -> python3?


4. Path-handling in Python's start-up sequence
===============================================

No venv
------------

In getpath.c:

0. "progpath" will be set based on argv[0].
   This is expected to be the fully qualified path of argv[0].

1. argv0_path is determined from progpath, with all symlinks resolved.

2. prefix and exec_prefix are searched based on argv0_path *) **)

3. sys.path is set based on prefix and exec_prefix.

In sysmodule.c:

4. sys.executable is set to "progpath" (see step 0 above).

5. sys.prefix, sys.base_prefix, sys.exec_prefix and
   sys.base_exec_prefix are set to prefix and exec_prefix,
   respectively, as determined in step 2 above.

In site.py:

6. When site.py is loaded, system site-packages are added from
   sys.prefix and sys.exec_prefix.

*) There are two special cases Guix does not need to handle: if an
   (embedding) application called Py_SetPythonHome() or "PYTHONHOME"
   was set, this overrules argv0_path. But this is where
   GUIX-PYTHONHOME-X.Y could step in.

**) If some "landmark" file is not found, the build-time PREFIX or
    EXEC_PREFIX, respectively, is used. For Guix this should not
    happen.


For a venv:
--------------------

In getpath.c:

0. "progpath" will be set based on argv[0]. This is expected to be the
   fully qualified path of argv[0].

   Try this in a venv::

     $ ln -s /tmp/venv/bin/python /tmp/qqq
     $ /tmp/qqq -S -c 'import sys; print(sys.executable)'
     /tmp/qqq

1. argv0_path is determined from progpath, with all symlinks resolved,
   like above.

2. If pyvenv.cfg exists in argv0_path's directory or one level above,
   argv0_path is taken from the "home" entry in this file.

3. prefix and exec_prefix are searched based on argv0_path (or on
   PYTHONHOME or GUIX-PYTHONHOME-X.Y, if set). Notably both are now
   pointing to the "home" - not to the virtual env.

4. sys.path is set based on prefix and exec_prefix. Notably this is
   adding the standard library based on "home".

   Try this in a venv (here on a foreign distribution)::

     $ /tmp/venv/bin/python -S -c 'import sys; print(sys.path)'
     ['', '/usr/lib64/python35.zip', '/usr/lib64/python3.5',
      '/usr/lib64/python3.5/plat-linux',
      '/usr/lib64/python3.5/lib-dynload']

In sysmodule.c:

5.
   sys.executable is set to "progpath" (see step 0 above).

6. sys.prefix, sys.base_prefix, sys.exec_prefix and
   sys.base_exec_prefix are set to prefix and exec_prefix,
   respectively, as determined in step 3 above. All of these are
   pointing to "home"! sys.prefix and sys.exec_prefix will be adjusted
   in site.py.

   Try this in a venv (here on a foreign distribution)::

     $ /tmp/venv/bin/python -S -c 'import sys; print(sys.prefix)'
     /usr

In site.py:

7. If pyvenv.cfg exists in the executable's directory or one level
   above, site.py assumes a virtual environment and will execute the
   following steps.

8. When site.py is loaded, sys.prefix and sys.exec_prefix will be set
   based on sys.executable - which is in the virtual env.

9. sys.base_prefix and sys.base_exec_prefix will not be changed and
   thus will always be the "real" prefixes of the Python installation -
   see step 3 above.

10. sys._home (not documented) will be set to the "home" entry from
    pyvenv.cfg, if the entry exists.

-> sys.base_prefix and sys.base_exec_prefix should point to
   GUIX_PROFILE


pyvenv.cfg - as of Python 3.5 venv
---------------------------------------

The "home" entry in pyvenv.cfg is based on sys.executable. This means
if you are stacking venvs, the "home" entry is pointing to the prior
venv, not to the original prefix.

Nevertheless, sys.path is based on the *original* prefix, not the
"home": at the beginning of site.py, the list of prefixes to be
searched is set to sys.prefix, which - see step 6 above - is the same
as sys.base_prefix and thus the original prefix, not the "home".
site.py will change sys.prefix later, but not the presets in the list
of prefixes to be searched.


From Modules/getpath.c
============================

 * Before any searches are done, the location of the executable is
 * determined.  If argv[0] has one or more slashes in it, it is used
 * unchanged.
 * Otherwise, it must have been invoked from the shell's path,
 * so we search $PATH for the named executable and use that.  If the
 * executable was not found on $PATH (or there was no $PATH environment
 * variable), the original argv[0] string is used.
 *
 * Next, the executable location is examined to see if it is a symbolic
 * link.  If so, the link is chased (correctly interpreting a relative
 * pathname if one is found) and the directory of the link target is used.
 *
 * Finally, argv0_path is set to the directory containing the executable
 * (i.e. the last component is stripped).
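
To make the procedure from this comment easier to play with, here is a
rough Python sketch of it. This is only an illustration, not the code
from getpath.c: the function names find_progpath() and
compute_argv0_path() are made up, the max_links guard is my own
addition, and the stop_at hook is a hypothetical way to try out
option 3 (stop chasing links once a profile is reached).

```python
import os


def find_progpath(argv0, path_env):
    """Locate the executable as the getpath.c comment describes."""
    if os.sep in argv0:
        # argv[0] contains a slash: it is used unchanged
        return argv0
    for directory in path_env.split(os.pathsep):
        candidate = os.path.join(directory, argv0)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # not found on $PATH: the original argv[0] string is used
    return argv0


def compute_argv0_path(progpath, stop_at=None, max_links=40):
    """Chase symlinks starting at progpath, return the final directory.

    stop_at is a hypothetical hook (not part of CPython): if it returns
    True for the current path, link chasing stops there (option 3).
    max_links is an assumed safety limit against symlink cycles.
    """
    current = progpath
    for _ in range(max_links):
        if stop_at is not None and stop_at(current):
            break
        try:
            target = os.readlink(current)
        except OSError:
            # not a link anymore
            break
        if not os.path.isabs(target):
            # interpret a relative link relative to the link's directory
            target = os.path.join(os.path.dirname(current), target)
        current = target
    # finally, strip the last component
    return os.path.dirname(current)
```

Without stop_at this ends up in the directory of the real binary (the
store, in Guix's case). With something like
stop_at=lambda p: p[43:52] == '-profile/' it would stop in the profile
instead, which is the effect option 3 is after.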