Trying out the Nix development experience
The other day, while building a scientific project
to which I’m a contributor, I
ran into a nasty version conflict between two system libraries. In a fit of
pique, I decided to learn enough about Nix
to be able to set up a reproducible,
tightly controlled local build. It’s done now, and overall I’m very happy with
the tooling and setup. I’m using direnv
to tightly integrate my normal shell
with Nix’s nix-shell
feature, and for the most part everything feels seamless.
It is extremely refreshing to see cmake
report that it has found a plethora of
binaries and libraries, content-hashed and installed in neat little rows under /nix/store.
I’m using Nix to manage my development environment, but not
to build the
project itself. Nix ensures that the project dependencies are installed and
discoverable by the compiler and linker. Building the project is done with
CMake, set up for cmake
to find the nix-installed libraries. Nix achieves this
by wrapping the C compiler
with its own shell script and injecting the paths to
libraries and binaries via environment variables. There’s very little to do to
make cmake
just work, beyond declaring that the packages you want are buildInputs. The first version of my shell.nix file looked like this:
# file shell.nix
{ pkgs ? import <nixpkgs> {} }:
pkgs.mkShell {
  buildInputs = with pkgs; [
    cmake
    (callPackage nix/petsc.nix {})
    metis
    hdf5
    openmpi
    (python38.withPackages (packages: [ packages.numpy ]))
  ];
}
Using this setup, I had very little trouble getting the project to build. I had to override the default PETSc derivation to compile with METIS and OpenMPI support, which was not too hard:
# file nix/petsc.nix
{ petsc
, blas
, gfortran
, lapack
, python
, metis
, openmpi
}:

petsc.overrideAttrs (oldAttrs: rec {
  nativeBuildInputs = [ blas gfortran gfortran.cc.lib lapack python openmpi metis ];
  preConfigure = ''
    export FC="${gfortran}/bin/gfortran" F77="${gfortran}/bin/gfortran"
    patchShebangs .
    configureFlagsArray=(
      $configureFlagsArray
      "--with-mpi-dir=${openmpi}"
      "--with-metis=${metis}"
      "--with-blas-lib=[${blas}/lib/libblas.so,${gfortran.cc.lib}/lib/libgfortran.a]"
      "--with-lapack-lib=[${lapack}/lib/liblapack.so,${gfortran.cc.lib}/lib/libgfortran.a]"
    )
  '';
})
This Nix file returns a function which is invoked in shell.nix using the callPackage function. petsc.overrideAttrs is a neat way to override the attributes of a derivation created with stdenv.mkDerivation. Building PETSc with MPI and METIS support is as simple as passing in a different set of arguments to the configure script.
Figuring out how to do all of this was fun. I mostly referred to the Nix “Pills”, which are a great progression through the Nix tool and language.
With these Nix files, I was able to execute cmake .. && make
successfully.
Getting the project to run
was another story. The final binary failed
immediately with a dynamic loading error:
➜ bin/warpxm
dyld: Library not loaded: /private/tmp/nix-build-petsc-3.13.2.drv-0/petsc-3.13.2/arch-darwin-c-debug/lib/libpetsc.3.13.dylib
  Referenced from: /Users/jack/src/warpxm/build/bin/warpxm
  Reason: image not found
The binary was trying to load a dynamic lib from one of the temporary directories that Nix created in the process of building PETSc. Of course this failed: by the time I invoked bin/warpxm, that directory had been cleaned up. Instead of a file under /private/tmp, the binary should have linked to the result of the petsc derivation in the Nix store, under /nix/store. At some point, it seemed, an environment variable was incorrectly set to this intermediate directory. To figure out where, I would have to learn a lot more about linking on OS X than I ever expected.
Whither the linker?
First I checked the compiler and linker flags that are inserted by Nix’s compiler wrapper. These come in via NIX_CFLAGS_COMPILE and NIX_LDFLAGS. When you’re working with nix-shell and direnv, all of the environment variables from your derivations are injected into your shell. It’s a simple matter of echoing them out:
➜ echo $NIX_CFLAGS_COMPILE
... -isystem /nix/store/w23r8kplmfx2xc111cpvmdjwmkwy6ip3-petsc-3.13.2/include ...
➜ echo $NIX_LDFLAGS
... -L/nix/store/w23r8kplmfx2xc111cpvmdjwmkwy6ip3-petsc-3.13.2/lib ...
These look fine! Invoking cmake
and make
in this shell ought to pull in the
correct library.
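When these variables grow long, picking out the relevant entries by eye gets tedious. The following is a minimal Python sketch, not part of the project, that extracts the -L directories from a flags string and keeps those mentioning a given package. The helper names library_dirs and dirs_mentioning are made up for illustration, and the sketch assumes the flags are ordinary whitespace-separated tokens.

```python
from __future__ import annotations

import os
import shlex


def library_dirs(ldflags: str) -> list[str]:
    """Return the directories named by -L flags, in the order they appear."""
    dirs = []
    tokens = iter(shlex.split(ldflags))
    for token in tokens:
        if token == "-L":
            # Form "-L /some/dir": the directory is the next token.
            following = next(tokens, None)
            if following is not None:
                dirs.append(following)
        elif token.startswith("-L"):
            # Form "-L/some/dir": the directory is glued to the flag.
            dirs.append(token[2:])
    return dirs


def dirs_mentioning(ldflags: str, package: str) -> list[str]:
    """Keep only the search directories whose path contains `package`."""
    return [d for d in library_dirs(ldflags) if package in d]


if __name__ == "__main__":
    # Inside the nix-shell this prints the PETSc library directories, if any.
    for directory in dirs_mentioning(os.environ.get("NIX_LDFLAGS", ""), "petsc"):
        print(directory)
```

Run inside the Nix shell, every directory it prints should live under /nix/store; anything else would be worth a closer look.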
Then I remembered that this project uses pkg-config
to find and pull together
the linked libraries. Frankly, I don’t understand pkg-config
very well, but I
do know that in this project it is invoked from inside of cmake
. It searches for
libraries according to its own rules, and it runs after
Nix has done its
job setting everything up. Therefore, it circumvents the compiler and linker
flags that we just checked.
I happened to have pkg-config installed from before setting up this Nix environment. Therefore, cmake was able to invoke the system pkg-config from my user PATH. Perhaps the system version of pkg-config was somehow finding the wrong library? Indeed, echo $PKG_CONFIG_PATH confirmed that it was searching a directory under my $HOME. I thought it possible that some wires got crossed while I was adding dependencies to my Nix derivation one at a time: configuring pkg-config appropriately might help.
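A check like the echo above can be made a little more systematic. Here is a small Python sketch that splits a PKG_CONFIG_PATH-style value and reports the entries that fall outside the Nix store. The function names and the default store location are assumptions for illustration; nothing here is provided by Nix or pkg-config.

```python
from __future__ import annotations

import os


def split_search_path(value: str) -> list[str]:
    """Split a PATH-like variable into its non-empty entries."""
    return [entry for entry in value.split(os.pathsep) if entry]


def outside_store(value: str, store: str = "/nix/store") -> list[str]:
    """Return the entries that do not live under the given store directory."""
    prefix = store.rstrip("/") + "/"
    return [entry for entry in split_search_path(value) if not entry.startswith(prefix)]


if __name__ == "__main__":
    # Any entry printed here is searched by pkg-config but not managed by Nix.
    for entry in outside_store(os.environ.get("PKG_CONFIG_PATH", "")):
        print(entry)
```

An empty result does not prove that pkg-config is innocent, only that its search path stays inside the store.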
I referred once again to the Nix wiki page on C projects, which also has a section on using pkg-config. It seems that including the pkg-config derivation as a nativeBuildInput will let packages like petsc append their output paths to the PKG_CONFIG_PATH environment variable. I did so:
pkgs.mkShell {
  buildInputs = with pkgs; [ ... ];
  nativeBuildInputs = with pkgs; [ pkg-config ];
}
but it didn’t fix the problem. I would have to go deeper and track down where the bad library was being pulled in.
Digging into the cmake
documentation and the project’s .cmake
files led me
to insert a trio of print statements:
  find_package(PkgConfig REQUIRED)
  pkg_check_modules(PETSC PETSc REQUIRED)
  link_directories(${PETSC_LIBRARY_DIRS})
+ message("petsc libraries: ${PETSC_LIBRARIES}")
+ message("petsc library dirs: ${PETSC_LIBRARY_DIRS}")
+ message("petsc link libraries: ${PETSC_LINK_LIBRARIES}")
  list(APPEND WARPXM_LINK_TARGETS ${PETSC_LIBRARIES})
These printed out three lines in my cmake
output:
petsc libraries: petsc
petsc library dirs: /nix/store/w23r8kplmfx2xc111cpvmdjwmkwy6ip3-petsc-3.13.2/lib
petsc link libraries: /nix/store/w23r8kplmfx2xc111cpvmdjwmkwy6ip3-petsc-3.13.2/lib/libpetsc.dylib
The last two look good. But the first, just the library name petsc, was a little too implicit for comfort. It was precisely this variable that was being appended to the link targets list. At link time, it would be up to the linker to find the library petsc, and I wasn’t sure where it would look. Safer to use the absolute path to the .dylib, like so:
- list(APPEND WARPXM_LINK_TARGETS ${PETSC_LIBRARIES})
+ list(APPEND WARPXM_LINK_TARGETS ${PETSC_LINK_LIBRARIES})
Changing the link target to the absolute path eased my mind only for the duration of
the next cmake .. && make
cycle. Surely there was no way the linker could
screw up now. No arcane library search involved, just an absolute path, which
couldn’t possibly be misinterpreted…
➜ bin/warpxm
dyld: Library not loaded: /private/tmp/nix-build-petsc-3.13.2.drv-0/petsc-3.13.2/arch-darwin-c-debug/lib/libpetsc.3.13.dylib
  Referenced from: /Users/jack/src/warpxm/build/bin/warpxm
  Reason: image not found
Damn it!
install_name and other depravities
At this point I was absolutely flummoxed. With every fix I attempted, I grepped vainly for the offending /private/tmp path in my build directory, and came up empty-handed. I tracked down the final, irrevocable link options passed to the compiler, tucked away in a link.txt file in the build tree. They showed incontrovertibly that my binary was being linked to the correct library:
➜ cat build/src/CMakeFiles/warpxm.dir/link.txt
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -O3 -DNDEBUG -isysroot ... -L/nix/store/31d3hng4sclxi3sz8g3zi3yqmychj2kg-petsc-3.13.2/lib ...
I had proved nearly to my satisfaction that CMake was doing the right thing with this library, and I was completely out of ideas. Finally, a very lucky Google search led me to the section of the Nix manual describing issues specific to the Darwin (macOS) platform. It states:
On Darwin, libraries are linked using absolute paths, libraries are resolved by their install_name at link time. Sometimes packages won’t set this correctly causing the library lookups to fail at runtime. This can be fixed by adding extra linker flags or by running install_name_tool -id during the fixupPhase.
This is a very matter-of-fact way of stating something that, when I understood it, flabbergasted me. To the best of my understanding, here’s what happens on macOS:
- My source code has an include directive, #include <petsc.h> or something like that, which creates a binary interface to be satisfied by the linker.
- At link time, we pass the list of absolute paths to libraries, and the linker finds the one that matches the interface.
- The linker then saves the install_name of the library it found in the binary’s load commands.
- At run time, the binary (actually, the macOS dyld system) loads the library. The install_name is all it has, so it looks there.
I’ve certainly gotten some aspect of this wrong, so I would definitely appreciate hearing from someone who understands it better than me!
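The steps above can be captured in a toy model. This Python sketch simulates nothing more than the bookkeeping: the linker finds a library by its path on disk but records the library's install_name, and the loader later has only that recorded string to go on. The classes, function names and paths are all hypothetical stand-ins, and real dyld has more lookup rules (for example @rpath) that are ignored here.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Placeholder paths; the real ones carry a store hash and version numbers.
STORE_PATH = "/nix/store/HASH-petsc/lib/libpetsc.dylib"
BUILD_PATH = "/private/tmp/nix-build-petsc/lib/libpetsc.dylib"


@dataclass
class Dylib:
    install_name: str  # the path string embedded in the library itself


@dataclass
class Binary:
    load_commands: List[str] = field(default_factory=list)


def link(binary: Binary, lib_path: str, disk: Dict[str, Dylib]) -> None:
    """Link step: locate the library by path, but record its install_name."""
    library = disk[lib_path]
    binary.load_commands.append(library.install_name)


def load(binary: Binary, disk: Dict[str, Dylib]) -> List[str]:
    """Run-time step: the loader only knows the recorded names."""
    for name in binary.load_commands:
        if name not in disk:
            raise FileNotFoundError("Library not loaded: " + name)
    return list(binary.load_commands)


if __name__ == "__main__":
    # The library sits in the store, but still names its build directory.
    disk = {STORE_PATH: Dylib(install_name=BUILD_PATH)}
    exe = Binary()
    link(exe, STORE_PATH, disk)  # linking works: the file is where we said
    try:
        load(exe, disk)
    except FileNotFoundError as err:
        print(err)  # the recorded name points at a directory that is gone
```

In this model, passing an absolute path at link time changes nothing about the failure, because the path used to find the library is never what gets written into the binary.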
In any case, this find pointed me to the concept of the install_name, so I had something to go on. More searching led to a helpful blog post describing exactly the issue that I was facing. It also described how to check the install_name of the library:
➜ otool -D /nix/store/31d3hng4sclxi3sz8g3zi3yqmychj2kg-petsc-3.13.2/lib/libpetsc.dylib
/nix/store/31d3hng4sclxi3sz8g3zi3yqmychj2kg-petsc-3.13.2/lib/libpetsc.dylib:
/private/tmp/nix-build-petsc-3.13.2.drv-0/petsc-3.13.2/arch-darwin-c-debug/lib/libpetsc.3.13.dylib
Gotcha.
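The same check can be scripted. The sketch below parses the two-line output of otool -D (the library path followed by a colon, then the install_name) and applies a rough heuristic: the install_name should sit in the same directory as the library file. That heuristic and the function names are assumptions of this sketch; libraries that use @rpath-style names would legitimately fail it.

```python
import posixpath
import subprocess


def parse_install_name(otool_output: str) -> str:
    """Extract the install_name from the text printed by `otool -D`."""
    lines = [line.strip() for line in otool_output.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("no install_name found in otool output")
    # The first line echoes the file name; the second is the install_name.
    return lines[1]


def looks_relocated(lib_path: str, install_name: str) -> bool:
    """Heuristic: the install_name lives next to the library on disk."""
    return posixpath.dirname(install_name) == posixpath.dirname(lib_path)


def read_install_name(lib_path: str) -> str:
    """Run otool on a real library (requires macOS developer tools)."""
    result = subprocess.run(
        ["otool", "-D", lib_path], capture_output=True, text=True, check=True
    )
    return parse_install_name(result.stdout)
```

On macOS, looks_relocated(path, read_install_name(path)) returning False for a library in the Nix store would flag the same symptom that otool showed by hand.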
The Nix manual states that “some packages won’t set this correctly”, and points to the fix, which is to use install_name_tool to change the install_name of the built library. Is the PETSc derivation on nixpkgs doing this correctly? I saw that it was doing something with install_name_tool:
prePatch = ''
  substituteInPlace configure \
    --replace /bin/sh /usr/bin/python
'' + stdenv.lib.optionalString stdenv.isDarwin ''
  substituteInPlace config/install.py \
    --replace /usr/bin/install_name_tool install_name_tool
'';
This directive replaces the appearances of the string /usr/bin/install_name_tool with just install_name_tool. The reason that Nix packages do this is to ensure that builds rely on the Nix-built tools, which are provided in the build shell’s PATH, and not on binaries in system directories like /usr/bin.
The PR that introduced this substitution indicates that it fixed a build on Darwin, so there must be some invocation of /usr/bin/install_name_tool in PETSc. Searching for that in the PETSc repo leads to this line, which is doing exactly what the Mark’s Logs post on install_name instructed: it changes the install_name to the absolute path of the library in its installation directory, using install_name_tool -id.
if os.path.splitext(dst)[1] == '.dylib' and os.path.isfile('/usr/bin/install_name_tool'):
  [output,err,flg] = self.executeShellCommand("otool -D "+src)
  oldname = output[output.find("\n")+1:]
  installName = oldname.replace(os.path.realpath(self.archDir), self.installDir)
  self.executeShellCommand('/usr/bin/install_name_tool -id ' + installName + ' ' + dst)
According to this, the install_name of the library should have been repaired by PETSc when the library was built! Except… notice something? The second condition in the if statement. After the PETSc derivation runs its prePatch step, that condition will become and os.path.isfile('install_name_tool'). That will certainly fail: os.path.isfile takes a path, not a command name, and install_name_tool is not going to be a file in the directory where the install script is running! The patched config/install.py will silently skip this step, leaving the install_name of the library pointing into the temporary directory where it was built!
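The difference between a path and a command name is easy to demonstrate. In this Python sketch, sh stands in for install_name_tool (an assumption, chosen because it exists on nearly every POSIX system), and shutil.which is used only to find its absolute location; it is not what PETSc or nixpkgs does.

```python
import os
import shutil
import tempfile


def patched_guard(tool: str) -> bool:
    """Mimic the patched condition: os.path.isfile on the substituted string."""
    return os.path.isfile(tool)


def demo(tool: str = "sh"):
    """Evaluate the guard for a bare command name and for its absolute path."""
    located = shutil.which(tool)  # searches PATH, as a shell would
    absolute = os.path.abspath(located) if located else None
    previous = os.getcwd()
    with tempfile.TemporaryDirectory() as scratch:
        os.chdir(scratch)  # an empty directory, like a fresh build tree
        try:
            bare_ok = patched_guard(tool)
            absolute_ok = absolute is not None and patched_guard(absolute)
        finally:
            os.chdir(previous)
    return bare_ok, absolute_ok


if __name__ == "__main__":
    bare_ok, absolute_ok = demo()
    print("bare name passes guard:", bare_ok)
    print("absolute path passes guard:", absolute_ok)
```

The bare name is interpreted relative to the working directory, so the guard is false even though the tool is on the PATH; only an absolute path satisfies it.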
Luckily, the solution to this problem is not too hard. Instead of the name of a program on the PATH, we should pass the absolute path to the program we want to run. This can be done by overriding the prePatch step like so:
prePatch = ''
  substituteInPlace configure \
    --replace /bin/sh /usr/bin/python
'' + stdenv.lib.optionalString stdenv.isDarwin ''
  substituteInPlace config/install.py \
    --replace /usr/bin/install_name_tool ${darwin.cctools}/bin/install_name_tool
'';
The Nix variable ${darwin.cctools} will expand to the full path of the built darwin.cctools derivation, which is a directory under /nix/store. So the patched if statement inside of PETSc’s config/install.py becomes
if os.path.splitext(dst)[1] == '.dylib' and os.path.isfile('/nix/store/1dgdim74d05ypll85vslm8i7kgzq78vw-cctools-port/bin/install_name_tool'):
  # use install_name_tool
and the install_name of the resulting library will be correct. We can check that
with otool -D
again:
➜ otool -D /nix/store/w23r8kplmfx2xc111cpvmdjwmkwy6ip3-petsc-3.13.2/lib/libpetsc.dylib
/nix/store/w23r8kplmfx2xc111cpvmdjwmkwy6ip3-petsc-3.13.2/lib/libpetsc.dylib:
/nix/store/w23r8kplmfx2xc111cpvmdjwmkwy6ip3-petsc-3.13.2/lib/libpetsc.3.13.dylib
Looking much better! And since the error was in a dynamically loaded library, we don’t even have to recompile to check that it’s working:
➜ build git:(master) ✗ DYLD_PRINT_LIBRARIES=1 bin/warpxm
dyld: loaded: /Users/jack/src/warpxm/build/bin/warpxm
dyld: loaded: /nix/store/ni26aaiira47ak60vks1qv4apbkwbg1d-hdf5-1.10.6/lib/libhdf5.103.dylib
dyld: loaded: /nix/store/acsjaw04hrf4rv8gizai7gx1ibq92ksa-zlib-1.2.11/lib/libz.dylib
dyld: loaded: /nix/store/z4f1bq363m0ydmbyncfi2srij8vlsx32-Libsystem-osx-10.12.6/lib/libSystem.B.dylib
dyld: loaded: /nix/store/w23r8kplmfx2xc111cpvmdjwmkwy6ip3-petsc-3.13.2/lib/libpetsc.3.13.dylib
...
That’s more like it.
Epilogue
I spent most of my time debugging this problem without a working
understanding of the different build phases. It should have been clear
to me that neither the CMake nor the pkg-config
setups could be the cause,
because at the time that I was invoking cmake
, the offending /private/tmp
directory had long vanished. If I had focused exclusively on the PETSc
derivation provided by Nix, I might have homed in on the install_name_tool
patch a little sooner. As it went, I was lucky to find the note in the Nix
manual about Darwin-specific linker problems.
As for Nix, I will absolutely be using more of it. What’s remarkable is how little impact it can have. I am able to use it to manage my environment for this project without impacting the way the other developers manage their environments. Of course, if they asked, I would advocate that they try out Nix, but it’s nice for everyone to be able to do it on their own time.
I’m also looking forward to having my first contribution to Nix!