runc source code: bootstrap analysis, part 2

Category: CSS · Published: 5 years ago


Why clone twice

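As described in the previous post, the nsenter package actually clones twice. Three processes, parent, child, and init, interact with one another, and in the end only init is left to run the Go runtime.

The relationship between the three is parent -> child -> init. Note that the arrows only show who cloned whom, because the flags passed to clone() are CLONE_PARENT | SIGCHLD: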

static int clone_parent(jmp_buf *env, int jmpval) __attribute__ ((noinline));
static int clone_parent(jmp_buf *env, int jmpval)
{
    struct clone_t ca = {
        .env = env,
        .jmpval = jmpval,
    };

    return clone(child_func, ca.stack_ptr, CLONE_PARENT | SIGCHLD, &ca);
}

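CLONE_PARENT makes the new process a sibling of the caller rather than its child, so all three processes actually share the same ppid.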
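The sibling relationship can be checked with a small standalone sketch. It is not runc code: the helper name `ppid_of_clone_parent_child` and the pipe used to report the result are assumptions made for illustration. The cloned process simply reports the ppid it observes, which should equal the caller's own ppid.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

#define STACK_SIZE (64 * 1024)

/* Runs in the cloned process: report our parent's pid through the pipe. */
static int report_ppid(void *arg)
{
	int wfd = *(int *)arg;
	pid_t ppid = getppid();

	if (write(wfd, &ppid, sizeof(ppid)) != sizeof(ppid))
		_exit(1);
	_exit(0);
}

/*
 * Clone a process with CLONE_PARENT | SIGCHLD and return the ppid that the
 * new process observes, or -1 on error. Hypothetical helper, not part of runc.
 */
pid_t ppid_of_clone_parent_child(void)
{
	int fds[2];
	pid_t seen = -1;
	char *stack;

	if (pipe(fds) < 0)
		return -1;
	stack = malloc(STACK_SIZE);
	if (!stack) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	/* The stack grows downwards on common architectures, so pass its top. */
	if (clone(report_ppid, stack + STACK_SIZE, CLONE_PARENT | SIGCHLD, &fds[1]) < 0) {
		close(fds[0]);
		close(fds[1]);
		free(stack);
		return -1;
	}
	close(fds[1]);
	if (read(fds[0], &seen, sizeof(seen)) != sizeof(seen))
		seen = -1;
	close(fds[0]);
	free(stack); /* no CLONE_VM, so the new process has its own copy */
	return seen;
}
```

Comparing the return value with `getppid()` in the caller shows that both processes hang off the same parent. A side effect worth noting is that the caller cannot `waitpid()` for the process it just cloned, since that process is not its child.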

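There are two reasons for cloning twice. The first is the PID namespace: when a process creates a new PID namespace for itself, it does not move into that namespace; only the children it creates afterwards do (compare the description of CLONE_NEWPID below, which applies to the process being created). This is why the second clone produces init: after child has set up the namespaces, the new PID namespace has no effect on child itself, so child has to clone init, which shares child's other namespaces but actually lives in the new PID namespace.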

**CLONE_NEWPID** (since Linux 2.6.24)
              If **CLONE_NEWPID** is set, then create the process in a new PID
              namespace.  If this flag is not set, then (as with [fork(2)](http://man7.org/linux/man-pages/man2/fork.2.html))
              the process is created in the same PID namespace as the
              calling process.  This flag is intended for the implementation
              of containers.
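The effect can be observed with a minimal sketch, again not taken from runc. It assumes that unprivileged user namespaces are enabled, because it unshares CLONE_NEWUSER together with CLONE_NEWPID to avoid needing root; the helper name `pids_after_unshare` is hypothetical. Everything happens in a forked helper so the caller's own namespaces stay untouched.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * In a forked helper, unshare a new PID namespace, then fork once more.
 * Stores the helper's getpid() as seen after unshare and the pid that its
 * first child sees. Returns 0 on success, -1 if namespaces are unavailable
 * or on any other error.
 */
int pids_after_unshare(pid_t *helper_pid, pid_t *first_child_pid)
{
	int fds[2];
	pid_t pids[2] = { -1, -1 };
	pid_t helper;
	ssize_t n;

	if (pipe(fds) < 0)
		return -1;
	helper = fork();
	if (helper < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (helper == 0) {
		pid_t child;

		close(fds[0]);
		if (unshare(CLONE_NEWUSER | CLONE_NEWPID) < 0)
			_exit(1); /* not permitted here: report nothing */
		pids[0] = getpid(); /* the caller of unshare keeps its old pid */
		child = fork();     /* the first child lands in the new namespace */
		if (child < 0)
			_exit(1);
		if (child == 0) {
			pids[1] = getpid();
			if (write(fds[1], pids, sizeof(pids)) != sizeof(pids))
				_exit(1);
			_exit(0);
		}
		waitpid(child, NULL, 0);
		_exit(0);
	}
	close(fds[1]);
	n = read(fds[0], pids, sizeof(pids));
	close(fds[0]);
	waitpid(helper, NULL, 0);
	if (n != (ssize_t)sizeof(pids))
		return -1;
	*helper_pid = pids[0];
	*first_child_pid = pids[1];
	return 0;
}
```

When the call succeeds, the helper still reports an ordinary pid while its first child reports pid 1, which mirrors the child/init split described above.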

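The other reason for a clone comes from the kernel. On one hand, creating the user namespace in the same step as the other namespaces can leave it unclear which user namespace owns them. On the other hand, a rootless container has no CAP_SYS_ADMIN and therefore cannot create the other namespaces by itself (see the first note below). So the user namespace has to be created first. On some operating systems, later operations also fail if the uid/gid mapping has not been written after the user namespace is created, so the mapping must be completed right after creating it. And once the user namespace has been created, the mapping has to be configured from the original namespace (see the second note below). That is why a clone is required here: parent clones child.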

Starting in Linux 3.8, unprivileged processes can create user
       namespaces, and the other types of namespaces can be created with
       just the **CAP_SYS_ADMIN** capability in the caller's user namespace.

       If **CLONE_NEWUSER** is specified along with other **CLONE_NEW*** flags in a
       single [clone(2)](http://man7.org/linux/man-pages/man2/clone.2.html) or [unshare(2)](http://man7.org/linux/man-pages/man2/unshare.2.html) call, the user namespace is guaranteed
       to be created first, giving the child ([clone(2)](http://man7.org/linux/man-pages/man2/clone.2.html)) or caller
       ([unshare(2)](http://man7.org/linux/man-pages/man2/unshare.2.html)) privileges over the remaining namespaces created by the
       call.  Thus, it is possible for an unprivileged caller to specify
       this combination of flags

The _uid_map_ file exposes the mapping of user IDs from the user
       namespace of the process _pid_ to the user namespace of the process
       that opened _uid_map_ (but see a qualification to this point below).
       In other words, processes that are in different user namespaces will
       potentially see different values when reading from a particular
       _uid_map_ file, depending on the user ID mappings for the user
       namespaces of the reading processes.
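To make the file format concrete, here is a small sketch that parses the first line of a uid_map-style file. The struct and function names are hypothetical. Each line holds three numbers: the first ID inside the namespace, the first ID it maps to outside, and the length of the range; see user_namespaces(7) for the exact rule on how the second field is interpreted.

```c
#include <stdio.h>

/* One line of a uid_map or gid_map file. */
struct id_map_entry {
	unsigned long inside;  /* first ID inside the target's user namespace */
	unsigned long outside; /* first ID it maps to in the outer namespace */
	unsigned long count;   /* length of the mapped range */
};

/*
 * Read the first mapping line from a uid_map-style file.
 * Returns 0 on success, -1 if the file cannot be opened or holds no
 * mapping (a user namespace that has not been mapped yet has an empty file).
 */
int read_first_id_map(const char *path, struct id_map_entry *e)
{
	FILE *f = fopen(path, "r");
	int n;

	if (!f)
		return -1;
	n = fscanf(f, "%lu %lu %lu", &e->inside, &e->outside, &e->count);
	fclose(f);
	return n == 3 ? 0 : -1;
}
```

Calling `read_first_id_map("/proc/self/uid_map", &e)` from inside and outside a container is a quick way to compare the mappings the two sides see.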

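Once the user namespace is handled separately from the other namespaces, several schemes are possible:

clone the user-ns first, then clone the other namespaces
unshare the user-ns first, then clone the other namespaces
clone the user-ns first, then unshare the other namespaces

The first scheme would mean dumping the namespace flags straight into clone(), which cannot meet the needs of rootless containers. With the second scheme, once the original process has unshared the user-ns it has itself entered the new namespace, and it no longer has permission to set up more than one uid/gid mapping (one each for uid and gid). So runc clones first and unshares afterwards. In practice it is not as simple as calling clone and then unshare, because existing namespaces also have to be taken into account. The final logic is: clone first, then join the existing namespaces (see the note below), then create the user namespace, and only then unshare the rest. The annotated code: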

// Join the existing namespaces.
if (config.namespaces)
    join_namespaces(config.namespaces);

// Create the new user namespace first.
if (config.cloneflags & CLONE_NEWUSER) {
    if (unshare(CLONE_NEWUSER) < 0)
        bail("failed to unshare user namespace");
    config.cloneflags &= ~CLONE_NEWUSER;

    if (config.namespaces) {
        if (prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) < 0)
            bail("failed to set process as dumpable");
    }

    // Ask parent to configure the uid/gid mapping.
    s = SYNC_USERMAP_PLS;
    if (write(syncfd, &s, sizeof(s)) != sizeof(s))
        bail("failed to sync with parent: write(SYNC_USERMAP_PLS)");

    /* ... wait for mapping ... */
    if (read(syncfd, &s, sizeof(s)) != sizeof(s))
        bail("failed to sync with parent: read(SYNC_USERMAP_ACK)");
    .........
}

// Unshare the remaining namespaces.
if (unshare(config.cloneflags & ~CLONE_NEWCGROUP) < 0)
    bail("failed to unshare namespaces");
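The request/acknowledge exchange around the mapping can be tried out in isolation. The following is a simplified sketch under stated assumptions, not runc's implementation: it uses fork() instead of clone(), maps only ID 0 to the caller's own uid/gid, picks illustrative values for the sync constants, and the helper names (`write_file`, `map_child_ids`, `usermap_handshake`) are made up for this example. It also assumes unprivileged user namespaces are enabled and returns -1 otherwise.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustrative values; only the request/ack pairing matters here. */
enum sync_t { SYNC_USERMAP_PLS = 0x40, SYNC_USERMAP_ACK = 0x41, SYNC_ERR = 0xFF };

/* Write a short string to an existing file; returns 0 on success. */
static int write_file(const char *path, const char *data)
{
	ssize_t len = (ssize_t)strlen(data);
	int fd = open(path, O_WRONLY);
	int ok;

	if (fd < 0)
		return -1;
	ok = write(fd, data, (size_t)len) == len;
	close(fd);
	return ok ? 0 : -1;
}

/* Parent side: map uid/gid 0 in the child's namespace to our own IDs. */
static int map_child_ids(pid_t child)
{
	char path[64], map[64];

	snprintf(path, sizeof(path), "/proc/%d/uid_map", (int)child);
	snprintf(map, sizeof(map), "0 %u 1\n", (unsigned)geteuid());
	if (write_file(path, map) < 0)
		return -1;
	/* An unprivileged writer must deny setgroups(2) before writing gid_map. */
	snprintf(path, sizeof(path), "/proc/%d/setgroups", (int)child);
	if (write_file(path, "deny") < 0)
		return -1;
	snprintf(path, sizeof(path), "/proc/%d/gid_map", (int)child);
	snprintf(map, sizeof(map), "0 %u 1\n", (unsigned)getegid());
	return write_file(path, map);
}

/* Returns the uid the child sees after the handshake, or -1 on any failure. */
int usermap_handshake(void)
{
	int sv[2], uid = -1;
	enum sync_t s;
	pid_t child;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	child = fork();
	if (child < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	if (child == 0) {
		close(sv[0]);
		if (unshare(CLONE_NEWUSER) < 0)
			_exit(1);
		s = SYNC_USERMAP_PLS; /* ask the parent to write our maps */
		if (write(sv[1], &s, sizeof(s)) != sizeof(s))
			_exit(1);
		if (read(sv[1], &s, sizeof(s)) != sizeof(s) || s != SYNC_USERMAP_ACK)
			_exit(1);
		uid = (int)getuid(); /* read only after the mapping is in place */
		if (write(sv[1], &uid, sizeof(uid)) != sizeof(uid))
			_exit(1);
		_exit(0);
	}
	close(sv[1]);
	if (read(sv[0], &s, sizeof(s)) == sizeof(s) && s == SYNC_USERMAP_PLS) {
		s = map_child_ids(child) == 0 ? SYNC_USERMAP_ACK : SYNC_ERR;
		if (write(sv[0], &s, sizeof(s)) == sizeof(s) && s == SYNC_USERMAP_ACK &&
		    read(sv[0], &uid, sizeof(uid)) != sizeof(uid))
			uid = -1;
	}
	close(sv[0]);
	waitpid(child, NULL, 0);
	return uid;
}
```

A return value of 0 means the child saw itself as root inside its new user namespace after the parent wrote the maps. The design point mirrored from the text is that the process inside the new namespace only asks for the mapping, while the process that stayed in the original namespace writes it.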

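Note: join_namespaces joins the existing namespaces that runc has been configured with. They come from the linux.namespaces section of the bundle's config.json: if an entry has a path, that existing namespace is joined; if it has no path, a new namespace is created by unsharing the corresponding clone flag. This decision is made when the container builds its bootstrap data during initialization: https://github.com/opencontainers/runc/blob/v1.0.0-rc8/libcontainer/container_linux.go#L500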

"namespaces": [
                        {
                                "type": "pid"
                        },
                        {
                                "type": "network"
                        },
                        {
                                "type": "ipc"
                        },
                        {
                                "type": "uts"
                        },
                        {
                                "type": "mount"
                        }
                ],
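The path/no-path decision can be sketched as follows. runc makes this decision on the Go side, so this C version is only an illustration of the rule, and `struct ns_spec`, `ns_type_to_flag`, and `split_namespaces` are hypothetical names.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stddef.h>
#include <string.h>

#ifndef CLONE_NEWCGROUP
#define CLONE_NEWCGROUP 0x02000000 /* missing from older libc headers */
#endif

/* One entry of linux.namespaces; path is NULL when a new namespace is wanted. */
struct ns_spec {
	const char *type;
	const char *path;
};

/* Map a namespace type name from config.json to its clone flag, 0 if unknown. */
static int ns_type_to_flag(const char *type)
{
	static const struct { const char *name; int flag; } table[] = {
		{ "pid", CLONE_NEWPID },  { "network", CLONE_NEWNET },
		{ "ipc", CLONE_NEWIPC },  { "uts", CLONE_NEWUTS },
		{ "mount", CLONE_NEWNS }, { "user", CLONE_NEWUSER },
		{ "cgroup", CLONE_NEWCGROUP },
	};
	size_t i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (strcmp(type, table[i].name) == 0)
			return table[i].flag;
	return 0;
}

/*
 * Entries without a path add their flag to *cloneflags (to be unshared);
 * entries with a path are counted as namespaces to join with setns(2).
 * Returns the number of namespaces to join.
 */
int split_namespaces(const struct ns_spec *specs, size_t n, int *cloneflags)
{
	size_t i;
	int to_join = 0;

	*cloneflags = 0;
	for (i = 0; i < n; i++) {
		if (specs[i].path != NULL)
			to_join++;
		else
			*cloneflags |= ns_type_to_flag(specs[i].type);
	}
	return to_join;
}
```

For the configuration shown above, where no entry has a path, nothing is joined and all five flags end up in the clone flags.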
