linux-kernel - Re: [PATCH RESEND] init/initramfs.c: make initramfs support pivot

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADxym3akKEurTTGiBxYZiXKVWUbOg=a8UeuRsJ07tT+DixA8mw@mail.gmail.com>
Date:   Tue, 25 May 2021 08:55:48 +0800
From:   Menglong Dong <menglong8.dong@...il.com>
To:     Luis Chamberlain <mcgrof@...nel.org>
Cc:     Jan Kara <jack@...e.cz>, Jens Axboe <axboe@...nel.dk>,
        hare@...e.de, gregkh@...uxfoundation.org, tj@...nel.org,
        Menglong Dong <dong.menglong@....com.cn>, song@...nel.org,
        NeilBrown <neilb@...e.de>,
        Andrew Morton <akpm@...ux-foundation.org>,
        wangkefeng.wang@...wei.com, f.fainelli@...il.com, arnd@...db.de,
        Barret Rhoden <brho@...gle.com>,
        Rasmus Villemoes <linux@...musvillemoes.dk>,
        mhiramat@...nel.org, Steven Rostedt <rostedt@...dmis.org>,
        Kees Cook <keescook@...omium.org>, vbabka@...e.cz,
        Alexander Potapenko <glider@...gle.com>, pmladek@...e.com,
        Chris Down <chris@...isdown.name>, ebiederm@...ssion.com,
        jojing64@...il.com, LKML <linux-kernel@...r.kernel.org>,
        palmerdabbelt@...gle.com, linux-fsdevel@...r.kernel.org,
        Alexander Viro <viro@...iv.linux.org.uk>
Subject: Re: [PATCH RESEND] init/initramfs.c: make initramfs support pivot_root

Hello!

On Tue, May 25, 2021 at 6:58 AM Luis Chamberlain <mcgrof@...nel.org> wrote:
>
> > In fact, the network namespace is just a case for the problem that the
> > 'mount leak' caused. And this kind modification is not friendly to
> > current docker users, it makes great changes to the usage of docker.
>
> You mean an upgrade of docker? If so... that does not seem like a
> definitive reason to do something new in the kernel *always*.

No, I means that in this solution, users have to create all the
containers first,
and create the network namespace for them later.

>
> However, if you introduce it as a kconfig option so that users
> who want to use this new feature can enable it, and then use it,
> the its sold as a new feature.
>
> Should this always be enabled, or done this way? Should we never have
> the option to revert back to the old behaviour? If not, why not?
>

This change seems transparent to users, which don't change the behavior
of initramfs. However, it seems more reasonable to make it a kconfig option.
I'll do it in the v2 of the three patches I sended.

>
> I don't see you using any context directly, where are you specifying the
> context directly?
>
> > so it is not needed any more
> > and I just removed it in init_rootfs().
> >
> > The initialization of 'user_root' in init_rootfs() is used in:
> > do_populate_rootfs -> mount_user_root, which set the file system(
> > ramfs or tmpfs) of the second mount.
> >
> > Seems it's not suitable to place it in init_rootfs()......
>
> OK I think I just need to understand how you added the context of the
> first mount explicitly now and where, as I don't see it.
>

......

>
> What do you mean? init_mount_tree() is always called, and it has
> statically:
>
> static void __init init_mount_tree(void)
> {
>         struct vfsmount *mnt;
>         ...
>         mnt = vfs_kern_mount(&rootfs_fs_type, 0, "rootfs", NULL);
>         ...
> }
>
> And as I noted, this is *always* called earlier than
> do_populate_rootfs(). Your changes did not remove the init_mount_tree()
> or modify it, and so why would the context of the above call always
> be OK to be used now with a ramfs context now?
>
> > So it makes no sense to make the file system of the first mount selectable.
>
> Why? I don't see why, nor is it explained, we're always caling
> vfs_kern_mount(&rootfs_fs_type, ...) and you have not changed that
> either.
>
> > To simplify the code here, I make it ramfs_init_fs_context directly. In fact,
> > it's fine to make it shmen_init_fs_context here too.
>
> So indeed you're suggesting its arbitrary now.... I don't see why.
>

So the biggest problem now seems to be the first mount I changed, maybe I didn't
make it clear before.

Let's call the first mount which is created in init_mount_tree() the
'init_mount'.
If the 'root' is a block fs, initrd or nfs, the 'init_mount' will be a
ramfs, that seems
clear, it can be seen from the enable of tmpfs:

void __init init_rootfs(void)
{
    if (IS_ENABLED(CONFIG_TMPFS) && !saved_root_name[0] &&
       (!root_fs_names || strstr(root_fs_names, "tmpfs")))
       is_tmpfs = true;
}

That makes sense, because the 'init_mount' will not be the root file
system, as the
block fs or initrd or nfs that mounted later on '/root' will become
the root by a call of
init_chroot().

If the 'root' is initramfs, cpio will be unpacked init 'init_mount'.
In this scene, the
'init_mount' can be ramfs or tmpfs, and it is the root file system.

In my patch, I create a second mount which is mounted on the
'init_mount', let's call
it 'user_mount'.

cpio is unpacked into 'user_mount', and the 'user_mount' is made the
root by 'init_chroot()'
later. The 'user_mount' can be ramfs or tmpfs. So 'init_mount' is not
the root any more,
and it makes no sense to make it tmpfs, just like why it should be
ramfs when 'root'
is a block fs or initrd.

'init_mount' is created from 'rootfs_fs_type' in init_mount_tree():

static void __init init_mount_tree(void)
{
        struct vfsmount *mnt;
        ...
        mnt = vfs_kern_mount(&rootfs_fs_type, 0, "rootfs", NULL);
        ...
}

I make the 'init_mount' to be ramfs by making
'rootfs_fs_type->init_fs_context' with
'ramfs_init_fs_context':

struct file_system_type rootfs_fs_type = {
.name = "rootfs",
.init_fs_context = ramfs_init_fs_context,
.kill_sb = kill_litter_super,
};

I think the third patch that I sended has made it clear what I do to
the 'init_mount'.

Thanks!
Menglong Dong