[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADxym3akKEurTTGiBxYZiXKVWUbOg=a8UeuRsJ07tT+DixA8mw@mail.gmail.com>
Date: Tue, 25 May 2021 08:55:48 +0800
From: Menglong Dong <menglong8.dong@...il.com>
To: Luis Chamberlain <mcgrof@...nel.org>
Cc: Jan Kara <jack@...e.cz>, Jens Axboe <axboe@...nel.dk>,
hare@...e.de, gregkh@...uxfoundation.org, tj@...nel.org,
Menglong Dong <dong.menglong@....com.cn>, song@...nel.org,
NeilBrown <neilb@...e.de>,
Andrew Morton <akpm@...ux-foundation.org>,
wangkefeng.wang@...wei.com, f.fainelli@...il.com, arnd@...db.de,
Barret Rhoden <brho@...gle.com>,
Rasmus Villemoes <linux@...musvillemoes.dk>,
mhiramat@...nel.org, Steven Rostedt <rostedt@...dmis.org>,
Kees Cook <keescook@...omium.org>, vbabka@...e.cz,
Alexander Potapenko <glider@...gle.com>, pmladek@...e.com,
Chris Down <chris@...isdown.name>, ebiederm@...ssion.com,
jojing64@...il.com, LKML <linux-kernel@...r.kernel.org>,
palmerdabbelt@...gle.com, linux-fsdevel@...r.kernel.org,
Alexander Viro <viro@...iv.linux.org.uk>
Subject: Re: [PATCH RESEND] init/initramfs.c: make initramfs support pivot_root
Hello!
On Tue, May 25, 2021 at 6:58 AM Luis Chamberlain <mcgrof@...nel.org> wrote:
>
> > In fact, the network namespace is just a case for the problem that the
> > 'mount leak' caused. And this kind modification is not friendly to
> > current docker users, it makes great changes to the usage of docker.
>
> You mean an upgrade of docker? If so... that does not seem like a
> definitive reason to do something new in the kernel *always*.
No, I means that in this solution, users have to create all the
containers first,
and create the network namespace for them later.
>
> However, if you introduce it as a kconfig option so that users
> who want to use this new feature can enable it, and then use it,
> the its sold as a new feature.
>
> Should this always be enabled, or done this way? Should we never have
> the option to revert back to the old behaviour? If not, why not?
>
This change seems transparent to users, which don't change the behavior
of initramfs. However, it seems more reasonable to make it a kconfig option.
I'll do it in the v2 of the three patches I sended.
>
> I don't see you using any context directly, where are you specifying the
> context directly?
>
> > so it is not needed any more
> > and I just removed it in init_rootfs().
> >
> > The initialization of 'user_root' in init_rootfs() is used in:
> > do_populate_rootfs -> mount_user_root, which set the file system(
> > ramfs or tmpfs) of the second mount.
> >
> > Seems it's not suitable to place it in init_rootfs()......
>
> OK I think I just need to understand how you added the context of the
> first mount explicitly now and where, as I don't see it.
>
......
>
> What do you mean? init_mount_tree() is always called, and it has
> statically:
>
> static void __init init_mount_tree(void)
> {
> struct vfsmount *mnt;
> ...
> mnt = vfs_kern_mount(&rootfs_fs_type, 0, "rootfs", NULL);
> ...
> }
>
> And as I noted, this is *always* called earlier than
> do_populate_rootfs(). Your changes did not remove the init_mount_tree()
> or modify it, and so why would the context of the above call always
> be OK to be used now with a ramfs context now?
>
> > So it makes no sense to make the file system of the first mount selectable.
>
> Why? I don't see why, nor is it explained, we're always caling
> vfs_kern_mount(&rootfs_fs_type, ...) and you have not changed that
> either.
>
> > To simplify the code here, I make it ramfs_init_fs_context directly. In fact,
> > it's fine to make it shmen_init_fs_context here too.
>
> So indeed you're suggesting its arbitrary now.... I don't see why.
>
So the biggest problem now seems to be the first mount I changed, maybe I didn't
make it clear before.
Let's call the first mount which is created in init_mount_tree() the
'init_mount'.
If the 'root' is a block fs, initrd or nfs, the 'init_mount' will be a
ramfs, that seems
clear, it can be seen from the enable of tmpfs:
void __init init_rootfs(void)
{
if (IS_ENABLED(CONFIG_TMPFS) && !saved_root_name[0] &&
(!root_fs_names || strstr(root_fs_names, "tmpfs")))
is_tmpfs = true;
}
That makes sense, because the 'init_mount' will not be the root file
system, as the
block fs or initrd or nfs that mounted later on '/root' will become
the root by a call of
init_chroot().
If the 'root' is initramfs, cpio will be unpacked init 'init_mount'.
In this scene, the
'init_mount' can be ramfs or tmpfs, and it is the root file system.
In my patch, I create a second mount which is mounted on the
'init_mount', let's call
it 'user_mount'.
cpio is unpacked into 'user_mount', and the 'user_mount' is made the
root by 'init_chroot()'
later. The 'user_mount' can be ramfs or tmpfs. So 'init_mount' is not
the root any more,
and it makes no sense to make it tmpfs, just like why it should be
ramfs when 'root'
is a block fs or initrd.
'init_mount' is created from 'rootfs_fs_type' in init_mount_tree():
static void __init init_mount_tree(void)
{
struct vfsmount *mnt;
...
mnt = vfs_kern_mount(&rootfs_fs_type, 0, "rootfs", NULL);
...
}
I make the 'init_mount' to be ramfs by making
'rootfs_fs_type->init_fs_context' with
'ramfs_init_fs_context':
struct file_system_type rootfs_fs_type = {
.name = "rootfs",
.init_fs_context = ramfs_init_fs_context,
.kill_sb = kill_litter_super,
};
I think the third patch that I sended has made it clear what I do to
the 'init_mount'.
Thanks!
Menglong Dong
Powered by blists - more mailing lists