[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20161101061422.GA14866@outlook.office365.com>
Date: Mon, 31 Oct 2016 23:14:23 -0700
From: Andrei Vagin <avagin@...tuozzo.com>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
CC: Andrey Vagin <avagin@...nvz.org>,
Alexander Viro <viro@...iv.linux.org.uk>,
Linux Containers <containers@...ts.linux-foundation.org>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC][PATCH v2] mount: In propagate_umount handle overlapping
mount propagation trees
On Tue, Oct 25, 2016 at 04:45:44PM -0500, Eric W. Biederman wrote:
> Andrei Vagin <avagin@...tuozzo.com> writes:
> >
> > From 8e0f45c0272aa1f789d1657a0acc98c58919dcc3 Mon Sep 17 00:00:00 2001
> > From: Andrei Vagin <avagin@...nvz.org>
> > Date: Tue, 25 Oct 2016 13:57:31 -0700
> > Subject: [PATCH] mount: skip all mounts from a shared group if one is marked
> >
> > If we meet a marked mount, it means that all mounts from
> > its group have been already revised.
> >
> > Signed-off-by: Andrei Vagin <avagin@...nvz.org>
> > ---
> > fs/pnode.c | 18 +++++++++++++++---
> > 1 file changed, 15 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/pnode.c b/fs/pnode.c
> > index 8fd1a3f..ebb7134 100644
> > --- a/fs/pnode.c
> > +++ b/fs/pnode.c
> > @@ -426,10 +426,16 @@ static struct mount *propagation_visit_child(struct mount *last_child,
> > if (child && !IS_MNT_MARKED(child))
> > return child;
> >
> > - if (!child)
> > + if (!child) {
> > m = propagation_next(m, origin);
> > - else
> > + } else {
> > + if (IS_MNT_MARKED(child)) {
> > + if (m->mnt_group_id == origin->mnt_group_id)
> > + return NULL;
> > + m = m->mnt_master;
> > + }
> > m = propagation_next_sib(m, origin);
> > + }
> > }
> > return NULL;
> > }
> > @@ -456,8 +462,14 @@ static struct mount *propagation_revisit_child(struct mount *last_child,
> >
> > if (!child)
> > m = propagation_next(m, origin);
> > - else
> > + else {
> > + if (!IS_MNT_MARKED(child)) {
> > + if (m->mnt_group_id == origin->mnt_group_id)
> > + return NULL;
> > + m = m->mnt_master;
> > + }
> > m = propagation_next_sib(m, origin);
> > + }
> > }
> > return NULL;
> > }
>
> That is certainly interesting. The problem is that the reason we were
> going slow is that there were in fact mounts that had not been traversed
> in the share group.
>
> And in fact the entire idea of visiting a vfsmount mountpoint pair
> exactly once is wrong in the face of shadow mounts. For a vfsmount
> mountpoint pair that has shadow mounts the number of shadow mounts needs
> to be descreased by one each time the propgation tree is traversed
> during unmount. Which means that as far as I can see we have to kill
> shadow mounts to correctly optimize this code. Once shadow mounts are
> gone I don't know of a case where need your optimization.
I am not sure that now shadow mounts are worked as you described
here. start_umount_propagation() doesn't remove a mount from mnt_hash,
so in a second time we will look up the same mount again.
Look at this script:
[root@...4 mounts]# cat ./opus02.sh
set -e
mkdir -p /mnt
mount -t tmpfs zdtm /mnt
mkdir -p /mnt/A/a
mkdir -p /mnt/B/a
mount --bind --make-shared /mnt/A /mnt/A
mount --bind /mnt/A /mnt/B
mount --bind /mnt/A/a /mnt/A/a
mount --bind /mnt/A/a /mnt/A/a
umount -l /mnt/A
cat /proc/self/mountinfo | grep zdtm
[root@...4 mounts]# unshare --propagation private -m ./opus02.sh
159 121 0:46 / /mnt rw,relatime - tmpfs zdtm rw
162 159 0:46 /A /mnt/B rw,relatime shared:67 - tmpfs zdtm rw
167 162 0:46 /A/a /mnt/B/a rw,relatime shared:67 - tmpfs zdtm rw
We mount nothing into /mnt/B, but when we umount everything from A, we
still have something in B.
Thanks,
Andrei
>
> I am busily verifying my patch to kill shadow mounts but the following
> patch is the minimal version. As far as I can see propagate_one
> is the only place we create shadow mounts, and holding the
> namespace_lock over attach_recursive_mnt, propagate_mnt, and
> propgate_one is sufficient for that __lookup_mnt to be competely safe.
>
> diff --git a/fs/pnode.c b/fs/pnode.c
> index 234a9ac49958..b14119b370d4 100644
> --- a/fs/pnode.c
> +++ b/fs/pnode.c
> @@ -217,6 +217,9 @@ static int propagate_one(struct mount *m)
> /* skip if mountpoint isn't covered by it */
> if (!is_subdir(mp->m_dentry, m->mnt.mnt_root))
> return 0;
> + /* skip if mountpoint already has a mount on it */
> + if (__lookup_mnt(&m->mnt, mp->m_dentry))
> + return 0;
> if (peers(m, last_dest)) {
> type = CL_MAKE_SHARED;
> } else {
>
> If you run with that patch you will see that there are go faster stripes.
>
> Eric
>
Powered by blists - more mailing lists