linux-kernel - Re: [RFC][PATCH v2] mount: In propagate_umount handle overlapping mount propagation trees

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20161101061422.GA14866@outlook.office365.com>
Date:   Mon, 31 Oct 2016 23:14:23 -0700
From:   Andrei Vagin <avagin@...tuozzo.com>
To:     "Eric W. Biederman" <ebiederm@...ssion.com>
CC:     Andrey Vagin <avagin@...nvz.org>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Linux Containers <containers@...ts.linux-foundation.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC][PATCH v2] mount: In propagate_umount handle overlapping
 mount propagation trees

On Tue, Oct 25, 2016 at 04:45:44PM -0500, Eric W. Biederman wrote:
> Andrei Vagin <avagin@...tuozzo.com> writes:
> >
> > From 8e0f45c0272aa1f789d1657a0acc98c58919dcc3 Mon Sep 17 00:00:00 2001
> > From: Andrei Vagin <avagin@...nvz.org>
> > Date: Tue, 25 Oct 2016 13:57:31 -0700
> > Subject: [PATCH] mount: skip all mounts from a shared group if one is marked
> >
> > If we meet a marked mount, it means that all mounts from
> > its group have been already revised.
> >
> > Signed-off-by: Andrei Vagin <avagin@...nvz.org>
> > ---
> >  fs/pnode.c | 18 +++++++++++++++---
> >  1 file changed, 15 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/pnode.c b/fs/pnode.c
> > index 8fd1a3f..ebb7134 100644
> > --- a/fs/pnode.c
> > +++ b/fs/pnode.c
> > @@ -426,10 +426,16 @@ static struct mount *propagation_visit_child(struct mount *last_child,
> >  		if (child && !IS_MNT_MARKED(child))
> >  			return child;
> >  
> > -		if (!child)
> > +		if (!child) {
> >  			m = propagation_next(m, origin);
> > -		else
> > +		} else {
> > +			if (IS_MNT_MARKED(child)) {
> > +				if (m->mnt_group_id == origin->mnt_group_id)
> > +					return NULL;
> > +				m = m->mnt_master;
> > +			}
> >  			m = propagation_next_sib(m, origin);
> > +		}
> >  	}
> >  	return NULL;
> >  }
> > @@ -456,8 +462,14 @@ static struct mount *propagation_revisit_child(struct mount *last_child,
> >  
> >  		if (!child)
> >  			m = propagation_next(m, origin);
> > -		else
> > +		else {
> > +			if (!IS_MNT_MARKED(child)) {
> > +				if (m->mnt_group_id == origin->mnt_group_id)
> > +					return NULL;
> > +				m = m->mnt_master;
> > +			}
> >  			m = propagation_next_sib(m, origin);
> > +		}
> >  	}
> >  	return NULL;
> >  }
> 
> That is certainly interesting.  The problem is that the reason we were
> going slow is that there were in fact mounts that had not been traversed
> in the share group.
> 
> And in fact the entire idea of visiting a vfsmount mountpoint pair
> exactly once is wrong in the face of shadow mounts.  For a vfsmount
> mountpoint pair that has shadow mounts the number of shadow mounts needs
> to be descreased by one each time the propgation tree is traversed
> during unmount. Which means that as far as I can see we have to kill
> shadow mounts to correctly optimize this code.  Once shadow mounts are
> gone I don't know of a case where need your optimization.

I am not sure that now shadow mounts are worked as you described
here. start_umount_propagation() doesn't remove a mount from mnt_hash,
so in a second time we will look up the same mount again.

Look at this script:

[root@...4 mounts]# cat ./opus02.sh
set -e
mkdir -p /mnt
mount -t tmpfs zdtm /mnt
mkdir -p /mnt/A/a
mkdir -p /mnt/B/a
mount --bind --make-shared /mnt/A /mnt/A
mount --bind /mnt/A /mnt/B
mount --bind /mnt/A/a /mnt/A/a
mount --bind /mnt/A/a /mnt/A/a

umount -l /mnt/A
cat /proc/self/mountinfo | grep zdtm

[root@...4 mounts]# unshare --propagation private -m ./opus02.sh
159 121 0:46 / /mnt rw,relatime - tmpfs zdtm rw
162 159 0:46 /A /mnt/B rw,relatime shared:67 - tmpfs zdtm rw
167 162 0:46 /A/a /mnt/B/a rw,relatime shared:67 - tmpfs zdtm rw

We mount nothing into /mnt/B, but when we umount everything from A, we
still have something in B.

Thanks,
Andrei
> 
> I am busily verifying my patch to kill shadow mounts but the following
> patch is the minimal version.  As far as I can see propagate_one
> is the only place we create shadow mounts, and holding the
> namespace_lock over attach_recursive_mnt, propagate_mnt, and
> propgate_one is sufficient for that __lookup_mnt to be competely safe.
> 
> diff --git a/fs/pnode.c b/fs/pnode.c
> index 234a9ac49958..b14119b370d4 100644
> --- a/fs/pnode.c
> +++ b/fs/pnode.c
> @@ -217,6 +217,9 @@ static int propagate_one(struct mount *m)
>         /* skip if mountpoint isn't covered by it */
>         if (!is_subdir(mp->m_dentry, m->mnt.mnt_root))
>                 return 0;
> +       /* skip if mountpoint already has a mount on it */
> +       if (__lookup_mnt(&m->mnt, mp->m_dentry))
> +               return 0;
>         if (peers(m, last_dest)) {
>                 type = CL_MAKE_SHARED;
>         } else {
> 
> If you run with that patch you will see that there are go faster stripes.
> 
> Eric
>