linux-kernel - Re: [PATCH v5 13/16] ima: Move some IMA policy and filesystem related variables into ima

Message-ID: <915aa1173be6d73c63f17e7e170da4fe20ed36f3.camel@linux.ibm.com>
Date:   Fri, 10 Dec 2021 09:21:59 -0500
From:   James Bottomley <jejb@...ux.ibm.com>
To:     Stefan Berger <stefanb@...ux.ibm.com>,
        Christian Brauner <christian.brauner@...ntu.com>
Cc:     linux-integrity@...r.kernel.org, zohar@...ux.ibm.com,
        serge@...lyn.com, containers@...ts.linux.dev,
        dmitry.kasatkin@...il.com, ebiederm@...ssion.com,
        krzysztof.struczynski@...wei.com, roberto.sassu@...wei.com,
        mpeters@...hat.com, lhinds@...hat.com, lsturman@...hat.com,
        puiterwi@...hat.com, jamjoom@...ibm.com,
        linux-kernel@...r.kernel.org, paul@...l-moore.com, rgb@...hat.com,
        linux-security-module@...r.kernel.org, jmorris@...ei.org
Subject: Re: [PATCH v5 13/16] ima: Move some IMA policy and filesystem
 related variables into ima_namespace

On Fri, 2021-12-10 at 08:57 -0500, Stefan Berger wrote:
> On 12/10/21 06:32, Christian Brauner wrote:
> > On Thu, Dec 09, 2021 at 07:57:02PM -0500, Stefan Berger wrote:
> > > On 12/9/21 14:11, Christian Brauner wrote:
> > > >   From 1f03dc427c583d5e9ebc9ebe9de77c3c535bbebe Mon Sep 17
> > > > 00:00:00 2001
> > > > From: Christian Brauner <christian.brauner@...ntu.com>
> > > > Date: Thu, 9 Dec 2021 20:07:02 +0100
> > > > Subject: [PATCH] !!!! HERE BE DRAGONS - UNTESTED !!!!
> > > > 
> > > > ---
> > > >    security/integrity/ima/ima_fs.c | 43
> > > > +++++++++++++++++++++++++++++----
> > > >    1 file changed, 38 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/security/integrity/ima/ima_fs.c
> > > > b/security/integrity/ima/ima_fs.c
> > > > index 583462b29cb5..d5b302b925b8 100644
> > > > --- a/security/integrity/ima/ima_fs.c
> > > > +++ b/security/integrity/ima/ima_fs.c
> > > > @@ -317,10 +317,14 @@ static ssize_t ima_read_policy(char
> > > > *path)
> > > >    static ssize_t ima_write_policy(struct file *file, const
> > > > char __user *buf,
> > > >    				size_t datalen, loff_t *ppos)
> > > >    {
> > > > -	struct ima_namespace *ns = get_current_ns();
> > > > +	struct ima_namespace *ns;
> > > > +	struct user_namespace *user_ns;
> > > >    	char *data;
> > > >    	ssize_t result;
> > > > +	user_ns = ima_filp_private(filp);
> > > > +	ns = user_ns->ima_ns
> > > > +
> > > >    	if (datalen >= PAGE_SIZE)
> > > >    		datalen = PAGE_SIZE - 1;
> > > > @@ -373,26 +377,51 @@ static const struct seq_operations
> > > > ima_policy_seqops = {
> > > >    };
> > > >    #endif
> > > > +static struct user_namespace *ima_filp_private(struct file
> > > > *filp)
> > > > +{
> > > > +	if (!(filp->f_flags & O_WRONLY)) {
> > > > +#ifdef CONFIG_IMA_READ_POLICY
> > > > +		struct seq_file *seq;
> > > > +
> > > > +		seq = filp->private_data;
> > > > +		return seq->private;
> > > > +#endif
> > > > +	}
> > > > +	return filp->private_data;
> > > > +}
> > > > +
> > > >    /*
> > > >     * ima_open_policy: sequentialize access to the policy file
> > > >     */
> > > >    static int ima_open_policy(struct inode *inode, struct file
> > > > *filp)
> > > >    {
> > > > -	struct ima_namespace *ns = get_current_ns();
> > > > +	struct user_namespace *user_ns = current_user_ns();
> > > 
> > > Do we have to take a reference on the user namespace assuming one
> > > can open
> > > the file, pass the fd down the hierarchy, and then the user
> > > namespace with
> > > the opened file goes away? Or is there anything else that keeps
> > > the user
> > > namespace alive?
> > No, we don't. When ima_policy_open() is called we do
> > current_user_ns() but that will be guaranteed to be identical to
> > filp->f_cred->user_ns. And f_cred is a reference that has been
> > taken when the vfs allocated a struct file for this .open call so
> > won't go away until the last fput.
> > 
> > My proposal is also too complicated, I think.
> > (The booster is giving me the same side-effects as my second shot
> > so this looks like two good days of fever and headache. So I'll use
> > that as an excuse. :))
> > 
> > > Your patch series as it stands has a bit of a security issue with
> > > those get_current_ns() calls across different file/seq_file
> > > operations.
> > > You have to make an architectural decision, I think. I see two
> > > sensible options:
> > > 1. The relevant ima_ns that .open/.read/.write operate on is always
> > > taken to be the ima_ns of the filesystem's userns, i.e.
> > > sb->s_user_ns->ima_ns.
> >     This - but I'm not an ima person - makes the most sense to me
> > and the semantics are straightforward. If I write to a file to
> > alter some policy then I expect the ima namespace of the user
> > namespace to be affected that the securityfs instance was mounted
> > in.
> > 2. The relevant ima_ns that .open/.read/.write operate on is always
> > taken to be the one of the opener. I don't really like that as that
> > gets weird if for some complicated reason the caller is not located
> > in the userns the filesystem was mounted in (weird mount
> > propagation scenario or sm). It also feels strange to operate on an
> > ima_ns that's different from s_user_ns->ima_ns in a securityfs
> > instance.
> 
> We have this situation because one can setns() to another mount 
> namespaces but the data shown by SecurityFS lives in a user
> namespace,  right?

Well, not necessarily.  There is another case where only the userns is
unshared and securityfs is never mounted inside the container.  If the
process has the capability to open the securityfs files (kubernetes
privileged container, say), what should it see? The analogy with the
pid namespace says it should see the contents of what the parent had
mounted, because if it wanted to see its own it would have done a
mount of securityfs inside the userns.  This argues for
sb->s_user_ns->ima_ns.

For the setns mount namespace case, the vfsmnt tree is duplicated, so
if the securityfs sb->s_user_ns is your user namespace in the prior
mount namespace, it will end up being so in the new one.  sb->s_user_ns
only changes on actual mount.
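
To make the mount-time point concrete, here is a rough userspace model,
not kernel code.  The structs are toy stand-ins that keep only the
fields under discussion, the toy struct file points straight at its
superblock (an assumption made for brevity), and toy_mount_securityfs(),
toy_file_ima_ns() and toy_demo() are made-up names for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; only the fields needed here. */
struct ima_namespace { const char *name; };
struct user_namespace { struct ima_namespace *ima_ns; };
struct super_block { struct user_namespace *s_user_ns; };
struct file { struct super_block *sb; };

/* "Mount": s_user_ns is fixed here and is not touched afterwards. */
static struct super_block toy_mount_securityfs(struct user_namespace *mounter)
{
	struct super_block sb = { .s_user_ns = mounter };
	return sb;
}

/* Option 1: the ima_ns comes from the superblock, not from the caller. */
static struct ima_namespace *toy_file_ima_ns(const struct file *filp)
{
	return filp->sb->s_user_ns->ima_ns;
}

static void toy_demo(void)
{
	struct ima_namespace host_ima = { "host" }, child_ima = { "child" };
	struct user_namespace host_userns = { &host_ima };
	struct user_namespace child_userns = { &child_ima };

	/* Only the parent has mounted securityfs: its ima_ns is what shows. */
	struct super_block host_sb = toy_mount_securityfs(&host_userns);
	struct file via_parent_mount = { &host_sb };
	assert(toy_file_ima_ns(&via_parent_mount) == &host_ima);

	/* A fresh mount from inside the child userns shows the child's ima_ns. */
	struct super_block child_sb = toy_mount_securityfs(&child_userns);
	struct file via_own_mount = { &child_sb };
	assert(toy_file_ima_ns(&via_own_mount) == &child_ima);
}
```

Calling toy_demo() from a main() exercises both cases; nothing about
the calling process enters into the lookup.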

>  And now we need to decide whether to affect the data in the
> user namespace  that did the open (option 2) or to which the
> SecurityFS  belongs to (option 1). If we were to open a regular file
> it would be option 1, so we should probably not break that existing
> semantic and also choose option 1 unless there one wasn't allowed to
> choose the user namespace the SecurityFS files belonged to then it
> should be option 2 

Once the userns is unshared, IMA accounting is done inside the
namespace.  However, in order to see the results, the container must
mount securityfs in the userns.  I can't think of a good reason why a
privileged container should want to be accounted separately but see the
results of its parents, but similarly I can't see why a pid namespace
should want to see /proc of its parent either ... yet that's the
semantic we have today.

> but then we have file descriptor passing where 'being allowed' can 
> change depending on who is reading/writing a file... Is there
> anything that would prevent us from setns()'ing to that target user
> namespace so that we would now see that of a user namespace that we
> are not allowed to see?

If you're able to setns to a user namespace, you logically have all its
privileges, so that problem shouldn't arise.

Option 2 is basically sliding back towards securityfs magically
changing properties depending on which userns is asking.  If we're
going to support that, I don't see what was wrong with the owner/guid
magically changing as well like I first proposed.  If we're going to
insist on a new mount of securityfs, I think it has to function cleanly
like the pid namespace, so option 1 is required.
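
A small userspace sketch of the difference between the two options,
under the same caveat that these are toy structs and not the kernel's.
The opener_user_ns field is my stand-in for filp->f_cred->user_ns, and
toy_open(), toy_option1(), toy_option2() and toy_compare_options() are
hypothetical helpers:

```c
#include <assert.h>

struct ima_namespace { int id; };
struct user_namespace { struct ima_namespace *ima_ns; };
struct super_block { struct user_namespace *s_user_ns; };
struct task { struct user_namespace *user_ns; };
struct file {
	struct super_block *sb;
	struct user_namespace *opener_user_ns; /* models filp->f_cred->user_ns */
};

/* Record the opener's userns at open time, as f_cred would. */
static struct file toy_open(const struct task *opener, struct super_block *sb)
{
	struct file filp = { .sb = sb, .opener_user_ns = opener->user_ns };
	return filp;
}

/* Option 1: resolve through the superblock the file lives on. */
static struct ima_namespace *toy_option1(const struct file *filp)
{
	return filp->sb->s_user_ns->ima_ns;
}

/* Option 2: resolve through whoever opened the file. */
static struct ima_namespace *toy_option2(const struct file *filp)
{
	return filp->opener_user_ns->ima_ns;
}

static void toy_compare_options(void)
{
	struct ima_namespace host_ima = { 1 }, child_ima = { 2 };
	struct user_namespace host_userns = { &host_ima };
	struct user_namespace child_userns = { &child_ima };
	struct super_block host_sb = { &host_userns };
	struct task host_task = { &host_userns };
	struct task child_task = { &child_userns };

	/* Two openers in different user namespaces, same securityfs mount. */
	struct file a = toy_open(&host_task, &host_sb);
	struct file b = toy_open(&child_task, &host_sb);

	/* Option 1: one mount, one answer. */
	assert(toy_option1(&a) == toy_option1(&b));
	/* Option 2: the same file answers differently per opener. */
	assert(toy_option2(&a) != toy_option2(&b));
}
```

Under option 2 the same path on the same mount yields two different
namespaces, which is the "magically changing" behaviour in question.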

James