linux-kernel - Re: [PATCH v12 02/26] securityfs: Extend securityfs with namespacing support

Date:   Thu, 7 Jul 2022 10:34:09 -0400
From:   Stefan Berger <stefanb@...ux.ibm.com>
To:     "Serge E. Hallyn" <serge@...lyn.com>
Cc:     linux-integrity@...r.kernel.org, zohar@...ux.ibm.com,
        christian.brauner@...ntu.com, containers@...ts.linux.dev,
        dmitry.kasatkin@...il.com, ebiederm@...ssion.com,
        krzysztof.struczynski@...wei.com, roberto.sassu@...wei.com,
        mpeters@...hat.com, lhinds@...hat.com, lsturman@...hat.com,
        puiterwi@...hat.com, jejb@...ux.ibm.com, jamjoom@...ibm.com,
        linux-kernel@...r.kernel.org, paul@...l-moore.com, rgb@...hat.com,
        linux-security-module@...r.kernel.org, jmorris@...ei.org,
        jpenumak@...hat.com, Christian Brauner <brauner@...nel.org>,
        James Bottomley <James.Bottomley@...senPartnership.com>
Subject: Re: [PATCH v12 02/26] securityfs: Extend securityfs with namespacing
 support



On 5/20/22 22:23, Serge E. Hallyn wrote:
> On Wed, Apr 20, 2022 at 10:06:09AM -0400, Stefan Berger wrote:
>> Enable multiple instances of securityfs by keying each instance with a
>> pointer to the user namespace it belongs to.
>>
>> Since we do not need the pinning of the filesystem for the virtualization
>> case, limit the usage of simple_pin_fs() and simple_release_fs() to the
>> case when the init_user_ns is active. This simplifies the cleanup for the
>> virtualization case where usage of securityfs_remove() to free dentries
>> is therefore not needed anymore.
>>
>> For the initial securityfs, i.e. the one mounted in the host userns mount,
>> nothing changes. The rules for securityfs_remove() are as before and it is
>> still paired with securityfs_create(). Specifically, a file created via
>> securityfs_create_dentry() in the initial securityfs mount still needs to
>> be removed by a call to securityfs_remove(). Creating a new dentry in the
>> initial securityfs mount still pins the filesystem like it always did.
>> Consequently, the initial securityfs mount is not destroyed on
>> umount/shutdown as long as at least one user of it still has dentries that
>> it hasn't removed with a call to securityfs_remove().
>>
>> Prevent mounting of an instance of securityfs in another user namespace
>> than it belongs to. Also, prevent accesses to files and directories by
>> a user namespace that is neither the user namespace it belongs to
>> nor an ancestor of the user namespace that the instance of securityfs
>> belongs to. Do not prevent access if securityfs was bind-mounted and
>> therefore the init_user_ns is the owning user namespace.
>>
>> Suggested-by: Christian Brauner <brauner@...nel.org>
>> Signed-off-by: Stefan Berger <stefanb@...ux.ibm.com>
>> Signed-off-by: James Bottomley <James.Bottomley@...senPartnership.com>
>>
>> ---
>> v11:
>>   - Formatted comment's first line to be '/*'
>> ---
>>   security/inode.c | 73 ++++++++++++++++++++++++++++++++++++++++--------
>>   1 file changed, 62 insertions(+), 11 deletions(-)
>>
>> diff --git a/security/inode.c b/security/inode.c
>> index 13e6780c4444..84c9396792a9 100644
>> --- a/security/inode.c
>> +++ b/security/inode.c
>> @@ -21,9 +21,38 @@
>>   #include <linux/security.h>
>>   #include <linux/lsm_hooks.h>
>>   #include <linux/magic.h>
>> +#include <linux/user_namespace.h>
>>   
>> -static struct vfsmount *mount;
>> -static int mount_count;
>> +static struct vfsmount *init_securityfs_mount;
>> +static int init_securityfs_mount_count;
>> +
>> +static int securityfs_permission(struct user_namespace *mnt_userns,
>> +				 struct inode *inode, int mask)
>> +{
>> +	int err;
>> +
>> +	err = generic_permission(&init_user_ns, inode, mask);
>> +	if (!err) {
>> +		/*
>> +		 * Unless bind-mounted, deny access if current_user_ns() is not
>> +		 * ancestor.
> 
> This comment has confused me the last few times I looked at this.  I see
> now you're using "bind-mounted" as a shortcut for saying "bind mounted from
> the init_user_ns into a child_user_ns container".  I do think that needs
> to be made clearer in this comment.


I rephrased the comment now.

    Stefan
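
Note for archive readers: the access rule from the quoted commit message
(allow access when init_user_ns owns the instance, i.e. the bind-mount case,
or when the caller's user namespace is the owning namespace or one of its
ancestors; deny otherwise) can be modeled in userspace roughly as below.
This is only a sketch and not the code from the patch. The names
model_userns, model_is_same_or_ancestor() and model_securityfs_access() are
hypothetical, and a bare parent pointer stands in for the kernel's
struct user_namespace.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct user_namespace: only the parent link. */
struct model_userns {
	const struct model_userns *parent;	/* NULL for the initial namespace */
};

/* True if 'ancestor' is 'ns' itself or appears on the parent chain of 'ns'. */
static bool model_is_same_or_ancestor(const struct model_userns *ancestor,
				      const struct model_userns *ns)
{
	for (; ns; ns = ns->parent)
		if (ns == ancestor)
			return true;
	return false;
}

/*
 * Model of the rule in the commit message (an assumption about its intent,
 * not the kernel implementation):
 *  - the instance owned by the initial namespace is not restricted here
 *    (this covers the bind-mount case);
 *  - otherwise the caller's namespace must be the owner or one of its
 *    ancestors, else -EACCES.
 */
static int model_securityfs_access(const struct model_userns *init_ns,
				   const struct model_userns *owner,
				   const struct model_userns *current_ns)
{
	if (owner == init_ns)
		return 0;
	if (model_is_same_or_ancestor(current_ns, owner))
		return 0;
	return -EACCES;
}
```

In this model a sibling or descendant namespace of the owner is denied,
while the owner and every namespace above it are allowed. The real function
additionally runs generic_permission() first, which the sketch leaves out.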