[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <07b7e70b-29e3-46bd-91e3-f19eabb915c8@auristor.com>
Date: Thu, 29 May 2025 20:11:59 -0400
From: Jeffrey E Altman <jaltman@...istor.com>
To: David Howells <dhowells@...hat.com>,
Christian Brauner <christian@...uner.io>
Cc: Marc Dionne <marc.dionne@...istor.com>, linux-afs@...ts.infradead.org,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
Etienne Champetier <champetier.etienne@...il.com>,
Chet Ramey <chet.ramey@...e.edu>, Cheyenne Wills <cwills@...enomine.net>,
Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>, Steve French <sfrench@...ba.org>,
openafs-devel@...nafs.org, linux-cifs@...r.kernel.org
Subject: Re: [PATCH 1/2] afs, bash: Fix open(O_CREAT) on an extant AFS file in
a sticky dir
On 5/19/2025 12:11 PM, David Howells wrote:
> Since version 1.11 (January 1992) Bash has a work around in redir_open()
> that causes open(O_CREAT) of a file to be retried without O_CREAT if open()
> fails with an EACCES error if bash was built with AFS workarounds
> configured:
>
> #if defined (AFS)
> if ((fd < 0) && (errno == EACCES))
> {
> fd = open (filename, flags & ~O_CREAT, mode);
> errno = EACCES; /* restore errno */
> }
>
> #endif /* AFS */
>
> The ~O_CREAT fallback logic was introduced to workaround a bug[1] in the
> IBM AFS 3.1 cache manager and server which can return EACCES in preference
> to EEXIST if the requested file exists but the caller is neither granted
> explicit PRSFS_READ permission nor is the file owner and is granted
> PRSFS_INSERT permission on the directory. IBM AFS 3.2 altered the cache
> manager permission checks but failed to correct the permission checks in
> the AFS server. As of this writing, all IBM AFS derived servers continue
> to return EACCES in preference to EEXIST when these conditions are met.
> Bug reports have been filed with all implementations.
>
> As an unintended side effect, the Bash fallback logic also undermines the
> Linux kernel protections against O_CREAT opening FIFOs and regular files
> not owned by the user in world writeable sticky directories - unless the
> owner is the same as that of the directory - as was added in commit
> 30aba6656f61e ("namei: allow restricted O_CREAT of FIFOs and regular
> files").
>
> As a result the Bash fallback logic masks an incompatibility between the
> ownership checks performed by may_create_in_sticky() and network
> filesystems such as AFS where the uid namespace is disjoint from the uid
> namespace of the local system.
>
> However, the bash work around is going to be removed[2].
>
> Fix this in the kernel by:
>
> (1) Provide an ->is_owned_by_me() inode op, similar to ->permission(),
> and, if provided, call that to determine if the caller owns the file
> instead of checking the i_uid to current_fsuid().
>
> (2) Provide a ->have_same_owner() inode op, that, if provided, can be
> called to see if an inode has the same owner as the parent on the path
> walked.
>
> For kafs, use the first hook to check to see if the server indicated the
> ADMINISTER bit in the access rights returned by the FS.FetchStatus and
> suchlike and the second hook to compare the AFS user IDs retrieved by
> FS.FetchStatus (which may not fit in a kuid if AuriStor's YFS variant).
>
> These hooks should probably be used in other places too, and a follow-up
> patch will be submitted for that.
>
> This can be tested by creating a sticky directory (the user must have a
> token to do this) and creating a file in it. Then strace bash doing "echo
> foo >>file" and look at whether bash does a single, successful O_CREAT open
> on the file or whether that one fails and then bash does one without
> O_CREAT that succeeds.
>
> Fixes: 30aba6656f61 ("namei: allow restricted O_CREAT of FIFOs and regular files")
> Reported-by: Etienne Champetier <champetier.etienne@...il.com>
> Signed-off-by: David Howells <dhowells@...hat.com>
> cc: Marc Dionne <marc.dionne@...istor.com>
> cc: Jeffrey Altman <jaltman@...istor.com>
> cc: Chet Ramey <chet.ramey@...e.edu>
> cc: Cheyenne Wills <cwills@...enomine.net>
> cc: Alexander Viro <viro@...iv.linux.org.uk>
> cc: Christian Brauner <brauner@...nel.org>
> cc: Steve French <sfrench@...ba.org>
> cc: linux-afs@...ts.infradead.org
> cc: openafs-devel@...nafs.org
> cc: linux-cifs@...r.kernel.org
> cc: linux-fsdevel@...r.kernel.org
> Link: https://groups.google.com/g/gnu.bash.bug/c/6PPTfOgFdL4/m/2AQU-S1N76UJ [1]
> Link: https://git.savannah.gnu.org/cgit/bash.git/tree/redir.c?h=bash-5.3-rc1#n733 [2]
> ---
> fs/afs/dir.c | 2 ++
> fs/afs/file.c | 2 ++
> fs/afs/internal.h | 3 +++
> fs/afs/security.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++
> fs/internal.h | 1 +
> fs/namei.c | 50 +++++++++++++++++++++++++++++++++++---------
> include/linux/fs.h | 3 +++
> 7 files changed, 103 insertions(+), 10 deletions(-)
>
> diff --git a/fs/afs/dir.c b/fs/afs/dir.c
> index 9e7b1fe82c27..6360db1673b0 100644
> --- a/fs/afs/dir.c
> +++ b/fs/afs/dir.c
> @@ -65,6 +65,8 @@ const struct inode_operations afs_dir_inode_operations = {
> .permission = afs_permission,
> .getattr = afs_getattr,
> .setattr = afs_setattr,
> + .is_owned_by_me = afs_is_owned_by_me,
> + .have_same_owner = afs_have_same_owner,
> };
>
> const struct address_space_operations afs_dir_aops = {
> diff --git a/fs/afs/file.c b/fs/afs/file.c
> index fc15497608c6..0317f0a36cf2 100644
> --- a/fs/afs/file.c
> +++ b/fs/afs/file.c
> @@ -47,6 +47,8 @@ const struct inode_operations afs_file_inode_operations = {
> .getattr = afs_getattr,
> .setattr = afs_setattr,
> .permission = afs_permission,
> + .is_owned_by_me = afs_is_owned_by_me,
> + .have_same_owner = afs_have_same_owner,
> };
>
> const struct address_space_operations afs_file_aops = {
> diff --git a/fs/afs/internal.h b/fs/afs/internal.h
> index 440b0e731093..fbfbf615abe3 100644
> --- a/fs/afs/internal.h
> +++ b/fs/afs/internal.h
> @@ -1495,6 +1495,9 @@ extern struct key *afs_request_key(struct afs_cell *);
> extern struct key *afs_request_key_rcu(struct afs_cell *);
> extern int afs_check_permit(struct afs_vnode *, struct key *, afs_access_t *);
> extern int afs_permission(struct mnt_idmap *, struct inode *, int);
> +int afs_is_owned_by_me(struct mnt_idmap *idmap, struct inode *inode);
> +int afs_have_same_owner(struct mnt_idmap *idmap, struct inode *inode,
> + struct dentry *dentry);
> extern void __exit afs_clean_up_permit_cache(void);
>
> /*
> diff --git a/fs/afs/security.c b/fs/afs/security.c
> index 6a7744c9e2a2..a49070c8342d 100644
> --- a/fs/afs/security.c
> +++ b/fs/afs/security.c
> @@ -477,6 +477,58 @@ int afs_permission(struct mnt_idmap *idmap, struct inode *inode,
> return ret;
> }
>
> +/*
> + * Determine if an inode is owned by 'me' - whatever that means for the
> + * filesystem. In the case of AFS, this means that the file is owned by the
> + * AFS user represented by the token (e.g. from a kerberos server) held in a
> + * key. Returns 0 if owned by me, 1 if not; can also return an error.
> + */
Technically AFS tokens are not issued by Kerberos KDCs. Could we say
"... the file is owned
by the AFS user represented by the Rx Security Class token held in a key."?
> +int afs_is_owned_by_me(struct mnt_idmap *idmap, struct inode *inode)
> +{
> + struct afs_vnode *vnode = AFS_FS_I(inode);
> + afs_access_t access;
> + struct key *key;
> + int ret;
> +
> + key = afs_request_key(vnode->volume->cell);
> + if (IS_ERR(key))
> + return PTR_ERR(key);
> +
> + /* Get the access rights for the key on this file. */
> + ret = afs_check_permit(vnode, key, &access);
> + if (ret < 0)
> + goto error;
> +
> + /* We get the ADMINISTER bit if we own the file. */
> + ret = (access & AFS_ACE_ADMINISTER) ? 0 : 1;
AFS_ACE_ADMINISTER only means ownership if the inode is a
non-directory. Should
we add an explicit check for inode type?
> +error:
> + key_put(key);
> + return ret;
> +}
> +
> +/*
> + * Determine if a file has the same owner as its parent - whatever that means
> + * for the filesystem. In the case of AFS, this means comparing their AFS
> + * UIDs. Returns 0 if same, 1 if not same; can also return an error.
> + */
> +int afs_have_same_owner(struct mnt_idmap *idmap, struct inode *inode,
> + struct dentry *dentry)
> +{
> + struct afs_vnode *vnode = AFS_FS_I(inode), *dvnode;
> + struct dentry *parent;
> + s64 owner;
> +
> + /* Get the owner's ID for the directory. Ideally, we'd use RCU to
> + * access the parent rather than getting a ref.
> + */
> + parent = dget_parent(dentry);
> + dvnode = AFS_FS_I(d_backing_inode(parent));
> + owner = dvnode->status.owner;
> + dput(parent);
> +
> + return vnode->status.owner != owner;
> +}
> +
> void __exit afs_clean_up_permit_cache(void)
> {
> int i;
> diff --git a/fs/internal.h b/fs/internal.h
> index b9b3e29a73fd..9e84bfc5aee6 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -52,6 +52,7 @@ extern int finish_clean_context(struct fs_context *fc);
> /*
> * namei.c
> */
> +int vfs_inode_is_owned_by_me(struct mnt_idmap *idmap, struct inode *inode);
> extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
> struct path *path, struct path *root);
> int do_rmdir(int dfd, struct filename *name);
> diff --git a/fs/namei.c b/fs/namei.c
> index 84a0e0b0111c..9f42dc46322f 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -53,8 +53,8 @@
> * The new code replaces the old recursive symlink resolution with
> * an iterative one (in case of non-nested symlink chains). It does
> * this with calls to <fs>_follow_link().
> - * As a side effect, dir_namei(), _namei() and follow_link() are now
> - * replaced with a single function lookup_dentry() that can handle all
> + * As a side effect, dir_namei(), _namei() and follow_link() are now
> + * replaced with a single function lookup_dentry() that can handle all
> * the special cases of the former code.
> *
> * With the new dcache, the pathname is stored at each inode, at least as
> @@ -1149,6 +1149,36 @@ fs_initcall(init_fs_namei_sysctls);
>
> #endif /* CONFIG_SYSCTL */
>
> +/*
> + * Determine if an inode is owned by the process (allowing for fsuid override),
> + * returning 0 if so, 1 if not and a negative error code if there was a problem
> + * making the determination.
> + */
> +int vfs_inode_is_owned_by_me(struct mnt_idmap *idmap, struct inode *inode)
> +{
> + if (unlikely(inode->i_op->is_owned_by_me))
> + return inode->i_op->is_owned_by_me(idmap, inode);
> +
> + return vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, inode),
> + current_fsuid()) ? 1 : 0;
> +}
> +
> +/*
> + * Determine if two inodes have the same owner, returning 0 if so, 1 if not and
> + * a negative error code if there was a problem making the determination.
> + */
> +static int vfs_inodes_have_same_owner(struct mnt_idmap *idmap, struct inode *inode,
> + const struct nameidata *nd)
> +{
> + if (unlikely(inode->i_op->have_same_owner))
> + return inode->i_op->have_same_owner(idmap, inode, nd->path.dentry);
> +
> + if (vfsuid_valid(nd->dir_vfsuid) &&
> + vfsuid_eq(i_uid_into_vfsuid(idmap, inode), nd->dir_vfsuid))
> + return 0;
> + return 1; /* Not same. */
> +}
> +
> /**
> * may_follow_link - Check symlink following for unsafe situations
> * @nd: nameidata pathwalk data
> @@ -1302,10 +1332,10 @@ int may_linkat(struct mnt_idmap *idmap, const struct path *link)
> * Returns 0 if the open is allowed, -ve on error.
> */
> static int may_create_in_sticky(struct mnt_idmap *idmap, struct nameidata *nd,
> - struct inode *const inode)
> + struct inode *inode)
> {
> umode_t dir_mode = nd->dir_mode;
> - vfsuid_t dir_vfsuid = nd->dir_vfsuid, i_vfsuid;
> + int ret;
>
> if (likely(!(dir_mode & S_ISVTX)))
> return 0;
> @@ -1316,13 +1346,13 @@ static int may_create_in_sticky(struct mnt_idmap *idmap, struct nameidata *nd,
> if (S_ISFIFO(inode->i_mode) && !sysctl_protected_fifos)
> return 0;
>
> - i_vfsuid = i_uid_into_vfsuid(idmap, inode);
> -
> - if (vfsuid_eq(i_vfsuid, dir_vfsuid))
> - return 0;
> + ret = vfs_inodes_have_same_owner(idmap, inode, nd);
> + if (ret <= 0)
> + return ret;
>
> - if (vfsuid_eq_kuid(i_vfsuid, current_fsuid()))
> - return 0;
> + ret = vfs_inode_is_owned_by_me(idmap, inode);
> + if (ret <= 0)
> + return ret;
>
> if (likely(dir_mode & 0002)) {
> audit_log_path_denied(AUDIT_ANOM_CREAT, "sticky_create");
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 016b0fe1536e..ec278d2d362a 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2236,6 +2236,9 @@ struct inode_operations {
> struct dentry *dentry, struct fileattr *fa);
> int (*fileattr_get)(struct dentry *dentry, struct fileattr *fa);
> struct offset_ctx *(*get_offset_ctx)(struct inode *inode);
> + int (*is_owned_by_me)(struct mnt_idmap *idmap, struct inode *inode);
> + int (*have_same_owner)(struct mnt_idmap *idmap, struct inode *inode,
> + struct dentry *dentry);
> } ____cacheline_aligned;
>
> static inline int call_mmap(struct file *file, struct vm_area_struct *vma)
It would be nice if this patch added documentation for the new
inode_operations to
Documentation/filesystems/vfs.rst.
The may_create_in_sticky() logic looks good to me.
Thanks for the patch.
Jeffrey Altman
Download attachment "smime.p7s" of type "application/pkcs7-signature" (4276 bytes)
Powered by blists - more mailing lists