linux-kernel - Re: [PATCH 1/2] afs, bash: Fix open(O_CREAT) on an extant AFS file in a sticky dir

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <07b7e70b-29e3-46bd-91e3-f19eabb915c8@auristor.com>
Date: Thu, 29 May 2025 20:11:59 -0400
From: Jeffrey E Altman <jaltman@...istor.com>
To: David Howells <dhowells@...hat.com>,
 Christian Brauner <christian@...uner.io>
Cc: Marc Dionne <marc.dionne@...istor.com>, linux-afs@...ts.infradead.org,
 linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
 Etienne Champetier <champetier.etienne@...il.com>,
 Chet Ramey <chet.ramey@...e.edu>, Cheyenne Wills <cwills@...enomine.net>,
 Alexander Viro <viro@...iv.linux.org.uk>,
 Christian Brauner <brauner@...nel.org>, Steve French <sfrench@...ba.org>,
 openafs-devel@...nafs.org, linux-cifs@...r.kernel.org
Subject: Re: [PATCH 1/2] afs, bash: Fix open(O_CREAT) on an extant AFS file in
 a sticky dir


On 5/19/2025 12:11 PM, David Howells wrote:
> Since version 1.11 (January 1992) Bash has a work around in redir_open()
> that causes open(O_CREAT) of a file to be retried without O_CREAT if open()
> fails with an EACCES error if bash was built with AFS workarounds
> configured:
>
>          #if defined (AFS)
>                if ((fd < 0) && (errno == EACCES))
>              {
>                fd = open (filename, flags & ~O_CREAT, mode);
>                errno = EACCES;    /* restore errno */
>              }
>
>          #endif /* AFS */
>
> The ~O_CREAT fallback logic was introduced to workaround a bug[1] in the
> IBM AFS 3.1 cache manager and server which can return EACCES in preference
> to EEXIST if the requested file exists but the caller is neither granted
> explicit PRSFS_READ permission nor is the file owner and is granted
> PRSFS_INSERT permission on the directory.  IBM AFS 3.2 altered the cache
> manager permission checks but failed to correct the permission checks in
> the AFS server.  As of this writing, all IBM AFS derived servers continue
> to return EACCES in preference to EEXIST when these conditions are met.
> Bug reports have been filed with all implementations.
>
> As an unintended side effect, the Bash fallback logic also undermines the
> Linux kernel protections against O_CREAT opening FIFOs and regular files
> not owned by the user in world writeable sticky directories - unless the
> owner is the same as that of the directory - as was added in commit
> 30aba6656f61e ("namei: allow restricted O_CREAT of FIFOs and regular
> files").
>
> As a result the Bash fallback logic masks an incompatibility between the
> ownership checks performed by may_create_in_sticky() and network
> filesystems such as AFS where the uid namespace is disjoint from the uid
> namespace of the local system.
>
> However, the bash work around is going to be removed[2].
>
> Fix this in the kernel by:
>
>   (1) Provide an ->is_owned_by_me() inode op, similar to ->permission(),
>       and, if provided, call that to determine if the caller owns the file
>       instead of checking the i_uid to current_fsuid().
>
>   (2) Provide a ->have_same_owner() inode op, that, if provided, can be
>       called to see if an inode has the same owner as the parent on the path
>       walked.
>
> For kafs, use the first hook to check to see if the server indicated the
> ADMINISTER bit in the access rights returned by the FS.FetchStatus and
> suchlike and the second hook to compare the AFS user IDs retrieved by
> FS.FetchStatus (which may not fit in a kuid if AuriStor's YFS variant).
>
> These hooks should probably be used in other places too, and a follow-up
> patch will be submitted for that.
>
> This can be tested by creating a sticky directory (the user must have a
> token to do this) and creating a file in it.  Then strace bash doing "echo
> foo >>file" and look at whether bash does a single, successful O_CREAT open
> on the file or whether that one fails and then bash does one without
> O_CREAT that succeeds.
>
> Fixes: 30aba6656f61 ("namei: allow restricted O_CREAT of FIFOs and regular files")
> Reported-by: Etienne Champetier <champetier.etienne@...il.com>
> Signed-off-by: David Howells <dhowells@...hat.com>
> cc: Marc Dionne <marc.dionne@...istor.com>
> cc: Jeffrey Altman <jaltman@...istor.com>
> cc: Chet Ramey <chet.ramey@...e.edu>
> cc: Cheyenne Wills <cwills@...enomine.net>
> cc: Alexander Viro <viro@...iv.linux.org.uk>
> cc: Christian Brauner <brauner@...nel.org>
> cc: Steve French <sfrench@...ba.org>
> cc: linux-afs@...ts.infradead.org
> cc: openafs-devel@...nafs.org
> cc: linux-cifs@...r.kernel.org
> cc: linux-fsdevel@...r.kernel.org
> Link: https://groups.google.com/g/gnu.bash.bug/c/6PPTfOgFdL4/m/2AQU-S1N76UJ [1]
> Link: https://git.savannah.gnu.org/cgit/bash.git/tree/redir.c?h=bash-5.3-rc1#n733 [2]
> ---
>   fs/afs/dir.c       |  2 ++
>   fs/afs/file.c      |  2 ++
>   fs/afs/internal.h  |  3 +++
>   fs/afs/security.c  | 52 ++++++++++++++++++++++++++++++++++++++++++++++
>   fs/internal.h      |  1 +
>   fs/namei.c         | 50 +++++++++++++++++++++++++++++++++++---------
>   include/linux/fs.h |  3 +++
>   7 files changed, 103 insertions(+), 10 deletions(-)
>
> diff --git a/fs/afs/dir.c b/fs/afs/dir.c
> index 9e7b1fe82c27..6360db1673b0 100644
> --- a/fs/afs/dir.c
> +++ b/fs/afs/dir.c
> @@ -65,6 +65,8 @@ const struct inode_operations afs_dir_inode_operations = {
>   	.permission	= afs_permission,
>   	.getattr	= afs_getattr,
>   	.setattr	= afs_setattr,
> +	.is_owned_by_me	= afs_is_owned_by_me,
> +	.have_same_owner = afs_have_same_owner,
>   };
>   
>   const struct address_space_operations afs_dir_aops = {
> diff --git a/fs/afs/file.c b/fs/afs/file.c
> index fc15497608c6..0317f0a36cf2 100644
> --- a/fs/afs/file.c
> +++ b/fs/afs/file.c
> @@ -47,6 +47,8 @@ const struct inode_operations afs_file_inode_operations = {
>   	.getattr	= afs_getattr,
>   	.setattr	= afs_setattr,
>   	.permission	= afs_permission,
> +	.is_owned_by_me	= afs_is_owned_by_me,
> +	.have_same_owner = afs_have_same_owner,
>   };
>   
>   const struct address_space_operations afs_file_aops = {
> diff --git a/fs/afs/internal.h b/fs/afs/internal.h
> index 440b0e731093..fbfbf615abe3 100644
> --- a/fs/afs/internal.h
> +++ b/fs/afs/internal.h
> @@ -1495,6 +1495,9 @@ extern struct key *afs_request_key(struct afs_cell *);
>   extern struct key *afs_request_key_rcu(struct afs_cell *);
>   extern int afs_check_permit(struct afs_vnode *, struct key *, afs_access_t *);
>   extern int afs_permission(struct mnt_idmap *, struct inode *, int);
> +int afs_is_owned_by_me(struct mnt_idmap *idmap, struct inode *inode);
> +int afs_have_same_owner(struct mnt_idmap *idmap, struct inode *inode,
> +			struct dentry *dentry);
>   extern void __exit afs_clean_up_permit_cache(void);
>   
>   /*
> diff --git a/fs/afs/security.c b/fs/afs/security.c
> index 6a7744c9e2a2..a49070c8342d 100644
> --- a/fs/afs/security.c
> +++ b/fs/afs/security.c
> @@ -477,6 +477,58 @@ int afs_permission(struct mnt_idmap *idmap, struct inode *inode,
>   	return ret;
>   }
>   
> +/*
> + * Determine if an inode is owned by 'me' - whatever that means for the
> + * filesystem.  In the case of AFS, this means that the file is owned by the
> + * AFS user represented by the token (e.g. from a kerberos server) held in a
> + * key.  Returns 0 if owned by me, 1 if not; can also return an error.
> + */

Technically AFS tokens are not issued by Kerberos KDCs.  Could we say 
"... the file is owned
by the AFS user represented by the Rx Security Class token held in a key."?

> +int afs_is_owned_by_me(struct mnt_idmap *idmap, struct inode *inode)
> +{
> +	struct afs_vnode *vnode = AFS_FS_I(inode);
> +	afs_access_t access;
> +	struct key *key;
> +	int ret;
> +
> +	key = afs_request_key(vnode->volume->cell);
> +	if (IS_ERR(key))
> +		return PTR_ERR(key);
> +
> +	/* Get the access rights for the key on this file. */
> +	ret = afs_check_permit(vnode, key, &access);
> +	if (ret < 0)
> +		goto error;
> +
> +	/* We get the ADMINISTER bit if we own the file. */
> +	ret = (access & AFS_ACE_ADMINISTER) ? 0 : 1;

AFS_ACE_ADMINISTER only means ownership if the inode is a 
non-directory.  Should
we add an explicit check for inode type?

> +error:
> +	key_put(key);
> +	return ret;
> +}
> +
> +/*
> + * Determine if a file has the same owner as its parent - whatever that means
> + * for the filesystem.  In the case of AFS, this means comparing their AFS
> + * UIDs.  Returns 0 if same, 1 if not same; can also return an error.
> + */
> +int afs_have_same_owner(struct mnt_idmap *idmap, struct inode *inode,
> +			struct dentry *dentry)
> +{
> +	struct afs_vnode *vnode = AFS_FS_I(inode), *dvnode;
> +	struct dentry *parent;
> +	s64 owner;
> +
> +	/* Get the owner's ID for the directory.  Ideally, we'd use RCU to
> +	 * access the parent rather than getting a ref.
> +	 */
> +	parent = dget_parent(dentry);
> +	dvnode = AFS_FS_I(d_backing_inode(parent));
> +	owner = dvnode->status.owner;
> +	dput(parent);
> +
> +	return vnode->status.owner != owner;
> +}
> +
>   void __exit afs_clean_up_permit_cache(void)
>   {
>   	int i;
> diff --git a/fs/internal.h b/fs/internal.h
> index b9b3e29a73fd..9e84bfc5aee6 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -52,6 +52,7 @@ extern int finish_clean_context(struct fs_context *fc);
>   /*
>    * namei.c
>    */
> +int vfs_inode_is_owned_by_me(struct mnt_idmap *idmap, struct inode *inode);
>   extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
>   			   struct path *path, struct path *root);
>   int do_rmdir(int dfd, struct filename *name);
> diff --git a/fs/namei.c b/fs/namei.c
> index 84a0e0b0111c..9f42dc46322f 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -53,8 +53,8 @@
>    * The new code replaces the old recursive symlink resolution with
>    * an iterative one (in case of non-nested symlink chains).  It does
>    * this with calls to <fs>_follow_link().
> - * As a side effect, dir_namei(), _namei() and follow_link() are now
> - * replaced with a single function lookup_dentry() that can handle all
> + * As a side effect, dir_namei(), _namei() and follow_link() are now
> + * replaced with a single function lookup_dentry() that can handle all
>    * the special cases of the former code.
>    *
>    * With the new dcache, the pathname is stored at each inode, at least as
> @@ -1149,6 +1149,36 @@ fs_initcall(init_fs_namei_sysctls);
>   
>   #endif /* CONFIG_SYSCTL */
>   
> +/*
> + * Determine if an inode is owned by the process (allowing for fsuid override),
> + * returning 0 if so, 1 if not and a negative error code if there was a problem
> + * making the determination.
> + */
> +int vfs_inode_is_owned_by_me(struct mnt_idmap *idmap, struct inode *inode)
> +{
> +	if (unlikely(inode->i_op->is_owned_by_me))
> +		return inode->i_op->is_owned_by_me(idmap, inode);
> +
> +	return vfsuid_eq_kuid(i_uid_into_vfsuid(idmap, inode),
> +			      current_fsuid()) ? 1 : 0;
> +}
> +
> +/*
> + * Determine if two inodes have the same owner, returning 0 if so, 1 if not and
> + * a negative error code if there was a problem making the determination.
> + */
> +static int vfs_inodes_have_same_owner(struct mnt_idmap *idmap, struct inode *inode,
> +				      const struct nameidata *nd)
> +{
> +	if (unlikely(inode->i_op->have_same_owner))
> +		return inode->i_op->have_same_owner(idmap, inode, nd->path.dentry);
> +
> +	if (vfsuid_valid(nd->dir_vfsuid) &&
> +	    vfsuid_eq(i_uid_into_vfsuid(idmap, inode), nd->dir_vfsuid))
> +		return 0;
> +	return 1; /* Not same. */
> +}
> +
>   /**
>    * may_follow_link - Check symlink following for unsafe situations
>    * @nd: nameidata pathwalk data
> @@ -1302,10 +1332,10 @@ int may_linkat(struct mnt_idmap *idmap, const struct path *link)
>    * Returns 0 if the open is allowed, -ve on error.
>    */
>   static int may_create_in_sticky(struct mnt_idmap *idmap, struct nameidata *nd,
> -				struct inode *const inode)
> +				struct inode *inode)
>   {
>   	umode_t dir_mode = nd->dir_mode;
> -	vfsuid_t dir_vfsuid = nd->dir_vfsuid, i_vfsuid;
> +	int ret;
>   
>   	if (likely(!(dir_mode & S_ISVTX)))
>   		return 0;
> @@ -1316,13 +1346,13 @@ static int may_create_in_sticky(struct mnt_idmap *idmap, struct nameidata *nd,
>   	if (S_ISFIFO(inode->i_mode) && !sysctl_protected_fifos)
>   		return 0;
>   
> -	i_vfsuid = i_uid_into_vfsuid(idmap, inode);
> -
> -	if (vfsuid_eq(i_vfsuid, dir_vfsuid))
> -		return 0;
> +	ret = vfs_inodes_have_same_owner(idmap, inode, nd);
> +	if (ret <= 0)
> +		return ret;
>   
> -	if (vfsuid_eq_kuid(i_vfsuid, current_fsuid()))
> -		return 0;
> +	ret = vfs_inode_is_owned_by_me(idmap, inode);
> +	if (ret <= 0)
> +		return ret;
>   
>   	if (likely(dir_mode & 0002)) {
>   		audit_log_path_denied(AUDIT_ANOM_CREAT, "sticky_create");
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 016b0fe1536e..ec278d2d362a 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2236,6 +2236,9 @@ struct inode_operations {
>   			    struct dentry *dentry, struct fileattr *fa);
>   	int (*fileattr_get)(struct dentry *dentry, struct fileattr *fa);
>   	struct offset_ctx *(*get_offset_ctx)(struct inode *inode);
> +	int (*is_owned_by_me)(struct mnt_idmap *idmap, struct inode *inode);
> +	int (*have_same_owner)(struct mnt_idmap *idmap, struct inode *inode,
> +			       struct dentry *dentry);
>   } ____cacheline_aligned;
>   
>   static inline int call_mmap(struct file *file, struct vm_area_struct *vma)
It would be nice if this patch added documentation for the new 
inode_operations to

Documentation/filesystems/vfs.rst.

The may_create_in_sticky() logic looks good to me.

Thanks for the patch.

Jeffrey Altman


Download attachment "smime.p7s" of type "application/pkcs7-signature" (4276 bytes)