linux-kernel - Re: processes hung after sys_renameat, and 'missing' processes

Date:	Thu, 07 Jun 2012 19:08:04 -0700
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Al Viro <viro@...IV.linux.org.uk>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Dave Jones <davej@...hat.com>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	Miklos Szeredi <mszeredi@...e.cz>, Jan Kara <jack@...e.cz>,
	Peter Zijlstra <peterz@...radead.org>,
	linux-fsdevel@...r.kernel.org,
	"J. Bruce Fields" <bfields@...hat.com>,
	Sage Weil <sage@...dream.net>
Subject: Re: processes hung after sys_renameat, and 'missing' processes

Al Viro <viro@...IV.linux.org.uk> writes:

> On Thu, Jun 07, 2012 at 04:57:13PM -0700, Linus Torvalds wrote:
>
>> Any per-filesystem mutex should do, so if sysfs always holds the
>> sysfs_mutex - and never allows user-initiated renames - it should be
>> safe.
>
> Frankly, I would very much prefer to have the same locking rules wherever
> possible.  The locking system is already overcomplicated and making its
> analysis fs-dependent as well... <shudder>  Sure, we can do that, and that
> might even work, until we find out that some piece of code that started
> as a helper to some function never called on sysfs dentries had been
> reused on the path that *is* reachable on sysfs.  At which point we are
> suddenly in trouble.

Staring at it, I see what I was missing.  The practical issue is
lock_rename(), and any parts of the vfs that depend on lock_rename().

d_move and the dcache are made safe by rename_lock alone.  However, other
parts of the vfs that rely on d_ancestor are not.  I can't immediately
see a case that really cares, but I can't easily rule such a case out
either.
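
To make the d_ancestor concern concrete, here is a minimal userspace
sketch, not kernel code.  struct toy_node, toy_ancestor() and toy_move()
are hypothetical names invented for illustration; toy_ancestor() only
mimics the walk over parent links that d_ancestor performs.  The point it
tries to show is that the answer is only meaningful while no parent link
changes, which is the kind of stability that callers behind lock_rename()
appear to assume.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a dentry: only the parent link matters here. */
struct toy_node {
	const char *name;
	struct toy_node *parent;	/* the root is its own parent */
};

/*
 * Mimics the walk done by d_ancestor: if p1 is a proper ancestor of p2,
 * return the child of p1 on the path down to p2; otherwise return NULL.
 * Nothing here protects against a parent link changing mid-walk.
 */
static struct toy_node *toy_ancestor(struct toy_node *p1, struct toy_node *p2)
{
	struct toy_node *p;

	assert(p1 != NULL && p2 != NULL);
	for (p = p2; p->parent != p; p = p->parent) {
		if (p->parent == p1)
			return p;
	}
	return NULL;
}

/* Toy "rename": reparent a node, as a filesystem-internal move would. */
static void toy_move(struct toy_node *node, struct toy_node *new_parent)
{
	node->parent = new_parent;
}
```

With a tree /a/b and /c, toy_ancestor(&a, &b) returns b.  After
toy_move(&b, &c) the same call returns NULL, so any lock ordering that was
decided from the earlier answer would be stale.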

> I wouldn't be bothered so much if the overall picture had been simpler;
> unfortunately, it isn't.
>
> Eric, how about this - if nothing else, that makes code in there simpler
> and less dependent on details of VFS guts:
>
> diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
> index e6bb9b2..5579826 100644
> --- a/fs/sysfs/dir.c
> +++ b/fs/sysfs/dir.c
> @@ -363,7 +363,7 @@ static void sysfs_dentry_iput(struct dentry *dentry, struct inode *inode)
>  	iput(inode);
>  }
>  
> -static const struct dentry_operations sysfs_dentry_ops = {
> +const struct dentry_operations sysfs_dentry_ops = {
>  	.d_revalidate	= sysfs_dentry_revalidate,
>  	.d_delete	= sysfs_dentry_delete,
>  	.d_iput		= sysfs_dentry_iput,
> @@ -795,16 +795,8 @@ static struct dentry * sysfs_lookup(struct inode *dir, struct dentry *dentry,
>  	}
>  
>  	/* instantiate and hash dentry */
> -	ret = d_find_alias(inode);
> -	if (!ret) {
> -		d_set_d_op(dentry, &sysfs_dentry_ops);
> -		dentry->d_fsdata = sysfs_get(sd);
> -		d_add(dentry, inode);
> -	} else {
> -		d_move(ret, dentry);
> -		iput(inode);
> -	}
> -
> +	dentry->d_fsdata = sysfs_get(sd);
> +	ret = d_materialise_unique(dentry, inode);

I have a small problem with d_materialise_unique.  When a file has been
renamed, d_materialise_unique calls __d_instantiate_unique, and
__d_instantiate_unique does not detect renames of files.  At the very
least, that misses the rename of sysfs symlinks.

Could we put together a d_materialise_unalias for inodes that we know
always have only one dentry?  That I would be happy to use.
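
A minimal sketch of what such a helper might do, written as a userspace
model rather than kernel code.  d_materialise_unalias is only a proposal
in this message; the sketch_* types and function below are hypothetical
and ignore all locking and hashing.  The one assumption carried over from
the text is that an inode never has more than one dentry.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_dentry;

/* Hypothetical inode that, by assumption, has at most one dentry. */
struct sketch_inode {
	struct sketch_dentry *alias;
};

struct sketch_dentry {
	const char *name;
	struct sketch_dentry *parent;
	struct sketch_inode *inode;	/* NULL while the dentry is negative */
};

/*
 * Attach inode at the place described by dentry.
 * - No existing alias: instantiate dentry itself and return NULL.
 * - Existing alias elsewhere: the object was renamed since it was last
 *   looked up, so move the alias to dentry's name and parent and return
 *   the alias.  dentry itself stays negative in that case.
 */
static struct sketch_dentry *
sketch_materialise_unalias(struct sketch_dentry *dentry,
			   struct sketch_inode *inode)
{
	struct sketch_dentry *alias = inode->alias;

	assert(dentry->inode == NULL);	/* caller passes a negative dentry */
	if (alias == NULL) {
		dentry->inode = inode;
		inode->alias = dentry;
		return NULL;
	}
	alias->name = dentry->name;
	alias->parent = dentry->parent;
	return alias;
}
```

The two branches mirror the d_add and d_move arms of the code that the
patch removes from sysfs_lookup, with the single-alias assumption making
the alias lookup trivial.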

I think the reason I wound up with my own version was that the dcache
did not provide what I needed, and it was just a few lines to code my own.

> diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
> index 52c3bdb..c15a7a3 100644
> --- a/fs/sysfs/mount.c
> +++ b/fs/sysfs/mount.c
> @@ -68,6 +68,7 @@ static int sysfs_fill_super(struct super_block *sb, void *data, int silent)
>  	}
>  	root->d_fsdata = &sysfs_root;
>  	sb->s_root = root;
> +	sb->s_d_op = &sysfs_dentry_ops;

I have no problem with this bit.  To answer your earlier question, this
code predates s_d_op, which is why sysfs was not using it.

>  	return 0;
>  }
>  
> diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
> index 661a963..d73c093 100644
> --- a/fs/sysfs/sysfs.h
> +++ b/fs/sysfs/sysfs.h
> @@ -157,6 +157,7 @@ extern struct kmem_cache *sysfs_dir_cachep;
>   */
>  extern struct mutex sysfs_mutex;
>  extern spinlock_t sysfs_assoc_lock;
> +extern const struct dentry_operations sysfs_dentry_ops;
>  
>  extern const struct file_operations sysfs_dir_operations;
>  extern const struct inode_operations sysfs_dir_inode_operations;