Message-Id: <FD7AC03F-66C7-4506-B30D-86A705B610D5@linuxhacker.ru>
Date: Mon, 4 Jul 2016 22:32:21 -0400
From: Oleg Drokin <green@...uxhacker.ru>
To: Al Viro <viro@...IV.linux.org.uk>
Cc: Mailing List <linux-kernel@...r.kernel.org>,
"<linux-fsdevel@...r.kernel.org>" <linux-fsdevel@...r.kernel.org>
Subject: Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.
ah,
and of course the testcase that I forgot to attach ;)
[Attachment: "lstest.sh", application/octet-stream, 1655 bytes]
On Jul 4, 2016, at 10:28 PM, Oleg Drokin wrote:
>
> On Jul 3, 2016, at 2:29 AM, Al Viro wrote:
>
>> On Sat, Jun 25, 2016 at 12:38:40PM -0400, Oleg Drokin wrote:
>>
>>> Sorry to nag you about this, but did any of those pan out?
>>>
>>> d_alloc_parallel() sounds a bit too heavy there, esp. considering we came in with
>>> a dentry already (though a potentially shared one, I understand).
>>> Wouldn't it be better to try and establish some dentry locking rule for calling into
>>> d_splice_alias() instead? At least then the callers could make sure the dentry does
>>> not change under them.
>>> Though I guess if there's dentry locking like that, we might as well do all the
>>> checking in d_splice_alias(), but that means unhashed dentries would no
>>> longer be disallowed, which is a change of semantics from the current behavior.
>>
>> FWIW, the only interesting case here is this:
>> * no O_CREAT in flags (otherwise the parent is held exclusive).
>> * dentry is found in hash
>> * dentry is negative
>> * dentry has passed ->d_revalidate() (i.e. in case of
>> NFS it had nfs_neg_need_reval() return false).
>>
>> Only two instances are non-trivial in that respect - NFS and Lustre.
>> Everything else will simply fail open() with ENOENT in that case.
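>>
>> As a sketch (a hypothetical helper, purely restating the above; not
>> code from the tree):
>>
>> 	/*
>> 	 * Hypothetical illustration only, assuming linux/dcache.h and
>> 	 * linux/fcntl.h: the one case that needs care on the lookup
>> 	 * side of ->atomic_open().
>> 	 */
>> 	static bool lookup_only_hashed_negative(struct dentry *dentry,
>> 						unsigned int open_flags)
>> 	{
>> 		return !(open_flags & O_CREAT) &&	/* parent held shared */
>> 		       !d_unhashed(dentry) &&		/* found in the hash */
>> 		       d_is_negative(dentry);		/* negative; it has also
>> 							 * already survived
>> 							 * ->d_revalidate() */
>> 	}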
>>
>> And at least for NFS we could bloody well do d_drop + d_alloc_parallel +
>> finish_no_open and bugger off in case it's not in_lookup, otherwise do
>> pretty much what we do in case we'd got in_lookup from the very beginning.
>> Some adjustments are needed for that case (basically, we need to make
>> sure we hit d_lookup_done() matching that d_alloc_parallel() and deal
>> with refcounting correctly).
>>
>> Tentative NFS patch follows; I don't understand Lustre well enough, but it
>> looks like a plausible strategy there as well.
>
> This patch seems to have brought back the other crash (or something similar),
> the one with a negative dentry being hashed when it's already hashed.
> It's not as easy to hit as before, but a lot easier than the race we are hitting here.
>
> Also, in all cases it is ls that is crashing now, which suggests it is
> taking some of the newly added paths.
> I have half a dozen crashdumps if you want me to look something up in there.
> This is on top of 4.7.0-rc6 a99cde438de0c4c0cecc1d1af1a55a75b10bfdef
> with just your patch on top.
>
> I can reproduce it with both the complex workload and a simplified one (attached);
> it takes anywhere from 5 to 30 minutes to hit.
>
>>
>> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
>> index d8015a03..5474e39 100644
>> --- a/fs/nfs/dir.c
>> +++ b/fs/nfs/dir.c
>> @@ -1485,11 +1485,13 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
>> struct file *file, unsigned open_flags,
>> umode_t mode, int *opened)
>> {
>> + DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
>> struct nfs_open_context *ctx;
>> struct dentry *res;
>> struct iattr attr = { .ia_valid = ATTR_OPEN };
>> struct inode *inode;
>> unsigned int lookup_flags = 0;
>> + bool switched = false;
>> int err;
>>
>> /* Expect a negative dentry */
>> @@ -1528,6 +1530,17 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
>> attr.ia_size = 0;
>> }
>>
>> + if (!(open_flags & O_CREAT) && !d_unhashed(dentry)) {
>> + d_drop(dentry);
>> + switched = true;
>> + dentry = d_alloc_parallel(dentry->d_parent,
>> + &dentry->d_name, &wq);
>
> Hm, it seems d_alloc_parallel() can return some preexisting dentry that it was
> able to look up in RCU mode, attached to the same parent with the same name?
> What's to stop a parallel thread from rehashing the dentry after we dropped it here,
> so that d_alloc_parallel() will happily find it?
> I am running a test to verify this theory now.
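>
> To spell the suspected interleaving out (my reading only, unverified):
>
> 	/*
> 	 * Hypothetical race, assuming another thread looks up the same
> 	 * name in the same directory while we are in nfs_atomic_open():
> 	 *
> 	 *   CPU0: nfs_atomic_open()         CPU1: parallel lookup
> 	 *   d_drop(dentry);
> 	 *                                   finds and rehashes that
> 	 *                                   same dentry
> 	 *   d_alloc_parallel(parent,
> 	 *                    name, &wq);
> 	 *     finds the rehashed dentry,
> 	 *     which is not d_in_lookup()
> 	 *   finish_no_open(file, dentry);
> 	 */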
>
>> + if (IS_ERR(dentry))
>> + return PTR_ERR(dentry);
>> + if (unlikely(!d_in_lookup(dentry)))
>> + return finish_no_open(file, dentry);
>> + }
>> +
>> ctx = create_nfs_open_context(dentry, open_flags);
>> err = PTR_ERR(ctx);
>> if (IS_ERR(ctx))
>> @@ -1563,14 +1576,23 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
>> trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
>> put_nfs_open_context(ctx);
>> out:
>> + if (unlikely(switched)) {
>> + d_lookup_done(dentry);
>> + dput(dentry);
>> + }
>> return err;
>>
>> no_open:
>> res = nfs_lookup(dir, dentry, lookup_flags);
>> - err = PTR_ERR(res);
>> + if (switched) {
>> + d_lookup_done(dentry);
>> + if (!res)
>> + res = dentry;
>> + else
>> + dput(dentry);
>> + }
>> if (IS_ERR(res))
>> - goto out;
>> -
>> + return PTR_ERR(res);
>> return finish_no_open(file, res);
>> }
>> EXPORT_SYMBOL_GPL(nfs_atomic_open);
>