linux-kernel - Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1587788B-C7F6-4D19-BDAD-29846231CD6C@linuxhacker.ru>
Date:	Tue, 5 Jul 2016 16:21:42 -0400
From:	Oleg Drokin <green@...uxhacker.ru>
To:	Al Viro <viro@...IV.linux.org.uk>
Cc:	Mailing List <linux-kernel@...r.kernel.org>,
	"<linux-fsdevel@...r.kernel.org>" <linux-fsdevel@...r.kernel.org>
Subject: Re: More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.


On Jul 5, 2016, at 4:08 PM, Al Viro wrote:

> On Tue, Jul 05, 2016 at 03:12:44PM -0400, Oleg Drokin wrote:
>> 
>> When d_in_lookup was introduced, it was described as:
>>    New primitives: d_in_lookup() (a predicate checking if dentry is in
>>    the in-lookup state) and d_lookup_done() (tells the system that
>>    we are done with lookup and if it's still marked as in-lookup, it
>>    should cease to be such).
>> 
>> I don't see where it mentions anything about exclusive vs parallel lookup
>> that probably led to some confusion.
> 
> In the same commit:
> 
> #define DCACHE_PAR_LOOKUP               0x10000000 /* being looked up (with parent locked shared) */

Well, I did not really check the commit, just the log message,
since there's no other documentation about it apparently ;)

>> So with Lustre the dentry can be in three states, really:
>> 
>> 1. hashed dentry that's all A-ok to reuse.
>> 2. hashed dentry that's NOT valid (dlm lock went away) - this is distinguished in d_compare by looking at a bit in the fs_data
>> 3. unhashed dentry ( I guess could be both valid and invalid lustre-wise).
>> 
>> So the logic in ll_lookup_it_finish() (part of regular lookup) is this:
>> 
>> If the dentry we have is not hashed - this is a new lookup, so we need to
>> call into ll_splice_alias() to see if there's a better dentry we need to
>> reuse that was already rejected by VFS before since we did not have necessary locks,
>> but we do have them now.
>> The comment at the top of ll_dcompare() explains why we don't just unhash the
>> dentry on lock-loss - that apparently leads to a loop around real_lookup for
>> real-contended dentries.
>> This is also why we cannot use d_splice_alias here - such cases are possible
>> for regular files and directories.
>> 
>> Anyway, I guess additional point of confusion here is then why does
>> ll_lookup_it_finish() need to check for hashedness of the dentry since it's in
>> lookup, so we should be unhashed here.
>> I checked the commit history and this check was added along with atomic_open
>> support, so I imagine we can just move it up into ll_atomic_open and then your
>> change starts to make sense along with a few other things.
> 
> So basically this
>        } else if (!it_disposition(it, DISP_LOOKUP_NEG)  &&
>                   !it_disposition(it, DISP_OPEN_CREATE)) {
>                /* With DISP_OPEN_CREATE dentry will be
>                 * instantiated in ll_create_it.
>                 */
>                LASSERT(!d_inode(*de));
>                d_instantiate(*de, inode);
>        }
> is something we should do only for negative hashed fed to it by
> ->atomic_open(), and that - only if we have no O_CREAT in flags?

Yes, and in fact we can totally avoid it I think.

> Then, since 3/3 eliminates that case completely, we could just rip that
> else-if, along with the check for d_unhashed itself, making the call of
> ll_splice_alias() unconditional there.  Or am I misreading what you said?
> Do you see any problems with what's in #for-linus now (head at 11f0083)?

Yes, we can make it unconditional
I think we can simplify it even more and since unlike NFS our negative dentries
are a lot less speculative, we can just return ENOENT if this is not a create
request and unhash otherwise. Still needs to go through the whole test suite
to make sure it does not break anything unexpected.

At least this is the patch I am playing with right now:
--- a/drivers/staging/lustre/lustre/llite/namei.c
+++ b/drivers/staging/lustre/lustre/llite/namei.c
@@ -391,6 +391,7 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request,
 	struct inode *inode = NULL;
 	__u64 bits = 0;
 	int rc = 0;
+	struct dentry *alias;
 
 	/* NB 1 request reference will be taken away by ll_intent_lock()
 	 * when I return
@@ -415,26 +416,12 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request,
 		 */
 	}
 
-	/* Only hash *de if it is unhashed (new dentry).
-	 * Atoimc_open may passing hashed dentries for open.
-	 */
-	if (d_unhashed(*de)) {
-		struct dentry *alias;
-
-		alias = ll_splice_alias(inode, *de);
-		if (IS_ERR(alias)) {
-			rc = PTR_ERR(alias);
-			goto out;
-		}
-		*de = alias;
-	} else if (!it_disposition(it, DISP_LOOKUP_NEG)  &&
-		   !it_disposition(it, DISP_OPEN_CREATE)) {
-		/* With DISP_OPEN_CREATE dentry will be
-		 * instantiated in ll_create_it.
-		 */
-		LASSERT(!d_inode(*de));
-		d_instantiate(*de, inode);
+	alias = ll_splice_alias(inode, *de);
+	if (IS_ERR(alias)) {
+		rc = PTR_ERR(alias);
+		goto out;
 	}
+	*de = alias;
 
 	if (!it_disposition(it, DISP_LOOKUP_NEG)) {
 		/* we have lookup look - unhide dentry */
@@ -590,6 +577,24 @@ static int ll_atomic_open(struct inode *dir, struct dentry *dentry,
 	       dentry, PFID(ll_inode2fid(dir)), dir, file, open_flags, mode,
 	       *opened);
 
+	/* Only negative dentries enter here */
+	LASSERT(!d_inode(dentry));
+
+	if (!(open_flags & O_CREAT) && !d_in_lookup(dentry)) {
+		/* A valid negative dentry that just passed revalidation,
+		 * there's little point to try and open it server-side,
+		 * even though there's a minuscle chance it might succeed.
+		 * Either way it's a valid race to just return -ENOENT here.
+		 */
+		if (!(open_flags & O_CREAT))
+			return -ENOENT;
+
+		/* Otherwise we just unhash it to be rehashed afresh via
+		 * lookup if necessary
+		 */
+		d_drop(dentry);
+	}
+
 	it = kzalloc(sizeof(*it), GFP_NOFS);
 	if (!it)
 		return -ENOMEM;