linux-kernel - Re: [PATCH] nfsd: Using guard() to simplify nfsd_cache

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <cecf4793-d737-4501-a306-0c5a74daaf30@oracle.com>
Date: Tue, 24 Jun 2025 11:07:06 -0400
From: Chuck Lever <chuck.lever@...cle.com>
To: NeilBrown <neil@...wn.name>, Su Hui <suhui@...china.com>
Cc: jlayton@...nel.org, okorniev@...hat.com, Dai.Ngo@...cle.com,
        tom@...pey.com, linux-nfs@...r.kernel.org,
        linux-kernel@...r.kernel.org, kernel-janitors@...r.kernel.org
Subject: Re: [PATCH] nfsd: Using guard() to simplify nfsd_cache_lookup()

On 6/23/25 8:19 PM, NeilBrown wrote:
> On Mon, 23 Jun 2025, Su Hui wrote:
>> Using guard() to replace *unlock* label. guard() makes lock/unlock code
>> more clear. Change the order of the code to let all lock code in the
>> same scope. No functional changes.
> 
> While I agree that this code could usefully be cleaned up and that you
> have made some improvements, I think the use of guard() is a nearly
> insignificant part of the change.  You could easily do exactly the same
> patch without using guard() but having and explicit spin_unlock() before
> the new return.  That doesn't mean you shouldn't use guard(), but it
> does mean that the comment explaining the change could be more usefully
> focused on the "Change the order ..." part, and maybe explain what that
> is important.
> 
> I actually think there is room for other changes which would make the
> code even better:
> - Change nfsd_prune_bucket_locked() to nfsd_prune_bucket().  Have it
>   take the lock when needed, then drop it, then call
>   nfsd_cacherep_dispose() - and return the count.
> - change nfsd_cache_insert to also skip updating the chain length stats
>   when it finds a match - in that case the "entries" isn't a chain
>   length. So just  lru_put_end(), return.  Have it return NULL if
>   no match was found
> - after the found_entry label don't use nfsd_reply_cache_free_locked(),
>   just free rp.  It has never been included in any rbtree or list, so it
>   doesn't need to be removed.
> - I'd be tempted to have nfsd_cache_insert() take the spinlock itself
>   and call it under rcu_read_lock() - and use RCU to free the cached
>   items. 
> - put the chunk of code after the found_entry label into a separate
>   function and instead just return RC_REPLY (and maybe rename that
>   RC_CACHED).  Then in nfsd_dispatch(), if RC_CACHED was returned, call
>   that function that has the found_entry code.
> 
> I think that would make the code a lot easier to follow.  Would you like
> to have a go at that - I suspect it would be several patches - or shall
> I do it?

I'm going to counsel some caution.

nfsd_cache_lookup() is a hot path. Source code readability, though
important, is not the priority in this area.

I'm happy to consider changes to this function, but the bottom line is
patches need to be accompanied by data that show that proposed code
modifications do not negatively impact performance. (Plus the usual
test results that show no impact to correctness).

That data might include:
- flame graphs that show a decrease in CPU utilization
- objdump output showing a smaller instruction cache footprint
  and/or short instruction path lengths
- perf results showing better memory bandwidth
- perf results showing better branch prediction
- lockstat results showing less contention and/or shorter hold
  time on locks held in this path

Macro benchmark results are also welcome: equal or lower latency for
NFSv3 operations, and equal or higher I/O throughput.

The benefit for the scoped_guard construct is that it might make it more
difficult to add code that returns from this function with a lock held.
However, so far that hasn't been an issue.

Thus I'm not sure there's a lot of strong technical justification for
modification of this code path. But, you might know of one -- if so,
please make sure that appears in the patch descriptions.

What is more interesting to me is trying out more sophisticated abstract
data types for the DRC hashtable. rhashtable is one alternative; so is
Maple tree, which is supposed to handle lookups with more memory
bandwidth efficiency than walking a linked list.

Anyway, have fun, get creative, and let's see what comes up.


>> Signed-off-by: Su Hui <suhui@...china.com>
>> ---
>>  fs/nfsd/nfscache.c | 99 ++++++++++++++++++++++------------------------
>>  1 file changed, 48 insertions(+), 51 deletions(-)
>>
>> diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
>> index ba9d326b3de6..2d92adf3e6b0 100644
>> --- a/fs/nfsd/nfscache.c
>> +++ b/fs/nfsd/nfscache.c
>> @@ -489,7 +489,7 @@ int nfsd_cache_lookup(struct svc_rqst *rqstp, unsigned int start,
>>  
>>  	if (type == RC_NOCACHE) {
>>  		nfsd_stats_rc_nocache_inc(nn);
>> -		goto out;
>> +		return rtn;
>>  	}
>>  
>>  	csum = nfsd_cache_csum(&rqstp->rq_arg, start, len);
>> @@ -500,64 +500,61 @@ int nfsd_cache_lookup(struct svc_rqst *rqstp, unsigned int start,
>>  	 */
>>  	rp = nfsd_cacherep_alloc(rqstp, csum, nn);
>>  	if (!rp)
>> -		goto out;
>> +		return rtn;
>>  
>>  	b = nfsd_cache_bucket_find(rqstp->rq_xid, nn);
>> -	spin_lock(&b->cache_lock);
>> -	found = nfsd_cache_insert(b, rp, nn);
>> -	if (found != rp)
>> -		goto found_entry;
>> -	*cacherep = rp;
>> -	rp->c_state = RC_INPROG;
>> -	nfsd_prune_bucket_locked(nn, b, 3, &dispose);
>> -	spin_unlock(&b->cache_lock);
>> +	scoped_guard(spinlock, &b->cache_lock) {
>> +		found = nfsd_cache_insert(b, rp, nn);
>> +		if (found == rp) {
>> +			*cacherep = rp;
>> +			rp->c_state = RC_INPROG;
>> +			nfsd_prune_bucket_locked(nn, b, 3, &dispose);
>> +			goto out;
>> +		}
>> +		/* We found a matching entry which is either in progress or done. */
>> +		nfsd_reply_cache_free_locked(NULL, rp, nn);
>> +		nfsd_stats_rc_hits_inc(nn);
>> +		rtn = RC_DROPIT;
>> +		rp = found;
>> +
>> +		/* Request being processed */
>> +		if (rp->c_state == RC_INPROG)
>> +			goto out_trace;
>> +
>> +		/* From the hall of fame of impractical attacks:
>> +		 * Is this a user who tries to snoop on the cache?
>> +		 */
>> +		rtn = RC_DOIT;
>> +		if (!test_bit(RQ_SECURE, &rqstp->rq_flags) && rp->c_secure)
>> +			goto out_trace;
>>  
>> +		/* Compose RPC reply header */
>> +		switch (rp->c_type) {
>> +		case RC_NOCACHE:
>> +			break;
>> +		case RC_REPLSTAT:
>> +			xdr_stream_encode_be32(&rqstp->rq_res_stream, rp->c_replstat);
>> +			rtn = RC_REPLY;
>> +			break;
>> +		case RC_REPLBUFF:
>> +			if (!nfsd_cache_append(rqstp, &rp->c_replvec))
>> +				return rtn; /* should not happen */
>> +			rtn = RC_REPLY;
>> +			break;
>> +		default:
>> +			WARN_ONCE(1, "nfsd: bad repcache type %d\n", rp->c_type);
>> +		}
>> +
>> +out_trace:
>> +		trace_nfsd_drc_found(nn, rqstp, rtn);
>> +		return rtn;
>> +	}
>> +out:
>>  	nfsd_cacherep_dispose(&dispose);
>>  
>>  	nfsd_stats_rc_misses_inc(nn);
>>  	atomic_inc(&nn->num_drc_entries);
>>  	nfsd_stats_drc_mem_usage_add(nn, sizeof(*rp));
>> -	goto out;
>> -
>> -found_entry:
>> -	/* We found a matching entry which is either in progress or done. */
>> -	nfsd_reply_cache_free_locked(NULL, rp, nn);
>> -	nfsd_stats_rc_hits_inc(nn);
>> -	rtn = RC_DROPIT;
>> -	rp = found;
>> -
>> -	/* Request being processed */
>> -	if (rp->c_state == RC_INPROG)
>> -		goto out_trace;
>> -
>> -	/* From the hall of fame of impractical attacks:
>> -	 * Is this a user who tries to snoop on the cache? */
>> -	rtn = RC_DOIT;
>> -	if (!test_bit(RQ_SECURE, &rqstp->rq_flags) && rp->c_secure)
>> -		goto out_trace;
>> -
>> -	/* Compose RPC reply header */
>> -	switch (rp->c_type) {
>> -	case RC_NOCACHE:
>> -		break;
>> -	case RC_REPLSTAT:
>> -		xdr_stream_encode_be32(&rqstp->rq_res_stream, rp->c_replstat);
>> -		rtn = RC_REPLY;
>> -		break;
>> -	case RC_REPLBUFF:
>> -		if (!nfsd_cache_append(rqstp, &rp->c_replvec))
>> -			goto out_unlock; /* should not happen */
>> -		rtn = RC_REPLY;
>> -		break;
>> -	default:
>> -		WARN_ONCE(1, "nfsd: bad repcache type %d\n", rp->c_type);
>> -	}
>> -
>> -out_trace:
>> -	trace_nfsd_drc_found(nn, rqstp, rtn);
>> -out_unlock:
>> -	spin_unlock(&b->cache_lock);
>> -out:
>>  	return rtn;
>>  }
>>  
>> -- 
>> 2.30.2
>>
>>
> 


-- 
Chuck Lever