Message-Id: <C56C9BFD-0B20-4F2B-B282-84566ACBF41B@oracle.com>
Date:	Thu, 25 Apr 2013 10:51:42 -0400
From:	Chuck Lever <chuck.lever@...cle.com>
To:	bfields@...ldses.org
Cc:	"Myklebust, Trond" <Trond.Myklebust@...app.com>,
	David Wysochanski <dwysocha@...hat.com>,
	Dave Chiluk <chiluk@...onical.com>,
	"linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY


On Apr 25, 2013, at 9:49 AM, bfields@...ldses.org wrote:

> On Thu, Apr 25, 2013 at 01:30:58PM +0000, Myklebust, Trond wrote:
>> On Thu, 2013-04-25 at 09:29 -0400, bfields@...ldses.org wrote:
>> 
>>> My position is that we simply have no idea what order of magnitude the
>>> delay should even be.  And that in such a situation exponential backoff such
>>> as implemented in the synchronous case seems the reasonable default as
>>> it guarantees at worst doubling the delay while still bounding the
>>> long-term average frequency of retries.
>> 
>> So we start with a 15 second delay, and then go to 60 seconds?
> 
> I agree that a server should normally be doing the wait on its own if
> the wait would be on the order of an rpc round trip.
> 
> So I'd be inclined to start with a delay that was an order of magnitude
> or two more than a round trip.
> 
> And I'd expect NFS isn't common on networks with 1-second latencies.
> 
> So the 1/10 second we're using in the synchronous case sounds closer to
> the right ballpark to me.

The RPC layer already keeps RPC round trip statistics, so the client doesn't have to guess with a "one size fits all" number.

I'm all for keeping client recovery time short.  But after following this argument, I think 10x RTT is crazy short.  Aggressive retransmits can lead to data corruption, and RTT on a fast server is going to be on the order of a millisecond.  And what about RDMA, where RTT is about 20 usecs?

A better answer might be to start at one second, then back off exponentially, capping the delay at the minimum of 0.25x the lease time and 0.25x the RPC retransmit timeout.
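
To make that concrete, here is a rough sketch of the schedule, not kernel code.  The function name, the doubling factor, and the 90 second lease and 60 second retransmit timeout in the example are assumptions picked for illustration; only the one-second start and the 0.25x cap come from the suggestion above.

```python
def errdelay_schedule(lease_time, rpc_timeout, initial=1.0, factor=2.0, retries=8):
    """Return the delay in seconds to wait before each successive retry.

    lease_time and rpc_timeout are in seconds.  The doubling factor and
    the retry count are illustrative assumptions, not values from the patch.
    """
    # Cap: the smaller of a quarter of the lease time and a quarter of
    # the RPC retransmit timeout.
    cap = min(0.25 * lease_time, 0.25 * rpc_timeout)

    delays = []
    delay = min(initial, cap)  # never start above the cap
    for _ in range(retries):
        delays.append(delay)
        delay = min(delay * factor, cap)  # exponential growth, bounded by the cap
    return delays


if __name__ == "__main__":
    # Hypothetical example values: 90 s lease, 60 s retransmit timeout.
    print(errdelay_schedule(lease_time=90, rpc_timeout=60))
    # -> [1.0, 2.0, 4.0, 8.0, 15.0, 15.0, 15.0, 15.0]
```

With those example values the cap works out to 15 seconds, so the client retries after 1, 2, 4, and 8 seconds and then settles at 15.  For comparison, 10x RTT would mean retrying after about 10 msecs on a fast server and about 200 usecs over RDMA.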

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com



