Message-ID: <20130425184626.GC5049@fieldses.org>
Date:	Thu, 25 Apr 2013 14:46:26 -0400
From:	"bfields@...ldses.org" <bfields@...ldses.org>
To:	Chuck Lever <chuck.lever@...cle.com>
Cc:	"Myklebust, Trond" <Trond.Myklebust@...app.com>,
	David Wysochanski <dwysocha@...hat.com>,
	Dave Chiluk <chiluk@...onical.com>,
	"linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] NFSv4: Use exponential backoff delay for NFS4_ERRDELAY

On Thu, Apr 25, 2013 at 02:40:11PM -0400, Chuck Lever wrote:
> 
> On Apr 25, 2013, at 2:19 PM, "bfields@...ldses.org" <bfields@...ldses.org> wrote:
> 
> > On Thu, Apr 25, 2013 at 02:10:36PM +0000, Myklebust, Trond wrote:
> >> On Thu, 2013-04-25 at 09:49 -0400, bfields@...ldses.org wrote:
> >>> On Thu, Apr 25, 2013 at 01:30:58PM +0000, Myklebust, Trond wrote:
> >>>> On Thu, 2013-04-25 at 09:29 -0400, bfields@...ldses.org wrote:
> >>>> 
> >>>>> My position is that we simply have no idea even what order of magnitude
> >>>>> the delay should be.  And that in such a situation exponential backoff such
> >>>>> as implemented in the synchronous case seems the reasonable default as
> >>>>> it guarantees at worst doubling the delay while still bounding the
> >>>>> long-term average frequency of retries.
> >>>> 
> >>>> So we start with a 15 second delay, and then go to 60 seconds?
> >>> 
> >>> I agree that a server should normally be doing the wait on its own if
> >>> the wait would be on the order of an rpc round trip.
> >>> 
> >>> So I'd be inclined to start with a delay that was an order of magnitude
> >>> or two more than a round trip.
> >>> 
> >>> And I'd expect NFS isn't common on networks with 1-second latencies.
> >>> 
> >>> So the 1/10 second we're using in the synchronous case sounds closer to
> >>> the right ballpark to me.
> >> 
> >> OK, then. Now all I need is actual motivation for changing the existing
> >> code other than handwaving arguments about "polling is better than flat
> >> waits".
> >> What actual use cases are impacting us now, other than the AIX design
> >> decision to force CLOSE to retry at least once before succeeding?
> > 
> > Nah, I've got nothing, and I agree that the AIX problem is their bug.
> > 
> > Just for fun I re-checked the Linux server cases.  As far as I
> > can tell they are:
> > 
> > 	- delegations: returned immediately on detection of any
> > 	  conflict.  The current behavior in the sync case looks
> > 	  reasonable to me.
> > 	- allocation failures: not really sure it's the best error, but
> > 	  it seems to be all the protocol offers.  We probably don't
> > 	  care much what the client does in this case.
> > 	- some rare cases that would probably indicate bugs (e.g.,
> > 	  attempting to destroy a client while other rpc's from that
> > 	  client are running.)  Again we don't care what the client does
> > 	  here.
> > 	- the 4.1 slot-inuse case.
> > 
> > We also by default map four errors (ETIMEDOUT, EAGAIN, EWOULDBLOCK,
> > ENOMEM) to delay.  I thought I remembered one of those being used by
> > some HFS system, but can't actually find an example now.  A quick grep
> > doesn't show anything interesting.
> 
> It's worth mentioning that servers that have frozen state (say, in preparation for Transparent State Migration) may use NFS4ERR_DELAY to prevent clients from modifying open or lock state until that state has transitioned to a destination server.

I thought they'd decided they'll be forced to find a different way to do
that?

(The issue being that it only works if you're using 4.1, and if the
session state itself isn't part of the state to be transferred.
Otherwise you're forced to modify the state anyway since NFS4ERR_DELAY
is seqid-modifying.)
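
For concreteness, the exponential backoff discussed earlier in this
thread can be sketched as below.  This is only an illustration in
Python, not the client code: the name backoff_delays and its parameters
are hypothetical, the 0.1 second starting delay is the 1/10 second
figure from the synchronous case, and the 15 second cap is an assumption
borrowed from the 15 second delay mentioned above.

```python
from itertools import islice


def backoff_delays(initial=0.1, maximum=15.0, factor=2.0):
    """Yield retry delays in seconds, growing geometrically up to a cap.

    All defaults are assumptions for illustration: 0.1 s start,
    doubling on every retry, never exceeding 15 s.
    """
    delay = initial
    while True:
        # Never hand out a delay larger than the cap.
        yield min(delay, maximum)
        # Grow the delay for the next retry, clamped so it cannot run away.
        delay = min(delay * factor, maximum)


# First ten delays a client would wait between retries of the same request.
first_ten = list(islice(backoff_delays(), 10))
```

Each retry waits at most twice as long as the previous one, and once the
cap is reached the retry rate stays bounded at one attempt per cap
interval.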

--b.