Date:	Fri, 08 Jul 2016 10:27:38 -0400
From:	Jeff Layton <jlayton@...hat.com>
To:	Michal Hocko <mhocko@...nel.org>
Cc:	Seth Forshee <seth.forshee@...onical.com>,
	Trond Myklebust <trond.myklebust@...marydata.com>,
	Anna Schumaker <anna.schumaker@...app.com>,
	linux-fsdevel@...r.kernel.org, linux-nfs@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	Tycho Andersen <tycho.andersen@...onical.com>
Subject: Re: Hang due to nfs letting tasks freeze with locked inodes

On Fri, 2016-07-08 at 16:23 +0200, Michal Hocko wrote:
> On Fri 08-07-16 08:51:54, Jeff Layton wrote:
> > 
> > On Fri, 2016-07-08 at 14:22 +0200, Michal Hocko wrote:
> [...]
> > 
> > > 
> > > Apart from the alternative Dave was mentioning in the other email,
> > > what is the point of using a freezable wait from this path in the
> > > first place?
> > > 
> > > nfs4_handle_exception does nfs4_wait_clnt_recover from the same path,
> > > and that does wait_on_bit_action with TASK_KILLABLE, so we are waiting
> > > in two different modes from the same path AFAICS. There do not seem
> > > to be other callers of nfs4_delay outside of nfs4_handle_exception.
> > > Sounds like something is not quite right here to me. If nfs4_delay
> > > did a regular wait, then freezing would fail as well, but at least it
> > > would be clear who the culprit is, rather than having an indirect
> > > dependency.
> > The codepaths involved there are a lot more complex than that,
> > unfortunately.
> > 
> > nfs4_delay is the function that we use to handle the case where the
> > server returns NFS4ERR_DELAY, basically telling us that it's too busy
> > right now or has some transient error and the client should retry
> > after a small, sliding delay.
> > 
> > That codepath could probably be made more freezer-safe. The typical
> > case, however, is that we've sent a call and just haven't gotten a
> > reply. That's the trickier one to handle.
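To make the "small, sliding delay" concrete, here is a minimal user-space sketch of a doubling backoff for NFS4ERR_DELAY-style retries. It is not the NFS client's actual code: the function name `next_delay_ms` and the constants `RETRY_MIN_MS` (an assumed 100 ms floor) and `RETRY_MAX_MS` (a 15 s ceiling, matching the "up to 15s" figure in the reply) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bounds for illustration only: a 100 ms floor (assumed)
 * and a 15 s ceiling (matching the "up to 15s" in the thread). */
#define RETRY_MIN_MS 100L
#define RETRY_MAX_MS 15000L

/*
 * Return the delay for the current retry and advance *timeout_ms so the
 * next retry waits twice as long. The stored value is clamped into
 * [RETRY_MIN_MS, RETRY_MAX_MS] before use; a value <= 0 means "first retry".
 */
static long next_delay_ms(long *timeout_ms)
{
    long ret;

    assert(timeout_ms != NULL);
    if (*timeout_ms <= 0)
        *timeout_ms = RETRY_MIN_MS;
    if (*timeout_ms > RETRY_MAX_MS)
        *timeout_ms = RETRY_MAX_MS;
    ret = *timeout_ms;
    *timeout_ms *= 2;   /* slide the window for the next attempt */
    return ret;
}
```

Starting from `long t = 0;`, successive calls to `next_delay_ms(&t)` return 100, 200, 400, ... 12800 ms and then stay pinned at 15000 ms. Doubling backs off quickly from a busy server, while the cap bounds the worst case; that cap is also the longest single sleep a task could spend in this path.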
> Why would using a regular non-freezable wait be a problem?

It has been a while since I looked at that code, but IIRC, that could
block the freezer for up to 15s, which is a significant portion of the
20s that you get before the freezer gives up.
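The timing argument can be checked with a toy model, again a sketch under stated assumptions rather than kernel code. It assumes a task in a freezable wait can be frozen at once, while a task in a non-freezable wait only reaches a freeze point when its sleep ends; the 20 s limit is the figure from the reply, and `struct sleeper`, `time_to_freeze_ms` and `freeze_all` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Freezer budget taken from the reply (20 s); the rest is a simplification. */
#define FREEZE_TIMEOUT_MS 20000L

struct sleeper {
    bool freezable;     /* sleeping in a freezable wait? */
    long remaining_ms;  /* time left before the sleep ends on its own */
};

/* Time until this task can be frozen: immediately if its wait is
 * freezable, otherwise only once the non-freezable sleep finishes. */
static long time_to_freeze_ms(const struct sleeper *s)
{
    assert(s->remaining_ms >= 0);
    return s->freezable ? 0 : s->remaining_ms;
}

/* Tasks freeze in parallel, so the attempt takes as long as the slowest
 * one. It succeeds only if that fits in the budget; *worst_ms reports
 * how much of the budget the slowest task consumed. */
static bool freeze_all(const struct sleeper *tasks, size_t n, long *worst_ms)
{
    long worst = 0;

    for (size_t i = 0; i < n; i++) {
        long t = time_to_freeze_ms(&tasks[i]);
        if (t > worst)
            worst = t;
    }
    *worst_ms = worst;
    return worst <= FREEZE_TIMEOUT_MS;
}
```

With one task 15 s away from the end of a non-freezable sleep, `freeze_all` still succeeds but reports 15000 ms of the 20000 ms budget used, leaving only 5 s of slack for anything else; a 25 s non-freezable sleep would make the attempt fail outright.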

-- 
Jeff Layton <jlayton@...hat.com>
