linux-kernel - Re: [PATCH 3/4] NFS: avoid deadlocks with loop-back mounted NFS filesystems.

Message-ID: <20140917093757.472c8cf2@notabene.brown>
Date:	Wed, 17 Sep 2014 09:37:57 +1000
From:	NeilBrown <neilb@...e.de>
To:	Anna Schumaker <Anna.Schumaker@...app.com>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Trond Myklebust <trond.myklebust@...marydata.com>,
	Ingo Molnar <mingo@...hat.com>,
	<linux-fsdevel@...r.kernel.org>, <linux-mm@...ck.org>,
	<linux-nfs@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
	Jeff Layton <jeff.layton@...marydata.com>
Subject: Re: [PATCH 3/4] NFS: avoid deadlocks with loop-back mounted NFS
 filesystems.

On Tue, 16 Sep 2014 08:39:39 -0400 Anna Schumaker <Anna.Schumaker@...app.com>
wrote:

> On 09/16/2014 01:31 AM, NeilBrown wrote:
> > Support for loop-back mounted NFS filesystems is useful when NFS is
> > used to access shared storage in a high-availability cluster.
> >
> > If the node running the NFS server fails, some other node can mount the
> > filesystem and start providing NFS service.  If that node already had
> > the filesystem NFS mounted, it will now have it loop-back mounted.
> >
> > nfsd can suffer a deadlock when allocating memory and entering direct
> > reclaim.
> > While direct reclaim does not write to the NFS filesystem, it can send
> > and wait for a COMMIT through nfs_release_page().
> 
> Is there anything that can be done on the nfsd side to prevent the deadlocks?
> 

I went down that path first and it didn't work out.
Setting PF_FSTRANS in nfsd (when the request comes from localhost) and then
arranging for __GFP_FS to be cleared when that flag is set overcomes a number
of possible deadlock sources, but not all.

There are a number of situations where nfsd is waiting on some other thread
(which doesn't have PF_FSTRANS set) and that thread tries to reclaim memory
and hits nfs_release_page().
It was a long and complex patch set, and nobody liked it.
And the common thread was that it always blocked in nfs_release_page().
So it seemed to make sense to just remove that blockage.

Thanks,
NeilBrown
