linux-kernel - Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!

Date:	Wed, 6 Mar 2013 18:37:16 +0000
From:	"Myklebust, Trond" <Trond.Myklebust@...app.com>
To:	Jeff Layton <jlayton@...hat.com>
CC:	Mandeep Singh Baines <msb@...omium.org>,
	Ingo Molnar <mingo@...nel.org>, Tejun Heo <tj@...nel.org>,
	"J. Bruce Fields" <bfields@...ldses.org>,
	"Oleg Nesterov" <oleg@...hat.com>,
	Ming Lei <ming.lei@...onical.com>,
	"Linux Kernel Mailing List" <linux-kernel@...r.kernel.org>,
	"linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
	"Rafael J. Wysocki" <rjw@...k.pl>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!

On Wed, 2013-03-06 at 13:23 -0500, Jeff Layton wrote:
> On Wed, 6 Mar 2013 07:59:01 -0800
> Mandeep Singh Baines <msb@...omium.org> wrote:
> > In general, holding a lock and freezing can cause a deadlock if:
> > 
> > 1) you froze via the cgroup_freezer subsystem and a task in another
> > cgroup tried to acquire the same lock
> > 2) the lock was needed later in suspend/hibernate. For example, if the
> > lock was needed in dpm_suspend by one of the device callbacks. For
> > hibernate, you also need to worry about any locks that need to be
> > acquired in order to write to the swap device.
> > 3) another freezing task blocked on this lock and held other locks
> > needed later in suspend. If that task were skipped by the freezer, you
> > would deadlock
> > 
> > You will block/prevent suspend if:
> > 
> > 4) another freezing task blocked on this lock and was unable to freeze
> > 
> > I think 1) and 4) can happen for the NFS/RPC case. Case 1) requires
> > cgroup freezer. Case 4) while not causing a deadlock could prevent
> > your laptop/phone from sleeping and end up burning all your battery.
> > If suspend is initiated via lid close you won't even know about the
> > failure.
> > 
> 
> We're aware of #4. That was the intent of adding try_to_freeze() into
> this codepath in the first place. It's not a great solution for obvious
> reasons, but we don't have another at the moment.
> 
> For #1 I'm not sure what to do. I'm not that familiar with cgroups or how
> the freezer works.
> 
> The bottom line is that we have a choice -- we can either rip out this
> new lockdep warning, or rip out the code that causes it to fire.
> 
> If we rip out the warning we may miss some legit cases where we might
> possibly have hit a deadlock.
> 
> If we rip out the code that causes it to fire, then we exacerbate the
> #4 problem above. That will effectively make it so that you can't
> suspend the host whenever NFS is doing anything moderately active.

#4 is probably the only case where we might want to freeze.

Unless we're in a situation where the network is going down, we can
usually make progress with completing the RPC call and finishing
the system call. So in the case of cgroup_freezer, we only care if the
freezing cgroup also owns the network device (or net namespace) that NFS
is using to talk to the server.

As I said, the alternative is to notify NFS that the device is going
down, and to give it a chance to quiesce itself before that happens.
This is also the only way to ensure that processes which own locks on
the server (e.g. posix file locks) have a chance to release them before
being suspended.
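
To make the ordering concrete, here is a rough userspace sketch of the
"notify, quiesce, then freeze" idea. It is not kernel code and not an
existing NFS interface: the struct and every function name below are
hypothetical, and the model assumes the network is still up when the
notification arrives, so outstanding RPCs can complete.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an NFS client's state; not a kernel structure. */
struct nfs_client_model {
    int server_locks_held;  /* e.g. POSIX file locks held on the server */
    int rpcs_in_flight;     /* RPC calls that have not completed yet */
    bool quiesced;          /* set once the client has drained itself */
};

/* Complete one outstanding RPC. Assumes the network is still up. */
static void complete_one_rpc(struct nfs_client_model *c)
{
    if (c->rpcs_in_flight > 0)
        c->rpcs_in_flight--;
}

/* Freezing is only safe when nothing is in flight and nothing is held. */
static bool safe_to_freeze(const struct nfs_client_model *c)
{
    return c->rpcs_in_flight == 0 && c->server_locks_held == 0;
}

/*
 * Hypothetical "device is going down" notification handler:
 * drain outstanding RPCs, then release locks held on the server.
 */
static void quiesce(struct nfs_client_model *c)
{
    while (c->rpcs_in_flight > 0)
        complete_one_rpc(c);
    c->server_locks_held = 0;
    c->quiesced = true;
}

/* Suspend path: notify first, freeze only afterwards. */
static bool suspend_with_notification(struct nfs_client_model *c)
{
    assert(c != NULL);
    quiesce(c);
    return safe_to_freeze(c);
}
```

In this model a client with RPCs in flight or server-side locks held is
not safe to freeze on its own; calling suspend_with_notification() first
drains it, so the freeze happens with nothing outstanding.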


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@...app.com
www.netapp.com