linux-kernel - Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20130307152504.GA29601@htj.dyndns.org>
Date:	Thu, 7 Mar 2013 07:25:04 -0800
From:	Tejun Heo <tj@...nel.org>
To:	Jeff Layton <jlayton@...hat.com>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Oleg Nesterov <oleg@...hat.com>,
	"Myklebust, Trond" <Trond.Myklebust@...app.com>,
	Mandeep Singh Baines <msb@...omium.org>,
	Ming Lei <ming.lei@...onical.com>,
	"J. Bruce Fields" <bfields@...ldses.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	"linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
	"Rafael J. Wysocki" <rjw@...k.pl>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Ingo Molnar <mingo@...hat.com>,
	Al Viro <viro@...iv.linux.org.uk>
Subject: Re: LOCKDEP: 3.9-rc1: mount.nfs/4272 still has locks held!

Hello, Jeff.

On Thu, Mar 07, 2013 at 06:41:40AM -0500, Jeff Layton wrote:
> Suppose I call unlink("somefile"); on an NFS mount. We take all of the
> VFS locks, go down into the NFS layer. That marshals up the UNLINK
> call, sends it off to the server, and waits for the reply. While we're
> waiting, a freeze event comes in and we start returning from the
> kernel with our new -EFREEZE return code that works sort of like
> -ERESTARTSYS. Meanwhile, the server is processing the UNLINK call and
> removes the file. A little while later we wake up the machine and it
> goes to try and pick up where it left off.

But you can't freeze regardless of the freezing mechanism in such
cases, right?  The current code which allows freezing while such
operations are in progress is broken as it can lead to freezer
deadlocks.  They should go away no matter how we implement freezer, so
the question is not whether we can move all the existing freezing
points to signal mechanism but that, after removing the deadlock-prone
ones, how many would be difficult to convert.  I'm fully speculating
but my suspicion is not too many if you remove (or update somehow) the
ones which are being done with some locks held.

> The catch here is that it's quite possible that when we need to quiesce
> that we've lost communications with the server. We don't want to hold
> up the freezer at that point so the wait for replies has to be bounded
> in time somehow. If that times out, we probably just have to return all
> calls with our new -EFREEZE return and hope for the best when the
> machine wakes back up.

Sure, a separate prep step may be helpful but assuming a user
nfs-mounting stuff on a laptop, I'm not sure how reliable that can be
made.  People move around with laptops, wifi can be iffy and the lid
can be shut at any moment.  I don't think it's possible for nfs to be
laptop friendly while staying completely correct.  Designing such a
network filesystem probably is possible with transactions and whatnot
but AFAIU nfs isn't designed that way.  If such use case is something
nfs wants to support, I think it just should make do with some
middleground - ie. just implement a mount switch which says "retry
operations across network / power problems" and explain the
implications.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/