netdev - [RFC PATCH] sunrpc: do not allow process to freeze within RPC state machine

Message-ID: <20160803165412.22407.47399.stgit@localhost.localdomain>
Date:	Wed, 03 Aug 2016 20:54:50 +0400
From:	Stanislav Kinsburskiy <skinsbursky@...tuozzo.com>
To:	bfields@...ldses.org, jlayton@...chiereds.net,
	trond.myklebust@...marydata.com, anna.schumaker@...app.com
Cc:	linux-nfs@...r.kernel.org, netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org, gorcunov@...tuozzo.com,
	davem@...emloft.net, devel@...nvz.org
Subject: [RFC PATCH] sunrpc: do not allow process to freeze within RPC state
 machine

Otherwise, the freezer cgroup state might never become "FROZEN".

Here is a deadlock scenario for two processes in the same freezer cgroup
while that cgroup is being frozen:

CPU 0                                   CPU 1
--------                                --------
do_last
inode_lock(dir->d_inode)
vfs_create
nfs_create
...
__rpc_execute
rpc_wait_bit_killable
__refrigerator
                                        do_last
                                        inode_lock(dir->d_inode)

So, the problem is that one process takes the directory inode mutex, issues
the creation request and enters the refrigerator while still holding the lock.
The other process waits for the directory lock to be released, cannot be
frozen while it waits, and thus the freezer cgroup state never becomes
"FROZEN".
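
The scenario above can be modelled in userspace. The following is only a
minimal sketch under stated assumptions: the types and the function
simulate_freeze() are hypothetical illustrations, not kernel APIs, and the
non-freezable case assumes the RPC completes so that the lock is released.

```c
#include <stdbool.h>

/* Hypothetical, simplified task states; not the kernel's own definitions. */
enum task_state {
	TASK_MODEL_RUNNABLE,	/* can still reach a safe freeze point */
	TASK_MODEL_FROZEN,	/* parked in the refrigerator */
	TASK_MODEL_BLOCKED,	/* sleeping on the directory lock, not freezable */
};

enum cg_state { CG_FREEZING, CG_FROZEN };

struct task_model {
	enum task_state state;
	bool holds_dir_lock;
};

/* In this model the cgroup is FROZEN only when every task is frozen. */
enum cg_state cgroup_state(const struct task_model *tasks, int n)
{
	for (int i = 0; i < n; i++)
		if (tasks[i].state != TASK_MODEL_FROZEN)
			return CG_FREEZING;
	return CG_FROZEN;
}

/*
 * Replay the two-process scenario. If the RPC wait is a freeze point,
 * task A freezes while holding the lock and task B stays blocked.
 */
enum cg_state simulate_freeze(bool rpc_wait_is_freezable)
{
	struct task_model t[2] = {
		{ TASK_MODEL_RUNNABLE, false },	/* task A */
		{ TASK_MODEL_RUNNABLE, false },	/* task B */
	};

	/* A: do_last -> inode_lock(dir->d_inode) */
	t[0].holds_dir_lock = true;

	/* A: nfs_create -> __rpc_execute -> rpc_wait_bit_killable */
	if (!rpc_wait_is_freezable)
		t[0].holds_dir_lock = false;	/* assumed: RPC done, unlock */
	t[0].state = TASK_MODEL_FROZEN;		/* A freezes either way */

	/* B: do_last -> inode_lock(dir->d_inode) */
	if (t[0].holds_dir_lock)
		t[1].state = TASK_MODEL_BLOCKED;	/* waits for A's lock */
	else
		t[1].state = TASK_MODEL_FROZEN;	/* proceeds, freezes later */

	return cgroup_state(t, 2);
}
```

In this model, simulate_freeze(true) stays at CG_FREEZING, while
simulate_freeze(false) reaches CG_FROZEN.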

Notes:
1) Interestingly, this is not a pure deadlock: one can thaw the cgroup and
then freeze it again.
2) The issue was introduced by commit d310310cbff18ec385c6ab4d58f33b100192a96a.
3) This patch is not meant to fix the issue, but to show the root of the
problem. It looks like the same problem might apply to other hunks of the
commit mentioned above.

Signed-off-by: Stanislav Kinsburskiy <skinsbursky@...tuozzo.com>
---
 net/sunrpc/sched.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 9ae5885..ec7ccc1 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -253,7 +253,6 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);

 static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)
 {
-	freezable_schedule_unsafe();
 	if (signal_pending_state(mode, current))
 		return -ERESTARTSYS;
 	return 0;