linux-kernel - RCU nocb list not reclaiming causing OOM

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <BYAPR02MB4501B990E353A2616594821294510@BYAPR02MB4501.namprd02.prod.outlook.com>
Date:   Fri, 20 Jul 2018 23:05:52 +0000
From:   David Chen <david.chen@...anix.com>
To:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
CC:     David Chen <david.chen@...anix.com>
Subject: RCU nocb list not reclaiming causing OOM

Hi Paul,

We hit an RCU issue on 4.9.37 kernel. One of the nocb_follower list grows too
large, and not getting reclaimed, causing the system to OOM.

Printing the culprit rcu_sched_data:

  nocb_q_count = {
    counter = 32369635
  },
  nocb_follower_head = 0xffff88ae901c0a00,
  nocb_follower_tail = 0xffff88af1538b8d8,
  nocb_kthread = 0xffff88b06d290000,

As you can see here, the nocb_follower_head is not empty, so in theory, the
nocb_kthread shouldn't go to sleep. However, if dump the stack of the kthread:

crash> bt 0xffff88b06d290000
PID: 21     TASK: ffff88b06d290000  CPU: 3   COMMAND: "rcuos/1"
 #0 [ffffafc9020b7dc0] __schedule at ffffffff8d8789dc
 #1 [ffffafc9020b7e38] schedule at ffffffff8d878e76
 #2 [ffffafc9020b7e50] rcu_nocb_kthread at ffffffff8d112337
 #3 [ffffafc9020b7ec8] kthread at ffffffff8d0c6ce7
 #4 [ffffafc9020b7f50] ret_from_fork at ffffffff8d87d755

And if we dis the address at ffffffff8d112337:

/usr/src/debug/kernel-4.9.37/linux-4.9.37-29.nutanix.07142017.el7.centos.x86_64/kernel/rcu/tree_plugin.h: 2106
0xffffffff8d11232d <rcu_nocb_kthread+381>:      test   %rax,%rax
0xffffffff8d112330 <rcu_nocb_kthread+384>:      jne    0xffffffff8d112355 <rcu_nocb_kthread+421>
0xffffffff8d112332 <rcu_nocb_kthread+386>:      callq  0xffffffff8d878e40 <schedule>
0xffffffff8d112337 <rcu_nocb_kthread+391>:      lea    -0x40(%rbp),%rsi

So the kthread is blocked at swait_event_interruptible in the nocb_follower_wait.
This contradict with the fact that nocb_follower_head was not empty. So I
wonder if this is caused by the lack of memory barrier in the place shown below.
If the head is set to NULL after doing xchg, it will overwrite the head set
by leader. This caused the kthread to sleep the next iteration, and the leader
won't wake him up as the tail doesn't point to head.

Please tell me what do you think.

Thanks,
David

diff -ru linux-4.9.37.orig/kernel/rcu/tree_plugin.h linux-4.9.37/kernel/rcu/tree_plugin.h
--- linux-4.9.37.orig/kernel/rcu/tree_plugin.h	2017-07-12 06:42:41.000000000 -0700
+++ linux-4.9.37/kernel/rcu/tree_plugin.h	2018-07-20 15:25:57.311206343 -0700
@@ -2149,6 +2149,7 @@
 		BUG_ON(!list);
 		trace_rcu_nocb_wake(rdp->rsp->name, rdp->cpu, "WokeNonEmpty");
 		WRITE_ONCE(rdp->nocb_follower_head, NULL);
+		smp_mb();
 		tail = xchg(&rdp->nocb_follower_tail, &rdp->nocb_follower_head);

 		/* Each pass through the following loop invokes a callback. */