Message-ID: <28a9fabb-c9fe-c865-016a-467a4d5e2a34@molgen.mpg.de>
Date:   Tue, 8 Nov 2016 13:22:28 +0100
From:   Paul Menzel <pmenzel@...gen.mpg.de>
To:     linux-kernel@...r.kernel.org
Cc:     "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        Josh Triplett <josh@...htriplett.org>, dvteam@...gen.mpg.de
Subject: INFO: rcu_sched detected stalls on CPUs/tasks with `kswapd` and
 `mem_cgroup_shrink_node`

Dear Linux folks,


Could you please help me shed some light on the messages below?

With Linux 4.4.X, these messages were not seen. After updating to Linux 
4.8.4 and Linux 4.8.6, they started to appear. In that version, we 
enabled several CGROUP options.
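
To see which CGROUP options differ between the two builds, the two attached config files can be compared. The following is only a minimal sketch: the helper names `parse_config` and `changed_options` are made up for illustration, and the default filter (`CGROUP` or `MEMCG` anywhere in the option name) is an assumption about which options are of interest.

```python
import re

def parse_config(text):
    """Map CONFIG_* names to values; '# CONFIG_X is not set' maps to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"^(CONFIG_\w+)=(.*)$", line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = re.match(r"^# (CONFIG_\w+) is not set$", line)
        if m:
            options[m.group(1)] = "n"
    return options

def changed_options(old_text, new_text, pattern=r"CGROUP|MEMCG"):
    """Return {name: (old_value, new_value)} for matching options that differ.

    An option missing from one file is reported as 'absent' for that side.
    The default pattern is an assumption, not something taken from the report.
    """
    old, new = parse_config(old_text), parse_config(new_text)
    changes = {}
    for name in sorted(set(old) | set(new)):
        if not re.search(pattern, name):
            continue
        before = old.get(name, "absent")
        after = new.get(name, "absent")
        if before != after:
            changes[name] = (before, after)
    return changes
```

Passing the contents of `config-4.4.14.mx64.90` as `old_text` and `config-4.8.6.mx64.115` as `new_text` would list the options whose values changed between the two kernels.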

> $ dmesg -T
> […]
> [Mon Nov  7 15:09:45 2016] INFO: rcu_sched detected stalls on CPUs/tasks:
> [Mon Nov  7 15:09:45 2016]     3-...: (493 ticks this GP) idle=515/140000000000000/0 softirq=5504423/5504423 fqs=13876
> [Mon Nov  7 15:09:45 2016]     (detected by 5, t=60002 jiffies, g=1363193, c=1363192, q=268508)
> [Mon Nov  7 15:09:45 2016] Task dump for CPU 3:
> [Mon Nov  7 15:09:45 2016] kswapd1         R  running task        0    87      2 0x00000008
> [Mon Nov  7 15:09:45 2016]  ffffffff81aabdfd ffff8810042a5cb8 ffff88080ad34000 ffff88080ad33dc8
> [Mon Nov  7 15:09:45 2016]  ffff88080ad33d00 0000000000003501 0000000000000000 0000000000000000
> [Mon Nov  7 15:09:45 2016]  0000000000000000 0000000000000000 0000000000022316 000000000002bc9f
> [Mon Nov  7 15:09:45 2016] Call Trace:
> [Mon Nov  7 15:09:45 2016]  [<ffffffff81aabdfd>] ? __schedule+0x21d/0x5b0
> [Mon Nov  7 15:09:45 2016]  [<ffffffff81106dcf>] ? shrink_node+0xbf/0x1c0
> [Mon Nov  7 15:09:45 2016]  [<ffffffff81107865>] ? kswapd+0x315/0x5f0
> [Mon Nov  7 15:09:45 2016]  [<ffffffff81107550>] ? mem_cgroup_shrink_node+0x90/0x90
> [Mon Nov  7 15:09:45 2016]  [<ffffffff8106c614>] ? kthread+0xc4/0xe0
> [Mon Nov  7 15:09:45 2016]  [<ffffffff81aaf64f>] ? ret_from_fork+0x1f/0x40
> [Mon Nov  7 15:09:45 2016]  [<ffffffff8106c550>] ? kthread_worker_fn+0x160/0x160
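
For comparing several such reports, the numeric fields of the stall message above can be extracted with a short script. This is a minimal sketch that assumes the exact line format shown in the dmesg output; the function name `parse_stall` and the dictionary keys are hypothetical, and the code only extracts the values without interpreting them.

```python
import re

# Per-CPU line, e.g. "3-...: (493 ticks this GP) idle=..."
STALL_CPU = re.compile(r"(\d+)-\S*: \((\d+) ticks this GP\)")
# Summary line, e.g. "(detected by 5, t=60002 jiffies, g=..., c=..., q=...)"
STALL_SUMMARY = re.compile(
    r"\(detected by (\d+), t=(\d+) jiffies, g=(-?\d+), c=(-?\d+), q=(\d+)\)"
)

def parse_stall(lines):
    """Collect stalled CPUs and the summary fields from dmesg lines."""
    report = {"cpus": []}
    for line in lines:
        m = STALL_CPU.search(line)
        if m:
            report["cpus"].append(
                {"cpu": int(m.group(1)), "ticks": int(m.group(2))}
            )
        m = STALL_SUMMARY.search(line)
        if m:
            keys = ("detected_by", "t_jiffies", "g", "c", "q")
            report.update(zip(keys, map(int, m.groups())))
    return report

# The two relevant lines from the report above, copied verbatim.
sample = [
    "[Mon Nov  7 15:09:45 2016]     3-...: (493 ticks this GP) "
    "idle=515/140000000000000/0 softirq=5504423/5504423 fqs=13876",
    "[Mon Nov  7 15:09:45 2016]     (detected by 5, t=60002 jiffies, "
    "g=1363193, c=1363192, q=268508)",
]
report = parse_stall(sample)
```

Converting `t_jiffies` to seconds would additionally require the kernel's `CONFIG_HZ` value, which can be read from the attached config.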

Even after reading `stallwarn.txt` [1], I don’t know what could cause 
this. All items in the backtrace seem to belong to the Linux kernel.

There is also nothing suspicious in the monitoring graphs during that time.


Kind regards,

Paul


[1] https://www.kernel.org/doc/Documentation/RCU/stallwarn.txt

View attachment "config-4.8.6.mx64.115" of type "text/plain" (112117 bytes)

View attachment "config-4.4.14.mx64.90" of type "text/plain" (107630 bytes)