Date:	Fri, 22 Jan 2016 00:37:07 +1000
From:	Nalorokk <nalorokk@...il.com>
To:	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
	Stefan Strogin <s.strogin@...tner.samsung.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Sasha Levin <sasha.levin@...cle.com>,
	Mel Gorman <mgorman@...hsingularity.net>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, oleksandr@...alenko.name
Subject: Fwd: [REGRESSION] [BISECTED] kswapd high CPU usage

It appears that kernels starting with 4.1 have a kswapd-related bug
that results in high CPU usage. 100% CPU usage can last anywhere from
several minutes to several days, with the CPU entirely busy servicing
kswapd. It usually happens after the server has been mostly idle,
sometimes after days and sometimes after weeks of uptime. However, the
issue appears much sooner if the machine is under load, for example
while building a kernel.

Here are graphs of CPU load over a month [1] and over a year [2]. The
perf top output is available at [3].
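One way to put a number on kswapd's CPU share, independent of the graphs above, is to sample its accumulated CPU time from /proc twice and divide by the elapsed time. The sketch below assumes a Linux /proc filesystem and a kswapd thread named `kswapd0` (one per NUMA node); the helper names are hypothetical, and this is not necessarily how the graphs were produced.

```python
import os
import time
from typing import Optional


def cpu_ticks_from_stat(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or ')', so only split the text after the LAST ')'.
    rest = stat_line[stat_line.rfind(")") + 1:].split()
    # rest[0] is field 3 (state), so field N lives at rest[N - 3]:
    # utime is field 14 -> rest[11], stime is field 15 -> rest[12].
    return int(rest[11]) + int(rest[12])


def find_pid(comm: str) -> Optional[int]:
    """Return the PID of the first process whose comm equals `comm`, if any."""
    if not os.path.isdir("/proc"):
        return None  # not Linux, or /proc is not mounted
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == comm:
                    return int(entry)
        except OSError:
            continue  # process exited while we were scanning
    return None


def cpu_percent(pid: int, interval: float = 5.0) -> float:
    """Average CPU usage of `pid` over `interval` seconds, in % of one CPU."""
    ticks_per_sec = os.sysconf("SC_CLK_TCK")

    def sample() -> int:
        with open(f"/proc/{pid}/stat") as f:
            return cpu_ticks_from_stat(f.read())

    before = sample()
    time.sleep(interval)
    after = sample()
    return 100.0 * (after - before) / ticks_per_sec / interval


if __name__ == "__main__":
    pid = find_pid("kswapd0")
    if pid is None:
        print("kswapd0 not found")
    else:
        print(f"kswapd0 (pid {pid}): {cpu_percent(pid):.1f}% of one CPU")
```

A reading close to 100% sustained over several intervals would match the behavior described above; a longer `interval` smooths out short bursts.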

To find the cause of this problem, I started from the fact that the
issue appeared after the update to the 4.1 kernel. I then ran a
long-term test of 3.18 and found that 3.18 is unaffected by this bug.
I also ran some tests on 4.0 to confirm that this version behaves well
too.

I then ran git bisect between tags v4.0 and v4.1-rc1 and found the
exact commits that appear to cause the high CPU usage.
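git bisect is a binary search over the commit range: each tested commit halves the remaining candidates. The Python sketch below shows only that core idea, assuming a linear list of commits where the first is known good, the last is known bad, and each test gives a clear good/bad answer; `first_bad` and `is_bad` are hypothetical names. Real git bisect also walks merge history and supports skipping untestable commits.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(commits: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the first commit for which is_bad() is True.

    Assumes commits[0] is good, commits[-1] is bad, and that once a
    commit is bad, every later commit is bad too (monotonic).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):  # in practice: build, boot, wait, observe
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Toy usage: 100 "commits", the regression introduced at index 37.
print(first_bad(list(range(100)), lambda c: c >= 37))  # prints 37
```

With roughly N commits in the range this needs only about log2(N) test runs, which matters when each "bad" verdict may take hours or days of uptime to reach.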

The first really "bad" commit is
79553da293d38d63097278de13e28a3b371f43c1. The two commits preceding it
also cause odd behavior, with kswapd consuming more CPU than on
unaffected kernels, but not as much as with the commit above. I
believe these commits all come from the same mm tree merge.

I tried adding transparent_hugepage=never to the kernel boot
parameters, but it did not change anything. Switching the allocator
from SLUB to SLAB alters the behavior and lowers the CPU load, but
does not solve the problem.

There is also a kernel Bugzilla report [4].

Ideas?

[1] http://i.piccy.info/i9/9ee6c0620c9481a974908484b2a52a0f/1453384595/44012/994698/cpu_month.png
[2] http://i.piccy.info/i9/7c97c2f39620bb9d7ea93096312dbbb6/1453384649/41222/994698/cpu_year.png
[3] http://pastebin.com/aRzTjb2x
[4] https://bugzilla.kernel.org/show_bug.cgi?id=110501
