linux-kernel - Re: kswapd craziness round 2

Message-ID: <5121C7AF.2090803@numascale-asia.com>
Date:	Mon, 18 Feb 2013 14:18:23 +0800
From:	Daniel J Blueman <daniel@...ascale-asia.com>
To:	Jiri Slaby <jslaby@...e.cz>
CC:	"Linux Kernel" <linux-kernel@...r.kernel.org>,
	"Steffen Persvold" <sp@...ascale.com>
Subject: Re: kswapd craziness round 2

On Monday, 18 February 2013 06:10:02 UTC+8, Jiri Slaby  wrote:
 > Hi,
 >
 > You still feel the sour taste of the "kswapd craziness in v3.7" thread,
 > right? Welcome to the hell, part two :{.
 >
 > I believe this started happening after update from
 > 3.8.0-rc4-next-20130125 to 3.8.0-rc7-next-20130211. The same as before,
 > many hours of uptime are needed and perhaps some suspend/resume cycles
 > too. Memory pressure is not high, plenty of I/O cache:
 > # free
 >              total       used       free     shared    buffers     cached
 > Mem:       6026692    5571184     455508          0     351252    2016648
 > -/+ buffers/cache:    3203284    2823408
 > Swap:            0          0          0
 >
 > kswap is working very toughly though:
 > root       580  0.6  0.0      0     0 ?        S    úno12  46:21 [kswapd0]
 >
 > This happens on I/O activity right now. For example by updatedb or find
 > /. This is what the stack trace of kswapd0 looks like:
 > [<ffffffff8113c431>] shrink_slab+0xa1/0x2d0
 > [<ffffffff8113ecd1>] kswapd+0x541/0x930
 > [<ffffffff810a3000>] kthread+0xc0/0xd0
 > [<ffffffff816beb5c>] ret_from_fork+0x7c/0xb0
 > [<ffffffffffffffff>] 0xffffffffffffffff
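
As a sanity check on the quoted free output, the "-/+ buffers/cache" row can be recomputed from the "Mem:" row. This is only a sketch, assuming the older procps convention where that row is used minus buffers and cached, and free plus buffers and cached; the function name is made up for illustration.

```python
def buffers_cache_view(used, free, buffers, cached):
    """Recompute the '-/+ buffers/cache' row of `free` (all values in KiB).

    Assumes the older procps layout shown in the quoted output:
    used_by_apps = used - buffers - cached
    reclaimable  = free + buffers + cached
    """
    used_by_apps = used - buffers - cached
    reclaimable = free + buffers + cached
    return used_by_apps, reclaimable


# Values copied from the "Mem:" row of the quoted output.
used_by_apps, reclaimable = buffers_cache_view(
    used=5571184, free=455508, buffers=351252, cached=2016648
)
print(used_by_apps, reclaimable)
```

Both numbers agree with the quoted row (3203284 and 2823408), which supports the point that a large share of memory is page cache or free rather than application memory.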

Likewise with 3.8-rc, I've been able to reproduce [1] a livelock 
scenario which hoses the box, and RCU stalls are observed [2].

There may be a connection; I'll do a bit more debugging in the next few 
days.

Daniel

--- [1]

1. live-booted image using ramdisk
2. boot 3.8-rc with <16GB memory and without swap
3. run OpenMP NAS Parallel Benchmark dc.B against local disk (i.e. not 
ramdisk)
4. observe hang O(30) mins later

--- [2]

[ 2675.587878] INFO: rcu_sched self-detected stall on CPU { 5}  (t=24000 jiffies g=6313 c=6312 q=68)
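
For anyone collecting these from logs, a stall line of this shape can be pulled apart with a small script. This is a rough sketch, not a general dmesg parser: the regular expression is written against the single line above, and the HZ values used to convert jiffies to seconds are assumptions, since the kernel's CONFIG_HZ is not stated here.

```python
import re

# Pattern written against the one stall line quoted above; other kernel
# versions format this message differently.
STALL_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: (?P<flavor>\w+) self-detected stall "
    r"on CPU \{\s*(?P<cpu>\d+)\}\s+"
    r"\(t=(?P<t>\d+) jiffies g=(?P<g>\d+) c=(?P<c>\d+) q=(?P<q>\d+)\)"
)


def parse_stall(line):
    """Return the fields of an RCU self-detected stall line, or None."""
    m = STALL_RE.search(line)
    if m is None:
        return None
    fields = {k: int(m.group(k)) for k in ("cpu", "t", "g", "c", "q")}
    fields["ts"] = float(m.group("ts"))
    fields["flavor"] = m.group("flavor")
    return fields


def stall_seconds(jiffies, hz):
    """Convert a jiffies count to seconds; hz (CONFIG_HZ) must be supplied."""
    return jiffies / hz


line = ("[ 2675.587878] INFO: rcu_sched self-detected stall on CPU { 5}  "
        "(t=24000 jiffies g=6313 c=6312 q=68)")
stall = parse_stall(line)
print(stall)
# HZ is unknown for this box; common settings are shown for comparison.
for hz in (100, 250, 1000):
    print(hz, stall_seconds(stall["t"], hz))
```

The conversion is left parameterised because the same t=24000 corresponds to very different wall-clock durations depending on the tick rate.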
-- 
Daniel J Blueman
Principal Software Engineer, Numascale Asia