linux-kernel - Re: Kswapd in 3.2.0-rc5 is a CPU hog

Message-ID: <20111227035730.GA22840@barrios-laptop.redhat.com>
Date:	Tue, 27 Dec 2011 12:57:31 +0900
From:	Minchan Kim <minchan@...nel.org>
To:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Cc:	Dave Chinner <david@...morbit.com>,
	nowhere <nowhere@...kenden.ath.cx>,
	Michal Hocko <mhocko@...e.cz>, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org
Subject: Re: Kswapd in 3.2.0-rc5 is a CPU hog

On Tue, Dec 27, 2011 at 11:15:43AM +0900, KAMEZAWA Hiroyuki wrote:
> On Sat, 24 Dec 2011 07:45:03 +1100
> Dave Chinner <david@...morbit.com> wrote:
> 
> > On Fri, Dec 23, 2011 at 03:04:02PM +0400, nowhere wrote:
> > > > On Fri, 23/12/2011 at 21:20 +1100, Dave Chinner wrote:
> > > > > On Fri, Dec 23, 2011 at 01:01:20PM +0400, nowhere wrote:
> > > > > > On Thu, 22/12/2011 at 09:55 +1100, Dave Chinner wrote:
> > > > > > On Wed, Dec 21, 2011 at 10:52:49AM +0100, Michal Hocko wrote:
> 
> > > Here is the report of trace-cmd while dd'ing
> > > https://80.237.6.56/report-dd.xz
> > 
> > > Ok, it's not a shrink_slab() problem - it's just being called every
> > > ~100us by kswapd. The pattern is:
> > 
> > > 	- reclaim 94 (batches of 32,32,30) pages from inactive list
> > 	  of zone 1, node 0, prio 12
> > 	- call shrink_slab
> > 		- scan all caches
> > 		- all shrinkers return 0 saying nothing to shrink
> > 	- 40us gap
> > 	- reclaim 10-30 pages from inactive list of zone 2, node 0, prio 12
> > 	- call shrink_slab
> > 		- scan all caches
> > 		- all shrinkers return 0 saying nothing to shrink
> > 	- 40us gap
> > 	- isolate 9 pages from LRU zone ?, node ?, none isolated, none freed
> > 	- isolate 22 pages from LRU zone ?, node ?, none isolated, none freed
> > 	- call shrink_slab
> > 		- scan all caches
> > 		- all shrinkers return 0 saying nothing to shrink
> > > 	- 40us gap
> > 
> > And it just repeats over and over again. After a while, nid=0,zone=1
> > drops out of the traces, so reclaim only comes in batches of 10-30
> > pages from zone 2 between each shrink_slab() call.
> > 
> > The trace starts at 111209.881s, with 944776 pages on the LRUs. It
> > finishes at 111216.1 with kswapd going to sleep on node 0 with
> > 930067 pages on the LRU. So 7 seconds to free 15,000 pages (call it
> > 2,000 pages/s) which is awfully slow....
> > 
> > vmscan gurus - time for you to step in now...
> >
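
As a sanity check on the rate quoted above, the figures from the trace
can be plugged in directly. This is only arithmetic on the numbers given
in the mail; it assumes the drop in the LRU page count corresponds to
pages actually freed, and the variable names are made up for illustration.

```python
# Timestamps (seconds) and LRU page counts as quoted from the trace.
start_s, end_s = 111209.881, 111216.1
pages_start, pages_end = 944776, 930067

elapsed = end_s - start_s          # duration of the traced window
freed = pages_start - pages_end    # assumed: LRU shrinkage == pages freed
rate = freed / elapsed             # pages per second

print(f"{freed} pages in {elapsed:.3f}s = {rate:.0f} pages/s")
```

So the window is a little over 6 seconds and the rate is somewhat above
2,000 pages/s, which does not change the conclusion that reclaim is slow.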
>  
> Can you show /proc/zoneinfo ? I want to know each zone's size.
> 
> Below is my memo.
> 
> In trace log, priority = 11 or 12. Then, I think kswapd can reclaim memory
> to satisfy "sc.nr_reclaimed >= SWAP_CLUSTER_MAX" condition and loops again.
> 
> Seeing balance_pgdat() and trace log, I guess it does
> 
> 	wake up
> 
> 	shrink_zone(zone=0(DMA?))     => nothing to reclaim.
> 		shrink_slab()
> 	shrink_zone(zone=1(DMA32?))   => reclaim 32,32,31 pages 
> 		shrink_slab()
> 	shrink_zone(zone=2(NORMAL?))  => reclaim 13 pages. 
> 		shrink_slab()
> 
> 	sleep or retry.
> 
> Why does shrink_slab() need to be called this frequently?

I guess it's caused by a small NORMAL zone.
The scenario I have in mind is as follows:

1. dd consumes memory in the NORMAL zone.
2. dd enters direct reclaim and wakes up kswapd.
3. kswapd reclaims memory in the NORMAL zone, trying to reach the high watermark.
4. schedule
5. dd consumes memory in the NORMAL zone again.
6. kswapd fails to reach the high watermark because of 5.
7. Loop again, goto 3.

The point is the speed of reclaim versus the speed of memory consumption:
kswapd can never reach the point where enough pages are free in the NORMAL zone.
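
To make the race concrete, here is a toy model of the loop above. It is
only a sketch, not kernel code: the function name, the per-tick rates and
the watermark value are all hypothetical, and it simplifies kswapd to
"runs whenever free pages are below the high watermark".

```python
def simulate(free_pages, high_wmark, consume_per_tick, reclaim_per_tick, ticks):
    """Toy model of the dd vs. kswapd race; returns ticks kswapd spent awake.

    All parameters are hypothetical illustration values, not kernel tunables.
    """
    awake_ticks = 0
    for _ in range(ticks):
        # Steps 1 and 5: dd allocates pages from the zone.
        free_pages = max(0, free_pages - consume_per_tick)
        # Steps 3 and 6: kswapd works only while the zone is below
        # the high watermark; otherwise it sleeps for this tick.
        if free_pages < high_wmark:
            awake_ticks += 1
            free_pages += reclaim_per_tick
    return awake_ticks


# dd consumes as fast as kswapd reclaims: kswapd never gets to sleep.
busy = simulate(100, 128, consume_per_tick=30, reclaim_per_tick=30, ticks=1000)
# dd consumes more slowly: kswapd reaches the watermark and sleeps at times.
calm = simulate(100, 128, consume_per_tick=10, reclaim_per_tick=30, ticks=1000)
print(busy, calm)
```

In this model, a small zone corresponds to a small margin between the
free page count and the high watermark, so even a modest consumer keeps
kswapd permanently awake.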

> 
> BTW, I'm sorry if I'm missing something... Why does only kswapd reclaim
> memory during the 'dd' operation? (No direct reclaim by dd.)
> Does this log record the CPU hog after 'dd'?

If the above scenario is right, dd couldn't enter direct reclaim to reclaim memory.


> 
> Thanks,
> -Kame

-- 
Kind regards,
Minchan Kim