[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5093A3F4.8090108@redhat.com>
Date: Fri, 02 Nov 2012 11:44:04 +0100
From: Zdenek Kabelac <zkabelac@...hat.com>
To: Mel Gorman <mgorman@...e.de>
CC: Jiri Slaby <jslaby@...e.cz>, Valdis.Kletnieks@...edu,
Jiri Slaby <jirislaby@...il.com>, linux-mm@...ck.org,
LKML <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: kswapd0: excessive CPU usage
Dne 15.10.2012 13:09, Mel Gorman napsal(a):
> On Mon, Oct 15, 2012 at 11:54:13AM +0200, Jiri Slaby wrote:
>> On 10/12/2012 03:57 PM, Mel Gorman wrote:
>>> mm: vmscan: scale number of pages reclaimed by reclaim/compaction only in direct reclaim
>>>
>>> Jiri Slaby reported the following:
>>>
>>> (It's an effective revert of "mm: vmscan: scale number of pages
>>> reclaimed by reclaim/compaction based on failures".)
>>> Given kswapd had hours of runtime in ps/top output yesterday in the
>>> morning and after the revert it's now 2 minutes in sum for the last 24h,
>>> I would say, it's gone.
>>>
>>> The intention of the patch in question was to compensate for the loss of
>>> lumpy reclaim. Part of the reason lumpy reclaim worked is because it
>>> aggressively reclaimed pages and this patch was meant to be a
>>> sane compromise.
>>>
>>> When compaction fails, it gets deferred and both compaction and
>>> reclaim/compaction is deferred avoid excessive reclaim. However, since
>>> commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is woken up each time
>>> and continues reclaiming which was not taken into account when the patch
>>> was developed.
>>>
>>> As it is not taking deferred compaction into account in this path it scans
>>> aggressively before falling out and making the compaction_deferred check in
>>> compaction_ready. This patch avoids kswapd scaling pages for reclaim and
>>> leaves the aggressive reclaim to the process attempting the THP
>>> allocation.
>>>
>>> Signed-off-by: Mel Gorman <mgorman@...e.de>
>>> ---
>>> mm/vmscan.c | 10 ++++++++--
>>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index 2624edc..2b7edfa 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -1763,14 +1763,20 @@ static bool in_reclaim_compaction(struct scan_control *sc)
>>> #ifdef CONFIG_COMPACTION
>>> /*
>>> * If compaction is deferred for sc->order then scale the number of pages
>>> - * reclaimed based on the number of consecutive allocation failures
>>> + * reclaimed based on the number of consecutive allocation failures. This
>>> + * scaling only happens for direct reclaim as it is about to attempt
>>> + * compaction. If compaction fails, future allocations will be deferred
>>> + * and reclaim avoided. On the other hand, kswapd does not take compaction
>>> + * deferral into account so if it scaled, it could scan excessively even
>>> + * though allocations are temporarily not being attempted.
>>> */
>>> static unsigned long scale_for_compaction(unsigned long pages_for_compaction,
>>> struct lruvec *lruvec, struct scan_control *sc)
>>> {
>>> struct zone *zone = lruvec_zone(lruvec);
>>>
>>> - if (zone->compact_order_failed <= sc->order)
>>> + if (zone->compact_order_failed <= sc->order &&
>>> + !current_is_kswapd())
>>> pages_for_compaction <<= zone->compact_defer_shift;
>>> return pages_for_compaction;
>>> }
>>
>> Yes, applying this instead of the revert fixes the issue as well.
>>
>
I've applied this patch on 3.7.0-rc3 kernel - and I still see excessive CPU
usage - mainly after suspend/resume
Here is just simple kswapd backtrace from running kernel:
kswapd0 R running task 0 30 2 0x00000000
ffff8801331ddae8 0000000000000082 ffff880135b8a340 0000000000000008
ffff880135b8a340 ffff8801331ddfd8 ffff8801331ddfd8 ffff8801331ddfd8
ffff880071db8000 ffff880135b8a340 0000000000000286 ffff8801331dc000
Call Trace:
[<ffffffff81555cd2>] preempt_schedule+0x42/0x60
[<ffffffff81557b75>] _raw_spin_unlock+0x55/0x60
[<ffffffff811929d1>] put_super+0x31/0x40
[<ffffffff81192aa2>] drop_super+0x22/0x30
[<ffffffff81193be9>] prune_super+0x149/0x1b0
[<ffffffff81141e2a>] shrink_slab+0xba/0x510
[<ffffffff81185baa>] ? mem_cgroup_iter+0x17a/0x2e0
[<ffffffff81185afa>] ? mem_cgroup_iter+0xca/0x2e0
[<ffffffff811450f9>] balance_pgdat+0x629/0x7f0
[<ffffffff81145434>] kswapd+0x174/0x620
[<ffffffff8106fd20>] ? __init_waitqueue_head+0x60/0x60
[<ffffffff811452c0>] ? balance_pgdat+0x7f0/0x7f0
[<ffffffff8106f50b>] kthread+0xdb/0xe0
[<ffffffff8106f430>] ? kthread_create_on_node+0x140/0x140
[<ffffffff8155fb1c>] ret_from_fork+0x7c/0xb0
[<ffffffff8106f430>] ? kthread_create_on_node+0x140/0x140
Zdenek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists