linux-kernel - Re: [patch] vmscan: fix zone shrinking exit when scan work is done

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20110209164656.GA1063@csn.ul.ie>
Date:	Wed, 9 Feb 2011 16:46:56 +0000
From:	Mel Gorman <mel@....ul.ie>
To:	Johannes Weiner <hannes@...xchg.org>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Rik van Riel <riel@...hat.com>,
	Michal Hocko <mhocko@...e.cz>,
	Kent Overstreet <kent.overstreet@...il.com>,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [patch] vmscan: fix zone shrinking exit when scan work is done

On Wed, Feb 09, 2011 at 04:46:06PM +0100, Johannes Weiner wrote:
> Hi,
> 
> I think this should fix the problem of processes getting stuck in
> reclaim that has been reported several times.

I don't think it's the only source but I'm basing this on seeing
constant looping in balance_pgdat() and calling congestion_wait() a few
weeks ago that I haven't rechecked since. However, this looks like a
real fix for a real problem.

> Kent actually
> single-stepped through this code and noted that it was never exiting
> shrink_zone(), which really narrowed it down a lot, considering the
> tons of nested loops from the allocator down to the list shrinking.
> 
> 	Hannes
> 
> ---
> From: Johannes Weiner <hannes@...xchg.org>
> Subject: vmscan: fix zone shrinking exit when scan work is done
> 
> '3e7d344 mm: vmscan: reclaim order-0 and use compaction instead of
> lumpy reclaim' introduced an indefinite loop in shrink_zone().
> 
> It meant to break out of this loop when no pages had been reclaimed
> and not a single page was even scanned.  The way it would detect the
> latter is by taking a snapshot of sc->nr_scanned at the beginning of
> the function and comparing it against the new sc->nr_scanned after the
> scan loop.  But it would re-iterate without updating that snapshot,
> looping forever if sc->nr_scanned changed at least once since
> shrink_zone() was invoked.
> 
> This is not the sole condition that would exit that loop, but it
> requires other processes to change the zone state, as the reclaimer
> that is stuck obviously can not anymore.
> 
> This is only happening for higher-order allocations, where reclaim is
> run back to back with compaction.
> 
> Reported-by: Michal Hocko <mhocko@...e.cz>
> Reported-by: Kent Overstreet <kent.overstreet@...il.com>
> Signed-off-by: Johannes Weiner <hannes@...xchg.org>

Well spotted.

Acked-by: Mel Gorman <mel@....ul.ie>

-- 
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/