Date:	Tue, 8 Mar 2011 09:44:38 +0900
From:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Minchan Kim <minchan.kim@...il.com>,
	Andrew Vagin <avagin@...il.com>,
	Andrey Vagin <avagin@...nvz.org>, Mel Gorman <mel@....ul.ie>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm: check zone->all_unreclaimable in
 all_unreclaimable()

On Mon, 7 Mar 2011 13:58:31 -0800
Andrew Morton <akpm@...ux-foundation.org> wrote:

> On Sun, 6 Mar 2011 02:07:59 +0900
> Minchan Kim <minchan.kim@...il.com> wrote:
> 
> > On Sat, Mar 05, 2011 at 07:41:26PM +0300, Andrew Vagin wrote:
> > > On 03/05/2011 06:53 PM, Minchan Kim wrote:
> > > >On Sat, Mar 05, 2011 at 06:34:37PM +0300, Andrew Vagin wrote:
> > > >>On 03/05/2011 06:20 PM, Minchan Kim wrote:
> > > >>>On Sat, Mar 05, 2011 at 02:44:16PM +0300, Andrey Vagin wrote:
> > > > >>>>Check zone->all_unreclaimable in all_unreclaimable(); otherwise the
> > > > >>>>kernel may hang, because shrink_zones() will do nothing, but
> > > > >>>>all_unreclaimable() will say that the zone has reclaimable pages.
> > > >>>>
> > > >>>>do_try_to_free_pages()
> > > >>>>	shrink_zones()
> > > >>>>		 for_each_zone
> > > >>>>			if (zone->all_unreclaimable)
> > > >>>>				continue
> > > >>>>	if !all_unreclaimable(zonelist, sc)
> > > >>>>		return 1
> > > >>>>
> > > >>>>__alloc_pages_slowpath()
> > > >>>>retry:
> > > >>>>	did_some_progress = do_try_to_free_pages(page)
> > > >>>>	...
> > > >>>>	if (!page&&   did_some_progress)
> > > >>>>		retry;
> > > >>>>
> > > > >>>>Signed-off-by: Andrey Vagin <avagin@...nvz.org>
> > > >>>>---
> > > >>>>  mm/vmscan.c |    2 ++
> > > >>>>  1 files changed, 2 insertions(+), 0 deletions(-)
> > > >>>>
> > > >>>>diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > >>>>index 6771ea7..1c056f7 100644
> > > >>>>--- a/mm/vmscan.c
> > > >>>>+++ b/mm/vmscan.c
> > > >>>>@@ -2002,6 +2002,8 @@ static bool all_unreclaimable(struct zonelist *zonelist,
> > > >>>>
> > > >>>>  	for_each_zone_zonelist_nodemask(zone, z, zonelist,
> > > >>>>  			gfp_zone(sc->gfp_mask), sc->nodemask) {
> > > >>>>+		if (zone->all_unreclaimable)
> > > >>>>+			continue;
> > > >>>>  		if (!populated_zone(zone))
> > > >>>>  			continue;
> > > >>>>  		if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
> > > >>>zone_reclaimable checks it. Isn't it enough?
> > > > >>I sent one more patch: [PATCH] mm: skip zombie in OOM-killer.
> > > > >>These two patches are enough.
> > > >Sorry if I confused you.
> > > > >I mean zone->all_unreclaimable becomes true if !zone_reclaimable in balance_pgdat.
> > > > >zone_reclaimable compares recent pages_scanned with the number of zone lru pages.
> > > > >So scanning too many pages against a small number of lru pages marks the zone as unreclaimable.
> > > > >
> > > > >In all_unreclaimable, we call zone_reclaimable to detect it.
> > > > >It's the same thing as your patch.
> > > > balance_pgdat sets zone->all_unreclaimable, but the problem is that
> > > > it is cleared late.
> > 
> > Yes. It can be delayed by pcp, so (zone->all_unreclaimable = true) is
> > a false alarm, since the zone has a free page and it can be returned
> > to the free list by drain_all_pages on the next turn.
> > 
> > > 
> > > The problem is that zone->all_unreclaimable = True, but
> > > zone_reclaimable() returns True too.
> > 
> > Why is it a problem?
> > If zone->all_unreclaimable gives a false alarm, we do need to check
> > it again by calling zone_reclaimable.
> > 
> > If we believe a false alarm and give up the reclaim, we may end up doing
> > an unnecessary oom kill.
> > 
> > > 
> > > zone->all_unreclaimable will be cleared in free_*_pages, but this
> > > may be late. It is enough to allocate one page from page cache for
> > > zone_reclaimable() to return True while zone->all_unreclaimable
> > > becomes True.
> > > >>>Does the hang really happen, or did you find it by code review?
> > > >>Yes. You can reproduce it with the help of the attached Python
> > > >>program. It's not very clever :)
> > > >>It performs the following actions in a loop:
> > > >>1. fork
> > > >>2. mmap
> > > >>3. touch memory
> > > >>4. read memory
> > > >>5. munmap
> > > >It seems the test program is a fork bomb plus a memory hog.
> > > >If you apply this patch, is the problem gone?
> > > Yes.
> > 
> > Hmm.. Although it solves the problem, I think it's not a good idea to
> > depend on a false alarm and give up the retry.
> 
> Any alternative proposals?  We should get the livelock fixed if possible..

I agree with Minchan and don't think this is a real fix.
Andrey, I'm now trying your fixes, and it seems your fix for the oom-killer,
'skip-zombie-process', works well enough in my environment.

What is your environment? Number of CPUs? Architecture? Size of memory?



Thanks,
-Kame