linux-kernel - Re: abnormal OOM killer message

Message-ID: <20090819103611.GG24809@csn.ul.ie>
Date:	Wed, 19 Aug 2009 11:36:11 +0100
From:	Mel Gorman <mel@....ul.ie>
To:	Minchan Kim <minchan.kim@...il.com>
Cc:	????????? <chungki.woo@...il.com>, ngupta@...are.org,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	fengguang.wu@...el.com, riel@...hat.com, akpm@...ux-foundation.org,
	kosaki.motohiro@...fujitsu.com
Subject: Re: abnormal OOM killer message

On Wed, Aug 19, 2009 at 03:49:58PM +0900, Minchan Kim wrote:
> On Wed, 19 Aug 2009 15:24:54 +0900
> ????????? <chungki.woo@...il.com> wrote:
> 
> > Thank you very much for the replies.
> > 
> > But I don't think this is related to the stale data problem in compcache.
> > My question was why the last-chance memory allocation failed.
> > When the OOM killer was executed, the memory state did not warrant
> > running the OOM killer.
> > In particular, there are many free pages of order 0, and the allocation order is zero.
> > I think that the last allocation attempt should have succeeded.
> > That's my concern.
> 
> Yes, I agree with you.
> Mel, could you comment on this situation?
> Is it possible for an order-0 allocation to fail
> even when there are many free pages in the buddy allocator?
> 

Not ordinarily. If it happens, I tend to suspect that the free list data
is corrupted and would put a check in __rmqueue() that looked like

	BUG_ON(list_empty(&area->free_list) && area->nr_free);

The second question is, why are we in direct reclaim this far above the
watermark? It should only be kswapd that is doing any reclaim at that
point. That makes me wonder again whether the free lists are corrupted.

The other possibility is that the zonelist used for allocation in the
troubled path contains no populated zones. I would put a BUG_ON check in
get_page_from_freelist() to check if the first zone in the zonelist has no
pages. If that bug triggers, it might explain why OOMs are triggering for
no good reason.

I consider both of those possibilities abnormal though.

> > 
> > -----------------------------------------------------------------------------------------------------------------------------------------------
> >       /* This is the last chance; it uses ALLOC_WMARK_HIGH. */
> >       page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, order,
> >                            zonelist, ALLOC_WMARK_HIGH|ALLOC_CPUSET);
> >       if (page)
> >               goto got_pg;
> > 
> >       out_of_memory(zonelist, gfp_mask, order);
> >       goto restart;
> > -----------------------------------------------------------------------------------------------------------------------------------------------
> > 
> > > Let me ask a question.
> > > The system now shows 79M of total swap.
> > > That is bigger than the system memory size.
> > > Is that possible with compcache?
> > > Can we trust that number?
> > 
> > Yes, it's possible. 79 MB is the amount of data that can be swapped.
> > It is not the compressed size; it is the original, uncompressed size.
> 
> Do you mean that 79M worth of your pages are swapped out to compcache's
> reserved memory?
> 

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab