linux-kernel - Re: Found the commit that causes the OOMs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 29 Jun 2009 17:07:25 +0100
From:	Mel Gorman <mel@....ul.ie>
To:	Minchan Kim <minchan.kim@...il.com>
Cc:	David Howells <dhowells@...hat.com>,
	Wu Fengguang <fengguang.wu@...el.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Johannes Weiner <hannes@...xchg.org>,
	"riel@...hat.com" <riel@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Christoph Lameter <cl@...ux-foundation.org>,
	"peterz@...radead.org" <peterz@...radead.org>,
	"tytso@....edu" <tytso@....edu>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"elladan@...imo.com" <elladan@...imo.com>,
	"npiggin@...e.de" <npiggin@...e.de>,
	"Barnes, Jesse" <jesse.barnes@...el.com>
Subject: Re: Found the commit that causes the OOMs

On Tue, Jun 30, 2009 at 12:00:26AM +0900, Minchan Kim wrote:
> On Mon, Jun 29, 2009 at 11:21 PM, David Howells<dhowells@...hat.com> wrote:
> > Wu Fengguang <fengguang.wu@...el.com> wrote:
> >
> >> Sorry! This one compiles OK:
> >
> > Sadly that doesn't seem to work either:
> >
> > msgctl11 invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
> > msgctl11 cpuset=/ mems_allowed=0
> > Pid: 30858, comm: msgctl11 Not tainted 2.6.31-rc1-cachefs #146
> > Call Trace:
> >  [<ffffffff8107207e>] ? oom_kill_process.clone.0+0xa9/0x245
> >  [<ffffffff81072345>] ? __out_of_memory+0x12b/0x142
> >  [<ffffffff810723c6>] ? out_of_memory+0x6a/0x94
> >  [<ffffffff81074a90>] ? __alloc_pages_nodemask+0x42e/0x51d
> >  [<ffffffff81080843>] ? do_wp_page+0x2c6/0x5f5
> >  [<ffffffff810820c1>] ? handle_mm_fault+0x5dd/0x62f
> >  [<ffffffff81022c32>] ? do_page_fault+0x1f8/0x20d
> >  [<ffffffff812e069f>] ? page_fault+0x1f/0x30
> > Mem-Info:
> > DMA per-cpu:
> > CPU    0: hi:    0, btch:   1 usd:   0
> > CPU    1: hi:    0, btch:   1 usd:   0
> > DMA32 per-cpu:
> > CPU    0: hi:  186, btch:  31 usd:  38
> > CPU    1: hi:  186, btch:  31 usd: 106
> > Active_anon:75040 active_file:0 inactive_anon:2031
> >  inactive_file:0 unevictable:0 dirty:0 writeback:0 unstable:0
> >  free:1951 slab:41499 mapped:301 pagetables:60674 bounce:0
> > DMA free:3932kB min:60kB low:72kB high:88kB active_anon:2868kB inactive_anon:384kB active_file:0kB inactive_file:0kB unevictable:0kB present:15364kB pages_scanned:0 all_unreclaimable? no
> > lowmem_reserve[]: 0 968 968 968
> > DMA32 free:3872kB min:3948kB low:4932kB high:5920kB active_anon:297292kB inactive_anon:7740kB active_file:0kB inactive_file:0kB unevictable:0kB present:992032kB pages_scanned:0 all_unreclaimable? no
> > lowmem_reserve[]: 0 0 0 0
> > DMA: 7*4kB 0*8kB 0*16kB 0*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3932kB
> > DMA32: 500*4kB 2*8kB 0*16kB 0*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 3872kB
> > 1928 total pagecache pages
> > 0 pages in swap cache
> > Swap cache stats: add 0, delete 0, find 0/0
> > Free swap  = 0kB
> > Total swap = 0kB
> > 255744 pages RAM
> > 5589 pages reserved
> > 238251 pages shared
> > 216210 pages non-shared
> > Out of memory: kill process 25221 (msgctl11) score 130560 or a child
> > Killed process 26379 (msgctl11)
> 
> Totally, I can't understand this situation.
> Now, this page allocation is order zero and It is just likely GFP_HIGHUSER.
> So it's unlikely interrupt context.

The GFP flags that are set are

#define __GFP_HIGHMEM	(0x02)
#define __GFP_MOVABLE	(0x08)  /* Page is movable */
#define __GFP_WAIT	(0x10)	/* Can wait and reschedule? */
#define __GFP_IO	(0x40)	/* Can start physical IO? */
#define __GFP_FS	(0x80)	/* Can call down to low-level FS? */
#define __GFP_HARDWALL   (0x20000) /* Enforce hardwall cpuset memory allocs */

which are fairly permissive in terms of what action can be taken.

> Buddy already has enough fallback DMA32, I think.

It doesn't really. We are below the minimum watermark. It wouldn't be
able to grant the allocation until a few pages had been freed.

> Why kernel can't allocate page for order 0 ?
> Is it allocator bug ?
> 

If it is, it is not because the allocation failed as the watermarks were not
being met. For this situation to be occuring, it has to be scanning the LRU
lists and making no forward progress. Odd things to note;

o active_anon is very large in comparison to inactive_anon. Is this
  because there is no swap and they are no longer being rotated?
o Slab and pagetables are very large. Is slab genuinely unshrinkable?

I think this system might be genuinely OOM. It can't reclaim memory and
we are below the minimum watermarks.

Is it possible there are pages that are counted as active_anon that in
fact are reclaimable because they are on the wrong LRU list? If that was
the case, the lack of rotation to inactive list would prevent them
getting discovered.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/