[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1195549767.8601.22.camel@localhost>
Date: Tue, 20 Nov 2007 10:09:27 +0100
From: Ian Kumlien <pomac@...or.com>
To: Nick Piggin <nickpiggin@...oo.com.au>
Cc: Linux-kernel@...r.kernel.org
Subject: Re: [BUG?] OOM with large cache....(x86_64, 2.6.24-rc3-git1, nohz)
On tis, 2007-11-20 at 15:13 +1100, Nick Piggin wrote:
> On Tuesday 20 November 2007 11:59, Ian Kumlien wrote:
> > Hi,
> >
> > I have had this before and sent a mail about it.
> >
> > It seems like the diskcache is still in use and is never shrunk. This
> > happened with a odd load though, trackerd started indexing a bit late
> > and the other workload which is a large bittorrent seed/download.
> >
> > The bittorrent app is the one that drives up the diskcache.
> >
> > I don't think that trackerd was triggering it, i actually upgraded
> > kernel since it kept happening on 2.6.23...
> >
> > I really don't know what other information i can provide.
> >
> > free from now (some hours later)
> > vmstat from now ^
> >
> > and the dmesg log.
> >
> > Ideas? Comments?
> >
> > free:
> > total used free shared buffers cached
> > Mem: 2056484 2039736 16748 0 20776 1585408
> > -/+ buffers/cache: 433552 1622932
> > Swap: 2530180 426020 2104160
> > ---
> >
> > vmstat:
> > procs -----------memory---------- ---swap-- -----io---- -system--
> > ----cpu---- r b swpd free buff cache si so bi bo in
> > cs us sy id wa 0 0 426020 16612 20580 1585848 26 21 684 56 34
> > 51 5 3 88 4 ---
> >
> > --- 8<--- 8<---
> > ntpd invoked oom-killer: gfp_mask=0x1201d2, order=0, oomkilladj=0
> >
> > Call Trace:
> > [<ffffffff80270fd6>] oom_kill_process+0xf6/0x110
> > [<ffffffff80271476>] out_of_memory+0x1b6/0x200
> > [<ffffffff80273a07>] __alloc_pages+0x387/0x3c0
> > [<ffffffff80275b03>] __do_page_cache_readahead+0x103/0x260
> > [<ffffffff802703f1>] filemap_fault+0x2f1/0x420
> > [<ffffffff8027bcbb>] __do_fault+0x6b/0x410
> > [<ffffffff802499de>] recalc_sigpending+0xe/0x40
> > [<ffffffff8027d9dd>] handle_mm_fault+0x1bd/0x7a0
> > [<ffffffff80212cda>] save_i387+0x9a/0xe0
> > [<ffffffff80227e76>] do_page_fault+0x176/0x790
> > [<ffffffff8020bacf>] sys_rt_sigreturn+0x35f/0x400
> > [<ffffffff806acbf9>] error_exit+0x0/0x51
> >
> > Mem-info:
> > DMA per-cpu:
> > CPU 0: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1
> > usd: 0 CPU 1: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0,
> > btch: 1 usd: 0 DMA32 per-cpu:
> > CPU 0: Hot: hi: 186, btch: 31 usd: 148 Cold: hi: 62, btch: 15
> > usd: 60 CPU 1: Hot: hi: 186, btch: 31 usd: 116 Cold: hi: 62,
> > btch: 15 usd: 18 Active:241172 inactive:241825 dirty:0 writeback:0
> > unstable:0
> > free:3388 slab:8095 mapped:149 pagetables:6263 bounce:0
> > DMA free:7908kB min:20kB low:24kB high:28kB active:0kB inactive:0kB
> > present:7436kB pages_scanned:0 all_unreclaimable? yes lowmem_reserve[]: 0
> > 2003 2003 2003
> > DMA32 free:5644kB min:5716kB low:7144kB high:8572kB active:964688kB
> > inactive:967188kB present:2052008kB pages_scanned:5519125
> > all_unreclaimable? yes lowmem_reserve[]: 0 0 0 0
> > DMA: 5*4kB 4*8kB 3*16kB 4*32kB 6*64kB 5*128kB 4*256kB 3*512kB 0*1024kB
> > 0*2048kB 1*4096kB = 7908kB DMA32: 95*4kB 2*8kB 0*16kB 0*32kB 0*64kB 1*128kB
> > 0*256kB 2*512kB 0*1024kB 0*2048kB 1*4096kB = 5644kB Swap cache: add
> > 1979600, delete 1979592, find 144656/307405, race 1+17 Free swap = 0kB
> > Total swap = 2530180kB
> > Free swap: 0kB
> > 524208 pages of RAM
> > 10149 reserved pages
> > 5059 pages shared
> > 8 pages swap cached
> > Out of memory: kill process 8421 (trackerd) score 1016524 or a child
> > Killed process 8421 (trackerd)
>
> It's also used up all your 2.5GB of swap. The output of your `free` shows
> a fair bit of disk cache there, but it also shows a lot of swap free, which
> isn't the case at oom-time.
Yes, as i said those was from several hours later...
> Unfortunately, we don't show NR_ANON_PAGES in these stats, but at a guess,
> I'd say that the file cache is mostly shrunk and you still don't have
> enough memory. trackerd probably has a memory leak in it, or else is just
> trying to allocate more memory than you have. Is this a regression?
I have had it happen twice before, without tracker running...
It didn't quite get to the oom stage, it just failed alot of allocations
while having 1.5 or more memory locked in cache.
http://marc.info/?l=linux-kernel&m=118895576924867&w=2
I saw that again and thought "Lets update and see what happens" and then
this happened. I haven't had the same version of trackerd go haywire on
me before, i haven't had a oom kill gnome-session or metacity before.
--
Ian Kumlien <pomac () vapor ! com> -- http://pomac.netswarm.net
Download attachment "signature.asc" of type "application/pgp-signature" (199 bytes)
Powered by blists - more mailing lists