Message-ID: <20110425231016.34b4293e@neptune.home>
Date: Mon, 25 Apr 2011 23:10:16 +0200
From: Bruno Prémont <bonbons@...ux-vserver.org>
To: paulmck@...ux.vnet.ibm.com
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Mike Frysinger <vapier.adi@...il.com>,
KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
linux-fsdevel@...r.kernel.org,
"Paul E. McKenney" <paul.mckenney@...aro.org>,
Pekka Enberg <penberg@...nel.org>
Subject: Re: 2.6.39-rc4+: Kernel leaking memory during FS scanning,
regression?
On Mon, 25 April 2011 "Paul E. McKenney" wrote:
> On Mon, Apr 25, 2011 at 08:36:06PM +0200, Bruno Prémont wrote:
> > On Mon, 25 April 2011 Linus Torvalds wrote:
> > > On Mon, Apr 25, 2011 at 10:00 AM, Bruno Prémont wrote:
> > > >
> > > > I hope tiny-rcu is not that broken... as it would mean driving any
> > > > PREEMPT_NONE or PREEMPT_VOLUNTARY system out of memory when compiling
> > > > packages (and probably also just unpacking larger tarballs or running
> > > > things like du).
> > >
> > > I'm sure that TINYRCU can be fixed if it really is the problem.
> > >
> > > So I just want to make sure that we know what the root cause of your
> > > problem is. It's quite possible that it _is_ a real leak of filp or
> > > something, but before possibly wasting time trying to figure that out,
> > > let's see if your config is to blame.
> >
> > With changed config (PREEMPT=y, TREE_PREEMPT_RCU=y) I haven't reproduced
> > yet.
> >
> > When I was reproducing with TINYRCU things went normally for some time
> > until suddenly slabs stopped being freed.
>
> Hmmm... If the system is responsive during this time, could you please
> do the following after the slabs stop being freed?
>
> ps -eo pid,class,sched,rtprio,stat,state,sgi_p,cpu_time,cmd | grep '\[rcu'
Looks like TINY_RCU is not innocent (or at least it makes the bug appear much
more easily).
With PREEMPT + TREE_PREEMPT_RCU the system was stable while compiling for over
2 hours; after switching to TINY_RCU, the filp count started increasing pretty
early after compiling began.
All the relevant information is attached (PREEMPT+TINY_RCU):
  config.gz
  ps auxf, slabinfo, meminfo: each taken twice, once early (1-*) and
    a second time 30 minutes later (2-*)
`ls -l /proc/*/fd` produces 658 lines for the 1-* snapshot and 300 for 2-*.
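The `ls -l` line count is only a rough proxy for the number of open descriptors, since it also includes the per-directory header, "total" and blank lines. A more direct cross-check against the filp slab count is to count the entries under /proc/<pid>/fd. Below is a minimal Python sketch, assuming the standard Linux /proc layout; `count_open_fds` is a hypothetical helper, not an existing tool. If the filp count keeps climbing while this total stays flat or falls (as with 658 vs. 300 lines here), the growth is probably not coming from descriptors that processes still hold open.

```python
import os


def count_open_fds(proc_root="/proc"):
    """Count open file descriptors across all processes under proc_root.

    Hypothetical helper. Returns (total_fds, processes_scanned). Only
    numeric entries (PIDs) are considered; processes that exit mid-scan or
    whose fd directory cannot be read are skipped (run as root to see
    every process).
    """
    total = 0
    scanned = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():  # skip "self", "meminfo", "slabinfo", ...
            continue
        try:
            fds = os.listdir(os.path.join(proc_root, entry, "fd"))
        except OSError:  # FileNotFoundError, PermissionError, ...
            continue
        total += len(fds)
        scanned += 1
    return total, scanned
```

Run it as root, e.g. `print(count_open_fds())`, so that other users' fd directories are readable.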
In both cases
  ps -eo pid,class,sched,rtprio,stat,state,sgi_p,cputime,cmd | grep '\[rcu'
returns the same information:
  6 FF 1 1 R R 0 00:00:00 [rcu_kthread]
According to slabtop, the filp count is increasing continuously (about +1000
every 3 seconds), probably because top (1s refresh rate) and collectd (10s
rate) scan /proc; without top it increases by about 300 every 10s.
Running something like `for ((X=0; X < 200; X++)); do /bin/true; done` causes
the pid, task_struct and signal_cache slab counts to increase by about 200,
but no zombies are left behind.
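A zombie shows up as state "Z" in /proc/<pid>/stat (the same letter ps shows in its S and STAT columns). Since the command name in that file is wrapped in parentheses and may itself contain spaces or ')', the state field is safest to locate after the last ')'. Below is a minimal Python sketch, assuming the Linux /proc/<pid>/stat format; `stat_state` and `find_zombies` are hypothetical helper names.

```python
import os


def stat_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line.

    Format: "pid (comm) state ppid ...". comm may contain spaces or
    parentheses, so split after the *last* ')' rather than on whitespace.
    """
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def find_zombies(proc_root="/proc"):
    """Return the sorted PIDs of all processes currently in state 'Z'."""
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():  # only PID directories have a stat file
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                line = f.read()
        except OSError:  # process exited between listdir() and open()
            continue
        if stat_state(line) == "Z":
            zombies.append(int(entry))
    return sorted(zombies)
```

An empty list from `find_zombies()` right after the loop confirms that the growth in those slabs is not explained by unreaped children.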
1-*: taken a few minutes after starting the compile, but after having
     SIGSTOPped the compiling process tree
2-*: taken about 30 minutes later, after killing the compile process tree,
     running the above for loop multiple times and closing most terminal
     sessions (including top)
Between 1-slabinfo and 2-slabinfo some values increased (a lot) while a few
decreased. I don't know which ones are RCU-affected and which ones are not.
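A per-cache comparison of the two snapshots makes it easier to see which caches grew and which shrank. /proc/slabinfo (format version 2.x) starts with a version line and a "#" column header, followed by one row per cache beginning with name, active_objs and num_objs. Below is a minimal Python sketch that parses the active_objs column of each snapshot and lists the changes, largest first; the function names and the slabdiff.py script name are hypothetical.

```python
import sys


def parse_slabinfo(text):
    """Map cache name -> active_objs from /proc/slabinfo text (format 2.x)."""
    counts = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue  # version line, column header, blank lines
        fields = line.split()
        counts[fields[0]] = int(fields[1])  # name, active_objs
    return counts


def diff_slabinfo(before_text, after_text):
    """Return [(cache, delta)] for changed caches, largest |delta| first.

    A cache present in only one snapshot counts as 0 in the other.
    """
    before = parse_slabinfo(before_text)
    after = parse_slabinfo(after_text)
    deltas = []
    for name in before.keys() | after.keys():
        delta = after.get(name, 0) - before.get(name, 0)
        if delta:
            deltas.append((name, delta))
    deltas.sort(key=lambda item: (-abs(item[1]), item[0]))
    return deltas


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 slabdiff.py 1-slabinfo 2-slabinfo
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        for name, delta in diff_slabinfo(f1.read(), f2.read()):
            print(f"{name:24} {delta:+d}")
```

Dividing a delta by the time between the snapshots gives a per-second rate comparable to the slabtop-based figures above.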
Bruno
Attachments:
  config.gz (application/x-gzip, 15707 bytes)
  1-meminfo (text/plain, 1008 bytes)
  1-ps_auxf (text/plain, 23343 bytes)
  1-slabinfo (text/plain, 15853 bytes)
  2-meminfo (text/plain, 1008 bytes)
  2-ps_auxf (text/plain, 4728 bytes)
  2-slabinfo (text/plain, 15854 bytes)