Message-ID: <alpine.LFD.2.02.1104280028250.3323@ionos>
Date: Thu, 28 Apr 2011 00:32:50 +0200 (CEST)
From: Thomas Gleixner <tglx@...utronix.de>
To: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
cc: Bruno Prémont <bonbons@...ux-vserver.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Ingo Molnar <mingo@...e.hu>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Mike Frysinger <vapier.adi@...il.com>,
KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
linux-fsdevel@...r.kernel.org,
"Paul E. McKenney" <paul.mckenney@...aro.org>,
Pekka Enberg <penberg@...nel.org>
Subject: Re: 2.6.39-rc4+: Kernel leaking memory during FS scanning,
regression?
On Wed, 27 Apr 2011, Paul E. McKenney wrote:
> On Thu, Apr 28, 2011 at 12:06:11AM +0200, Thomas Gleixner wrote:
> > On Wed, 27 Apr 2011, Bruno Prémont wrote:
> > > On Wed, 27 April 2011 Bruno Prémont wrote:
> > > Voluntary context switches stay constant from the time the SLABs
> > > start piling up (which makes sense, as it doesn't get CPU slices
> > > anymore).
> > >
> > > > > Can you please enable CONFIG_SCHED_DEBUG and provide the output of
> > > > > /proc/sched_stat when the problem surfaces and a minute after the
> > > > > first snapshot?
> > >
> > > hm, did you mean CONFIG_SCHEDSTATS or /proc/sched_debug?
> > >
> > > I did use CONFIG_SCHED_DEBUG (and there is no /proc/sched_stat), so I
> > > took /proc/sched_debug, which does exist... (attached, taken about 7min
> > > after the SLABs started piling up and again +1min later), though the
> > > build processes were SIGSTOPped during the first minute.
> >
> > Oops. /proc/sched_debug is the right thing.
> >
> > > printk wrote (in case its timestamp is useful, more below):
> > > [ 518.480103] sched: RT throttling activated
> >
> > Ok. Aside from the fact that the CPU time accounting is completely
> > hosed, this is pointing to the root cause of the problem.
> >
> > kthread_rcu seems to run in circles for whatever reason and the RT
> > throttler catches it. After that things go down the drain completely,
> > even though it should get back on the CPU after that 50ms throttling
> > break.
>
> Ah. This could happen if there was a huge number of callbacks, in
> which case blimit would be set very large and kthread_rcu could then
> go CPU-bound. And this workload was generating large numbers of
> callbacks due to filesystem operations, right?
>
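Just to illustrate the batching behaviour you describe, here is a
user-space sketch of a batch-limited callback worker. It is purely
illustrative, not the actual RCU code; only the name blimit comes from
this discussion, the 10000 threshold and everything else are made up.
Once the limit is effectively lifted for a huge backlog, the worker
never yields between passes and stays CPU-bound:

#include <stdio.h>
#include <stdlib.h>

struct callback {
	struct callback *next;
	void (*func)(struct callback *);
};

static long blimit = 10;	/* normal per-pass batch limit */

static void free_cb(struct callback *cb)
{
	free(cb);
}

/* Process at most blimit callbacks, return the remaining backlog. */
static struct callback *process_callbacks(struct callback *head)
{
	long n = 0;

	while (head && n < blimit) {
		struct callback *next = head->next;

		head->func(head);
		head = next;
		n++;
	}
	return head;
}

int main(void)
{
	struct callback *head = NULL;
	long backlog = 100000, passes = 0;

	for (long i = 0; i < backlog; i++) {
		struct callback *cb = malloc(sizeof(*cb));

		cb->func = free_cb;
		cb->next = head;
		head = cb;
	}

	/* A huge backlog effectively disables the batch limit, so the
	 * worker loops until the list is empty -- the CPU-bound case. */
	if (backlog > 10000)
		blimit = backlog;

	while (head) {
		head = process_callbacks(head);
		passes++;
	}

	printf("drained %ld callbacks in %ld pass(es)\n", backlog, passes);
	return 0;
}

Under SCHED_FIFO such a loop will happily eat the whole RT budget in
one go.
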
> So, perhaps I should kick kthread_rcu back to SCHED_NORMAL if blimit
> has been set high. Or have some throttling of my own. I must confess
> that throttling kthread_rcu for two hours seems a bit harsh. ;-)
That's not the intended thing. See below.
> If this was just throttling kthread_rcu for a few hundred milliseconds,
> or even for a second or two, things would be just fine.
>
> Left to myself, I will put together a patch that puts callback processing
> down to SCHED_NORMAL in the case where there are huge numbers of
> callbacks to be processed.
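For reference, a rough user-space analogue of that demotion would look
like the sketch below. This is only a hedged illustration; an in-kernel
patch would use the kernel's own sched_setscheduler() on the kthread,
and the 10000-callback threshold here is invented:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Demote the calling thread from an RT policy back to SCHED_OTHER. */
static int demote_to_normal(void)
{
	struct sched_param sp = { .sched_priority = 0 };

	/* pid 0 == calling thread; SCHED_OTHER requires priority 0 */
	if (sched_setscheduler(0, SCHED_OTHER, &sp) < 0) {
		perror("sched_setscheduler");
		return -1;
	}
	return 0;
}

int main(void)
{
	long backlog = 200000;	/* pretend callback count */

	if (backlog > 10000)	/* "huge numbers of callbacks" */
		demote_to_normal();

	printf("policy now: %d\n", sched_getscheduler(0));
	return 0;
}
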
Well, that's possibly going to paper over the problem at hand. I really
don't see why that thing would run for more than 950ms in a row even if
there is a large number of callbacks pending.
And then I don't have an explanation for the hosed CPU accounting, or
for why that thing does not get another 950ms of RT time when the 50ms
throttling break is over.
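For reference, the numbers above come from the default RT bandwidth
settings: sched_rt_runtime_us = 950000 out of sched_rt_period_us =
1000000, i.e. RT tasks may burn at most 950ms of every 1s period and
the remaining 50ms is the throttling break. A trivial user-space
sketch to dump the current values (standard sysctl paths, nothing
kernel-side here):

#include <stdio.h>

static long read_long(const char *path)
{
	FILE *f = fopen(path, "r");
	long val = -1;

	if (f) {
		if (fscanf(f, "%ld", &val) != 1)
			val = -1;
		fclose(f);
	}
	return val;
}

int main(void)
{
	long runtime = read_long("/proc/sys/kernel/sched_rt_runtime_us");
	long period  = read_long("/proc/sys/kernel/sched_rt_period_us");

	printf("rt_runtime = %ld us, rt_period = %ld us\n", runtime, period);
	if (runtime > 0 && period > 0)
		printf("RT tasks may use %.1f%% of each period\n",
		       100.0 * runtime / period);
	return 0;
}
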
Thanks,
tglx