linux-kernel - Re: 2.6.39-rc4+: Kernel leaking memory during FS scanning, regression?


Message-ID: <alpine.LFD.2.02.1104272351290.3323@ionos>
Date:	Thu, 28 Apr 2011 00:06:11 +0200 (CEST)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	Bruno Prémont <bonbons@...ux-vserver.org>
cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Ingo Molnar <mingo@...e.hu>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	paulmck@...ux.vnet.ibm.com, Mike Frysinger <vapier.adi@...il.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	linux-fsdevel@...r.kernel.org,
	"Paul E. McKenney" <paul.mckenney@...aro.org>,
	Pekka Enberg <penberg@...nel.org>
Subject: Re: 2.6.39-rc4+: Kernel leaking memory during FS scanning,
 regression?

On Wed, 27 Apr 2011, Bruno Prémont wrote:
> On Wed, 27 April 2011 Bruno Prémont wrote:
> Voluntary context switches stay constant from the time SLABs start piling up
> (which makes sense as it doesn't get CPU slices anymore).
> 
> > > Can you please enable CONFIG_SCHED_DEBUG and provide the output of
> > > /proc/sched_stat when the problem surfaces and a minute after the
> > > first snapshot?
> 
> hm, did you mean CONFIG_SCHEDSTAT or /proc/sched_debug?
> 
> I did use CONFIG_SCHED_DEBUG (and there is no /proc/sched_stat) so I took
> /proc/sched_debug which exists... (attached, taken about 7min and +1min
> after SLABs started piling up), though build processes were SIGSTOPped
> during first minute.

Oops. /proc/sched_debug is the right thing.
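
For anyone else reproducing this, the two snapshots asked for above could be taken with something like the following. It is only a minimal sketch: the helper name take_snapshots, the output file names and the 60 second default are made up for illustration, and it assumes /proc/sched_debug is readable (CONFIG_SCHED_DEBUG enabled).

```python
import time
from pathlib import Path


def take_snapshots(source="/proc/sched_debug", out_dir=".", count=2, delay_s=60.0):
    """Copy `source` into `out_dir` `count` times, `delay_s` seconds apart.

    Hypothetical helper: the file naming scheme is an assumption, not
    something any kernel tool mandates.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i in range(count):
        if i:
            # Wait between snapshots, but not before the first one.
            time.sleep(delay_s)
        # /proc files are generated on read, so read the whole thing at once.
        data = Path(source).read_text()
        target = out_dir / f"sched_debug.{i}.txt"
        target.write_text(data)
        written.append(target)
    return written
```

Calling take_snapshots() with the defaults right when the problem surfaces leaves sched_debug.0.txt and sched_debug.1.txt in the current directory, a minute apart.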
 
> printk wrote (in case its timestamp is useful, more below):
> [  518.480103] sched: RT throttling activated

Ok. Aside from the fact that the CPU time accounting is completely hosed,
this is pointing to the root cause of the problem.

kthread_rcu seems to run in circles for whatever reason and the RT
throttler catches it. After that things go down the drain completely,
although it should get on the CPU again after that 50ms throttling break.

We should not ignore the fact that the RT throttler hit although
none of the RT tasks actually accumulated runtime.

So there are a couple of questions:

   - Why does the scheduler detect the 950 ms RT runtime, but does
     not accumulate that runtime to any thread?

   - Why is the runtime accounting totally hosed?

   - Why does that not happen (at least not reproducibly) with
     TREE_RCU?
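
For reference, the 950 ms and 50 ms figures follow from the RT throttling sysctls. The sketch below assumes the stock defaults (sched_rt_period_us = 1000000, sched_rt_runtime_us = 950000); whether the affected box uses those values is an assumption, and the helper names are made up for illustration.

```python
from pathlib import Path


def read_sysctl_us(name, default):
    """Read /proc/sys/kernel/<name> as an integer, or fall back to `default`."""
    try:
        return int(Path("/proc/sys/kernel", name).read_text())
    except (OSError, ValueError):
        # Not on Linux, knob missing, or unreadable: use the assumed default.
        return default


def rt_throttle_window_ms(period_us=1_000_000, runtime_us=950_000):
    """Milliseconds per period during which RT tasks are kept off the CPU."""
    if runtime_us < 0:
        # A runtime of -1 disables RT throttling altogether.
        return 0.0
    return max(period_us - runtime_us, 0) / 1000.0


def current_throttle_window_ms():
    """Throttle window for the running system, using the defaults as fallback."""
    period = read_sysctl_us("sched_rt_period_us", 1_000_000)
    runtime = read_sysctl_us("sched_rt_runtime_us", 950_000)
    return rt_throttle_window_ms(period, runtime)
```

With the defaults, RT tasks may consume 950 ms out of every 1000 ms period, which leaves the 50 ms break mentioned above.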

I need some sleep now, but I will try to come up with sensible
debugging tomorrow unless Paul or someone else beats me to it.

Thanks,

	tglx