linux-kernel - Re: 2.6.39-rc4+: Kernel leaking memory during FS scanning, regression?

Message-ID: <20110425231016.34b4293e@neptune.home>
Date:	Mon, 25 Apr 2011 23:10:16 +0200
From:	Bruno Prémont <bonbons@...ux-vserver.org>
To:	paulmck@...ux.vnet.ibm.com
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Mike Frysinger <vapier.adi@...il.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	linux-fsdevel@...r.kernel.org,
	"Paul E. McKenney" <paul.mckenney@...aro.org>,
	Pekka Enberg <penberg@...nel.org>
Subject: Re: 2.6.39-rc4+: Kernel leaking memory during FS scanning,
 regression?

On Mon, 25 April 2011 "Paul E. McKenney" wrote:
> On Mon, Apr 25, 2011 at 08:36:06PM +0200, Bruno Prémont wrote:
> > On Mon, 25 April 2011 Linus Torvalds wrote:
> > > On Mon, Apr 25, 2011 at 10:00 AM, Bruno Prémont wrote:
> > > >
> > > > I hope tiny-rcu is not that broken... as it would mean driving any
> > > > PREEMPT_NONE or PREEMPT_VOLUNTARY system out of memory when compiling
> > > > packages (and probably also just unpacking larger tarballs or running
> > > > things like du).
> > > 
> > > I'm sure that TINYRCU can be fixed if it really is the problem.
> > > 
> > > So I just want to make sure that we know what the root cause of your
> > > problem is. It's quite possible that it _is_ a real leak of filp or
> > > something, but before possibly wasting time trying to figure that out,
> > > let's see if your config is to blame.
> > 
> > With changed config (PREEMPT=y, TREE_PREEMPT_RCU=y) I haven't reproduced
> > yet.
> > 
> > When I was reproducing with TINYRCU things went normally for some time
> > until suddenly slabs stopped being freed.
> 
> Hmmm... If the system is responsive during this time, could you please
> do the following after the slabs stop being freed?
> 
> ps -eo pid,class,sched,rtprio,stat,state,sgi_p,cpu_time,cmd | grep '\[rcu'

Looks like tinyrcu is not innocent (or at least it makes the bug appear much
more easily).

With PREEMPT + TREE_PREEMPT_RCU the system was stable compiling for over
2 hours; after switching to TINY_RCU, the filp count started increasing
pretty early after the compile began.

All the relevant information attached (PREEMPT+TINY_RCU):
  config.gz
  ps auxf     |
  slabinfo    |  twice, once early (1-*), the second 30 minutes later (2-*)
  meminfo     |

ls -l proc/*/fd produces 658 lines for the 1-* series of numbers, 300 for 2-*.

In both cases 
   ps -eo pid,class,sched,rtprio,stat,state,sgi_p,cputime,cmd | grep '\[rcu'
returns the same information:
      6 FF    1      1 R    R 0 00:00:00 [rcu_kthread]


According to slabtop, the filp count is increasing permanently (about +1000
every 3 seconds), probably because of top (1s refresh rate) and collectd (10s
rate) scanning /proc (without top, it increases by about 300 every 10s).
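
For anyone wanting to follow the filp count without slabtop, here is a minimal
sketch that reads the count straight from /proc/slabinfo. The function names
slab_active and watch_slab are made up for illustration, and it assumes the
usual slabinfo layout where the cache name is the first column and the active
object count is the second. Reading /proc/slabinfo usually requires root.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any existing tool.

# Print the active-object count of one slab cache.
# $1: cache name (e.g. filp); $2: slabinfo file (default /proc/slabinfo)
slab_active() {
    awk -v name="$1" '$1 == name { print $2; exit }' "${2:-/proc/slabinfo}"
}

# Sample one cache several times and print the change between samples.
# $1: cache name; $2: interval in seconds; $3: number of samples
# $4: slabinfo file (default /proc/slabinfo)
watch_slab() {
    prev=""
    i=0
    while [ "$i" -lt "$3" ]; do
        cur=$(slab_active "$1" "${4:-/proc/slabinfo}")
        if [ -n "$prev" ]; then
            echo "$1: $cur (delta $((cur - prev)))"
        else
            echo "$1: $cur"
        fi
        prev=$cur
        i=$((i + 1))
        if [ "$i" -lt "$3" ]; then sleep "$2"; fi
    done
}
```

Something like `watch_slab filp 3 10` would then print ten samples three
seconds apart, which should make a steady increase easy to spot.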

Running something like `for ((X=0; X < 200; X++)); do /bin/true; done` causes
the pid, task_struct and signal_cache slab counts to increase by about 200,
but no zombies are left behind.

1-*  Taken a few minutes after starting the compile process, but after having
     SIGSTOPed the compiling process tree
2-*  Taken about 30 minutes later, after killing the compile process tree,
     running the above for loop multiple times and closing most terminal
     sessions (including top)

Between 1-slabinfo and 2-slabinfo some values increased (a lot) while a few
decreased. I don't know which ones are RCU-affected and which are not.
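
To see which caches moved between two snapshots, a rough sketch along these
lines may help. slab_diff is a hypothetical helper name, and it again assumes
the cache name in the first column and the active object count in the second.
It only reports caches present in both files and does not say anything about
which caches are freed through RCU.

```shell
#!/bin/sh
# Hypothetical helper: list caches whose active-object count changed
# between two slabinfo snapshots, largest increase first.
# $1: earlier snapshot; $2: later snapshot
slab_diff() {
    awk '
        /^#/ || /^slabinfo/ { next }               # skip the header lines
        NR == FNR { before[$1] = $2; next }        # first file: remember counts
        ($1 in before) && ($2 - before[$1]) != 0 { # second file: report changes
            printf "%d %s %d -> %d\n", $2 - before[$1], $1, before[$1], $2
        }
    ' "$1" "$2" | sort -k1,1nr
}
```

With the attachments saved locally, `slab_diff 1-slabinfo 2-slabinfo` would
print one line per changed cache: the delta, the cache name, then the old and
new counts.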

Bruno

Download attachment "config.gz" of type "application/x-gzip" (15707 bytes)

View attachment "1-meminfo" of type "text/plain" (1008 bytes)

View attachment "1-ps_auxf" of type "text/plain" (23343 bytes)

View attachment "1-slabinfo" of type "text/plain" (15853 bytes)

View attachment "2-meminfo" of type "text/plain" (1008 bytes)

View attachment "2-ps_auxf" of type "text/plain" (4728 bytes)

View attachment "2-slabinfo" of type "text/plain" (15854 bytes)