Date:	Wed, 7 Apr 2010 11:45:33 +1000
From:	Dave Chinner <david@...morbit.com>
To:	Hans-Peter Jansen <hpj@...la.net>
Cc:	linux-kernel@...r.kernel.org, opensuse-kernel@...nsuse.org,
	xfs@....sgi.com
Subject: Re: 2.6.34-rc3: simple du (on a big xfs tree) triggers oom killer
 [bisected: 57817c68229984818fea9e614d6f95249c3fb098]

On Wed, Apr 07, 2010 at 09:11:44AM +1000, Dave Chinner wrote:
> On Tue, Apr 06, 2010 at 04:52:57PM +0200, Hans-Peter Jansen wrote:
> > Hi Dave,
> > 
> > On Tuesday 06 April 2010, 01:06:00 Dave Chinner wrote:
> > > On Mon, Apr 05, 2010 at 01:35:41PM +0200, Hans-Peter Jansen wrote:
> > > > >
> > > > > Oh, this is a highmem box. You ran out of low memory, I think, which
> > > > > is where all the inodes are cached. Seems like a VM problem or a
> > > > > highmem/lowmem split config problem to me, not anything to do with
> > > > > XFS...
> 
> [snip]
> 
> > Dave, I really don't want to disappoint you, but a lengthy bisection session 
> > points to:
> > 
> > 57817c68229984818fea9e614d6f95249c3fb098 is the first bad commit
> > commit 57817c68229984818fea9e614d6f95249c3fb098
> > Author: Dave Chinner <david@...morbit.com>
> > Date:   Sun Jan 10 23:51:47 2010 +0000
> > 
> >     xfs: reclaim all inodes by background tree walks
> 
> Interesting. I did a fair bit of low memory testing when I made that
> change (admittedly none on a highmem i386 box), and since then I've
> done lots of "millions of files" tree creates, traversals and destroys on
> limited memory machines without triggering problems when memory is
> completely full of inodes.
> 
> Let me try to reproduce this on a small VM and I'll get back to you.

OK, if there is page cache pressure (e.g. creating small files or
grepping the resultant tree) or the machine has significant amounts
of memory (e.g. >= 4GB) then I can't reproduce this.

However, if the memory pressure is purely inode cache (creating zero
length files or read-only traversal), then the OOM killer kicks a
while after the slab cache fills memory.  This doesn't need highmem;
I used an x86_64 kernel on a VM w/ 1GB RAM to reliably reproduce
this.  I'll add zero length file tests and traversals to my low
memory testing.

The best way to fix this, I think, is to trigger a shrinker callback
when memory is low to run the background inode reclaim. The problem
is that these inode caches and the reclaim state are per-filesystem,
not global state, and the current shrinker interface only works with
global state.

Hence there are two patches to this fix - the first adds a context
to the shrinker callout, and the second adds the XFS infrastructure
to track the number of reclaimable inodes per filesystem and
register/unregister shrinkers for each filesystem.

With these patches, my reproducible test case, which locked the
machine up with an OOM panic in a couple of minutes, has been running
for over half an hour. Even with limited testing, I have much more
confidence in this change than in reverting the background inode
reclaim, as the revert would reintroduce the problems that commit
was made to solve.

The patches below apply to the xfs-dev tree, which is currently at
2.6.34-rc1. If they don't apply, let me know and I'll redo them against
a vanilla kernel tree. Can you test them to see if the problem goes
away? If the problem is fixed, I'll push them for a proper review
cycle...

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com

View attachment "shrinker-context" of type "text/plain" (10312 bytes)

View attachment "xfs-inode-reclaim-shrinker" of type "text/plain" (6499 bytes)
