Date:	Thu, 30 Jul 2009 14:30:42 -0700 (PDT)
From:	David Rientjes <rientjes@...gle.com>
To:	Stephan von Krawczynski <skraw@...net.com>
cc:	"Rafael J. Wysocki" <rjw@...k.pl>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Kernel Testers List <kernel-testers@...r.kernel.org>,
	Justin Piszcz <jpiszcz@...idpixels.com>,
	Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
	Jesse Brandeburg <jesse.brandeburg@...el.com>,
	"David S. Miller" <davem@...emloft.com>
Subject: Re: [Bug #13648] nfsd: page allocation failure

On Mon, 27 Jul 2009, Stephan von Krawczynski wrote:

> This is no regression between 2.6.29 and 2.6.30.
> In fact we could reproduce the problem with kernel versions:
> 
> 2.6.27.26 < X <= 2.6.30.3
> 
> (Meaning 2.6.27.26 is the last one _not_ showing the problem).
> 

And 2.6.28.10 is showing the exact same problem as initially reported, 
right?

I noticed your /var/log/messages shows you're using SLUB as opposed to 
SLAB (which Justin was using, and which was hitting order-0 allocation 
errors).  SLUB uses order-1 allocations for this cache growth, and they're 
failing because of memory fragmentation, not because you're truly out of 
memory.
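
One way to tell fragmentation apart from a genuine out-of-memory condition is to look at the per-order free lists in /proc/buddyinfo. The following is a minimal sketch, not part of the original report: the helper names are hypothetical, and it assumes the usual "Node N, zone NAME count count ..." line layout, where the first count is order 0.

```python
def parse_buddyinfo(text):
    """Return {(node, zone): [free block count per order, order 0 first]}."""
    zones = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumed line shape: Node 0, zone DMA 3 2 1 ...
        if len(fields) < 5 or fields[0] != "Node" or fields[2] != "zone":
            continue
        node = int(fields[1].rstrip(","))
        zones[(node, fields[3])] = [int(n) for n in fields[4:]]
    return zones


def free_pages_at_or_above(counts, order):
    """Count free pages held in blocks of at least 2**order pages."""
    return sum(n << o for o, n in enumerate(counts) if o >= order)
```

Feeding it the contents of /proc/buddyinfo and comparing free_pages_at_or_above(counts, 0) with free_pages_at_or_above(counts, 1) for each zone gives a rough picture: plenty of order-0 pages with almost nothing at order 1 and above points at fragmentation rather than a real shortage.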

The only change in this path over these kernel versions that is 
immediately apparent (there were significant changes to e1000e) is the 
CRC stripping.  If e1000e is loaded as a module, perhaps you could try

	modprobe e1000e CrcStripping=0,0

(assuming you have two adapters).
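
If the adapter count is not known up front, the option string can be derived from sysfs. This is a rough sketch under assumptions: the function names are hypothetical, it assumes each interface exposes a device/driver symlink under /sys/class/net, and it assumes the parameter takes one comma-separated value per adapter, as in the command above.

```python
import os


def count_bound_interfaces(driver, sysfs_net="/sys/class/net"):
    """Count network interfaces whose device/driver symlink names `driver`."""
    count = 0
    for iface in sorted(os.listdir(sysfs_net)):
        link = os.path.join(sysfs_net, iface, "device", "driver")
        # Virtual interfaces such as lo have no device/driver link; skip them.
        if os.path.islink(link) and os.path.basename(os.path.realpath(link)) == driver:
            count += 1
    return count


def crc_stripping_option(adapters, value=0):
    """Build the module option with one value per adapter."""
    return "CrcStripping=" + ",".join([str(value)] * adapters)
```

For example, crc_stripping_option(count_bound_interfaces("e1000e")) yields the argument to pass to modprobe after the module has been unloaded.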

I've cc'd some relevant e1000e driver people in the hope they'll be able 
to diagnose this problem.  Memory fragmentation as a result of page 
group changes wouldn't affect order-0 allocations such as this with SLAB, 
so it's doubtful the VM regressed if you can reproduce the problem with 
CONFIG_SLAB.