linux-kernel - Re: I have a blaze of 353 page allocation failures, all alike


Date:	Tue, 12 Apr 2011 18:34:15 -0700 (PDT)
From:	David Rientjes <rientjes@...gle.com>
To:	Christoph Lameter <cl@...ux.com>
cc:	Peter Kruse <pk@...eap.de>, eric.dumazet@...il.com,
	linux-kernel@...r.kernel.org
Subject: Re: I have a blaze of 353 page allocation failures, all alike

On Tue, 12 Apr 2011, Christoph Lameter wrote:

> > > it took a while to find a date for a reboot... Unfortunately
> > > it was not possible to get the early boot messages with the
> > > kernel 2.6.32.23 since the compiled in log buffer is too
> > > small. So we installed as you suggested a more recent kernel
> > > 2.6.32.29 with a bigger log buffer, I attach the dmesg
> > > of that, and hope that the information in there is useful.
> > > We will keep an eye on that server with the newer kernel
> > > to see if the allocation failures appear again.
> > 
> > the server was running for a few without any more allocation
> > failures with kernel 2.6.32.29 but at one point the server
> > stopped responding, it was still possible for a while to
> > get a login, and trying to kill some processes but that
> > didn't succeed.  But after that even login was
> > no longer possible so we had to reset it.
> > I attach the call trace, I hope you can find out what is
> > the problem.
> 
> The problem may be that you have lots and lots of SCSI devices which
> consume ZONE_DMA memory for their control structures. I guess that is
> oversubscribing the 16M zone.
> 

You can try to get more memory reserves specifically for lowmem in 
ZONE_DMA by changing /proc/sys/vm/lowmem_reserve_ratio.  The values are 
ratios, so lowering the numbers will yield larger amounts of memory 
reserves in ZONE_DMA for GFP_DMA allocations.  Try lowering the non-zero 
entries to 1 to reserve the entire zone for lowmem, assuming your system 
has enough RAM for everything else you're running.
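
As a rough illustration of the change described above, here is a minimal
Python sketch. It assumes the sysctl file holds whitespace-separated
integers and that the script runs as root; the helper names
lower_nonzero_ratios and apply_ratios are made up for this example and are
not part of any kernel tooling.

```python
RATIO_PATH = "/proc/sys/vm/lowmem_reserve_ratio"


def lower_nonzero_ratios(current, new_value=1):
    """Return the ratio string with every non-zero entry set to new_value.

    Zero entries are left alone, since only the non-zero entries are
    meant to be lowered.
    """
    fields = current.split()
    return " ".join(str(new_value) if int(f) != 0 else "0" for f in fields)


def apply_ratios(path=RATIO_PATH):
    """Read the current ratios, write the lowered ones, return (old, new).

    Writing to the sysctl file requires root. The change does not survive
    a reboot unless it is also set in the system's sysctl configuration.
    """
    with open(path) as fh:
        old = fh.read().strip()
    new = lower_nonzero_ratios(old)
    with open(path, "w") as fh:
        fh.write(new + "\n")
    return old, new
```

Calling apply_ratios() returns the old and new strings, so the previous
values can be noted down and restored later if the larger reserve turns out
to starve other allocations.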

This will verify whether ZONE_DMA is being depleted by the large number of 
SCSI devices.  If you don't get any additional page allocation failures, 
then check how much memory in ZONE_DMA is used at peak; that tells you how 
much needs to be reserved, and you can pick a sane reserve ratio from it 
the next time you restart the system.
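
One way to watch ZONE_DMA usage is to sample /proc/zoneinfo periodically
and keep the highest value seen.  The sketch below only does the parsing
step.  It assumes the usual "Node N, zone NAME" headers followed by
"pages free" and "present" lines, and it approximates pages in use as
present minus free, which ignores pages the kernel reserves for itself.
The function names are hypothetical.

```python
import re

ZONE_HEADER = re.compile(r"^Node\s+(\d+),\s+zone\s+(\S+)")


def parse_zoneinfo(text):
    """Map (node, zone) -> {'free': pages, 'present': pages}."""
    zones = {}
    current = None
    for line in text.splitlines():
        header = ZONE_HEADER.match(line)
        if header:
            key = (int(header.group(1)), header.group(2))
            current = zones.setdefault(key, {})
            continue
        if current is None:
            continue
        words = line.split()
        if words[:2] == ["pages", "free"] and len(words) == 3:
            current["free"] = int(words[2])
        elif words[:1] == ["present"] and len(words) == 2:
            current["present"] = int(words[1])
    return zones


def dma_pages_in_use(text, node=0):
    """Approximate pages in use in ZONE_DMA on the given node."""
    stats = parse_zoneinfo(text)[(node, "DMA")]
    return stats["present"] - stats["free"]
```

Feeding it the contents of /proc/zoneinfo at intervals (from cron, for
example) and recording the maximum gives an estimate of the peak to size
the reserve against.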