Message-ID: <alpine.DEB.2.00.1104121830030.14956@chino.kir.corp.google.com>
Date:	Tue, 12 Apr 2011 18:34:15 -0700 (PDT)
From:	David Rientjes <rientjes@...gle.com>
To:	Christoph Lameter <cl@...ux.com>
cc:	Peter Kruse <pk@...eap.de>, eric.dumazet@...il.com,
	linux-kernel@...r.kernel.org
Subject: Re: I have a blaze of 353 page allocation failures, all alike

On Tue, 12 Apr 2011, Christoph Lameter wrote:

> > > it took a while to find a date for a reboot... Unfortunately
> > > it was not possible to get the early boot messages with the
> > > kernel 2.6.32.23 since the compiled in log buffer is too
> > > small. So we installed as you suggested a more recent kernel
> > > 2.6.32.29 with a bigger log buffer, I attach the dmesg
> > > of that, and hope that the information in there is useful.
> > > We will keep an eye on that server with the newer kernel
> > > to see if the allocation failures appear again.
> > 
> > the server was running for a few without any more allocation
> > failures with kernel 2.6.32.29 but at one point the server
> > stopped responding, it was still possible for a while to
> > get a login, and trying to kill some processes but that
> > didn't succeed.  But after that even login was
> > no longer possible so we had to reset it.
> > I attach the call trace, I hope you can find out what is
> > the problem.
> 
> The problem may be that you have lots and lots of SCSI devices which
> consume ZONE_DMA memory for their control structures. I guess that is
> oversubscribing the 16M zone.
> 

You can try to get more memory reserves specifically for lowmem in 
ZONE_DMA by changing /proc/sys/vm/lowmem_reserve_ratio.  The values are 
ratios, so lowering the numbers will yield larger amounts of memory 
reserves in ZONE_DMA for GFP_DMA allocations.  Try lowering the non-zero 
entries to 1 to reserve the entire zone for lowmem, assuming your system 
has enough RAM for everything else you're running.
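
As a rough illustration of why smaller ratios mean larger reserves, here is 
a minimal Python sketch.  It assumes the relationship described in the 
kernel's vm sysctl documentation (pages protected in a lower zone are the 
pages of the higher zones divided by the ratio); the function names are 
hypothetical helpers, not kernel interfaces, and the page counts are made 
up for the example.

```python
def reserved_pages(higher_zone_pages, ratio):
    """Approximate pages kept back in a low zone for a given ratio.

    Assumption: reserve = pages in the higher zones // ratio, as the
    vm sysctl documentation describes. A smaller ratio gives a larger
    reserve.
    """
    return higher_zone_pages // ratio


def reserve_whole_zone(ratios_text):
    """Build the new setting: every non-zero entry becomes 1.

    ratios_text is the whitespace-separated content of
    /proc/sys/vm/lowmem_reserve_ratio. Zero entries are left alone.
    """
    values = [int(field) for field in ratios_text.split()]
    return " ".join("1" if value != 0 else "0" for value in values)


# Hypothetical machine with 1,000,000 pages above ZONE_DMA.
print(reserved_pages(1000000, 256))  # 3906
print(reserved_pages(1000000, 1))    # 1000000, far more than a 16M zone holds
print(reserve_whole_zone("256\t256\t32\n"))  # 1 1 1
```

The string returned by reserve_whole_zone() is what would be written back 
to /proc/sys/vm/lowmem_reserve_ratio as root; the sketch deliberately does 
not write to the file itself.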

This will verify whether ZONE_DMA is being depleted by the large number of 
SCSI devices.  If you don't get any additional page allocation failures, 
then check how much memory in ZONE_DMA is used at peak and use that to 
pick a sane reserve ratio for the next time you restart the system.
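
One way to watch ZONE_DMA usage is to sample /proc/zoneinfo periodically 
and keep the highest value seen.  The Python sketch below is only a 
starting point: zone_usage() is a hypothetical helper, it assumes each 
zone section contains "pages free" and "present" lines, and it treats 
present minus free as an approximation of pages in use.

```python
import re


def zone_usage(zoneinfo_text, zone="DMA"):
    """Return {node: (present_pages, free_pages)} for the named zone."""
    usage = {}
    # Each zone section starts with a line like "Node 0, zone      DMA".
    headers = list(
        re.finditer(r"^Node (\d+), zone\s+(\S+)", zoneinfo_text, flags=re.M)
    )
    for i, header in enumerate(headers):
        if header.group(2) != zone:
            continue
        # The section runs until the next header or the end of the text.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(zoneinfo_text)
        section = zoneinfo_text[header.end():end]
        free = re.search(r"^\s*pages free\s+(\d+)", section, flags=re.M)
        present = re.search(r"^\s*present\s+(\d+)", section, flags=re.M)
        if free and present:
            usage[int(header.group(1))] = (
                int(present.group(1)),
                int(free.group(1)),
            )
    return usage


# Made-up excerpt in the layout of /proc/zoneinfo, for illustration only.
sample = """Node 0, zone      DMA
  pages free     1200
        min      16
        spanned  4096
        present  3989
Node 0, zone    DMA32
  pages free     50000
        spanned  1044480
        present  765000
"""

for node, (present, free) in zone_usage(sample).items():
    print(node, present - free)  # 0 2789
```

On the real system, pass the contents of /proc/zoneinfo instead of the 
sample string and record the largest present minus free value over time.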