Message-ID: <alpine.DEB.2.00.1104121830030.14956@chino.kir.corp.google.com>
Date: Tue, 12 Apr 2011 18:34:15 -0700 (PDT)
From: David Rientjes <rientjes@...gle.com>
To: Christoph Lameter <cl@...ux.com>
cc: Peter Kruse <pk@...eap.de>, eric.dumazet@...il.com,
linux-kernel@...r.kernel.org
Subject: Re: I have a blaze of 353 page allocation failures, all alike
On Tue, 12 Apr 2011, Christoph Lameter wrote:
> > > it took a while to find a date for a reboot... Unfortunately
> > > it was not possible to get the early boot messages with the
> > > kernel 2.6.32.23 since the compiled in log buffer is too
> > > small. So we installed as you suggested a more recent kernel
> > > 2.6.32.29 with a bigger log buffer, I attach the dmesg
> > > of that, and hope that the information in there is useful.
> > > We will keep an eye on that server with the newer kernel
> > > to see if the allocation failures appear again.
> >
> > the server was running for a few without any more allocation
> > failures with kernel 2.6.32.29 but at one point the server
> > stopped responding, it was still possible for a while to
> > get a login, and trying to kill some processes but that
> > didn't succeed. But after that even login was
> > no longer possible so we had to reset it.
> > I attach the call trace, I hope you can find out what is
> > the problem.
>
> The problem may be that you have lots and lots of SCSI devices which
> consume ZONE_DMA memory for their control structures. I guess that is
> oversubscribing the 16M zone.
>
You can try to get more memory reserves specifically for lowmem in
ZONE_DMA by changing /proc/sys/vm/lowmem_reserve_ratio. The values are
ratios, so lowering the numbers will yield larger amounts of memory
reserves in ZONE_DMA for GFP_DMA allocations. Try lowering the non-zero
entries to 1 to reserve the entire zone for lowmem, assuming your system
has enough RAM for everything else you're running.
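
As a rough illustration of why lowering the numbers helps, here is a
minimal Python sketch. It assumes the reserve for a lower zone is
computed as (pages in the higher zones) / ratio, which is my reading of
the sysctl documentation and not something verified against your
kernel. The helper names are made up for this example, and the page
count used in the usage note is hypothetical.

```python
def lower_nonzero_ratios(text, new_value=1):
    """Return the ratio list with every non-zero entry set to new_value.

    `text` is the whitespace-separated content of
    /proc/sys/vm/lowmem_reserve_ratio, e.g. "256 256 32".
    """
    fields = [int(f) for f in text.split()]
    return " ".join(str(new_value if f != 0 else 0) for f in fields)


def protection_pages(higher_zone_pages, ratio):
    """Pages of the lower zone held back from higher-zone fallback.

    Assumed formula: pages in the higher zones divided by the ratio,
    so a smaller ratio gives a larger reserve.
    """
    return higher_zone_pages // ratio


def read_ratios(path="/proc/sys/vm/lowmem_reserve_ratio"):
    """Read the current ratios from procfs (Linux only)."""
    with open(path) as f:
        return f.read().strip()
```

For example, with a hypothetical 1000000 pages in the higher zones,
protection_pages(1000000, 256) gives 3906 pages, while a ratio of 1
gives 1000000, which is more than ZONE_DMA holds, so the whole zone is
kept for lowmem. The string returned by lower_nonzero_ratios() is what
you would write back to the proc file as root.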
This will verify whether ZONE_DMA is being depleted by the large number of
SCSI devices. If you don't get any additional page allocation failures,
then check how much memory in ZONE_DMA is used at peak and use that to
pick a sane reserve ratio the next time you restart the system.
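
One way to watch ZONE_DMA usage is to sample /proc/zoneinfo
periodically. The sketch below is only an illustration: it assumes each
zone starts with a "Node N, zone NAME" line followed by "pages free"
and "present" lines, which can differ between kernel versions, and
zone_usage() is a name invented here.

```python
import re

# Matches zone header lines such as "Node 0, zone      DMA".
ZONE_HEADER = re.compile(r"^Node\s+(\d+),\s+zone\s+(\S+)")


def zone_usage(zoneinfo_text, zone_name="DMA"):
    """Return free/present/used page counts for the first matching zone.

    Returns None if the zone or the expected fields are not found.
    """
    in_zone = False
    free = present = None
    for line in zoneinfo_text.splitlines():
        header = ZONE_HEADER.match(line)
        if header:
            if in_zone:
                break  # reached the next zone, stop scanning
            in_zone = header.group(2) == zone_name
            continue
        if not in_zone:
            continue
        fields = line.split()
        if len(fields) >= 3 and fields[:2] == ["pages", "free"]:
            free = int(fields[2])
        elif len(fields) >= 2 and fields[0] == "present" and present is None:
            present = int(fields[1])
    if free is None or present is None:
        return None
    return {"free": free, "present": present, "used": present - free}
```

Calling zone_usage(open("/proc/zoneinfo").read()) at intervals and
keeping the largest "used" value gives an estimate of peak ZONE_DMA
usage in pages.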