Message-ID: <4DD50584.4010408@q-leap.de>
Date: Thu, 19 May 2011 13:56:52 +0200
From: Peter Kruse <pk@...eap.de>
To: Christoph Lameter <cl@...ux.com>
CC: David Rientjes <rientjes@...gle.com>, eric.dumazet@...il.com,
linux-kernel@...r.kernel.org
Subject: Re: I have a blaze of 353 page allocation failures, all alike
Hello again,

you may remember that we have a server running 2.6.32.29
which hangs once in a while: it stops responding,
and we have to reset it.
The kernel itself and some processes are still running,
and it is still possible to trigger SysRq.

The allocation failures mentioned earlier are gone since the update
to this kernel version, but the server crashed again
after running for 40 days without any problem.
At that point kswapd0/1 started to consume 99% CPU time
and never used less than 50% CPU time until the server
was reset 8 hours later. About half an hour after
that, disk reads dropped to zero, and
a program produced the attached call trace. I am also
attaching the process information for kswapd (recorded by collectl).

Thanks,
Peter
On 04/13/2011 06:17 PM, Christoph Lameter wrote:
> On Wed, 13 Apr 2011, Peter Kruse wrote:
>
>> Hello,
>>
>> thanks for your replies, I appreciate that.
>>
>> On 04/13/2011 03:34 AM, David Rientjes wrote:
>>> On Tue, 12 Apr 2011, Christoph Lameter wrote:
>>>
>>>> The problem may be that you have lots and lots of SCSI devices which
>>>> consume ZONE_DMA memory for their control structures. I guess that is
>>>> oversubscribing the 16M zone.
>>
>> but there are only two devices:
>
> The output you sent me showed a long list of devices. Maybe there is a
> broken driver/device that continues being probed?
>
--
Peter Kruse <pk@...eap.de>
Q-Leap Networks GmbH
phone: +497034-2776-175, mobile: +491522-1593877
View attachment "messages" of type "text/plain" (19490 bytes)
View attachment "kswapd.collectl" of type "text/plain" (4501 bytes)