Message-ID: <4DD50584.4010408@q-leap.de>
Date:	Thu, 19 May 2011 13:56:52 +0200
From:	Peter Kruse <pk@...eap.de>
To:	Christoph Lameter <cl@...ux.com>
CC:	David Rientjes <rientjes@...gle.com>, eric.dumazet@...il.com,
	linux-kernel@...r.kernel.org
Subject: Re: I have a blaze of 353 page allocation failures, all alike

Hello again,

you may remember that we have a server running 2.6.32.29
that crashes once in a while, meaning that
it just stops responding and we have to reset it.
The kernel itself and some processes are still running,
and it is still possible to trigger SysRq.
The mentioned allocation failures are gone after the update
to the kernel version, but the server crashed again
after running for 40 days without any problem.

At that time kswapd0/1 started to consume 99% CPU time
and, until the server was reset 8 hours later,
never used less than 50% CPU time.  About half an hour
after that, the disk reads dropped to zero, and
a program produced the attached call trace.  I also
attach the process information of kswapd (provided by collectl).

Thanks,

  Peter

On 04/13/2011 06:17 PM, Christoph Lameter wrote:
> On Wed, 13 Apr 2011, Peter Kruse wrote:
> 
>> Hello,
>>
>> thanks for your replies, I appreciate that.
>>
>> On 04/13/2011 03:34 AM, David Rientjes wrote:
>>> On Tue, 12 Apr 2011, Christoph Lameter wrote:
>>>
>>>> The problem may be that you have lots and lots of SCSI devices which
>>>> consume ZONE_DMA memory for their control structures. I guess that is
>>>> oversubscribing the 16M zone.
>>
>> but there are only two devices:
> 
> The output you sent me showed a long list of devices. Maybe there is a
> broken driver/device that continues being probed?
> 


-- 
Peter Kruse <pk@...eap.de>
Q-Leap Networks GmbH
phone: +497034-2776-175, mobile: +491522-1593877

View attachment "messages" of type "text/plain" (19490 bytes)

View attachment "kswapd.collectl" of type "text/plain" (4501 bytes)
