linux-kernel - Re: I have a blaze of 353 page allocation failures, all alike

Message-ID: <4D5BC16A.2090205@q-leap.com>
Date:	Wed, 16 Feb 2011 13:22:02 +0100
From:	Peter Kruse <pk@...eap.com>
To:	Christoph Lameter <cl@...ux.com>
CC:	linux-kernel@...r.kernel.org
Subject: Re: I have a blaze of 353 page allocation failures, all alike

Hi Christoph,

thanks again for your time.

Christoph Lameter wrote:
> On Tue, 15 Feb 2011, Peter Kruse wrote:
> 
>> > > we have set vm.min_free_kbytes = 2097152 but the problem
>> > > obviously did not go away.
>> >
>> > 2GB of reserves? How much memory does your system have?
>>
>> 48GB
> 
> Ok then you just may potentially clog up the DMA zones. Maybe set the
> reserves to a reasonable level like 10M or so?

OK, that's what we had before the first incident; we then increased
it to this value to see if it makes a difference.
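
To put the two settings discussed here side by side, the following is a minimal sketch that expresses vm.min_free_kbytes as a fraction of total RAM. It assumes "48GB" means 48 GiB and "10M" means 10240 kB; the function names are made up for illustration and are not part of any tool mentioned in this thread.

```python
def reserve_fraction(min_free_kbytes: int, mem_total_kb: int) -> float:
    """Fraction of total RAM that vm.min_free_kbytes asks the kernel to keep free."""
    return min_free_kbytes / mem_total_kb


def parse_mem_total_kb(meminfo_text: str) -> int:
    """Extract the MemTotal value (in kB) from the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def current_reserve_fraction() -> float:
    """Read the live values on a Linux host (defined here, not called automatically)."""
    with open("/proc/sys/vm/min_free_kbytes") as f:
        min_free = int(f.read())
    with open("/proc/meminfo") as f:
        total = parse_mem_total_kb(f.read())
    return reserve_fraction(min_free, total)


# Values from this thread, assuming "48GB" means 48 GiB of RAM.
TOTAL_KB = 48 * 1024 * 1024
print(f"2 GB reserve:  {reserve_fraction(2097152, TOTAL_KB):.2%}")
print(f"10 MB reserve: {reserve_fraction(10 * 1024, TOTAL_KB):.3%}")
```

Under those assumptions the 2 GB setting is roughly 4% of RAM, while 10 MB is roughly 0.02%. On a real machine MemTotal is somewhat below the installed amount, so current_reserve_fraction() would report slightly higher figures.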

> 
> How many buffers are configured at the various levels for the device that
> is receiving messages? I guess that may be a bit on the high side?

hm, I'm not sure I know what you mean or what you want me to do.

> 
>> > Could you post the entire messages from the kernel log? We need the OOM
>> > info to figure out more about the problem.
>> >
>>
>> I attach one of the call traces, or would it be better if I send the
>> kern.log (about 6MB)?
> 
> The call traces are sufficient but the traces vanished when I hit reply.
> Include them inline next time. It would be good to have the log starting
> at the last system boot. There is some information cut off that I would
> like to see.

Ok, I attach the gzipped kern.log.

> 
> An atomic order 1 allocation failed and led to the OOM but it seems that
> there is still ample memory available. Slab is in "fallback_alloc" so
> something went wrong with the regular allocation attempt. Any use of
> cpusets or cgroups?

not that I know of, no.

> 
> A significant amount of memory has been allocated to reclaimable slabs.
> I guess these are the socket buffers?
> 
> Feb 10 11:59:49 beosrv1-t kernel: [1968911.211777] Node 0 Normal
> free:965164kB min:917952kB low:1147440kB high:1376928kB
> active_anon:2742680kB inactive_anon:293184kB active_file:4801512kB
> inactive_file:11129708kB unevictable:0kB isolated(anon):0kB
> isolated(file):0kB present:21719040kB mlocked:0kB dirty:600kB
> writeback:0kB mapped:26356kB shmem:4896kB slab_reclaimable:1780208kB 
> <-----!!
> slab_unreclaimable:199576kB kernel_stack:1576kB pagetables:22956kB
> unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
> all_unreclaimable? no
> 
> Could you try to reduce the number of network buffers?

which parameter?
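
As a rough aid for reading the "Node 0 Normal" line quoted above, here is a small sketch that pulls the kB fields out of such a line and compares free memory with the zone watermarks. Only a subset of the fields from the log is reproduced in the sample string, and the helper name is hypothetical.

```python
import re

# Subset of the fields from the quoted kernel log line (values copied verbatim).
ZONE_LINE = (
    "Node 0 Normal free:965164kB min:917952kB low:1147440kB high:1376928kB "
    "present:21719040kB slab_reclaimable:1780208kB slab_unreclaimable:199576kB"
)


def parse_zone_fields(line: str) -> dict:
    """Return every 'name:<number>kB' pair in the line as {name: kB}."""
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)kB", line)}


zone = parse_zone_fields(ZONE_LINE)

# Distance between free memory and the min watermark, in kB.
headroom_kb = zone["free"] - zone["min"]

# Whether free memory is under the low watermark.
below_low = zone["free"] < zone["low"]

# Share of the zone's present memory held by reclaimable slab.
slab_share = zone["slab_reclaimable"] / zone["present"]

print(f"free - min: {headroom_kb} kB")
print(f"below low watermark: {below_low}")
print(f"reclaimable slab share: {slab_share:.1%}")
```

With the numbers from the log, free memory sits about 46 MB above the min watermark and below the low watermark, and reclaimable slab accounts for roughly 8% of the zone. This only restates the arithmetic of the log line; it does not by itself identify which buffers are responsible.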

thanks,

   Peter


Download attachment "kern.log.gz" of type "application/x-gzip" (330458 bytes)