linux-kernel - Re: OOM killer firing on 2.6.18 and later during LTP runs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <45697CB3.7020903@shadowen.org>
Date:	Sun, 26 Nov 2006 11:38:27 +0000
From:	Andy Whitcroft <apw@...dowen.org>
To:	Andrew Morton <akpm@...l.org>
CC:	"Martin J. Bligh" <mbligh@...igh.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: OOM killer firing on 2.6.18 and later during LTP runs

Andrew Morton wrote:
> On Sat, 25 Nov 2006 13:03:45 -0800
> "Martin J. Bligh" <mbligh@...igh.org> wrote:
> 
>> On 2.6.18-rc7 and later during LTP:
>> http://test.kernel.org/abat/48393/debug/console.log
> 
> The traces are a bit confusing, but I don't actually see anything wrong
> there.  The machine has used up all swap, has used up all memory and has
> correctly gone and killed things.  After that, there's free memory again.
> 
>> oom-killer: gfp_mask=0x201d2, order=0
>>
>> Call Trace:
>>   [<ffffffff802638cb>] out_of_memory+0x33/0x220
>>   [<ffffffff80265374>] __alloc_pages+0x23a/0x2c3
>>   [<ffffffff802667d2>] __do_page_cache_readahead+0x99/0x212
>>   [<ffffffff80260799>] sync_page+0x0/0x45
>>   [<ffffffff804b304c>] io_schedule+0x28/0x33
>>   [<ffffffff804b32b8>] __wait_on_bit_lock+0x5b/0x66
>>   [<ffffffff8043d849>] dm_any_congested+0x3b/0x42
>>   [<ffffffff80262e50>] filemap_nopage+0x14b/0x353
>>   [<ffffffff8026cf9a>] __handle_mm_fault+0x387/0x93f
>>   [<ffffffff804b6366>] do_page_fault+0x44b/0x7ba
>>   [<ffffffff80245a4e>] autoremove_wake_function+0x0/0x2e
>> oom-killer: gfp_mask=0x280d2, order=0
>>
>> Call Trace:
>>   [<ffffffff802638cb>] out_of_memory+0x33/0x220
>>   [<ffffffff80265374>] __alloc_pages+0x23a/0x2c3
>>   [<ffffffff8026cde3>] __handle_mm_fault+0x1d0/0x93f
>>   [<ffffffff804b6366>] do_page_fault+0x44b/0x7ba
>>   [<ffffffff804b2854>] thread_return+0x0/0xe0
>>   [<ffffffff8020a405>] error_exit+0x0/0x84
>>
>> --------------------------------------------------
>>
>> This doesn't seem to happen every run, unfortnately, only
>> intermittently, and we don't have much data before that, so
>> hard to tell how long it's been going on.
>>
>> Still happening on latest kernels.
>> http://test.kernel.org/abat/62445/debug/console.log
> 
> The same appears to have happened there too.  Although it does seem to have
> killed a lot more than it should have.
> 
> Has something changed in the configuration of that machine?  New LTP
> version?  Less swapsapce?

As far as I know neither LTP has changed nor the machine configuration
has changed.   This is one of the very few machines we run which uses
LVM/dm etc perhaps that is a factor.

/dev/mapper/VolGroup00-LogVol01         partition       2031608 156     -1

We do know that the LTP tests add a bunch of swap and then rip them away
again.  Its possible that something bad happens when that is occuring.
It would change the level of deparation rather dramatically for sure.

Perhaps it would make sense to try out the patch from RedHat.  Sadly its
not really reproducible reliably ... so its hard to know how we tell if
its worked.

Sigh.

-apw

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/