Message-ID: <50DD9EA7.6050309@iskon.hr>
Date:	Fri, 28 Dec 2012 14:29:11 +0100
From:	Zlatko Calusic <zlatko.calusic@...on.hr>
To:	Minchan Kim <minchan@...nel.org>
CC:	Andrew Morton <akpm@...ux-foundation.org>,
	Mel Gorman <mgorman@...e.de>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Hugh Dickins <hughd@...gle.com>, linux-mm <linux-mm@...ck.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Zhouping Liu <zliu@...hat.com>,
	Sedat Dilek <sedat.dilek@...il.com>
Subject: Re: [PATCH] mm: fix null pointer dereference in wait_iff_congested()

On 28.12.2012 03:49, Minchan Kim wrote:
> Hello Zlatko,
>
> On Fri, Dec 28, 2012 at 03:16:38AM +0100, Zlatko Calusic wrote:
>> From: Zlatko Calusic <zlatko.calusic@...on.hr>
>>
>> The unintended consequence of commit 4ae0a48b is that
>> wait_iff_congested() can now be called with NULL struct zone*
>> producing kernel oops like this:
>
> For good description, it would be better to write simple pseudo code
> flow to show how NULL-zone pass into wait_iff_congested because
> kswapd code flow is too complex.
>
> As I see the code, we have following line above wait_iff_congested.
>
> if (!unbalanced_zone || blah blah)
>          break;
>
> How can NULL unbalanced_zone reach wait_iff_congested?
>

Hello Minchan, and thanks for the comment.

That line was there before commit 4ae0a48b got in, and you're right,
it's what was protecting wait_iff_congested() from being called with a
NULL zone*. But then all that logic got collapsed into a simple
pgdat_balanced() call, and that's when I introduced the bug: I lost the
protection.

What I _think_ is happening (pseudo code following...) is that after
scanning the zones in the dma->highmem direction and concluding that all
zones are balanced (unbalanced_zone remains NULL!),
wake_up(&pgdat->pfmemalloc_wait) wakes up a lot of memory-hungry
processes (especially true in various aggressive tests/benchmarks) that
immediately drain and unbalance one or more zones. Then the
pgdat_balanced() call that immediately follows returns false, but we
still have unbalanced_zone = NULL, remember? Oops...
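
The suspected flow can be sketched as a small user-space C model. This is
an illustration only, not the kernel source: struct pgdat here, the
watermark fields, every model_* function, and the outcome enum are
hypothetical names made up for the sketch. The "every zone must be
balanced" rule is a simplification of what pgdat_balanced() checks, and
the null_guard flag stands in for a NULL check of the kind that the old
"if (!unbalanced_zone || ...)" line provided.

```c
/* User-space model of the suspected kswapd flow. NOT kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_ZONES 3

struct zone  { long free_pages; long high_wmark; };   /* hypothetical fields */
struct pgdat { struct zone zones[NR_ZONES]; };        /* hypothetical layout */

enum outcome { ALL_BALANCED, WAITED_ON_ZONE, SKIPPED_NULL_ZONE, NULL_ZONE_DEREF };

static bool model_zone_balanced(const struct zone *z)
{
    return z->free_pages >= z->high_wmark;
}

/* Simplified stand-in for pgdat_balanced(): true only if every zone is balanced. */
static bool model_pgdat_balanced(const struct pgdat *p)
{
    for (size_t i = 0; i < NR_ZONES; i++)
        if (!model_zone_balanced(&p->zones[i]))
            return false;
    return true;
}

/* Stand-in for wake_up(&pgdat->pfmemalloc_wait): woken tasks drain one zone. */
static void model_wake_and_drain(struct pgdat *p)
{
    p->zones[NR_ZONES - 1].free_pages = 0;
}

/* One pass of the modelled loop; reports what would happen at the wait. */
static enum outcome model_balance_pass(struct pgdat *p, bool tasks_drain,
                                       bool null_guard)
{
    struct zone *unbalanced_zone = NULL;

    assert(p != NULL);

    /* Scan in the dma->highmem direction, remembering an unbalanced zone. */
    for (size_t i = 0; i < NR_ZONES; i++)
        if (!model_zone_balanced(&p->zones[i]))
            unbalanced_zone = &p->zones[i];

    /* Waiters are woken after the scan and may unbalance a zone again. */
    if (tasks_drain)
        model_wake_and_drain(p);

    /* The re-check that immediately follows sees the drained state... */
    if (model_pgdat_balanced(p))
        return ALL_BALANCED;

    /* ...but unbalanced_zone still holds the stale result of the scan. */
    if (unbalanced_zone == NULL)
        return null_guard ? SKIPPED_NULL_ZONE : NULL_ZONE_DEREF;

    return WAITED_ON_ZONE;   /* wait_iff_congested() gets a valid zone */
}
```

Calling model_balance_pass() on a pgdat whose zones are all above their
watermark, with tasks_drain set and null_guard clear, reports
NULL_ZONE_DEREF, which corresponds to the oops. With null_guard set, the
same call reports SKIPPED_NULL_ZONE instead.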

But all of that is speculation that I can't prove atm. Of course, if
anybody thinks that's a credible explanation, I could add it as a commit
comment, or even as a code comment, but I didn't want to be overly
imaginative. The fix itself is simple and real.

Regards,
-- 
Zlatko