linux-kernel - Re: [-rc7 regression] Buggy commit: "mm: use aligned zone start for pfn_to

Message-ID: <512275FF.9050508@codeaurora.org>
Date:	Mon, 18 Feb 2013 10:42:07 -0800
From:	Laura Abbott <lauraa@...eaurora.org>
To:	Mel Gorman <mgorman@...e.de>
CC:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Ingo Molnar <mingo@...nel.org>,
	Yinghai Lu <yinghai@...nel.org>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Jens Axboe <axboe@...nel.dk>,
	Alexander Viro <viro@....linux.org.uk>,
	Theodore Ts'o <tytso@....edu>, "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [-rc7 regression] Buggy commit: "mm: use aligned zone start for
 pfn_to_bitidx calculation"

On 2/18/2013 6:46 AM, Mel Gorman wrote:
> On Sat, Feb 16, 2013 at 10:26:30AM -0800, Linus Torvalds wrote:
>> On Fri, Feb 15, 2013 at 3:44 AM, Ingo Molnar <mingo@...nel.org> wrote:
>>>>
>>>> c060f943d092 may be related as your config does not have
>>>> CONFIG_SPARSEMEM defined.
>>>
>>> Right, that's the commit causing the x86 regression:
>>>
>>>   c060f943d0929f3e429c5d9522290584f6281d6e is the first bad commit
>>>   commit c060f943d0929f3e429c5d9522290584f6281d6e
>>>   Date:   Fri Jan 11 14:31:51 2013 -0800
>>>
>>>       mm: use aligned zone start for pfn_to_bitidx calculation
>>
>> Ok, looking more at this, I don't really want to revert it, and I have
>> an idea of what is wrong.
>>
>> When we allocate the zone use bitmap, we do not take the
>> zone_start_pfn into account. So I *think* that what happens is that
>> "pfn_to_bitidx()" simply overruns the allocation for unaligned zones,
>> and the spinlock just happens to be right after (or the overrun causes
>> some other memory corruption that then indirectly causes the spinlock
>> corruption).
>>
>
> More likely the latter. I'd expect the usemap to be adjacent to the
> zone->wait_table because of when they are allocated by the bootmem
> allocator. This would break wait_on_page_[locked|writeback] at the very
> least. If page_waitqueue() returned a corrupt pointer from the wait table
> then it would lead to further corruption elsewhere each time wait_on_page_foo
> was called.
>
>> So I'm wondering if the fix is simply something like the attached
>> patch. It takes the zone_start_pfn into account when allocating the
>> zone bitmap.
>>
>> Laura? Mel?
>>
>
> Looks correct to me and should cc stable@...r.kernel.org
>
> Acked-by: Mel Gorman <mgorman@...e.de>
>

I had convinced myself when I sent the patch that everything would just 
shift down and there wouldn't need to be an array size increase. Looks 
like my math was bogus and I'll double check it next time. The updated 
version looks okay to me and I'll pull in the patch for more testing on 
the setup that originally found the problem this week.

Thanks,
Laura

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
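
Editorial illustration (not part of the original message or the attached
patch): the sizing problem discussed above can be sketched in user-space C.
The constants and function names below are assumptions chosen for the
example, not the kernel's actual definitions. The sketch compares a bitmap
size computed from the zone length alone with one that also counts the
offset of the zone start inside its pageblock, and shows the bit index that
a lookup measured from the aligned-down zone start would produce.

```c
/* Assumed values for illustration only; a real kernel derives these
 * from its configuration. */
#define PAGEBLOCK_ORDER    10u
#define PAGEBLOCK_NR_PAGES (1ul << PAGEBLOCK_ORDER)
#define NR_PAGEBLOCK_BITS  4u
#define BITS_PER_WORD      64u

/* Round x up to the next multiple of step. */
static unsigned long round_up_to(unsigned long x, unsigned long step)
{
    return ((x + step - 1) / step) * step;
}

/* Bitmap size in bytes when only the zone length is considered. */
static unsigned long usemap_bytes_ignoring_start(unsigned long zone_pages)
{
    unsigned long blocks =
        round_up_to(zone_pages, PAGEBLOCK_NR_PAGES) >> PAGEBLOCK_ORDER;
    unsigned long bits = blocks * NR_PAGEBLOCK_BITS;

    return round_up_to(bits, BITS_PER_WORD) / 8;
}

/* Bitmap size in bytes when the offset of the zone start within its
 * pageblock is added to the zone length first. */
static unsigned long usemap_bytes_with_start(unsigned long zone_start_pfn,
                                             unsigned long zone_pages)
{
    zone_pages += zone_start_pfn & (PAGEBLOCK_NR_PAGES - 1);
    return usemap_bytes_ignoring_start(zone_pages);
}

/* First bit index used for pfn when the index is measured from the
 * zone start rounded down to a pageblock boundary. */
static unsigned long bitidx_from_aligned_start(unsigned long zone_start_pfn,
                                               unsigned long pfn)
{
    unsigned long base = zone_start_pfn & ~(PAGEBLOCK_NR_PAGES - 1);

    return ((pfn - base) >> PAGEBLOCK_ORDER) * NR_PAGEBLOCK_BITS;
}
```

Under these assumed constants, a zone of 16384 pages starting at pfn 512
gets an 8-byte (64-bit) bitmap when the start is ignored, yet the last page
of the zone maps to bit index 64, just past the end of that bitmap. Counting
the start offset yields 16 bytes, which covers it.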