linux-kernel - Re: mm/compaction: BUG: NULL pointer dereference

Message-ID: <9ae23db2-e696-047b-af18-1e75ebbda085@arm.com>
Date:   Fri, 24 May 2019 18:43:59 +0530
From:   Anshuman Khandual <anshuman.khandual@....com>
To:     Mel Gorman <mgorman@...hsingularity.net>
Cc:     Suzuki K Poulose <suzuki.poulose@....com>, linux-mm@...ck.org,
        akpm@...ux-foundation.org, mhocko@...e.com, cai@....pw,
        linux-kernel@...r.kernel.org, marc.zyngier@....com,
        kvmarm@...ts.cs.columbia.edu, kvm@...r.kernel.org
Subject: Re: mm/compaction: BUG: NULL pointer dereference



On 05/24/2019 06:00 PM, Mel Gorman wrote:
> On Fri, May 24, 2019 at 04:26:16PM +0530, Anshuman Khandual wrote:
>>
>>
>> On 05/24/2019 02:50 PM, Suzuki K Poulose wrote:
>>> Hi,
>>>
>>> We are hitting NULL pointer dereferences while running stress tests with KVM.
>>> See splat [0]. The test spawns 100 VMs, all doing a standard Debian
>>> installation (thanks to Marc's automated scripts, available here [1]).
>>> The problem reproduces much more reliably from 5.1-rc6
>>> onwards.
>>>
>>> The issue is only reproducible with swap enabled, when the entire memory
>>> is used up and the system is swapping heavily. It is also only reproducible
>>> on one server with 128GB, which has the following memory layout:
>>>
>>> [32GB@4GB, hole , 96GB@...GB]
>>>
>>> Here is my non-expert analysis of the issue so far.
>>>
>>> Under extreme memory pressure, kswapd can trigger reset_isolation_suitable()
>>> to recompute the cached migrate/free pfn values for a zone by scanning through
>>> the entire zone. On our server it does so in the range [0x10_0000, 0xa00_0000],
>>> with the following hole: [0x20_0000, 0x880_0000].
>>> In the failing case, we end up setting the cached migrate pfn to 0x508_0000, which
>>> is right in the center of the zone pfn range, i.e. (0x10_0000 + 0xa00_0000) / 2,
>>> and therefore inside the hole, with reset_migrate = 0x88_4e00 and reset_free = 0x10_0000.
>>>
>>> These cached values are then used by fast_isolate_freepages() to find a pfn. However,
>>> since we can't find anything during the search, we fall back to using the page belonging
>>> to min_pfn (which is the migrate_pfn) without checking whether it is a valid
>>> PFN. This page is then passed on to fast_isolate_around(), which calls
>>> set_pageblock_skip(page) on it, and that blows up due to a NULL mem_section pointer.
>>>
>>> The following patch seems to fix the issue for me, but I am not quite convinced that
>>> it is the right fix. Thoughts?
>>>
>>>
>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>> index 9febc8c..9e1b9ac 100644
>>> --- a/mm/compaction.c
>>> +++ b/mm/compaction.c
>>> @@ -1399,7 +1399,7 @@ fast_isolate_freepages(struct compact_control *cc)
>>>  				page = pfn_to_page(highest);
>>>  				cc->free_pfn = highest;
>>>  			} else {
>>> -				if (cc->direct_compaction) {
>>> +				if (cc->direct_compaction && pfn_valid(min_pfn)) {
>>>  					page = pfn_to_page(min_pfn);
>>
>> pfn_to_online_page() here would be better as it does not add pfn_valid() cost on
>> architectures that do not select CONFIG_HOLES_IN_ZONE. But regardless, if
>> compaction is trying to scan pfns in zone holes, that should be avoided.
> 
> CONFIG_HOLES_IN_ZONE typically applies in special cases where an arch
> punches holes within a section. As both do a section lookup, the cost is
> similar but pfn_valid in general is less subtle in this case. Normally
> pfn_valid_within is only ok when a pfn_valid check has been made on the
> max_order aligned range as well as a zone boundary check. In this case,
> it's much more straightforward to leave it as pfn_valid.

Sure, makes sense.