linux-kernel - Re: [PATCH] memory-hotplug: Fix bad area access on dissolve_free_huge

Message-Id: <fc05ee3c-097f-709b-7484-1cadc9f3ce22@linux.vnet.ibm.com>
Date:   Tue, 20 Sep 2016 23:52:25 +0800
From:   Rui Teng <rui.teng@...ux.vnet.ibm.com>
To:     Dave Hansen <dave.hansen@...ux.intel.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
        Michal Hocko <mhocko@...e.com>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        "Aneesh Kumar K . V" <aneesh.kumar@...ux.vnet.ibm.com>,
        Paul Gortmaker <paul.gortmaker@...driver.com>,
        Santhosh G <santhog4@...ibm.com>
Subject: Re: [PATCH] memory-hotplug: Fix bad area access on
 dissolve_free_huge_pages()

On 9/20/16 10:53 PM, Dave Hansen wrote:
> On 09/20/2016 07:45 AM, Rui Teng wrote:
>> On 9/17/16 12:25 AM, Dave Hansen wrote:
>>>
>>> That's an interesting data point, but it still doesn't quite explain
>>> what is going on.
>>>
>>> It seems like there might be parts of gigantic pages that have
>>> PageHuge() set on tail pages, while other parts don't.  If that's true,
>>> we have another bug and your patch just papers over the issue.
>>>
>>> I think you really need to find the root cause before we apply this
>>> patch.
>>>
>> The root cause is that the test script
>> (tools/testing/selftests/memory-hotplug/mem-on-off-test.sh) changes the
>> online/offline status of memory blocks other than the one holding the
>> head page. It *randomly* selects 10% of the memory blocks from
>> /sys/devices/system/memory/memory*, and changes their online/offline
>> status.
>
> Ahh, that does explain it!  Thanks for digging into that!
>
>> That's why we need a PageHead() check now, and why this problem does
>> not happen on systems with smaller huge pages, such as 16M.
>>
>> As for PageHuge(), I think it will return true for all tail pages,
>> because it gets the compound_head of the tail page and then checks
>> that page's huge page flag.
>>     page = compound_head(page);
>>
>> And as for the failure message, if a memory block is in use,
>> offlining it will return failure.
>
> That's good, but aren't we still left with a situation where we've
> offlined and dissolved the _middle_ of a gigantic huge page while the
> head page is still in place and online?
>
> That seems bad.
>
What about refusing to change the status of such a memory block if it
contains part of a huge page that is larger than the block itself? (See
function memory_block_action().)
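
To make the proposal concrete, here is a minimal userspace sketch of the
check I have in mind. It is not kernel code: struct pfn_range and the
helpers ranges_overlap(), range_contains() and block_may_change_state()
are hypothetical names used only for illustration, and a real
implementation in memory_block_action() would need to walk the actual
page structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a span of page frames (a memory block or a huge page). */
struct pfn_range {
    unsigned long start_pfn;  /* first page frame number in the span */
    unsigned long nr_pages;   /* number of base pages in the span */
};

/* True if the two spans share at least one page frame. */
static bool ranges_overlap(struct pfn_range a, struct pfn_range b)
{
    return a.start_pfn < b.start_pfn + b.nr_pages &&
           b.start_pfn < a.start_pfn + a.nr_pages;
}

/* True if 'inner' lies completely inside 'outer'. */
static bool range_contains(struct pfn_range outer, struct pfn_range inner)
{
    return inner.start_pfn >= outer.start_pfn &&
           inner.start_pfn + inner.nr_pages <=
           outer.start_pfn + outer.nr_pages;
}

/*
 * Assumed policy: allow the online/offline change unless the huge page
 * touches the block but extends beyond it (the block would then hold
 * only a fragment of the huge page).
 */
static bool block_may_change_state(struct pfn_range block,
                                   struct pfn_range hpage)
{
    assert(block.nr_pages > 0 && hpage.nr_pages > 0);

    if (!ranges_overlap(block, hpage))
        return true;
    return range_contains(block, hpage);
}
```

With illustrative numbers (a 4096-page block and a 262144-page gigantic
page that covers it), block_may_change_state() returns false, while a
256-page huge page lying wholly inside the block does not prevent the
change.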

I think it will not affect the hot-plug function too much. If we really
want to hot-plug the memory, we can set nr_hugepages to zero first.
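
As a rough illustration of that step, the helper below writes a count to
an nr_hugepages-style file. The function name set_nr_hugepages() is
hypothetical; the path is passed in by the caller, for example
/proc/sys/vm/nr_hugepages for the default huge page size. Writing there
requires root, and I assume huge pages that are still in use would not
be released by this write alone.

```c
#include <stdio.h>

/*
 * Write 'count' to the file at 'path' (for example an nr_hugepages
 * control file). Returns 0 on success and -1 on any failure.
 */
static int set_nr_hugepages(const char *path, unsigned long count)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;

    ok = fprintf(f, "%lu\n", count) > 0;
    /* fclose() flushes the buffer, so a failed write shows up here. */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

A caller would use set_nr_hugepages("/proc/sys/vm/nr_hugepages", 0)
before offlining the memory block, and check the return value.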

I also found that the __test_page_isolated_in_pageblock() function
cannot handle a gigantic page well. It causes a device busy error
later. I am still investigating that.

Any suggestions?

Thanks!