Message-Id: <fc05ee3c-097f-709b-7484-1cadc9f3ce22@linux.vnet.ibm.com>
Date: Tue, 20 Sep 2016 23:52:25 +0800
From: Rui Teng <rui.teng@...ux.vnet.ibm.com>
To: Dave Hansen <dave.hansen@...ux.intel.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
Michal Hocko <mhocko@...e.com>,
"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
Vlastimil Babka <vbabka@...e.cz>,
Mike Kravetz <mike.kravetz@...cle.com>,
"Aneesh Kumar K . V" <aneesh.kumar@...ux.vnet.ibm.com>,
Paul Gortmaker <paul.gortmaker@...driver.com>,
Santhosh G <santhog4@...ibm.com>
Subject: Re: [PATCH] memory-hotplug: Fix bad area access on
dissolve_free_huge_pages()
On 9/20/16 10:53 PM, Dave Hansen wrote:
> On 09/20/2016 07:45 AM, Rui Teng wrote:
>> On 9/17/16 12:25 AM, Dave Hansen wrote:
>>>
>>> That's an interesting data point, but it still doesn't quite explain
>>> what is going on.
>>>
>>> It seems like there might be parts of gigantic pages that have
>>> PageHuge() set on tail pages, while other parts don't. If that's true,
>>> we have another bug and your patch just papers over the issue.
>>>
>>> I think you really need to find the root cause before we apply this
>>> patch.
>>>
>> The root cause is that the test script
>> (tools/testing/selftests/memory-hotplug/mem-on-off-test.sh) changes the
>> online/offline status of memory blocks other than the one containing
>> the head page. It *randomly* selects 10% of the memory blocks from
>> /sys/devices/system/memory/memory*, and changes their online/offline
>> status.
>
> Ahh, that does explain it! Thanks for digging into that!
>
>> That's why we need a PageHead() check now, and why this problem does
>> not happen on systems with smaller huge pages, such as 16M.
>>
>> As for PageHuge() being set, I think PageHuge() returns true for all
>> tail pages, because it first gets the compound_head() of a tail page
>> and then checks the huge page flag of that head:
>>     page = compound_head(page);
>>
>> As for the failure message: if a memory block is in use, offlining it
>> returns a failure.
>
> That's good, but aren't we still left with a situation where we've
> offlined and dissolved the _middle_ of a gigantic huge page while the
> head page is still in place and online?
>
> That seems bad.
>
What about refusing to change the status of such a memory block if it
contains part of a huge page that is larger than the block itself (in
memory_block_action())?
I don't think this would affect the hot-plug function much. If we
really want to hot-plug that memory, we can set nr_hugepages to zero
first.
I also found that __test_page_isolated_in_pageblock() does not handle
gigantic pages well; it later causes a device busy error. I am still
investigating that.
Any suggestions?
Thanks!