Message-ID: <20200817030425.GA25240@L-31X9LVDL-1304.local>
Date: Mon, 17 Aug 2020 11:04:25 +0800
From: Wei Yang <richard.weiyang@...ux.alibaba.com>
To: Michal Hocko <mhocko@...e.com>
Cc: Mike Kravetz <mike.kravetz@...cle.com>,
Baoquan He <bhe@...hat.com>,
Wei Yang <richard.weiyang@...ux.alibaba.com>,
akpm@...ux-foundation.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 10/10] mm/hugetlb: not necessary to abuse temporary page
to workaround the nasty free_huge_page
On Thu, Aug 13, 2020 at 01:46:38PM +0200, Michal Hocko wrote:
>On Tue 11-08-20 14:43:28, Mike Kravetz wrote:
>> On 8/10/20 11:54 PM, Michal Hocko wrote:
>> >
>> > I have managed to forget all the juicy details since I made that
>> > change. All that remains is that the surplus pages accounting was quite
>> > tricky, and back then I didn't figure out a simpler method that would
>> > give a consistent view of those counters. As mentioned above, I
>> > suspect this could lead to premature allocation failures while the
>> > migration is ongoing.
>>
>> It is likely lost in the e-mail thread, but the suggested change was to
>> alloc_surplus_huge_page(). The code which allocates the migration target
>> (alloc_migrate_huge_page) will not be changed. So, this should not be
>> an issue.
>
>OK, I obviously missed that.
>
>> > Sure, this is quite unlikely to happen and the race window
>> > is likely very small. Maybe this is even acceptable, but I would strongly
>> > recommend having all this reasoning documented in the changelog.
>>
>> I wrote down a description of what happens in the two different approaches
>> "temporary page" vs "surplus page". It is at the very end of this e-mail.
>> When looking at the details, I came up with what may be an even better
>> approach. Why not just call the low level routine to free the page instead
>> of going through put_page/free_huge_page? At the very least, it saves a
>> lock roundtrip and there is no need to worry about the counters/accounting.
>>
>> Here is a patch to do that. However, we are optimizing a return path in
>> a race condition that we are unlikely to ever hit. I 'tested' it by allocating
>> an 'extra' page and freeing it via this method in alloc_surplus_huge_page.
>>
>> From 864c5f8ef4900c95ca3f6f2363a85f3cb25e793e Mon Sep 17 00:00:00 2001
>> From: Mike Kravetz <mike.kravetz@...cle.com>
>> Date: Tue, 11 Aug 2020 12:45:41 -0700
>> Subject: [PATCH] hugetlb: optimize race error return in
>> alloc_surplus_huge_page
>>
>> The routine alloc_surplus_huge_page() could race with a pool
>> size change. If this happens, the allocated page may not be needed.
>> To free the page, the current code will 'Abuse temporary page to
>> workaround the nasty free_huge_page codeflow'. Instead, directly
>> call the low level routine that free_huge_page uses. This works
>> out well because the page is new, we hold the only reference and
>> already hold the hugetlb_lock.
>>
>> Signed-off-by: Mike Kravetz <mike.kravetz@...cle.com>
>> ---
>> mm/hugetlb.c | 13 ++++++++-----
>> 1 file changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 590111ea6975..ac89b91fba86 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -1923,14 +1923,17 @@ static struct page *alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask,
>> /*
>> * We could have raced with the pool size change.
>> * Double check that and simply deallocate the new page
>> - * if we would end up overcommiting the surpluses. Abuse
>> - * temporary page to workaround the nasty free_huge_page
>> - * codeflow
>> + * if we would end up overcommiting the surpluses.
>> */
>> if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages) {
>> - SetPageHugeTemporary(page);
>> + /*
>> + * Since this page is new, we hold the only reference, and
>> + * we already hold the hugetlb_lock, call the low level free
>> + * page routine. This saves at least a lock roundtrip.
>> + */
>> + (void)put_page_testzero(page); /* don't call destructor */
>> + update_and_free_page(h, page);
>> spin_unlock(&hugetlb_lock);
>> - put_page(page);
>> return NULL;
>> } else {
>> h->surplus_huge_pages++;
>
>Yes, this makes sense. I would have to think about this more to be
>confident and give an Acked-by, but this looks sensible from a quick glance.
>
Would it be OK if I sent v2 without this patch, to leave more time for
discussion?
>Thanks!
>--
>Michal Hocko
>SUSE Labs
--
Wei Yang
Help you, Help me