[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <87354p5lw8.fsf@linux.ibm.com>
Date: Mon, 24 Apr 2023 21:10:23 +0530
From: Tarun Sahu <tsahu@...ux.ibm.com>
To: Mike Kravetz <mike.kravetz@...cle.com>,
Matthew Wilcox <willy@...radead.org>
Cc: linux-mm@...ck.org, akpm@...ux-foundation.org,
muchun.song@...ux.dev, aneesh.kumar@...ux.ibm.com,
sidhartha.kumar@...cle.com, gerald.schaefer@...ux.ibm.com,
linux-kernel@...r.kernel.org, jaypatel@...ux.ibm.com
Subject: Re: [PATCH] mm/folio: Avoid special handling for order value 0 in
folio_set_order
Hi Mike,
Mike Kravetz <mike.kravetz@...cle.com> writes:
> On 04/14/23 21:12, Matthew Wilcox wrote:
>> On Sat, Apr 15, 2023 at 01:18:32AM +0530, Tarun Sahu wrote:
>> > folio_set_order(folio, 0); which is an abuse of folio_set_order as 0-order
>> > folio does not have any tail page to set order.
>>
>> I think you're missing the point of how folio_set_order() is used.
>> When splitting a large folio, we need to zero out the folio_nr_pages
>> in the tail, so it does have a tail page, and that tail page needs to
>> be zeroed. We even assert that there is a tail page:
>>
>> if (WARN_ON_ONCE(!folio_test_large(folio)))
>> return;
>>
>> Or maybe you need to explain yourself better.
>>
>> > folio->_folio_nr_pages is
>> > set to 0 for order 0 in folio_set_order. It is required because
>> > _folio_nr_pages overlapped with page->mapping and leaving it non zero
>> > caused "bad page" error while freeing gigantic hugepages. This was fixed in
>> > Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic
>> > pages"). Also commit a01f43901cfb ("hugetlb: be sure to free demoted CMA
>> > pages to CMA") now explicitly clear page->mapping and hence we won't see
>> > the bad page error even if _folio_nr_pages remains unset. Also the order 0
>> > folios are not supposed to call folio_set_order, So now we can get rid of
>> > folio_set_order(folio, 0) from hugetlb code path to clear the confusion.
>>
>> ... this is all very confusing.
>>
>> > The patch also moves _folio_set_head and folio_set_order calls in
>> > __prep_compound_gigantic_folio() such that we avoid clearing them in the
>> > error path.
>>
>> But don't we need those bits set while we operate on the folio to set it
>> up? It makes me nervous if we don't have those bits set because we can
>> end up with speculative references that point to a head page while that
>> page is not marked as a head page. It may not be a problem, but I want
>> to see some air-tight analysis of that.
>
> I am fairly certain we are 'safe'. Here is code before setting up the
> pointer to the head page.
>
> * In the case of demote, the ref count will be zero.
> */
> if (!demote) {
> if (!page_ref_freeze(p, 1)) {
> pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n");
> goto out_error;
> }
> } else {
> VM_BUG_ON_PAGE(page_count(p), p);
> }
> if (i != 0)
> set_compound_head(p, &folio->page);
>
> So, before setting the pointer to head page ref count will be zero.
>
> I 'think' it would actually be better to move the calls to _folio_set_head and
> folio_set_order in __prep_compound_gigantic_folio() as suggested here. Why?
> In the current code, the ref count on the 'head page' is still 1 (or more)
> while those calls are made. So, someone could take a speculative ref on the
> page BEFORE the tail pages are set up.
>
Thanks, for confirming the correctness of moving these calls. Also I
didn't look at it this way while moving them. Thanks for the comment.
I will update the commit msg and send the v2.
~Tarun
> TBH, I do not have much of an opinion about potential confusion surrounding
> folio_set_compound_order(folio, 0). IIUC, hugetlb gigantic page setup is the
> only place outside the page allocation code that sets up compound pages/large
> folios. So, it is going to be a bit 'special'. As mentioned, when this was
> originally discussed I suggested folio_clear_order(). I would be happy with
> either.
> --
> Mike Kravetz
Powered by blists - more mailing lists