[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ed175cd4-1411-459e-e892-7d889e1253c0@huawei.com>
Date: Wed, 23 Mar 2022 10:31:06 +0800
From: Miaohe Lin <linmiaohe@...wei.com>
To: Zi Yan <ziy@...dia.com>
CC: Roman Gushchin <roman.gushchin@...ux.dev>,
Shuah Khan <shuah@...nel.org>, Yang Shi <shy828301@...il.com>,
Hugh Dickins <hughd@...gle.com>,
"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
<linux-kernel@...r.kernel.org>, <cgroups@...r.kernel.org>,
<linux-kselftest@...r.kernel.org>,
Matthew Wilcox <willy@...radead.org>, <linux-mm@...ck.org>,
Yu Zhao <yuzhao@...gle.com>
Subject: Re: [RFC PATCH 3/5] mm: thp: split huge page to any lower order
pages.
On 2022/3/22 22:30, Zi Yan wrote:
> On 21 Mar 2022, at 23:21, Miaohe Lin wrote:
>
>> On 2022/3/21 22:21, Zi Yan wrote:
>>> From: Zi Yan <ziy@...dia.com>
>>>
>>> To split a THP to any lower order pages, we need to reform THPs on
>>> subpages at given order and add page refcount based on the new page
>>> order. Also we need to reinitialize page_deferred_list after removing
>>> the page from the split_queue, otherwise a subsequent split will see
>>> list corruption when checking the page_deferred_list again.
>>>
>>> It has many uses, like minimizing the number of pages after
>>> truncating a pagecache THP. For anonymous THPs, we can only split them
>>> to order-0 like before until we add support for any size anonymous THPs.
>>>
>>> Signed-off-by: Zi Yan <ziy@...dia.com>
>>> ---
>>> include/linux/huge_mm.h | 8 +++
>>> mm/huge_memory.c | 111 ++++++++++++++++++++++++++++++----------
>>> 2 files changed, 91 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>> index 2999190adc22..c7153cd7e9e4 100644
>>> --- a/include/linux/huge_mm.h
>>> +++ b/include/linux/huge_mm.h
>>> @@ -186,6 +186,8 @@ void free_transhuge_page(struct page *page);
>>>
>>> bool can_split_folio(struct folio *folio, int *pextra_pins);
>>> int split_huge_page_to_list(struct page *page, struct list_head *list);
>>> +int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>>> + unsigned int new_order);
>>> static inline int split_huge_page(struct page *page)
>>> {
>>> return split_huge_page_to_list(page, NULL);
>>> @@ -355,6 +357,12 @@ split_huge_page_to_list(struct page *page, struct list_head *list)
>>> {
>>> return 0;
>>> }
>>> +static inline int
>>> +split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>>> + unsigned int new_order)
>>> +{
>>> + return 0;
>>> +}
>>> static inline int split_huge_page(struct page *page)
>>> {
>>> return 0;
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index fcfa46af6c4c..3617aa3ad0b1 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -2236,11 +2236,13 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma,
>>> static void unmap_page(struct page *page)
>>> {
>>> struct folio *folio = page_folio(page);
>>> - enum ttu_flags ttu_flags = TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD |
>>> - TTU_SYNC;
>>> + enum ttu_flags ttu_flags = TTU_RMAP_LOCKED | TTU_SYNC;
>>>
>>> VM_BUG_ON_PAGE(!PageHead(page), page);
>>>
>>> + if (folio_order(folio) >= HPAGE_PMD_ORDER)
>>> + ttu_flags |= TTU_SPLIT_HUGE_PMD;
>>> +
>>> /*
>>> * Anon pages need migration entries to preserve them, but file
>>> * pages can simply be left unmapped, then faulted back on demand.
>>> @@ -2254,9 +2256,9 @@ static void unmap_page(struct page *page)
>>> VM_WARN_ON_ONCE_PAGE(page_mapped(page), page);
>>> }
>>>
>>> -static void remap_page(struct folio *folio, unsigned long nr)
>>> +static void remap_page(struct folio *folio, unsigned short nr)
>>> {
>>> - int i = 0;
>>> + unsigned int i;
>>>
>>> /* If unmap_page() uses try_to_migrate() on file, remove this check */
>>> if (!folio_test_anon(folio))
>>> @@ -2274,7 +2276,6 @@ static void lru_add_page_tail(struct page *head, struct page *tail,
>>> struct lruvec *lruvec, struct list_head *list)
>>> {
>>> VM_BUG_ON_PAGE(!PageHead(head), head);
>>> - VM_BUG_ON_PAGE(PageCompound(tail), head);
>>> VM_BUG_ON_PAGE(PageLRU(tail), head);
>>> lockdep_assert_held(&lruvec->lru_lock);
>>>
>>> @@ -2295,9 +2296,10 @@ static void lru_add_page_tail(struct page *head, struct page *tail,
>>> }
>>>
>>> static void __split_huge_page_tail(struct page *head, int tail,
>>> - struct lruvec *lruvec, struct list_head *list)
>>> + struct lruvec *lruvec, struct list_head *list, unsigned int new_order)
>>> {
>>> struct page *page_tail = head + tail;
>>> + unsigned long compound_head_flag = new_order ? (1L << PG_head) : 0;
>>>
>>> VM_BUG_ON_PAGE(atomic_read(&page_tail->_mapcount) != -1, page_tail);
>>>
>>> @@ -2321,6 +2323,7 @@ static void __split_huge_page_tail(struct page *head, int tail,
>>> #ifdef CONFIG_64BIT
>>> (1L << PG_arch_2) |
>>> #endif
>>> + compound_head_flag |
>>> (1L << PG_dirty)));
>>>
>>> /* ->mapping in first tail page is compound_mapcount */
>>> @@ -2329,7 +2332,10 @@ static void __split_huge_page_tail(struct page *head, int tail,
>>> page_tail->mapping = head->mapping;
>>> page_tail->index = head->index + tail;
>>>
>>> - /* Page flags must be visible before we make the page non-compound. */
>>> + /*
>>> + * Page flags must be visible before we make the page non-compound or
>>> + * a compound page in new_order.
>>> + */
>>> smp_wmb();
>>>
>>> /*
>>> @@ -2339,10 +2345,15 @@ static void __split_huge_page_tail(struct page *head, int tail,
>>> * which needs correct compound_head().
>>> */
>>> clear_compound_head(page_tail);
>>> + if (new_order) {
>>> + prep_compound_page(page_tail, new_order);
>>> + prep_transhuge_page(page_tail);
>>> + }
>>
>> Many thanks for your series. It looks really good. One question:
>> IIUC, It seems there has assumption that LRU compound_pages should
>> be PageTransHuge. So PageTransHuge just checks PageHead:
>>
>> static inline int PageTransHuge(struct page *page)
>> {
>> VM_BUG_ON_PAGE(PageTail(page), page);
>> return PageHead(page);
>> }
>>
>> So LRU pages with any order( > 0) will might be wrongly treated as THP which
>> has order = HPAGE_PMD_ORDER. We should ensure thp_nr_pages is used instead of
>> hard coded HPAGE_PMD_ORDER.
>>
>> Looks at the below code snippet:
>> mm/mempolicy.c:
>> static struct page *new_page(struct page *page, unsigned long start)
>> {
>> ...
>> } else if (PageTransHuge(page)) {
>> struct page *thp;
>>
>> thp = alloc_hugepage_vma(GFP_TRANSHUGE, vma, address,
>> HPAGE_PMD_ORDER);
>> ^^^^^^^^^^^^^^^^
>> if (!thp)
>> return NULL;
>> prep_transhuge_page(thp);
>> return thp;
>> }
>> ...
>> }
>>
>> HPAGE_PMD_ORDER is used instead of thp_nr_pages. So the lower order pages might be
>> used as if its order is HPAGE_PMD_ORDER. All of such usage might need to be fixed.
>> Or am I miss something ?
>>
>> Thanks again for your work. :)
>
> THP will still only have HPAGE_PMD_ORDER and will not be split into any order
> other than 0. This series only allows to split huge page cache folio (added by Matthew)
> into any lower order. I have an explicit VM_BUG_ON() to ensure new_order
> is only 0 when non page cache page is the input. Since there is still non-trivial
> amount of work to add any order THP support in the kernel. IIRC, Yu Zhao (cc’d) was
> planning to work on that.
>
Many thanks for clarifying. I'm sorry but I haven't followed Matthew's patches. I am
wondering could huge page cache folio be treated as THP ? If so, how to ensure the
correctness of huge page cache ?
Thanks again!
> Thanks for checking the patches.
BTW: I like your patches. It's really interesting. :)
>
Powered by blists - more mailing lists