[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1fad03a6-98cf-1b0e-e012-82dc6466c7d2@linux.alibaba.com>
Date: Sun, 8 May 2022 17:19:43 +0800
From: Baolin Wang <baolin.wang@...ux.alibaba.com>
To: Mike Kravetz <mike.kravetz@...cle.com>, akpm@...ux-foundation.org,
catalin.marinas@....com, will@...nel.org
Cc: tsbogend@...ha.franken.de, James.Bottomley@...senPartnership.com,
deller@....de, mpe@...erman.id.au, benh@...nel.crashing.org,
paulus@...ba.org, hca@...ux.ibm.com, gor@...ux.ibm.com,
agordeev@...ux.ibm.com, borntraeger@...ux.ibm.com,
svens@...ux.ibm.com, ysato@...rs.sourceforge.jp, dalias@...c.org,
davem@...emloft.net, arnd@...db.de,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-ia64@...r.kernel.org, linux-mips@...r.kernel.org,
linux-parisc@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
linux-s390@...r.kernel.org, linux-sh@...r.kernel.org,
sparclinux@...r.kernel.org, linux-arch@...r.kernel.org,
linux-mm@...ck.org
Subject: Re: [PATCH 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when
migration
On 5/7/2022 10:33 AM, Baolin Wang wrote:
>
>
> On 5/7/2022 1:56 AM, Mike Kravetz wrote:
>> On 5/5/22 20:39, Baolin Wang wrote:
>>>
>>> On 5/6/2022 7:53 AM, Mike Kravetz wrote:
>>>> On 4/29/22 01:14, Baolin Wang wrote:
>>>>> On some architectures (like ARM64), it can support CONT-PTE/PMD size
>>>>> hugetlb, which means it can support not only PMD/PUD size hugetlb:
>>>>> 2M and 1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page
>>>>> size specified.
>>>> <snip>
>>>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>>>> index 6fdd198..7cf2408 100644
>>>>> --- a/mm/rmap.c
>>>>> +++ b/mm/rmap.c
>>>>> @@ -1924,13 +1924,15 @@ static bool try_to_migrate_one(struct folio
>>>>> *folio, struct vm_area_struct *vma,
>>>>> break;
>>>>> }
>>>>> }
>>>>> +
>>>>> + /* Nuke the hugetlb page table entry */
>>>>> + pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
>>>>> } else {
>>>>> flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
>>>>> + /* Nuke the page table entry. */
>>>>> + pteval = ptep_clear_flush(vma, address, pvmw.pte);
>>>>> }
>>>>
>>>> On arm64 with CONT-PTE/PMD the returned pteval will have dirty or
>>>> young set
>>>> if ANY of the PTE/PMDs had dirty or young set.
>>>
>>> Right.
>>>
>>>>
>>>>> - /* Nuke the page table entry. */
>>>>> - pteval = ptep_clear_flush(vma, address, pvmw.pte);
>>>>> -
>>>>> /* Set the dirty flag on the folio now the pte is gone. */
>>>>> if (pte_dirty(pteval))
>>>>> folio_mark_dirty(folio);
>>>>> @@ -2015,7 +2017,10 @@ static bool try_to_migrate_one(struct folio
>>>>> *folio, struct vm_area_struct *vma,
>>>>> pte_t swp_pte;
>>>>> if (arch_unmap_one(mm, vma, address, pteval) < 0) {
>>>>> - set_pte_at(mm, address, pvmw.pte, pteval);
>>>>> + if (folio_test_hugetlb(folio))
>>>>> + set_huge_pte_at(mm, address, pvmw.pte, pteval);
>>>>
>>>> And, we will use that pteval for ALL the PTE/PMDs here. So, we
>>>> would set
>>>> the dirty or young bit in ALL PTE/PMDs.
>>>>
>>>> Could that cause any issues? May be more of a question for the
>>>> arm64 people.
>>>
>>> I don't think this will cause any issues. Since the hugetlb can not
>>> be split, and we should not lose the the dirty or young state if any
>>> subpages were set. Meanwhile we already did like this in hugetlb.c:
>>>
>>> pte = huge_ptep_get_and_clear(mm, address, ptep);
>>> tlb_remove_huge_tlb_entry(h, tlb, ptep, address);
>>> if (huge_pte_dirty(pte))
>>> set_page_dirty(page);
>>>
>>
>> Agree that it 'should not' cause issues. It just seems inconsistent.
>> This is not a problem specifically with your patch, just the handling of
>> CONT-PTE/PMD entries.
>>
>> There does not appear to be an arm64 specific version of huge_ptep_get()
>> that takes CONT-PTE/PMD into account. So, huge_ptep_get() would only
>> return the one specific value. It would not take into account the dirty
>> or young bits of CONT-PTE/PMDs like your new version of
>> huge_ptep_get_and_clear. Is that correct? Or, am I missing something.
>
> Yes, you are right.
>
>>
>> If I am correct, then code like the following may not work:
>>
>> static int gather_hugetlb_stats(pte_t *pte, unsigned long hmask,
>> unsigned long addr, unsigned long end, struct mm_walk
>> *walk)
>> {
>> pte_t huge_pte = huge_ptep_get(pte);
>> struct numa_maps *md;
>> struct page *page;
>>
>> if (!pte_present(huge_pte))
>> return 0;
>>
>> page = pte_page(huge_pte);
>>
>> md = walk->private;
>> gather_stats(page, md, pte_dirty(huge_pte), 1);
>> return 0;
>> }
>
> Right, this is inconsistent with current huge_ptep_get() interface like
> you said. So I think we can define an ARCH-specific huge_ptep_get()
> interface for arm64, and some sample code like below. How do you think?
After some investigation, I send out a RFC patch set[1] to address this
issue. We can talk about this issue in that thread. Thanks.
[1]
https://lore.kernel.org/all/cover.1651998586.git.baolin.wang@linux.alibaba.com/
Powered by blists - more mailing lists