Message-ID: <a8ac43c8-d5e7-4d47-89fb-394db5994fe1@os.amperecomputing.com>
Date: Tue, 13 Aug 2024 10:08:22 -0700
From: Yang Shi <yang@...amperecomputing.com>
To: Catalin Marinas <catalin.marinas@....com>
Cc: muchun.song@...ux.dev, will@...nel.org, akpm@...ux-foundation.org,
linux-mm@...ck.org, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] hugetlbfs: add MTE support
On 7/9/24 10:42 AM, Yang Shi wrote:
>
>
> On 7/4/24 6:44 AM, Catalin Marinas wrote:
>> On Tue, Jul 02, 2024 at 05:15:12PM -0700, Yang Shi wrote:
>>> On 7/2/24 5:04 PM, Yang Shi wrote:
>>>> On 7/2/24 5:34 AM, Catalin Marinas wrote:
>>>>> Last time I checked, about a year ago, this was not sufficient. One
>>>>> issue is that there's no arch_clear_hugetlb_flags() implemented by
>>>>> your
>>>>> patch, leaving PG_arch_{2,3} set on a page. The other issue was
>>>>> that I
>>>>> initially tried to do this only on the head page but this did not go
>>>>> well with the folio_copy() -> copy_highpage() which expects the
>>>>> PG_arch_* flags on each individual page. The alternative was for
>>>>> arch_clear_hugetlb_flags() to iterate over all the pages in a folio.
>>>> Thanks for pointing this out. I did miss this point. I took a quick
>>>> look at where the PG_ flags are set. IIUC, they are set by
>>>> post_alloc_hook() for order-0 anonymous folios (clearing the page and
>>>> tags) and by set_ptes() for everything else (just clearing tags), for
>>>> example THP and hugetlb.
>>>>
>>>> I can see THP does set the PG_mte_tagged flag on each subpage, but it
>>>> seems this is not done for hugetlb, if I read the code correctly. The
>>>> call path is:
>>>>
>>>> hugetlb_fault() ->
>>>>   hugetlb_no_page() ->
>>>>     set_huge_pte_at() ->
>>>>       __set_ptes() ->
>>>>         __sync_cache_and_tags() ->
>>>>           mte_sync_tags()
>>>>
>>>> And __set_ptes() itself is called in a loop in set_huge_pte_at():
>>>>
>>>>         if (!pte_present(pte)) {
>>>>                 for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
>>>>                         __set_ptes(mm, addr, ptep, pte, 1);
>>>>                 return;
>>>>         }
>>>>
>>>> The ncontig and pgsize are returned by num_contig_ptes(). For a 2M
>>>> hugetlb page, for example, ncontig is 1 and pgsize is 2M IIUC. So
>>>> actually just the head page gets PG_mte_tagged set. If so,
>>>> copy_highpage() will only copy the old head page's flag and tags to
>>>> the new head page; none of the subpages have PG_mte_tagged set.
>>>>
>>>> Is this the expected behavior? I suppose we need tags for every
>>>> subpage too, right?
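
(For context, my reading of num_contig_ptes() is roughly the below. This
is a paraphrased sketch of what is in arch/arm64/mm/hugetlbpage.c, not a
verbatim copy, so the exact cases may differ:)

/*
 * Paraphrased sketch, not verbatim kernel code. For a PMD-sized hugetlb
 * page (2M with 4K base pages) it returns ncontig = 1 and pgsize =
 * PMD_SIZE, so the __set_ptes() loop above runs only once and only the
 * head page ends up with PG_mte_tagged set.
 */
static inline int num_contig_ptes(unsigned long size, size_t *pgsize)
{
        int contig_ptes = 0;

        *pgsize = size;

        switch (size) {
        case PUD_SIZE:
        case PMD_SIZE:
                contig_ptes = 1;                /* single block mapping */
                break;
        case CONT_PMD_SIZE:
                *pgsize = PMD_SIZE;
                contig_ptes = CONT_PMDS;        /* contiguous PMD mappings */
                break;
        case CONT_PTE_SIZE:
                *pgsize = PAGE_SIZE;
                contig_ptes = CONT_PTES;        /* contiguous PTE mappings */
                break;
        }

        return contig_ptes;
}
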
>>> We would need something like the below to set up the tags and page
>>> flag for each subpage:
>>>
>>> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
>>> index 3f09ac73cce3..528164deef27 100644
>>> --- a/arch/arm64/mm/hugetlbpage.c
>>> +++ b/arch/arm64/mm/hugetlbpage.c
>>> @@ -228,9 +228,12 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>>>          int ncontig;
>>>          unsigned long pfn, dpfn;
>>>          pgprot_t hugeprot;
>>> +        unsigned long nr = sz >> PAGE_SHIFT;
>>>
>>>          ncontig = num_contig_ptes(sz, &pgsize);
>>>
>>> +        __sync_cache_and_tags(pte, nr);
>>> +
>>>          if (!pte_present(pte)) {
>>>                  for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
>>>                          __set_ptes(mm, addr, ptep, pte, 1);
>> We only need this for the last loop when we have a present pte. I'd also
>> avoid calling __set_ptes(..., 1) if we call __sync_cache_and_tags()
>> here already. Basically just unroll __set_ptes() in set_huge_pte_at()
>> and call __set_pte() directly.
>>
>> It might be better to convert those page flag checks to only happen on
>> the head page. My stashed changes from over a year ago (before we had
>> more folio conversions) are below. However, as I mentioned, I got stuck
>> on folio_copy(), which also does a cond_resched() between
>> copy_highpage() calls.
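
To make sure I understand the suggestion correctly, the direction I have
in mind is roughly the below. This is an untested sketch, loosely based
on the current structure of set_huge_pte_at(), so treat the details as
approximate:

/*
 * Untested sketch of the "unrolled" set_huge_pte_at(): sync the cache
 * and initialise tags once for the whole mapping, then write the PTEs
 * with __set_pte() directly instead of __set_ptes(..., 1).
 */
void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
                     pte_t *ptep, pte_t pte, unsigned long sz)
{
        size_t pgsize;
        int i, ncontig;
        unsigned long pfn, dpfn;
        pgprot_t hugeprot;

        ncontig = num_contig_ptes(sz, &pgsize);

        /* not present: no cache/tag maintenance needed */
        if (!pte_present(pte)) {
                for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
                        __set_pte(ptep, pte);
                return;
        }

        /* present pte: initialise the tags once for all subpages */
        __sync_cache_and_tags(pte, sz >> PAGE_SHIFT);

        if (!pte_cont(pte)) {
                __set_pte(ptep, pte);
                return;
        }

        pfn = pte_pfn(pte);
        dpfn = pgsize >> PAGE_SHIFT;
        hugeprot = pte_pgprot(pte);

        clear_flush(mm, addr, ptep, pgsize, ncontig);

        for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
                __set_pte(ptep, pfn_pte(pfn, hugeprot));
}
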
>
Ping...

Does this approach make sense and sound good? If so, I will try to
implement it.
> We can have the page flags set on the head page only for hugetlb pages.
> For copy_highpage(), we should be able to do something like the below:
>
> if page_is_head && page_is_hugetlb && page_has_mte_tagged
>     set page_mte_tagged flag
>     copy tags for all subpages
> else // <-- tail page or non-hugetlb page
>     current copy_highpage implementation
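
In (very rough) C, what I'm picturing for the arm64 copy_highpage() is
something like the below. The folio_*_hugetlb_mte_tagged() helpers do
not exist today; they are just placeholders for whatever head-page-only
flag accessors we end up with, and the non-hugetlb path is meant to stay
as it is:

/*
 * Rough sketch only, not a tested patch. The folio_*_hugetlb_mte_tagged()
 * helpers are hypothetical names for head-page-only MTE flag accessors.
 */
void copy_highpage(struct page *to, struct page *from)
{
        void *kto = page_address(to);
        void *kfrom = page_address(from);
        struct folio *src = page_folio(from);
        struct folio *dst = page_folio(to);
        long i, nr;

        copy_page(kto, kfrom);

        if (!system_supports_mte())
                return;

        if (folio_test_hugetlb(src)) {
                /* tail pages: tags are copied when the head page is copied */
                if (from != folio_page(src, 0) ||
                    !folio_test_hugetlb_mte_tagged(src))
                        return;

                WARN_ON_ONCE(!folio_try_hugetlb_mte_tagging(dst));
                nr = folio_nr_pages(src);
                for (i = 0; i < nr; i++)
                        mte_copy_page_tags(page_address(folio_page(dst, i)),
                                           page_address(folio_page(src, i)));
                folio_set_hugetlb_mte_tagged(dst);
                return;
        }

        /* non-hugetlb pages: keep the existing per-page behaviour */
        if (page_mte_tagged(from)) {
                /* It's a new page, shouldn't have been tagged yet */
                WARN_ON_ONCE(!try_page_mte_tagging(to));
                mte_copy_page_tags(kto, kfrom);
                set_page_mte_tagged(to);
        }
}
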
>
>
> The hugetlb folio can't go away under us since the migration path
> should pin it, so the state of the folio is stable. The preemption
> caused by cond_resched() should be fine too: because of the pin, the
> page table entry remains a migration entry until the migration is
> done, so everyone else just sees the migration entry and waits for
> the migration to finish.
>
> The other user of copy_highpage() of concern is uprobes, but it also
> pins the page before doing the copy, and it is called while holding
> the write mmap_lock.
>
> IIUC, this should work unless I'm missing something. It also should
> have no impact on HVO (HugeTLB Vmemmap Optimization). The overhead for
> other users of copy_highpage() should also be acceptable.
>
>>
>> ---------8<--------------------------------
>> diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
>> index f5bcb0dc6267..a84ad0e46f12 100644
>> --- a/arch/arm64/kernel/mte.c
>> +++ b/arch/arm64/kernel/mte.c
>> @@ -49,7 +49,9 @@ static void mte_sync_page_tags(struct page *page, pte_t old_pte,
>>                  return;
>>
>>          if (try_page_mte_tagging(page)) {
>> -                mte_clear_page_tags(page_address(page));
>> +                unsigned long i, nr_pages = compound_nr(page);
>> +                for (i = 0; i < nr_pages; i++)
>> +                        mte_clear_page_tags(page_address(page + i));
>>                  set_page_mte_tagged(page);
>>          }
>>  }
>> @@ -57,22 +59,23 @@ static void mte_sync_page_tags(struct page *page, pte_t old_pte,
>>  void mte_sync_tags(pte_t old_pte, pte_t pte)
>>  {
>>          struct page *page = pte_page(pte);
>> -        long i, nr_pages = compound_nr(page);
>> -        bool check_swap = nr_pages == 1;
>> +        bool check_swap = !PageCompound(page);
>>          bool pte_is_tagged = pte_tagged(pte);
>>
>>          /* Early out if there's nothing to do */
>>          if (!check_swap && !pte_is_tagged)
>>                  return;
>>
>> +        /*
>> +         * HugeTLB pages are always fully mapped, so only setting head page's
>> +         * PG_mte_* flags is enough.
>> +         */
>> +        if (PageHuge(page))
>> +                page = compound_head(page);
>> +
>>          /* if PG_mte_tagged is set, tags have already been initialised */
>> -        for (i = 0; i < nr_pages; i++, page++) {
>> -                if (!page_mte_tagged(page)) {
>> -                        mte_sync_page_tags(page, old_pte, check_swap,
>> -                                           pte_is_tagged);
>> -                        set_page_mte_tagged(page);
>> -                }
>> -        }
>> +        if (!page_mte_tagged(page))
>> +                mte_sync_page_tags(page, old_pte, check_swap, pte_is_tagged);
>>
>>          /* ensure the tags are visible before the PTE is set */
>>          smp_wmb();
>> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
>> index 5626ddb540ce..692198d8c0d2 100644
>> --- a/arch/arm64/kvm/guest.c
>> +++ b/arch/arm64/kvm/guest.c
>> @@ -1079,7 +1079,7 @@ long kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
>>                  /* uaccess failed, don't leave stale tags */
>>                  if (num_tags != MTE_GRANULES_PER_PAGE)
>> -                        mte_clear_page_tags(page);
>> +                        mte_clear_page_tags(page_address(page));
>>                  set_page_mte_tagged(page);
>>
>>                  kvm_release_pfn_dirty(pfn);
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index 31d7fa4c7c14..b531b1d75cda 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -1173,11 +1173,10 @@ static void sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
>>          if (!kvm_has_mte(kvm))
>>                  return;
>>
>> -        for (i = 0; i < nr_pages; i++, page++) {
>> -                if (try_page_mte_tagging(page)) {
>> -                        mte_clear_page_tags(page_address(page));
>> -                        set_page_mte_tagged(page);
>> -                }
>> +        if (try_page_mte_tagging(page)) {
>> +                for (i = 0; i < nr_pages; i++)
>> +                        mte_clear_page_tags(page_address(page + i));
>> +                set_page_mte_tagged(page);
>>          }
>>  }
>>
>