Message-ID: <483fd440-df7b-fab3-b138-f3789f2dc078@intel.com>
Date: Mon, 20 Mar 2023 21:38:55 +0800
From: "Yin, Fengwei" <fengwei.yin@...el.com>
To: Matthew Wilcox <willy@...radead.org>
CC: Ryan Roberts <ryan.roberts@....com>, <linux-arch@...r.kernel.org>,
<will@...nel.org>, <linux-mm@...ck.org>,
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v4 35/36] mm: Convert do_set_pte() to set_pte_range()

Hi Matthew,

On 3/17/2023 11:44 AM, Matthew Wilcox wrote:
> On Fri, Mar 17, 2023 at 09:58:17AM +0800, Yin, Fengwei wrote:
>>
>>
>> On 3/17/2023 1:52 AM, Matthew Wilcox wrote:
>>> On Thu, Mar 16, 2023 at 04:38:58PM +0000, Ryan Roberts wrote:
>>>> On 16/03/2023 16:23, Yin, Fengwei wrote:
>>>>>> I think you are changing behavior here - is this intentional? Previously this
>>>>>> would be evaluated per page, now it's evaluated once for the whole range. The
>>>>>> intention below is that directly faulted pages are mapped young and prefaulted
>>>>>> pages are mapped old. But now a whole range will be mapped the same.
>>>>>
>>>>> Yes. You are right here.
>>>>>
>>>>> Looking at prefault and cpu_has_hw_af for ARM64, it looks like we
>>>>> can avoid handling vmf->address == addr specially. It's OK to
>>>>> drop prefault and change the logic here a little bit to:
>>>>>         if (arch_wants_old_prefaulted_pte())
>>>>>                 entry = pte_mkold(entry);
>>>>>         else
>>>>>                 entry = pte_sw_mkyoung(entry);
>>>>>
>>>>> It's not necessary to use pte_sw_mkyoung() for vmf->address == addr
>>>>> because HW will set the Access bit in the page table entry.
>>>>>
>>>>> Add Will Deacon in case I missed something here. Thanks.
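
For reference, the arm64 hook mentioned above was added by commit
46bdb4277f98 ("mm: Allow architectures to request 'old' entries when
prefaulting"):

	/* arch/arm64/include/asm/pgtable.h: prefaulted PTEs can start
	 * "old" because the CPU sets the Access flag in hardware. */
	#define arch_wants_old_prefaulted_pte	cpu_has_hw_af
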
>>>>
>>>> I'll defer to Will's response, but not all arm HW supports HW access flag
>>>> management. In that case it's done by SW, so I would imagine that by setting
>>>> this to old initially, we will get a second fault to set the access bit, which
>>>> will slow things down. I wonder if you will need to split this into (up to) 3
>>>> calls to set_ptes()?
>>>
>>> I don't think we should do that. The limited information I have from
>>> various microarchitectures is that the PTEs must differ only in their
>>> PFN bits in order to use larger TLB entries. That includes the Accessed
>>> bit (or equivalent). So we should mkyoung all the PTEs in the same
>>> folio, at least initially.
>>>
>>> That said, we should still do this conditionally. We'll prefault some
>>> other folios too. So I think this should be:
>>>
>>> bool prefault = (addr > vmf->address) || ((addr + nr * PAGE_SIZE) <= vmf->address);
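
For context, this flag would feed the same old/young choice that
do_set_pte() makes today (mm/memory.c, from commit 46bdb4277f98),
roughly:

	if (prefault && arch_wants_old_prefaulted_pte())
		entry = pte_mkold(entry);
	else
		entry = pte_sw_mkyoung(entry);
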
>>>
>> According to commit 46bdb4277f98e70d0c91f4289897ade533fe9e80, if the hardware
>> access flag is supported on ARM64, there is a benefit to setting prefault PTEs
>> as "old". If we change prefault like above, the PTEs are set as "young", which
>> loses that benefit on ARM64 with the hardware access flag.
>>
>> OTOH, if going from "old" to "young" is cheap, why not leave all PTEs of the
>> folio as "old" and let the hardware update them to "young"?
>
> Because we're tracking the entire folio as a single entity. So we're
> better off avoiding the extra pagefaults to update the accessed bit,
> which won't actually give us any information (vmscan needs to know "were
> any of the accessed bits set", not "how many of them were set").
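
As a sketch of that point (folio_range_any_young() is a hypothetical
helper, not a kernel API): reclaim only needs one bit of information
per folio, so the per-PTE young bits collapse into a single boolean.

	static bool folio_range_any_young(pte_t *ptep, unsigned int nr)
	{
		bool young = false;
		unsigned int i;

		/* "Were any of the accessed bits set" for the range. */
		for (i = 0; i < nr; i++)
			young |= pte_young(ptep_get(ptep + i));
		return young;
	}
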
>
> Anyway, hopefully Ryan can test this and let us know if it fixes the
> regression he sees.

Thanks a lot to Ryan for helping to test the debug patch I made. Ryan
confirmed that the following change fixes the kernel build regression:

diff --git a/mm/filemap.c b/mm/filemap.c
index db86e459dde6..343d6ff36b2c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3557,7 +3557,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 		ret |= filemap_map_folio_range(vmf, folio,
 				xas.xa_index - folio->index, addr, nr_pages);
-		xas.xa_index += nr_pages;
+		xas.xa_index += folio_test_large(folio) ? nr_pages : 0;

 		folio_unlock(folio);
 		folio_put(folio);

I will make the upstream-able change "xas.xa_index += nr_pages - 1;", so
that xa_index is left on the last page handled: next_uptodate_folio()
advances the iterator itself when it looks for the next folio.
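
A minimal sketch of the loop this lands in (simplified from
filemap_map_pages(); locking and accounting details omitted):

	do {
		/* Map nr_pages PTEs for this folio in one call. */
		ret |= filemap_map_folio_range(vmf, folio,
				xas.xa_index - folio->index, addr, nr_pages);
		/*
		 * Leave xa_index on the last page handled; the
		 * iterator's own advance in next_uptodate_folio()
		 * supplies the final step to the next entry.
		 */
		xas.xa_index += nr_pages - 1;

		folio_unlock(folio);
		folio_put(folio);
	} while ((folio = next_uptodate_folio(&xas, mapping, end_pgoff)) != NULL);
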
Ryan and I also identified some other changes that are needed. I am not
sure how to integrate those changes into this series. Maybe an add-on
patch after this series? Thanks.

Regards
Yin, Fengwei