Message-ID: <0d52d680-f3d3-454f-8c12-602f650469ab@arm.com>
Date: Wed, 6 Aug 2025 15:07:49 +0530
From: Dev Jain <dev.jain@....com>
To: David Hildenbrand <david@...hat.com>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: akpm@...ux-foundation.org, ryan.roberts@....com, willy@...radead.org,
linux-mm@...ck.org, linux-kernel@...r.kernel.org, catalin.marinas@....com,
will@...nel.org, Liam.Howlett@...cle.com, vbabka@...e.cz, jannh@...gle.com,
anshuman.khandual@....com, peterx@...hat.com, joey.gouly@....com,
ioworker0@...il.com, baohua@...nel.org, kevin.brodsky@....com,
quic_zhenhuah@...cinc.com, christophe.leroy@...roup.eu,
yangyicong@...ilicon.com, linux-arm-kernel@...ts.infradead.org,
hughd@...gle.com, yang@...amperecomputing.com, ziy@...dia.com
Subject: Re: [PATCH v5 6/7] mm: Optimize mprotect() by PTE batching
On 06/08/25 2:51 pm, David Hildenbrand wrote:
> On 06.08.25 11:12, Lorenzo Stoakes wrote:
>> On Wed, Aug 06, 2025 at 10:08:33AM +0200, David Hildenbrand wrote:
>>> On 18.07.25 11:02, Dev Jain wrote:
>>>> Signed-off-by: Dev Jain <dev.jain@....com>
>>>
>>>
>>> I wanted to review this, but looks like it's already upstream and I
>>> suspect it's buggy (see the upstream report I cc'ed you on)
>>>
>>> [...]
>>>
>>>> +
>>>> +/*
>>>> + * This function is a result of trying our very best to retain the
>>>> + * "avoid the write-fault handler" optimization. In can_change_pte_writable(),
>>>> + * if the vma is a private vma, and we cannot determine whether to change
>>>> + * the pte to writable just from the vma and the pte, we then need to look
>>>> + * at the actual page pointed to by the pte. Unfortunately, if we have a
>>>> + * batch of ptes pointing to consecutive pages of the same anon large folio,
>>>> + * the anon-exclusivity (or the negation) of the first page does not guarantee
>>>> + * the anon-exclusivity (or the negation) of the other pages corresponding to
>>>> + * the pte batch; hence in this case it is incorrect to decide to change or
>>>> + * not change the ptes to writable just by using information from the first
>>>> + * pte of the batch. Therefore, we must individually check all pages and
>>>> + * retrieve sub-batches.
>>>> + */
>>>> +static void commit_anon_folio_batch(struct vm_area_struct *vma,
>>>> +		struct folio *folio, unsigned long addr, pte_t *ptep,
>>>> +		pte_t oldpte, pte_t ptent, int nr_ptes, struct mmu_gather *tlb)
>>>> +{
>>>> +	struct page *first_page = folio_page(folio, 0);
>>>
>>> Who says that we have the first page of the folio mapped into the
>>> first PTE of the batch?
>>
>> Yikes, missed this, sorry. Got too tied up in the algorithm here.
>>
>> You mean in _this_ PTE of the batch, right? As we're invoking these on
>> each part of the PTE table.
>>
>> I mean I guess we can simply do:
>>
>> struct page *first_page = pte_page(ptent);
>>
>> Right?
>
> Yes, but we should forward the result from vm_normal_page(), which does
> exactly that for you, and increment the page accordingly as required,
> just like with the pte we are processing.
Makes sense, so I guess I will have to change the signature of
prot_numa_skip() to pass a double pointer to a page instead of a folio,
derive the folio in the caller, and pass both the folio and the page
down to set_write_prot_commit_flush_ptes().
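
Roughly, a sketch of what that could look like (untested, hypothetical
shape; page_anon_exclusive_sub_batch() is an assumed helper name, and
prot_commit_flush_ptes()'s signature is taken from the quoted patch):

/*
 * Sketch: count how many consecutive pages, starting at first_page,
 * share the same PageAnonExclusive() state as the first one.
 */
static int page_anon_exclusive_sub_batch(int start_idx, int max_len,
		struct page *first_page, bool expected_anon_exclusive)
{
	int idx;

	for (idx = start_idx + 1; idx < start_idx + max_len; ++idx) {
		if (expected_anon_exclusive !=
		    PageAnonExclusive(first_page + idx - start_idx))
			break;
	}
	return idx - start_idx;
}

/*
 * Sketch: first_page is forwarded from vm_normal_page() in the caller,
 * so it matches the first pte of this batch even when the batch does
 * not start at page 0 of the folio.
 */
static void commit_anon_folio_batch(struct vm_area_struct *vma,
		struct folio *folio, struct page *first_page,
		unsigned long addr, pte_t *ptep, pte_t oldpte, pte_t ptent,
		int nr_ptes, struct mmu_gather *tlb)
{
	int sub_batch_idx = 0;
	int len;

	while (nr_ptes) {
		bool anon_exclusive = PageAnonExclusive(first_page);

		len = page_anon_exclusive_sub_batch(sub_batch_idx, nr_ptes,
						    first_page, anon_exclusive);
		/* set the ptes writable only for an all-exclusive sub-batch */
		prot_commit_flush_ptes(vma, addr, ptep, oldpte, ptent, len,
				       sub_batch_idx,
				       /* set_write = */ anon_exclusive, tlb);
		sub_batch_idx += len;
		first_page += len;	/* keep the page in sync with the ptes */
		nr_ptes -= len;
	}
}

The caller would then do page = vm_normal_page(vma, addr, oldpte),
derive the folio via page_folio(page), and pass both down, instead of
using folio_page(folio, 0).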
>
> ...
>
>>>
>>>> +	else
>>>> +		prot_commit_flush_ptes(vma, addr, pte, oldpte, ptent,
>>>> +				nr_ptes, /* idx = */ 0, /* set_write = */ false, tlb);
>>>
>>> Semi-broken indentation.
>>
>> Because of the else, then the two lines after?
>
> prot_commit_flush_ptes(vma, addr, pte, oldpte, ptent,
>                        nr_ptes, /* idx = */ 0, /* set_write = */ false, tlb);
>
> Is what I would have expected.
>
>
> I think a smart man once said that if you need more than one line per
> statement in an if/else clause, a set of {} can aid readability. But I
> don't particularly care :)
>
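
For reference, the braced form being suggested would look something like:

	} else {
		prot_commit_flush_ptes(vma, addr, pte, oldpte, ptent,
				       nr_ptes, /* idx = */ 0,
				       /* set_write = */ false, tlb);
	}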