Message-ID: <CAGsJ_4yeygfzna6SRG3poD9cXhFNz21-he9psiKvMTMG8WBgmg@mail.gmail.com>
Date: Wed, 22 Oct 2025 23:22:02 +1300
From: Barry Song <21cnbao@...il.com>
To: "Huang, Ying" <ying.huang@...ux.alibaba.com>
Cc: Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>, David Hildenbrand <david@...hat.com>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Vlastimil Babka <vbabka@...e.cz>, Zi Yan <ziy@...dia.com>,
Baolin Wang <baolin.wang@...ux.alibaba.com>, Ryan Roberts <ryan.roberts@....com>,
Yang Shi <yang@...amperecomputing.com>, "Christoph Lameter (Ampere)" <cl@...two.org>, Dev Jain <dev.jain@....com>,
Anshuman Khandual <anshuman.khandual@....com>, Yicong Yang <yangyicong@...ilicon.com>,
Kefeng Wang <wangkefeng.wang@...wei.com>, Kevin Brodsky <kevin.brodsky@....com>,
Yin Fengwei <fengwei_yin@...ux.alibaba.com>, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH -v2 2/2] arm64, tlbflush: don't TLBI broadcast if page
reused in write fault
On Wed, Oct 22, 2025 at 10:55 PM Barry Song <21cnbao@...il.com> wrote:
>
> On Wed, Oct 22, 2025 at 10:46 PM Huang, Ying
> <ying.huang@...ux.alibaba.com> wrote:
>
> > >
> > > I agree. Yet the ish barrier can still avoid the page faults during CPU0's PTL.
> >
> > IIUC, you think that dsb(ish) compared with dsb(nsh) can accelerate
> > memory writing (visible to other CPUs). TBH, I suspect that this is the
> > case.
>
> Why? In any case, nsh is not a smp domain.
>
> I believe a dmb(ishst) is sufficient to ensure that the new PTE writes
> are visible to other CPUs. I’m not quite sure why the current flush code
> uses dsb(ish); it seems like overkill.

On second thought, the PTE/page table walker might not be a typical SMP
sync case, so a dmb may not be sufficient: we are not dealing with standard
load/store instruction sequences across multiple threads. In any case, my
point is that dsb(ish) might be slightly slower than your dsb(nsh), but it
makes the PTE visible to other CPUs earlier and helps avoid some page faults
after we’ve written the PTE. However, if your current nsh version actually
provides better performance, even when multiple threads may access the data
simultaneously, it should be completely fine.

Now you are:

    write pte
    don't broadcast pte
    tlbi
    don't broadcast tlbi

we might be:

    write pte
    broadcast pte
    tlbi
    don't broadcast tlbi
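
To make the comparison concrete, here is a minimal userspace sketch of the two
sequences. It only records step names and executes no barrier or TLB
instruction. The helper names (record, reuse_fault_nsh, reuse_fault_ish,
first_difference) are hypothetical, and mapping "broadcast pte" to dsb(ish)
and "don't broadcast pte" to dsb(nsh) is my reading of this thread, not code
from the patch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-ins: each helper only records a step name. Nothing
 * here executes a real barrier or TLB maintenance instruction.
 */
#define MAX_STEPS 8

struct trace {
	const char *step[MAX_STEPS];
	int n;
};

static void record(struct trace *t, const char *name)
{
	if (t->n < MAX_STEPS)
		t->step[t->n++] = name;
}

/* "Now you are": non-shareable barrier after the PTE write, local tlbi. */
static void reuse_fault_nsh(struct trace *t)
{
	record(t, "write pte");
	record(t, "dsb(nsh)");     /* assumed reading of "don't broadcast pte" */
	record(t, "tlbi, local");  /* "don't broadcast tlbi" */
}

/* "we might be": inner-shareable barrier after the PTE write, local tlbi. */
static void reuse_fault_ish(struct trace *t)
{
	record(t, "write pte");
	record(t, "dsb(ish)");     /* assumed reading of "broadcast pte" */
	record(t, "tlbi, local");  /* "don't broadcast tlbi" */
}

/* Return the index of the first step where the traces differ, or -1. */
static int first_difference(const struct trace *a, const struct trace *b)
{
	int n = a->n < b->n ? a->n : b->n;

	for (int i = 0; i < n; i++)
		if (strcmp(a->step[i], b->step[i]) != 0)
			return i;
	return a->n == b->n ? -1 : n;
}

/* Print one trace, one step per line, for side-by-side reading. */
static void print_trace(const char *label, const struct trace *t)
{
	printf("%s:\n", label);
	for (int i = 0; i < t->n; i++)
		printf("    %s\n", t->step[i]);
}
```

Filling one trace with each function and passing both to first_difference()
shows that the sequences differ only in the barrier after the PTE write; the
tlbi stays local in both.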
Thanks
Barry