Message-ID: <20190709080308.uueqgxuycfp5y2db@willie-the-truck>
Date: Tue, 9 Jul 2019 09:03:09 +0100
From: Will Deacon <will@...nel.org>
To: Jon Masters <jcm@...masters.org>
Cc: "qi.fuli@...itsu.com" <qi.fuli@...itsu.com>,
Will Deacon <will.deacon@....com>,
"indou.takao@...itsu.com" <indou.takao@...itsu.com>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
"peterz@...radead.org" <peterz@...radead.org>,
Catalin Marinas <catalin.marinas@....com>,
Jonathan Corbet <corbet@....net>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>
Subject: Re: [PATCH 0/2] arm64: Introduce boot parameter to disable TLB flush
instruction within the same inner shareable domain

On Mon, Jul 08, 2019 at 08:29:26PM -0400, Jon Masters wrote:
> On 7/8/19 8:25 PM, Jon Masters wrote:
> > On 7/2/19 10:45 PM, qi.fuli@...itsu.com wrote:
> >
> >> However, we found that as the number of TLB flush calls increased,
> >> the noise also increased. We understand the cause of this issue to be
> >> the implementation of Linux's TLB flush for arm64, especially the use of
> >> the TLBI-is instruction, which is broadcast to all processor cores in the
> >> system.
> >
> > Are you saying that for a microbenchmark in which very large numbers of
> > threads are created and destroyed rapidly there are a large number of
> > associated tlb range flushes which always use broadcast TLBIs?
> >
> > If that's the case, and the hardware doesn't do any ASID filtering and
> > each TLBI results in a DVM to every PE, would it make sense to look at
> > whether there are ways to improve batching/switch to an IPI approach
> > rather than relying on broadcasts, as a more generic solution?
>
> What I meant was a heuristic to do this automatically, rather than via a
> command line.

One of my main initial objections to this patch [1] still applies to that
approach, though: I don't want the headache of maintaining two very
different TLB invalidation schemes in the kernel. Dynamically switching
between them is arguably worse. If "jitter" is such a big deal, then I don't
think changing our TLBI mechanism even helps on a system that has broadcast
cache maintenance (including for the I-side) as well as shared levels of
cache further from the CPUs. It just happens to solve the case of a spinning
mprotect(), and for that case: well yeah, maybe don't do that if your
hardware can't handle it gracefully.
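
For concreteness, a "spinning mprotect()" looks roughly like the sketch
below. This is an illustration only and not code from the patch series: the
helper name spin_mprotect() and its return convention are made up here. It
repeatedly changes the protection of a single touched anonymous page, so
each change leaves a stale TLB entry for the kernel to invalidate, which on
arm64 is done with the broadcast TLBI discussed in this thread.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (name and return convention are illustrative only):
 * flip one anonymous page between read-write and read-only `iterations`
 * times. Returns the number of successful mprotect() calls, or -1 if the
 * page could not be set up.
 */
long spin_mprotect(long iterations)
{
	long page = sysconf(_SC_PAGESIZE);
	long ok = 0;
	char *p;

	if (page <= 0)
		return -1;

	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	for (long i = 0; i < iterations; i++) {
		/* Touch the page so a writable translation is actually in use. */
		p[0] = (char)i;

		/* Drop write permission: the old translation must be invalidated. */
		if (mprotect(p, (size_t)page, PROT_READ) == 0)
			ok++;

		/* Restore write permission so the next iteration can touch it. */
		if (mprotect(p, (size_t)page, PROT_READ | PROT_WRITE) == 0)
			ok++;
	}

	munmap(p, (size_t)page);
	return ok;
}
```

Calling spin_mprotect() with a large count from several threads of one
process while timing an unrelated task on another CPU is one way to observe
the noise being described, though that measurement setup is an assumption
here and not something specified in the thread.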

What I would be interested in seeing is an evaluation of a real workload
that suffers due to our mmu_gather/tlb_flush implementation on arm64 so that
we can understand where the problem lies and whether or not we can do
something to address it. But "jitter is bad, use IPIs" isn't helpful at all.

Will

[1] https://lkml.kernel.org/r/20190617170328.GJ30800@fuggles.cambridge.arm.com