Message-ID: <YMtmjUzmv5QW9b7x@hirez.programming.kicks-ass.net>
Date: Thu, 17 Jun 2021 17:13:17 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Andy Lutomirski <luto@...nel.org>
Cc: Mark Rutland <mark.rutland@....com>,
"Russell King (Oracle)" <linux@...linux.org.uk>,
the arch/x86 maintainers <x86@...nel.org>,
Dave Hansen <dave.hansen@...el.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Nicholas Piggin <npiggin@...il.com>,
linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH 7/8] membarrier: Remove arm (32) support for SYNC_CORE
On Thu, Jun 17, 2021 at 05:01:53PM +0200, Peter Zijlstra wrote:
> On Thu, Jun 17, 2021 at 07:00:26AM -0700, Andy Lutomirski wrote:
> > On Thu, Jun 17, 2021, at 6:51 AM, Mark Rutland wrote:
>
> > > It's not clear to me what "the right thing" would mean specifically, and
> > > on architectures with userspace cache maintenance JITs can usually do
> > > the most optimal maintenance, and only need help for the context
> > > synchronization.
> > >
> >
> > This I simply don't believe -- I doubt that any sane architecture
> > really works like this. I wrote an email about it to Intel that
> > apparently generated internal discussion but no results. Consider:
> >
> > mmap(some shared library, some previously unmapped address);
> >
> > this does no heavyweight synchronization, at least on x86. There is
> > no "serializing" instruction in the fast path, and it *works* despite
> > anything the SDM may or may not say.
>
> I'm confused; why do you think that is relevant?
>
> The only way to get into a memory address space is CR3 write, which is
> serializing and will flush everything. Since there wasn't anything
> mapped, nothing could be 'cached' from that location.
>
> So that has to work...
Ooh, you mean an mmap() at an address where something was mapped before.
Not virgin address space, so to speak.
But in that case, the munmap() would've caused a TLB invalidate, which on
x86 is done with IPIs, and returning from an IPI is an IRET, which is
serializing.
Other architectures include I/D cache flushes in their TLB
invalidations -- but as noted elsewhere in the thread, that might not be
sufficient on its own.
But yes, I think a TLB invalidate (TLBI) has to imply flushing any
instruction-related micro-architectural buffers for any of that to work.
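
For concreteness, the "not virgin space" case can be sketched from
userspace roughly as below. This is only an illustration under
assumptions that are not from the thread: the helper name
remap_same_address() is made up, and it uses an anonymous data page
rather than shared library text. It shows only the munmap()-then-mmap()
sequence at a reused address; it cannot observe the TLB shootdown, the
IPIs, or the IRET themselves.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS on strict -std= builds */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): map a page, dirty it,
 * unmap it, then map a fresh page at the very same address.
 *
 * Returns 0 if the new mapping sits at the old address and reads back
 * as zeroes (i.e. nothing stale from the old mapping is visible),
 * -1 otherwise.
 */
int remap_same_address(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t len = (size_t)page;

    /* First mapping: the address is now no longer "virgin". */
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    memset(p, 0xcc, len);       /* fault the page in and dirty it */

    /* Tear it down; the kernel must invalidate the old translation. */
    if (munmap(p, len) != 0)
        return -1;

    /* Map something new at the address that was just vacated. */
    unsigned char *q = mmap(p, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (q == MAP_FAILED)
        return -1;

    /* The fresh anonymous page must be zero-filled, not 0xcc. */
    int ok = (q == p);
    for (size_t i = 0; ok && i < len; i++)
        if (q[i] != 0)
            ok = 0;

    munmap(q, len);
    return ok ? 0 : -1;
}
```

Calling remap_same_address() from a single-threaded test program and
checking for a 0 return is enough to exercise the sequence. MAP_FIXED
is only reasonable here because the range was unmapped a moment ago by
the same thread.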