linux-kernel - Re: [RFC][PATCH 1/5] mm: Rework {set,clear,mm}_tlb_flush


Message-ID: <20170801163903.wuwrk6ysyd52dwxm@hirez.programming.kicks-ass.net>
Date:   Tue, 1 Aug 2017 18:39:03 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Benjamin Herrenschmidt <benh@...nel.crashing.org>
Cc:     Will Deacon <will.deacon@....com>, torvalds@...ux-foundation.org,
        oleg@...hat.com, paulmck@...ux.vnet.ibm.com, mpe@...erman.id.au,
        npiggin@...il.com, linux-kernel@...r.kernel.org, mingo@...nel.org,
        stern@...land.harvard.edu, Mel Gorman <mgorman@...e.de>,
        Rik van Riel <riel@...hat.com>
Subject: Re: [RFC][PATCH 1/5] mm: Rework {set,clear,mm}_tlb_flush_pending()

On Tue, Aug 01, 2017 at 02:14:19PM +0200, Peter Zijlstra wrote:
> On Tue, Aug 01, 2017 at 10:02:45PM +1000, Benjamin Herrenschmidt wrote:
> > On Tue, 2017-08-01 at 11:31 +0100, Will Deacon wrote:
> > > Looks like that's what's currently relied upon:
> > > 
> > >   /* Clearing is done after a TLB flush, which also provides a barrier. */
> > > 
> > > It also provides barrier semantics on arm/arm64. In reality, I suspect
> > > all archs have to provide some order between set_pte_at and flush_tlb_range
> > > which is sufficient to hold up clearing the flag. :/
> > 
> > Hrm... not explicitly.
> > 
> > Most archs (powerpc among them) have set_pte_at be just a dumb store,
> > so the only barrier it has is the surrounding PTL.
> > 
> > Now flush_tlb_range() I assume has some internal strong barriers but
> > none of that is well defined or documented at all, so I suspect all
> > bets are off.
> 
> Right.. but seeing how we're in fact relying on things here it might be
> time to go figure this out and document bits.
> 
> *sigh*, I suppose it's going to be me doing this.. :-)

So, on the related question of whether on_each_cpu() provides a full
smp_mb(), I think we can answer: yes.

on_each_cpu() sends IPIs to all _other_ CPUs, and queuing those IPIs
uses llist_add(), which is cmpxchg(), which implies smp_mb().

After that it runs the local function.

So we can treat on_each_cpu() as doing an smp_mb() before running @func.
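To make the llist_add() step concrete, here is a minimal userspace sketch of an llist_add()-style push using C11 atomics. It is not the kernel implementation: the model_llist_* names are hypothetical, and a memory_order_seq_cst compare-and-swap is assumed to play the role of the kernel's fully ordered cmpxchg().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for struct llist_node / llist_head. */
struct model_llist_node {
    struct model_llist_node *next;
};

struct model_llist_head {
    _Atomic(struct model_llist_node *) first;
};

/*
 * llist_add()-style push: retry a compare-and-swap on the head pointer
 * until it succeeds. The successful CAS is seq_cst, standing in for the
 * kernel's fully ordered cmpxchg(). Returns 1 if the list was empty
 * before the push, 0 otherwise.
 */
static int model_llist_add(struct model_llist_node *node,
                           struct model_llist_head *head)
{
    struct model_llist_node *first =
        atomic_load_explicit(&head->first, memory_order_relaxed);

    do {
        node->next = first;   /* link to the head we last observed */
    } while (!atomic_compare_exchange_weak_explicit(
                 &head->first, &first, node,   /* on failure, first is reloaded */
                 memory_order_seq_cst, memory_order_relaxed));

    return first == NULL;
}

/* Usage: push two nodes and check LIFO order and the "was empty" result. */
static void model_llist_selftest(void)
{
    struct model_llist_head head;
    struct model_llist_node a, b;
    int was_empty_a, was_empty_b;

    atomic_init(&head.first, NULL);
    was_empty_a = model_llist_add(&a, &head);
    was_empty_b = model_llist_add(&b, &head);
    assert(was_empty_a == 1 && was_empty_b == 0);
    assert(atomic_load(&head.first) == &b);   /* newest node is first */
    assert(b.next == &a && a.next == NULL);
}
```

In this model every successful push is a single fully ordered read-modify-write, so stores the sender made before queuing the request are ordered before it; that is the property the on_each_cpu() argument leans on.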

xtensa - it uses on_each_cpu() for TLB invalidates.

x86 - we use either on_each_cpu() (flush_tlb_all(),
flush_tlb_kernel_range()) or flush_tlb_mm_range(), which does an
atomic_inc_return() at the very start. Not to mention that actually
flushing TLBs is itself a barrier. Arguably flush_tlb_mm_range() should
first do _others_ and then self, because the others will use
smp_call_function_many(); see above.
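The "others first, then self" ordering suggested for flush_tlb_mm_range() can be sketched as a toy userspace model: one fully ordered read-modify-write first (standing in for the atomic_inc_return()), then the remote CPUs, then the local one. MODEL_NR_CPUS, struct model_flush_log and the model_* functions are hypothetical; "flushing" a CPU here only records its number, and no real TLB or IPI is involved.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical CPU count for the model. */
enum { MODEL_NR_CPUS = 4 };

/* Records the order in which CPUs were "flushed". */
struct model_flush_log {
    int order[MODEL_NR_CPUS];
    int count;
};

/* Stand-in for atomic_inc_return(): a seq_cst RMW, assumed fully ordered. */
static long model_atomic_inc_return(atomic_long *v)
{
    return atomic_fetch_add_explicit(v, 1, memory_order_seq_cst) + 1;
}

/* Stand-in for invalidating one CPU's TLB; here it only logs the CPU. */
static void model_flush_cpu(struct model_flush_log *rec, int cpu)
{
    rec->order[rec->count++] = cpu;
}

/*
 * Barrier first, then the remote CPUs (in the kernel these would be
 * reached through smp_call_function_many()), then the local CPU @self.
 * Returns the incremented counter value.
 */
static long model_flush_range(atomic_long *counter, int self,
                              struct model_flush_log *rec)
{
    long val = model_atomic_inc_return(counter);

    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        if (cpu != self)
            model_flush_cpu(rec, cpu);
    model_flush_cpu(rec, self);
    return val;
}

/* Usage: with self == 2, CPUs 0, 1 and 3 are flushed before CPU 2. */
static void model_flush_selftest(void)
{
    atomic_long counter;
    struct model_flush_log rec = { .count = 0 };
    long val;

    atomic_init(&counter, 0);
    val = model_flush_range(&counter, 2, &rec);
    assert(val == 1 && rec.count == MODEL_NR_CPUS);
    assert(rec.order[0] == 0 && rec.order[1] == 1);
    assert(rec.order[2] == 3 && rec.order[3] == 2);
}
```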

(TODO look into paravirt)

tile - does mb() in flush_remote()

sparc32-smp !?

sparc64 -- nope, no-op functions, TLB flushes are contained inside the PTL.

sh - yes, per smp_call_function

s390 - has atomics when it flushes. ptep_modify_prot_start() can set
mm->flush_mm = 1, at which point flush_tlb_range() will actually do
something; in that case there will be an smp_mb() as per the atomics.
Otherwise the TLB invalidate is contained inside the PTL.

powerpc - radix - PTESYNC
	  hash - flush inside PTL

parisc - has all PTE and TLB operations serialized using a global lock

mn10300 - *ugh* but yes, smp_call_function() for remote CPUs

mips - smp_call_function for remote CPUs

metag - mmio write

m32r - doesn't seem to have smp_mb()

ia64 - smp_call_function_*()

hexagon - HVM trap, no smp_mb()

blackfin - nommu

arm - dsb ish

arm64 - dsb ish

arc - no barrier

alpha - no barrier

For the architectures that do not have a barrier, like alpha, arc and
metag, the PTL spin_unlock() has an smp_mb(); however, I don't think
that is enough, because the flush_tlb_range() might then still be
pending. That said, these architectures probably don't have transparent
huge pages, so it doesn't matter.

Still, this is all rather unsatisfactory. Either we should define
flush_tlb*() to imply a barrier when it's not a no-op (sparc64/ppc-hash),
or simply make clear_tlb_flush_pending() an smp_store_release().

I prefer the latter option.
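A minimal userspace model of the release-store option, assuming C11 atomics and POSIX threads, and assuming the reader side pairs the release with an acquire load: a reader that observes the flag cleared is then guaranteed to also observe everything the flusher did before clearing it. struct model_mm and the model_* functions are hypothetical stand-ins for the kernel's set/clear/mm_tlb_flush_pending(); the set side is modelled as a seq_cst store, and the TLB flush itself is modelled as a plain store, which glosses over the fact that a real TLB invalidate is not an ordinary memory access.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of an mm with a flush-pending flag. */
struct model_mm {
    atomic_bool tlb_flush_pending;
    int tlb_flushed;   /* plain int standing in for "the TLB flush is done" */
};

/* Set side, assumed here to be a fully ordered (seq_cst) store. */
static void model_set_tlb_flush_pending(struct model_mm *mm)
{
    atomic_store_explicit(&mm->tlb_flush_pending, true, memory_order_seq_cst);
}

/* Proposed clear side: a release store, like smp_store_release(). */
static void model_clear_tlb_flush_pending(struct model_mm *mm)
{
    atomic_store_explicit(&mm->tlb_flush_pending, false, memory_order_release);
}

/* Reader side, assumed to pair with the release via an acquire load. */
static bool model_mm_tlb_flush_pending(struct model_mm *mm)
{
    return atomic_load_explicit(&mm->tlb_flush_pending, memory_order_acquire);
}

/* Flusher thread: do the "flush", then clear the flag. */
static void *model_flusher(void *arg)
{
    struct model_mm *mm = arg;

    mm->tlb_flushed = 1;                /* the "flush" (a plain store) */
    model_clear_tlb_flush_pending(mm);  /* release orders it before the clear */
    return NULL;
}

/*
 * Returns 1 if the observer, once it sees the flag cleared, also sees
 * the flush (release/acquire guarantees this). Returns -1 if the
 * flusher thread could not be created.
 */
static int model_run_once(void)
{
    struct model_mm mm;
    pthread_t t;
    int seen;

    atomic_init(&mm.tlb_flush_pending, false);
    mm.tlb_flushed = 0;
    model_set_tlb_flush_pending(&mm);

    if (pthread_create(&t, NULL, model_flusher, &mm) != 0)
        return -1;
    while (model_mm_tlb_flush_pending(&mm))
        ;                               /* spin until the flusher clears it */
    seen = mm.tlb_flushed;
    pthread_join(t, NULL);
    return seen;
}

/* Usage: single-threaded set/clear check, then the two-thread run. */
static void model_pending_selftest(void)
{
    struct model_mm mm;
    bool before, after;
    int seen;

    atomic_init(&mm.tlb_flush_pending, false);
    model_set_tlb_flush_pending(&mm);
    before = model_mm_tlb_flush_pending(&mm);
    model_clear_tlb_flush_pending(&mm);
    after = model_mm_tlb_flush_pending(&mm);
    assert(before && !after);

    seen = model_run_once();
    assert(seen == 1);
}
```

If the clear were a relaxed store in this model, nothing would order the write to tlb_flushed before it, and the read after the spin loop would be a data race in C11 terms; that is the userspace analogue of the flag reading clear while the flush is still pending.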

Opinions?