linux-kernel - Re: [RFC][PATCH 1/5] mm: Rework {set,clear,mm}_tlb_flush

Message-ID: <20170728174533.kbxu7uppdmle6t6d@hirez.programming.kicks-ass.net>
Date:   Fri, 28 Jul 2017 19:45:33 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Will Deacon <will.deacon@....com>
Cc:     torvalds@...ux-foundation.org, oleg@...hat.com,
        paulmck@...ux.vnet.ibm.com, benh@...nel.crashing.org,
        mpe@...erman.id.au, npiggin@...il.com,
        linux-kernel@...r.kernel.org, mingo@...nel.org,
        stern@...land.harvard.edu, Mel Gorman <mgorman@...e.de>,
        Rik van Riel <riel@...hat.com>
Subject: Re: [RFC][PATCH 1/5] mm: Rework {set,clear,mm}_tlb_flush_pending()

On Fri, Jun 09, 2017 at 03:45:54PM +0100, Will Deacon wrote:
> On Wed, Jun 07, 2017 at 06:15:02PM +0200, Peter Zijlstra wrote:
> > Commit:
> > 
> >   af2c1401e6f9 ("mm: numa: guarantee that tlb_flush_pending updates are visible before page table updates")
> > 
> > added smp_mb__before_spinlock() to set_tlb_flush_pending(). I think we
> > can solve the same problem without this barrier.
> > 
> > If instead we mandate that mm_tlb_flush_pending() is used while
> > holding the PTL we're guaranteed to observe prior
> > set_tlb_flush_pending() instances.
> > 
> > For this to work we need to rework migrate_misplaced_transhuge_page()
> > a little and move the test up into do_huge_pmd_numa_page().
> > 
> > Cc: Mel Gorman <mgorman@...e.de>
> > Cc: Rik van Riel <riel@...hat.com>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
> > ---
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -527,18 +527,16 @@ static inline cpumask_t *mm_cpumask(stru
> >   */
> >  static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
> >  {
> > -	barrier();
> > +	/*
> > +	 * Must be called with PTL held; such that our PTL acquire will have
> > +	 * observed the store from set_tlb_flush_pending().
> > +	 */
> >  	return mm->tlb_flush_pending;
> >  }
> >  static inline void set_tlb_flush_pending(struct mm_struct *mm)
> >  {
> >  	mm->tlb_flush_pending = true;
> > -
> > -	/*
> > -	 * Guarantee that the tlb_flush_pending store does not leak into the
> > -	 * critical section updating the page tables
> > -	 */
> > -	smp_mb__before_spinlock();
> > +	barrier();
> 
> Why do you need the barrier() here? Isn't the ptl unlock sufficient?

So I was going through these here patches again, and wrote the
following comment:

static inline void set_tlb_flush_pending(struct mm_struct *mm)
{
	mm->tlb_flush_pending = true;
	/*
	 * The only time this value is relevant is when there are indeed pages
	 * to flush. And we'll only flush pages after changing them, which
	 * requires the PTL.
	 *
	 * So the ordering here is:
	 *
	 * 	mm->tlb_flush_pending = true;
	 * 	spin_lock(&ptl);
	 * 	...
	 * 	set_pte_at();
	 * 	spin_unlock(&ptl);
	 *
	 *
	 * 				spin_lock(&ptl);
	 * 				mm_tlb_flush_pending();
	 * 				...
	 * 				spin_unlock(&ptl);
	 *
	 * 	flush_tlb_range();
	 * 	mm->tlb_flush_pending = false;
	 */
}

And while the PTL acquire/release pairs are indeed sufficient to
constrain the true assignment, what constrains the false assignment? In
the sequence above, nothing stops the false store from becoming visible
at mm_tlb_flush_pending() before flush_tlb_range() has completed.

Or does flush_tlb_range() have implicit ordering? It does on x86, but is
this generally so?
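
To make the question concrete, here is a rough userspace model of the
sequence in C11 atomics. It is not kernel code: struct mm_model, the
ptl_*() helpers and the model_*() functions are made-up names, and the
"flush" is just a plain store. It assumes that one way to constrain the
false assignment is a release store paired with an acquire load on the
reader side; whether flush_tlb_range() already gives that ordering on
every architecture is exactly the open question.

```c
/* Userspace C11 model, NOT kernel code; every name below is hypothetical. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct mm_model {
	atomic_bool tlb_flush_pending;
	atomic_bool ptl;	/* stand-in for the page table lock */
	int pte;		/* stand-in for a page table entry */
	int tlb;		/* stand-in for a cached translation */
};

static void mm_model_init(struct mm_model *mm)
{
	atomic_init(&mm->tlb_flush_pending, false);
	atomic_init(&mm->ptl, false);
	mm->pte = 0;
	mm->tlb = 0;
}

static void ptl_lock(struct mm_model *mm)
{
	/* Acquire: accesses after the lock cannot move before it. */
	while (atomic_exchange_explicit(&mm->ptl, true, memory_order_acquire))
		;
}

static void ptl_unlock(struct mm_model *mm)
{
	/* Release: accesses before the unlock cannot move after it. */
	atomic_store_explicit(&mm->ptl, false, memory_order_release);
}

static void model_set_pending(struct mm_model *mm)
{
	/*
	 * Relaxed is enough here: the store may sink into the critical
	 * section that follows, but not past ptl_unlock().
	 */
	atomic_store_explicit(&mm->tlb_flush_pending, true,
			      memory_order_relaxed);
}

static void model_clear_pending(struct mm_model *mm)
{
	/*
	 * Assumption: a release store keeps the flush before it from
	 * being reordered after the clear. A relaxed store would not.
	 */
	atomic_store_explicit(&mm->tlb_flush_pending, false,
			      memory_order_release);
}

static bool model_flush_pending(struct mm_model *mm)
{
	/* Meant to be called with the lock held; pairs with the clear. */
	return atomic_load_explicit(&mm->tlb_flush_pending,
				    memory_order_acquire);
}

/* Walks the sequence in one thread; returns what a reader sees midway. */
static bool model_writer(struct mm_model *mm, int new_pte)
{
	bool seen;

	model_set_pending(mm);
	ptl_lock(mm);
	mm->pte = new_pte;		/* stand-in for set_pte_at() */
	ptl_unlock(mm);

	/* A reader taking the lock here must still see the flag set. */
	ptl_lock(mm);
	seen = model_flush_pending(mm);
	ptl_unlock(mm);
	assert(seen);			/* trivially true single-threaded */

	mm->tlb = mm->pte;		/* stand-in for flush_tlb_range() */
	model_clear_pending(mm);
	return seen;
}
```

Calling mm_model_init() and then model_writer() from a single thread only
walks through the call order; the interesting part is the memory_order
arguments, which say what a second thread running model_flush_pending()
under the lock would be allowed to observe.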