Message-ID: <20170813125019.ihqjud37ytgri7bn@hirez.programming.kicks-ass.net>
Date: Sun, 13 Aug 2017 14:50:19 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Nadav Amit <namit@...are.com>
Cc: Ingo Molnar <mingo@...nel.org>,
Stephen Rothwell <sfr@...b.auug.org.au>,
Andrew Morton <akpm@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...e.hu>, "H. Peter Anvin" <hpa@...or.com>,
Linux-Next Mailing List <linux-next@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Linus <torvalds@...ux-foundation.org>,
"minchan@...nel.org" <minchan@...nel.org>
Subject: Re: linux-next: manual merge of the akpm-current tree with the tip
tree
On Sun, Aug 13, 2017 at 06:06:32AM +0000, Nadav Amit wrote:
> > however mm_tlb_flush_nested() is a mystery, it appears to care about
> > anything inside the range. For now rely on it doing at least _a_ PTL
> > lock instead of taking _the_ PTL lock.
>
> It does not care about “anything” inside the range, but only on situations
> in which there is at least one (same) PT that was modified by one core and
> then read by the other. So, yes, it will always be _the_ same PTL, and not
> _a_ PTL - in the cases that flush is really needed.
>
> The issue that might require additional barriers is that
> inc_tlb_flush_pending() and mm_tlb_flush_nested() are called when the PTL is
> not held. IIUC, since the release-acquire might not behave as a full memory
> barrier, this requires an explicit memory barrier.
So I'm not entirely clear about this yet.

How about:

CPU0                            CPU1

tlb_gather_mmu()

lock PTLn
no mod
unlock PTLn

                                tlb_gather_mmu()

lock PTLm
mod
include in tlb range
unlock PTLm

                                lock PTLn
                                mod
                                unlock PTLn

                                tlb_finish_mmu()
                                  force = mm_tlb_flush_nested(tlb->mm);
                                  arch_tlb_finish_mmu(force);

... more ...

tlb_finish_mmu()

In this case you also want CPU1's mm_tlb_flush_nested() call to return
true, right?
But even with an smp_mb__after_atomic() at CPU0's tlb_gather_mmu()
you're not guaranteed CPU1 sees the increment. The only way to do that
is to make the PTL locks RCsc and that is a much more expensive
proposition.
What about:
CPU0                            CPU1

tlb_gather_mmu()

lock PTLn
no mod
unlock PTLn

lock PTLm
mod
include in tlb range
unlock PTLm

                                tlb_gather_mmu()

                                lock PTLn
                                mod
                                unlock PTLn

                                tlb_finish_mmu()
                                  force = mm_tlb_flush_nested(tlb->mm);
                                  arch_tlb_finish_mmu(force);

... more ...

tlb_finish_mmu()

Do we want CPU1 to see it here? If so, where does it end?
CPU0                            CPU1

tlb_gather_mmu()

lock PTLn
no mod
unlock PTLn

lock PTLm
mod
include in tlb range
unlock PTLm

tlb_finish_mmu()
  force = mm_tlb_flush_nested(tlb->mm);

                                tlb_gather_mmu()

                                lock PTLn
                                mod
                                unlock PTLn

  arch_tlb_finish_mmu(force);

                                ... more ...

                                tlb_finish_mmu()

This?
Could you clarify under what exact condition mm_tlb_flush_nested() must
return true?
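
For reference, a minimal user-space sketch of the counter semantics in
question. It assumes that tlb_gather_mmu() increments mm->tlb_flush_pending,
that mm_tlb_flush_nested() is a "pending > 1" check, and that
tlb_finish_mmu() decrements the counter only after the arch flush. The
struct mm_model type and all model_*/replay_* names are hypothetical, not
kernel API. Each replay runs one interleaving on a single thread, so it
shows only what a sequentially consistent execution would return; it cannot
show the weak-ordering case where CPU1 misses CPU0's increment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the one mm_struct field discussed here. */
struct mm_model {
	atomic_int tlb_flush_pending;
};

/* Assumed: tlb_gather_mmu() ends by bumping the pending counter. */
static void model_tlb_gather_mmu(struct mm_model *mm)
{
	atomic_fetch_add(&mm->tlb_flush_pending, 1);
}

/* Assumed: "nested" means more than one batch is in flight. */
static bool model_mm_tlb_flush_nested(struct mm_model *mm)
{
	return atomic_load(&mm->tlb_flush_pending) > 1;
}

/* Assumed: the counter drops only after the arch flush has run. */
static void model_tlb_finish_done(struct mm_model *mm)
{
	atomic_fetch_sub(&mm->tlb_flush_pending, 1);
}

/* First interleaving: both gathers happen before CPU1 samples. */
bool replay_first_interleaving(void)
{
	struct mm_model mm;
	bool cpu1_force;

	atomic_init(&mm.tlb_flush_pending, 0);

	model_tlb_gather_mmu(&mm);                   /* CPU0 */
	model_tlb_gather_mmu(&mm);                   /* CPU1 */
	cpu1_force = model_mm_tlb_flush_nested(&mm); /* CPU1 tlb_finish_mmu() */
	model_tlb_finish_done(&mm);                  /* CPU1 */
	model_tlb_finish_done(&mm);                  /* CPU0 tlb_finish_mmu() */

	assert(atomic_load(&mm.tlb_flush_pending) == 0);
	return cpu1_force;
}

/* Third interleaving: CPU0 samples before CPU1 has even started. */
bool replay_third_interleaving(void)
{
	struct mm_model mm;
	bool cpu0_force;

	atomic_init(&mm.tlb_flush_pending, 0);

	model_tlb_gather_mmu(&mm);                   /* CPU0 */
	cpu0_force = model_mm_tlb_flush_nested(&mm); /* CPU0 tlb_finish_mmu() */
	model_tlb_gather_mmu(&mm);                   /* CPU1 */
	model_tlb_finish_done(&mm);                  /* CPU0, after arch flush */
	model_tlb_finish_done(&mm);                  /* CPU1 tlb_finish_mmu() */

	assert(atomic_load(&mm.tlb_flush_pending) == 0);
	return cpu0_force;
}
```

Calling both replay functions from a main() shows the difference: in this
model the first interleaving yields force == true on CPU1, while in the
third CPU0 samples the counter before CPU1 has incremented it and gets
force == false.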