Message-ID: <0F858068-D41D-46E3-B4A8-8A95B4EDB94F@vmware.com>
Date: Mon, 14 Aug 2017 05:07:19 +0000
From: Nadav Amit <namit@...are.com>
To: Minchan Kim <minchan@...nel.org>
CC: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>,
Stephen Rothwell <sfr@...b.auug.org.au>,
Andrew Morton <akpm@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
"Ingo Molnar" <mingo@...e.hu>, "H. Peter Anvin" <hpa@...or.com>,
"Linux-Next Mailing List" <linux-next@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Linus <torvalds@...ux-foundation.org>
Subject: Re: linux-next: manual merge of the akpm-current tree with the tip tree

Minchan Kim <minchan@...nel.org> wrote:
> On Sun, Aug 13, 2017 at 02:50:19PM +0200, Peter Zijlstra wrote:
>> On Sun, Aug 13, 2017 at 06:06:32AM +0000, Nadav Amit wrote:
>>>> however mm_tlb_flush_nested() is a mystery, it appears to care about
>>>> anything inside the range. For now rely on it doing at least _a_ PTL
>>>> lock instead of taking _the_ PTL lock.
>>>
>>> It does not care about “anything” inside the range, but only about situations
>>> in which at least one (same) page table was modified by one core and
>>> then read by the other. So, yes, it will always be _the_ same PTL, and not
>>> _a_ PTL - in the cases where a flush is really needed.
>>>
>>> The issue that might require additional barriers is that
>>> inc_tlb_flush_pending() and mm_tlb_flush_nested() are called when the PTL is
>>> not held. IIUC, since a release-acquire pair might not behave as a full memory
>>> barrier, this requires an explicit memory barrier.
>>
>> So I'm not entirely clear about this yet.
>>
>> How about:
>>
>>
>> CPU0 CPU1
>>
>> tlb_gather_mmu()
>>
>> lock PTLn
>> no mod
>> unlock PTLn
>>
>> tlb_gather_mmu()
>>
>> lock PTLm
>> mod
>> include in tlb range
>> unlock PTLm
>>
>> lock PTLn
>> mod
>> unlock PTLn
>>
>> tlb_finish_mmu()
>> force = mm_tlb_flush_nested(tlb->mm);
>> arch_tlb_finish_mmu(force);
>>
>>
>> ... more ...
>>
>> tlb_finish_mmu()
>>
>>
>>
>> In this case you also want CPU1's mm_tlb_flush_nested() call to return
>> true, right?
>
> No, because CPU 1 modified the PTE and added it to the TLB range,
> so it will flush the TLB regardless of the nested check; there is
> no stale TLB problem.
>
>> But even with an smp_mb__after_atomic() in CPU0's tlb_gather_mmu()
>> you're not guaranteed that CPU1 sees the increment. The only way to do that
>> is to make the PTL locks RCsc, and that is a much more expensive
>> proposition.
>>
>>
>> What about:
>>
>>
>> CPU0 CPU1
>>
>> tlb_gather_mmu()
>>
>> lock PTLn
>> no mod
>> unlock PTLn
>>
>>
>> lock PTLm
>> mod
>> include in tlb range
>> unlock PTLm
>>
>> tlb_gather_mmu()
>>
>> lock PTLn
>> mod
>> unlock PTLn
>>
>> tlb_finish_mmu()
>> force = mm_tlb_flush_nested(tlb->mm);
>> arch_tlb_finish_mmu(force);
>>
>>
>> ... more ...
>>
>> tlb_finish_mmu()
>>
>> Do we want CPU1 to see it here? If so, where does it end?
>
> Ditto. Since CPU 1 has added the range, it will flush the TLB regardless
> of the nested condition.
>
>> CPU0 CPU1
>>
>> tlb_gather_mmu()
>>
>> lock PTLn
>> no mod
>> unlock PTLn
>>
>>
>> lock PTLm
>> mod
>> include in tlb range
>> unlock PTLm
>>
>> tlb_finish_mmu()
>> force = mm_tlb_flush_nested(tlb->mm);
>>
>> tlb_gather_mmu()
>>
>> lock PTLn
>> mod
>> unlock PTLn
>>
>> arch_tlb_finish_mmu(force);
>>
>>
>> ... more ...
>>
>> tlb_finish_mmu()
>>
>>
>> This?
>>
>>
>> Could you clarify under what exact condition mm_tlb_flush_nested() must
>> return true?
>
> mm_tlb_flush_nested() is meant for the CPU that has not updated any PTE
> but still needs a TLB flush.
> As I wrote at https://marc.info/?l=linux-mm&m=150267398226529&w=2 ,
> there is a stale TLB problem if we don't flush the TLB even though there
> was no PTE modification.

To clarify: the main problem that these patches address is when the first
CPU updates the PTE, and the second CPU sees the updated value and thinks:
“the PTE is already what I wanted - no flush is needed”.
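
A minimal user-space sketch of that decision, under a simplified model: an atomic counter stands in for the per-mm pending-flush count, and `struct model_mm`, `struct model_gather` and all `model_*` functions are hypothetical names, not kernel APIs. The CPU that finds the PTE already in the wanted state records no range of its own, so only the nested check (a pending count greater than one) forces its flush.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-mm pending-flush counter. */
struct model_mm {
	atomic_int tlb_flush_pending; /* batches between gather and finish */
};

/* Hypothetical per-batch state, loosely modelled on a TLB gather. */
struct model_gather {
	struct model_mm *mm;
	bool range_recorded; /* this CPU changed a PTE and added it to its range */
};

static void model_gather_mmu(struct model_gather *tlb, struct model_mm *mm)
{
	tlb->mm = mm;
	tlb->range_recorded = false;
	/* Announce a pending flush before touching any page table. */
	atomic_fetch_add(&mm->tlb_flush_pending, 1);
}

/* True when at least one other batch is in flight on the same mm. */
static bool model_flush_nested(struct model_mm *mm)
{
	return atomic_load(&mm->tlb_flush_pending) > 1;
}

/*
 * Flush if this CPU changed a PTE itself, or if a concurrent batch may have
 * changed a PTE that this CPU then saw as "already what I wanted".
 */
static bool model_must_flush(struct model_gather *tlb)
{
	return tlb->range_recorded || model_flush_nested(tlb->mm);
}
```

In the two-CPU case, the first CPU changes the PTE and records a range, while the second skips the update because the value already matches; both still flush, because two batches are pending. A lone batch that recorded nothing does not flush.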

For some reason (I assume intentionally), all the examples here first
leave the PTE unmodified (“no mod”) and only later modify it, which is not
an “interesting” case. However, based on my understanding of memory
barriers, I think there is indeed a missing barrier before the pending
count is read in mm_tlb_flush_nested(). IIUC, placing
smp_mb__after_unlock_lock() before that read would solve the problem with
the least impact on systems with strong memory ordering.
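
A user-space approximation of where such a barrier would sit, using C11 atomics. All names are hypothetical. The PTL is modelled as an acquire/release spinlock (so, like the kernel's PTL, it is not a full barrier by itself), and `atomic_thread_fence(memory_order_seq_cst)` stands in for smp_mb__after_unlock_lock(). In kernels of this era that macro is, as far as I know, a full barrier only on PowerPC and a no-op elsewhere, so this model corresponds to the most conservative case.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: one PTL, one PTE and the per-mm pending counter. */
struct model_mm {
	atomic_flag ptl;
	atomic_int tlb_flush_pending;
	int pte;
};

/* Acquire/release spinlock: release-acquire ordering, not a full barrier. */
static void model_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		; /* spin until the holder releases */
}

static void model_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Assumption: a seq_cst fence stands in for smp_mb__after_unlock_lock(). */
#define model_smp_mb__after_unlock_lock() \
	atomic_thread_fence(memory_order_seq_cst)

/* Returns true if the PTE was changed, false if it already held "want". */
static bool model_set_pte(struct model_mm *mm, int want)
{
	bool changed = false;

	model_lock(&mm->ptl);
	if (mm->pte != want) {
		mm->pte = want;
		changed = true;
	}
	model_unlock(&mm->ptl);
	return changed;
}

/*
 * Called after the PTL has been dropped: the full fence orders the earlier
 * PTE accesses before the read of the pending counter.
 */
static bool model_flush_nested(struct model_mm *mm)
{
	model_smp_mb__after_unlock_lock();
	return atomic_load_explicit(&mm->tlb_flush_pending,
				    memory_order_relaxed) > 1;
}
```

On a strongly ordered machine the kernel's macro compiles away, which is the "least impact" property mentioned above; the model keeps the fence unconditionally for clarity.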

Minchan, the solution you proposed seems to reopen a race, since the
“pending” indication is removed before the actual TLB flush is performed.
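
To illustrate the ordering concern, a minimal sketch with hypothetical names throughout, in which the "flush" is only recorded in a small event log. The finish path decides whether to flush while its own pending increment is still counted, performs the flush, and only then withdraws the pending indication, so a concurrent batch checking the count cannot see "nothing in flight" before the flush has happened.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum model_event { EV_NONE, EV_FLUSH, EV_DEC_PENDING };

/* Hypothetical per-mm state plus an event log used to check the order. */
struct model_mm {
	atomic_int tlb_flush_pending;
	enum model_event log[4];
	int nlog;
};

static void model_record(struct model_mm *mm, enum model_event ev)
{
	if (mm->nlog < 4)
		mm->log[mm->nlog++] = ev;
}

static void model_finish_mmu(struct model_mm *mm, bool range_recorded)
{
	/* Decide while this batch's own increment is still counted. */
	bool force = range_recorded ||
		     atomic_load(&mm->tlb_flush_pending) > 1;

	if (force)
		model_record(mm, EV_FLUSH); /* stands in for the TLB flush */

	/*
	 * Only now drop the "pending" indication; dropping it before the flush
	 * would let another CPU conclude that no flush is in flight.
	 */
	atomic_fetch_sub(&mm->tlb_flush_pending, 1);
	model_record(mm, EV_DEC_PENDING);
}
```

With two batches pending, the first finisher flushes and then decrements; the second, now alone and with no range of its own, only decrements.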

Nadav