Message-ID: <CALCETrXLUewtYNQOdiGpKLvp=RL0eMLSj+v7_J0G1a_do+6G8Q@mail.gmail.com>
Date: Mon, 11 Jan 2016 13:50:24 -0800
From: Andy Lutomirski <luto@...capital.net>
To: Peter Zijlstra <peterz@...radead.org>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Rik van Riel <riel@...hat.com>,
Brian Gerst <brgerst@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Ingo Molnar <mingo@...nel.org>,
Denys Vlasenko <dvlasenk@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>,
Thomas Gleixner <tglx@...utronix.de>,
Borislav Petkov <bp@...en8.de>,
Andrew Lutomirski <luto@...nel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
"linux-tip-commits@...r.kernel.org"
<linux-tip-commits@...r.kernel.org>
Subject: Re: [tip:x86/urgent] x86/mm: Add barriers and document switch_mm()-vs-flush synchronization
On Mon, Jan 11, 2016 at 10:25 AM, Peter Zijlstra <peterz@...radead.org> wrote:
> On Mon, Jan 11, 2016 at 03:42:40AM -0800, tip-bot for Andy Lutomirski wrote:
>> --- a/arch/x86/include/asm/mmu_context.h
>> +++ b/arch/x86/include/asm/mmu_context.h
>> @@ -116,8 +116,34 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
>> #endif
>> cpumask_set_cpu(cpu, mm_cpumask(next));
>>
>> - /* Re-load page tables */
>> + /*
>> + * Re-load page tables.
>> + *
>> + * This logic has an ordering constraint:
>> + *
>> + * CPU 0: Write to a PTE for 'next'
>> + * CPU 0: load bit 1 in mm_cpumask. if nonzero, send IPI.
>> + * CPU 1: set bit 1 in next's mm_cpumask
>> + * CPU 1: load from the PTE that CPU 0 writes (implicit)
>> + *
>> + * We need to prevent an outcome in which CPU 1 observes
>> + * the new PTE value and CPU 0 observes bit 1 clear in
>> + * mm_cpumask. (If that occurs, then the IPI will never
>> + * be sent, and CPU 0's TLB will contain a stale entry.)
>> + *
>> + * The bad outcome can occur if either CPU's load is
>> + * reordered before that CPU's store, so both CPUs much
>
> s/much/must/ ?
Indeed. Is this worth a follow-up patch?
>
>> + * execute full barriers to prevent this from happening.
>> + *
>> + * Thus, switch_mm needs a full barrier between the
>> + * store to mm_cpumask and any operation that could load
>> + * from next->pgd. This barrier synchronizes with
>> + * remote TLB flushers. Fortunately, load_cr3 is
>> + * serializing and thus acts as a full barrier.
>> + *
>> + */
>> load_cr3(next->pgd);
>> +
>> trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
>>
>> /* Stop flush ipis for the previous mm */
>> @@ -156,10 +182,15 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
>> * schedule, protecting us from simultaneous changes.
>> */
>> cpumask_set_cpu(cpu, mm_cpumask(next));
>> +
>> /*
>> * We were in lazy tlb mode and leave_mm disabled
>> * tlb flush IPI delivery. We must reload CR3
>> * to make sure to use no freed page tables.
>> + *
>> + * As above, this is a barrier that forces
>> + * TLB repopulation to be ordered after the
>> + * store to mm_cpumask.
>
> somewhat confused by this comment, cpumask_set_cpu() is a LOCK BTS, that
> is already fully ordered.
There are more than enough barriers here. v1 had cpumask_set_cpu()
followed by smp_mb__after_atomic(), which is more portable and
generates identical code. I don't have a real preference for which
barrier we should consider to be the important one.
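For anyone following along, the ordering requirement is the classic
store-then-load pattern on both sides: each CPU stores, then loads
what the other CPU stored. Here is a minimal user-space sketch of it,
using C11 atomics and pthreads rather than kernel primitives. The
names pte, mask_bit, flusher, switcher and run_trials are all made up
for illustration, and atomic_thread_fence(memory_order_seq_cst) is
only standing in for a full barrier such as smp_mb(). With a full
fence on both sides, the outcome in which both loads miss the other
side's store should not be observable.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for illustration; these are not kernel objects. */
static atomic_int pte;       /* 0 = old PTE value, 1 = new PTE value   */
static atomic_int mask_bit;  /* models CPU 1's bit in mm_cpumask(next) */
static int saw_bit;          /* what the flusher (CPU 0) loaded        */
static int saw_pte;          /* what the switcher (CPU 1) loaded       */

/* Plays CPU 0: write the PTE, full barrier, then check the mask bit. */
static void *flusher(void *unused)
{
	(void)unused;
	atomic_store_explicit(&pte, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* full-barrier analogue */
	saw_bit = atomic_load_explicit(&mask_bit, memory_order_relaxed);
	return NULL;
}

/* Plays CPU 1: set the mask bit, full barrier, then read the PTE. */
static void *switcher(void *unused)
{
	(void)unused;
	atomic_store_explicit(&mask_bit, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* full-barrier analogue */
	saw_pte = atomic_load_explicit(&pte, memory_order_relaxed);
	return NULL;
}

/*
 * Run the two sides n times and count the trials in which the flusher
 * saw the bit clear (so it would send no IPI) while the switcher still
 * read the old PTE value.
 */
int run_trials(int n)
{
	int bad = 0;

	for (int i = 0; i < n; i++) {
		pthread_t a, b;
		int rc;

		atomic_store(&pte, 0);
		atomic_store(&mask_bit, 0);

		rc = pthread_create(&a, NULL, flusher, NULL);
		assert(rc == 0);
		rc = pthread_create(&b, NULL, switcher, NULL);
		assert(rc == 0);
		(void)rc;

		pthread_join(a, NULL);
		pthread_join(b, NULL);

		if (!saw_bit && !saw_pte)
			bad++;
	}
	return bad;
}
```

Calling run_trials() with any trial count should return 0 as long as
both fences are present. Dropping either fence permits the bad
outcome in principle, although a test this coarse may well fail to
reproduce it.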
>
>> */
>> load_cr3(next->pgd);
>> trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL);
>> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
>> index 8ddb5d0..8f4cc3d 100644
>
>
>> --- a/arch/x86/mm/tlb.c
>> +++ b/arch/x86/mm/tlb.c
>
>> @@ -188,17 +191,29 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
>
>> if (!current->mm) {
>> leave_mm(smp_processor_id());
>> +
>> + /* Synchronize with switch_mm. */
>> + smp_mb();
>> +
>> goto out;
>> }
>
>> + } else {
>> leave_mm(smp_processor_id());
>> +
>> + /* Synchronize with switch_mm. */
>> + smp_mb();
>> + }
>> }
>
> The alternative is making leave_mm() unconditionally imply a full
> barrier. I've not looked at other sites using it though.
For a quick fix, I preferred the more self-contained change.
--Andy