linux-kernel - Re: NMI between switch_mm and switch_to

Message-ID: <20090803104303.GA18165@elte.hu>
Date:	Mon, 3 Aug 2009 12:43:03 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Paul Mackerras <paulus@...ba.org>
Cc:	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	linux-kernel@...r.kernel.org
Subject: Re: NMI between switch_mm and switch_to


* Paul Mackerras <paulus@...ba.org> wrote:

> Ingo Molnar writes:
> 
> > * Peter Zijlstra <a.p.zijlstra@...llo.nl> wrote:
> > 
> > > On Tue, 2009-07-28 at 14:49 +1000, Paul Mackerras wrote:
> > >
> > > > Ben H. suggested there might be a problem if we get a PMU 
> > > > interrupt and try to do a stack trace of userspace in the 
> > > > interval between when we call switch_mm() from 
> > > > sched.c:context_switch() and when we call switch_to().  If we 
> > > > get an NMI in that interval and do a stack trace of userspace, 
> > > > we'll see the registers of the old task but when we peek at user 
> > > > addresses we'll see the memory image for the new task, so the 
> > > > stack trace we get will be completely bogus.
> > > > 
> > > > Is this in fact also a problem on x86, or is there some subtle 
> > > > reason why it can't happen there?
> > > 
> > > I can't spot one, maybe Ingo can when he's back :-)
> > > 
> > > So I think this is very good spotting from Ben.
> > 
> > Yeah.
> > 
> > > We could use preempt notifiers (or put in our own hooks) to 
> > > disable callchains during the context switch I suppose.
> > 
> > I think we should only disable user call-chains - the 
> > in-kernel call-chain is still reliable.
> > 
> > Also, I think we don't need preempt notifiers; we can use a simple 
> > check like this:
> > 
> > 	if (current->mm &&
> > 	    cpu_isset(smp_processor_id(), current->mm->cpu_vm_mask)) {
> 
> On x86, do you clear the current processor's bit in cpu_vm_mask 
> when you switch the MMU away from a task?  We don't on powerpc, 
> which would render the above test incorrect.  (But then we don't 
> actually have the problem on powerpc since interrupts get 
> hard-disabled in switch_mm and stay hard-disabled until they get 
> soft-enabled.)

This is what x86 does in arch/x86/include/asm/mmu_context.h:

static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
			     struct task_struct *tsk)
{
	unsigned cpu = smp_processor_id();

	if (likely(prev != next)) {
		/* stop flush ipis for the previous mm */
		cpu_clear(cpu, prev->cpu_vm_mask);
#ifdef CONFIG_SMP
		percpu_write(cpu_tlbstate.state, TLBSTATE_OK);
		percpu_write(cpu_tlbstate.active_mm, next);
#endif
		cpu_set(cpu, next->cpu_vm_mask);

		/* Re-load page tables */
		load_cr3(next->pgd);

		/*
		 * load the LDT, if the LDT is different:
		 */
		if (unlikely(prev->context.ldt != next->context.ldt))
			load_LDT_nolock(&next->context);
	}
#ifdef CONFIG_SMP
	else {
		percpu_write(cpu_tlbstate.state, TLBSTATE_OK);
		BUG_ON(percpu_read(cpu_tlbstate.active_mm) != next);

		if (!cpu_test_and_set(cpu, next->cpu_vm_mask)) {
			/* We were in lazy tlb mode and leave_mm disabled
			 * tlb flush IPI delivery. We must reload CR3
			 * to make sure to use no freed page tables.
			 */
			load_cr3(next->pgd);
			load_LDT_nolock(&next->context);
		}
	}
#endif
}

which suggests to me that cpu_vm_mask is precise on x86: the CPU's bit 
is cleared in the previous mm before it is set in the next one.
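
To make the window concrete, here is a minimal userspace sketch of the 
prev != next path above and of the proposed check. It is not kernel 
code: struct mm_model, struct task_model, model_switch_mm() and 
user_callchain_ok() are hypothetical stand-ins invented for this 
illustration, the mask is a plain unsigned long rather than a 
cpumask_t, and the lazy-TLB else branch is not modelled.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel definitions. */
#define MODEL_NR_CPUS (sizeof(unsigned long) * CHAR_BIT)

struct mm_model {
	unsigned long cpu_vm_mask;	/* one bit per CPU, models mm->cpu_vm_mask */
};

struct task_model {
	struct mm_model *mm;		/* NULL models a kernel thread */
};

/*
 * Models only the mask updates of the prev != next path of switch_mm():
 * clear this CPU in the old mm, then set it in the new mm.
 */
static void model_switch_mm(struct mm_model *prev, struct mm_model *next,
			    unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);

	if (prev != next) {
		prev->cpu_vm_mask &= ~(1UL << cpu);
		next->cpu_vm_mask |= 1UL << cpu;
	}
}

/*
 * Models the proposed test: a user call-chain is only trusted when the
 * task has an mm and this CPU is still marked as running on that mm.
 */
static bool user_callchain_ok(const struct task_model *curr, unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);

	return curr->mm != NULL &&
	       (curr->mm->cpu_vm_mask & (1UL << cpu)) != 0;
}
```

In this model, once model_switch_mm() has run for a CPU but "current" 
still points at the old task (the interval before switch_to()), 
user_callchain_ok() returns false for that task, so an NMI handler 
using such a check would skip the user call-chain. After switch_to() 
makes the new task current, the check passes again.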

	Ingo