linux-kernel - Re: NMI vs #PF clash

Message-ID: <20120522152255.GB25697@Krystal>
Date:	Tue, 22 May 2012 11:22:55 -0400
From:	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:	Steven Rostedt <rostedt@...dmis.org>
Cc:	Avi Kivity <avi@...hat.com>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...e.hu>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	"H. Peter Anvin" <hpa@...or.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Paul Turner <pjt@...gle.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Frederic Weisbecker <fweisbec@...il.com>
Subject: Re: NMI vs #PF clash

* Steven Rostedt (rostedt@...dmis.org) wrote:
> On Tue, 2012-05-22 at 17:37 +0300, Avi Kivity wrote:
> > > 
> > > 
> > > Is reading it fast? Then we could do two reads and only write when
> > > needed.
> > 
> > The upside is 70 cycles on one machine, see d3edefc0035669.
> 
> Thanks
> 
> > 
> > 
> > > 
> > > Something like this pseudo assembly
> > >  
> > > 	mov cr2, rax
> > > 	push rax
> > > 
> > > 	call do_nmi
> > > 
> > > 	pop rax
> > > 	mov cr2, rbx
> > > 	cmp rax, rbx
> > > 	je skip
> > > 	mov rax, cr2
> > > skip:
> > > 
> > 
> > 
> > Yes, provided no exceptions can happen at those points.
> 
> Yes, exceptions can only happen in the do_nmi area. There should not be
> any breakpoints or page faults in the assembly code of the NMI handler.
> 
> Now another NMI may come in at any point here, but it will detect that
> it is nested and return without doing anything (but telling this NMI to
> repeat itself).

That should take care of cr2. Those are faraway memories, but I think we
should be careful about pgd_offset() too. If we look at x86-64
vmalloc_fault(), we notice that it uses the current task struct:

        WARN_ON_ONCE(in_nmi()); <--- we should take that as a hint ;)

        /*
         * Copy kernel mappings over when needed. This can also
         * happen within a race in page table update. In the later
         * case just flush:
         */
        pgd = pgd_offset(current->active_mm, address);

x86-32 does not have this problem, since it reads the cr3 register to
get the pgd_addr.

x86-64 using the current task can be an issue if the NMI nests over the
scheduler execution.

A few years ago, I posted this patch
http://www.gossamer-threads.com/lists/linux/kernel/1249694?do=post_view_threaded
that tried to fix this by reading cr3 on x86_64. However, after reports
that it caused some x86_64 machines to fail to boot, I removed this
patch from the LTTng patchset. So there was certainly something I missed
back then.

Just food for thought,

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com