Message-ID: <20090616141956.GB6541@Krystal>
Date: Tue, 16 Jun 2009 10:19:56 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
To: Ingo Molnar <mingo@...e.hu>
Cc: "H. Peter Anvin" <hpa@...or.com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Linus Torvalds <torvalds@...ux-foundation.org>,
mingo@...hat.com, paulus@...ba.org, acme@...hat.com,
linux-kernel@...r.kernel.org, penberg@...helsinki.fi,
vegard.nossum@...il.com, efault@....de, jeremy@...p.org,
npiggin@...e.de, tglx@...utronix.de,
linux-tip-commits@...r.kernel.org
Subject: Re: [tip:perfcounters/core] perf_counter: x86: Fix call-chain
support to use NMI-safe methods
* Ingo Molnar (mingo@...e.hu) wrote:
>
> * Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca> wrote:
>
> > I am not asking for the pf handler to handle every possible kind
> > of fault recursively. Just to keep the in-kernel page fault
> > related code for vmalloc (and possibly for prefetch ?) paths
> > NMI-reentrant :
> >
> > void do_page_fault(struct pt_regs *regs, unsigned long error_code)
> >
> > address = read_cr2();
>
> Why would this be needed? We read the cr2 as the first thing in
> do_page_fault(). It can be destroyed and re-faulted at will after
> that point, it won't matter a bit - we have already read it.
>
With respect to cr2, yes, this is the only window we care about.
However, the rest of vmalloc_fault() must also be audited for any other
use of data structures that are not NMI-safe (e.g. "current"), which I
did in the past.
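To make the cr2 window concrete, here is a minimal userspace sketch, not
kernel code. It assumes a plain variable (mock_cr2) standing in for the real
register, and it models an NMI that takes its own fault as a simple function
call. Every name in it is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the cr2 register (userspace model only). */
static volatile uintptr_t mock_cr2;

/* Models the hardware latching the faulting address into cr2. */
static void mock_raise_fault(uintptr_t addr)
{
	mock_cr2 = addr;
}

/* Models an NMI handler that itself faults, overwriting cr2. */
static void mock_nmi_with_fault(uintptr_t nmi_addr)
{
	mock_raise_fault(nmi_addr);
}

/* cr2 is read first; an NMI arriving afterwards cannot change the copy. */
static uintptr_t handler_read_first(uintptr_t fault_addr, uintptr_t nmi_addr)
{
	uintptr_t address;

	mock_raise_fault(fault_addr);
	address = mock_cr2;		/* snapshot taken before anything else */
	mock_nmi_with_fault(nmi_addr);	/* clobbers mock_cr2, not 'address' */
	return address;
}

/* The NMI lands inside the window, before cr2 has been read. */
static uintptr_t handler_nmi_in_window(uintptr_t fault_addr, uintptr_t nmi_addr)
{
	mock_raise_fault(fault_addr);
	mock_nmi_with_fault(nmi_addr);	/* overwrites the original address */
	return mock_cr2;		/* now reads the NMI's fault address */
}

/* Small self-check exercising both orderings. */
static void cr2_window_demo(void)
{
	assert(handler_read_first(0x1000, 0x2000) == 0x1000);
	assert(handler_nmi_in_window(0x1000, 0x2000) == 0x2000);
}
```

The second case is the only one that loses the original address. So with
respect to cr2, only the instructions between the trap and the read need
protecting.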
My intent was just to respond to Peter's concerns by showing that the
part of the page fault handler which needs to be NMI-reentrant is really
not that big.
Mathieu
> The only window to be careful about wrt. cr2 is the small window
> starting at <page_fault>, leading into <do_page_fault>:
>
> ffffffff8154085f <do_page_fault>:
> ffffffff8154085f: 55 push %rbp
> ffffffff81540860: 48 89 e5 mov %rsp,%rbp
> ffffffff81540863: 41 57 push %r15
> ffffffff81540865: 41 56 push %r14
> ffffffff81540867: 49 89 f6 mov %rsi,%r14
> ffffffff8154086a: 41 55 push %r13
> ffffffff8154086c: 49 89 fd mov %rdi,%r13
> ffffffff8154086f: 41 54 push %r12
> ffffffff81540871: 53 push %rbx
> ffffffff81540872: 48 83 ec 18 sub $0x18,%rsp
> ffffffff81540876: 65 4c 8b 3c 25 00 b0 mov %gs:0xb000,%r15
> ffffffff8154087d: 00 00
> ffffffff8154087f: 49 8b 87 48 02 00 00 mov 0x248(%r15),%rax
> ffffffff81540886: 48 89 45 d0 mov %rax,-0x30(%rbp)
> ffffffff8154088a: 48 83 c0 60 add $0x60,%rax
> ffffffff8154088e: 48 89 45 c8 mov %rax,-0x38(%rbp)
> ffffffff81540892: 0f 18 08 prefetcht0 (%rax)
> ffffffff81540895: 41 0f 20 d4 mov %cr2,%r12
>
> Look how early we read out cr2 - after trapping we read it after
> about 40 straight instructions, with no other function call
> in between. Only an NMI (or an MCE and similar deep-atomic contexts)
> can get in that window.
>
> ( Btw., a sidenote: the prefetcht0 right before the cr2 read is a
> real bug. Prefetches can sometimes generate false faults and thus
> destroy the value of cr2. I'll send a patch for that soon. )
>
> Ingo
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68