[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20161117171731.GV3157@twins.programming.kicks-ass.net>
Date: Thu, 17 Nov 2016 18:17:31 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Josh Poimboeuf <jpoimboe@...hat.com>
Cc: Vince Weaver <vincent.weaver@...ne.edu>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
"davej@...emonkey.org.uk" <davej@...emonkey.org.uk>,
"dvyukov@...gle.com" <dvyukov@...gle.com>,
Stephane Eranian <eranian@...il.com>
Subject: Re: perf: fuzzer KASAN unwind_get_return_address
On Thu, Nov 17, 2016 at 05:07:00PM +0100, Peter Zijlstra wrote:
> On Thu, Nov 17, 2016 at 09:18:48AM -0600, Josh Poimboeuf wrote:
> > Thanks, I just waded through this and it turned up some good clues. And
> > according to 'git blame', you might be able to help :-)
> >
> > It's not stack corruption. Instead it looks like
> > __intel_pmu_pebs_event() is creating a bad or stale pt_regs which gets
> > passed to the unwinder. Specifically, regs->bp points to a seemingly
> > random address on the NMI stack. Which seems odd, considering the code
> > itself is running on the same NMI stack.
> >
> > I don't know much about the PEBS code but it seems like it's passing
> > some stale data. Either that or there's some NMI nesting going on.
So the puzzle was BP,SP pointing into the NMI stack at random spots. But
I think I can explain this; if the event has a very _very_ short period,
then the tail __intel_pmu_enable_all() call from the PMI handler will
'insta' trigger a record and raise another PMI.
We then get back-to-back NMIs with a record pointing to a now
overwritten stack.
The other scenario, where there is an (i)ret between the record and the
interrupt would be less confusing but still wrong.
Solve this by always using iregs->{bp,sp} for callchains.
The below patch still copies the record BP,SP when !CALLCHAINS &&
SAMPLE_REGS; does this make sense?
The fuzzer is still running with this patch applied.. I'll let it run
for a while.
---
arch/x86/events/intel/ds.c | 35 +++++++++++++++++++++++------------
arch/x86/events/perf_event.h | 2 +-
2 files changed, 24 insertions(+), 13 deletions(-)
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 0319311dbdbb..be202390bbd3 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1108,20 +1108,20 @@ static void setup_pebs_sample_data(struct perf_event *event,
}
/*
- * We use the interrupt regs as a base because the PEBS record
- * does not contain a full regs set, specifically it seems to
- * lack segment descriptors, which get used by things like
- * user_mode().
+ * We use the interrupt regs as a base because the PEBS record does not
+ * contain a full regs set, specifically it seems to lack segment
+ * descriptors, which get used by things like user_mode().
*
- * In the simple case fix up only the IP and BP,SP regs, for
- * PERF_SAMPLE_IP and PERF_SAMPLE_CALLCHAIN to function properly.
- * A possible PERF_SAMPLE_REGS will have to transfer all regs.
+ * In the simple case fix up only the IP for PERF_SAMPLE_IP.
+ *
+ * We must however always use BP,SP from iregs for the unwinder to stay
+ * sane; the record BP,SP can point into thin air when the record is
+ * from a previous PMI context or an (I)RET happend between the record
+ * and PMI.
*/
*regs = *iregs;
regs->flags = pebs->flags;
set_linear_ip(regs, pebs->ip);
- regs->bp = pebs->bp;
- regs->sp = pebs->sp;
if (sample_type & PERF_SAMPLE_REGS_INTR) {
regs->ax = pebs->ax;
@@ -1130,10 +1130,21 @@ static void setup_pebs_sample_data(struct perf_event *event,
regs->dx = pebs->dx;
regs->si = pebs->si;
regs->di = pebs->di;
- regs->bp = pebs->bp;
- regs->sp = pebs->sp;
- regs->flags = pebs->flags;
+ /*
+ * Per the above; only set BP,SP if we don't need callchains.
+ *
+ * XXX: does this make sense?
+ */
+ if (!(sample_type & PERF_SAMPLE_CALLCHAIN)) {
+ regs->bp = pebs->bp;
+ regs->sp = pebs->sp;
+ }
+
+ /*
+ * Preserve PERF_EFLAGS_VM from set_linear_ip().
+ */
+ regs->flags = pebs->flags | (regs->flags & PERF_EFLAGS_VM);
#ifndef CONFIG_X86_32
regs->r8 = pebs->r8;
regs->r9 = pebs->r9;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 5874d8de1f8d..a77ee026643d 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -113,7 +113,7 @@ struct debug_store {
* Per register state.
*/
struct er_account {
- raw_spinlock_t lock; /* per-core: protect structure */
+ raw_spinlock_t lock; /* per-core: protect structure */
u64 config; /* extra MSR config */
u64 reg; /* extra MSR number */
atomic_t ref; /* reference count */
Powered by blists - more mailing lists