Message-ID: <20150225151639.GL5029@twins.programming.kicks-ass.net>
Date: Wed, 25 Feb 2015 16:16:39 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Vince Weaver <vincent.weaver@...ne.edu>
Cc: linux-kernel@...r.kernel.org, Paul Mackerras <paulus@...ba.org>,
Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Jiri Olsa <jolsa@...hat.com>
Subject: Re: perf: fuzzer causes lockup in x86_pmu_event_init()

On Mon, Feb 23, 2015 at 10:56:10PM -0500, Vince Weaver wrote:
> On Tue, 17 Feb 2015, Vince Weaver wrote:
>
> > This is on a Haswell machine, current git as of this past Friday.
> >
> > I let the perf_fuzzer run and it took 4 days to find this.
> > Sadly it doesn't seem to be reproducible so I am not sure
> > how it exactly got into this state.
>
> I have hit this on another machine, my core2 machine (after 10 days of
> fuzzing). So this seems to be a real issue although hard to hit.
>
> The problem seems to map to
> arch/x86/kernel/cpu/perf_event.c:824
>
> It is stuck forever in this loop in collect_events()
>
> 	list_for_each_entry(event, &leader->sibling_list, group_entry) {
> 		if (!is_x86_event(event) ||
> 		    event->state <= PERF_EVENT_STATE_OFF)
> 			continue;
>
> 		if (n >= max_count)
> 			return -EINVAL;
>
> 		cpuc->event_list[n] = event;
> 		n++;
> 	}
>
> [884044.228001] RIP: 0010:[<ffffffff810138a8>] [<ffffffff810138a8>] x86_pmu_event_init+0x138/0x31d
> [884044.228001] Call Trace:
> [884044.228001] [<ffffffff810cec1b>] perf_try_init_event+0x25/0x47
> [884044.228001] [<ffffffff810d488d>] perf_init_event+0x93/0xca
> [884044.228001] [<ffffffff810d4b5f>] perf_event_alloc+0x29b/0x32d
> [884044.228001] [<ffffffff810d5008>] SYSC_perf_event_open+0x417/0x89c
> [884044.228001] [<ffffffff810d57fe>] SyS_perf_event_open+0x9/0xb

That smells like a corrupted sibling_list; I see no other way for that
loop to not end.

It occurs to me that that list iteration is entirely unserialized; we
should be holding a ctx lock or mutex, but we do not.

Now IIRC the perf fuzzer is single threaded, so it would not actually
trigger the most horrible cases here; but this does smell bad.

Does something like the below make sense and/or help? Jolsa?

---
kernel/events/core.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index af924bc38121..763e7c02e796 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7049,12 +7049,23 @@ EXPORT_SYMBOL_GPL(perf_pmu_unregister);

 static int perf_try_init_event(struct pmu *pmu, struct perf_event *event)
 {
+	struct perf_event_context *ctx = NULL;
 	int ret;

 	if (!try_module_get(pmu->module))
 		return -ENODEV;
+
+	if (event->group_leader != event) {
+		ctx = perf_event_ctx_lock(event->group_leader);
+		BUG_ON(!ctx);
+	}
+
 	event->pmu = pmu;
 	ret = pmu->event_init(event);
+
+	if (ctx)
+		perf_event_ctx_unlock(event->group_leader, ctx);
+
 	if (ret)
 		module_put(pmu->module);
