lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150129144749.GC24151@twins.programming.kicks-ass.net>
Date:	Thu, 29 Jan 2015 15:47:49 +0100
From:	Peter Zijlstra <peterz@...radead.org>
To:	mingo@...nel.org, linux-kernel@...r.kernel.org
Cc:	vincent.weaver@...ne.edu, eranian@...il.com, jolsa@...hat.com,
	mark.rutland@....com, torvalds@...ux-foundation.org,
	tglx@...utronix.de
Subject: Re: [RFC][PATCH 2/3] perf: Add a bit of paranoia

On Mon, Jan 26, 2015 at 05:26:39PM +0100, Peter Zijlstra wrote:
> On Fri, Jan 23, 2015 at 01:52:01PM +0100, Peter Zijlstra wrote:
> Jiri reported triggering that WARN_ON_ONCE over the weekend:
> 
>  event_sched_out.isra.79+0x2b9/0x2d0
>  group_sched_out+0x69/0xc0
>  ctx_sched_out+0x106/0x130
>  task_ctx_sched_out+0x37/0x70
>  __perf_install_in_context+0x70/0x1a0
>  remote_function+0x48/0x60
>  generic_exec_single+0x15b/0x1d0
>  smp_call_function_single+0x67/0xa0
>  task_function_call+0x53/0x80
>  perf_install_in_context+0x8b/0x110
> 
> 
> I think the below should cure this; if we install a group leader it will
> iterate the (still intact) group list and find its siblings and try and
> install those too -- even though those still have the old event->ctx --
> in the new ctx.
> 
> Upon installing the first group sibling we'd try and schedule out the
> group and trigger the above warn.
> 
> Fix this by installing the group leader last, installing siblings would
> have no effect, they're not reachable through the group lists and
> therefore we don't schedule them.
> 
> Also delay resetting the state until we're absolutely sure the events
> are quiescent -- which raises the question; should perf_pmu_migrate()
> not also have perf_event__state_init() calls in?

So perf_pmu_migrate_context() does an explicit OFF->INACTIVE change on
the install side, which seems good enough.

That said, it does need to do that sibling first leaders later install
order too. So I've put the below on top.

---
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7817,8 +7817,35 @@ void perf_pmu_migrate_context(struct pmu
 		list_add(&event->migrate_entry, &events);
 	}
 
+	/*
+	 * Wait for the events to quiesce before re-instating them.
+	 */
 	synchronize_rcu();
 
+	/*
+	 * Re-instate events in 2 passes.
+	 *
+	 * Skip over group leaders and only install siblings on this first
+	 * pass, siblings will not get enabled without a leader, however a
+	 * leader will enable its siblings, even if those are still on the old
+	 * context.
+	 */
+	list_for_each_entry_safe(event, tmp, &events, migrate_entry) {
+		if (event->group_leader == event)
+			continue;
+
+		list_del(&event->migrate_entry);
+		if (event->state >= PERF_EVENT_STATE_OFF)
+			event->state = PERF_EVENT_STATE_INACTIVE;
+		account_event_cpu(event, dst_cpu);
+		perf_install_in_context(dst_ctx, event, dst_cpu);
+		get_ctx(dst_ctx);
+	}
+
+	/*
+	 * Once all the siblings are setup properly, install the group leaders
+	 * to make it go.
+	 */
 	list_for_each_entry_safe(event, tmp, &events, migrate_entry) {
 		list_del(&event->migrate_entry);
 		if (event->state >= PERF_EVENT_STATE_OFF)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ