Message-ID: <YFhhQgUzXLSTlcu0@elver.google.com>
Date: Mon, 22 Mar 2021 10:20:02 +0100
From: Marco Elver <elver@...gle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: alexander.shishkin@...ux.intel.com, acme@...nel.org,
mingo@...hat.com, jolsa@...hat.com, mark.rutland@....com,
namhyung@...nel.org, tglx@...utronix.de, glider@...gle.com,
viro@...iv.linux.org.uk, arnd@...db.de, christian@...uner.io,
dvyukov@...gle.com, jannh@...gle.com, axboe@...nel.dk,
mascasa@...gle.com, pcc@...gle.com, irogers@...gle.com,
kasan-dev@...glegroups.com, linux-arch@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
x86@...nel.org, linux-kselftest@...r.kernel.org
Subject: Re: [PATCH RFC v2 3/8] perf/core: Add support for event removal on exec

On Tue, Mar 16, 2021 at 05:22PM +0100, Peter Zijlstra wrote:
> On Wed, Mar 10, 2021 at 11:41:34AM +0100, Marco Elver wrote:
> > Adds bit perf_event_attr::remove_on_exec, to support removing an event
> > from a task on exec.
> >
> > This option supports the case where an event is supposed to be
> > process-wide only, and should not propagate beyond exec, to limit
> > monitoring to the original process image only.
> >
> > Signed-off-by: Marco Elver <elver@...gle.com>
>
> > +/*
> > + * Removes all events from the current task that have been marked
> > + * remove-on-exec, and feeds their values back to parent events.
> > + */
> > +static void perf_event_remove_on_exec(void)
> > +{
> > +	int ctxn;
> > +
> > +	for_each_task_context_nr(ctxn) {
> > +		struct perf_event_context *ctx;
> > +		struct perf_event *event, *next;
> > +
> > +		ctx = perf_pin_task_context(current, ctxn);
> > +		if (!ctx)
> > +			continue;
> > +		mutex_lock(&ctx->mutex);
> > +
> > +		list_for_each_entry_safe(event, next, &ctx->event_list, event_entry) {
> > +			if (!event->attr.remove_on_exec)
> > +				continue;
> > +
> > +			if (!is_kernel_event(event))
> > +				perf_remove_from_owner(event);
> > +			perf_remove_from_context(event, DETACH_GROUP);
>
> There's a comment on this in perf_event_exit_event(), if this task
> happens to have the original event, then DETACH_GROUP will destroy the
> grouping.
>
> I think this wants to be:
>
> 	perf_remove_from_context(event,
> 			child_event->parent ? DETACH_GROUP : 0);
>
> or something.
>
> > +			/*
> > +			 * Remove the event and feed back its values to the
> > +			 * parent event.
> > +			 */
> > +			perf_event_exit_event(event, ctx, current);
>
> Oooh, and here we call it... but it will do list_del_event() /
> perf_group_detach() *again*.
>
> So the problem is that perf_event_exit_task_context() doesn't use
> remove_from_context(), but instead does task_ctx_sched_out() and then
> relies on the events not being active.
>
> Whereas above you *DO* use remove_from_context(), but then
> perf_event_exit_event() will try and remove it more.

AFAIK, we want to deallocate the events and not just remove them, so
doing what perf_event_exit_event() does is the right way forward? Or did
you have something else in mind?
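
To make the double-removal concern concrete, here is a toy user-space
model of the sequence in the loop above: detach first, then call an exit
helper that detaches again. Everything in it (toy_event, toy_ctx, the
attached flag) is made up for illustration and is not the kernel's data
structures or API; it only shows that the second detach has to be a
guarded no-op for the sequence to be safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only; these are NOT the kernel's perf structures. */
struct toy_event {
	struct toy_event *prev, *next;
	bool attached;		/* hypothetical "is on the context list" flag */
	bool remove_on_exec;
	long count;
};

struct toy_ctx {
	struct toy_event head;	/* sentinel of a circular doubly linked list */
	int nr_events;
};

static void toy_ctx_init(struct toy_ctx *ctx)
{
	ctx->head.prev = ctx->head.next = &ctx->head;
	ctx->nr_events = 0;
}

static void toy_attach(struct toy_ctx *ctx, struct toy_event *e)
{
	assert(!e->attached);
	e->next = &ctx->head;
	e->prev = ctx->head.prev;
	ctx->head.prev->next = e;
	ctx->head.prev = e;
	e->attached = true;
	ctx->nr_events++;
}

/* Idempotent detach: returns true only if it actually unlinked the event. */
static bool toy_detach(struct toy_ctx *ctx, struct toy_event *e)
{
	if (!e->attached)
		return false;
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->prev = e->next = e;
	e->attached = false;
	ctx->nr_events--;
	return true;
}

/* Exit helper: detaches (possibly a second time) and feeds the count back. */
static void toy_exit_event(struct toy_ctx *ctx, struct toy_event *e,
			   long *parent_count)
{
	toy_detach(ctx, e);
	*parent_count += e->count;
}

/* Same shape as the loop under review; returns how many events were removed. */
static int toy_remove_on_exec(struct toy_ctx *ctx, long *parent_count)
{
	struct toy_event *e, *next;
	int removed = 0;

	for (e = ctx->head.next; e != &ctx->head; e = next) {
		next = e->next;		/* saved first: e is unlinked below */
		if (!e->remove_on_exec)
			continue;
		toy_detach(ctx, e);			/* first removal */
		toy_exit_event(ctx, e, parent_count);	/* second one is a no-op */
		removed++;
	}
	return removed;
}
```

In the toy, dropping the attached check makes the second toy_detach()
decrement nr_events again for the same event. Whether the real helpers
are guarded the same way on every path is a separate question.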

I'm still trying to make sense of the zoo of synchronisation mechanisms
at play here. No matter what I try, it seems I get stuck on the fact
that I can't cleanly "pause" the context to remove the events (warnings
in event_function()).

This is what I've been playing with to understand:

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 450ea9415ed7..c585cef284a0 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4195,6 +4195,88 @@ static void perf_event_enable_on_exec(int ctxn)
 		put_ctx(clone_ctx);
 }
 
+static void perf_remove_from_owner(struct perf_event *event);
+static void perf_event_exit_event(struct perf_event *child_event,
+				  struct perf_event_context *child_ctx,
+				  struct task_struct *child);
+
+/*
+ * Removes all events from the current task that have been marked
+ * remove-on-exec, and feeds their values back to parent events.
+ */
+static void perf_event_remove_on_exec(void)
+{
+	struct perf_event *event, *next;
+	int ctxn;
+
+	/***************** BROKEN BROKEN BROKEN *****************/
+
+	for_each_task_context_nr(ctxn) {
+		struct perf_event_context *ctx;
+		bool removed = false;
+
+		ctx = perf_pin_task_context(current, ctxn);
+		if (!ctx)
+			continue;
+		mutex_lock(&ctx->mutex);
+
+		raw_spin_lock_irq(&ctx->lock);
+		/*
+		 * WIP: Ok, we will unschedule the context, _and_ tell everyone
+		 * still trying to use it that it's dead... even though it isn't.
+		 *
+		 * This can't be right...
+		 */
+		task_ctx_sched_out(__get_cpu_context(ctx), ctx, EVENT_ALL);
+		RCU_INIT_POINTER(current->perf_event_ctxp[ctxn], NULL);
+		WRITE_ONCE(ctx->task, TASK_TOMBSTONE);

This code here is obviously bogus, because it removes the context from
the task: we might still need it since this task is not dead yet.
What's the right way to pause the context to remove the events from it?

+		raw_spin_unlock_irq(&ctx->lock);
+
+		list_for_each_entry_safe(event, next, &ctx->event_list, event_entry) {
+			if (!event->attr.remove_on_exec)
+				continue;
+			removed = true;
+
+			if (!is_kernel_event(event))
+				perf_remove_from_owner(event);
+
+			/*
+			 * WIP: Want to free the event and feed back its values
+			 * to the parent (if any) ...
+			 */
+			perf_event_exit_event(event, ctx, current);
+		}
+

... need to schedule context back in here?

+
+		mutex_unlock(&ctx->mutex);
+		perf_unpin_context(ctx);
+		put_ctx(ctx);
+	}
+}
+
 struct perf_read_data {
 	struct perf_event *event;
 	bool group;
@@ -7553,6 +7635,8 @@ void perf_event_exec(void)
 				   true);
 	}
 	rcu_read_unlock();
+
+	perf_event_remove_on_exec();
 }

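
For reference, the use case from the commit message (monitor the
original process image only, nothing past exec) would look roughly like
the sketch below from user space. It assumes a uapi <linux/perf_event.h>
that already carries the remove_on_exec bit proposed in this series; the
two helper names are hypothetical, and the choice of a software
task-clock counter with inherit set is just one plausible configuration.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: describe a counter that must not survive exec. */
static void init_remove_on_exec_attr(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_SOFTWARE;
	attr->config = PERF_COUNT_SW_TASK_CLOCK;
	attr->exclude_kernel = 1;
	attr->inherit = 1;		/* follow children across fork ... */
	attr->remove_on_exec = 1;	/* ... but drop the event on exec */
}

/*
 * Hypothetical helper: open the counter on the calling task. glibc has no
 * perf_event_open() wrapper, so this goes through syscall(2). Returns the
 * event fd, or -1 with errno set.
 */
static int open_counter_for_self(struct perf_event_attr *attr)
{
	/* Catch callers that skipped init_remove_on_exec_attr(). */
	assert(attr->size == sizeof(*attr));

	return (int)syscall(SYS_perf_event_open, attr, 0 /* this task */,
			    -1 /* any CPU */, -1 /* no group */, 0UL);
}
```

A caller would fill the attr with init_remove_on_exec_attr(), pass it to
open_counter_for_self(), and check for a negative return before using
the fd.
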
Thanks,
-- Marco