Message-ID: <ZyJUzhzHGDu5CLdi@localhost.localdomain>
Date: Wed, 30 Oct 2024 16:46:22 +0100
From: Frederic Weisbecker <frederic@...nel.org>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: "Lai, Yi" <yi1.lai@...ux.intel.com>, linux-perf-users@...r.kernel.org,
linux-kernel@...r.kernel.org,
Adrian Hunter <adrian.hunter@...el.com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Daniel Bristot de Oliveira <bristot@...nel.org>,
Ian Rogers <irogers@...gle.com>, Ingo Molnar <mingo@...hat.com>,
Jiri Olsa <jolsa@...nel.org>, Kan Liang <kan.liang@...ux.intel.com>,
Marco Elver <elver@...gle.com>, Mark Rutland <mark.rutland@....com>,
Namhyung Kim <namhyung@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Arnaldo Carvalho de Melo <acme@...hat.com>, yi1.lai@...el.com,
syzkaller-bugs@...glegroups.com
Subject: Re: [PATCH v4 2/6] perf: Enqueue SIGTRAP always via task_work.
On Wed, Oct 30, 2024 at 03:07:21PM +0100, Sebastian Andrzej Siewior wrote:
> On 2024-10-29 18:21:31 [+0100], To Frederic Weisbecker wrote:
> > On 2024-10-28 13:21:39 [+0100], Frederic Weisbecker wrote:
> > > Ah the perf_pending_task work is pending but perf_pending_task_sync()
> > > fails to cancel there:
> > >
> > > /*
> > > * If the task is queued to the current task's queue, we
> > > * obviously can't wait for it to complete. Simply cancel it.
> > > */
> > > if (task_work_cancel(current, head)) {
> > > event->pending_work = 0;
> > > local_dec(&event->ctx->nr_no_switch_fast);
> > > return;
> > > }
> > >
> > > And that's because the work is no longer on the task work
> > > list in task->task_works. Instead it's on the executing list
> > > in task_work_run(). It's a blind spot for task_work_cancel()
> > > when the current task is already running the task works. And it
> > > is here, since it's running the fput delayed work.
> > >
> > > Something like this untested?
> >
> > Tested. Not sure if this is a good idea.
> > Couldn't we take the ->next pointer and add it to current::task_works
> > instead?
> > That patch in ZtYyXG4fYbUdoBpk@...ilion.home gets rid of that
> > rcuwait_wait_event().
>
> Just tested. That patch from ZtYyXG4fYbUdoBpk@...ilion.home also solves
> that problem. Could we take that one instead?
This needs more thought. We must make sure that the parent is put _after_
the child, because it's dereferenced on release, for example:
    put_event()
        free_event()
            irq_work_sync(&event->pending_irq);
                ====> IRQ or irq_workd
                    perf_event_wakeup()
                        ring_buffer_wakeup()
                            event = event->parent;
                            rcu_dereference(event->rb);
And now after this patch it's possible that this happens after
the parent has been released.

We could put the parent from the child's free_event(), but some
places (inherit_event()) may call free_event() on a child without
holding a reference to the parent.

Also note that with this patch the task may receive late irrelevant
signals after the event is removed. It's probably not that bad but
still... This could be a concern for exec(): is there a missing
task_work_run() there before flush_signal_handlers()?

Anyway here is an updated version:
diff --git a/kernel/events/core.c b/kernel/events/core.c
index e3589c4287cb..4031d0dbc47b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5448,10 +5448,17 @@ static void perf_remove_from_owner(struct perf_event *event)
 
 static void put_event(struct perf_event *event)
 {
+	struct perf_event *parent;
+
 	if (!atomic_long_dec_and_test(&event->refcount))
 		return;
 
+	parent = event->parent;
 	_free_event(event);
+
+	/* Matches the refcount bump in inherit_event() */
+	if (parent)
+		put_event(parent);
 }
 
 /*
@@ -5535,11 +5542,6 @@ int perf_event_release_kernel(struct perf_event *event)
 		if (tmp == child) {
 			perf_remove_from_context(child, DETACH_GROUP);
 			list_move(&child->child_list, &free_list);
-			/*
-			 * This matches the refcount bump in inherit_event();
-			 * this can't be the last reference.
-			 */
-			put_event(event);
 		} else {
 			var = &ctx->refcount;
 		}
@@ -5565,7 +5567,7 @@ int perf_event_release_kernel(struct perf_event *event)
 		void *var = &child->ctx->refcount;
 
 		list_del(&child->child_list);
-		free_event(child);
+		put_event(child);
 
 		/*
 		 * Wake any perf_event_free_task() waiting for this event to be
@@ -13325,8 +13327,7 @@ perf_event_exit_event(struct perf_event *event, struct perf_event_context *ctx)
 		 * Kick perf_poll() for is_event_hup();
 		 */
 		perf_event_wakeup(parent_event);
-		free_event(event);
-		put_event(parent_event);
+		put_event(event);
 		return;
 	}
 
@@ -13444,13 +13445,11 @@ static void perf_free_event(struct perf_event *event,
 	list_del_init(&event->child_list);
 	mutex_unlock(&parent->child_mutex);
 
-	put_event(parent);
-
 	raw_spin_lock_irq(&ctx->lock);
 	perf_group_detach(event);
 	list_del_event(event, ctx);
 	raw_spin_unlock_irq(&ctx->lock);
-	free_event(event);
+	put_event(event);
 }
 
 /*