[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e8dfc3df-69d6-46ba-81db-4a04484fb7c3@amd.com>
Date: Wed, 19 Feb 2025 18:53:46 +0530
From: Ravi Bangoria <ravi.bangoria@....com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: "mingo@...nel.org" <mingo@...nel.org>,
"lucas.demarchi@...el.com" <lucas.demarchi@...el.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"willy@...radead.org" <willy@...radead.org>,
"acme@...nel.org" <acme@...nel.org>,
"namhyung@...nel.org" <namhyung@...nel.org>,
"mark.rutland@....com" <mark.rutland@....com>,
"alexander.shishkin@...ux.intel.com" <alexander.shishkin@...ux.intel.com>,
"jolsa@...nel.org" <jolsa@...nel.org>,
"irogers@...gle.com" <irogers@...gle.com>,
"adrian.hunter@...el.com" <adrian.hunter@...el.com>,
"kan.liang@...ux.intel.com" <kan.liang@...ux.intel.com>,
Ravi Bangoria <ravi.bangoria@....com>
Subject: Re: [PATCH v2 24/24] perf: Make perf_pmu_unregister() useable
Hi Peter,
>>>> Apparently not, it ends up with:
>>>>
>>>> ------------[ cut here ]------------
>>>> WARNING: CPU: 145 PID: 5459 at kernel/events/core.c:281 event_function+0xd2/0xf0
>>>> WARNING: CPU: 145 PID: 5459 at kernel/events/core.c:286 event_function+0xd6/0xf0
>>>
>>>> remote_function+0x4f/0x70
>>>> generic_exec_single+0x7f/0x160
>>>> smp_call_function_single+0x110/0x160
>>>> event_function_call+0x98/0x1d0
>>>> _perf_event_disable+0x41/0x70
>>>> perf_event_for_each_child+0x40/0x90
>>>> _perf_ioctl+0xac/0xb00
>>>> perf_ioctl+0x45/0x80
>>>
>>> Took me a long while trying to blame this on the 'event->parent =
>>> NULL;', but AFAICT this is a new, unrelated issue.
>>>
>>> What I think happens is this perf_ioctl(DISABLE) vs pmu_detach_events()
>>> race, where the crux is that perf_ioctl() path does not take
>>> event2->mutex which allows the following interleave:
>>
>> This one was only with perf_fuzzer, so pmu_detach_events() code path was
>> not invoked.
>
> I think the issue is, unaccount_event() gets called for the child event
> after the child is detached. Since event->parent is NULL, unaccount_event()
> abruptly corrupts the perf_sched_work.
A different manifestation of the same underlying issue (perf_sched_work
is getting corrupted because of event->parent = NULL):
WARNING: CPU: 0 PID: 5799 at kernel/events/core.c:286 event_function+0xd6/0xf0
CPU: 0 UID: 1002 PID: 5799 Comm: perf_fuzzer Kdump: loaded Not tainted 6.14.0-rc1-pmu-unregister+ #263
RIP: 0010:event_function+0xd6/0xf0
RSP: 0018:ffa0000006a87828 EFLAGS: 00010086
RAX: 0000000000000007 RBX: ff11001008830ee0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffa0000006a87850 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffa0000006a87918
R13: 0000000000000000 R14: ff1100010e347c00 R15: ff110001226c9a40
FS: 0000000000000000(0000) GS:ff11001008800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000595fed7d3e12 CR3: 000000016954c005 CR4: 0000000000f71ef0
PKRU: 55555554
Call Trace:
<TASK>
? event_function+0xd6/0xf0
? event_function+0x3d/0xf0
remote_function+0x4c/0x70
generic_exec_single+0x7c/0x160
smp_call_function_single+0x110/0x160
event_function_call+0x98/0x1d0
perf_remove_from_context+0x46/0xa0
perf_event_release_kernel+0x1f3/0x210
perf_release+0x12/0x20
With the above trace as reference, something like below is what is
happening. (Code paths are based on my understanding of traces, I've
not verified each an every bit :P)
Assume:
task1: event1
task2 (child of task1): event2 (child of event1)
task3: event3
CPU 1 CPU 2 CPU 3 CPU 4
/* task3 sched in */
perf_event_task_sched_in()
if (static_branch_unlikely(&perf_sched_events)) /* true */
__perf_event_task_sched_in()
...
task2:
perf_event_exit_event() /* event2 */
event->parent = NULL /* by perf_perf_child_detach() */
free_event()
_free_event()
unaccount_event()
if (event->parent) /* false */
return; /* won't get executed */
perf_sched_work <== OFF ------.
|
| /* task3 sched out */
| perf_event_task_sched_out()
'-----> if (static_branch_unlikely(&perf_sched_events)) /* false */
| /* __perf_event_task_sched_out() won't get called */
|
| /* task3 sched in */
| perf_event_task_sched_in()
'--------------------------------------------------------------------------> if (static_branch_unlikely(&perf_sched_events)) /* false */
/* __perf_event_task_sched_in() won't get called
So cpuctx->task_ctx will remains NULL; */
close(event3)
perf_release()
perf_event_release_kernel()
perf_remove_from_context()
event_function_call()
/* IPI to CPU 3 ---> */
/* ---> IPI from CPU 4 */
event_function()
if (ctx->task) {
/* task_ctx is NULL whereas ctx is !NULL */
WARN_ON_ONCE(task_ctx != ctx);
^^^^^^^^^^^^
IIUC, event->parent is the only way to detect whether the event is a
child or not, so it makes sense to preserve the ->parent pointer even
after the child is detached.
Thanks,
Ravi
Powered by blists - more mailing lists