Message-ID: <72c1852.372d.19731462b73.Coremail.00107082@163.com>
Date: Mon, 2 Jun 2025 23:32:51 +0800 (CST)
From: "David Wang" <00107082@....com>
To: "Yeoreum Yun" <yeoreum.yun@....com>
Cc: peterz@...radead.org, mingo@...hat.com, acme@...nel.org,
namhyung@...nel.org, mingo@...nel.org, leo.yan@....com,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [BUG][6.15][perf] Kernel panic not syncing: Fatal exception in
interrupt
At 2025-06-02 23:13:10, "Yeoreum Yun" <yeoreum.yun@....com> wrote:
>Hi David,
>
>> > > Before I start testing, I feel concerned about following chain:
>> > >
>> > > ./kernel/fork.c:
>> > > bad_fork_cleanup_perf:
>> > > perf_event_free_task()
>> > > perf_free_event()
>> > > list_del_event()
>> > >
>> > > This patch seems to change the behavior in this call chain.
>> > > Would this have other side effects?
>> >
>> > Which behavior change are you worried about?
>> > Both error paths are handled by __perf_remove_from_context().
>> > There shouldn't be any problem, since this patch just moves the
>> > cgroup disabling to before the event state change.
>> >
>> > Also, cgroup events exist only in cpuctx; they are not added to taskctx.
>> > So there is no effect on events attached to taskctx.
>> >
>> > Thanks.
>>
>> Am I reading it wrong?
>> The call chain I mentioned above does not go through __perf_remove_from_context().
>> It is a failure path in fork, which is hit rarely but is still possible, I guess...
>
>Since commit 90661365021a
>("perf: Unify perf_event_free_task() / perf_event_exit_task_context()"),
>perf_event_free_task() is integrated with perf_event_exit_task_context().
>So, it calls __perf_remove_from_context().
Good to know~
>
>In v6.15, I think you can test with only the change below:
>@@ -2471,6 +2459,16 @@ __perf_remove_from_context(struct perf_event *event,
>
> 	ctx_time_update(cpuctx, ctx);
>
>+	/*
>+	 * If event was in error state, then keep it
>+	 * that way, otherwise bogus counts will be
>+	 * returned on read(). The only way to get out
>+	 * of error state is by explicit re-enabling
>+	 * of the event
>+	 */
>+	if (event->state > PERF_EVENT_STATE_OFF)
>+		perf_cgroup_event_disable(event, ctx);
>+
> 	/*
> 	 * Ensure event_sched_out() switches to OFF, at the very least
> 	 * this avoids raising perf_pending_task() at this time.
>
>not with the modification to list_del_event().
... I applied your original patch to 6.15; so far, 7 rounds of testing show no sign of a kernel panic.
I think the patch does fix it.
Tested-by: David Wang <00107082@....com>
Thanks
David