linux-kernel - Re: [BUG][6.15][perf] Kernel panic not syncing: Fatal exception in interrupt


Message-ID: <72c1852.372d.19731462b73.Coremail.00107082@163.com>
Date: Mon, 2 Jun 2025 23:32:51 +0800 (CST)
From: "David Wang" <00107082@....com>
To: "Yeoreum Yun" <yeoreum.yun@....com>
Cc: peterz@...radead.org, mingo@...hat.com, acme@...nel.org,
	namhyung@...nel.org, mingo@...nel.org, leo.yan@....com,
	linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [BUG][6.15][perf] Kernel panic not syncing: Fatal exception in
 interrupt


At 2025-06-02 23:13:10, "Yeoreum Yun" <yeoreum.yun@....com> wrote:
>Hi David,
>
>> > > Before I start testing, I am concerned about the following call chain:
>> > >
>> > > ./kernel/fork.c:
>> > > bad_fork_cleanup_perf:
>> > >     perf_event_free_task()
>> > >         perf_free_event()
>> > >             list_del_event()
>> > >
>> > > This patch seems to change the behavior in this call chain.
>> > > Would this have other side effects?
>> >
>> > Which behavior change are you worried about?
>> > Both error paths are handled by __perf_remove_from_context(),
>> > so there shouldn't be any problem: this patch just moves the
>> > disabling of the cgroup event to before the event state is changed.
>> >
>> > Also, cgroup events exist only in a cpuctx and are never added to
>> > a taskctx, so events attached to a taskctx are not affected.
>> >
>> > Thanks.
>>
>> Am I reading it wrong?
>> The call chain I mentioned above does not go through __perf_remove_from_context().
>> It is a failure path in fork, which is rare but still possible, I guess...
>
>Since commit 90661365021a
>("perf Unify perf_event_free_task() / perf_evenet_exit_task_context()")
>
>perf_event_free_task() has been merged with perf_event_exit_task_context(),
>so it also calls __perf_remove_from_context().


Good to know~

>
>On v6.15, I think you can test with only the change below:
>@@ -2471,6 +2459,16 @@ __perf_remove_from_context(struct perf_event *event,
>
>        ctx_time_update(cpuctx, ctx);
>
>+       /*
>+        * If event was in error state, then keep it
>+        * that way, otherwise bogus counts will be
>+        * returned on read(). The only way to get out
>+        * of error state is by explicit re-enabling
>+        * of the event
>+        */
>+       if (event->state > PERF_EVENT_STATE_OFF)
>+               perf_cgroup_event_disable(event, ctx);
>+
>        /*
>         * Ensure event_sched_out() switches to OFF, at the very least
>         * this avoids raising perf_pending_task() at this time.
>
>not with modification with "list_del_event()".

I applied your original patch on 6.15; so far, 7 rounds of testing show no sign of a kernel panic.
I think the patch does fix it.

Tested-by: David Wang <00107082@....com>


Thanks
David