Message-ID: <ba94795e-367c-429a-a19f-2a220e33a117@linaro.org>
Date: Thu, 4 Sep 2025 14:15:21 +0100
From: James Clark <james.clark@...aro.org>
To: Leo Yan <leo.yan@....com>, Yabin Cui <yabinc@...gle.com>
Cc: coresight@...ts.linaro.org, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org, Suzuki K Poulose <suzuki.poulose@....com>,
Mike Leach <mike.leach@...aro.org>, Levi Yun <yeoreum.yun@....com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Keita Morisaki <keyz@...gle.com>, Yuanfang Zhang <quic_yuanfang@...cinc.com>
Subject: Re: [PATCH v2 25/28] coresight: trbe: Save and restore state across
CPU low power state
On 01/07/2025 3:53 pm, Leo Yan wrote:
> From: Yabin Cui <yabinc@...gle.com>
>
> Similar to ETE, TRBE may lose its context when a CPU enters low power
> state. To make things worse, if ETE is restored without TRBE being
> restored, an enabled source device with no enabled sink devices can
> cause CPU hang on some devices (e.g., Pixel 9).
>
> The save and restore flows are described in section K5.5 "Context
> switching" of the Arm ARM (ARM DDI 0487 L.a). This commit adds save and
> restore callbacks following the software usage defined in the
> architecture manual.
>
> Signed-off-by: Yabin Cui <yabinc@...gle.com>
> Co-developed-by: Leo Yan <leo.yan@....com>
> Signed-off-by: Leo Yan <leo.yan@....com>
> ---
Hi Leo,
I tested at this commit to try to avoid hitting any issues with the
last 3 hotplug changes, but ran into two problems. They were triggered
by running the CPU online/offline/enable_source stress test and then
running the Perf "Check Arm CoreSight trace data recording and
synthesized samples" test.

Both problems hit when running the tests in either order, but not when
running only one of them after a reboot.

The first one appears while just running one of the tests:
=====================================================
WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
6.16.0-rc3+ #475 Not tainted
-----------------------------------------------------
perf-exec/709 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
ffff000804002cd0 (&drvdata->spinlock){+.+.}-{2:2}, at: cti_enable+0x40/0x130 [coresight_cti]
and this task is already holding:
ffff00080ab67e18 (&ctx->lock){....}-{2:2}, at: perf_event_exec+0xc4/0x6b8
which would create a new lock dependency:
(&ctx->lock){....}-{2:2} -> (&drvdata->spinlock){+.+.}-{2:2}
but this new dependency connects a HARDIRQ-irq-safe lock:
(&cpuctx_lock){-...}-{2:2}
... which became HARDIRQ-irq-safe at:
lock_acquire+0x130/0x2c0
_raw_spin_lock+0x60/0xa8
__perf_install_in_context+0x5c/0x2f0
remote_function+0x58/0x78
__flush_smp_call_function_queue+0x1d8/0x9c0
generic_smp_call_function_single_interrupt+0x20/0x38
ipi_handler+0x118/0x338
handle_percpu_devid_irq+0xb0/0x180
generic_handle_domain_irq+0x4c/0x78
gic_handle_irq+0x68/0xf0
call_on_irq_stack+0x24/0x30
do_interrupt_handler+0x88/0xd0
el1_interrupt+0x34/0x68
el1h_64_irq_handler+0x18/0x28
el1h_64_irq+0x6c/0x70
arch_local_irq_enable+0x8/0x10
cpuidle_enter+0x44/0x68
do_idle+0x1b0/0x2b8
cpu_startup_entry+0x40/0x50
rest_init+0x1c4/0x1d0
start_kernel+0x394/0x458
__primary_switched+0x88/0x98
to a HARDIRQ-irq-unsafe lock:
(&drvdata->spinlock){+.+.}-{2:2}
... which became HARDIRQ-irq-unsafe at:
...
lock_acquire+0x130/0x2c0
_raw_spin_lock+0x60/0xa8
cti_disable+0x38/0xe8 [coresight_cti]
coresight_disable_source+0x88/0xa8 [coresight]
coresight_disable_sysfs+0xd0/0x1f0 [coresight]
enable_source_store+0x78/0xb0 [coresight]
dev_attr_store+0x24/0x40
sysfs_kf_write+0xa8/0xd0
kernfs_fop_write_iter+0x114/0x1c0
vfs_write+0x2d8/0x310
ksys_write+0x80/0xf8
__arm64_sys_write+0x28/0x40
invoke_syscall+0x4c/0x110
el0_svc_common+0xb8/0xf0
do_el0_svc+0x28/0x40
el0_svc+0x4c/0xe8
el0t_64_sync_handler+0x84/0x108
el0t_64_sync+0x198/0x1a0
other info that might help us debug this:
Chain exists of:
&cpuctx_lock --> &ctx->lock --> &drvdata->spinlock
Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&drvdata->spinlock);
                               local_irq_disable();
                               lock(&cpuctx_lock);
                               lock(&ctx->lock);
  <Interrupt>
    lock(&cpuctx_lock);

 *** DEADLOCK ***
4 locks held by perf-exec/709:
#0: ffff0008066b66f8 (&sig->cred_guard_mutex){+.+.}-{4:4}, at: bprm_execve+0x54/0x690
#1: ffff0008066b67a0 (&sig->exec_update_lock){++++}-{4:4}, at: exec_mmap+0x48/0x2b0
#2: ffff000976a467f0 (&cpuctx_lock){-...}-{2:2}, at: perf_event_exec+0xb4/0x6b8
#3: ffff00080ab67e18 (&ctx->lock){....}-{2:2}, at: perf_event_exec+0xc4/0x6b8
the dependencies between HARDIRQ-irq-safe lock and the holding lock:
-> (&cpuctx_lock){-...}-{2:2} {
IN-HARDIRQ-W at:
lock_acquire+0x130/0x2c0
_raw_spin_lock+0x60/0xa8
__perf_install_in_context+0x5c/0x2f0
remote_function+0x58/0x78
__flush_smp_call_function_queue+0x1d8/0x9c0
generic_smp_call_function_single_interrupt+0x20/0x38
ipi_handler+0x118/0x338
handle_percpu_devid_irq+0xb0/0x180
generic_handle_domain_irq+0x4c/0x78
gic_handle_irq+0x68/0xf0
call_on_irq_stack+0x24/0x30
do_interrupt_handler+0x88/0xd0
el1_interrupt+0x34/0x68
el1h_64_irq_handler+0x18/0x28
el1h_64_irq+0x6c/0x70
arch_local_irq_enable+0x8/0x10
cpuidle_enter+0x44/0x68
do_idle+0x1b0/0x2b8
cpu_startup_entry+0x40/0x50
rest_init+0x1c4/0x1d0
start_kernel+0x394/0x458
__primary_switched+0x88/0x98
INITIAL USE at:
lock_acquire+0x130/0x2c0
_raw_spin_lock+0x60/0xa8
__perf_event_exit_context+0x3c/0xb0
generic_exec_single+0xb0/0x3a8
smp_call_function_single+0x180/0xa98
perf_event_exit_cpu+0x344/0x3d8
cpuhp_invoke_callback+0x120/0x2a0
cpuhp_thread_fun+0x170/0x1d8
smpboot_thread_fn+0x1c0/0x328
kthread+0x148/0x250
ret_from_fork+0x10/0x20
}
... key at: [<ffff800082bbe238>] cpuctx_lock+0x0/0x10
-> (&ctx->lock){....}-{2:2} {
INITIAL USE at:
lock_acquire+0x130/0x2c0
_raw_spin_lock_irq+0x70/0xb8
find_get_pmu_context+0x88/0x238
__arm64_sys_perf_event_open+0x794/0x1150
invoke_syscall+0x4c/0x110
el0_svc_common+0xb8/0xf0
do_el0_svc+0x28/0x40
el0_svc+0x4c/0xe8
el0t_64_sync_handler+0x84/0x108
el0t_64_sync+0x198/0x1a0
}
... key at: [<ffff800082bbe1d0>] __perf_event_init_context.__key+0x0/0x10
... acquired at:
_raw_spin_lock+0x60/0xa8
__perf_install_in_context+0x6c/0x2f0
remote_function+0x58/0x78
generic_exec_single+0xb0/0x3a8
smp_call_function_single+0x180/0xa98
perf_install_in_context+0x1a0/0x290
__arm64_sys_perf_event_open+0x103c/0x1150
invoke_syscall+0x4c/0x110
el0_svc_common+0xb8/0xf0
do_el0_svc+0x28/0x40
el0_svc+0x4c/0xe8
el0t_64_sync_handler+0x84/0x108
el0t_64_sync+0x198/0x1a0
the dependencies between the lock to be acquired and HARDIRQ-irq-unsafe lock:
-> (&drvdata->spinlock){+.+.}-{2:2} {
HARDIRQ-ON-W at:
lock_acquire+0x130/0x2c0
_raw_spin_lock+0x60/0xa8
cti_disable+0x38/0xe8 [coresight_cti]
coresight_disable_source+0x88/0xa8 [coresight]
coresight_disable_sysfs+0xd0/0x1f0 [coresight]
enable_source_store+0x78/0xb0 [coresight]
dev_attr_store+0x24/0x40
sysfs_kf_write+0xa8/0xd0
kernfs_fop_write_iter+0x114/0x1c0
vfs_write+0x2d8/0x310
ksys_write+0x80/0xf8
__arm64_sys_write+0x28/0x40
invoke_syscall+0x4c/0x110
el0_svc_common+0xb8/0xf0
do_el0_svc+0x28/0x40
el0_svc+0x4c/0xe8
el0t_64_sync_handler+0x84/0x108
el0t_64_sync+0x198/0x1a0
SOFTIRQ-ON-W at:
lock_acquire+0x130/0x2c0
_raw_spin_lock+0x60/0xa8
cti_disable+0x38/0xe8 [coresight_cti]
coresight_disable_source+0x88/0xa8 [coresight]
coresight_disable_sysfs+0xd0/0x1f0 [coresight]
enable_source_store+0x78/0xb0 [coresight]
dev_attr_store+0x24/0x40
sysfs_kf_write+0xa8/0xd0
kernfs_fop_write_iter+0x114/0x1c0
vfs_write+0x2d8/0x310
ksys_write+0x80/0xf8
__arm64_sys_write+0x28/0x40
invoke_syscall+0x4c/0x110
el0_svc_common+0xb8/0xf0
do_el0_svc+0x28/0x40
el0_svc+0x4c/0xe8
el0t_64_sync_handler+0x84/0x108
el0t_64_sync+0x198/0x1a0
INITIAL USE at:
lock_acquire+0x130/0x2c0
_raw_spin_lock+0x60/0xa8
cti_cpu_pm_notify+0x54/0x160 [coresight_cti]
notifier_call_chain+0xb8/0x1b8
raw_notifier_call_chain_robust+0x50/0xb0
cpu_pm_enter+0x50/0x90
psci_enter_idle_state+0x3c/0x80
cpuidle_enter_state+0x158/0x340
cpuidle_enter+0x44/0x68
do_idle+0x1b0/0x2b8
cpu_startup_entry+0x40/0x50
secondary_start_kernel+0x120/0x150
__secondary_switched+0xc0/0xc8
}
... key at: [<ffff80007b10d2a8>] cti_probe.__key+0x0/0xffffffffffffdd58 [coresight_cti]
... acquired at:
_raw_spin_lock_irqsave+0x70/0xc0
cti_enable+0x40/0x130 [coresight_cti]
_coresight_enable_path+0x134/0x3c0 [coresight]
coresight_enable_path+0x28/0x88 [coresight]
etm_event_start+0xe0/0x228 [coresight]
etm_event_add+0x40/0x68 [coresight]
event_sched_in+0x270/0x418
visit_groups_merge+0x428/0xcd0
__pmu_ctx_sched_in+0xa0/0xe0
ctx_sched_in+0x110/0x188
ctx_resched+0x1c0/0x2b8
perf_event_exec+0x29c/0x6b8
begin_new_exec+0x378/0x558
load_elf_binary+0x2b0/0xb00
bprm_execve+0x394/0x690
do_execveat_common+0x2a0/0x300
__arm64_sys_execve+0x50/0x70
invoke_syscall+0x4c/0x110
el0_svc_common+0xb8/0xf0
do_el0_svc+0x28/0x40
el0_svc+0x4c/0xe8
el0t_64_sync_handler+0x84/0x108
el0t_64_sync+0x198/0x1a0
===============================================
And the second one is when reloading the modules:
$ sudo rmmod coresight_stm coresight_funnel stm_core \
    coresight_replicator coresight_tpiu coresight_etm4x coresight_tmc \
    coresight_cti coresight_cpu_debug coresight_trbe coresight
$ sudo modprobe coresight; sudo modprobe coresight_stm; \
  sudo modprobe coresight_funnel; sudo modprobe stm_core; \
  sudo modprobe coresight_replicator; sudo modprobe coresight_cpu_debug; \
  sudo modprobe coresight_tpiu; sudo modprobe coresight_etm4x; \
  sudo modprobe coresight_tmc; sudo modprobe coresight_trbe; \
  sudo modprobe coresight_cti
Unable to handle kernel NULL pointer dereference at virtual address
00000000000004f0
pc : cti_cpu_pm_notify+0x74/0x160 [coresight_cti]
lr : cti_cpu_pm_notify+0x54/0x160 [coresight_cti]
Call trace:
cti_cpu_pm_notify+0x74/0x160 [coresight_cti] (P)
notifier_call_chain+0xb8/0x1b8
raw_notifier_call_chain_robust+0x50/0xb0
cpu_pm_enter+0x50/0x90
psci_enter_idle_state+0x3c/0x80
cpuidle_enter_state+0x158/0x340
cpuidle_enter+0x44/0x68
do_idle+0x1b0/0x2b8
cpu_startup_entry+0x40/0x50
secondary_start_kernel+0x120/0x150
__secondary_switched+0xc0/0xc8