Message-ID: <ba94795e-367c-429a-a19f-2a220e33a117@linaro.org>
Date: Thu, 4 Sep 2025 14:15:21 +0100
From: James Clark <james.clark@...aro.org>
To: Leo Yan <leo.yan@....com>, Yabin Cui <yabinc@...gle.com>
Cc: coresight@...ts.linaro.org, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org, Suzuki K Poulose <suzuki.poulose@....com>,
Mike Leach <mike.leach@...aro.org>, Levi Yun <yeoreum.yun@....com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Keita Morisaki <keyz@...gle.com>, Yuanfang Zhang <quic_yuanfang@...cinc.com>
Subject: Re: [PATCH v2 25/28] coresight: trbe: Save and restore state across
CPU low power state
On 01/07/2025 3:53 pm, Leo Yan wrote:
> From: Yabin Cui <yabinc@...gle.com>
>
> Similar to ETE, TRBE may lose its context when a CPU enters low power
> state. To make things worse, if ETE is restored without TRBE being
> restored, an enabled source device with no enabled sink devices can
> cause CPU hang on some devices (e.g., Pixel 9).
>
> The save and restore flows are described in section K5.5 "Context
> switching" of the Arm ARM (ARM DDI 0487 L.a). This commit adds save and
> restore callbacks following the software usage defined in the
> architecture manual.
>
> Signed-off-by: Yabin Cui <yabinc@...gle.com>
> Co-developed-by: Leo Yan <leo.yan@....com>
> Signed-off-by: Leo Yan <leo.yan@....com>
> ---
Hi Leo,
I tested at this commit to try to avoid hitting any issues with the
last 3 hotplug changes, but ran into two problems. They were triggered
by running the CPU online/offline/enable_source stress test and then
running the Perf "Check Arm CoreSight trace data recording and
synthesized samples" test.

Both problems hit when running the tests in either order, but not when
running only one of them after a reboot.

The first one appears while just running one of the tests:
=====================================================
WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
6.16.0-rc3+ #475 Not tainted
-----------------------------------------------------
perf-exec/709 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
ffff000804002cd0 (&drvdata->spinlock){+.+.}-{2:2}, at: cti_enable+0x40/0x130 [coresight_cti]
and this task is already holding:
ffff00080ab67e18 (&ctx->lock){....}-{2:2}, at: perf_event_exec+0xc4/0x6b8
which would create a new lock dependency:
(&ctx->lock){....}-{2:2} -> (&drvdata->spinlock){+.+.}-{2:2}
but this new dependency connects a HARDIRQ-irq-safe lock:
(&cpuctx_lock){-...}-{2:2}
... which became HARDIRQ-irq-safe at:
lock_acquire+0x130/0x2c0
_raw_spin_lock+0x60/0xa8
__perf_install_in_context+0x5c/0x2f0
remote_function+0x58/0x78
__flush_smp_call_function_queue+0x1d8/0x9c0
generic_smp_call_function_single_interrupt+0x20/0x38
ipi_handler+0x118/0x338
handle_percpu_devid_irq+0xb0/0x180
generic_handle_domain_irq+0x4c/0x78
gic_handle_irq+0x68/0xf0
call_on_irq_stack+0x24/0x30
do_interrupt_handler+0x88/0xd0
el1_interrupt+0x34/0x68
el1h_64_irq_handler+0x18/0x28
el1h_64_irq+0x6c/0x70
arch_local_irq_enable+0x8/0x10
cpuidle_enter+0x44/0x68
do_idle+0x1b0/0x2b8
cpu_startup_entry+0x40/0x50
rest_init+0x1c4/0x1d0
start_kernel+0x394/0x458
__primary_switched+0x88/0x98
to a HARDIRQ-irq-unsafe lock:
(&drvdata->spinlock){+.+.}-{2:2}
... which became HARDIRQ-irq-unsafe at:
...
lock_acquire+0x130/0x2c0
_raw_spin_lock+0x60/0xa8
cti_disable+0x38/0xe8 [coresight_cti]
coresight_disable_source+0x88/0xa8 [coresight]
coresight_disable_sysfs+0xd0/0x1f0 [coresight]
enable_source_store+0x78/0xb0 [coresight]
dev_attr_store+0x24/0x40
sysfs_kf_write+0xa8/0xd0
kernfs_fop_write_iter+0x114/0x1c0
vfs_write+0x2d8/0x310
ksys_write+0x80/0xf8
__arm64_sys_write+0x28/0x40
invoke_syscall+0x4c/0x110
el0_svc_common+0xb8/0xf0
do_el0_svc+0x28/0x40
el0_svc+0x4c/0xe8
el0t_64_sync_handler+0x84/0x108
el0t_64_sync+0x198/0x1a0
other info that might help us debug this:
Chain exists of:
&cpuctx_lock --> &ctx->lock --> &drvdata->spinlock
Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&drvdata->spinlock);
                               local_irq_disable();
                               lock(&cpuctx_lock);
                               lock(&ctx->lock);
  <Interrupt>
    lock(&cpuctx_lock);

 *** DEADLOCK ***
4 locks held by perf-exec/709:
#0: ffff0008066b66f8 (&sig->cred_guard_mutex){+.+.}-{4:4}, at: bprm_execve+0x54/0x690
#1: ffff0008066b67a0 (&sig->exec_update_lock){++++}-{4:4}, at: exec_mmap+0x48/0x2b0
#2: ffff000976a467f0 (&cpuctx_lock){-...}-{2:2}, at: perf_event_exec+0xb4/0x6b8
#3: ffff00080ab67e18 (&ctx->lock){....}-{2:2}, at: perf_event_exec+0xc4/0x6b8
the dependencies between HARDIRQ-irq-safe lock and the holding lock:
-> (&cpuctx_lock){-...}-{2:2} {
IN-HARDIRQ-W at:
lock_acquire+0x130/0x2c0
_raw_spin_lock+0x60/0xa8
__perf_install_in_context+0x5c/0x2f0
remote_function+0x58/0x78
__flush_smp_call_function_queue+0x1d8/0x9c0
generic_smp_call_function_single_interrupt+0x20/0x38
ipi_handler+0x118/0x338
handle_percpu_devid_irq+0xb0/0x180
generic_handle_domain_irq+0x4c/0x78
gic_handle_irq+0x68/0xf0
call_on_irq_stack+0x24/0x30
do_interrupt_handler+0x88/0xd0
el1_interrupt+0x34/0x68
el1h_64_irq_handler+0x18/0x28
el1h_64_irq+0x6c/0x70
arch_local_irq_enable+0x8/0x10
cpuidle_enter+0x44/0x68
do_idle+0x1b0/0x2b8
cpu_startup_entry+0x40/0x50
rest_init+0x1c4/0x1d0
start_kernel+0x394/0x458
__primary_switched+0x88/0x98
INITIAL USE at:
lock_acquire+0x130/0x2c0
_raw_spin_lock+0x60/0xa8
__perf_event_exit_context+0x3c/0xb0
generic_exec_single+0xb0/0x3a8
smp_call_function_single+0x180/0xa98
perf_event_exit_cpu+0x344/0x3d8
cpuhp_invoke_callback+0x120/0x2a0
cpuhp_thread_fun+0x170/0x1d8
smpboot_thread_fn+0x1c0/0x328
kthread+0x148/0x250
ret_from_fork+0x10/0x20
}
... key at: [<ffff800082bbe238>] cpuctx_lock+0x0/0x10
-> (&ctx->lock){....}-{2:2} {
INITIAL USE at:
lock_acquire+0x130/0x2c0
_raw_spin_lock_irq+0x70/0xb8
find_get_pmu_context+0x88/0x238
__arm64_sys_perf_event_open+0x794/0x1150
invoke_syscall+0x4c/0x110
el0_svc_common+0xb8/0xf0
do_el0_svc+0x28/0x40
el0_svc+0x4c/0xe8
el0t_64_sync_handler+0x84/0x108
el0t_64_sync+0x198/0x1a0
}
... key at: [<ffff800082bbe1d0>] __perf_event_init_context.__key+0x0/0x10
... acquired at:
_raw_spin_lock+0x60/0xa8
__perf_install_in_context+0x6c/0x2f0
remote_function+0x58/0x78
generic_exec_single+0xb0/0x3a8
smp_call_function_single+0x180/0xa98
perf_install_in_context+0x1a0/0x290
__arm64_sys_perf_event_open+0x103c/0x1150
invoke_syscall+0x4c/0x110
el0_svc_common+0xb8/0xf0
do_el0_svc+0x28/0x40
el0_svc+0x4c/0xe8
el0t_64_sync_handler+0x84/0x108
el0t_64_sync+0x198/0x1a0
the dependencies between the lock to be acquired and HARDIRQ-irq-unsafe lock:
-> (&drvdata->spinlock){+.+.}-{2:2} {
HARDIRQ-ON-W at:
lock_acquire+0x130/0x2c0
_raw_spin_lock+0x60/0xa8
cti_disable+0x38/0xe8 [coresight_cti]
coresight_disable_source+0x88/0xa8 [coresight]
coresight_disable_sysfs+0xd0/0x1f0 [coresight]
enable_source_store+0x78/0xb0 [coresight]
dev_attr_store+0x24/0x40
sysfs_kf_write+0xa8/0xd0
kernfs_fop_write_iter+0x114/0x1c0
vfs_write+0x2d8/0x310
ksys_write+0x80/0xf8
__arm64_sys_write+0x28/0x40
invoke_syscall+0x4c/0x110
el0_svc_common+0xb8/0xf0
do_el0_svc+0x28/0x40
el0_svc+0x4c/0xe8
el0t_64_sync_handler+0x84/0x108
el0t_64_sync+0x198/0x1a0
SOFTIRQ-ON-W at:
lock_acquire+0x130/0x2c0
_raw_spin_lock+0x60/0xa8
cti_disable+0x38/0xe8 [coresight_cti]
coresight_disable_source+0x88/0xa8 [coresight]
coresight_disable_sysfs+0xd0/0x1f0 [coresight]
enable_source_store+0x78/0xb0 [coresight]
dev_attr_store+0x24/0x40
sysfs_kf_write+0xa8/0xd0
kernfs_fop_write_iter+0x114/0x1c0
vfs_write+0x2d8/0x310
ksys_write+0x80/0xf8
__arm64_sys_write+0x28/0x40
invoke_syscall+0x4c/0x110
el0_svc_common+0xb8/0xf0
do_el0_svc+0x28/0x40
el0_svc+0x4c/0xe8
el0t_64_sync_handler+0x84/0x108
el0t_64_sync+0x198/0x1a0
INITIAL USE at:
lock_acquire+0x130/0x2c0
_raw_spin_lock+0x60/0xa8
cti_cpu_pm_notify+0x54/0x160 [coresight_cti]
notifier_call_chain+0xb8/0x1b8
raw_notifier_call_chain_robust+0x50/0xb0
cpu_pm_enter+0x50/0x90
psci_enter_idle_state+0x3c/0x80
cpuidle_enter_state+0x158/0x340
cpuidle_enter+0x44/0x68
do_idle+0x1b0/0x2b8
cpu_startup_entry+0x40/0x50
secondary_start_kernel+0x120/0x150
__secondary_switched+0xc0/0xc8
}
... key at: [<ffff80007b10d2a8>] cti_probe.__key+0x0/0xffffffffffffdd58 [coresight_cti]
... acquired at:
_raw_spin_lock_irqsave+0x70/0xc0
cti_enable+0x40/0x130 [coresight_cti]
_coresight_enable_path+0x134/0x3c0 [coresight]
coresight_enable_path+0x28/0x88 [coresight]
etm_event_start+0xe0/0x228 [coresight]
etm_event_add+0x40/0x68 [coresight]
event_sched_in+0x270/0x418
visit_groups_merge+0x428/0xcd0
__pmu_ctx_sched_in+0xa0/0xe0
ctx_sched_in+0x110/0x188
ctx_resched+0x1c0/0x2b8
perf_event_exec+0x29c/0x6b8
begin_new_exec+0x378/0x558
load_elf_binary+0x2b0/0xb00
bprm_execve+0x394/0x690
do_execveat_common+0x2a0/0x300
__arm64_sys_execve+0x50/0x70
invoke_syscall+0x4c/0x110
el0_svc_common+0xb8/0xf0
do_el0_svc+0x28/0x40
el0_svc+0x4c/0xe8
el0t_64_sync_handler+0x84/0x108
el0t_64_sync+0x198/0x1a0
===============================================
And the second one is when reloading the modules:
$ sudo rmmod coresight_stm coresight_funnel stm_core \
    coresight_replicator coresight_tpiu coresight_etm4x coresight_tmc \
    coresight_cti coresight_cpu_debug coresight_trbe coresight
$ sudo modprobe coresight; sudo modprobe coresight_stm; \
  sudo modprobe coresight_funnel; sudo modprobe stm_core; \
  sudo modprobe coresight_replicator; sudo modprobe coresight_cpu_debug; \
  sudo modprobe coresight_tpiu; sudo modprobe coresight_etm4x; \
  sudo modprobe coresight_tmc; sudo modprobe coresight_trbe; \
  sudo modprobe coresight_cti
Unable to handle kernel NULL pointer dereference at virtual address
00000000000004f0
pc : cti_cpu_pm_notify+0x74/0x160 [coresight_cti]
lr : cti_cpu_pm_notify+0x54/0x160 [coresight_cti]
Call trace:
cti_cpu_pm_notify+0x74/0x160 [coresight_cti] (P)
notifier_call_chain+0xb8/0x1b8
raw_notifier_call_chain_robust+0x50/0xb0
cpu_pm_enter+0x50/0x90
psci_enter_idle_state+0x3c/0x80
cpuidle_enter_state+0x158/0x340
cpuidle_enter+0x44/0x68
do_idle+0x1b0/0x2b8
cpu_startup_entry+0x40/0x50
secondary_start_kernel+0x120/0x150
__secondary_switched+0xc0/0xc8