[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20250804090812.GI143191@e132581.arm.com>
Date: Mon, 4 Aug 2025 10:08:12 +0100
From: Leo Yan <leo.yan@....com>
To: Anshuman Khandual <anshuman.khandual@....com>
Cc: Suzuki K Poulose <suzuki.poulose@....com>,
Mike Leach <mike.leach@...aro.org>,
James Clark <james.clark@...aro.org>,
Levi Yun <yeoreum.yun@....com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Yabin Cui <yabinc@...gle.com>, Keita Morisaki <keyz@...gle.com>,
Yuanfang Zhang <quic_yuanfang@...cinc.com>,
coresight@...ts.linaro.org, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 02/28] coresight: etm4x: Always set tracer's device
mode on target CPU
On Tue, Jul 15, 2025 at 12:56:54PM +0530, Anshuman Khandual wrote:
>
> On 01/07/25 8:23 PM, Leo Yan wrote:
> > When enabling a tracer via SysFS interface, the device mode may be set
> > by any CPU - not necessarily the target CPU. This can lead to race
> > condition in SMP, and may result in incorrect mode values being read.
> >
> > Consider the following example, where CPU0 attempts to enable the tracer
> > on CPU1 (the target CPU):
> >
> > CPU0 CPU1
> > etm4_enable()
> > ` coresight_take_mode(SYSFS)
> > ` etm4_enable_sysfs()
> > ` smp_call_function_single() ----> etm4_enable_hw_smp_call()
> > /
> > / CPU idle:
> > / etm4_cpu_save()
> > / ` coresight_get_mode()
> > Failed to enable h/w / ^^^
> > ` coresight_set_mode(DISABLED) <-' Read the intermediate SYSFS mode
>
> The problem is - CPU1's HW state and CPU1's sysfs mode state might not
> remain in sync if CPU1 goes into idle state just after an unsuccessful
> etm4_enable_sysfs() attempt from CPU0. In which case a subsequent read
> coresight_get_mode() on CPU1 might erroneously give us DISABLED state,
In this case, CPU1 reads an intermediate "SYSFS" state, even though it
failed in etm4_enable_hw_smp_call(). The current code defers setting
the state to DISABLED on CPU0. As a result, CPU1 will incorrectly save
and restore the ETM context based on the intermediate "SYSFS" state.
> which actually does not seem to be too bad as the earlier enablement
> attempt had failed anyway. Just trying to understand what is the real
> problem here.
The problem is CPU1 might get an intermediate state, it turns out a
stale value and might guide CPU idle flow to wrongly save and restore
ETM context.
> > In this case, CPU0 initiates the operation by taking the SYSFS mode to
> > avoid conflicts with the Perf mode. It then sends an IPI to CPU1 to
> > configure the tracer registers. If any error occurs during this process,
>
> What kind of error can happen during this process ?
So far, it might fail to claim a device and return an error.
A similar issue might occur when CPU1 exits an idle state. For example,
if CPU0 initiates the ETM enabling flow and sets the SYSFS mode in
advance, once CPU1 is woken up from idle by an IPI, it reads the ETM
state (SYSFS mode) and then restores and enables the ETM. This can
happen even before CPU1 invokes etm4_enable_hw_smp_call() to complete
the ETM enable flow.
> > CPU0 rolls back by setting the mode to DISABLED.
>
> Which seems OK.
>
> >
> > However, if CPU1 enters an idle state during this time, it might read
> > the intermediate SYSFS mode. As a result, the CPU PM flow could wrongly
> > save and restore tracer context that is actually disabled.
>
> Right but CPU0 had marked the CPU1' state as DISABLED after the enable
> attempt had failed. So what is the problem ?
There is a race condition between CPU0 writing the state and CPU1
reading the state (during its CPU idle flow). CPU1 might read a state
that is inconsistent with the actual ETM hardware state, which causes
CPU1 to save and restore the ETM context incorrectly.
A wider view is this series heavily relies on the ETM state to decide
the linked path has been enabled and take action for saving and
restoring all components on the path (not for ETM device only). We need
a reliable state machine to reflect hardware state. To avoid any
intermediate state, we always set the state on the target CPU.
Thanks,
Leo
Powered by blists - more mailing lists