Message-ID: <20251003192233.1618447-1-a.shimko.dev@gmail.com>
Date: Fri, 3 Oct 2025 22:22:33 +0300
From: Artem Shimko <a.shimko.dev@...il.com>
To: Sudeep Holla <sudeep.holla@....com>,
Cristian Marussi <cristian.marussi@....com>
Cc: a.shimko.dev@...il.com,
arm-scmi@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org
Subject: [PATCH v2] drivers: scmi: Add completion timeout handling for raw mode transfers
Fix a race in the SCMI raw mode implementation by clearing the xfer
flags in __scmi_xfer_put() under the xfer_lock spinlock. Multiple tests
in the SCMI test suite [1] were failing because scmi_xfer_raw_put()
cleared the SCMI_XFER_FLAG_IS_RAW flag too early.
TRANS=raw
PROTOCOLS=base,clock,power_domain,performance,system_power,sensor,
voltage,reset,powercap,pin_control VERBOSE=5
The root cause:
Tests were timing out in poll() because scmi_raw_message_report()
returns early when this condition holds:
if (!raw || (idx == SCMI_RAW_REPLY_QUEUE && !SCMI_XFER_IS_RAW(xfer)))
return;
scmi_xfer_raw_put() cleared the SCMI_XFER_FLAG_IS_RAW flag before the
reply had been reported to the raw reply queue, so the report was
dropped, poll() returned only on timeout, and the tests failed.
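The effect of that guard can be illustrated with a small userspace model.
This is only a sketch: every model_* name, the flag value, and the queue
counter are hypothetical stand-ins and not the kernel's definitions. It
shows that a reply reported after the flag has been cleared never reaches
the queue that poll() is waiting on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; values and layouts are NOT the kernel's. */
#define MODEL_XFER_FLAG_IS_RAW (1u << 0)

enum model_queue { MODEL_RAW_REPLY_QUEUE, MODEL_RAW_NOTIF_QUEUE };

struct model_xfer {
	unsigned int flags;
};

struct model_raw {
	unsigned int queued; /* messages made visible to a poll()ing reader */
};

/* Same shape as the quoted guard: replies for non-raw xfers are dropped. */
static bool model_message_report(struct model_raw *raw, enum model_queue idx,
				 const struct model_xfer *xfer)
{
	if (!raw || (idx == MODEL_RAW_REPLY_QUEUE &&
		     !(xfer->flags & MODEL_XFER_FLAG_IS_RAW)))
		return false;

	raw->queued++;
	return true;
}

/* Old put path: clears the flag with no ordering against the report. */
static void model_raw_put_early_clear(struct model_xfer *xfer)
{
	assert(xfer != NULL);
	xfer->flags &= ~MODEL_XFER_FLAG_IS_RAW;
}
```

If model_raw_put_early_clear() runs before model_message_report(), the
report returns false and the queue counter stays unchanged, which matches
the ordering visible in the failed-test ftrace log.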
The fix ensures:
- Proper synchronization between transfer completion and flag clearing
- Stable test execution by maintaining correct flag states
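A minimal userspace sketch of that synchronization follows. It assumes
(this is an assumption, not something the patch text states) that the
xfer is reference counted and that the flag reset happens only in the
branch that returns it to the free list. The sketch_* names are
hypothetical and a pthread mutex stands in for the xfer_lock spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical flag values, not the kernel's. */
#define SKETCH_FLAG_IS_RAW   (1u << 0)
#define SKETCH_FLAG_CHAN_SET (1u << 1)

struct sketch_xfer {
	unsigned int flags;
	unsigned int users;   /* outstanding references (assumed model) */
	bool on_free_list;
};

struct sketch_pool {
	pthread_mutex_t lock; /* plays the role of minfo->xfer_lock */
};

/* Drop one reference; reset the flags only when the last user is gone. */
static void sketch_xfer_put(struct sketch_pool *pool, struct sketch_xfer *xfer)
{
	pthread_mutex_lock(&pool->lock);
	assert(xfer->users > 0);
	if (--xfer->users == 0) {
		xfer->flags = 0;           /* cleared under the lock */
		xfer->on_free_list = true; /* stands in for hlist_add_head() */
	}
	pthread_mutex_unlock(&pool->lock);
}
```

With two users, the first sketch_xfer_put() leaves the flags untouched,
so a reporter still holding a reference keeps seeing the raw flag. Only
the final put clears the flags and frees the xfer.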
An example of a random test failure:
817: Voltage get ext name for invalid domain
[Check 1] Get extended name for invalid domain
MSG HDR : 0x04585c09
NUM PARAM : 1
PARAMETER[00] : 0x0000000c
CHECK STATUS : PASSED [SCMI_NOT_FOUND_ERR]
CHECK HEADER : PASSED [0x04585c09]
RETURN COUNT : 0
NUM DOMAINS : 11
VOLTAGE DOMAIN : 0
[Check 2] Get extended name for unsupp. domain
MSG HDR : 0x045c5c09
NUM PARAM : 1
PARAMETER[00] : 0x00000000
CHECK STATUS : FAILED
EXPECTED : SCMI_NOT_FOUND_ERR
RECEIVED : SCMI_GENERIC_ERROR : NON CONFORMANT
After making these changes, the tests stopped failing.
$mount -t debugfs none /sys/kernel/debug
$scmi_test_agent
[ 127.865032] arm-scmi arm-scmi.1.auto: Resetting SCMI Raw stack.
[ 128.360503] arm-scmi arm-scmi.1.auto: Using Base channel for protocol 0x12
$tail -n 6 arm_scmi_test_log.txt
****************************************************
TOTAL TESTS: 167 PASSED: 120 FAILED: 0 SKIPPED: 47
****************************************************
An ftrace log of a passed test:
0) | scmi_rx_callback()
0) | scmi_raw_message_report()
7) | scmi_xfer_raw_wait_for_message_response()
7) + 22.000 us | scmi_wait_for_reply();
0) | /* scmi_raw_message_report*/
7) | scmi_xfer_raw_put()
An ftrace log of a failed test:
0) | scmi_rx_callback() {
0) | scmi_raw_message_report()
5) | scmi_xfer_raw_wait_for_message_response()
5) ! 383.000 us | scmi_wait_for_reply();
5) | scmi_xfer_raw_put() {
0) | /* scmi_raw_message_report*/
Link: https://gitlab.arm.com/tests/scmi-tests/-/releases [1]
Fixes: 3095a3e25d8f7 ("firmware: arm_scmi: Add xfer helpers to provide raw access")
Suggested-by: Cristian Marussi <cristian.marussi@....com>
Signed-off-by: Artem Shimko <a.shimko.dev@...il.com>
---
Hi Cristian,
Good point about CONFIG_ARM_SCMI_RAW_MODE_SUPPORT_COEX.
I can confirm this setting doesn't impact the test failures in my environment.
The issue reproduces consistently with COEX both enabled and disabled.
Thank you!
Best regards,
Artem Shimko
ChangeLog:
v1:
* https://lore.kernel.org/arm-scmi/20250929142856.540590-1-a.shimko.dev@gmail.com/
v2:
* Use simpler approach suggested by Cristian Marussi
* Clear all xfer flags in __scmi_xfer_put() under spinlock protection
* Add Fixes tag as requested
* Drop completion timeout mechanism from v1
drivers/firmware/arm_scmi/driver.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c
index bd56a877fdfc..0976bfdbb44b 100644
--- a/drivers/firmware/arm_scmi/driver.c
+++ b/drivers/firmware/arm_scmi/driver.c
@@ -821,6 +821,7 @@ __scmi_xfer_put(struct scmi_xfers_info *minfo, struct scmi_xfer *xfer)
scmi_dec_count(info->dbg->counters, XFERS_INFLIGHT);
}
+ xfer->flags = 0;
hlist_add_head(&xfer->node, &minfo->free_xfers);
}
spin_unlock_irqrestore(&minfo->xfer_lock, flags);
@@ -839,8 +840,6 @@ void scmi_xfer_raw_put(const struct scmi_handle *handle, struct scmi_xfer *xfer)
{
struct scmi_info *info = handle_to_scmi_info(handle);
- xfer->flags &= ~SCMI_XFER_FLAG_IS_RAW;
- xfer->flags &= ~SCMI_XFER_FLAG_CHAN_SET;
return __scmi_xfer_put(&info->tx_minfo, xfer);
}
--
2.43.0