lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <aOVGGaY9NmKqUwPG@pluto>
Date: Tue, 7 Oct 2025 17:55:53 +0100
From: Cristian Marussi <cristian.marussi@....com>
To: Artem Shimko <a.shimko.dev@...il.com>
Cc: Sudeep Holla <sudeep.holla@....com>,
	Cristian Marussi <cristian.marussi@....com>,
	arm-scmi@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] drivers: scmi: Add completion timeout handling for
 raw mode transfers

On Fri, Oct 03, 2025 at 10:22:33PM +0300, Artem Shimko wrote:
> Fix race conditions in SCMI raw mode implementation by adding proper
> completion timeout handling. Multiple tests in the SCMI test suite
> were failing due to early clearing of SCMI_XFER_FLAG_IS_RAW flag in
> scmi_xfer_raw_put() function.

Hi Artem,

LGTM now .... but ... now the commit message is no more describing what you
are doing, right ? ... it is no more handled with completions...

Please fix the commit message to reflect what you are doing; also it
would be good to at first explain the issue (like you are doing
already), and THEN describe the solution applied...

Following the rules in "Describe your changes" in:

	https://www.kernel.org/doc/html/v6.17/process/submitting-patches.html

(if you already know this ... just ignore me)

> 
> TRANS=raw
> PROTOCOLS=base,clock,power_domain,performance,system_power,sensor,
> voltage,reset,powercap,pin_control VERBOSE=5
> 
> The root cause:
> Tests were failing on poll() system calls with this condition:
>     if (!raw || (idx == SCMI_RAW_REPLY_QUEUE && !SCMI_XFER_IS_RAW(xfer)))
>         return;
> 
> The SCMI_XFER_FLAG_IS_RAW flag was being cleared prematurely before
> the transfer completion was properly acknowledged, causing the poll
> to return on timeout and tests to fail.
> 
> Fix ensures:
> - Proper synchronization between transfer completion and flag clearing
> - Stable test execution by maintaining correct flag states
> 
> An example of a random test failure:
>  817: Voltage get ext name for invalid domain
>      [Check 1] Get extended name for invalid domain
>        MSG HDR        : 0x04585c09
>        NUM PARAM      : 1
>        PARAMETER[00]  : 0x0000000c
>        CHECK STATUS   : PASSED [SCMI_NOT_FOUND_ERR]
>        CHECK HEADER   : PASSED [0x04585c09]
>        RETURN COUNT   : 0
>        NUM DOMAINS    : 11
>        VOLTAGE DOMAIN : 0
>      [Check 2] Get extended name for unsupp. domain
>        MSG HDR        : 0x045c5c09
>        NUM PARAM      : 1
>        PARAMETER[00]  : 0x00000000
>        CHECK STATUS   : FAILED
>            EXPECTED   : SCMI_NOT_FOUND_ERR
>            RECEIVED   : SCMI_GENERIC_ERROR  : NON CONFORMANT
> 
> After making these changes, the tests stopped failing.
> 

I think also you can trim and drop this further explanation down here...
you have described clearly enough the issue above...

> $mount -t debugfs none /sys/kernel/debug
> $scmi_test_agent
> [  127.865032] arm-scmi arm-scmi.1.auto: Resetting SCMI Raw stack.
> [  128.360503] arm-scmi arm-scmi.1.auto: Using Base channel for protocol 0x12
> $tail -n 6 arm_scmi_test_log.txt
> ****************************************************
>   TOTAL TESTS: 167    PASSED: 120    FAILED: 0    SKIPPED: 47
> ****************************************************
> 
> An ftrace log with of passed test:
> 0)               |  scmi_rx_callback()
> 0)               |    scmi_raw_message_report()
> 7)               |    scmi_xfer_raw_wait_for_message_response()
> 7) + 22.000 us   |      scmi_wait_for_reply();
> 0)               |        /* scmi_raw_message_report*/
> 7)               |    scmi_xfer_raw_put()
> 
> An ftrace log with of failed test:
> 0)               |  scmi_rx_callback() {
> 0)               |    scmi_raw_message_report()
> 5)               |    scmi_xfer_raw_wait_for_message_response()
> 5) ! 383.000 us  |      scmi_wait_for_reply();
> 5)               |    scmi_xfer_raw_put() {
> 0)               |  /* scmi_raw_message_report*/
> 
> Link [1] https://gitlab.arm.com/tests/scmi-tests/-/releases
> 
> Fixes: 3095a3e25d8f7 (firmware: arm_scmi: Add xfer helpers to provide raw access)
> Suggested-by: Cristian Marussi <cristian.marussi@....com>
> Signed-off-by: Artem Shimko <a.shimko.dev@...il.com>
> ---
> Hi Cristian,
> 
> Good point about CONFIG_ARM_SCMI_RAW_MODE_SUPPORT_COEX. 
> 
> I can confirm this setting doesn't impact the test failures in my environment.
> The issue reproduces consistently with COEX both enabled and disabled.
> 
> Thank you!
> 

Good...

Thanks to you
Cristian

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ