linux-kernel - Re: [PATCH v3] soundwire: stream: Prepare ports in parallel to reduce stream start latency

Message-ID: <795fd33c-7a0f-4600-87be-1690cb0c0ea3@opensource.cirrus.com>
Date: Tue, 9 Dec 2025 13:36:17 +0000
From: Richard Fitzgerald <rf@...nsource.cirrus.com>
To: Pierre-Louis Bossart <pierre-louis.bossart@...ux.dev>, vkoul@...nel.org,
        yung-chuan.liao@...ux.intel.com
Cc: linux-sound@...r.kernel.org, linux-kernel@...r.kernel.org,
        patches@...nsource.cirrus.com
Subject: Re: [PATCH v3] soundwire: stream: Prepare ports in parallel to reduce
 stream start latency

On 09/12/2025 1:04 pm, Pierre-Louis Bossart wrote:
> On 11/25/25 16:56, Richard Fitzgerald wrote:
>> Issue DP prepare to all ports that use full CP_SM. Then wait for the
>> prepare to complete. This allows all the DPs to prepare in parallel to
>> reduce the latency of starting an audio stream.
>>
>> On a system with six CS35L56 amps, this reduces the startup latency,
>> from runtime_resume to all amps ready to play, from ~160 ms to ~60 ms.
>>
>> (Test hardware: UpXtreme i14, BIOS v1.2, Core Ultra 7 155H, 3x CS35L56
>> on link 0, 3x CS35L56 on link 1).
>>
>> An initial read of DPn_PREPARESTATUS is done before dropping into the wait,
>> so that a quick exit can be made if the port is already prepared. Currently
>> this is essential because the wait deadlocks - the stream setup takes
>> bus_lock, which blocks the interrupt handler - so the wait for completion
>> will always time out.
>>
>> However, an experiment of removing the bus_lock from stream setup, so that
>> the interrupt will work, shows that wait for completion takes ~700..800 us
>> but the quick-exit read takes 50..200 us. So the quick exit is still
>> valuable even if the stream.c code was rewritten to allow the completion
>> interrupt to work. Rewriting the code so it doesn't take bus_lock is risky.
>> The deadlock only lasts until the wait times out so it's not a serious
>> problem now that the DP prepare happens in parallel.
> 
> The only limitation I see with the stream mechanism is that we cannot start two or more streams at the same time, even if the hardware supports it. On paper it'd be interesting to e.g. start capture and playback with the same trigger (bank switch). Like you said, I am not sure anyone is ready for now to test all the corner cases needed to remove this bus lock.
> 
>>
>> Signed-off-by: Richard Fitzgerald <rf@...nsource.cirrus.com>
>> ---
>> Changes in V3:
>> - Removed duplicate dereferencing of s_rt->slave->prop.dp0_prop.
>>    V2 saved it into dp0_prop, so use that.
>>
>> Changes in V2:
>> +	if (simple_ch_prep_sm)
>> +		return 0;
>> +
>> +	/*
>> +	 * Check if already prepared. Avoid overhead of waiting for interrupt
>> +	 * and port_ready completion if we don't need to.
>> +	 */
>> +	val = sdw_read_no_pm(s_rt->slave, SDW_DPN_PREPARESTATUS(p_rt->num));
>> +	if (val < 0) {
>> +		ret = val;
>> +		goto err;
>> +	}
>> +
>> +	if (val & p_rt->ch_mask) {
> 
> Can you explain why we don't use the ch_mask in the already-prepared case? I am missing something.
> 
I'm not sure what you mean here. The if() immediately above your comment
uses ch_mask to check the already-prepared state.
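
For illustration, that check can be modelled outside the kernel. This is a
minimal userspace sketch, not the driver code: port_needs_wait() and the
sample values are hypothetical, and it assumes, as the quoted hunk implies,
that a set bit in DPn_PREPARESTATUS means that channel has not finished
preparing.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not a kernel function. Models the quoted test
 * "if (val & p_rt->ch_mask)": only the channels selected by ch_mask
 * decide whether the caller has to wait for the port.
 */
static bool port_needs_wait(unsigned int prepare_status, unsigned int ch_mask)
{
	return (prepare_status & ch_mask) != 0;
}
```

With ch_mask = 0x3, a status of 0x4 (only an unused channel still preparing)
takes the quick exit, while a status of 0x2 drops into the wait.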

>> +		/* Wait for completion on port ready */
>> +		port_ready = &s_rt->slave->port_ready[p_rt->num];
>> +		wait_for_completion_timeout(port_ready, msecs_to_jiffies(ch_prep_timeout));
> 
> I understand the code is the same as before but would there be any merit in checking the timeout before starting a read? If the device is already in the weeds, doing another read adds even more time before reporting an error.
> 
Do you mean save the system time when the DPN_PREPARE was written to
that peripheral and then check here whether the timeout period has
already elapsed?

If that's what you mean, I don't see much advantage in that. If the
hardware is working correctly, this will be detected by the read above
that checks if the peripheral has already prepared. If it has we skip
the wait_for_completion_timeout().

If the peripheral is "in the weeds", so that its prepare time has
already passed and it still isn't ready, we're no longer in a state
where we care about minimizing audio startup time because the hardware
is now broken. So it's probably not worth complicating the code to
take a few milliseconds off that case.
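
For reference, the bookkeeping such a check would add can be sketched as
below. This is a userspace illustration only: prepare_time_left_ms() is a
hypothetical name, nothing like it exists in the patch, and kernel code
would more likely take its timestamps from jiffies or ktime.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch or the SoundWire core.
 * Times are milliseconds from a monotonic clock, so now_ms is assumed
 * to be >= issued_ms.
 * Returns how much of the prepare timeout is left, or 0 if the timeout
 * has already elapsed since the prepare was issued.
 */
static uint32_t prepare_time_left_ms(uint64_t issued_ms, uint64_t now_ms,
				     uint32_t timeout_ms)
{
	uint64_t elapsed = now_ms - issued_ms;

	if (elapsed >= timeout_ms)
		return 0;

	/* elapsed < timeout_ms here, so it fits in 32 bits. */
	return timeout_ms - (uint32_t)elapsed;
}
```

A caller would store issued_ms per peripheral when writing the prepare, then
report the timeout straight away when this returns 0 instead of entering
wait_for_completion_timeout().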

>> +		val = sdw_read_no_pm(s_rt->slave, SDW_DPN_PREPARESTATUS(p_rt->num));
>> +		if ((val < 0) || (val & p_rt->ch_mask)) {
>> +			ret = (val < 0) ? val : -ETIMEDOUT;
>> +			goto err;
>> +		}
>> +	}
> T
>