linux-kernel - Re: [PATCH v3] soundwire: stream: Prepare ports in parallel to reduce stream start latency

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <c81f581a-ee9f-4692-8ed8-736b5c5053af@linux.dev>
Date: Tue, 23 Dec 2025 14:19:11 +0100
From: Pierre-Louis Bossart <pierre-louis.bossart@...ux.dev>
To: Richard Fitzgerald <rf@...nsource.cirrus.com>, vkoul@...nel.org,
 yung-chuan.liao@...ux.intel.com
Cc: linux-sound@...r.kernel.org, linux-kernel@...r.kernel.org,
 patches@...nsource.cirrus.com
Subject: Re: [PATCH v3] soundwire: stream: Prepare ports in parallel to reduce
 stream start latency

On 12/23/25 11:47, Richard Fitzgerald wrote:
> On 23/12/25 10:29, Pierre-Louis Bossart wrote:
> <SNIP>
> 
>>>> I can see how preparing all ports in parallel would reduce the total time compared to a serial approach, even with a timeout, but if we end-up always timing out even in the case where there is a single device it'd be quite odd.
>>>
>>> Yes, but that's a separate problem that needs fixing. The current code
>>> in the kernel does the timeout for every amp. So if you have 4 amps with
>>> a timeout of 10ms it takes >40ms to do the port prepare.
>>
>> so is this fair to say that by not testing the return value of wait_for_completion_timeout(), we never detected the 'always-timeout' behavior since the initial contribution circa 2018, even with a single device per link?
>>
>> Wow, that'd be a big 7-year old conceptual miss...
> 
> It's just  a design flaw in the code. It wasn't found because it was
> tested against Realtek codecs, which use the simplified CP_SM so the
> code can skip the Prepare stage.

ok, thanks for the precision.

> The original code checked for timeout and then always failed, because it
> always timed out. So none of the Cirrus amps and codecs worked. I sent
> the "temporary" workaround of reading the status register after the
> wait_for_completion() and ignoring the timeout if the device is now
> reporting prepared, until I could fix it properly. I've only just got
> around to looking at this again.

ok, so this is a follow-up to commit 3d3e88e from 2021, where the timeout check was removed?
This should probably be mentioned in the commit description, with a clear explanation that the timeout happens in all cases.

I personally misunderstood that earlier commit, which states 

'The previous implementation would always fail if the
wait_for_completion() timed out, even if the port was reporting
successful prepare.
'

 
>>>> In other words does this patch reduce the start latency only in the case of multiple devices, but adds a 'tax' for all other cases?
>>>
>>> This only affects peripherals that use the full CP_SM (not simplified).
>>>
>>> If you have only 1 peripheral there is either no change or a possibility
>>> of skipping the wait. Before this change it would always wait the full
>>> timeout before noticing that it is ready.
>>>
>>> After this change, you might be able to skip the wait. If it prepared
>>> immediately, so that it is already reporting prepared when the first
>>> read tests for that, it will skip the wait.
>>> (BTW I see that happening with CS35L56. It prepares so fast that it
>>> is already done, without waiting for any of the timeouts.)
>>>
>>> If it isn't ready that early, you get the timeout as before.
>>>
>>> It isn't adding any penalty to single devices, unless you count the
>>> trivial time it takes to do the register read, which is negligible
>>> compared to getting deadlocked in the wait_for_completion_timeout().
>>
>> ok, now I get the point that the use of sdw_acquire_bus_lock() in sdw_prepare_stream() prevents the use of device-initiated interrupts.
>>
>> One possible change is a departure from the use of the PORT_READY alert. This was meant to be a worst-case scenario for cases where the prepare time is in hundreds of ms - IIRC we thought devices would need to adjust their internal clock to that of the bus, which might change during a prepare operation. But as you mentioned it in practice devices are prepared much faster. So we could have a typical loop with multiple read/sleep instead. See for example cdns_clear_bit() as an example of what I have in mind, classic kernel code really.
>>
>> That might speed-up the prepare time even for the single device case - and then that may be enough to avoid the deferred check on PORT_READY and make the code simpler for multiple devices as well.
>>
> 
> I've got an experimental patch to replace the
> wait_for_completion_timeout() with a polling loop. It's a brute-force
> approach but reduces the overhead without the scary rewrite of the bus
> locking.

sounds good, it's not that bad in terms of brute-forcing and certainly less bad than a broken design where an expected interrupt cannot be serviced by construction.

>> At any rate you're onto something, thanks for submitting this and bearing with my questions.
> 
> I need to send a new version of this patch anyway because it will
> need rebasing onto https://lore.kernel.org/linux-sound/20251219173830.407-1-niranjan.hy@ti.com/T/#u
> That one is fixing a problem so makes sense to take it first.
> 
> I'll improve the commenting in V2 to explain better what the code
> is doing.

I'd recommend the 'brute-force' approach, this wait_for_completion_timeout() makes no sense - and it'd remove the need for the extra read before the nonsensical wait.

Probably something for 2026 now ;-)