linux-kernel - Re: [PATCH v2 4/5] soundwire: cadence: Fix lost ATTACHED interrupts when enumerating

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <91b18b16-c3c5-554b-2875-93857521cf4c@opensource.cirrus.com>
Date:   Mon, 12 Sep 2022 13:36:14 +0100
From:   Richard Fitzgerald <rf@...nsource.cirrus.com>
To:     Pierre-Louis Bossart <pierre-louis.bossart@...ux.intel.com>,
        <vkoul@...nel.org>, <yung-chuan.liao@...ux.intel.com>,
        <sanyog.r.kale@...el.com>
CC:     <alsa-devel@...a-project.org>, <linux-kernel@...r.kernel.org>,
        <patches@...nsource.cirrus.com>
Subject: Re: [PATCH v2 4/5] soundwire: cadence: Fix lost ATTACHED interrupts
 when enumerating



On 12/09/2022 12:05, Pierre-Louis Bossart wrote:
> 
> 
> On 9/7/22 10:52, Richard Fitzgerald wrote:
>> The correct way to handle interrupts is to clear the bits we
>> are about to handle _before_ handling them. Thus if the condition
>> then re-asserts during the handling we won't lose it.
>>
>> This patch changes cdns_update_slave_status_work() to do this.
>>
>> The previous code cleared the interrupts after handling them.
>> The problem with this is that when handling enumeration of devices
>> the ATTACH statuses can be accidentally cleared and so some or all
>> of the devices never complete their enumeration.
>>
>> Thus we can have a situation like this:
>> - one or more devices are reverting to ID #0
>>
>> - accumulated status bits indicate some devices attached and some
>>    on ID #0. (Remember: status bits are sticky until they are handled)
>>
>> - Because of device on #0 sdw_handle_slave_status() programs the
>>    device ID and exits without handling the other status, expecting
>>    to get an ATTACHED from this reprogrammed device.
>>
>> - The device immediately starts reporting ATTACHED in PINGs, which
>>    will assert its CDNS_MCP_SLAVE_INTSTAT_ATTACHED bit.
>>
>> - cdns_update_slave_status_work() clears INTSTAT0/1. If the initial
>>    status had CDNS_MCP_SLAVE_INTSTAT_ATTACHED bit set it will be
>>    cleared.
>>
>> - The ATTACHED change for the device has now been lost.
>>
>> - cdns_update_slave_status_work() clears CDNS_MCP_INT_SLAVE_MASK so
>>    if the new ATTACHED state had set it, it will be cleared without
>>    ever having been handled.
>>
>> Unless there is some other state change from another device to cause
>> a new interrupt, the ATTACHED state of the reprogrammed device will
>> never cause an interrupt so its enumeration will not be completed.
>>
>> Signed-off-by: Richard Fitzgerald <rf@...nsource.cirrus.com>
>> ---
>>   drivers/soundwire/cadence_master.c | 18 ++++++++++++++----
>>   1 file changed, 14 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/soundwire/cadence_master.c b/drivers/soundwire/cadence_master.c
>> index 245191d22ccd..3acd7b89c940 100644
>> --- a/drivers/soundwire/cadence_master.c
>> +++ b/drivers/soundwire/cadence_master.c
>> @@ -954,9 +954,22 @@ static void cdns_update_slave_status_work(struct work_struct *work)
>>   	u32 device0_status;
>>   	int retry_count = 0;
>>   
>> +	/*
>> +	 * Clear main interrupt first so we don't lose any assertions
>> +	 * the happen during this function.
>> +	 */
>> +	cdns_writel(cdns, CDNS_MCP_INTSTAT, CDNS_MCP_INT_SLAVE_MASK);
>> +
>>   	slave0 = cdns_readl(cdns, CDNS_MCP_SLAVE_INTSTAT0);
>>   	slave1 = cdns_readl(cdns, CDNS_MCP_SLAVE_INTSTAT1);
>>   
>> +	/*
>> +	 * Clear the bits before handling so we don't lose any
>> +	 * bits that re-assert.
>> +	 */
>> +	cdns_writel(cdns, CDNS_MCP_SLAVE_INTSTAT0, slave0);
>> +	cdns_writel(cdns, CDNS_MCP_SLAVE_INTSTAT1, slave1);
>> +
>>   	/* combine the two status */
>>   	slave_intstat = ((u64)slave1 << 32) | slave0;
>>   
>> @@ -964,8 +977,6 @@ static void cdns_update_slave_status_work(struct work_struct *work)
>>   
>>   update_status:
>>   	cdns_update_slave_status(cdns, slave_intstat);
>> -	cdns_writel(cdns, CDNS_MCP_SLAVE_INTSTAT0, slave0);
>> -	cdns_writel(cdns, CDNS_MCP_SLAVE_INTSTAT1, slave1);
> 
> this one is hard to review, if you don't clear the status here, then how
> does the retry work if there is a new event?

The retry loop doesn't work off the interrupt status bits. Precisely
because the #0 ATTACH bit probably doesn't re-assert if the PING status
for #0 doesn't change, the retry checks the most recent PING response
instead.

> 
> Put differently, do we need to retry and the 'goto update_status' any more?
> 

Yes, I believe you do still need it. The Cadence interrupts appear to
assert when there is a change of status. If there are multiple devices
reporting on dev ID #0 then the PING status of #0 will not change until
they have all been reprogrammed, so it will not automatically re-assert.

Anyway, I don't want to mix bugfixes with code improvements. If the loop
_could_ be removed that should be done separately from fixing the
interrupt handling bug.

>>   
>>   	/*
>>   	 * When there is more than one peripheral per link, it's
>> @@ -1001,8 +1012,7 @@ static void cdns_update_slave_status_work(struct work_struct *work)
>>   		}
>>   	}
>>   
>> -	/* clear and unmask Slave interrupt now */
>> -	cdns_writel(cdns, CDNS_MCP_INTSTAT, CDNS_MCP_INT_SLAVE_MASK);
>> +	/* unmask Slave interrupt now */
>>   	cdns_updatel(cdns, CDNS_MCP_INTMASK,
>>   		     CDNS_MCP_INT_SLAVE_MASK, CDNS_MCP_INT_SLAVE_MASK);
>>