[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <11e57078-78c7-6f99-8633-e5e945330550@opensource.cirrus.com>
Date: Tue, 13 Sep 2022 16:30:07 +0100
From: Richard Fitzgerald <rf@...nsource.cirrus.com>
To: Pierre-Louis Bossart <pierre-louis.bossart@...ux.intel.com>,
<vkoul@...nel.org>, <yung-chuan.liao@...ux.intel.com>,
<sanyog.r.kale@...el.com>
CC: <patches@...nsource.cirrus.com>, <alsa-devel@...a-project.org>,
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 5/5] soundwire: bus: Don't exit early if no device IDs
were programmed
On 12/09/2022 18:09, Pierre-Louis Bossart wrote:
>
>
> On 9/12/22 14:25, Richard Fitzgerald wrote:
>> On 12/09/2022 12:43, Pierre-Louis Bossart wrote:
>>>
>>>
>>> On 9/7/22 10:52, Richard Fitzgerald wrote:
>>>> Only exit sdw_handle_slave_status() right after calling
>>>> sdw_program_device_num() if it actually programmed an ID into at
>>>> least one device.
>>>>
>>>> sdw_handle_slave_status() should protect itself against phantom
>>>> device #0 ATTACHED indications. In that case there is no actual
>>>> device still on #0. The early exit relies on there being a status
>>>> change to ATTACHED on the reprogrammed device to trigger another
>>>> call to sdw_handle_slave_status() which will then handle the status
>>>> of all peripherals. If no device was actually programmed with an
>>>> ID there won't be a new ATTACHED indication. This can lead to the
>>>> status of other peripherals not being handled.
>>>>
>>>> The status passed to sdw_handle_slave_status() is obviously always
>>>> from a point of time in the past, and may indicate accumulated
>>>> unhandled events (depending how the bus manager operates). It's
>>>> possible that a device ID is reprogrammed but the last PING status
>>>> captured state just before that, when it was still reporting on
>>>> ID #0. Then sdw_handle_slave_status() is called with this PING info,
>>>> just before a new PING status is available showing it now on its new
>>>> ID. So sdw_handle_slave_status() will receive a phantom report of a
>>>> device on #0, but it will not find one.
>>>>
>>>> Signed-off-by: Richard Fitzgerald <rf@...nsource.cirrus.com>
>>>> ---
>>>> drivers/soundwire/bus.c | 27 +++++++++++++++------------
>>>> 1 file changed, 15 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c
>>>> index 6e569a875a9b..0bcc2d161eb9 100644
>>>> --- a/drivers/soundwire/bus.c
>>>> +++ b/drivers/soundwire/bus.c
>>>> @@ -736,20 +736,19 @@ static int sdw_program_device_num(struct
>>>> sdw_bus *bus)
>>>> struct sdw_slave_id id;
>>>> struct sdw_msg msg;
>>>> bool found;
>>>> - int count = 0, ret;
>>>> + int count = 0, num_programmed = 0, ret;
>>>> u64 addr;
>>>> /* No Slave, so use raw xfer api */
>>>> ret = sdw_fill_msg(&msg, NULL, SDW_SCP_DEVID_0,
>>>> SDW_NUM_DEV_ID_REGISTERS, 0, SDW_MSG_FLAG_READ, buf);
>>>> if (ret < 0)
>>>> - return ret;
>>>> + return 0;
>>>
>>> this doesn't seem quite right to me, there are multiple -EINVAL cases
>>> handled in sdw_fill_msg().
>>>
>>> I didn't check if all these error cases are irrelevant in that specific
>>> enumeration case, if that was the case maybe we need to break that
>>> function in two helpers so that all the checks can be skipped.
>>>
>>
>> I don't think that there's anything useful that
>> sdw_modify_slave_status() could do to recover from an error.
>>
>> If any device IDs were programmed then, according to the statement in
>> sdw_modify_slave_status()
>>
>> * programming a device number will have side effects,
>> * so we deal with other devices at a later time
>>
>> if this is true, then we need to exit to deal with what _was_
>> programmed, even if one of them failed.
>>
>> If nothing was programmed, and there was an error, we can't bail out of
>> sdw_modify_slave_status(). We have status for other devices which
>> we can't simply ignore.
>>
>> Ultimately I can't see how pushing the error code up is useful.
>> sdw_modify_slave_status() can't really do any effective recovery action,
>> and the original behavior of giving up and returning means that
>> an error in programming dev ID potentially causes collateral damage to
>> the status of other peripherals.
>
> I was suggesting something like
>
>
> void sdw_fill_msg_data(...)
> {
> copy data in the msg structure
> }
>
> int sdw_fill_msg(...)
> {
> sdw_fill_msg_data();
> handle_error_cases
> }
>
> and in sdw sdw_program_device_num() we call directly sdw_fill_msg_data()
>
> So no change in functionality beyond explicit skip of error checks that
> are not relevant and cannot be handled even if they were.
>
sdw_fill_msg() will never report an error during
sdw_program_device_num() because the first check is to return if
the address doesn't need paging, and sdw_program_device_num() only
accesses SCP registers.
I don't want to mix coding improvements with bugfixes. Splitting
sdw_fill_msg() isn't needed to fix this bug.
>
>
Powered by blists - more mailing lists