linux-kernel - Re: [PATCH v3] firmware: arm_scmi: Free mailbox channels if probe fails

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <51782599a01a6a22409d01e5fc1f8a50@codeaurora.org>
Date:   Mon, 30 Aug 2021 14:09:37 -0700
From:   rishabhb@...eaurora.org
To:     Cristian Marussi <cristian.marussi@....com>
Cc:     sudeep.holla@....com, linux-arm-kernel@...ts.infradead.org,
        linux-kernel@...r.kernel.org, avajid@...eaurora.org,
        adharmap@...eaurora.org
Subject: Re: [PATCH v3] firmware: arm_scmi: Free mailbox channels if probe
 fails

Hi Christian
There seems to be another issue here. The response from agent can be 
delayed causing a timeout during base protocol acquire,
which leads to the probe failure. What I have observed is sometimes the 
failure of probe and rx_callback (due to a delayed message)
happens at the same time on different cpus.
Because of this race, the device memory may be cleared while the 
interrupt(rx_callback) is executing on another cpu.
How do you propose we solve this? Do you think it is better to take the 
setting up of base and other protocols out of probe and
in some delayed work? That would imply the device memory is not released 
until remove is called. Or should we add locking to
the interrupt handler(scmi_rx_callback) and the cleanup in probe to 
avoid the race?

On 2021-08-05 03:54, Cristian Marussi wrote:
> On Wed, Aug 04, 2021 at 02:19:59PM -0700, Rishabh Bhatnagar wrote:
>> Mailbox channels for the base protocol are setup during probe.
>> There can be a scenario where probe fails to acquire the base
>> protocol due to a timeout leading to cleaning up of all device
>> managed memory including the scmi_mailbox structure setup during
>> mailbox_chan_setup function.
>> [   12.735104]arm-scmi soc:qcom,scmi: timed out in resp(caller: 
>> version_get+0x84/0x140)
>> [   12.735224]arm-scmi soc:qcom,scmi: unable to communicate with SCMI
>> [   12.735947]arm-scmi: probe of soc:qcom,scmi failed with error -110
>> 
>> Now when a message arrives at cpu slightly after the timeout, the 
>> mailbox
>> controller will try to call the rx_callback of the client and might 
>> end
>> up accessing freed memory.
>> [   12.758363][    C0] Call trace:
>> [   12.758367][    C0]  rx_callback+0x24/0x160
>> [   12.758372][    C0]  mbox_chan_received_data+0x44/0x94
>> [   12.758386][    C0]  __handle_irq_event_percpu+0xd4/0x240
>> This patch frees the mailbox channels setup during probe and adds some 
>> more
>> error handling in case the probe fails.
>> 
>> Signed-off-by: Rishabh Bhatnagar <rishabhb@...eaurora.org>
>> ---
>>  drivers/firmware/arm_scmi/driver.c | 35 
>> ++++++++++++++++++++++++-----------
>>  1 file changed, 24 insertions(+), 11 deletions(-)
>> 
>> diff --git a/drivers/firmware/arm_scmi/driver.c 
>> b/drivers/firmware/arm_scmi/driver.c
>> index 9b2e8d4..ead3bd3 100644
>> --- a/drivers/firmware/arm_scmi/driver.c
>> +++ b/drivers/firmware/arm_scmi/driver.c
>> @@ -1390,6 +1390,21 @@ void scmi_protocol_device_unrequest(const 
>> struct scmi_device_id *id_table)
>>  	mutex_unlock(&scmi_requested_devices_mtx);
>>  }
>> 
> 
> Hi,
> 
>> +static int scmi_cleanup_txrx_channels(struct scmi_info *info)
>> +{
>> +	int ret;
>> +	struct idr *idr = &info->tx_idr;
>> +
>> +	ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
>> +	idr_destroy(&info->tx_idr);
>> +
>> +	idr = &info->rx_idr;
>> +	ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
>> +	idr_destroy(&info->rx_idr);
>> +
>> +	return ret;
>> +}
>> +
>>  static int scmi_probe(struct platform_device *pdev)
>>  {
>>  	int ret;
>> @@ -1430,7 +1445,7 @@ static int scmi_probe(struct platform_device 
>> *pdev)
>> 
>>  	ret = scmi_xfer_info_init(info);
>>  	if (ret)
>> -		return ret;
>> +		goto clear_txrx_setup;
>> 
>>  	if (scmi_notification_init(handle))
>>  		dev_err(dev, "SCMI Notifications NOT available.\n");
>> @@ -1443,7 +1458,7 @@ static int scmi_probe(struct platform_device 
>> *pdev)
>>  	ret = scmi_protocol_acquire(handle, SCMI_PROTOCOL_BASE);
>>  	if (ret) {
>>  		dev_err(dev, "unable to communicate with SCMI\n");
>> -		return ret;
>> +		goto notification_exit;
>>  	}
>> 
>>  	mutex_lock(&scmi_list_mutex);
>> @@ -1482,6 +1497,12 @@ static int scmi_probe(struct platform_device 
>> *pdev)
>>  	}
>> 
>>  	return 0;
>> +
>> +notification_exit:
>> +	scmi_notification_exit(&info->handle);
>> +clear_txrx_setup:
>> +	scmi_cleanup_txrx_channels(info);
>> +	return ret;
>>  }
>> 
>>  void scmi_free_channel(struct scmi_chan_info *cinfo, struct idr *idr, 
>> int id)
>> @@ -1493,7 +1514,6 @@ static int scmi_remove(struct platform_device 
>> *pdev)
>>  {
>>  	int ret = 0, id;
>>  	struct scmi_info *info = platform_get_drvdata(pdev);
>> -	struct idr *idr = &info->tx_idr;
>>  	struct device_node *child;
>> 
>>  	mutex_lock(&scmi_list_mutex);
>> @@ -1517,14 +1537,7 @@ static int scmi_remove(struct platform_device 
>> *pdev)
>>  	idr_destroy(&info->active_protocols);
>> 
>>  	/* Safe to free channels since no more users */
>> -	ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
>> -	idr_destroy(&info->tx_idr);
>> -
>> -	idr = &info->rx_idr;
>> -	ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
>> -	idr_destroy(&info->rx_idr);
>> -
>> -	return ret;
>> +	return scmi_cleanup_txrx_channels(info);
>>  }
>> 
> 
> Looks good to me.
> 
> Reviewed-by: Cristian Marussi <cristian.marussi@....com>
> Tested-by: Cristian Marussi <cristian.marussi@....com>
> (Re-tested on for-next/scmi on top of virtio-scmi series)
> 
> Thanks,
> Cristian