lists.openwall.net
Open Source and information security mailing list archives
 
Message-ID: <51782599a01a6a22409d01e5fc1f8a50@codeaurora.org>
Date:   Mon, 30 Aug 2021 14:09:37 -0700
From:   rishabhb@...eaurora.org
To:     Cristian Marussi <cristian.marussi@....com>
Cc:     sudeep.holla@....com, linux-arm-kernel@...ts.infradead.org,
        linux-kernel@...r.kernel.org, avajid@...eaurora.org,
        adharmap@...eaurora.org
Subject: Re: [PATCH v3] firmware: arm_scmi: Free mailbox channels if probe fails

Hi Cristian,
There seems to be another issue here. The response from the agent can be
delayed, causing a timeout during the base protocol acquire, which leads
to a probe failure. What I have observed is that sometimes the probe
failure and rx_callback (triggered by a delayed message) happen at the
same time on different CPUs. Because of this race, the device memory may
be freed while the interrupt handler (rx_callback) is executing on
another CPU.
How do you propose we solve this? Do you think it is better to move the
setup of the base and other protocols out of probe and into some delayed
work? That would imply the device memory is not released until remove is
called. Or should we add locking to the interrupt handler
(scmi_rx_callback) and the cleanup in probe to avoid the race?
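To make the locking option concrete, here is a minimal userspace sketch. It assumes the lock and an "alive" pointer live in a structure that outlives the device-managed memory, since taking a lock stored inside the memory being freed would not help. All names (struct chan_guard, chan_rx_callback(), chan_teardown()) are hypothetical and not part of the SCMI driver. Because rx_callback runs in interrupt context (see __handle_irq_event_percpu in the trace quoted below), a kernel version would need spin_lock_irqsave() rather than a pthread mutex.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stands in for the devm-allocated data the RX path touches. */
struct rx_payload {
	int last_msg;
	int msgs_seen;
};

/*
 * Hypothetical guard: must outlive the payload it protects.
 * payload == NULL means the channel has been torn down.
 */
struct chan_guard {
	pthread_mutex_t lock;
	struct rx_payload *payload;
};

int chan_guard_init(struct chan_guard *g)
{
	if (pthread_mutex_init(&g->lock, NULL) != 0)
		return -1;
	g->payload = calloc(1, sizeof(*g->payload));
	if (!g->payload) {
		pthread_mutex_destroy(&g->lock);
		return -1;
	}
	return 0;
}

/* RX path: returns true if the message was handled, false if dropped. */
bool chan_rx_callback(struct chan_guard *g, int msg)
{
	bool handled = false;

	pthread_mutex_lock(&g->lock);
	if (g->payload) {	/* still alive: safe to touch under the lock */
		g->payload->last_msg = msg;
		g->payload->msgs_seen++;
		handled = true;
	}
	pthread_mutex_unlock(&g->lock);
	return handled;
}

/* Cleanup path (e.g. probe failure): detach under the lock, free after. */
void chan_teardown(struct chan_guard *g)
{
	struct rx_payload *p;

	pthread_mutex_lock(&g->lock);
	p = g->payload;
	g->payload = NULL;
	pthread_mutex_unlock(&g->lock);
	free(p);	/* every callback that saw p has already released the lock */
}
```

The teardown detaches the payload while holding the lock and frees it only after unlocking, so a late callback either completes before the free or sees NULL and drops the message.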

On 2021-08-05 03:54, Cristian Marussi wrote:
> On Wed, Aug 04, 2021 at 02:19:59PM -0700, Rishabh Bhatnagar wrote:
>> Mailbox channels for the base protocol are set up during probe.
>> There can be a scenario where probe fails to acquire the base
>> protocol due to a timeout, leading to the cleanup of all
>> device-managed memory, including the scmi_mailbox structure set up
>> in the mailbox_chan_setup function.
>> [   12.735104]arm-scmi soc:qcom,scmi: timed out in resp(caller: version_get+0x84/0x140)
>> [   12.735224]arm-scmi soc:qcom,scmi: unable to communicate with SCMI
>> [   12.735947]arm-scmi: probe of soc:qcom,scmi failed with error -110
>> 
>> Now when a message arrives at a CPU slightly after the timeout, the
>> mailbox controller will try to call the rx_callback of the client and
>> might end up accessing freed memory.
>> [   12.758363][    C0] Call trace:
>> [   12.758367][    C0]  rx_callback+0x24/0x160
>> [   12.758372][    C0]  mbox_chan_received_data+0x44/0x94
>> [   12.758386][    C0]  __handle_irq_event_percpu+0xd4/0x240
>>
>> This patch frees the mailbox channels set up during probe and adds
>> more error handling in case the probe fails.
>> 
>> Signed-off-by: Rishabh Bhatnagar <rishabhb@...eaurora.org>
>> ---
>>  drivers/firmware/arm_scmi/driver.c | 35 ++++++++++++++++++++++++-----------
>>  1 file changed, 24 insertions(+), 11 deletions(-)
>> 
>> diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c
>> index 9b2e8d4..ead3bd3 100644
>> --- a/drivers/firmware/arm_scmi/driver.c
>> +++ b/drivers/firmware/arm_scmi/driver.c
>> @@ -1390,6 +1390,21 @@ void scmi_protocol_device_unrequest(const struct scmi_device_id *id_table)
>>  	mutex_unlock(&scmi_requested_devices_mtx);
>>  }
>> 
> 
> Hi,
> 
>> +static int scmi_cleanup_txrx_channels(struct scmi_info *info)
>> +{
>> +	int ret;
>> +	struct idr *idr = &info->tx_idr;
>> +
>> +	ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
>> +	idr_destroy(&info->tx_idr);
>> +
>> +	idr = &info->rx_idr;
>> +	ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
>> +	idr_destroy(&info->rx_idr);
>> +
>> +	return ret;
>> +}
>> +
>>  static int scmi_probe(struct platform_device *pdev)
>>  {
>>  	int ret;
>> @@ -1430,7 +1445,7 @@ static int scmi_probe(struct platform_device *pdev)
>> 
>>  	ret = scmi_xfer_info_init(info);
>>  	if (ret)
>> -		return ret;
>> +		goto clear_txrx_setup;
>> 
>>  	if (scmi_notification_init(handle))
>>  		dev_err(dev, "SCMI Notifications NOT available.\n");
>> @@ -1443,7 +1458,7 @@ static int scmi_probe(struct platform_device *pdev)
>>  	ret = scmi_protocol_acquire(handle, SCMI_PROTOCOL_BASE);
>>  	if (ret) {
>>  		dev_err(dev, "unable to communicate with SCMI\n");
>> -		return ret;
>> +		goto notification_exit;
>>  	}
>> 
>>  	mutex_lock(&scmi_list_mutex);
>> @@ -1482,6 +1497,12 @@ static int scmi_probe(struct platform_device *pdev)
>>  	}
>> 
>>  	return 0;
>> +
>> +notification_exit:
>> +	scmi_notification_exit(&info->handle);
>> +clear_txrx_setup:
>> +	scmi_cleanup_txrx_channels(info);
>> +	return ret;
>>  }
>> 
>>  void scmi_free_channel(struct scmi_chan_info *cinfo, struct idr *idr, int id)
>> @@ -1493,7 +1514,6 @@ static int scmi_remove(struct platform_device *pdev)
>>  {
>>  	int ret = 0, id;
>>  	struct scmi_info *info = platform_get_drvdata(pdev);
>> -	struct idr *idr = &info->tx_idr;
>>  	struct device_node *child;
>> 
>>  	mutex_lock(&scmi_list_mutex);
>> @@ -1517,14 +1537,7 @@ static int scmi_remove(struct platform_device *pdev)
>>  	idr_destroy(&info->active_protocols);
>> 
>>  	/* Safe to free channels since no more users */
>> -	ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
>> -	idr_destroy(&info->tx_idr);
>> -
>> -	idr = &info->rx_idr;
>> -	ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
>> -	idr_destroy(&info->rx_idr);
>> -
>> -	return ret;
>> +	return scmi_cleanup_txrx_channels(info);
>>  }
>> 
> 
> Looks good to me.
> 
> Reviewed-by: Cristian Marussi <cristian.marussi@....com>
> Tested-by: Cristian Marussi <cristian.marussi@....com>
> (Re-tested on for-next/scmi on top of virtio-scmi series)
> 
> Thanks,
> Cristian
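For reference, the error unwinding the patch adds to scmi_probe() follows the usual goto-ladder pattern: each completed step has a matching undo label, and a failure jumps to the label that undoes only what has been done so far. The userspace model below is a hypothetical sketch; model_probe(), fake_step() and the trace strings are illustrative stand-ins, and the -ENOMEM for the xfer init failure is an assumption. Only the base acquire timeout (-ETIMEDOUT, i.e. -110) matches the probe failure in the quoted log.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical userspace model of the error unwinding added to
 * scmi_probe(). Label names mirror the patch; the step bodies are
 * stand-ins that only record what ran, not real SCMI calls.
 */
static char trace[128];

/* Append one step name to the trace, never overflowing the buffer. */
static void log_step(const char *name)
{
	strncat(trace, name, sizeof(trace) - strlen(trace) - 1);
	strncat(trace, " ", sizeof(trace) - strlen(trace) - 1);
}

/* Record a step and return its (simulated) result. */
static int fake_step(const char *name, int err)
{
	log_step(name);
	return err;
}

/* fail_at: 0 = success, 1 = xfer init fails, 2 = base acquire times out */
int model_probe(int fail_at)
{
	int ret;

	trace[0] = '\0';
	log_step("chan_setup");	/* tx/rx channels are set up before this */

	ret = fake_step("xfer_init", fail_at == 1 ? -ENOMEM : 0);
	if (ret)
		goto clear_txrx_setup;

	log_step("notif_init");	/* a failure here is non-fatal in the patch */

	ret = fake_step("base_acquire", fail_at == 2 ? -ETIMEDOUT : 0);
	if (ret)
		goto notification_exit;

	return 0;

notification_exit:
	log_step("notif_exit");
clear_txrx_setup:
	log_step("chan_cleanup");
	return ret;
}
```

This ordering alone does not close the race described at the top of this mail: chan_cleanup can still run while a late rx_callback is executing on another CPU, which is exactly the question raised there.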
