linux-kernel - Re: [PATCH v6 3/4] bus: mhi: core: Process execution environment changes serially

Message-ID: <899da888-321a-c228-8537-b72821700dc7@quicinc.com>
Date:   Mon, 23 Aug 2021 12:43:31 -0600
From:   Jeffrey Hugo <quic_jhugo@...cinc.com>
To:     Bhaumik Bhatt <bbhatt@...eaurora.org>,
        <manivannan.sadhasivam@...aro.org>
CC:     <linux-arm-msm@...r.kernel.org>, <hemantk@...eaurora.org>,
        <linux-kernel@...r.kernel.org>, <loic.poulain@...aro.org>,
        <carl.yin@...ctel.com>, <naveen.kumar@...ctel.com>
Subject: Re: [PATCH v6 3/4] bus: mhi: core: Process execution environment
 changes serially

On 2/24/2021 4:23 PM, Bhaumik Bhatt wrote:
> In the current design, the execution environment is updated whenever
> the BHI interrupt fires. This can cause race conditions and impede
> ongoing power up/down processing. For example, if a power down is in
> progress, the MHI host updates to a local "disabled" execution
> environment. If a BHI interrupt fires later, that value gets replaced
> with one from the BHI EE register. This impacts the controller, which,
> for example, does not expect multiple RDDM execution environment
> change status callbacks. Another issue is that the device can enter
> mission mode and the execution environment can be updated while
> device creation for SBL channels is still in progress (because the PM
> state worker thread runs late), leading to multiple attempts at
> opening the same channel.
> 
> Ensure that EE changes are handled only from appropriate places and
> occur one after another, and handle only PBL mode or RDDM EE changes
> as critical events directly from the interrupt handler. Simplify
> handling by waiting for SYS ERROR before handling RDDM. This also
> makes sure that we use the correct execution environment to notify
> the controller driver when the device resets to one of the PBL
> execution environments.
> 
> Signed-off-by: Bhaumik Bhatt <bbhatt@...eaurora.org>

<snip>

> @@ -452,27 +451,30 @@ irqreturn_t mhi_intvec_threaded_handler(int irq_number, void *priv)
>   	}
>   	write_unlock_irq(&mhi_cntrl->pm_lock);
>   
> -	 /* If device supports RDDM don't bother processing SYS error */
> -	if (mhi_cntrl->rddm_image) {
> -		/* host may be performing a device power down already */
> -		if (!mhi_is_active(mhi_cntrl))
> -			goto exit_intvec;
> +	if (pm_state != MHI_PM_SYS_ERR_DETECT || ee == mhi_cntrl->ee)
> +		goto exit_intvec;
>   
> -		if (mhi_cntrl->ee == MHI_EE_RDDM && mhi_cntrl->ee != ee) {
> +	switch (ee) {
> +	case MHI_EE_RDDM:
> +		/* proceed if power down is not already in progress */
> +		if (mhi_cntrl->rddm_image && mhi_is_active(mhi_cntrl)) {
>   			mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_EE_RDDM);
> +			mhi_cntrl->ee = ee;
>   			wake_up_all(&mhi_cntrl->state_event);
>   		}
> -		goto exit_intvec;
> -	}
> -
> -	if (pm_state == MHI_PM_SYS_ERR_DETECT) {
> +		break;
> +	case MHI_EE_PBL:
> +	case MHI_EE_EDL:
> +	case MHI_EE_PTHRU:
> +		mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_FATAL_ERROR);
> +		mhi_cntrl->ee = ee;
>   		wake_up_all(&mhi_cntrl->state_event);
> -
> -		/* For fatal errors, we let controller decide next step */
> -		if (MHI_IN_PBL(ee))
> -			mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_FATAL_ERROR);
> -		else
> -			mhi_pm_sys_err_handler(mhi_cntrl);
> +		mhi_pm_sys_err_handler(mhi_cntrl);
> +		break;
> +	default:
> +		wake_up_all(&mhi_cntrl->state_event);
> +		mhi_pm_sys_err_handler(mhi_cntrl);
> +		break;
>   	}

Bhaumik, can you explain the above change? Before this patch (which is
now committed), on a fatal error the controller was notified
(MHI_CB_FATAL_ERROR) and it decided all further action. After this
patch, the controller is still notified, but the core also attempts to
handle the syserr itself.

This is a change in behavior, and it seems to make a mess of the
controller, with the controller and the core possibly fighting each
other.

Specifically, I'm rebasing the AIC100 driver onto 5.13, which has this 
change, and I'm seeing a serious regression.  I'm thinking that for the 
PBL/EDL/PTHRU case, mhi_pm_sys_err_handler() should not be called.

Thoughts?