linux-kernel - Re: [PATCH v1] misc: fastrpc: Trigger a panic using BUG

Message-ID: <c575a775-1596-41d3-a4c4-b356406d7819@linaro.org>
Date: Mon, 19 Aug 2024 13:00:14 +0200
From: Caleb Connolly <caleb.connolly@...aro.org>
To: Abhishek Singh <quic_abhishes@...cinc.com>,
 srinivas.kandagatla@...aro.org, linux-arm-msm@...r.kernel.org
Cc: gregkh@...uxfoundation.org, quic_bkumar@...cinc.com,
 linux-kernel@...r.kernel.org, quic_ktadakam@...cinc.com,
 quic_chennak@...cinc.com, dri-devel@...ts.freedesktop.org
Subject: Re: [PATCH v1] misc: fastrpc: Trigger a panic using BUG_ON in device
 release

Hi Abhishek,

On 30/07/2024 09:09, Abhishek Singh wrote:
> The user process on ARM closes the device node while closing the
> session, triggers a remote call to terminate the PD running on the
> DSP. If the DSP is in an unstable state and cannot process the remote
> request from the HLOS, glink fails to deliver the kill request to the
> DSP, resulting in a timeout error. Currently, this error is ignored,
> and the session is closed, causing all the SMMU mappings associated
> with that specific PD to be removed. However, since the PD is still
> operational on the DSP, any attempt to access these SMMU mappings
> results in an SMMU fault, leading to a panic.  As the SMMU mappings
> have already been removed, there is no available information on the
> DSP to determine the root cause of its unresponsiveness to remote
> calls. As the DSP is unresponsive to all process remote calls, use
> BUG_ON to prevent the removal of SMMU mappings and to properly
> identify the root cause of the DSP’s unresponsiveness to the remote
> calls.

Could you elaborate a little on the contexts in which this can happen?
What SoC/board are you hitting this on? Is it running pre-production
firmware?

Since fastrpc is not at all required for basic functionality of the 
device, maybe it would be better to print an error here and then prevent 
trying to open new connections to the DSP?
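
A rough userspace sketch of that alternative, to make the idea concrete.
All names here (struct dsp_channel, dsp_handle_kill_result, dsp_open) are
hypothetical and do not exist in fastrpc.c; in the driver the message
would presumably go through dev_err() rather than fprintf(), and the flag
would live in the driver's own channel state:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for per-channel driver state; not the real struct. */
struct dsp_channel {
	bool unresponsive;	/* set once a PD kill request times out */
};

/* Called with the result of sending the PD kill message to the DSP. */
static int dsp_handle_kill_result(struct dsp_channel *ch, int ret)
{
	if (ret == -ETIMEDOUT) {
		/* Report the failure instead of taking the whole system down. */
		fprintf(stderr,
			"fastrpc: DSP did not respond to PD kill, refusing new sessions\n");
		ch->unresponsive = true;
	}
	return ret;
}

/* Open path: reject new sessions once the DSP is marked unresponsive. */
static int dsp_open(const struct dsp_channel *ch)
{
	if (ch->unresponsive)
		return -ENODEV;
	return 0;
}
```

This only covers the "print an error and stop accepting new connections"
part. What should happen to the SMMU mappings of the session that failed
to shut down is a separate question that the sketch does not address.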

Kind regards,
> 
> Signed-off-by: Abhishek Singh <quic_abhishes@...cinc.com>
> ---
>   drivers/misc/fastrpc.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
> index 5204fda51da3..bac9c749564c 100644
> --- a/drivers/misc/fastrpc.c
> +++ b/drivers/misc/fastrpc.c
> @@ -97,6 +97,7 @@
>   #define FASTRPC_RMID_INIT_CREATE_STATIC	8
>   #define FASTRPC_RMID_INIT_MEM_MAP      10
>   #define FASTRPC_RMID_INIT_MEM_UNMAP    11
> +#define PROCESS_KILL_SC 0x01010000
>   
>   /* Protection Domain(PD) ids */
>   #define ROOT_PD		(0)
> @@ -1128,6 +1129,9 @@ static int fastrpc_invoke_send(struct fastrpc_session_ctx *sctx,
>   	fastrpc_context_get(ctx);
>   
>   	ret = rpmsg_send(cctx->rpdev->ept, (void *)msg, sizeof(*msg));
> +	/* trigger panic if glink communication is broken and the message is for PD kill */
> +	BUG_ON((ret == -ETIMEDOUT) && (handle == FASTRPC_INIT_HANDLE) &&
> +			(ctx->sc == PROCESS_KILL_SC));
>   
>   	if (ret)
>   		fastrpc_context_put(ctx);

-- 
// Caleb (they/them)