linux-kernel - Re: [PATCH v1] misc: fastrpc: Trigger a panic using BUG

Message-ID: <7eab4618-9173-44f5-a185-0071f3893cc7@quicinc.com>
Date: Mon, 5 Aug 2024 16:36:28 +0530
From: Abhishek Singh <quic_abhishes@...cinc.com>
To: Greg KH <gregkh@...uxfoundation.org>
CC: <srinivas.kandagatla@...aro.org>, <linux-arm-msm@...r.kernel.org>,
        <quic_bkumar@...cinc.com>, <linux-kernel@...r.kernel.org>,
        <quic_ktadakam@...cinc.com>, <quic_chennak@...cinc.com>,
        <dri-devel@...ts.freedesktop.org>
Subject: Re: [PATCH v1] misc: fastrpc: Trigger a panic using BUG_ON in device
 release


On 7/30/2024 12:46 PM, Greg KH wrote:
> On Tue, Jul 30, 2024 at 12:39:45PM +0530, Abhishek Singh wrote:
>> The user process on ARM closes the device node while closing the
>> session, triggers a remote call to terminate the PD running on the
>> DSP. If the DSP is in an unstable state and cannot process the remote
>> request from the HLOS, glink fails to deliver the kill request to the
>> DSP, resulting in a timeout error. Currently, this error is ignored,
>> and the session is closed, causing all the SMMU mappings associated
>> with that specific PD to be removed. However, since the PD is still
>> operational on the DSP, any attempt to access these SMMU mappings
>> results in an SMMU fault, leading to a panic.  As the SMMU mappings
>> have already been removed, there is no available information on the
>> DSP to determine the root cause of its unresponsiveness to remote
>> calls. As the DSP is unresponsive to all process remote calls, use
>> BUG_ON to prevent the removal of SMMU mappings and to properly
>> identify the root cause of the DSP’s unresponsiveness to the remote
>> calls.
>>
>> Signed-off-by: Abhishek Singh <quic_abhishes@...cinc.com>
>> ---
>>  drivers/misc/fastrpc.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
>> index 5204fda51da3..bac9c749564c 100644
>> --- a/drivers/misc/fastrpc.c
>> +++ b/drivers/misc/fastrpc.c
>> @@ -97,6 +97,7 @@
>>  #define FASTRPC_RMID_INIT_CREATE_STATIC	8
>>  #define FASTRPC_RMID_INIT_MEM_MAP      10
>>  #define FASTRPC_RMID_INIT_MEM_UNMAP    11
>> +#define PROCESS_KILL_SC 0x01010000
>>  
>>  /* Protection Domain(PD) ids */
>>  #define ROOT_PD		(0)
>> @@ -1128,6 +1129,9 @@ static int fastrpc_invoke_send(struct fastrpc_session_ctx *sctx,
>>  	fastrpc_context_get(ctx);
>>  
>>  	ret = rpmsg_send(cctx->rpdev->ept, (void *)msg, sizeof(*msg));
>> +	/* trigger panic if glink communication is broken and the message is for PD kill */
>> +	BUG_ON((ret == -ETIMEDOUT) && (handle == FASTRPC_INIT_HANDLE) &&
>> +			(ctx->sc == PROCESS_KILL_SC));
> 
> You just crashed the machine completely, sorry, but no, properly handle
> the issue and clean up if you can detect it, do not break systems.
> 
But the glink communication with the DSP is already broken, so we cannot reach the DSP at all.
If we proceed with cleanup on the ARM side, the system will crash. If we skip the cleanup,
resources will leak and the system will eventually become unusable. That is why I am
crashing the device.
> greg k-h
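
For readers following the thread, the two options under discussion can be modelled in a few lines of user-space C. This is only a sketch, not kernel code and not a proposed fix: it reuses the condition from the quoted BUG_ON (timeout, init handle, PD kill scalar) but returns a decision instead of crashing. The enum, the function names, and the "defer teardown" policy are hypothetical illustrations; PROCESS_KILL_SC is taken from the quoted patch, and the value 1 for FASTRPC_INIT_HANDLE is an assumption about the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of the driver's init handle (not shown in the quoted diff). */
#define FASTRPC_INIT_HANDLE 1u
/* Scalar for the PD kill request, as defined in the quoted patch. */
#define PROCESS_KILL_SC 0x01010000u

/* Hypothetical outcome of a release: tear down mappings now, or keep them. */
enum teardown_action {
    TEARDOWN_NOW,   /* kill request was delivered (or this was not a kill) */
    TEARDOWN_DEFER  /* kill timed out: the PD may still touch its mappings */
};

/* Same three-part condition the quoted patch passes to BUG_ON. */
static bool is_failed_pd_kill(int ret, uint32_t handle, uint32_t sc)
{
    return ret == -ETIMEDOUT &&
           handle == FASTRPC_INIT_HANDLE &&
           sc == PROCESS_KILL_SC;
}

/* Return a decision to the caller instead of crashing the machine. */
static enum teardown_action choose_teardown(int ret, uint32_t handle,
                                            uint32_t sc)
{
    return is_failed_pd_kill(ret, handle, sc) ? TEARDOWN_DEFER : TEARDOWN_NOW;
}

/* Small self-check documenting the intended behaviour of the model. */
static void model_self_check(void)
{
    assert(choose_teardown(-ETIMEDOUT, FASTRPC_INIT_HANDLE,
                           PROCESS_KILL_SC) == TEARDOWN_DEFER);
    assert(choose_teardown(0, FASTRPC_INIT_HANDLE,
                           PROCESS_KILL_SC) == TEARDOWN_NOW);
}
```

In this model a deferred teardown still leaks the mappings until the DSP is restarted, which is the cost raised in the reply above; whether that is acceptable in the real driver is exactly what the thread is debating.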