Message-ID: <abcd7ea3-086b-b07c-bb08-63aaf2585040@quicinc.com>
Date: Mon, 31 Jul 2023 17:36:17 -0700
From: Chris Lew <quic_clew@...cinc.com>
To: Pavan Kondeti <quic_pkondeti@...cinc.com>,
Praveenkumar I <quic_ipkumar@...cinc.com>
CC: <agross@...nel.org>, <andersson@...nel.org>,
<konrad.dybcio@...aro.org>, <linux-arm-msm@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <quic_varada@...cinc.com>,
<quic_srichara@...cinc.com>
Subject: Re: [PATCH] soc: qcom: qmi: Signal the txn completion after releasing
the mutex
On 7/31/2023 8:19 AM, Pavan Kondeti wrote:
> On Mon, Jul 31, 2023 at 06:37:55PM +0530, Praveenkumar I wrote:
>> txn lives on worker #1's stack
>>
>> Worker #1                            Worker #2
>> *********                            *********
>>
>> qmi_txn_wait(txn)                    qmi_handle_message
>>         |                                    |
>>         |                                    |
>> wait_for_complete(txn->complete)            ....
>>         |                            mutex_lock(txn->lock)
>>         |                                    |
>> mutex_lock(txn->lock)                        |
>>       .....                          complete(&txn->completion)
>>         |                            mutex_unlock(txn->lock)
>>         |
>> mutex_unlock(txn->lock)
>>
>> In the case above, while #2 is inside mutex_unlock(txn->lock), it gets
>> scheduled out between releasing the lock and doing the lock-related
>> wakeup. As a result, #1 acquires the lock, unlocks it, and then frees
>> the txn (where the lock resides).
>>
>> When #2 is scheduled again and tries to do the rest of the lock-related
>> wakeup, the lock itself is invalid because the txn is gone.
>>
>> Fix this by doing the mutex_unlock(txn->lock) first and then the
>> complete(&txn->completion) in #2.
>>
>> Fixes: 3830d0771ef6 ("soc: qcom: Introduce QMI helpers")
>> Cc: stable@...r.kernel.org
>> Signed-off-by: Sricharan Ramabadhran <quic_srichara@...cinc.com>
>> Signed-off-by: Praveenkumar I <quic_ipkumar@...cinc.com>
>> ---
>> drivers/soc/qcom/qmi_interface.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/soc/qcom/qmi_interface.c b/drivers/soc/qcom/qmi_interface.c
>> index 78d7361fdcf2..92e29db97359 100644
>> --- a/drivers/soc/qcom/qmi_interface.c
>> +++ b/drivers/soc/qcom/qmi_interface.c
>> @@ -505,12 +505,13 @@ static void qmi_handle_message(struct qmi_handle *qmi,
>> pr_err("failed to decode incoming message\n");
>>
>> txn->result = ret;
>> - complete(&txn->completion);
>> } else {
>> qmi_invoke_handler(qmi, sq, txn, buf, len);
>> }
>>
>> mutex_unlock(&txn->lock);
>> + if (txn->dest && txn->ei)
>> + complete(&txn->completion);
>> } else {
>> /* Create a txn based on the txn_id of the incoming message */
>> memset(&tmp_txn, 0, sizeof(tmp_txn));
>
> What happens in a remote scenario where the waiter gets timed out at the
> very same time you are releasing the mutex but before calling
> complete()? The caller might end up freeing txn structure and it results
> in the same issue you are currently facing.
>
> Thanks,
> Pavan
I think downstream we made various attempts at moving the signal around
to avoid this, but hit scenarios like the one Pavan described.
We eventually settled on removing txn->lock and treating qmi->txn_lock
as a big lock. That remedies the use-after-free where txn->lock goes
out of scope, since qmi->txn_lock is tied to the lifetime of the qmi
handle.