linux-kernel - Re: [PATCH v2 1/2] bus: mhi: host: Add spinlock to protect WP access when queueing TREs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <00305327-d866-4da4-916c-fb414398bc3a@quicinc.com>
Date:   Tue, 7 Nov 2023 15:59:49 +0800
From:   Qiang Yu <quic_qianyu@...cinc.com>
To:     Manivannan Sadhasivam <mani@...nel.org>,
        Jeffrey Hugo <quic_jhugo@...cinc.com>
CC:     <mhi@...ts.linux.dev>, <linux-arm-msm@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>, <quic_cang@...cinc.com>,
        <quic_mrana@...cinc.com>
Subject: Re: [PATCH v2 1/2] bus: mhi: host: Add spinlock to protect WP access
 when queueing TREs


On 11/6/2023 12:51 PM, Manivannan Sadhasivam wrote:
> On Fri, Oct 20, 2023 at 09:07:35AM -0600, Jeffrey Hugo wrote:
>> On 10/16/2023 2:46 AM, Qiang Yu wrote:
>>> On 9/29/2023 11:22 PM, Jeffrey Hugo wrote:
>>>> On 9/24/2023 9:10 PM, Qiang Yu wrote:
>>>>> On 9/22/2023 10:44 PM, Jeffrey Hugo wrote:
>>>>>> On 9/13/2023 2:47 AM, Qiang Yu wrote:
>>>>>>> From: Bhaumik Bhatt <bbhatt@...eaurora.org>
>>>>>>>
>>>>>>> Protect WP accesses such that multiple threads queueing buffers for
>>>>>>> incoming data do not race and access the same WP twice.
>>>>>>> Ensure read and
>>>>>>> write locks for the channel are not taken in succession
>>>>>>> by dropping the
>>>>>>> read lock from parse_xfer_event() such that a callback given to client
>>>>>>> can potentially queue buffers and acquire the write lock
>>>>>>> in that process.
>>>>>>> Any queueing of buffers should be done without channel
>>>>>>> read lock acquired
>>>>>>> as it can result in multiple locks and a soft lockup.
>>>>>>>
>>>>>>> Signed-off-by: Bhaumik Bhatt <bbhatt@...eaurora.org>
>>>>>>> Signed-off-by: Qiang Yu <quic_qianyu@...cinc.com>
>>>>>>> ---
>>>>>>>    drivers/bus/mhi/host/main.c | 11 ++++++++++-
>>>>>>>    1 file changed, 10 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/drivers/bus/mhi/host/main.c b/drivers/bus/mhi/host/main.c
>>>>>>> index dcf627b..13c4b89 100644
>>>>>>> --- a/drivers/bus/mhi/host/main.c
>>>>>>> +++ b/drivers/bus/mhi/host/main.c
>>>>>>> @@ -642,6 +642,7 @@ static int parse_xfer_event(struct
>>>>>>> mhi_controller *mhi_cntrl,
>>>>>>>                mhi_del_ring_element(mhi_cntrl, tre_ring);
>>>>>>>                local_rp = tre_ring->rp;
>>>>>>>    +            read_unlock_bh(&mhi_chan->lock);
>>>>>> This doesn't work due to the
>>>>>> write_lock_irqsave(&mhi_chan->lock, flags); on line 591.
>>>>> Write_lock_irqsave(&mhi_chan->lock, flags) is used in case of
>>>>> ev_code >= MHI_EV_CC_OOB. We only read_lock/read_unlock the
>>>>> mhi_chan while ev_code < MHI_EV_CC_OOB.
>>>> Sorry.  OOB != EOB
>>>>
>>>>>> I really don't like that we are unlocking the mhi_chan while
>>>>>> still using it.  It opens up a window where the mhi_chan
>>>>>> state can be updated between here and the client using the
>>>>>> callback to queue a buf.
>>>>>>
>>>>>> Perhaps we need a new lock that just protects the wp, and
>>>>>> needs to be only grabbed while mhi_chan->lock is held?
>>>>> Since we have employed mhi_chan lock to protect the channel and
>>>>> what we are concerned here is that client may queue buf to a
>>>>> disabled or stopped channel, can we check channel state after
>>>>> getting mhi_chan->lock like line 595.
>>>>>
>>>>> We can add the check after getting write lock in mhi_gen_tre()
>>>>> and after getting read lock again here.
>>>> I'm not sure that is sufficient.  After you unlock to notify the
>>>> client, MHI is going to manipulate the packet count and runtime_pm
>>>> without the lock (648-652).  It seems like that adds additional
>>>> races which won't be covered by the additional check you propose.
>>> I don't think read_lock_bh(&mhi_chan->lock) can protect runtime_pm and
>>> the packet count here. Even if we do not unlock, mhi state and packet
>>> count can still be changed because we did not get pm_lock here, which is
>>> used in all mhi state transition function.
>>>
>>> I also checked all places that mhi_chan->lock is grabbed, did not see
>>> packet count and runtime_pm be protected by write_lock(&mhi_chan->lock).
>>>
>>>
>>> If you really don't like the unlock operation, we can also take a new
>>> lock. But I think we only need to add the new lock in two places,
>>> mhi_gen_tre and mhi_pm_m0_transition while mhi_chan->lock is held.
>> Mani, if I recall correctly, you were the architect of the locking.  Do you
>> have an opinion?
>>
> TBH, the locking situation is a mess with MHI. Initially, we happen to have
> separate locks for protecting various operations, but then during review, it was
> advised to reuse existing locks and avoid having too many separate locks.
>
> This worked well but then we kind of abused the locks over time. I asked Hemant
> and Bhaumik to audit the locks and fix them, but both of them left Qcom.
>
> So in this situation, the intent of the pm_lock was to protect concurrent access
> against updating the pm_state. And it also happen to protect _other_things_ such
> as runtime_put, pending_pkts etc... But not properly, because most of the time
> read lock is taken in places where pm_state is being read. So there is still a
> possibility of race while accessing these _other_things_.
>
> For this patch, I'm happy with dropping chan->lock before calling xfer_cb() and
> I want someone (maybe Qiang) to do the audit of locking in general and come up
> with fixes where needed.
>
> - Mani

As discussed with Jeff before, we also need to check channel state 
before queue buffer and after re-lock

in parse_xfer_event, so I also add the channel state check in next 
version patch.

Probably I can do the audit of locking. It's a good chance for me to 
understand various locks in MHI host

driver completely.