[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4e06452a-080f-a2be-ab88-9ac992740ee0@codeaurora.org>
Date: Thu, 29 Jul 2021 01:45:45 -0700
From: Wesley Cheng <wcheng@...eaurora.org>
To: Felipe Balbi <balbi@...nel.org>,
Alan Stern <stern@...land.harvard.edu>
Cc: gregkh@...uxfoundation.org, linux-usb@...r.kernel.org,
linux-kernel@...r.kernel.org, jackp@...eaurora.org
Subject: Re: [PATCH] usb: dwc3: gadget: Use list_replace_init() before
traversing lists
Hi Felipe,
On 7/29/2021 1:09 AM, Felipe Balbi wrote:
>
> Hi,
>
> Wesley Cheng <wcheng@...eaurora.org> writes:
>
>> The list_for_each_entry_safe() macro saves the current item (n) and
>> the item after (n+1), so that n can be safely removed without
>> corrupting the list. However, when traversing the list and removing
>> items using gadget giveback, the DWC3 lock is briefly released,
>> allowing other routines to execute. There is a situation where, while
>> items are being removed from the cancelled_list using
>> dwc3_gadget_ep_cleanup_cancelled_requests(), the pullup disable
>> routine is running in parallel (due to UDC unbind). As the cleanup
>> routine removes n, and the pullup disable removes n+1, once the
>> cleanup retakes the DWC3 lock, it references a request who was already
>> removed/handled. With list debug enabled, this leads to a panic.
>> Ensure all instances of the macro are replaced where gadget giveback
>> is used.
>>
>> Example call stack:
>>
>> Thread#1:
>> __dwc3_gadget_ep_set_halt() - CLEAR HALT
>> -> dwc3_gadget_ep_cleanup_cancelled_requests()
>> ->list_for_each_entry_safe()
>> ->dwc3_gadget_giveback(n)
>> ->dwc3_gadget_del_and_unmap_request()- n deleted[cancelled_list]
>> ->spin_unlock
>> ->Thread#2 executes
>> ...
>> ->dwc3_gadget_giveback(n+1)
>> ->Already removed!
>>
>> Thread#2:
>> dwc3_gadget_pullup()
>> ->waiting for dwc3 spin_lock
>> ...
>> ->Thread#1 released lock
>> ->dwc3_stop_active_transfers()
>> ->dwc3_remove_requests()
>> ->fetches n+1 item from cancelled_list (n removed by Thread#1)
>> ->dwc3_gadget_giveback()
>> ->dwc3_gadget_del_and_unmap_request()- n+1
>> deleted[cancelled_list]
>> ->spin_unlock
>>
>> Fix this condition by utilizing list_replace_init(), and traversing
>> through a local copy of the current elements in the endpoint lists.
>> This will also set the parent list as empty, so if another thread is
>> also looping through the list, it will be empty on the next iteration.
>>
>> Fixes: d4f1afe5e896 ("usb: dwc3: gadget: move requests to cancelled_list")
>> Signed-off-by: Wesley Cheng <wcheng@...eaurora.org>
>>
>> ---
>> Previous patchset:
>> https://lore.kernel.org/linux-usb/1620716636-12422-1-git-send-email-wcheng@codeaurora.org/
>> ---
>> drivers/usb/dwc3/gadget.c | 18 ++++++++++++++++--
>> 1 file changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>> index a29a4ca..3ce6ed9 100644
>> --- a/drivers/usb/dwc3/gadget.c
>> +++ b/drivers/usb/dwc3/gadget.c
>> @@ -1926,9 +1926,13 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
>> {
>> struct dwc3_request *req;
>> struct dwc3_request *tmp;
>> + struct list_head local;
>> struct dwc3 *dwc = dep->dwc;
>>
>> - list_for_each_entry_safe(req, tmp, &dep->cancelled_list, list) {
>> +restart:
>> + list_replace_init(&dep->cancelled_list, &local);
>
> hmm, if the lock is held and IRQs disabled when this runs, then no other
> threads will be able to append requests to the list which makes the
> "restart" label unnecessary, no?
We do still call dwc3_gadget_giveback() which would release the lock
briefly, so if there was another thread waiting on dwc->lock, it would
be able to add additional items to that list.
>
> I wonder if we should release the lock and reenable interrupts after
> replacing the head. The problem is that
> dwc3_gadget_ep_cleanup_cancelled_requests() can run from the IRQ
> handler.
>
We would also need to consider that some of the APIs being called in
these situations would also have the assumption that the dwc->lock is
held, ie dwc3_gadget_giveback()
Thanks
Wesley Cheng
> Alan, could you provide your insight here? Do you think we should defer
> this to a low priority tasklet or something along those lines?
>
>> + list_for_each_entry_safe(req, tmp, &local, list) {
>> dwc3_gadget_ep_skip_trbs(dep, req);
>> switch (req->status) {
>> case DWC3_REQUEST_STATUS_DISCONNECTED:
>
>
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
Powered by blists - more mailing lists