linux-kernel - Re: [RFC PATCH v2 1/3] usb: dwc3: Flush pending SETUP data during stop active xfers

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6d3470af-5d75-a989-dc21-00f42c022af0@quicinc.com>
Date:   Thu, 24 Mar 2022 11:53:20 -0700
From:   Wesley Cheng <quic_wcheng@...cinc.com>
To:     Thinh Nguyen <Thinh.Nguyen@...opsys.com>,
        Wesley Cheng <wcheng@...eaurora.org>,
        "balbi@...nel.org" <balbi@...nel.org>,
        "gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>
CC:     "linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "quic_jackp@...cinc.com" <quic_jackp@...cinc.com>
Subject: Re: [RFC PATCH v2 1/3] usb: dwc3: Flush pending SETUP data during
 stop active xfers

Hi Thinh,

On 3/23/2022 5:39 PM, Thinh Nguyen wrote:
> Hi Wesley,
> 
> Wesley Cheng wrote:
>> Hi Thinh,
>>
>> On 3/22/2022 6:31 PM, Thinh Nguyen wrote:
>>> Hi Wesley,
>>>
>>> Wesley Cheng wrote:
>>>> Hi Thinh,
>>>>
>>>> On 2/15/2022 4:08 PM, Wesley Cheng wrote:
>>>>> While running the pullup disable sequence, if there are pending SETUP
>>>>> transfers stored in the controller, then the ENDTRANSFER commands on non
>>>>> control eps will fail w/ ETIMEDOUT.  As a suggestion from SNPS, in order
>>>>> to drain potentially cached SETUP packets, SW needs to issue a
>>>>> STARTTRANSFER command.  After issuing the STARTTRANSFER, and retrying the
>>>>> ENDTRANSFER, the command should succeed.  Else, if the endpoints are not
>>>>> properly stopped, the controller halt sequence will fail as well.
>>>>>
>>>>> One limitation is that the current logic will drop the SETUP data
>>>>> being received (ie dropping the SETUP packet), however, it should be
>>>>> acceptable in the pullup disable case, as the device is eventually
>>>>> going to disconnect from the host.
>>>>>
>>>>> Signed-off-by: Wesley Cheng <quic_wcheng@...cinc.com>
>>>>> ---
>>>> Just wondering if you had any inputs for this particular change?  I
>>>> think on the dequeue path discussion you had some concerns with parts of
>>>> this change?  I think the difficult part for the pullup disable path is
>>>> that we have this API running w/ interrupts disabled, so the EP0 state
>>>> won't be able to advance compared to the dequeue case.
>>>
>>> This doesn't sound right. The pullup disable (or device initiated
>>> disconnect) should wait for the EP0 state to be EP0_SETUP_PHASE before
>>> proceeding, which it does.
>>>
>> That is correct, but even though that check is passed, it doesn't
>> prevent the host from sending another SETUP token between the pending
>> setup packet check and run/stop clear.
>>
> 
> That should be fine, because we would have the TRB ready for that SETUP
> packet.
> 
I agree, its valid for the controller to be able to receive the next
setup packet.

>>>>
>>>> Ideally, if there is a way to ensure that we avoid reading further setup
>>>> packets, that would be nice, but from our discussions with Synopsys,
>>>> this was not possible. (controller is always armed and ready to ACK
>>>> setup tokens)
>>>>
>>>> I did evaluate keeping IRQs enabled and periodically releasing/attaining
>>>> the lock between the stop active xfer calls, but that opened another can
>>>> of worms.  If you think this is the approach we should take, I can take
>>>> a look at this implementation further.
>>>>
>>>
>>> This patch doesn't look right to me. The change I suggested before
>>> should address this (I believe Greg already picked it up). What other
>>> problem do you see, I'm not clear what's the problem here. One potential
>>> problem that I can see is that currently dwc3 driver doesn't wait for
>>> active endpoints to complete/end before clearing the run_stop bit on
>>> device initiated disconnect, but I'm not sure if that's what you're seeing.
>>>
>>> Please help clarify further. If there's trace points log, that'd help.
>>>
>> Main difference between the issue Greg recently pulled in and this one
>> is that this is on the pullup disable patch (no dequeue involved).  In
>> this situation we will also stop active transfers, so that the
>> controller can halt properly.
>>
>> I attached a few files, and will give a summary of them below:
>> 1.  pullup_disable_timeout.txt - ftrace of an instance of when we see
>> several endxfer timeouts.  Refer to line#2016.
>>
>> 2.  lecroy+ftrace_snip.png - picture showing a matching bus sniffer log
>> and a ftrace collected (difference instance to #1)
>>
>> #2 will show that we completed a SETUP transfer before entering into
>> dwc3_gadget_stop_active_transfers().  In here, we then see DWC ACK
>> another SETUP token, which leads to endxfer timeouts on all subsequent
>> endpoints.
> 
> Thanks for the captures. I think the problem here is because the dwc3
> driver disables the control endpoint. It shouldn't do that.
> 
> Can you try this:
> 
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index ab725d2262d6..f0b9ea353620 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -1010,12 +1010,12 @@ static int __dwc3_gadget_ep_disable(struct
> dwc3_ep *dep)
>         if (dep->flags & DWC3_EP_STALL)
>                 __dwc3_gadget_ep_set_halt(dep, 0, false);
> 
> -       reg = dwc3_readl(dwc->regs, DWC3_DALEPENA);
> -       reg &= ~DWC3_DALEPENA_EP(dep->number);
> -       dwc3_writel(dwc->regs, DWC3_DALEPENA, reg);
> -
> -       /* Clear out the ep descriptors for non-ep0 */
>         if (dep->number > 1) {
> +               reg = dwc3_readl(dwc->regs, DWC3_DALEPENA);
> +               reg &= ~DWC3_DALEPENA_EP(dep->number);
> +               dwc3_writel(dwc->regs, DWC3_DALEPENA, reg);
> +
> +               /* Clear out the ep descriptors for non-ep0 */
>                 dep->endpoint.comp_desc = NULL;
>                 dep->endpoint.desc = NULL;
>         }
> 
I was able to reproduce the endxfer timeout w/ the above change.  I'm
assuming you didn't want me to include any parts of my change while
testing, right?

Current sequence in dwc3_gadget_pullup(0) is that we should call
dwc3_stop_active_transfers() before we call __dwc3_gadget_stop().
(gadget stop will call the ep disable for EP[0] and EP[1])  This might
be why the issue would still be occurring.

The attached ftrace that captures a situation where a forced BUG will
occur on the first instance of the timeout.

[ 1573.687689437       0x724885c22]   event (000020c0): ep0out: Transfer
Not Ready [0] (Not Active) [Status Phase]
[ 1573.687738707       0x724885fd3]   dbg_send_ep_cmd: ep0out: cmd
'Start Transfer' [406] params 00000000 efffa000 00000000 --> status:
Successful
[ 1573.698040582       0x7248b6477]   dbg_send_ep_cmd: ep1out: cmd 'End
Transfer' [20c08] params 00000000 00000000 00000000 --> status: Timed Out

We definitely started the STATUS phase, so host could have read it and
sent the next SETUP packet while we were already in the
dwc3_stop_active_transfers() loop.

Thanks
Wesley Cheng
View attachment "a600000.ssusb.trace.txt" of type "text/plain" (166829 bytes)