linux-kernel - Re: More dwc3 gadget issues with adb

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CALAqxLUdzKRV6nrcLpWsykK+WPnqhUK4iwRe4_Xmo-TvEV5KOg@mail.gmail.com>
Date:   Tue, 21 Apr 2020 21:38:47 -0700
From:   John Stultz <john.stultz@...aro.org>
To:     Felipe Balbi <balbi@...nel.org>
Cc:     Josh Gao <jmgao@...gle.com>, YongQin Liu <yongqin.liu@...aro.org>,
        Anurag Kumar Vulisha <anurag.kumar.vulisha@...inx.com>,
        Yang Fei <fei.yang@...el.com>,
        Thinh Nguyen <thinhn@...opsys.com>,
        Tejas Joglekar <tejas.joglekar@...opsys.com>,
        Andrzej Pietrasiewicz <andrzej.p@...labora.com>,
        Jack Pham <jackp@...eaurora.org>, Todd Kjos <tkjos@...gle.com>,
        Greg KH <gregkh@...uxfoundation.org>,
        Linux USB List <linux-usb@...r.kernel.org>,
        lkml <linux-kernel@...r.kernel.org>
Subject: Re: More dwc3 gadget issues with adb

On Thu, Apr 16, 2020 at 1:19 AM Felipe Balbi <balbi@...nel.org> wrote:
> One thing I noticed is that we're missing a giveback on ep1out. Here's a
> working case:
>

Hey Felipe,
  So I found some time to dig around on this today and I started
trying to understand this issue that you've pointed out about missing
the giveback.

It seems part of the issue is we get to a point where we have some req
where pending_sgs is more than one.

We call dwc3_prepare_one_trb_sg() on it:
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/dwc3/gadget.c?h=v5.7-rc2#n1068

And we process the sg list incrementing req->num_queued_sgs for each one.

then later, dwc3_gadget_ep_cleanup_completed_request() is called on the request:
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/dwc3/gadget.c?h=v5.7-rc2#n2522

We call dwc3_gadget_ep_reclaim_trb_sg()
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/dwc3/gadget.c?h=v5.7-rc2#n2470

Where we iterate over the req->sg, ideally decrementing
num_pending_sgs each time and return.

But back in dwc3_gadget_ep_cleanup_completed_request()  and there
we're hitting the:
  if (!dwc3_gadget_ep_request_completed(req) ||
      req->num_pending_sgs) {
case which causes us to skip the call to dwc3_gadget_giveback().

Looking as to why the num_pending_sgs is non zero, that's because in
dwc3_gadget_ep_reclaim_trb_sg we're hitting the case where the trb has
the DWC3_TRB_CTRL_HWO ctrl flag set, which breaks us out of the loop
early before we decrement num_pending_sgs.

For that trb, we're setting the HWO flag in __dwc3_prepare_one_trb()
(called from dwc3_prepare_one_trb_sg() back at the beginning):
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/dwc3/gadget.c?h=v5.7-rc2#n921

I added logic showing every time we set or clear that flag, and it
seems like we're always setting it but never clearing it. And often
that's not an issue as we only have one sg entry. But if its set on a
trb in a request with multiple sgs, that's where it seems to be
causing the issue.

I'll continue to dig around to try to understand where it might be
going awry (why we never clear the HWO flag). But figured I'd try to
explain this much in case it rings any bells to you.

> One interesting thing is that TRB addresses are "odd". I can't find a
> proper lifetime for these TRBs. Do you have IOMMU enabled? Can you run
> without it? For example, nowhere in the log can I find the place where
> trb 0000000092deef41 was first enqueue. I'm assuming the log to be
> ordered, which means that trb is the same as 00000000f3db4076. But why
> are the addresses different?
>
> Another weird thing is that even though we CHN bit being set in
> 0000000092deef41, we don't see where the second trb (the one its chained
> to) was prepared. It seems like it was *never* prepared, what gives?

I suspect these bits were due to the tracing happening after some
minor amount of initial adb traffic began at bootup? So the trace
isn't capturing all the events.

Let me know if that doesn't sound reasonable and I'll try to dig a bit
futher on those questions.

thanks
-john