Message-ID: <CALAqxLUQ0ciJTLrmEAu9WKCJHAbpY9szuVm=+VapN2QWWGnNjA@mail.gmail.com>
Date:   Wed, 5 Feb 2020 13:03:51 -0800
From:   John Stultz <john.stultz@...aro.org>
To:     Felipe Balbi <balbi@...nel.org>
Cc:     "Yang, Fei" <fei.yang@...el.com>,
        Andrzej Pietrasiewicz <andrzej.p@...labora.com>,
        lkml <linux-kernel@...r.kernel.org>,
        Felipe Balbi <felipe.balbi@...ux.intel.com>,
        Thinh Nguyen <thinhn@...opsys.com>,
        Tejas Joglekar <tejas.joglekar@...opsys.com>,
        Jack Pham <jackp@...eaurora.org>, Todd Kjos <tkjos@...gle.com>,
        Greg KH <gregkh@...uxfoundation.org>,
        Linux USB List <linux-usb@...r.kernel.org>,
        stable <stable@...r.kernel.org>, Jens Axboe <axboe@...nel.dk>,
        Christoph Hellwig <hch@...radead.org>
Subject: Re: [RFC][PATCH 0/2] Avoiding DWC3 transfer stalls/hangs when using
 adb over f_fs

On Thu, Jan 23, 2020 at 9:46 AM Felipe Balbi <balbi@...nel.org> wrote:
> >>>>>
> >>>>> Since ~4.20, when the functionfs gadget enabled scatter-gather
> >>>>> support, we have seen problems with adb connections stalling and
> >>>>> stopping to function on hardware with dwc3 usb controllers.
> >>>>> Specifically, HiKey960, Dragonboard 845c, and Pixel3 devices.
...
> >>
> >> I'm pretty sure this should be solved at the DMA API level, just want to confirm.
> >
> > I have sent you the tracepoints long time ago. Also my analysis of the
> > problem (BTW, I don't think the tracepoints helped much). It's
> > basically a logic problem in function dwc3_gadget_ep_reclaim_trb_sg().
>
> AFAICT, this is caused by DMA API merging pages together when map an
> sglist for DMA. While doing that, it does *not* move the SG_END flag
> which sg_is_last() checks.
>
> I consider that an overlook on the DMA API, wouldn't you? Why should DMA
> API users care if pages were merged or not while mapping the sglist? We
> have for_each_sg() and sg_is_last() for a reason.
>

From an initial look, I agree this is pretty confusing. dma_map_sg()
can coalesce entries in the sg list, modifying the sg entries
themselves; however, in doing so it doesn't modify the number of
entries in the sglist (nor the end state bit). That's pretty subtle!

My initial naive attempt to fix the dma-iommu path by setting the end
bit at the tail of __finalize_sg(), which does the sg entry
modifications, only caused trouble elsewhere: there's plenty of logic
that expects the number of entries not to change, so having sg_next()
return NULL before that point results in lots of null pointer traversals.

Further, looking at the history, while apparently quirky, this has
been the documented behavior in DMA-API.txt for almost 14 years
(at least). It clearly states that dma_map_sg() can return fewer
mapped entries than sg entries, and one should loop only that many
times (for_each_sg() having a max number of entries to iterate over
seems specifically for this purpose). Additionally, it says one must
preserve the original number of entries (not # mapped entries) for
dma_unmap_sg().

So I'm not sure that sg_is_last() is really valid for iterating on
mapped sg lists.
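
To make the mismatch concrete, here is a minimal userspace sketch in C.
It is a toy model, not kernel code: struct toy_sg, toy_map_sg() and the
two walker helpers are hypothetical names made up for illustration, and
the merge rule (merge when one entry ends exactly where the next begins)
is an assumption that only approximates what a real dma_map_sg()
implementation may do. It shows the mapped count shrinking while the end
marker stays on the original last entry.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a scatterlist entry; NOT the kernel's struct scatterlist. */
struct toy_sg {
	unsigned long addr;     /* original (pre-mapping) address */
	unsigned int  len;      /* original length */
	unsigned long dma_addr; /* filled in by toy_map_sg() */
	unsigned int  dma_len;  /* filled in by toy_map_sg() */
	int           is_last;  /* stands in for the end marker sg_is_last() checks */
};

/*
 * Hypothetical model of a coalescing map step: contiguous entries are
 * merged into the leading slots, the number of mapped entries is returned,
 * and is_last is deliberately left untouched on the original tail entry.
 */
static int toy_map_sg(struct toy_sg *sgl, int nents)
{
	int mapped = 0;

	assert(sgl != NULL && nents >= 0);

	for (int i = 0; i < nents; i++) {
		unsigned long addr = sgl[i].addr;
		unsigned int len = sgl[i].len;

		/* Clear the DMA fields; trailing slots stay zero-length. */
		sgl[i].dma_addr = 0;
		sgl[i].dma_len = 0;

		if (mapped > 0 &&
		    sgl[mapped - 1].dma_addr + sgl[mapped - 1].dma_len == addr) {
			/* Contiguous with the previous mapped segment: merge. */
			sgl[mapped - 1].dma_len += len;
		} else {
			/* Start a new mapped segment in the next leading slot. */
			sgl[mapped].dma_addr = addr;
			sgl[mapped].dma_len = len;
			mapped++;
		}
	}
	return mapped;
}

/* Walk until the end marker: visits every ORIGINAL entry, mapped or not. */
static int toy_walk_by_end_marker(const struct toy_sg *sgl)
{
	int visited = 0;

	for (;;) {
		visited++;
		if (sgl[visited - 1].is_last)
			break;
	}
	return visited;
}

/* Walk only as many entries as the map step reported: the documented pattern. */
static unsigned int toy_walk_by_count(const struct toy_sg *sgl, int mapped)
{
	unsigned int total = 0;

	for (int i = 0; i < mapped; i++)
		total += sgl[i].dma_len;
	return total;
}
```

With four 0x1000-byte entries at 0x1000, 0x2000, 0x3000 and 0x8000 (end
marker on the fourth), toy_map_sg() returns 2, toy_walk_by_count() sums
0x4000 bytes over those two entries, and toy_walk_by_end_marker() still
visits all 4 entries, the last two with a zero DMA length.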

Should it be? Probably (at least with my unfamiliar eyes), but
sg_is_last() has been around for almost as long coexisting with this
behavioral quirk, so I'm also not sure this is the best hill for the
dwc3 driver to die on. :)

The fix here:
  https://lore.kernel.org/lkml/20200122222645.38805-3-john.stultz@linaro.org/
Or maybe the slightly cleaner variant here:
  https://git.linaro.org/people/john.stultz/android-dev.git/commit/?h=dev/db845c-mainline-WIP&id=61a5816aa71ec719e77df9f2260dbbf6bcec7c99
seems like it would correctly address things following the
documentation and avoid the failures we're seeing.

As to whether dma_map_sg() should fix up the state bits or properly
shrink the sg list when it coalesces entries, that seems like it would
be a much more intrusive change across quite a bit of the kernel that
was written to follow the documented method. So my confidence that such
a change would make it upstream in a reasonable amount of time isn't
very high, and it seems like a bad idea to block the driver from
working properly in the meantime.

Pulling in Christoph and Jens as I suspect they have more context on
this, and maybe can explain that it's not so quirky with the right
perspective?

Thoughts? Maybe there is an easier way to make it less quirky than
what I imagine?

thanks
-john
