Message-ID: <8a52e868-0ca1-55b7-5ad2-ddb0cbb5e45d@redhat.com>
Date: Fri, 27 Nov 2020 13:32:16 +0100
From: Hans de Goede <hdegoede@...hat.com>
To: Christoph Hellwig <hch@....de>, Tom Yan <tom.ty89@...il.com>
Cc: Mathias Nyman <mathias.nyman@...el.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
linux-usb <linux-usb@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
linux-pci@...r.kernel.org
Subject: Re: 5.10 regression caused by: "uas: fix sdev->host->dma_dev": many
XHCI swiotlb buffer is full / DMAR: Device bounce map failed errors on
thunderbolt connected XHCI controller

Hi,

On 11/27/20 12:41 PM, Hans de Goede wrote:
> Hi,
>
> On 11/24/20 11:27 AM, Christoph Hellwig wrote:
>> On Mon, Nov 23, 2020 at 03:49:09PM +0100, Hans de Goede wrote:
>>> Hi,
>>>
>>> +Cc Christoph Hellwig <hch@....de>
>>>
>>> Christoph, this is still an issue, so I've been looking around a bit and think this
>>> might have something to do with the dma-mapping-5.10 changes.
>>>
>>> Do you have any suggestions to debug this, or is it time to do a git bisect
> >>> on this before 5.10 ships with this regression?
>>
> >> Given the DMAR prefix, this seems to be about using intel-iommu + bounce
>> buffering for external devices. I can't really think of anything specific
>> in 5.10 related to that, so maybe you'll need to bisect.
>>
> >> I doubt this means we are actually leaking swiotlb buffers, so while
>> I'm pretty sure we broke something in lower layers this also means
>> xhci doesn't handle swiotlb operation very gracefully in general.
>
> I've done a git bisect, and the result is somewhat surprising. The git-bisect
> points to:
>
> commit 558033c2828f ("uas: fix sdev->host->dma_dev")
>
> Use scsi_add_host_with_dma() instead of scsi_add_host().
>
> When the scsi request queue is initialized/allocated, hw_max_sectors is clamped
> to the dma max mapping size. Therefore, the correct device that should be used
> for the clamping needs to be set.
>
> The same clamping is still needed in uas as hw_max_sectors could be changed
> there. The original clamping would be invalidated in such cases.
>
> I do have a UAS drive connected to the thunderbolt-dock, so I guess that this
> change is causing the UAS driver to gobble up all available swiotlb space.

I ran some more tests and can confirm that reverting:

5df7ef7d32fe "uas: bump hw_max_sectors to 2048 blocks for SS or faster drives"
558033c2828f "uas: fix sdev->host->dma_dev"

makes the problem go away while running a 5.10 kernel. I also tried doubling
the swiotlb size by adding swiotlb=65536 to the kernel command line, but that
does not help.
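
For reference, a minimal sketch of the arithmetic behind "doubling": the
swiotlb= parameter counts slabs, not bytes. The 2 KiB slab size (IO_TLB_SHIFT
of 11) and the 64 MiB default are assumptions taken from the 5.10-era kernel
sources as far as I know; the helper name is made up for illustration.

```c
#include <stddef.h>

/* Assumption: one swiotlb slab is 2 KiB (IO_TLB_SHIFT == 11). */
#define SLAB_SHIFT 11

/* Hypothetical helper: convert a swiotlb=<nslabs> value to bytes. */
static unsigned long long swiotlb_bytes(unsigned long long nslabs)
{
	return nslabs << SLAB_SHIFT;
}
```

Under these assumptions swiotlb_bytes(32768) is the 64 MiB default and
swiotlb_bytes(65536) is 128 MiB, so swiotlb=65536 is indeed a doubling.
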
Some more observations:

1. The usb-storage driver does not cause this issue, even though it has a
   very similar change.
2. The problem does not happen until I plug a UAS device into the dock.
3. The problem continues to happen even after I unplug the UAS device and
   rmmod the uas module.

Observation 3 made me take a closer look at the troublesome commit. It passes
udev->bus->sysdev, which I assume is the XHCI controller itself, as the device
to scsi_add_host_with_dma(), which in turn seems to cause permanent changes
to the DMA settings of the XHCI controller. I'm not all that familiar with
the DMA APIs, but I'm getting the feeling that passing the actual XHCI
controller's device as dma-device to scsi_add_host_with_dma() is simply the
wrong thing to do, and that the intended effects (honor the XHCI DMA limits,
but do not cause any changes to the XHCI DMA settings) should be achieved
differently.

Note that if this is indeed wrong, the matching usb-storage change should
likely also be dropped.
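
To illustrate the clamping that the quoted commit message describes, here is a
minimal userspace sketch, not kernel code. The struct and function names are
hypothetical stand-ins; in the kernel the limit would come from
dma_max_mapping_size(). The point is only that the limit can be read and
applied without modifying the device it belongs to.

```c
#include <stddef.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors */

/* Hypothetical stand-in for a device's DMA limits; not a kernel structure. */
struct dma_limits {
	size_t max_mapping_size;	/* largest single DMA mapping, in bytes */
};

/*
 * Clamp a requested hw_max_sectors to what one DMA mapping can cover.
 * The limits are taken by const pointer: they are only read, never changed.
 */
static unsigned int clamp_max_sectors(unsigned int hw_max_sectors,
				      const struct dma_limits *lim)
{
	size_t dma_sectors = lim->max_mapping_size >> SECTOR_SHIFT;

	return dma_sectors < hw_max_sectors ? (unsigned int)dma_sectors
					    : hw_max_sectors;
}
```

With an assumed 256 KiB mapping limit, a request for 2048 sectors is clamped
to 512; with a limit larger than 1 MiB the 2048 sectors pass through unchanged.
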
Regards,
Hans