Message-ID: <1e8f53d5.2c21.198fa70f4f7.Coremail.00107082@163.com>
Date: Sat, 30 Aug 2025 18:06:03 +0800 (CST)
From: "David Wang" <00107082@....com>
To: Michał Pecio <michal.pecio@...il.com>
Cc: WeitaoWang-oc@...oxin.com, mathias.nyman@...ux.intel.com,
gregkh@...uxfoundation.org, linux-usb@...r.kernel.org,
regressions@...ts.linux.dev, linux-kernel@...r.kernel.org,
surenb@...gle.com, kent.overstreet@...ux.dev
Subject: Re: [REGRESSION 6.17-rc3] usb/xhci: possible memory leak after
suspend/resume cycle.
At 2025-08-30 17:48:28, "Michał Pecio" <michal.pecio@...il.com> wrote:
>On Sat, 30 Aug 2025 02:13:54 +0800, David Wang wrote:
>> Hi,
>>
>> I have been watching kernel memory usage for drivers for a while, via /proc/allocinfo.
>> After upgrading to 6.17-rc3, I noticed that memory usage behavior changed for USB drivers:
>>
>> Before rc3, after several suspend/resume cycles, the USB devices' memory usage was very stable:
>>
>> 40960 5 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 9
>> 1024 1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 2
>> 320 10 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 31
>> 1920 15 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 31
>> 112 12 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 32
>> 1792 28 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 59
>>
>> But with rc3, the memory usage increases after each suspend/resume cycle:
>>
>> #1:
>> 49152 6 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 9
>> 1024 1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 2
>> 384 12 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 32
>> 2176 17 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 32
>> 128 14 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 34
>> 2048 32 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 61
>> #2:
>> 57344 7 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 13
>> 1024 1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 3
>> 448 14 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 46
>> 2432 19 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 43
>> 144 16 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 44
>> 2304 36 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 82
>> #3:
>> 65536 8 drivers/usb/host/xhci-mem.c:980 [xhci_hcd] func:xhci_alloc_virt_device 17
>> 1024 1 drivers/usb/host/xhci-mem.c:841 [xhci_hcd] func:xhci_alloc_tt_info 4
>> 512 16 drivers/usb/host/xhci-mem.c:461 [xhci_hcd] func:xhci_alloc_container_ctx 60
>> 2688 21 drivers/usb/host/xhci-mem.c:377 [xhci_hcd] func:xhci_ring_alloc 54
>> 160 18 drivers/usb/host/xhci-mem.c:49 [xhci_hcd] func:xhci_segment_alloc 54
>> 2560 40 drivers/usb/host/xhci-mem.c:38 [xhci_hcd] func:xhci_segment_alloc 103
>>
>> The growth continues with every subsequent suspend/resume cycle; I am not
>> sure whether that memory would be released at some later point.
>>
>> And in the kernel log, two error lines always show up after suspend/resume:
>>
>> [ 295.613598] xhci_hcd 0000:03:00.0: dma_pool_destroy xHCI ring segments busy
>> [ 295.613605] xhci_hcd 0000:03:00.0: dma_pool_destroy xHCI input/output contexts busy
>
>Hi,
>
>Good work, looks like suspend/resume is a little undertested corner
>of this driver.
>
>Did you check whether the same leak occurs if you simply disconnect
>a device or if it's truly unique to suspend?
>
>> And bisecting narrowed it down to commit 2eb03376151bb8585caa23ed2673583107bb5193
>> ("usb: xhci: Fix slot_id resource race conflict"):
>
>I see a trivial bug which everyone (myself included tbh) missed before.
>Does this help?
>
>diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
>index f11e13f9cdb4..f294032c2ad7 100644
>--- a/drivers/usb/host/xhci-mem.c
>+++ b/drivers/usb/host/xhci-mem.c
>@@ -932,7 +932,7 @@ void xhci_free_virt_device(struct xhci_hcd *xhci, struct xhci_virt_device *dev,
> */
> static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_id)
> {
>- struct xhci_virt_device *vdev;
>+ struct xhci_virt_device *vdev, *tmp_vdev;
> struct list_head *tt_list_head;
> struct xhci_tt_bw_info *tt_info, *next;
> int i;
>@@ -952,8 +952,8 @@ static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_i
> if (tt_info->slot_id == slot_id) {
> /* are any devices using this tt_info? */
> for (i = 1; i < HCS_MAX_SLOTS(xhci->hcs_params1); i++) {
>- vdev = xhci->devs[i];
>- if (vdev && (vdev->tt_info == tt_info))
>+ tmp_vdev = xhci->devs[i];
>+ if (tmp_vdev && (tmp_vdev->tt_info == tt_info))
> xhci_free_virt_devices_depth_first(
> xhci, i);
> }
I noticed this too, just a few minutes ago, and have just started building a patch with this *silly* bug fixed.
(The device pointer is wrong, which is most likely the culprit: the virtual device is not properly freed, hence the memory leak.)
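To illustrate what I mean, here is a toy model of the pattern. It is only a sketch, not the driver code: `toy_dev`, `toy_free`, `MAX_SLOTS` and the `parent` field are made up for this example and stand in for the virtual device, its free routine, and the tt_info relationship. The buggy variant reuses one pointer for both the device being freed and the scan over all slots; the fixed variant uses a separate temporary, as in the diff above.

```c
#include <stddef.h>

#define MAX_SLOTS 4	/* hypothetical; slot 0 is unused, as in the loop above */

/* Hypothetical stand-in for a virtual device: a parent slot and a flag. */
struct toy_dev {
	int parent;	/* slot of the hub this device hangs off, 0 for none */
	int freed;	/* set once the device has been released */
};

static struct toy_dev *devs[MAX_SLOTS];

/* Release whatever pointer we are handed; NULL is silently ignored. */
static void toy_free(struct toy_dev *dev, int slot)
{
	if (!dev)
		return;
	dev->freed = 1;
	devs[slot] = NULL;
}

/* Buggy: the scan overwrites vdev, so the final free no longer gets devs[slot]. */
static void free_depth_first_buggy(int slot)
{
	struct toy_dev *vdev = devs[slot];
	int i;

	if (!vdev)
		return;
	for (i = 1; i < MAX_SLOTS; i++) {
		vdev = devs[i];		/* clobbers the pointer for 'slot' */
		if (vdev && vdev->parent == slot)
			free_depth_first_buggy(i);
	}
	toy_free(vdev, slot);		/* vdev is whatever the last slot held */
}

/* Fixed: a separate temporary keeps vdev pointing at devs[slot]. */
static void free_depth_first_fixed(int slot)
{
	struct toy_dev *vdev = devs[slot], *tmp_vdev;
	int i;

	if (!vdev)
		return;
	for (i = 1; i < MAX_SLOTS; i++) {
		tmp_vdev = devs[i];
		if (tmp_vdev && tmp_vdev->parent == slot)
			free_depth_first_fixed(i);
	}
	toy_free(vdev, slot);		/* children are gone, now free this device */
}
```

With a hub in slot 1, a child in slot 2 and slot 3 empty, the buggy variant ends each scan holding the NULL read from slot 3 and frees nothing, while the fixed variant frees the child first and then the hub.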
Will update later.
David