Message-ID: <b3d53ea7-b78f-3415-b0ed-7ddf292ec7fc@linux.intel.com>
Date: Thu, 23 Mar 2023 13:01:19 +0200
From: Mathias Nyman <mathias.nyman@...ux.intel.com>
To: Hongyu Xie <xiehongyu1@...inos.cn>, mathias.nyman@...el.com,
gregkh@...uxfoundation.org
Cc: linux-usb@...r.kernel.org, linux-kernel@...r.kernel.org,
sunke <sunke@...inos.cn>
Subject: Re: [PATCH -next] usb: xhci: do not free an empty cmd ring
On 20.3.2023 6.22, Hongyu Xie wrote:
> It was first found on a HUAWEI Kirin 9006C platform with a built-in xhci
> controller during stress cycle testing (stress-ng, glmark2, x11perf, S4...).
>
> phase one:
> [26788.706878] PM: dpm_run_callback(): platform_pm_thaw+0x0/0x68 returns -12
> [26788.706878] PM: Device xhci-hcd.1.auto failed to thaw async: error -12
> ...
> phase two:
> [28650.583496] [2023:01:19 04:43:29]Unable to handle kernel NULL pointer dereference at virtual address 0000000000000028
> ...
> [28650.583526] user pgtable: 4k pages, 39-bit VAs, pgdp=000000027862a000
> [28650.583557] [0000000000000028] pgd=0000000000000000
> ...
> [28650.583587] pc : xhci_suspend+0x154/0x5b0
> [28650.583618] lr : xhci_suspend+0x148/0x5b0
> [28650.583618] sp : ffffffc01c7ebbd0
> [28650.583618] x29: ffffffc01c7ebbd0 x28: ffffffec834d0000
> [28650.583618] x27: ffffffc0106a3cc8 x26: ffffffb2c540c848
> [28650.583618] x25: 0000000000000000 x24: ffffffec82ee30b0
> [28650.583618] x23: ffffffb43b31c2f8 x22: 0000000000000000
> [28650.583618] x21: 0000000000000000 x20: ffffffb43b31c000
> [28650.583648] x19: ffffffb43b31c2a8 x18: 0000000000000001
> [28650.583648] x17: 0000000000000803 x16: 00000000fffffffe
> [28650.583648] x15: 0000000000001000 x14: ffffffb150b67e00
> [28650.583648] x13: 00000000f0000000 x12: 0000000000000001
> [28650.583648] x11: 0000000000000000 x10: 0000000000000a80
> [28650.583648] x9 : ffffffc01c7eba00 x8 : ffffffb43ad10ae0
> [28650.583648] x7 : ffffffb84cd98dc0 x6 : 0000000cceb6a101
> [28650.583679] x5 : 00ffffffffffffff x4 : 0000000000000001
> [28650.583679] x3 : 0000000000000011 x2 : 0000000000e2cfa8
> [28650.583679] x1 : 00000000823535e1 x0 : 0000000000000000
>
> gdb:
> (gdb) l *(xhci_suspend+0x154)
> 0xffffffc010b6cd44 is in xhci_suspend (/.../drivers/usb/host/xhci.c:854).
> 849 {
> 850 struct xhci_ring *ring;
> 851 struct xhci_segment *seg;
> 852
> 853 ring = xhci->cmd_ring;
> 854 seg = ring->deq_seg;
> (gdb) disassemble 0xffffffc010b6cd44
> ...
> 0xffffffc010b6cd40 <+336>: ldr x22, [x19, #160]
> 0xffffffc010b6cd44 <+340>: ldr x20, [x22, #40]
> 0xffffffc010b6cd48 <+344>: mov w1, #0x0 // #0
>
> During phase one, platform_pm_thaw called xhci_plat_resume, which called
> xhci_resume. The rest of the call path was probably
> xhci_resume->xhci_init->xhci_mem_init, and xhci->cmd_ring was freed in
> xhci_mem_cleanup before xhci_mem_init returned -ENOMEM.
>
Thanks for reporting this.
xhci_mem_init() failing with -ENOMEM doesn't sound good.
Are we really running out of memory? Does kmemleak say anything?
Any chance you could look into where exactly xhci_mem_init() fails?
It always returns -ENOMEM on failure, so the error code alone doesn't
tell us which step went wrong.
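To illustrate why the return value alone can't answer that question, here is a
rough userspace sketch, not the driver code. The step names, model_mem_init()
and the fail_at parameter are all made up for illustration. Every step shares
the same -ENOMEM exit, so only a per-step message or an out-parameter
identifies the failing allocation.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Illustrative allocation steps; the real xhci_mem_init() has more. */
enum model_step {
    STEP_DCBAA,
    STEP_CMD_RING,
    STEP_EVENT_RING,
    STEP_COUNT          /* also used as "no simulated failure" */
};

static const char *const model_step_names[STEP_COUNT] = {
    "DCBAA",
    "command ring",
    "event ring",
};

/*
 * Hypothetical model: fail_at selects which step to fail on purpose.
 * Every failure returns the same -ENOMEM, so the step name is reported
 * separately through a log line and through *failed_step.
 */
static int model_mem_init(enum model_step fail_at, const char **failed_step)
{
    for (int s = 0; s < STEP_COUNT; s++) {
        if (s == (int)fail_at) {
            fprintf(stderr, "mem init: %s allocation failed\n",
                    model_step_names[s]);
            if (failed_step)
                *failed_step = model_step_names[s];
            return -ENOMEM;
        }
    }
    if (failed_step)
        *failed_step = NULL;
    return 0;
}
```

In the driver itself the equivalent would be a debug print at each failure
site before the jump to the common cleanup path.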
> During phase two, systemd was trying to hibernate again and called
> xhci_suspend, then xhci_clear_command_ring dereferenced xhci->cmd_ring,
> which was already NULL.
>
> So if xhci->cmd_ring is NULL, xhci_clear_command_ring should just return.
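For reference, the guard described above would look roughly like the sketch
below. It is a userspace model under stated assumptions: the model_* structs
are simplified stand-ins for the kernel structures, and the function returns
an int only so the two paths can be told apart. The real
xhci_clear_command_ring() returns void.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the kernel structures; not the real layouts. */
struct model_segment {
    unsigned char trbs[64];
    struct model_segment *next;     /* segments form a circular list */
};

struct model_ring {
    struct model_segment *first_seg;
    struct model_segment *deq_seg;
};

struct model_xhci {
    struct model_ring *cmd_ring;    /* NULL after a failed memory init */
};

/* Returns 0 if the ring was cleared, -1 if there was no ring to clear. */
static int model_clear_command_ring(struct model_xhci *xhci)
{
    struct model_ring *ring = xhci->cmd_ring;
    struct model_segment *seg;

    if (!ring)                      /* the guard proposed in the patch */
        return -1;

    /* Without the guard, this load is where a NULL ring would fault. */
    seg = ring->deq_seg;
    do {
        memset(seg->trbs, 0, sizeof(seg->trbs));
        seg = seg->next;
    } while (seg != ring->deq_seg);

    return 0;
}
```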
If xhci_mem_init() failed, then the xhci driver is completely unusable.
It shouldn't be used at all after this.
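As a rough illustration of that idea, and not a proposal for the actual fix,
the sketch below records the failure once and refuses later suspend attempts
up front, rather than NULL-checking in every helper. The unusable flag, the
model_* names and the -ENODEV return are assumptions made for this example.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct model_host {
    unsigned char storage[64];  /* backing memory for the modelled ring */
    unsigned char *cmd_ring;    /* stand-in for xhci->cmd_ring */
    bool unusable;              /* hypothetical flag: memory init failed */
};

/* alloc_ok simulates whether the memory init during resume succeeds. */
static int model_resume(struct model_host *host, bool alloc_ok)
{
    if (!alloc_ok) {
        host->cmd_ring = NULL;  /* cleanup drops the ring ... */
        host->unusable = true;  /* ... and the host is marked dead */
        return -ENOMEM;
    }
    host->cmd_ring = host->storage;
    host->unusable = false;
    return 0;
}

static int model_suspend(struct model_host *host)
{
    if (host->unusable)         /* never touch a host whose init failed */
        return -ENODEV;

    memset(host->cmd_ring, 0, sizeof(host->storage));
    return 0;
}
```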
Thanks
Mathias