lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMTwNXB0QLP-b=RmLPtRJo=T_efN_3H4dd5AiMNYrJDXddJkMA@mail.gmail.com>
Date: Mon, 19 May 2025 23:43:21 +0530
From: Udipto Goswami <udipto.goswami@....qualcomm.com>
To: Mathias Nyman <mathias.nyman@...ux.intel.com>
Cc: Roy Luo <royluo@...gle.com>, mathias.nyman@...el.com,
        quic_ugoswami@...cinc.com, Thinh.Nguyen@...opsys.com,
        gregkh@...uxfoundation.org, michal.pecio@...il.com,
        linux-usb@...r.kernel.org, linux-kernel@...r.kernel.org,
        stable@...r.kernel.org
Subject: Re: [PATCH v1] Revert "usb: xhci: Implement xhci_handshake_check_state()
 helper"

On Mon, May 19, 2025 at 6:23 PM Mathias Nyman
<mathias.nyman@...ux.intel.com> wrote:
>
> On 17.5.2025 7.39, Roy Luo wrote:
> > This reverts commit 6ccb83d6c4972ebe6ae49de5eba051de3638362c.
> >
> > Commit 6ccb83d6c497 ("usb: xhci: Implement xhci_handshake_check_state()
> > helper") was introduced to workaround watchdog timeout issues on some
> > platforms, allowing xhci_reset() to bail out early without waiting
> > for the reset to complete.
> >
> > Skipping the xhci handshake during a reset is a dangerous move. The
> > xhci specification explicitly states that certain registers cannot
> > be accessed during reset in section 5.4.1 USB Command Register (USBCMD),
> > Host Controller Reset (HCRST) field:
> > "This bit is cleared to '0' by the Host Controller when the reset
> > process is complete. Software cannot terminate the reset process
> > early by writinga '0' to this bit and shall not write any xHC
> > Operational or Runtime registers until while HCRST is '1'."
> >
> > This behavior causes a regression on SNPS DWC3 USB controller with
> > dual-role capability. When the DWC3 controller exits host mode and
> > removes xhci while a reset is still in progress, and then tries to
> > configure its hardware for device mode, the ongoing reset leads to
> > register access issues; specifically, all register reads returns 0.
> > These issues extend beyond the xhci register space (which is expected
> > during a reset) and affect the entire DWC3 IP block, causing the DWC3
> > device mode to malfunction.
>
> I agree with you and Thinh that waiting for the HCRST bit to clear during
> reset is the right thing to do, especially now when we know skipping it
> causes issues for SNPS DWC3, even if it's only during remove phase.
>
> But reverting this patch will re-introduce the issue originally worked
> around by Udipto Goswami, causing regression.
>
> Best thing to do would be to wait for HCRST to clear for all other platforms
> except the one with the issue.
>
> Udipto Goswami, can you recall the platforms that needed this workaroud?
> and do we have an easy way to detect those?

Hi Mathias,

>From what I recall, we saw this issue coming up on our QCOM mobile
platforms but it was not consistent. It was only reported in long runs
i believe. The most recent instance when I pushed this patch was with
platform SM8650, it was a watchdog timeout issue where xhci_reset() ->
xhci_handshake() polling read timeout upon xhci remove. Unfortunately
I was not able to simulate the scenario for more granular testing and
had validated it with long hours stress testing.
The callstack was like so:

Full call stack on core6:
-000|readl([X19] addr = 0xFFFFFFC03CC08020)
-001|xhci_handshake(inline)
-001|xhci_reset([X19] xhci = 0xFFFFFF8942052250, [X20] timeout_us = 10000000)
-002|xhci_resume([X20] xhci = 0xFFFFFF8942052250, [?] hibernated = ?)
-003|xhci_plat_runtime_resume([locdesc] dev = ?)
-004|pm_generic_runtime_resume([locdesc] dev = ?)
-005|__rpm_callback([X23] cb = 0xFFFFFFE3F09307D8, [X22] dev =
0xFFFFFF890F619C10)
-006|rpm_callback(inline)
-006|rpm_resume([X19] dev = 0xFFFFFF890F619C10,
[NSD:0xFFFFFFC041453AD4] rpmflags = 4)
-007|__pm_runtime_resume([X20] dev = 0xFFFFFF890F619C10, [X19] rpmflags = 4)
-008|pm_runtime_get_sync(inline)
-008|xhci_plat_remove([X20] dev = 0xFFFFFF890F619C00)
-009|platform_remove([X19] _dev = 0xFFFFFF890F619C10)
-010|device_remove(inline)
-010|__device_release_driver(inline)
-010|device_release_driver_internal([X20] dev = 0xFFFFFF890F619C10,
[?] drv = ?, [X19] parent = 0x0)
-011|device_release_driver(inline)
-011|bus_remove_device([X19] dev = 0xFFFFFF890F619C10)
-012|device_del([X20] dev = 0xFFFFFF890F619C10)
-013|platform_device_del(inline)
-013|platform_device_unregister([X19] pdev = 0xFFFFFF890F619C00)
-014|dwc3_host_exit(inline)
-014|__dwc3_set_mode([X20] work = 0xFFFFFF886072F840)
-015|process_one_work([X19] worker = 0xFFFFFF887AEE46C0, [X21] work =
0xFFFFFF886072F840)
-016|worker_thread([X19] __worker = 0xFFFFFF887AEE46C0)
-017|kthread([X22] _create = 0xFFFFFF8937350600)
-018|ret_from_fork(asm)
---|end of frame

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ