Message-ID: <aXnNcQa5Ooq2wIX2@oa-jiangdayu.localdomain>
Date: Wed, 28 Jan 2026 16:48:49 +0800
From: Dayu Jiang <jiangdayu@...omi.com>
To: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
CC: Mathias Nyman <mathias.nyman@...el.com>, Longfang Liu
	<liulongfang@...wei.com>, <linux-usb@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, yudongbin <yudongbin@...omi.com>, guhuinan
	<guhuinan@...omi.com>, chenyu45 <chenyu45@...omi.com>, mahongwei3
	<mahongwei3@...omi.com>
Subject: Re: [PATCH] usb: xhci: add xhci_halt() for HCE Handling

On Tue, Jan 27, 2026 at 12:22:40PM +0100, Greg Kroah-Hartman wrote:
> On Tue, Jan 27, 2026 at 07:04:22PM +0800, jiangdayu wrote:
> > When the xHCI controller reports a Host Controller Error (HCE) status
> > in the interrupt handler, the driver currently only logs a warning and
> > continues execution. However, a Host Controller Error indicates a
> > critical hardware failure that requires the controller to be halted.
> > 
> > Add xhci_halt(xhci) call after the HCE warning to properly halt the
> > controller when this error condition is detected. This ensures the
> > controller is in a consistent state and prevents further operations
> > on a failed hardware. Additionally, if there are still unhandled
> > interrupts at this point, it may cause interrupt storm.
> > 
> > The change is made in xhci_irq() function where STS_HCE status is
> > checked, mirroring the existing error handling pattern used for
> > STS_FATAL errors.
> > 
> > Fixes: 2a25e66d676df ("xhci: print warning when HCE was set")
> > Signed-off-by: jiangdayu <jiangdayu@...omi.com>
> 
> We need a full name, not an email alias, sorry.
Sorry for the confusion. I will use my full legal name (rather than the
email alias) in the Signed-off-by line of the revised patch.
> 
> And this isn't really "fixing" that commit, there's nothing wrong with
> it as-is.  This is adding new functionality to the code.
I initially used the Fixes tag because the original commit only logged
a warning for HCE and took no further action. That incomplete handling
risks an interrupt storm on the SoC, since the interrupt is never
cleared. That is the robustness gap this patch is meant to close.
> 
> > ---
> >  drivers/usb/host/xhci-ring.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> > index 9315ba18310d..1cbefee3c4ca 100644
> > --- a/drivers/usb/host/xhci-ring.c
> > +++ b/drivers/usb/host/xhci-ring.c
> > @@ -3195,6 +3195,7 @@ irqreturn_t xhci_irq(struct usb_hcd *hcd)
> >  
> >  	if (status & STS_HCE) {
> >  		xhci_warn(xhci, "WARNING: Host Controller Error\n");
> > +		xhci_halt(xhci);
> 
> What is going to start things back up again?  And as you are calling
> this function, why is the warning message needed anymore?  The
> tracepoint information will give you that message now, right?
When HCE is set, it indicates a critical hardware failure. Calling
xhci_halt() here, in line with the existing handling of HSE
(STS_FATAL), is the more reasonable behavior: without it, the USB
controller may be left in an unpredictable, unstable state, which
could make system issues worse.

The warning message should stay because it shows up directly in dmesg,
whereas tracepoint output requires explicitly enabling the xHCI
tracepoints. In addition, if xhci_irq() called xhci_halt() without the
warning, it would be impossible to tell from the log whether the halt
was triggered by HCE or HSE.
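
To illustrate the control flow described above, here is a minimal
user-space sketch. It is not kernel code and not the patch itself: the
struct and the helpers model_halt() and model_irq() are hypothetical
names made up for this example, and only the bit positions of STS_FATAL
and STS_HCE are meant to match drivers/usb/host/xhci.h. As a
simplification, the model returns right after halting in both branches,
whereas the patch only adds the xhci_halt() call.

```c
/*
 * User-space model only, NOT kernel code. model_xhci, model_halt() and
 * model_irq() are hypothetical names used for illustration.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define STS_FATAL (1u << 2)   /* Host System Error (HSE) */
#define STS_HCE   (1u << 12)  /* Host Controller Error (HCE) */

enum halt_reason { HALT_NONE, HALT_HSE, HALT_HCE };

struct model_xhci {
	uint32_t status;          /* mock USBSTS register value */
	bool halted;              /* set once the model controller is halted */
	enum halt_reason reason;  /* stands in for the line seen in dmesg */
};

/* Stand-in for xhci_halt(): just records that the controller stopped. */
static void model_halt(struct model_xhci *xhci)
{
	xhci->halted = true;
}

/*
 * Returns true if the interrupt was treated as a fatal controller error.
 * Each branch prints its own warning, so the log shows which condition
 * caused the halt.
 */
static bool model_irq(struct model_xhci *xhci)
{
	assert(xhci != NULL);

	if (xhci->status & STS_FATAL) {
		fprintf(stderr, "WARNING: Host System Error\n");
		xhci->reason = HALT_HSE;
		model_halt(xhci);
		return true;
	}

	if (xhci->status & STS_HCE) {
		fprintf(stderr, "WARNING: Host Controller Error\n");
		xhci->reason = HALT_HCE;
		model_halt(xhci);
		return true;
	}

	return false;
}
```

Both error paths end in the same halted state, so in this model the
per-branch warning (recorded here in the reason field) is the only
thing that tells HCE apart from HSE afterwards.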
> 
> And is this just papering over a hardware bug?  Should this really be
> happening for any normal system?
Yes, this issue is reproducible on real-world hardware: HCE is
triggered when plugging and unplugging UAS storage devices on Android
devices. The handler then enters this error branch and an interrupt
storm follows, leading to severe system-level faults.
> 
> thanks,
> 
> greg k-h
