lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bd440f1d-5ea4-485e-9924-433997765adc@rowland.harvard.edu>
Date:   Wed, 26 Jul 2023 10:20:25 -0400
From:   Alan Stern <stern@...land.harvard.edu>
To:     liulongfang <liulongfang@...wei.com>
Cc:     gregkh@...uxfoundation.org, linux-usb@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] USB:bugfix a controller halt error

On Wed, Jul 26, 2023 at 02:58:18PM +0800, liulongfang wrote:
> On 2023/7/21 22:57, Alan Stern Wrote:
> > On Fri, Jul 21, 2023 at 06:00:15PM +0800, liulongfang wrote:
> >> On systems that use ECC memory. The ECC error of the memory will
> >> cause the USB controller to halt. It causes the usb_control_msg()
> >> operation to fail.
> > 
> > How often does this happen in real life?  (Besides, it seems to me that 
> > if your system is getting a bunch of ECC memory errors then you've got 
> > much worse problems than a simple USB failure!)
> >
> 
> This problem is on ECC memory platform.
> In the test scenario, the problem is 100% reproducible.
> 
> > And why do you worry about ECC memory failures in particular?  Can't 
> > _any_ kind of failure cause the usb_control_msg() operation to fail?
> > 
> >> At this point, the returned buffer data is an abnormal value, and
> >> continuing to use it will lead to incorrect results.
> > 
> > The driver already contains code to check for abnormal values.  The 
> > check is not perfect, but it should prevent things from going too 
> > badly wrong.
> >
> 
> If it is ECC memory error. These parameter checks would also
> actually be invalid.
> 
> >> Therefore, it is necessary to judge the return value and exit.
> >>
> >> Signed-off-by: liulongfang <liulongfang@...wei.com>
> > 
> > There is a flaw in your reasoning.
> > 
> > The operation carried out here is deliberately unsafe (for full-speed 
> > devices).  It is made before we know the actual maxpacket size for ep0, 
> > and as a result it might return an error code even when it works okay.  
> > This shouldn't happen, but a lot of USB hardware is unreliable.
> > 
> > Therefore we must not ignore the result merely because r < 0.  If we do 
> > that, the kernel might stop working with some devices.
> > 
> It may be that the handling solution for ECC errors is different from that
> of the OS platform. On the test platform, after usb_control_msg() fails,
> reading the memory data of buf will directly lead to kernel crash:

All right, then here's a proposal for a different way to solve the 
problem: Change the kernel's handler for the ECC error notification.  
Have it clear the affected parts of memory, so that the kernel can go 
ahead and use them without crashing.

It seems to me that something along these lines must be necessary in 
any case.  Unless the bad memory is cleared somehow, it would never be 
usable again.  The kernel might deallocate it, then reallocate for 
another purpose, and then crash when the new user tries to access it.  

In fact, this scenario could still happen even with your patch, which 
means the patch doesn't really fix the problem.

Alan Stern

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ