lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 28 Apr 2016 09:51:37 +0300
From:	Maxim Uvarov <muvarov@...il.com>
To:	Bin Liu <b-liu@...com>, Maxim Uvarov <muvarov@...il.com>,
	linux-kernel@...r.kernel.org, linux-usb@...r.kernel.org,
	Greg KH <gregkh@...uxfoundation.org>,
	sergei.shtylyov@...entembedded.com
Subject: Re: [PATCHv2] musb_host: fix lockup on rxcsr_h_error

2016-04-28 0:28 GMT+03:00 Bin Liu <b-liu@...com>:
> Hi,
>
> On Wed, Apr 27, 2016 at 02:13:56PM -0500, Bin Liu wrote:
>> Hi,
>>
>> On Wed, Apr 27, 2016 at 09:26:10PM +0300, Maxim Uvarov wrote:
>> > 2016-04-27 18:46 GMT+03:00 Bin Liu <b-liu@...com>:
>> > > Hi,
>> > >
>> > > On Wed, Apr 27, 2016 at 09:51:58AM +0300, Max Uvarov wrote:
>> > >> Fix soft lockup when resetting remote device attached
>> > >> to usb host. Configuration:
>> > >> pppd -> musb hub -> usb-serial -> gsm modem
>> > >
>> > > I have heard a few reports similar to this symptom, but never been able
>> > > to reproduce it on my side.
>> > >
>> >
>> > Ok, I can reproduce it almost very easy.
>> >
>> > >> When gsm modem resets, musb rolls in incoming rx interrupts
>> > >> which does not give any time to other application as result
>> > >> it totally lock ups. Solution is to keep original logic for RXCSR_H_ERROR
>> > >
>> > > Have you looked where exact place in the interrupt routine the execution
>> > > has stuck in?
>> > >
>> >
>> > It does not stuck. It goes to that line which print proto error over
>> > and over again and
>> > nothing stops that. After some time kernel reports lockup. But
>> > actually it's not stuck,
>> > all cpu time was eaten by executing that handlers.
>> >
>> >
>> > >> and merge RXCSR_DATAERROR and RXCSR_H_ERROR branches to call same code
>> > >> for setting rx stall with MUSB_RXCSR_H_WZC_BITS.
>> > >
>> > > MUSB_RXCSR_H_WZC_BITS itself does not set rx stall, it just ensures
>> > > MUSB_RXCSR_H_RXSTALL not to be cleared. Please check its comment in
>> > > musb_regs.h.
>> > >
>> > >>
>> > >> Signed-off-by: Max Uvarov <muvarov@...il.com>
>> > >> ---
>> > >>  v2: use bitwise or for error flags before logical and. (Sergei Shtylyov).
>> > >>
>> > >>  drivers/usb/musb/musb_host.c | 12 +++++-------
>> > >>  1 file changed, 5 insertions(+), 7 deletions(-)
>> > >>
>> > >> diff --git a/drivers/usb/musb/musb_host.c b/drivers/usb/musb/musb_host.c
>> > >> index c3d5fc9..2d9aa78 100644
>> > >> --- a/drivers/usb/musb/musb_host.c
>> > >> +++ b/drivers/usb/musb/musb_host.c
>> > >> @@ -1592,14 +1592,12 @@ void musb_host_rx(struct musb *musb, u8 epnum)
>> > >
>> > > What kernel do you use? This line # is away off from upstream kernel.
>> > >
>> >
>> > I did this patch for 4.1 but 4.6 has the same problem and patch
>> > cleanly applies to the latest torvalds/linux.git v4.6-rc5. This
>> > interrupt handler has the same code.  And looks like on 3.14
>>
>> Yeah, this code hasn't been chaned for year. But in general, it is
>> prepfered to create patches on latest kernel to avoid other headache.
>>
>> > everything worked. I don't have a time to diff 2 versions. Might be
>> > regression.
>> >
>> >
>> > >>
>> > >>               /* stall; record URB status */
>> > >>               status = -EPIPE;
>> > >> +     } else if (rx_csr & (MUSB_RXCSR_DATAERROR | MUSB_RXCSR_H_ERROR)) {
>> > >>
>> > >> -     } else if (rx_csr & MUSB_RXCSR_H_ERROR) {
>> > >> -             dev_dbg(musb->controller, "end %d RX proto error\n", epnum);
>> > >> -
>> > >> -             status = -EPROTO;
>> > >> -             musb_writeb(epio, MUSB_RXINTERVAL, 0);
>> > >> -
>> > >> -     } else if (rx_csr & MUSB_RXCSR_DATAERROR) {
>> > >> +             if (rx_csr & MUSB_RXCSR_H_ERROR) {
>> > >> +                     status = -EPROTO;
>> > >> +                     musb_writeb(epio, MUSB_RXINTERVAL, 0);
>> > >> +             }
>> > >
>> > > Please help me to understand how this change fixes the issue. I see the
>> > > most effect of the change here is directly 'goto finish' so that 'done'
>> > > flag is not set, then musb_advance_schedule() is not called. Is this the
>> > > case or I missed other important pieces?
>> > >
>> >
>> > Right that is the goal. On this rxcsr_h_error kernel reschedules
>> > current interrupt.  And that continues forever. For example adding
>>
>> The MUSB Programming Guide says CPU should clear this MUSB_RXCSR_H_ERROR
>> bit, but the current driver doesn't. I am wondering if this causes the
>> controller keeps generating the same interrupt. Can you please try the
>> following change instead to see if the lockup goes away?
>>
>> @@ -1870,6 +1870,9 @@ void musb_host_rx(struct musb *musb, u8 epnum)
>>                 status = -EPROTO;
>>                 musb_writeb(epio, MUSB_RXINTERVAL, 0);
>>
>> +               rx_csr &= ~MUSB_RXCSR_H_ERROR;
>> +               musb_writew(epio, MUSB_RXCSR, rx_csr);
>
> +               goto finish;
>
> Please also add the line above. I will spend more time to understand
> what is happening...
>

Hello Bin,

yes, it also works with that reset and go to finish:

diff --git a/drivers/usb/musb/musb_host.c b/drivers/usb/musb/musb_host.c
index c3d5fc9..8cd98e7 100644
--- a/drivers/usb/musb/musb_host.c
+++ b/drivers/usb/musb/musb_host.c
@@ -1599,6 +1599,10 @@ void musb_host_rx(struct musb *musb, u8 epnum)
                status = -EPROTO;
                musb_writeb(epio, MUSB_RXINTERVAL, 0);

+               rx_csr &= ~MUSB_RXCSR_H_ERROR;
+               musb_writew(epio, MUSB_RXCSR, rx_csr);
+
+               goto finish;
        } else if (rx_csr & MUSB_RXCSR_DATAERROR) {

                if (USB_ENDPOINT_XFER_ISOC != qh->type) {


That I think a key thing, which is done in other error. If that change
is good for you than I'm also happy with it.

I also not sure if musb_writeb(epio, MUSB_RXINTERVAL, 0); is needed.
In my case it's the same result with it and without it.
In other scenarios might be reasonable...


> First of all, I don't like the idea of merging the two branches, it
> makes the code ugly.

Yes, I don't like that function at all, it's too long and difficult to
read if you first look on it first time. It will be good to split it
on 3 small functions for each big if.

Maxim.

>
> Regards,
> -Bin.
>
>> +
>>         } else if (rx_csr & MUSB_RXCSR_DATAERROR) {
>>
>>                 if (USB_ENDPOINT_XFER_ISOC != qh->type) {
>>
>> Regards,
>> -Bin.
>>
>> > msleep() can give some time for other processes. I'm not an expert in
>> > this chip but I think that right solution in that case is not try to
>> > reschedule and quick and allow hub to make reset and once again init
>> > all devices (in my case ppp/pppd also shutdowns and then I bring
>> > everything up with script.). The same behavior with dma and pio mode.
>> >
>> > Regards,
>> > Max.
>> >
>> > > Thanks,
>> > > -Bin.
>> > >
>> > >>
>> > >>               if (USB_ENDPOINT_XFER_ISOC != qh->type) {
>> > >>                       dev_dbg(musb->controller, "RX end %d NAK timeout\n", epnum);
>> > >> --
>> > >> 1.9.1
>> > >>
>> > >> --
>> > >> To unsubscribe from this list: send the line "unsubscribe linux-usb" in
>> > >> the body of a message to majordomo@...r.kernel.org
>> > >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >
>> >
>> >
>> > --
>> > Best regards,
>> > Maxim Uvarov



-- 
Best regards,
Maxim Uvarov

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ