lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJGZr0JNgYccDdsPLPWGyY3ASnN4PBxfJuhxeEEWk87T0qh2fg@mail.gmail.com>
Date:	Wed, 27 Apr 2016 21:26:10 +0300
From:	Maxim Uvarov <muvarov@...il.com>
To:	Bin Liu <b-liu@...com>, Max Uvarov <muvarov@...il.com>,
	linux-kernel@...r.kernel.org, linux-usb@...r.kernel.org,
	Greg KH <gregkh@...uxfoundation.org>,
	sergei.shtylyov@...entembedded.com
Subject: Re: [PATCHv2] musb_host: fix lockup on rxcsr_h_error

2016-04-27 18:46 GMT+03:00 Bin Liu <b-liu@...com>:
> Hi,
>
> On Wed, Apr 27, 2016 at 09:51:58AM +0300, Max Uvarov wrote:
>> Fix soft lockup when resetting remote device attached
>> to usb host. Configuration:
>> pppd -> musb hub -> usb-serial -> gsm modem
>
> I have heard a few reports similar to this symptom, but never been able
> to reproduce it on my side.
>

Ok, I can reproduce it almost very easy.

>> When gsm modem resets, musb rolls in incoming rx interrupts
>> which does not give any time to other application as result
>> it totally lock ups. Solution is to keep original logic for RXCSR_H_ERROR
>
> Have you looked where exact place in the interrupt routine the execution
> has stuck in?
>

It does not stuck. It goes to that line which print proto error over
and over again and
nothing stops that. After some time kernel reports lockup. But
actually it's not stuck,
all cpu time was eaten by executing that handlers.


>> and merge RXCSR_DATAERROR and RXCSR_H_ERROR branches to call same code
>> for setting rx stall with MUSB_RXCSR_H_WZC_BITS.
>
> MUSB_RXCSR_H_WZC_BITS itself does not set rx stall, it just ensures
> MUSB_RXCSR_H_RXSTALL not to be cleared. Please check its comment in
> musb_regs.h.
>
>>
>> Signed-off-by: Max Uvarov <muvarov@...il.com>
>> ---
>>  v2: use bitwise or for error flags before logical and. (Sergei Shtylyov).
>>
>>  drivers/usb/musb/musb_host.c | 12 +++++-------
>>  1 file changed, 5 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/usb/musb/musb_host.c b/drivers/usb/musb/musb_host.c
>> index c3d5fc9..2d9aa78 100644
>> --- a/drivers/usb/musb/musb_host.c
>> +++ b/drivers/usb/musb/musb_host.c
>> @@ -1592,14 +1592,12 @@ void musb_host_rx(struct musb *musb, u8 epnum)
>
> What kernel do you use? This line # is away off from upstream kernel.
>

I did this patch for 4.1 but 4.6 has the same problem and patch
cleanly applies to the
latest torvalds/linux.git v4.6-rc5. This interrupt handler has the
same code.  And looks
like on 3.14 everything worked. I don't have a time to diff 2
versions. Might be regression.


>>
>>               /* stall; record URB status */
>>               status = -EPIPE;
>> +     } else if (rx_csr & (MUSB_RXCSR_DATAERROR | MUSB_RXCSR_H_ERROR)) {
>>
>> -     } else if (rx_csr & MUSB_RXCSR_H_ERROR) {
>> -             dev_dbg(musb->controller, "end %d RX proto error\n", epnum);
>> -
>> -             status = -EPROTO;
>> -             musb_writeb(epio, MUSB_RXINTERVAL, 0);
>> -
>> -     } else if (rx_csr & MUSB_RXCSR_DATAERROR) {
>> +             if (rx_csr & MUSB_RXCSR_H_ERROR) {
>> +                     status = -EPROTO;
>> +                     musb_writeb(epio, MUSB_RXINTERVAL, 0);
>> +             }
>
> Please help me to understand how this change fixes the issue. I see the
> most effect of the change here is directly 'goto finish' so that 'done'
> flag is not set, then musb_advance_schedule() is not called. Is this the
> case or I missed other important pieces?
>

Right that is the goal. On this rxcsr_h_error kernel reschedules
current interrupt.
And that continues forever. For example adding msleep() can give some
time for other
processes. I'm not an expert in this chip but I think that right
solution in that case is not
try to reschedule and quick and allow hub to make reset and once again
init all devices
(in my case ppp/pppd also shutdowns and then I bring everything up
with script.). The
same behavior with dma and pio mode.

Regards,
Max.

> Thanks,
> -Bin.
>
>>
>>               if (USB_ENDPOINT_XFER_ISOC != qh->type) {
>>                       dev_dbg(musb->controller, "RX end %d NAK timeout\n", epnum);
>> --
>> 1.9.1
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-usb" in
>> the body of a message to majordomo@...r.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Best regards,
Maxim Uvarov

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ