netdev - Re: [PATCH] usbnet: fix kernel crash after disconnect

Message-ID: <1555569464.7835.4.camel@suse.com>
Date:   Thu, 18 Apr 2019 08:37:44 +0200
From:   Oliver Neukum <oneukum@...e.com>
To:     Kloetzke Jan <Jan.Kloetzke@...h.de>
Cc:     "linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [PATCH] usbnet: fix kernel crash after disconnect

On Wed, 2019-04-17 at 09:19 +0000, Kloetzke Jan wrote:
> When disconnecting cdc_ncm the kernel sporadically crashes shortly
> after the disconnect:
> 
>   [   57.868812] Unable to handle kernel NULL pointer dereference at virtual address 00000000
>   ...
>   [   58.006653] PC is at 0x0
>   [   58.009202] LR is at call_timer_fn+0xec/0x1b4
>   [   58.013567] pc : [<0000000000000000>] lr : [<ffffff80080f5130>] pstate: 00000145
>   [   58.020976] sp : ffffff8008003da0
>   [   58.024295] x29: ffffff8008003da0 x28: 0000000000000001
>   [   58.029618] x27: 000000000000000a x26: 0000000000000100
>   [   58.034941] x25: 0000000000000000 x24: ffffff8008003e68
>   [   58.040263] x23: 0000000000000000 x22: 0000000000000000
>   [   58.045587] x21: 0000000000000000 x20: ffffffc68fac1808
>   [   58.050910] x19: 0000000000000100 x18: 0000000000000000
>   [   58.056232] x17: 0000007f885aff8c x16: 0000007f883a9f10
>   [   58.061556] x15: 0000000000000001 x14: 000000000000006e
>   [   58.066878] x13: 0000000000000000 x12: 00000000000000ba
>   [   58.072201] x11: ffffffc69ff1db30 x10: 0000000000000020
>   [   58.077524] x9 : 8000100008001000 x8 : 0000000000000001
>   [   58.082847] x7 : 0000000000000800 x6 : ffffff8008003e70
>   [   58.088169] x5 : ffffffc69ff17a28 x4 : 00000000ffff138b
>   [   58.093492] x3 : 0000000000000000 x2 : 0000000000000000
>   [   58.098814] x1 : 0000000000000000 x0 : 0000000000000000
>   ...
>   [   58.205800] [<          (null)>]           (null)
>   [   58.210521] [<ffffff80080f5298>] expire_timers+0xa0/0x14c
>   [   58.215937] [<ffffff80080f542c>] run_timer_softirq+0xe8/0x128
>   [   58.221702] [<ffffff8008081120>] __do_softirq+0x298/0x348
>   [   58.227118] [<ffffff80080a6304>] irq_exit+0x74/0xbc
>   [   58.232009] [<ffffff80080e17dc>] __handle_domain_irq+0x78/0xac
>   [   58.237857] [<ffffff8008080cf4>] gic_handle_irq+0x80/0xac
>   ...
> 
> The crash happens roughly 125..130ms after the disconnect. This
> correlates with the 'delay' timer that is started on certain USB tx/rx
> errors in the URB completion handler.
> 
> The suspected problem is a race between usbnet_stop() and
> usbnet_start_xmit(). In usbnet_stop() we call usbnet_terminate_urbs()
> to cancel all URBs in flight. This only makes sense if no new URBs are
> submitted concurrently, though. But usbnet_start_xmit() can run at
> the same time on another CPU, and it submits a URB almost
> unconditionally. The error callback of the new URB will then arm
> the timer after it has already been stopped.
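The race described above can be replayed deterministically in a small userspace model. This is a sketch under stated assumptions, not usbnet code: `struct model_dev`, `model_start_xmit()`, `model_stop()`, `model_complete_with_error()` and `replay_race()` are hypothetical names, and the model is single-threaded, stepping through the problematic interleaving by hand. It assumes one possible way to close the window: the stop path marks transmission as stopped before cancelling URBs, and the xmit path checks that mark before submitting. For context, the 125..130 ms delay is consistent with usbnet's throttle interval, `THROTTLE_JIFFIES`, which drivers/net/usb/usbnet.c defines as `HZ/8` (roughly 125 ms).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of the state involved in the race.
 * None of these names exist in usbnet; they only mirror its roles. */
struct model_dev {
    bool tx_stopped;        /* set by the stop path before cancelling URBs */
    int  urbs_in_flight;    /* URBs submitted but not yet completed */
    bool delay_armed;       /* models the throttle timer being pending */
};

/* Transmit path. With check_stopped set, it refuses to submit once the
 * stop path has run. In the kernel this test would have to be serialized
 * with the stop path (e.g. under the tx list lock); the model omits that. */
static bool model_start_xmit(struct model_dev *d, bool check_stopped)
{
    if (check_stopped && d->tx_stopped)
        return false;       /* drop instead of submitting a new URB */
    d->urbs_in_flight++;
    return true;
}

/* Stop path: mark stopped, cancel everything in flight (models
 * usbnet_terminate_urbs()), then delete the timer (models del_timer_sync()). */
static void model_stop(struct model_dev *d)
{
    d->tx_stopped = true;
    d->urbs_in_flight = 0;
    d->delay_armed = false;
}

/* An in-flight URB fails because the device is gone; the completion
 * handler arms the throttle timer, as the quoted analysis describes. */
static void model_complete_with_error(struct model_dev *d)
{
    assert(d->urbs_in_flight > 0);
    d->urbs_in_flight--;
    d->delay_armed = true;
}

/* Replays the reported interleaving: stop finishes, a concurrent xmit
 * submits, and that URB then fails. Returns whether the timer ends up
 * armed after the stop path believed it was gone. */
static bool replay_race(bool check_stopped)
{
    struct model_dev d = { false, 0, false };
    model_stop(&d);
    if (model_start_xmit(&d, check_stopped))
        model_complete_with_error(&d);
    return d.delay_armed;
}
```

`replay_race(false)` reports the timer armed after the stop path ran, which is the state that precedes the crash; `replay_race(true)` leaves it disarmed. The model only shows the ordering requirement; real code also has to make the stopped check and the submission atomic with respect to the stop path.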

Hi,

Interesting. How sure are you of the details of your analysis?
I am asking because usbnet_stop() does a del_timer_sync().
It is indeed written under the assumption that the upper layer
will have ceased transmission when it stops an interface.

So I am wondering whether the correct fix would rather be to make
sure the timer is not started again.

	Regards
		Oliver
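Oliver's point turns on what del_timer_sync() guarantees: when it returns, the timer is neither pending nor running, but nothing prevents a later mod_timer() from arming it again. One reading of his suggestion is to keep the completion handler from re-arming the timer once the interface is stopping. The sketch below models that idea; `struct timer_model`, `timer_try_arm()`, `timer_stop()` and `replay_late_completion()` are illustrative names, not kernel APIs, and the single-threaded model ignores the locking or memory ordering a real implementation would need between the flag and the arming.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the throttle timer; not kernel code. */
struct timer_model {
    bool stopping;   /* published by the stop path before deleting the timer */
    bool armed;      /* models the timer being pending */
    int  arm_calls;  /* arming attempts made by completion handlers */
};

/* Completion-side arming. With guard set, arming is refused once the
 * stop path has started, so the earlier deletion stays final. */
static void timer_try_arm(struct timer_model *t, bool guard)
{
    t->arm_calls++;
    if (guard && t->stopping)
        return;
    t->armed = true;        /* models mod_timer() */
}

/* Stop path: set the flag first, then delete. Deleting first would
 * leave a window in which a completion could re-arm the timer. */
static void timer_stop(struct timer_model *t)
{
    t->stopping = true;
    t->armed = false;       /* models del_timer_sync() returning */
}

/* A URB that was still in flight fails after the stop path ran. */
static bool replay_late_completion(bool guard)
{
    struct timer_model t = { false, false, 0 };
    timer_stop(&t);
    timer_try_arm(&t, guard);
    assert(t.arm_calls == 1);   /* the late completion did try to arm */
    return t.armed;
}
```

Without the guard, the late completion leaves the timer armed even though the stop path already deleted it; with the guard, the deletion holds. The two approaches are complementary: the xmit-side check stops new URBs from being submitted, while this guard protects against any completion that still arrives after the stop.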