linux-kernel - Re: [PATCH] USB: core: Avoid race of async_completed() w/ usbdev

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <Pine.LNX.4.44L0.1708111407110.1732-100000@iolanthe.rowland.org>
Date:   Fri, 11 Aug 2017 14:08:18 -0400 (EDT)
From:   Alan Stern <stern@...land.harvard.edu>
To:     Douglas Anderson <dianders@...omium.org>
cc:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Guenter Roeck <groeck@...omium.org>,
        Deepa Dinamani <deepa.kernel@...il.com>,
        <linux-usb@...r.kernel.org>,
        Mateusz Berezecki <mateuszb@...tmail.fm>,
        <linux-kernel@...r.kernel.org>,
        Vamsi Krishna Samavedam <vskrishn@...eaurora.org>,
        Ingo Molnar <mingo@...nel.org>,
        Al Viro <viro@...iv.linux.org.uk>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] USB: core: Avoid race of async_completed() w/ usbdev_release()

On Thu, 10 Aug 2017, Douglas Anderson wrote:

> While running reboot tests w/ a specific set of USB devices (and
> slub_debug enabled), I found that once every few hours my device would
> be crashed with a stack that looked like this:
> 
> [   14.012445] BUG: spinlock bad magic on CPU#0, modprobe/2091
> [   14.012460]  lock: 0xffffffc0cb055978, .magic: ffffffc0, .owner: cryption contexts: %lu/%lu
> [   14.012460] /1025536097, .owner_cpu: 0
> [   14.012466] CPU: 0 PID: 2091 Comm: modprobe Not tainted 4.4.79 #352
> [   14.012468] Hardware name: Google Kevin (DT)
> [   14.012471] Call trace:
> [   14.012483] [<....>] dump_backtrace+0x0/0x160
> [   14.012487] [<....>] show_stack+0x20/0x28
> [   14.012494] [<....>] dump_stack+0xb4/0xf0
> [   14.012500] [<....>] spin_dump+0x8c/0x98
> [   14.012504] [<....>] spin_bug+0x30/0x3c
> [   14.012508] [<....>] do_raw_spin_lock+0x40/0x164
> [   14.012515] [<....>] _raw_spin_lock_irqsave+0x64/0x74
> [   14.012521] [<....>] __wake_up+0x2c/0x60
> [   14.012528] [<....>] async_completed+0x2d0/0x300
> [   14.012534] [<....>] __usb_hcd_giveback_urb+0xc4/0x138
> [   14.012538] [<....>] usb_hcd_giveback_urb+0x54/0xf0
> [   14.012544] [<....>] xhci_irq+0x1314/0x1348
> [   14.012548] [<....>] usb_hcd_irq+0x40/0x50
> [   14.012553] [<....>] handle_irq_event_percpu+0x1b4/0x3f0
> [   14.012556] [<....>] handle_irq_event+0x4c/0x7c
> [   14.012561] [<....>] handle_fasteoi_irq+0x158/0x1c8
> [   14.012564] [<....>] generic_handle_irq+0x30/0x44
> [   14.012568] [<....>] __handle_domain_irq+0x90/0xbc
> [   14.012572] [<....>] gic_handle_irq+0xcc/0x18c
> 
> Investigation using kgdb() found that the wait queue that was passed
> into wake_up() had been freed (it was filled with slub_debug poison).
> 
> I analyzed and instrumented the code and reproduced.  My current
> belief is that this is happening:
> 
> 1. async_completed() is called (from IRQ).  Moves "as" onto the
>    completed list.
> 2. On another CPU, proc_reapurbnonblock_compat() calls
>    async_getcompleted().  Blocks on spinlock.
> 3. async_completed() releases the lock; keeps running; gets blocked
>    midway through wake_up().
> 4. proc_reapurbnonblock_compat() => async_getcompleted() gets the
>    lock; removes "as" from completed list and frees it.
> 5. usbdev_release() is called.  Frees "ps".
> 6. async_completed() finally continues running wake_up().  ...but
>    wake_up() has a pointer to the freed "ps".

That looks like a good analysis.

> The instrumentation that led me to believe this was based on adding
> some trace_printk() calls in a select few functions and then using
> kdb's "ftdump" at crash time.  The trace follows (NOTE: in the trace
> below I cheated a little bit and added a udelay(1000) in
> async_completed() after releasing the spinlock because I wanted it to
> trigger quicker):
> 
> <...>-2104   0d.h2 13759034us!: async_completed at start: as=ffffffc0cc638200
> mtpd-2055    3.... 13759356us : async_getcompleted before spin_lock_irqsave
> mtpd-2055    3d..1 13759362us : async_getcompleted after list_del_init: as=ffffffc0cc638200
> mtpd-2055    3.... 13759371us+: proc_reapurbnonblock_compat: free_async(ffffffc0cc638200)
> mtpd-2055    3.... 13759422us+: async_getcompleted before spin_lock_irqsave
> mtpd-2055    3.... 13759479us : usbdev_release at start: ps=ffffffc0cc042080
> mtpd-2055    3.... 13759487us : async_getcompleted before spin_lock_irqsave
> mtpd-2055    3.... 13759497us!: usbdev_release after kfree(ps): ps=ffffffc0cc042080
> <...>-2104   0d.h2 13760294us : async_completed before wake_up(): as=ffffffc0cc638200
> 
> To fix this problem we can just move the wake_up() under the ps->lock.
> There should be no issues there that I'm aware of.
> 
> Signed-off-by: Douglas Anderson <dianders@...omium.org>
> ---
> NOTE: All testing of this patch was done on the Chromium OS kernel
> version 4.4.  The machine tested only barely boots upstream Linux and
> it seemed unlikely I'd be able to reproduce the exact failure there,
> so I didn't try.  The issue seems to clearly apply to upstream based
> on code analysis, though.
> 
>  drivers/usb/core/devio.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
> index ebe27595c4af..0ff0feddfd1f 100644
> --- a/drivers/usb/core/devio.c
> +++ b/drivers/usb/core/devio.c
> @@ -623,6 +623,8 @@ static void async_completed(struct urb *urb)
>  	if (as->status < 0 && as->bulk_addr && as->status != -ECONNRESET &&
>  			as->status != -ENOENT)
>  		cancel_bulk_urbs(ps, as->bulk_addr);
> +
> +	wake_up(&ps->wait);
>  	spin_unlock(&ps->lock);
>  
>  	if (signr) {
> @@ -630,8 +632,6 @@ static void async_completed(struct urb *urb)
>  		put_pid(pid);
>  		put_cred(cred);
>  	}
> -
> -	wake_up(&ps->wait);
>  }
>  
>  static void destroy_async(struct usb_dev_state *ps, struct list_head *list)

Acked-by: Alan Stern <stern@...land.harvard.edu>