netdev - RE: [PATCH] hv_netvsc: fix schedule in RCU context


Message-ID: <BN6PR21MB0161CB582949DE7A122408F0CA1A0@BN6PR21MB0161.namprd21.prod.outlook.com>
Date:   Thu, 13 Sep 2018 15:33:56 +0000
From:   Haiyang Zhang <haiyangz@...rosoft.com>
To:     Stephen Hemminger <stephen@...workplumber.org>,
        KY Srinivasan <kys@...rosoft.com>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Stephen Hemminger <sthemmin@...rosoft.com>
Subject: RE: [PATCH] hv_netvsc: fix schedule in RCU context



> -----Original Message-----
> From: Stephen Hemminger <stephen@...workplumber.org>
> Sent: Thursday, September 13, 2018 11:04 AM
> To: KY Srinivasan <kys@...rosoft.com>; Haiyang Zhang
> <haiyangz@...rosoft.com>
> Cc: netdev@...r.kernel.org; Stephen Hemminger <sthemmin@...rosoft.com>
> Subject: [PATCH] hv_netvsc: fix schedule in RCU context
> 
> When a netvsc device is removed, it can call reschedule in RCU context.
> This happens because canceling the subchannel setup work could (in theory)
> cause a reschedule when manipulating the timer.
> 
> To reproduce, run a lockdep-enabled kernel and unbind
> a network device from hv_netvsc (via sysfs).
> 
> [  160.682011] WARNING: suspicious RCU usage
> [  160.707466] 4.19.0-rc3-uio+ #2 Not tainted
> [  160.709937] -----------------------------
> [  160.712352] ./include/linux/rcupdate.h:302 Illegal context switch in RCU read-side critical section!
> [  160.723691]
> [  160.723691] other info that might help us debug this:
> [  160.723691]
> [  160.730955]
> [  160.730955] rcu_scheduler_active = 2, debug_locks = 1
> [  160.762813] 5 locks held by rebind-eth.sh/1812:
> [  160.766851]  #0: 000000008befa37a (sb_writers#6){.+.+}, at: vfs_write+0x184/0x1b0
> [  160.773416]  #1: 00000000b097f236 (&of->mutex){+.+.}, at: kernfs_fop_write+0xe2/0x1a0
> [  160.783766]  #2: 0000000041ee6889 (kn->count#3){++++}, at: kernfs_fop_write+0xeb/0x1a0
> [  160.787465]  #3: 0000000056d92a74 (&dev->mutex){....}, at: device_release_driver_internal+0x39/0x250
> [  160.816987]  #4: 0000000030f6031e (rcu_read_lock){....}, at: netvsc_remove+0x1e/0x250 [hv_netvsc]
> [  160.828629]
> [  160.828629] stack backtrace:
> [  160.831966] CPU: 1 PID: 1812 Comm: rebind-eth.sh Not tainted 4.19.0-rc3-uio+ #2
> [  160.832952] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v1.0 11/26/2012
> [  160.832952] Call Trace:
> [  160.832952]  dump_stack+0x85/0xcb
> [  160.832952]  ___might_sleep+0x1a3/0x240
> [  160.832952]  __flush_work+0x57/0x2e0
> [  160.832952]  ? __mutex_lock+0x83/0x990
> [  160.832952]  ? __kernfs_remove+0x24f/0x2e0
> [  160.832952]  ? __kernfs_remove+0x1b2/0x2e0
> [  160.832952]  ? mark_held_locks+0x50/0x80
> [  160.832952]  ? get_work_pool+0x90/0x90
> [  160.832952]  __cancel_work_timer+0x13c/0x1e0
> [  160.832952]  ? netvsc_remove+0x1e/0x250 [hv_netvsc]
> [  160.832952]  ? __lock_is_held+0x55/0x90
> [  160.832952]  netvsc_remove+0x9a/0x250 [hv_netvsc]
> [  160.832952]  vmbus_remove+0x26/0x30
> [  160.832952]  device_release_driver_internal+0x18a/0x250
> [  160.832952]  unbind_store+0xb4/0x180
> [  160.832952]  kernfs_fop_write+0x113/0x1a0
> [  160.832952]  __vfs_write+0x36/0x1a0
> [  160.832952]  ? rcu_read_lock_sched_held+0x6b/0x80
> [  160.832952]  ? rcu_sync_lockdep_assert+0x2e/0x60
> [  160.832952]  ? __sb_start_write+0x141/0x1a0
> [  160.832952]  ? vfs_write+0x184/0x1b0
> [  160.832952]  vfs_write+0xbe/0x1b0
> [  160.832952]  ksys_write+0x55/0xc0
> [  160.832952]  do_syscall_64+0x60/0x1b0
> [  160.832952]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [  160.832952] RIP: 0033:0x7fe48f4c8154
> 
> Resolve this by getting RTNL earlier. This is safe because the subchannel
> work queue does trylock on RTNL and will detect the race.
> 
> Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic")
> Signed-off-by: Stephen Hemminger <sthemmin@...rosoft.com>

Reviewed-by: Haiyang Zhang <haiyangz@...rosoft.com>

Thank you!
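
A note on the reasoning in the quoted patch: the fix relies on the worker
using a trylock on RTNL, so a removal path that already holds the lock does
not block against it. Below is a minimal userspace sketch of that pattern in
C. It is not the driver's code. Every name in it (rtnl_model,
subchannel_work_model, remove_model) is hypothetical, the lock is modeled
with a C11 atomic_flag, and the "concurrent" worker is simulated by calling
it directly while the lock is held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for RTNL: the flag is set while the lock is held. */
static atomic_flag rtnl_model = ATOMIC_FLAG_INIT;

/* Try to take the lock without waiting; true means we now own it. */
static bool rtnl_trylock_model(void)
{
    /* test_and_set returns the previous value, so false means it was free. */
    return !atomic_flag_test_and_set(&rtnl_model);
}

/* Blocking acquire, modeled as a spin; a real lock would sleep instead. */
static void rtnl_lock_model(void)
{
    while (!rtnl_trylock_model())
        ;
}

static void rtnl_unlock_model(void)
{
    atomic_flag_clear(&rtnl_model);
}

enum work_result { WORK_DONE, WORK_BACKED_OFF };

/*
 * Hypothetical worker: performs its setup only if the lock can be taken
 * immediately. If someone else holds it, the worker backs off rather
 * than waiting, which is how it detects the race with removal.
 */
static enum work_result subchannel_work_model(int *channels_opened)
{
    if (!rtnl_trylock_model())
        return WORK_BACKED_OFF;

    (*channels_opened)++;
    rtnl_unlock_model();
    return WORK_DONE;
}

/*
 * Hypothetical removal path: take the lock first, then deal with the
 * worker. The direct call below simulates the worker firing while
 * removal owns the lock.
 */
static enum work_result remove_model(int *channels_opened)
{
    enum work_result seen;

    rtnl_lock_model();
    seen = subchannel_work_model(channels_opened);
    assert(seen == WORK_BACKED_OFF); /* worker must not proceed here */
    rtnl_unlock_model();
    return seen;
}
```

With the lock free, subchannel_work_model() returns WORK_DONE and increments
the counter. Called from inside remove_model(), it returns WORK_BACKED_OFF
and leaves the counter unchanged.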