Message-ID: <87h8r7s261.fsf@vitty.brq.redhat.com>
Date: Sat, 27 Jan 2018 17:21:26 +0100
From: Vitaly Kuznetsov <vkuznets@...hat.com>
To: Stephen Hemminger <stephen@...workplumber.org>
Cc: kys@...rosoft.com, haiyangz@...rosoft.com, mgamal@...hat.com,
netdev@...r.kernel.org, Stephen Hemminger <sthemmin@...rosoft.com>
Subject: Re: [RFC 0/2] hv_netvsc shutdown redo
Stephen Hemminger <stephen@...workplumber.org> writes:
> These patches change how teardown of Hyper-V network devices
> is done. These are tested on WS2012 and WS2016.
>
> It moves the tx/rx shutdown into the rndis close handling,
> and that makes earlier gpadl changes unnecessary.
>
Thank you Stephen,
I gave these a try, and they didn't survive my 'death row' test on
WS2016. I run three things in parallel:
1) iperf to some external IP
2) while true; do ethtool -L ethX combined 6; ethtool -L ethX combined 8; done
3) while true; do ip link set dev ethX mtu 1400; ip link set dev ethX mtu 1450; done
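A minimal sketch of how the three jobs above could be launched together from one script. This is not the exact script used for the test: IFACE, TARGET, ROUNDS, DURATION and DRY_RUN are hypothetical knobs added for illustration, the loops are bounded instead of 'while true', and by default the commands are only printed rather than executed.

```shell
#!/bin/sh
# Sketch of the three-way stress test. All variables below are placeholders
# introduced for this example, not part of the original test.
IFACE="${IFACE:-ethX}"          # interface under test
TARGET="${TARGET:-192.0.2.1}"   # "some external IP" (documentation address here)
ROUNDS="${ROUNDS:-2}"           # bounded here; the original loops run forever
DURATION="${DURATION:-60}"      # iperf run time in seconds (assumed value)
DRY_RUN="${DRY_RUN:-1}"         # 1 = only print the commands

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# 1) traffic to an external host
traffic() {
    run iperf -c "$TARGET" -t "$DURATION"
}

# 2) flip the channel count between 6 and 8
channel_loop() {
    i=0
    while [ "$i" -lt "$ROUNDS" ]; do
        run ethtool -L "$IFACE" combined 6
        run ethtool -L "$IFACE" combined 8
        i=$((i + 1))
    done
}

# 3) flip the MTU between 1400 and 1450
mtu_loop() {
    i=0
    while [ "$i" -lt "$ROUNDS" ]; do
        run ip link set dev "$IFACE" mtu 1400
        run ip link set dev "$IFACE" mtu 1450
        i=$((i + 1))
    done
}

# Start all three in the background and wait for them to finish.
stress() {
    traffic &
    channel_loop &
    mtu_loop &
    wait
}

stress
```

To run it for real, something like 'DRY_RUN=0 IFACE=eth0 TARGET=<external IP> ROUNDS=100000 sh stress.sh' as root would be needed; the file name and values are again only examples.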
I ended up with a hang:
[ 1226.710034] INFO: task ip:2357 blocked for more than 120 seconds.
[ 1226.712397] Not tainted 4.15.0-rc9+ #321
[ 1226.714030] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1226.716724] ip D 0 2357 1474 0x00000000
[ 1226.718698] Call Trace:
[ 1226.719588] ? __schedule+0x1da/0x7b0
[ 1226.720910] ? get_page_from_freelist+0x106d/0x15c0
[ 1226.722648] schedule+0x28/0x80
[ 1226.723807] schedule_preempt_disabled+0xa/0x10
[ 1226.725952] __mutex_lock.isra.1+0x1a0/0x4e0
[ 1226.727915] ? rtnetlink_rcv_msg+0x212/0x2d0
[ 1226.729849] rtnetlink_rcv_msg+0x212/0x2d0
[ 1226.731611] ? rtnl_calcit.isra.28+0x110/0x110
[ 1226.733824] netlink_rcv_skb+0x4a/0x120
[ 1226.736916] netlink_unicast+0x19d/0x250
[ 1226.738907] netlink_sendmsg+0x2a5/0x3a0
[ 1226.740762] sock_sendmsg+0x30/0x40
[ 1226.742552] SYSC_sendto+0x10e/0x140
[ 1226.744310] ? __do_page_fault+0x26d/0x4c0
[ 1226.746332] entry_SYSCALL_64_fastpath+0x20/0x83
[ 1226.748730] RIP: 0033:0x7ff2cdc9aa7d
[ 1226.750776] RSP: 002b:00007ffd0a3455e8 EFLAGS: 00000246
[ 1349.590041] INFO: task kworker/3:6:1586 blocked for more than 120 seconds.
[ 1349.595358] Not tainted 4.15.0-rc9+ #321
[ 1349.597335] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1349.600638] kworker/3:6 D 0 1586 2 0x80000000
[ 1349.603335] Workqueue: ipv6_addrconf addrconf_verify_work
[ 1349.605779] Call Trace:
[ 1349.607080] ? __schedule+0x1da/0x7b0
[ 1349.608856] ? update_load_avg+0x563/0x6d0
[ 1349.610834] ? update_curr+0xb9/0x190
[ 1349.613050] schedule+0x28/0x80
[ 1349.615290] schedule_preempt_disabled+0xa/0x10
[ 1349.617306] __mutex_lock.isra.1+0x1a0/0x4e0
[ 1349.619072] ? addrconf_verify_work+0xa/0x20
[ 1349.621108] addrconf_verify_work+0xa/0x20
[ 1349.623107] process_one_work+0x188/0x380
[ 1349.625012] worker_thread+0x2e/0x390
[ 1349.626976] ? process_one_work+0x380/0x380
[ 1349.628925] kthread+0x111/0x130
[ 1349.630498] ? kthread_create_worker_on_cpu+0x70/0x70
[ 1349.632786] ? do_group_exit+0x3a/0xa0
[ 1349.634598] ret_from_fork+0x35/0x40
....
(I'm not 100% sure this is a _new_ issue, btw. It may be that the
race was always there and is just easier to trigger now.)
I'll try to do more testing next week.
Thanks,
--
Vitaly