netdev - Re: [RFC 0/2] hv_netvsc shutdown redo

Message-ID: <87h8r7s261.fsf@vitty.brq.redhat.com>
Date:   Sat, 27 Jan 2018 17:21:26 +0100
From:   Vitaly Kuznetsov <vkuznets@...hat.com>
To:     Stephen Hemminger <stephen@...workplumber.org>
Cc:     kys@...rosoft.com, haiyangz@...rosoft.com, mgamal@...hat.com,
        netdev@...r.kernel.org, Stephen Hemminger <sthemmin@...rosoft.com>
Subject: Re: [RFC 0/2] hv_netvsc shutdown redo

Stephen Hemminger <stephen@...workplumber.org> writes:

> These patches change how teardown of Hyper-V network devices
> is done. These are tested on WS2012 and WS2016.
>
> It moves the tx/rx shutdown into the rndis close handling,
> and that makes earlier gpadl changes unnecessary.
>

Thank you Stephen,

I gave these a try and they didn't survive my 'death row' test on
WS2016, which runs three things in parallel:

1) iperf to some external IP
2) while true; do ethtool -L ethX combined 6; ethtool -L ethX combined 8; done
3) while true; do ip link set dev ethX mtu 1400; ip link set dev ethX mtu 1450; done
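
The three steps above can be wrapped in one script, roughly as sketched below. This is only an illustration under assumptions: the IFACE, TARGET and DURATION variables, the DRY_RUN switch, the helper function names and the `iperf -c ... -t ...` client invocation are not from the original test, which only says "iperf to some external IP". By default the script prints the commands instead of running them, since the real run needs root and repeatedly reconfigures the interface.

```shell
#!/bin/sh
# Sketch of the three-way stress test: traffic + channel changes + MTU changes.
# IFACE, TARGET, DURATION and DRY_RUN are hypothetical knobs for illustration.
IFACE="${IFACE:-eth0}"
TARGET="${TARGET:-192.0.2.1}"   # placeholder address; use a reachable iperf server
DURATION="${DURATION:-600}"     # seconds of iperf traffic

# In dry-run mode (the default) print the command, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# One iteration of step 2: flip the number of combined channels.
channel_step() {
    run ethtool -L "$IFACE" combined 6
    run ethtool -L "$IFACE" combined 8
}

# One iteration of step 3: flip the MTU.
mtu_step() {
    run ip link set dev "$IFACE" mtu 1400
    run ip link set dev "$IFACE" mtu 1450
}

# Run all three in parallel; stop the loops once iperf finishes.
stress() {
    run iperf -c "$TARGET" -t "$DURATION" &
    traffic=$!
    ( while true; do channel_step; done ) &
    channels=$!
    ( while true; do mtu_step; done ) &
    mtu=$!
    wait "$traffic"
    kill "$channels" "$mtu" 2>/dev/null
}

# Start the real test only when explicitly requested with DRY_RUN=0.
if [ "${DRY_RUN:-1}" != "1" ]; then
    stress
fi
```

Running it as root with something like `DRY_RUN=0 IFACE=ethX TARGET=<iperf server> sh stress.sh` (the file name is arbitrary) starts all three loops; if the hang reproduces, the `ip` or `ethtool` loop simply stops making progress.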

I ended up with a hang:

[ 1226.710034] INFO: task ip:2357 blocked for more than 120 seconds.
[ 1226.712397]       Not tainted 4.15.0-rc9+ #321
[ 1226.714030] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1226.716724] ip              D    0  2357   1474 0x00000000
[ 1226.718698] Call Trace:
[ 1226.719588]  ? __schedule+0x1da/0x7b0
[ 1226.720910]  ? get_page_from_freelist+0x106d/0x15c0
[ 1226.722648]  schedule+0x28/0x80
[ 1226.723807]  schedule_preempt_disabled+0xa/0x10
[ 1226.725952]  __mutex_lock.isra.1+0x1a0/0x4e0
[ 1226.727915]  ? rtnetlink_rcv_msg+0x212/0x2d0
[ 1226.729849]  rtnetlink_rcv_msg+0x212/0x2d0
[ 1226.731611]  ? rtnl_calcit.isra.28+0x110/0x110
[ 1226.733824]  netlink_rcv_skb+0x4a/0x120
[ 1226.736916]  netlink_unicast+0x19d/0x250
[ 1226.738907]  netlink_sendmsg+0x2a5/0x3a0
[ 1226.740762]  sock_sendmsg+0x30/0x40
[ 1226.742552]  SYSC_sendto+0x10e/0x140
[ 1226.744310]  ? __do_page_fault+0x26d/0x4c0
[ 1226.746332]  entry_SYSCALL_64_fastpath+0x20/0x83
[ 1226.748730] RIP: 0033:0x7ff2cdc9aa7d
[ 1226.750776] RSP: 002b:00007ffd0a3455e8 EFLAGS: 00000246
[ 1349.590041] INFO: task kworker/3:6:1586 blocked for more than 120 seconds.
[ 1349.595358]       Not tainted 4.15.0-rc9+ #321
[ 1349.597335] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1349.600638] kworker/3:6     D    0  1586      2 0x80000000
[ 1349.603335] Workqueue: ipv6_addrconf addrconf_verify_work
[ 1349.605779] Call Trace:
[ 1349.607080]  ? __schedule+0x1da/0x7b0
[ 1349.608856]  ? update_load_avg+0x563/0x6d0
[ 1349.610834]  ? update_curr+0xb9/0x190
[ 1349.613050]  schedule+0x28/0x80
[ 1349.615290]  schedule_preempt_disabled+0xa/0x10
[ 1349.617306]  __mutex_lock.isra.1+0x1a0/0x4e0
[ 1349.619072]  ? addrconf_verify_work+0xa/0x20
[ 1349.621108]  addrconf_verify_work+0xa/0x20
[ 1349.623107]  process_one_work+0x188/0x380
[ 1349.625012]  worker_thread+0x2e/0x390
[ 1349.626976]  ? process_one_work+0x380/0x380
[ 1349.628925]  kthread+0x111/0x130
[ 1349.630498]  ? kthread_create_worker_on_cpu+0x70/0x70
[ 1349.632786]  ? do_group_exit+0x3a/0xa0
[ 1349.634598]  ret_from_fork+0x35/0x40

....

(I'm not 100% sure this is a _new_ issue, by the way; the race may have
always been there and is just easier to trigger now.)

I'll try to do more testing next week.

Thanks,

-- 
  Vitaly