Message-ID: <20221108105343.vjczwdxcsxhfghk7@p1>
Date:   Tue, 8 Nov 2022 11:53:43 +0100
From:   Stefan Assmann <sassmann@...hat.com>
To:     Ivan Vecera <ivecera@...hat.com>
Cc:     netdev@...r.kernel.org, Jacob Keller <jacob.e.keller@...el.com>,
        Patryk Piotrowski <patryk.piotrowski@...el.com>,
        SlawomirX Laba <slawomirx.laba@...el.com>,
        Jesse Brandeburg <jesse.brandeburg@...el.com>,
        Tony Nguyen <anthony.l.nguyen@...el.com>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        "moderated list:INTEL ETHERNET DRIVERS" 
        <intel-wired-lan@...ts.osuosl.org>,
        open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net] iavf: Fix a crash during reset task

On 2022-11-08 10:35, Ivan Vecera wrote:
> Recent commit aa626da947e9 ("iavf: Detach device during reset task")
> removed netif_tx_stop_all_queues() with an assumption that Tx queues
> are already stopped by netif_device_detach() in the beginning of
> reset task. This assumption is incorrect because during reset
> task a potential link event can start Tx queues again.
> Revert this change to fix this issue.
> 
> Reproducer:
> 1. Run some Tx traffic (e.g. iperf3) over iavf interface
> 2. Switch MTU of this interface in a loop
> 
> [root@...t ~]# cat repro.sh
> #!/bin/sh
> 
> IF=enp2s0f0v0
> 
> iperf3 -c 192.168.0.1 -t 600 --logfile /dev/null &
> sleep 2
> 
> while :; do
>         for i in 1280 1500 2000 900 ; do
>                 ip link set $IF mtu $i
>                 sleep 2
>         done
> done

With this patch applied, iavf no longer crashes, but after a few
cycles of the reproducer Tx timeouts are observed:

[   47.551151] iavf 0000:00:09.0 eth0: NIC Link is Up Speed is 10 Gbps Full Duplex
[   54.035902] ------------[ cut here ]------------
[   54.037397] NETDEV WATCHDOG: eth0 (iavf): transmit queue 3 timed out
[   54.039264] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:526 dev_watchdog+0x20f/0x250
[   54.041524] Modules linked in: 8021q intel_rapl_msr intel_rapl_common kvm_intel kvm irqbypass rapl pcspkr drm ramoops reed_solomon crct10dif_pclmul crc32_pclmul crc32c_intel ata_generic pata_acpi ghash_clmulni_intel ata_piix aesni_intel crypto_simd iavf libata be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
[   54.049723] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.1.0-rc2+ #90
[   54.051049] Hardware name: Red Hat KVM, BIOS 1.15.0-2.module+el8.6.0+14757+c25ee005 04/01/2014
[   54.052898] RIP: 0010:dev_watchdog+0x20f/0x250
[   54.053907] Code: 00 e9 4d ff ff ff 48 89 df c6 05 92 24 96 01 01 e8 c6 f2 f8 ff 44 89 e9 48 89 de 48 c7 c7 28 7f f6 a0 48 89 c2 e8 6e 65 23 00 <0f> 0b e9 2f ff ff ff e8 25 06 2a 00 85 c0 74 b5 80 3d 74 1b 96 01
[   54.057282] RSP: 0018:ffffaf56c00e0e80 EFLAGS: 00010282
[   54.058164] RAX: 0000000000000000 RBX: ffff993ed95b8000 RCX: 0000000000000103
[   54.059345] RDX: 0000000000000103 RSI: 00000000000000f6 RDI: 00000000ffffffff
[   54.060473] RBP: ffff993ed95b8508 R08: 0000000000000000 R09: c0000000fff7ffff
[   54.061558] R10: 0000000000000001 R11: ffffaf56c00e0d18 R12: ffff993ed95b8420
[   54.062640] R13: 0000000000000003 R14: ffff993ed95b8508 R15: ffff993ef74a06c0
[   54.063681] FS:  0000000000000000(0000) GS:ffff993ef7480000(0000) knlGS:0000000000000000
[   54.064867] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   54.065654] CR2: 00007f42309e1280 CR3: 0000000107f6a003 CR4: 0000000000170ee0
[   54.066612] Call Trace:
[   54.066985]  <IRQ>
[   54.067265]  ? mq_change_real_num_tx+0xd0/0xd0
[   54.067844]  call_timer_fn+0xa1/0x2c0
[   54.068330]  ? mq_change_real_num_tx+0xd0/0xd0
[   54.068916]  run_timer_softirq+0x527/0x550
[   54.069447]  ? lock_is_held_type+0xd8/0x130
[   54.069998]  __do_softirq+0xc3/0x481
[   54.070469]  irq_exit_rcu+0xe4/0x120
[   54.070963]  sysvec_apic_timer_interrupt+0x9e/0xc0
[   54.071604]  </IRQ>
[   54.071909]  <TASK>
[   54.072223]  asm_sysvec_apic_timer_interrupt+0x16/0x20
[   54.072942] RIP: 0010:default_idle+0x10/0x20
[   54.073533] Code: 89 df 31 f6 5b 5d e9 ff 1c a5 ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 eb 07 0f 00 2d f2 2a 42 00 fb f4 <c3> 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 65

This only occurs when the device is detached and reattached during reset.

  Stefan
