Message-ID: <ZztOKRQ1ap5weI9E@lzaremba-mobl.ger.corp.intel.com>
Date: Mon, 18 Nov 2024 15:24:41 +0100
From: Larysa Zaremba <larysa.zaremba@...el.com>
To: Alasdair McWilliam <alasdair.mcwilliam@...look.com>
CC: Thorsten Leemhuis <linux@...mhuis.info>, Maciej Fijalkowski
<maciej.fijalkowski@...el.com>, Magnus Karlsson <magnus.karlsson@...il.com>,
"xdp-newbies@...r.kernel.org" <xdp-newbies@...r.kernel.org>, "Linux kernel
regressions list" <regressions@...ts.linux.dev>, Jacob Keller
<jacob.e.keller@...el.com>, netdev <netdev@...r.kernel.org>
Subject: Re: ICE + XSK ZC - page faults on 6.1 LTS when process exits?
On Mon, Nov 04, 2024 at 12:18:07PM +0000, Alasdair McWilliam wrote:
> On 04/11/2024 07:11, Larysa Zaremba wrote:
>
> >> It's been a minute since I've looked at this due to other commitments,
> >> but I accidentally ran into the fault again when testing the latest 6.6
> >> LTS for a new feature of our software. (I had forgotten to revert the
> >> commit "ice: remove af_xdp_zc_qps bitmap" in our build system.)
> >>
> >> This led me to check the current version, and I can trigger the
> >> same crash on 6.11.5 [3].
> >>
> >> Reverting "ice: remove af_xdp_zc_qps bitmap" [1] in the current mainline
> >> is a little more complicated, as commit ebc33a3f8d0a ("ice: improve
> >> updating ice_{t,r}x_ring::xsk_pool") also changes the same code, so the
> >> revert doesn't apply cleanly.
> >>
> >> I have tweaked things a little so that the patch below [2] applies
> >> cleanly to 6.11.5 and 6.12-rc5, and it seems to fix the fault.
> >>
> >> I thought I'd bubble this up, as it's definitely still an issue in the
> >> mainline kernel as of now.
> >>
> >> Thanks
> >> Alasdair
> >>
> >
> > Hello,
> > Could you please share the reproduction steps? I will look into this.
>
> Hello,
>
> I should probably have provided better steps to reproduce - apologies.
>
> Our stack uses AF_XDP in zero copy mode with shared UMEM between XSK
> sockets.

Thanks! Just letting you and anyone interested know that I was able to reliably
reproduce the issue and have found the root cause. Hopefully, I will be able to
send the exact fix soon.

>
> To isolate other bugs in the past, we've used a modified xdpsock app
> based on code previously in the kernel samples. The original sample has
> since been taken out of the kernel repo, but we maintain the modified
> version in our public repos here [1].
>
> There's a lot in the README, but suffice it to say that running build.sh
> with bash will compile the xdpsock_multi user-space app and the
> accompanying xdpsock_multi.bpf eBPF program. You'll also need the
> necessary dependencies (libxdp, libbpf, et al.).
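
A minimal sketch of fetching and building the sample from [1]. It assumes git and the libxdp/libbpf development packages are already installed, and that build.sh sits at the repository root. That last point is inferred from the description above, not checked. The function name build_xdpsock_sample is hypothetical.

```bash
# Hypothetical helper: clone the sample repo from [1] (if not already present)
# and run its build script. Assumes build.sh lives at the repository root.
build_xdpsock_sample() {
    local dest=${1:-xdpsock-sample}
    if [ ! -d "$dest" ]; then
        git clone https://github.com/OpenSource-THG/xdpsock-sample "$dest" || return 1
    fi
    # Run the build in a subshell so the caller's working directory is unchanged.
    (cd "$dest" && bash build.sh)
}
```
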
>
> I can reproduce the issue with this app using 8 channels. It can fault
> in two ways, at step C or step D below.
>
> Terminal 1:
>
> A# ethtool -L <nic> combined 8
> B# ./xdpsock_multi --l2fwd --interface ice1_1 --zero-copy --channels 8
>
> Terminal 2:
>
> C# kill -9 $(pidof xdpsock_multi)
> D# ip link set dev <nic> xdp off
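
Steps A to D can be wrapped in one function for repeated runs. This is a sketch under several assumptions. The function name, the 5-second settle delay, and the ./xdpsock_multi path are all made up. It also assumes the same interface is passed to ethtool and to xdpsock_multi; the report uses ice1_1 for step B. Step C kills the PID the function itself started, rather than calling pidof.

```bash
# Hypothetical wrapper around steps A-D. Run as root from the directory that
# holds the built xdpsock_multi binary. On an affected kernel this is expected
# to oops the machine, so only use it on a disposable test box.
reproduce_xsk_fault() {
    local nic=${1:-} channels=${2:-8} pid
    if [ -z "$nic" ]; then
        echo "usage: reproduce_xsk_fault <nic> [channels]" >&2
        return 2
    fi

    ethtool -L "$nic" combined "$channels" || return 1       # step A
    ./xdpsock_multi --l2fwd --interface "$nic" --zero-copy \
        --channels "$channels" &                              # step B, in background
    pid=$!
    sleep 5                          # assumed delay for the XSK sockets to come up
    kill -9 "$pid"                   # step C: the kernel may crash here
    wait "$pid" 2>/dev/null          # reap the killed process quietly
    ip link set dev "$nic" xdp off   # step D: detach any orphaned XDP program
}
```

Usage would be, for example, `reproduce_xsk_fault ice1_1 8`. With no arguments, it prints a usage line and returns 2 without touching any interface.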
>
> Sometimes the act of killing the process (step C) causes a kernel crash [2].
>
> Other times the system survives, leaving an orphaned XDP program attached
> to the NIC. Unloading it manually (step D) then causes a kernel crash [3].
>
> The stack traces differ, hence I've provided both.
>
> Affects:
> 6.1.x
> 6.6.x
> 6.11.x
>
> Hardware is E810-CQDA2
> Firmware is 3.20 0x8000d83e 1.3146.0
>
> Let me know if you need anything further.
>
> Thanks!
> Alasdair
>
>
> [1] https://github.com/OpenSource-THG/xdpsock-sample
>
> [2] Kernel crash triggered by step C
>
> [ 220.921136] BUG: unable to handle page fault for address: ffffa3eee1637f14
> [ 220.921175] #PF: supervisor write access in kernel mode
> [ 220.921196] #PF: error_code(0x0002) - not-present page
> [ 220.921217] PGD 100000067 P4D 100000067 PUD 100238067 PMD 0
> [ 220.921244] Oops: Oops: 0002 [#1] PREEMPT SMP PTI
> [ 220.921267] CPU: 5 UID: 0 PID: 0 Comm: swapper/5 Kdump: loaded Tainted: G E 6.11.5-1.thg.836e8867d7.241031.135507.el9.x86_64 #1
> [ 220.921315] Tainted: [E]=UNSIGNED_MODULE
> [ 220.921331] Hardware name: Supermicro SYS-1028R-TDW/X10DDW-i, BIOS 3.2 12/16/2019
> [ 220.921357] RIP: 0010:ice_clean_rx_irq_zc+0xde/0x7d0 [ice]
> [ 220.921489] Code: 0f 84 d0 01 00 00 44 3b 7c 24 08 0f 84 a1 02 00 00 48 8b 53 38 41 0f b7 4d 04 4c 8b 24 c2 89 c8 81 e1 ff 3f 00 00 66 25 ff 3f <41> c7 44 24 34 00 00 00 00 49 8b 74 24 18 48 8d 96 00 01 00 00 49
> [ 220.921518] RSP: 0018:ffffa3eec64d0d88 EFLAGS: 00010206
> [ 220.921529] RAX: 000000000000014d RBX: ffff89bbc2aa2a00 RCX: 000000000000014d
> [ 220.921542] RDX: ffff89b408830000 RSI: 0000000000000040 RDI: ffff89bbc2aa2a00
> [ 220.921554] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff89b407655000
> [ 220.921566] R10: 0000ffffffffffff R11: ffffa3eec64d0ff8 R12: ffffa3eee1637ee0
> [ 220.921578] R13: ffff89b414710000 R14: ffff89bbc7919500 R15: 0000000000000000
> [ 220.921591] FS: 0000000000000000(0000) GS:ffff89bb5fc80000(0000) knlGS:0000000000000000
> [ 220.921605] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 220.921616] CR2: ffffa3eee1637f14 CR3: 00000001d9820006 CR4: 00000000001706f0
> [ 220.921628] Call Trace:
> [ 220.921639] <IRQ>
> [ 220.921647] ? __die+0x20/0x70
> [ 220.921663] ? page_fault_oops+0x80/0x150
> [ 220.921676] ? exc_page_fault+0xcd/0x170
> [ 220.921690] ? asm_exc_page_fault+0x22/0x30
> [ 220.921707] ? ice_clean_rx_irq_zc+0xde/0x7d0 [ice]
> [ 220.921759] ? ice_clean_tx_irq+0x166/0x3c0 [ice]
> [ 220.921808] ice_napi_poll+0xb2/0x2a0 [ice]
> [ 220.921858] __napi_poll+0x2c/0x1b0
> [ 220.921870] net_rx_action+0x30d/0x3e0
> [ 220.921881] ? __raise_softirq_irqoff+0x18/0x80
> [ 220.921896] ? __napi_schedule+0xa6/0xc0
> [ 220.921907] ? ice_msix_clean_rings+0x4f/0x60 [ice]
> [ 220.921959] handle_softirqs+0xf0/0x2e0
> [ 220.921972] __irq_exit_rcu+0x80/0xe0
> [ 220.921983] common_interrupt+0xb7/0xd0
> [ 220.921995] </IRQ>
> [ 220.922001] <TASK>
> [ 220.922008] asm_common_interrupt+0x22/0x40
> [ 220.922022] RIP: 0010:cpuidle_enter_state+0xc8/0x420
> [ 220.922034] Code: 0e b6 3e ff e8 09 ee ff ff 8b 55 04 49 89 c5 0f 1f 44 00 00 31 ff e8 97 69 3d ff 45 84 ff 0f 85 38 02 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 6a 01 00 00 49 63 d6 4c 2b 2c 24 48 8d 04 52 48 8d
> [ 220.922061] RSP: 0018:ffffa3eec4377e78 EFLAGS: 00000246
> [ 220.922072] RAX: ffff89bb5fc80000 RBX: 0000000000000004 RCX: 000000000000001f
> [ 220.922085] RDX: 0000000000000005 RSI: ffffffffb255a8a3 RDI: ffffffffb2533173
> [ 220.922098] RBP: ffff89bb5fcc0cc8 R08: 000000336fecb8ce R09: 0000000000000018
> [ 220.922109] R10: 000000000000453f R11: ffff89bb5fcb47e4 R12: ffffffffb32bdce0
> [ 220.922121] R13: 000000336fecb8ce R14: 0000000000000004 R15: 0000000000000000
> [ 220.922135] ? cpuidle_enter_state+0xb9/0x420
> [ 220.922147] cpuidle_enter+0x29/0x40
> [ 220.922161] cpuidle_idle_call+0x100/0x170
> [ 220.922175] do_idle+0x7d/0xd0
> [ 220.922185] cpu_startup_entry+0x25/0x30
> [ 220.922195] start_secondary+0x116/0x140
> [ 220.922206] common_startup_64+0x13e/0x141
> [ 220.922222] </TASK>
> [ 220.922229] Modules linked in: bonding(E) tls(E) nft_fib_inet(E)
> nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E)
> nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E)
> nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E)
> nf_defrag_ipv4(E) rfkill(E) ip_set(E) nf_tables(E) libcrc32c(E)
> nfnetlink(E) vfat(E) fat(E) intel_rapl_msr(E) intel_rapl_common(E)
> sb_edac(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E)
> kvm_intel(E) ipmi_ssif(E) kvm(E) iTCO_wdt(E) intel_pmc_bxt(E)
> iTCO_vendor_support(E) rapl(E) intel_cstate(E) ast(E) intel_uncore(E)
> drm_shmem_helper(E) pcspkr(E) drm_kms_helper(E) i2c_i801(E) mei_me(E)
> i2c_mux(E) mxm_wmi(E) mei(E) i2c_smbus(E) lpc_ich(E) ioatdma(E)
> acpi_power_meter(E) ipmi_si(E) acpi_ipmi(E) ipmi_devintf(E)
> ipmi_msghandler(E) joydev(E) acpi_pad(E) drm(E) fuse(E) ext4(E)
> mbcache(E) jbd2(E) sd_mod(E) sg(E) ice(E) ahci(E) crct10dif_pclmul(E)
> crc32_pclmul(E) libahci(E) crc32c_intel(E) polyval_clmulni(E)
> polyval_generic(E) igb(E) libata(E) ghash_clmulni_intel(E)
> [ 220.922280] i2c_algo_bit(E) dca(E) libie(E) wmi(E) dm_mirror(E)
> dm_region_hash(E) dm_log(E) dm_mod(E)
> [ 220.922416] CR2: ffffa3eee1637f14
>
> [3] Kernel crash triggered by step D
>
> [ 894.619896] BUG: unable to handle page fault for address: ffffb5818c2d7f14
> [ 894.619921] #PF: supervisor read access in kernel mode
> [ 894.619932] #PF: error_code(0x0000) - not-present page
> [ 894.619942] PGD 100000067 P4D 100000067 PUD 100237067 PMD 0
> [ 894.619957] Oops: Oops: 0000 [#1] PREEMPT SMP PTI
> [ 894.619970] CPU: 5 UID: 0 PID: 2540 Comm: ip Kdump: loaded Tainted: G E 6.11.5-1.thg.836e8867d7.241031.135507.el9.x86_64 #1
> [ 894.619994] Tainted: [E]=UNSIGNED_MODULE
> [ 894.620002] Hardware name: Supermicro SYS-1028R-TDW/X10DDW-i, BIOS 3.2 12/16/2019
> [ 894.620014] RIP: 0010:ice_xsk_clean_rx_ring+0x37/0x110 [ice]
> [ 894.620086] Code: 55 53 48 83 ec 08 44 0f b7 af a4 00 00 00 0f b7 af a2 00 00 00 66 41 39 ed 74 33 48 89 fb 48 8b 4b 38 41 0f b7 c5 4c 8b 34 c1 <41> f6 46 34 01 75 30 4c 89 f7 41 83 c5 01 e8 f6 5c c6 da 31 c0 66
> [ 894.620113] RSP: 0018:ffffb58189c376d8 EFLAGS: 00010293
> [ 894.620124] RAX: 0000000000000000 RBX: ffff92f681f6b800 RCX: ffff9302f2860000
> [ 894.620136] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff92f681f6b800
> [ 894.620148] RBP: 00000000000007ff R08: 000000000000081f R09: 0000000000000000
> [ 894.620159] R10: ffff92f684dc0000 R11: 0000000000000020 R12: 0000000000000010
> [ 894.620171] R13: 0000000000000000 R14: ffffb5818c2d7ee0 R15: ffff92f681fcd740
> [ 894.620183] FS: 00007f7ee9e27740(0000) GS:ffff92fd9fc80000(0000) knlGS:0000000000000000
> [ 894.620196] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 894.620206] CR2: ffffb5818c2d7f14 CR3: 000000010e25e003 CR4: 00000000001706f0
> [ 894.620218] Call Trace:
> [ 894.620228] <TASK>
> [ 894.620236] ? __die+0x20/0x70
> [ 894.620254] ? page_fault_oops+0x80/0x150
> [ 894.620268] ? exc_page_fault+0xcd/0x170
> [ 894.620283] ? asm_exc_page_fault+0x22/0x30
> [ 894.620298] ? ice_xsk_clean_rx_ring+0x37/0x110 [ice]
> [ 894.620350] ice_clean_rx_ring+0x16e/0x190 [ice]
> [ 894.620401] ice_down+0x2f8/0x3c0 [ice]
> [ 894.620443] ice_xdp_setup_prog+0x193/0x460 [ice]
> [ 894.620485] ice_xdp+0x7a/0xb0 [ice]
> [ 894.620527] ? __pfx_ice_xdp+0x10/0x10 [ice]
> [ 894.620567] dev_xdp_install+0xc7/0x100
> [ 894.620584] dev_xdp_attach+0x205/0x5d0
> [ 894.620597] do_setlink+0x7d3/0xc20
> [ 894.620611] ? __nla_validate_parse+0x125/0x1d0
> [ 894.620626] __rtnl_newlink+0x4f7/0x630
> [ 894.620639] ? __kmalloc_cache_noprof+0x225/0x2b0
> [ 894.620652] rtnl_newlink+0x44/0x70
> [ 894.620662] rtnetlink_rcv_msg+0x15c/0x410
> [ 894.620676] ? __rmqueue_pcplist+0x5f/0x2c0
> [ 894.620686] ? __rmqueue_pcplist+0x5f/0x2c0
> [ 894.620695] ? avc_has_perm_noaudit+0x67/0xf0
> [ 894.620708] ? __pfx_rtnetlink_rcv_msg+0x10/0x10
> [ 894.620721] netlink_rcv_skb+0x57/0x100
> [ 894.620735] netlink_unicast+0x246/0x370
> [ 894.620747] netlink_sendmsg+0x1f6/0x430
> [ 894.620758] ____sys_sendmsg+0x3be/0x3f0
> [ 894.620771] ? import_iovec+0x16/0x20
> [ 894.620783] ? copy_msghdr_from_user+0x6d/0xa0
> [ 894.620795] ___sys_sendmsg+0x88/0xd0
> [ 894.620807] ? __mod_memcg_lruvec_state+0xce/0x1c0
> [ 894.620822] ? mod_objcg_state+0xc9/0x2f0
> [ 894.620833] __sys_sendmsg+0x59/0xa0
> [ 894.620844] ? syscall_trace_enter+0xfb/0x190
> [ 894.620856] do_syscall_64+0x60/0x180
> [ 894.620867] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 894.620881] RIP: 0033:0x7f7ee9d0f917
> [ 894.620891] Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
> [ 894.620920] RSP: 002b:00007ffd0b9a9e58 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> [ 894.620935] RAX: ffffffffffffffda RBX: 000000006728b03b RCX: 00007f7ee9d0f917
> [ 894.620948] RDX: 0000000000000000 RSI: 00007ffd0b9a9ec0 RDI: 0000000000000003
> [ 894.620959] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000078
> [ 894.620971] R10: 000000000000009b R11: 0000000000000246 R12: 0000000000000001
> [ 894.620983] R13: 00007ffd0b9a9f70 R14: 0000000000000000 R15: 000055784e873040
> [ 894.620997] </TASK>
> [ 894.621004] Modules linked in: bonding(E) tls(E) nft_fib_inet(E)
> nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E)
> nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E)
> nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E)
> nf_defrag_ipv4(E) rfkill(E) ip_set(E) nf_tables(E) libcrc32c(E)
> nfnetlink(E) vfat(E) fat(E) intel_rapl_msr(E) intel_rapl_common(E)
> sb_edac(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E)
> kvm_intel(E) ipmi_ssif(E) iTCO_wdt(E) intel_pmc_bxt(E) kvm(E)
> iTCO_vendor_support(E) rapl(E) ast(E) mei_me(E) intel_cstate(E)
> intel_uncore(E) drm_shmem_helper(E) pcspkr(E) i2c_i801(E) i2c_mux(E)
> drm_kms_helper(E) mei(E) mxm_wmi(E) lpc_ich(E) i2c_smbus(E) ioatdma(E)
> acpi_power_meter(E) ipmi_si(E) acpi_ipmi(E) ipmi_devintf(E)
> ipmi_msghandler(E) joydev(E) acpi_pad(E) drm(E) fuse(E) ext4(E)
> mbcache(E) jbd2(E) sd_mod(E) sg(E) ice(E) ahci(E) crct10dif_pclmul(E)
> crc32_pclmul(E) crc32c_intel(E) libahci(E) polyval_clmulni(E)
> polyval_generic(E) igb(E) libata(E) ghash_clmulni_intel(E)
> [ 894.621056] i2c_algo_bit(E) dca(E) libie(E) wmi(E) dm_mirror(E)
> dm_region_hash(E) dm_log(E) dm_mod(E)
> [ 894.621193] CR2: ffffb5818c2d7f14
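
Although the two stack traces differ, both faults appear to hit the same offset.

Reading the Code: bytes at the trapping instruction (the one in angle brackets):
- The step C trace seems to decode to `movl $0x0,0x34(%r12)`, a write, which matches error_code 0x0002.
- The step D trace seems to decode to `testb $0x1,0x34(%r14)`, a read, which matches 0x0000.
- In both cases, the base register was loaded just before from an array of pointers held at offset 0x38 of the structure in RBX. That structure is the first argument, presumably the Rx ring.

This decoding is only a reading of the bytes and is worth confirming with scripts/decodecode from the kernel tree. The sketch below checks the arithmetic part: CR2 minus the base register should be 0x34 in both traces. The fault_offset helper is hypothetical.

```bash
# Hypothetical helper: print CR2 minus the faulting instruction's base register,
# for the 16-digit hex addresses shown in the oops. Each address is split into
# 32-bit halves so no intermediate value overflows a signed 64-bit shell integer.
fault_offset() {
    local a=${1#0x} b=${2#0x}
    local a_hi=${a%????????} b_hi=${b%????????}
    local a_lo=${a#"$a_hi"} b_lo=${b#"$b_hi"}
    printf '0x%x\n' $(( (0x$a_hi - 0x$b_hi) * 0x100000000 + (0x$a_lo - 0x$b_lo) ))
}

fault_offset 0xffffa3eee1637f14 0xffffa3eee1637ee0   # step C trace: CR2 - R12
fault_offset 0xffffb5818c2d7f14 0xffffb5818c2d7ee0   # step D trace: CR2 - R14
```

Both calls print 0x34. The faulting pointer in each trace therefore looks like a ring entry that no longer points at mapped memory, rather than an out-of-range offset into a valid object.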