Message-ID: <ZztOKRQ1ap5weI9E@lzaremba-mobl.ger.corp.intel.com>
Date: Mon, 18 Nov 2024 15:24:41 +0100
From: Larysa Zaremba <larysa.zaremba@...el.com>
To: Alasdair McWilliam <alasdair.mcwilliam@...look.com>
CC: Thorsten Leemhuis <linux@...mhuis.info>, Maciej Fijalkowski
	<maciej.fijalkowski@...el.com>, Magnus Karlsson <magnus.karlsson@...il.com>,
	"xdp-newbies@...r.kernel.org" <xdp-newbies@...r.kernel.org>, "Linux kernel
 regressions list" <regressions@...ts.linux.dev>, Jacob Keller
	<jacob.e.keller@...el.com>, netdev <netdev@...r.kernel.org>
Subject: Re: ICE + XSK ZC - page faults on 6.1 LTS when process exits?

On Mon, Nov 04, 2024 at 12:18:07PM +0000, Alasdair McWilliam wrote:
> On 04/11/2024 07:11, Larysa Zaremba wrote:
> 
> >> It's been a minute since I've looked at this due to other commitments
> >> but accidentally bumped into the fault again when testing the latest 6.6
> >> LTS for a new feature of our software. (I forgot to revert the commit
> >> for "ice: remove af_xdp_zc_qps bitmap" in our build system.)
> >>
> >> This led me to wonder about the current version, and I can trigger the
> >> same crash on 6.11.5 [3].
> >>
> >> Reverting "ice: remove af_xdp_zc_qps bitmap" [1] in the current mainline
> >> is a little more complicated as commit ebc33a3f8d0a ("ice: improve
> >> updating ice_{t,r}x_ring::xsk_pool") also changes things a little so the
> >> reversion doesn't work cleanly.
> >>
> >> I have tweaked everything a little so that the patch below [2] applies
> >> cleanly to 6.11.5 and 6.12-rc5, and it seems to fix the fault.
> >>
> >> Thought I'd bubble this up as it's definitely still an issue in the
> >> mainline kernel as of now.
> >>
> >> Thanks
> >> Alasdair
> >>
> > 
> > Hello,
> > Could you please share the reproduction steps? I will look into this.
> 
> Hello,
> 
> I should probably have provided better steps to reproduce - apologies.
> 
> Our stack uses AF_XDP in zero copy mode with shared UMEM between XSK
> sockets.

Thanks! Just letting you and anyone interested know that I was able to reliably
reproduce the issue and have found the root cause. Hopefully, I will be able to
send the exact fix soon.

> 
> To isolate other bugs in the past we've used a modified xdpsock app
> based on code previously in kernel samples. The original sample has
> since been taken out of the kernel repo, but we maintained the modified
> version in our public repos here [1].
> 
> There's lots in the readme but suffice to say if you run build.sh
> with bash, it will compile the xdpsock_multi user-space app and the
> accompanying xdpsock_multi.bpf eBPF app. You'll also need the necessary
> dependencies (libxdp, libbpf et al.).
> 
> I can reproduce the issue with this app using 8 channels. It can fault
> in two ways (step C or D) below.
> 
> Terminal 1:
> 
> A# ethtool -L <nic> combined 8
> B# ./xdpsock_multi --l2fwd --interface ice1_1 --zero-copy --channels 8
> 
> Terminal 2:
> 
> C# kill -9 $(pidof xdpsock_multi)
> D# ip link set dev <nic> xdp off
> 
> Sometimes the act of killing the process (step C) causes a kernel crash [2].
> 
> Other times, it may survive, leaving an orphaned XDP program attached to
> the NIC. Unloading this manually (step D) causes a kernel crash [3].
> 
> The stack traces are different, hence I've provided both.
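
For anyone scripting the reproduction, steps A to D above could be collected roughly as follows. This is only a sketch: the helper name `repro_steps` and its parameters are hypothetical, it assumes `<nic>` and `ice1_1` refer to the same interface, and it only builds the command strings (taken from the steps above) without running anything.

```python
import shlex


def repro_steps(nic, channels=8, binary="./xdpsock_multi"):
    """Build the shell commands for steps A-D (hypothetical helper).

    Nothing is executed here: the commands need root, an E810 NIC and the
    xdpsock_multi binary, so they are returned as strings to be run by hand.
    """
    q = shlex.quote(nic)
    return {
        # Terminal 1: set the channel count, then start the forwarder.
        "A": f"ethtool -L {q} combined {channels}",
        "B": f"{binary} --l2fwd --interface {q} --zero-copy --channels {channels}",
        # Terminal 2: kill the process; if the kernel survives, detach XDP.
        "C": "kill -9 $(pidof xdpsock_multi)",
        "D": f"ip link set dev {q} xdp off",
    }


if __name__ == "__main__":
    for step, cmd in repro_steps("ice1_1").items():
        print(f"{step}# {cmd}")
```

Step D only matters when step C has not already crashed the kernel.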
> 
> Affects:
> 6.1.x
> 6.6.x
> 6.11.x
> 
> Hardware is E810-CQDA2
> Firmware is 3.20 0x8000d83e 1.3146.0
> 
> Let me know if you need anything further.
> 
> Thanks!
> Alasdair
> 
> 
> [1] https://github.com/OpenSource-THG/xdpsock-sample
> 
> [2] Kernel crash triggered by step C
> 
> [  220.921136] BUG: unable to handle page fault for address:
> ffffa3eee1637f14
> [  220.921175] #PF: supervisor write access in kernel mode
> [  220.921196] #PF: error_code(0x0002) - not-present page
> [  220.921217] PGD 100000067 P4D 100000067 PUD 100238067 PMD 0
> [  220.921244] Oops: Oops: 0002 [#1] PREEMPT SMP PTI
> [  220.921267] CPU: 5 UID: 0 PID: 0 Comm: swapper/5 Kdump: loaded
> Tainted: G            E
> 6.11.5-1.thg.836e8867d7.241031.135507.el9.x86_64 #1
> [  220.921315] Tainted: [E]=UNSIGNED_MODULE
> [  220.921331] Hardware name: Supermicro SYS-1028R-TDW/X10DDW-i, BIOS
> 3.2 12/16/2019
> [  220.921357] RIP: 0010:ice_clean_rx_irq_zc+0xde/0x7d0 [ice]
> [  220.921489] Code: 0f 84 d0 01 00 00 44 3b 7c 24 08 0f 84 a1 02 00 00
> 48 8b 53 38 41 0f b7 4d 04 4c 8b 24 c2 89 c8 81 e1 ff 3f 00 00 66 25 ff
> 3f <41> c7 44 24 34 00 00 00 00 49 8b 74 24 18 48 8d 96 00 01 00 00 49
> [  220.921518] RSP: 0018:ffffa3eec64d0d88 EFLAGS: 00010206
> [  220.921529] RAX: 000000000000014d RBX: ffff89bbc2aa2a00 RCX:
> 000000000000014d
> [  220.921542] RDX: ffff89b408830000 RSI: 0000000000000040 RDI:
> ffff89bbc2aa2a00
> [  220.921554] RBP: 0000000000000000 R08: 0000000000000000 R09:
> ffff89b407655000
> [  220.921566] R10: 0000ffffffffffff R11: ffffa3eec64d0ff8 R12:
> ffffa3eee1637ee0
> [  220.921578] R13: ffff89b414710000 R14: ffff89bbc7919500 R15:
> 0000000000000000
> [  220.921591] FS:  0000000000000000(0000) GS:ffff89bb5fc80000(0000)
> knlGS:0000000000000000
> [  220.921605] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  220.921616] CR2: ffffa3eee1637f14 CR3: 00000001d9820006 CR4:
> 00000000001706f0
> [  220.921628] Call Trace:
> [  220.921639]  <IRQ>
> [  220.921647]  ? __die+0x20/0x70
> [  220.921663]  ? page_fault_oops+0x80/0x150
> [  220.921676]  ? exc_page_fault+0xcd/0x170
> [  220.921690]  ? asm_exc_page_fault+0x22/0x30
> [  220.921707]  ? ice_clean_rx_irq_zc+0xde/0x7d0 [ice]
> [  220.921759]  ? ice_clean_tx_irq+0x166/0x3c0 [ice]
> [  220.921808]  ice_napi_poll+0xb2/0x2a0 [ice]
> [  220.921858]  __napi_poll+0x2c/0x1b0
> [  220.921870]  net_rx_action+0x30d/0x3e0
> [  220.921881]  ? __raise_softirq_irqoff+0x18/0x80
> [  220.921896]  ? __napi_schedule+0xa6/0xc0
> [  220.921907]  ? ice_msix_clean_rings+0x4f/0x60 [ice]
> [  220.921959]  handle_softirqs+0xf0/0x2e0
> [  220.921972]  __irq_exit_rcu+0x80/0xe0
> [  220.921983]  common_interrupt+0xb7/0xd0
> [  220.921995]  </IRQ>
> [  220.922001]  <TASK>
> [  220.922008]  asm_common_interrupt+0x22/0x40
> [  220.922022] RIP: 0010:cpuidle_enter_state+0xc8/0x420
> [  220.922034] Code: 0e b6 3e ff e8 09 ee ff ff 8b 55 04 49 89 c5 0f 1f
> 44 00 00 31 ff e8 97 69 3d ff 45 84 ff 0f 85 38 02 00 00 fb 0f 1f 44 00
> 00 <45> 85 f6 0f 88 6a 01 00 00 49 63 d6 4c 2b 2c 24 48 8d 04 52 48 8d
> [  220.922061] RSP: 0018:ffffa3eec4377e78 EFLAGS: 00000246
> [  220.922072] RAX: ffff89bb5fc80000 RBX: 0000000000000004 RCX:
> 000000000000001f
> [  220.922085] RDX: 0000000000000005 RSI: ffffffffb255a8a3 RDI:
> ffffffffb2533173
> [  220.922098] RBP: ffff89bb5fcc0cc8 R08: 000000336fecb8ce R09:
> 0000000000000018
> [  220.922109] R10: 000000000000453f R11: ffff89bb5fcb47e4 R12:
> ffffffffb32bdce0
> [  220.922121] R13: 000000336fecb8ce R14: 0000000000000004 R15:
> 0000000000000000
> [  220.922135]  ? cpuidle_enter_state+0xb9/0x420
> [  220.922147]  cpuidle_enter+0x29/0x40
> [  220.922161]  cpuidle_idle_call+0x100/0x170
> [  220.922175]  do_idle+0x7d/0xd0
> [  220.922185]  cpu_startup_entry+0x25/0x30
> [  220.922195]  start_secondary+0x116/0x140
> [  220.922206]  common_startup_64+0x13e/0x141
> [  220.922222]  </TASK>
> [  220.922229] Modules linked in: bonding(E) tls(E) nft_fib_inet(E)
> nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E)
> nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E)
> nft_chain_nat(E) nf_nat(E) nf_conntrack(E)
> nf_defrag_ipv6(E) nf_defrag_ipv4(E) rfkill(E) ip_set(E)
> nf_tables(E) libcrc32c(E) nfnetlink(E) vfat(E) fat(E) intel_rapl_msr(E)
> intel_rapl_common(E) sb_edac(E) x86_pkg_temp_thermal(E)
> intel_powerclamp(E) coretemp(E) kvm_intel(E)
> ipmi_ssif(E) kvm(E) iTCO_wdt(E) intel_pmc_bxt(E)
> iTCO_vendor_support(E) rapl(E) intel_cstate(E) ast(E) intel_uncore(E)
> drm_shmem_helper(E) pcspkr(E) drm_kms_helper(E) i2c_i801(E) mei_me(E)
> i2c_mux(E) mxm_wmi(E) mei(E)
> i2c_smbus(E) lpc_ich(E) ioatdma(E) acpi_power_meter(E) ipmi_si(E)
> acpi_ipmi(E) ipmi_devintf(E) ipmi_msghandler(E) joydev(E) acpi_pad(E)
> drm(E) fuse(E) ext4(E) mbcache(E) jbd2(E) sd_mod(E) sg(E) ice(E) ahci(E)
> crct10dif_pclmul(E) crc32_pclmul(E) libahci(E) crc32c_intel(E)
> polyval_clmulni(E) polyval_generic(E) igb(E) libata(E)
> ghash_clmulni_intel(E)
> [  220.922280]  i2c_algo_bit(E) dca(E) libie(E) wmi(E) dm_mirror(E)
> dm_region_hash(E) dm_log(E) dm_mod(E)
> [  220.922416] CR2: ffffa3eee1637f14
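
As a reading aid for the `#PF: error_code(...)` lines in the two traces, here is a minimal sketch that decodes the low bits of an x86 page-fault error code. The bit meanings are the architectural ones (the kernel calls them X86_PF_PROT, X86_PF_WRITE, X86_PF_USER, X86_PF_RSVD and X86_PF_INSTR); the function name and table layout are made up for illustration.

```python
# (mask, meaning when set, meaning when clear); None = say nothing when clear.
PF_BITS = [
    (0x01, "protection violation", "not-present page"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel (supervisor) mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error_code(code):
    """Return human-readable descriptions for an x86 page-fault error code."""
    parts = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            parts.append(if_set)
        elif if_clear is not None:
            parts.append(if_clear)
    return parts


if __name__ == "__main__":
    # 0x0002 is reported for step C, 0x0000 for step D.
    for code in (0x0002, 0x0000):
        print(f"{code:#06x}: {', '.join(decode_pf_error_code(code))}")
```

Both codes decode to a kernel-mode access to a not-present page. They differ only in direction: the step C fault is a write, the step D fault is a read.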
> 
> [3] Kernel crash triggered by step D
> 
> [  894.619896] BUG: unable to handle page fault for address:
> ffffb5818c2d7f14
> [  894.619921] #PF: supervisor read access in kernel mode
> [  894.619932] #PF: error_code(0x0000) - not-present page
> [  894.619942] PGD 100000067 P4D 100000067 PUD 100237067 PMD 0
> [  894.619957] Oops: Oops: 0000 [#1] PREEMPT SMP PTI
> [  894.619970] CPU: 5 UID: 0 PID: 2540 Comm: ip Kdump: loaded Tainted: G
>            E      6.11.5-1.thg.836e8867d7.241031.135507.el9.x86_64 #1
> [  894.619994] Tainted: [E]=UNSIGNED_MODULE
> [  894.620002] Hardware name: Supermicro SYS-1028R-TDW/X10DDW-i, BIOS
> 3.2 12/16/2019
> [  894.620014] RIP: 0010:ice_xsk_clean_rx_ring+0x37/0x110 [ice]
> [  894.620086] Code: 55 53 48 83 ec 08 44 0f b7 af a4 00 00 00 0f b7 af
> a2 00 00 00 66 41 39 ed 74 33 48 89 fb 48 8b 4b 38 41 0f b7 c5 4c 8b 34
> c1 <41> f6 46 34 01 75 30 4c 89 f7 41 83 c5 01 e8 f6 5c c6 da 31 c0 66
> [  894.620113] RSP: 0018:ffffb58189c376d8 EFLAGS: 00010293
> [  894.620124] RAX: 0000000000000000 RBX: ffff92f681f6b800 RCX:
> ffff9302f2860000
> [  894.620136] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> ffff92f681f6b800
> [  894.620148] RBP: 00000000000007ff R08: 000000000000081f R09:
> 0000000000000000
> [  894.620159] R10: ffff92f684dc0000 R11: 0000000000000020 R12:
> 0000000000000010
> [  894.620171] R13: 0000000000000000 R14: ffffb5818c2d7ee0 R15:
> ffff92f681fcd740
> [  894.620183] FS:  00007f7ee9e27740(0000) GS:ffff92fd9fc80000(0000)
> knlGS:0000000000000000
> [  894.620196] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  894.620206] CR2: ffffb5818c2d7f14 CR3: 000000010e25e003 CR4:
> 00000000001706f0
> [  894.620218] Call Trace:
> [  894.620228]  <TASK>
> [  894.620236]  ? __die+0x20/0x70
> [  894.620254]  ? page_fault_oops+0x80/0x150
> [  894.620268]  ? exc_page_fault+0xcd/0x170
> [  894.620283]  ? asm_exc_page_fault+0x22/0x30
> [  894.620298]  ? ice_xsk_clean_rx_ring+0x37/0x110 [ice]
> [  894.620350]  ice_clean_rx_ring+0x16e/0x190 [ice]
> [  894.620401]  ice_down+0x2f8/0x3c0 [ice]
> [  894.620443]  ice_xdp_setup_prog+0x193/0x460 [ice]
> [  894.620485]  ice_xdp+0x7a/0xb0 [ice]
> [  894.620527]  ? __pfx_ice_xdp+0x10/0x10 [ice]
> [  894.620567]  dev_xdp_install+0xc7/0x100
> [  894.620584]  dev_xdp_attach+0x205/0x5d0
> [  894.620597]  do_setlink+0x7d3/0xc20
> [  894.620611]  ? __nla_validate_parse+0x125/0x1d0
> [  894.620626]  __rtnl_newlink+0x4f7/0x630
> [  894.620639]  ? __kmalloc_cache_noprof+0x225/0x2b0
> [  894.620652]  rtnl_newlink+0x44/0x70
> [  894.620662]  rtnetlink_rcv_msg+0x15c/0x410
> [  894.620676]  ? __rmqueue_pcplist+0x5f/0x2c0
> [  894.620686]  ? __rmqueue_pcplist+0x5f/0x2c0
> [  894.620695]  ? avc_has_perm_noaudit+0x67/0xf0
> [  894.620708]  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
> [  894.620721]  netlink_rcv_skb+0x57/0x100
> [  894.620735]  netlink_unicast+0x246/0x370
> [  894.620747]  netlink_sendmsg+0x1f6/0x430
> [  894.620758]  ____sys_sendmsg+0x3be/0x3f0
> [  894.620771]  ? import_iovec+0x16/0x20
> [  894.620783]  ? copy_msghdr_from_user+0x6d/0xa0
> [  894.620795]  ___sys_sendmsg+0x88/0xd0
> [  894.620807]  ? __mod_memcg_lruvec_state+0xce/0x1c0
> [  894.620822]  ? mod_objcg_state+0xc9/0x2f0
> [  894.620833]  __sys_sendmsg+0x59/0xa0
> [  894.620844]  ? syscall_trace_enter+0xfb/0x190
> [  894.620856]  do_syscall_64+0x60/0x180
> [  894.620867]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  894.620881] RIP: 0033:0x7f7ee9d0f917
> [  894.620891] Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f
> 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f
> 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
> [  894.620920] RSP: 002b:00007ffd0b9a9e58 EFLAGS: 00000246 ORIG_RAX:
> 000000000000002e
> [  894.620935] RAX: ffffffffffffffda RBX: 000000006728b03b RCX:
> 00007f7ee9d0f917
> [  894.620948] RDX: 0000000000000000 RSI: 00007ffd0b9a9ec0 RDI:
> 0000000000000003
> [  894.620959] RBP: 0000000000000000 R08: 0000000000000001 R09:
> 0000000000000078
> [  894.620971] R10: 000000000000009b R11: 0000000000000246 R12:
> 0000000000000001
> [  894.620983] R13: 00007ffd0b9a9f70 R14: 0000000000000000 R15:
> 000055784e873040
> [  894.620997]  </TASK>
> [  894.621004] Modules linked in: bonding(E) tls(E) nft_fib_inet(E)
> nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E)
> nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E)
> nft_chain_nat(E) nf_nat(E) nf_conntrack(E)
> nf_defrag_ipv6(E) nf_defrag_ipv4(E) rfkill(E) ip_set(E)
> nf_tables(E) libcrc32c(E) nfnetlink(E) vfat(E) fat(E) intel_rapl_msr(E)
> intel_rapl_common(E) sb_edac(E) x86_pkg_temp_thermal(E)
> intel_powerclamp(E) coretemp(E) kvm_intel(E)
> ipmi_ssif(E) iTCO_wdt(E) intel_pmc_bxt(E) kvm(E)
> iTCO_vendor_support(E) rapl(E) ast(E) mei_me(E) intel_cstate(E)
> intel_uncore(E) drm_shmem_helper(E) pcspkr(E) i2c_i801(E) i2c_mux(E)
> drm_kms_helper(E) mei(E) mxm_wmi(E)
> lpc_ich(E) i2c_smbus(E) ioatdma(E) acpi_power_meter(E) ipmi_si(E)
> acpi_ipmi(E) ipmi_devintf(E) ipmi_msghandler(E) joydev(E) acpi_pad(E)
> drm(E) fuse(E) ext4(E) mbcache(E) jbd2(E) sd_mod(E) sg(E) ice(E) ahci(E)
> crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) libahci(E)
> polyval_clmulni(E) polyval_generic(E) igb(E) libata(E)
> ghash_clmulni_intel(E)
> [  894.621056]  i2c_algo_bit(E) dca(E) libie(E) wmi(E) dm_mirror(E)
> dm_region_hash(E) dm_log(E) dm_mod(E)
> [  894.621193] CR2: ffffb5818c2d7f14
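
One detail that can be checked directly from the register dumps: in both traces the faulting address (CR2) sits a small fixed distance above a pointer held in a register (R12 for step C, R14 for step D). The sketch below does only that arithmetic. The values are copied from the traces above; pairing CR2 with those particular registers is an assumption based on the `Code:` bytes, which appear to encode a 0x34 displacement off R12 and R14 respectively, and the helper name is made up.

```python
def fault_offset(cr2, base):
    """Distance from a base pointer register to the faulting address."""
    return cr2 - base


# (CR2, base register value) copied from the two oops reports.
TRACES = {
    "step C, ice_clean_rx_irq_zc (R12)": (0xFFFFA3EEE1637F14, 0xFFFFA3EEE1637EE0),
    "step D, ice_xsk_clean_rx_ring (R14)": (0xFFFFB5818C2D7F14, 0xFFFFB5818C2D7EE0),
}

if __name__ == "__main__":
    for name, (cr2, base) in TRACES.items():
        print(f"{name}: CR2 - base = {fault_offset(cr2, base):#x}")
```

The offset is 0x34 in both cases. That would be consistent with both paths touching the same field of an object whose pointer is no longer backed by mapped memory, though the traces alone do not prove it.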
