Message-ID:
<PA4P194MB10059D2195A387ACD32CA27E86562@PA4P194MB1005.EURP194.PROD.OUTLOOK.COM>
Date: Fri, 1 Nov 2024 12:37:41 +0000
From: Alasdair McWilliam <alasdair.mcwilliam@...look.com>
To: Thorsten Leemhuis <linux@...mhuis.info>,
Maciej Fijalkowski <maciej.fijalkowski@...el.com>
Cc: Magnus Karlsson <magnus.karlsson@...il.com>,
"xdp-newbies@...r.kernel.org" <xdp-newbies@...r.kernel.org>,
Linux kernel regressions list <regressions@...ts.linux.dev>,
Larysa Zaremba <larysa.zaremba@...el.com>,
Jacob Keller <jacob.e.keller@...el.com>, netdev <netdev@...r.kernel.org>
Subject: Re: ICE + XSK ZC - page faults on 6.1 LTS when process exits?
Good day,
On 27/09/2024 12:32, Thorsten Leemhuis wrote:
> [CCing a few people that were involved in mainlining the culprit
> (8adbf5a42341f6e ("ice: remove af_xdp_zc_qps bitmap") in case they want
> to provide advice]
>
> On 13.09.24 17:54, Alasdair McWilliam wrote:
>> On 05/09/2024 13:50, Alasdair McWilliam wrote:
>>
>>>> We've been working recently on somewhat related issues and it looks like
>>>> not every commit from [0] has been backported.
>>>>
>>>> $ git log --oneline v6.1.103..v6.1.104 drivers/net/ethernet/intel/ice/
>>>> 5a80b682e3e1 ice: add missing WRITE_ONCE when clearing ice_rx_ring::xdp_prog
>>>> 8782f0fcb19d ice: replace synchronize_rcu with synchronize_net
>>>> 15115033f056 ice: don't busy wait for Rx queue disable in ice_qp_dis()
>>>> 3dbc58774e58 ice: respect netif readiness in AF_XDP ZC related ndo's
>>>>
>>>> can you apply the rest of it on top of 6.1.107 and see the result?
>>
>>> The first one I've attempted doesn't apply cleanly to 6.1.107.
>>>
>>> Eg: d59227179949 ("ice: modify error handling when setting XSK pool in
>>> ndo_bpf"). The above looks to have been based on code from around 6.8 or
>>> 6.9 where the makeup of routines like ice_qp_ena() has changed. Looks
>>> like this happened around a292ba981324 ("ice: make ice_vsi_cfg_txq()
>>> static").
>>>
>>> Should I try and apply a292ba981324 as well?
>>
>> I just wondered if there was perhaps any further feedback on the above.
>
> Hmmm. No reply afaics -- but that's how it is sometimes with
> stable/longterm kernels series, as mainline developers are not required
> to participate in their development.
>
> Still it would be good to fix the problem. So unless the developers come
> up with plan, it might be best to just revert a62c50545b4d in 6.1.y;
> guess asking Greg to do so might be best way ahead unless some solutions
> comes into sight within a few days.
>
It's been a minute since I've looked at this due to other commitments,
but I accidentally bumped into the fault again when testing the latest
6.6 LTS for a new feature of our software. (I forgot to revert the commit
for "ice: remove af_xdp_zc_qps bitmap" in our build system.)
This led me to check the current version, and I can trigger the same
crash on 6.11.5 [3].
Reverting "ice: remove af_xdp_zc_qps bitmap" [1] in the current mainline
is a little more complicated, as commit ebc33a3f8d0a ("ice: improve
updating ice_{t,r}x_ring::xsk_pool") also changes the same code, so the
revert doesn't apply cleanly.
I have tweaked things a little so that the patch below [2] applies
cleanly to 6.11.5 and 6.12-rc5, and it seems to fix the fault.
Thought I'd bubble this up as it's definitely still an issue in the
mainline kernel as of now.
Thanks
Alasdair
[1] Commit adbf5a42341f6ea038d3626cd4437d9f0ad0b2dd
[2]
https://github.com/OpenSource-THG/kernel-patches/tree/main/2024-11-ice-xskzc-page-fault
[3] 6.11.5 oops
[ 565.069120] BUG: unable to handle page fault for address:
ffffa566707380c4
[ 565.069144] #PF: supervisor read access in kernel mode
[ 565.069155] #PF: error_code(0x0000) - not-present page
[ 565.069167] PGD 100000067 P4D 100000067 PUD 20ef17067 PMD 0
[ 565.069183] Oops: Oops: 0000 [#1] PREEMPT SMP PTI
[ 565.069195] CPU: 7 UID: 0 PID: 6967 Comm: tlndd.bin Kdump: loaded
Tainted: G E
6.11.5-1.thg.836e8867d7.241031.135507.el9.x86_64 #1
[ 565.069220] Tainted: [E]=UNSIGNED_MODULE
[ 565.069228] Hardware name: Supermicro SYS-1028R-TDW/X10DDW-i, BIOS
3.2 12/16/2019
[ 565.069241] RIP: 0010:ice_xsk_clean_rx_ring+0x37/0x110 [ice]
[ 565.069338] Code: 55 53 48 83 ec 08 44 0f b7 af a4 00 00 00 0f b7 af
a2 00 00 00 66 41 39 ed 74 33 48 89 fb 48 8b 4b 38 41 0f b7 c5 4c 8b 34
c1 <41> f6 46 34 01 75 30 4c 89 f7 41 83 c5 01 e8 f6 0c 7e ce 31 c0 66
[ 565.069365] RSP: 0018:ffffa5660f8f36d8 EFLAGS: 00010293
[ 565.069375] RAX: 0000000000000000 RBX: ffff8bb105d38600 RCX:
ffff8bb184930000
[ 565.069387] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
ffff8bb105d38600
[ 565.069400] RBP: 00000000000007ff R08: 000000000000050b R09:
0000000000000000
[ 565.069411] R10: ffff8bb10f910000 R11: 0000000000000020 R12:
0000000000000004
[ 565.069422] R13: 0000000000000000 R14: ffffa56670738090 R15:
ffff8bb1116b5740
[ 565.069434] FS: 00007f677a5d1dc0(0000) GS:ffff8bb85fd80000(0000)
knlGS:0000000000000000
[ 565.069447] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 565.069457] CR2: ffffa566707380c4 CR3: 0000000120164005 CR4:
00000000001706f0
[ 565.069470] Call Trace:
[ 565.069480] <TASK>
[ 565.069489] ? __die+0x20/0x70
[ 565.069504] ? page_fault_oops+0x80/0x150
[ 565.069517] ? exc_page_fault+0xcd/0x170
[ 565.069531] ? asm_exc_page_fault+0x22/0x30
[ 565.069546] ? ice_xsk_clean_rx_ring+0x37/0x110 [ice]
[ 565.069598] ice_clean_rx_ring+0x16e/0x190 [ice]
[ 565.069650] ice_down+0x2f8/0x3c0 [ice]
[ 565.069692] ice_xdp_setup_prog+0x193/0x460 [ice]
[ 565.069734] ice_xdp+0x7a/0xb0 [ice]
[ 565.069774] ? __pfx_ice_xdp+0x10/0x10 [ice]
[ 565.069813] dev_xdp_install+0xc7/0x100
[ 565.069829] dev_xdp_attach+0x205/0x5d0
[ 565.069841] do_setlink+0x7d3/0xc20
[ 565.069853] ? dequeue_skb+0x80/0x4f0
[ 565.069866] ? __nla_validate_parse+0x125/0x1d0
[ 565.069880] __rtnl_newlink+0x4f7/0x630
[ 565.069892] ? __kmalloc_cache_noprof+0x225/0x2b0
[ 565.069905] rtnl_newlink+0x44/0x70
[ 565.069915] rtnetlink_rcv_msg+0x15c/0x410
[ 565.069928] ? avc_has_perm_noaudit+0x67/0xf0
[ 565.069943] ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[ 565.069956] netlink_rcv_skb+0x57/0x100
[ 565.069969] netlink_unicast+0x246/0x370
[ 565.069980] netlink_sendmsg+0x1f6/0x430
[ 565.069991] ____sys_sendmsg+0x3be/0x3f0
[ 565.070003] ? import_iovec+0x16/0x20
[ 565.070015] ? copy_msghdr_from_user+0x6d/0xa0
[ 565.070028] ___sys_sendmsg+0x88/0xd0
[ 565.070038] ? __memcg_slab_free_hook+0xd5/0x120
[ 565.070050] ? __inode_wait_for_writeback+0x7d/0xf0
[ 565.070065] ? mod_objcg_state+0xc9/0x2f0
[ 565.070076] __sys_sendmsg+0x59/0xa0
[ 565.070086] ? syscall_trace_enter+0xfb/0x190
[ 565.070098] do_syscall_64+0x60/0x180
[ 565.070111] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 565.070126] RIP: 0033:0x7f677ab0f94d
[ 565.070136] Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 0a 67
f7 ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f
05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 5e 67 f7 ff 48
[ 565.070164] RSP: 002b:00007ffd1e4f7a60 EFLAGS: 00000293 ORIG_RAX:
000000000000002e
[ 565.070178] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
00007f677ab0f94d
[ 565.070191] RDX: 0000000000000000 RSI: 000000001d698848 RDI:
000000000000000a
[ 565.070203] RBP: 000000001d5350e0 R08: 0000000000000000 R09:
0000000000465f98
[ 565.070215] R10: 0000000000200000 R11: 0000000000000293 R12:
000000001d535110
[ 565.070227] R13: 000000000051d798 R14: 000000001d698830 R15:
000000001d5384b0
[ 565.070240] </TASK>
[ 565.070248] Modules linked in: bonding(E) tls(E) nft_fib_inet(E)
nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E)
nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E)
nft_chain_nat(E) nf_nat(E) nf_conntrack(E)
nf_defrag_ipv6(E) nf_defrag_ipv4(E) rfkill(E) ip_set(E) nf_tables(E)
libcrc32c(E) nfnetlink(E) vfat(E) fat(E) intel_rapl_msr(E)
intel_rapl_common(E) sb_edac(E) x86_pkg_temp_thermal(E)
intel_powerclamp(E) coretemp(E) kvm_intel(E) ipmi_ssif(E)
kvm(E) iTCO_wdt(E) intel_pmc_bxt(E) iTCO_vendor_support(E) rapl(E)
intel_cstate(E) intel_uncore(E) ast(E) i2c_i801(E) pcspkr(E) mei_me(E)
drm_shmem_helper(E) mxm_wmi(E) drm_kms_helper(E) i2c_mux(E) mei(E)
i2c_smbus(E) lpc_ich(E)
ioatdma(E) acpi_power_meter(E) ipmi_si(E) acpi_ipmi(E) joydev(E)
ipmi_devintf(E) ipmi_msghandler(E) acpi_pad(E) drm(E) fuse(E) ext4(E)
mbcache(E) jbd2(E) sd_mod(E) sg(E) ice(E) ahci(E) crct10dif_pclmul(E)
crc32_pclmul(E) crc32c_intel(E)
libahci(E) polyval_clmulni(E) igb(E) polyval_generic(E) libata(E)
ghash_clmulni_intel(E)
[ 565.070304] i2c_algo_bit(E) dca(E) libie(E) wmi(E) dm_mirror(E)
dm_region_hash(E) dm_log(E) dm_mod(E)
[ 565.071430] CR2: ffffa566707380c4
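
As a cross-check of the oops in [3], here is a minimal Python sketch that
hand-decodes the five bytes at the faulting RIP (the ones starting at the
<41> marker in the Code: line) and confirms that the faulting address in
CR2 is R14 plus the instruction's displacement. The helper name
decode_testb_disp8 is hypothetical, and the decoder is an assumption-laden
toy that only understands this one encoding (REX prefix, opcode f6 /0,
8-bit displacement, no SIB byte); it is not a general disassembler.

```python
# Register values copied from the 6.11.5 oops in [3].
R14 = 0xFFFFA56670738090
CR2 = 0xFFFFA566707380C4

# Bytes at the faulting RIP, starting at the <41> marker in the Code: line.
FAULTING_INSN = bytes.fromhex("41f6463401")


def decode_testb_disp8(insn: bytes):
    """Decode 'REX f6 /0 modrm disp8 imm8' (testb $imm8, disp8(%reg)).

    Hypothetical helper: handles only the single encoding seen in the oops.
    Returns (base_register_number, displacement, immediate).
    """
    rex, opcode, modrm, disp8, imm8 = insn
    if (rex & 0xF0) != 0x40 or opcode != 0xF6:
        raise ValueError("not a REX-prefixed f6 instruction")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or reg != 0 or rm == 4:
        raise ValueError("only 'test r/m8, imm8' with disp8 and no SIB is handled")
    base = rm | ((rex & 1) << 3)  # REX.B extends the base register number
    disp = disp8 - 256 if disp8 >= 0x80 else disp8  # disp8 is sign-extended
    return base, disp, imm8


base, disp, imm = decode_testb_disp8(FAULTING_INSN)
print(f"testb ${imm:#x}, {disp:#x}(%r{base})")
print(f"R14 + disp = {R14 + disp:#x}, CR2 = {CR2:#x}")
```

The instruction is a one-byte read at 0x34(%r14), and R14 + 0x34 equals
CR2, so the page fault comes from dereferencing the pointer held in R14.
The preceding bytes (4c 8b 34 c1) load R14 from (%rcx,%rax,8), which
suggests, without proving it, that the pointer was read out of an array
entry that no longer points at mapped memory.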