Message-ID: <2892e61ddc986c0d1ccb86fd3d6309c8a484d158.camel@nvidia.com>
Date: Mon, 22 Jan 2024 10:11:59 +0000
From: Dragos Tatulea <dtatulea@...dia.com>
To: "ian.kumlien@...il.com" <ian.kumlien@...il.com>
CC: Gal Pressman <gal@...dia.com>, Saeed Mahameed <saeedm@...dia.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: Re: [mlx5e] FYI dmesg is filled with mlx5e_page_release_fragmented.isra warnings in 6.6.12

On Thu, 2024-01-18 at 21:54 +0100, Ian Kumlien wrote:
> So this is an L3 OpenStack node. Everything seemed to be working fine
> until the VPNaaS started working (strongSwan in this case, since the
> implementations leave something to be desired).
>
> I have a longer dmesg output that is 6396 lines... Let me know if you
> want it - I assume it's not for a mailing list
>

That would be useful. Please send it privately.
We have a similar report that we're currently looking into.

> The IPsec setup uses xfrm and I assume it triggers offload... The problem is
> that this is a production system and I can't really test on it :/
>
> On Thu, Jan 18, 2024 at 5:22 PM Dragos Tatulea <dtatulea@...dia.com> wrote:
> >
> > On 01/18, Ian Kumlien wrote:
> > > OK, so after about 200 of these, we had a full kernel oops. More
> > > graceful than earlier kernels, but...
> > >
> > > On Thu, Jan 18, 2024 at 4:08 PM Ian Kumlien <ian.kumlien@...il.com> wrote:
> > > >
> > > > [ 1068.937101] ------------[ cut here ]------------
> > > > [ 1068.937977] WARNING: CPU: 0 PID: 0 at
> > > > include/net/page_pool/helpers.h:130
> > > > mlx5e_page_release_fragmented.isra.0+0x46/0x50 [mlx5_core]
> > > > [ 1068.939407] Modules linked in: echainiv(E) esp4(E)
> > > > xfrm_interface(E) xfrm6_tunnel(E) tunnel4(E) tunnel6(E) xt_policy(E)
> > > > xt_physdev(E) xt_nat(E) xt_REDIRECT(E) xt_comment(E) xt_connmark(E)
> > > > xt_mark(E) vxlan(E) ip6_udp_tunnel(E) udp_tunnel(E)
> > > > nfnetlink_cttimeout(E) xt_conntrack(E) nft_chain_nat(E)
> > > > xt_MASQUERADE(E) nf_conntrack_netlink(E) xt_addrtype(E) nft_compat(E)
> > > > nf_tables(E) nfnetlink(E) br_netfilter(E) bridge(E) 8021q(E) garp(E)
> > > > mrp(E) stp(E) llc(E) overlay(E) bonding(E) cfg80211(E) rfkill(E)
> > > > ipmi_ssif(E) intel_rapl_msr(E) intel_rapl_common(E) sb_edac(E)
> > > > x86_pkg_temp_thermal(E) intel_powerclamp(E) vfat(E) fat(E) coretemp(E)
> > > > kvm_intel(E) kvm(E) iTCO_wdt(E) mlx5_ib(E) intel_pmc_bxt(E)
> > > > iTCO_vendor_support(E) acpi_ipmi(E) i2c_algo_bit(E) ipmi_si(E)
> > > > irqbypass(E) ib_uverbs(E) drm_shmem_helper(E) ipmi_devintf(E)
> > > > ioatdma(E) rapl(E) i2c_i801(E) intel_cstate(E) ib_core(E)
> > > > intel_uncore(E) pcspkr(E) drm_kms_helper(E) joydev(E) lpc_ich(E)
> > > > hpilo(E) acpi_tad(E) ipmi_msghandler(E) acpi_power_meter(E) dca(E)
> > > > i2c_smbus(E) xfs(E)
> > > > [ 1068.939782] drm(E) openvswitch(E) nf_conncount(E) nf_nat(E)
> > > > ext4(E) mbcache(E) jbd2(E) mlx5_core(E) sd_mod(E) t10_pi(E) sg(E)
> > > > crct10dif_pclmul(E) crc32_pclmul(E) polyval_clmulni(E)
> > > > polyval_generic(E) serio_raw(E) ghash_clmulni_intel(E) mlxfw(E) tg3(E)
> > > > hpsa(E) tls(E) hpwdt(E) scsi_transport_sas(E) psample(E) wmi(E)
> > > > pci_hyperv_intf(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
> > > > nf_conntrack(E) libcrc32c(E) crc32c_intel(E) nf_defrag_ipv6(E)
> > > > nf_defrag_ipv4(E) ip6_tables(E) fuse(E)
> > > > [ 1068.947864] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Tainted: G
> > > > W E 6.6.12-1.el9.elrepo.x86_64 #1
> > > > [ 1068.949014] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360
> > > > Gen9, BIOS P89 11/23/2021
> > > > [ 1068.949552] RIP:
> > > > 0010:mlx5e_page_release_fragmented.isra.0+0x46/0x50 [mlx5_core]
> > > > [ 1068.951033] Code: f7 da f0 48 0f c1 56 28 48 39 c2 78 1d 74 05 c3
> > > > cc cc cc cc 48 8b bf 60 04 00 00 b9 01 00 00 00 ba ff ff ff ff e9 da
> > > > f7 f3 da <0f> 0b c3 cc cc cc cc 0f 1f 00 90 90 90 90 90 90 90 90 90 90
> > > > 90 90
> > > > [ 1068.952632] RSP: 0018:ffffb3a800003df0 EFLAGS: 00010297
> > > > [ 1068.953301] RAX: 000000000000003d RBX: ffff987f51b78000 RCX: 0000000000000050
> > > > [ 1068.954279] RDX: 0000000000000000 RSI: ffffdb5246508580 RDI: ffff987f51b78000
> > > > [ 1068.955358] RBP: ffff987fcdb0b540 R08: 0000000000000006 R09: ffff988ec44830c0
> > > > [ 1068.957674] R10: 0000000000000000 R11: ffff987fcab77040 R12: 0000000000000040
> > > > [ 1068.958669] R13: 0000000000000040 R14: ffff987fcdb0b168 R15: 000000000000003c
> > > > [ 1068.959828] FS: 0000000000000000(0000) GS:ffff988ebfc00000(0000)
> > > > knlGS:0000000000000000
> > > > [ 1068.960466] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [ 1068.961350] CR2: 00007f173925a4e0 CR3: 0000001067a1e006 CR4: 00000000001706f0
> > > > [ 1068.962230] Call Trace:
> > > > [ 1068.962478] <IRQ>
> > > > [ 1068.963055] ? __warn+0x80/0x130
> > > > [ 1068.963073] ? mlx5e_page_release_fragmented.isra.0+0x46/0x50 [mlx5_core]
> > > > [ 1068.964275] ? report_bug+0x1c3/0x1d0
> > > > [ 1068.964585] ? handle_bug+0x42/0x70
> > > > [ 1068.965228] ? exc_invalid_op+0x14/0x70
> > > > [ 1068.965538] ? asm_exc_invalid_op+0x16/0x20
> > > > [ 1068.965854] ? mlx5e_page_release_fragmented.isra.0+0x46/0x50 [mlx5_core]
> > > > [ 1068.966518] mlx5e_free_rx_mpwqe+0x18e/0x1c0 [mlx5_core]
> > > > [ 1068.967221] mlx5e_post_rx_mpwqes+0x1a5/0x280 [mlx5_core]
> > > > [ 1068.967810] mlx5e_napi_poll+0x143/0x710 [mlx5_core]
> > > > [ 1068.968416] ? __netif_receive_skb_one_core+0x92/0xa0
> > > > [ 1068.968799] __napi_poll+0x2c/0x1b0
> > > > [ 1068.970066] net_rx_action+0x2a7/0x370
> > > > [ 1068.971012] ? mlx5_cq_tasklet_cb+0x78/0x180 [mlx5_core]
> > > > [ 1068.971683] __do_softirq+0xf0/0x2ee
> > > > [ 1068.972002] __irq_exit_rcu+0x83/0xf0
> > > > [ 1068.972338] common_interrupt+0xb8/0xd0
> > > > [ 1068.972738] </IRQ>
> > > > [ 1068.973324] <TASK>
> > > > [ 1068.974019] asm_common_interrupt+0x22/0x40
> > > > [ 1068.974412] RIP: 0010:cpuidle_enter_state+0xc8/0x430
> > > > [ 1068.974787] Code: 0e c0 47 ff e8 99 f0 ff ff 8b 53 04 49 89 c5 0f
> > > > 1f 44 00 00 31 ff e8 87 99 46 ff 45 84 ff 0f 85 3f 02 00 00 fb 0f 1f
> > > > 44 00 00 <45> 85 f6 0f 88 6e 01 00 00 49 63 d6 4c 2b 2c 24 48 8d 04 52
> > > > 48 8d
> > > > [ 1068.976473] RSP: 0018:ffffffff9ca03e48 EFLAGS: 00000246
> > > > [ 1068.976872] RAX: ffff988ebfc00000 RBX: ffff988ebfc3da78 RCX: 000000000000001f
> > > > [ 1068.977869] RDX: 0000000000000000 RSI: ffffffff9c30e0ff RDI: ffffffff9c2e82f0
> > > > [ 1068.978824] RBP: 0000000000000004 R08: 000000f8e18f1bef R09: 0000000000000018
> > > > [ 1068.979802] R10: 0000000000009441 R11: ffff988ebfc317e4 R12: ffffffff9ceaf6c0
> > > > [ 1068.980841] R13: 000000f8e18f1bef R14: 0000000000000004 R15: 0000000000000000
> > > > [ 1068.981801] ? cpuidle_enter_state+0xb9/0x430
> > > > [ 1068.982669] cpuidle_enter+0x29/0x40
> > > > [ 1068.983003] cpuidle_idle_call+0x10a/0x170
> > > > [ 1068.983349] do_idle+0x7e/0xe0
> > > > [ 1068.984015] cpu_startup_entry+0x26/0x30
> > > > [ 1068.984333] rest_init+0xcd/0xd0
> > > > [ 1068.985008] arch_call_rest_init+0xa/0x30
> > > > [ 1068.985326] start_kernel+0x332/0x410
> > > > [ 1068.985628] x86_64_start_reservations+0x14/0x30
> > > > [ 1068.986337] x86_64_start_kernel+0x8e/0x90
> > > > [ 1068.986653] secondary_startup_64_no_verify+0x18f/0x19b
> > > > [ 1068.987068] </TASK>
> > > > [ 1068.987305] ---[ end trace 0000000000000000 ]---
> > >
> >
> > Thanks for the report. We got another similar report recently which we
> > don't see internally.
> >
> > Do you know the last known working kernel version?
> >
> > Could you describe the configuration and the reproduction steps?
> >
> > Thanks,
> > Dragos