Message-ID: <uxlqaq25tft55nwyfueyj7g5co2lva2j5qnbsijwazxr2ld4l4@uqhiteuyduhd>
Date: Thu, 18 Jan 2024 17:22:30 +0100
From: Dragos Tatulea <dtatulea@...dia.com>
To: Ian Kumlien <ian.kumlien@...il.com>
Cc: Linux Kernel Network Developers <netdev@...r.kernel.org>,
saeedm@...dia.com, gal@...dia.com
Subject: Re: Re: [mlx5e] FYI dmesg is filled with
mlx5e_page_release_fragmented.isra warnings in 6.6.12
On 01/18, Ian Kumlien wrote:
> ok, so after about 200 of these, we had a full kernel oops. more
> graceful than earlier kernels but...
>
> On Thu, Jan 18, 2024 at 4:08 PM Ian Kumlien <ian.kumlien@...il.com> wrote:
> >
> > [ 1068.937101] ------------[ cut here ]------------
> > [ 1068.937977] WARNING: CPU: 0 PID: 0 at
> > include/net/page_pool/helpers.h:130
> > mlx5e_page_release_fragmented.isra.0+0x46/0x50 [mlx5_core]
> > [ 1068.939407] Modules linked in: echainiv(E) esp4(E)
> > xfrm_interface(E) xfrm6_tunnel(E) tunnel4(E) tunnel6(E) xt_policy(E)
> > xt_physdev(E) xt_nat(E) xt_REDIRECT(E) xt_comment(E) xt_connmark(E)
> > xt_mark(E) vxlan(E) ip6_udp_tunnel(E) udp_tunnel(E)
> > nfnetlink_cttimeout(E) xt_conntrack(E) nft_chain_nat(E)
> > xt_MASQUERADE(E) nf_conntrack_netlink(E) xt_addrtype(E) nft_compat(E)
> > nf_tables(E) nfnetlink(E) br_netfilter(E) bridge(E) 8021q(E) garp(E)
> > mrp(E) stp(E) llc(E) overlay(E) bonding(E) cfg80211(E) rfkill(E)
> > ipmi_ssif(E) intel_rapl_msr(E) intel_rapl_common(E) sb_edac(E)
> > x86_pkg_temp_thermal(E) intel_powerclamp(E) vfat(E) fat(E) coretemp(E)
> > kvm_intel(E) kvm(E) iTCO_wdt(E) mlx5_ib(E) intel_pmc_bxt(E)
> > iTCO_vendor_support(E) acpi_ipmi(E) i2c_algo_bit(E) ipmi_si(E)
> > irqbypass(E) ib_uverbs(E) drm_shmem_helper(E) ipmi_devintf(E)
> > ioatdma(E) rapl(E) i2c_i801(E) intel_cstate(E) ib_core(E)
> > intel_uncore(E) pcspkr(E) drm_kms_helper(E) joydev(E) lpc_ich(E)
> > hpilo(E) acpi_tad(E) ipmi_msghandler(E) acpi_power_meter(E) dca(E)
> > i2c_smbus(E) xfs(E)
> > [ 1068.939782] drm(E) openvswitch(E) nf_conncount(E) nf_nat(E)
> > ext4(E) mbcache(E) jbd2(E) mlx5_core(E) sd_mod(E) t10_pi(E) sg(E)
> > crct10dif_pclmul(E) crc32_pclmul(E) polyval_clmulni(E)
> > polyval_generic(E) serio_raw(E) ghash_clmulni_intel(E) mlxfw(E) tg3(E)
> > hpsa(E) tls(E) hpwdt(E) scsi_transport_sas(E) psample(E) wmi(E)
> > pci_hyperv_intf(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
> > nf_conntrack(E) libcrc32c(E) crc32c_intel(E) nf_defrag_ipv6(E)
> > nf_defrag_ipv4(E) ip6_tables(E) fuse(E)
> > [ 1068.947864] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Tainted: G
> > W E 6.6.12-1.el9.elrepo.x86_64 #1
> > [ 1068.949014] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360
> > Gen9, BIOS P89 11/23/2021
> > [ 1068.949552] RIP:
> > 0010:mlx5e_page_release_fragmented.isra.0+0x46/0x50 [mlx5_core]
> > [ 1068.951033] Code: f7 da f0 48 0f c1 56 28 48 39 c2 78 1d 74 05 c3
> > cc cc cc cc 48 8b bf 60 04 00 00 b9 01 00 00 00 ba ff ff ff ff e9 da
> > f7 f3 da <0f> 0b c3 cc cc cc cc 0f 1f 00 90 90 90 90 90 90 90 90 90 90
> > 90 90
> > [ 1068.952632] RSP: 0018:ffffb3a800003df0 EFLAGS: 00010297
> > [ 1068.953301] RAX: 000000000000003d RBX: ffff987f51b78000 RCX: 0000000000000050
> > [ 1068.954279] RDX: 0000000000000000 RSI: ffffdb5246508580 RDI: ffff987f51b78000
> > [ 1068.955358] RBP: ffff987fcdb0b540 R08: 0000000000000006 R09: ffff988ec44830c0
> > [ 1068.957674] R10: 0000000000000000 R11: ffff987fcab77040 R12: 0000000000000040
> > [ 1068.958669] R13: 0000000000000040 R14: ffff987fcdb0b168 R15: 000000000000003c
> > [ 1068.959828] FS: 0000000000000000(0000) GS:ffff988ebfc00000(0000)
> > knlGS:0000000000000000
> > [ 1068.960466] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 1068.961350] CR2: 00007f173925a4e0 CR3: 0000001067a1e006 CR4: 00000000001706f0
> > [ 1068.962230] Call Trace:
> > [ 1068.962478] <IRQ>
> > [ 1068.963055] ? __warn+0x80/0x130
> > [ 1068.963073] ? mlx5e_page_release_fragmented.isra.0+0x46/0x50 [mlx5_core]
> > [ 1068.964275] ? report_bug+0x1c3/0x1d0
> > [ 1068.964585] ? handle_bug+0x42/0x70
> > [ 1068.965228] ? exc_invalid_op+0x14/0x70
> > [ 1068.965538] ? asm_exc_invalid_op+0x16/0x20
> > [ 1068.965854] ? mlx5e_page_release_fragmented.isra.0+0x46/0x50 [mlx5_core]
> > [ 1068.966518] mlx5e_free_rx_mpwqe+0x18e/0x1c0 [mlx5_core]
> > [ 1068.967221] mlx5e_post_rx_mpwqes+0x1a5/0x280 [mlx5_core]
> > [ 1068.967810] mlx5e_napi_poll+0x143/0x710 [mlx5_core]
> > [ 1068.968416] ? __netif_receive_skb_one_core+0x92/0xa0
> > [ 1068.968799] __napi_poll+0x2c/0x1b0
> > [ 1068.970066] net_rx_action+0x2a7/0x370
> > [ 1068.971012] ? mlx5_cq_tasklet_cb+0x78/0x180 [mlx5_core]
> > [ 1068.971683] __do_softirq+0xf0/0x2ee
> > [ 1068.972002] __irq_exit_rcu+0x83/0xf0
> > [ 1068.972338] common_interrupt+0xb8/0xd0
> > [ 1068.972738] </IRQ>
> > [ 1068.973324] <TASK>
> > [ 1068.974019] asm_common_interrupt+0x22/0x40
> > [ 1068.974412] RIP: 0010:cpuidle_enter_state+0xc8/0x430
> > [ 1068.974787] Code: 0e c0 47 ff e8 99 f0 ff ff 8b 53 04 49 89 c5 0f
> > 1f 44 00 00 31 ff e8 87 99 46 ff 45 84 ff 0f 85 3f 02 00 00 fb 0f 1f
> > 44 00 00 <45> 85 f6 0f 88 6e 01 00 00 49 63 d6 4c 2b 2c 24 48 8d 04 52
> > 48 8d
> > [ 1068.976473] RSP: 0018:ffffffff9ca03e48 EFLAGS: 00000246
> > [ 1068.976872] RAX: ffff988ebfc00000 RBX: ffff988ebfc3da78 RCX: 000000000000001f
> > [ 1068.977869] RDX: 0000000000000000 RSI: ffffffff9c30e0ff RDI: ffffffff9c2e82f0
> > [ 1068.978824] RBP: 0000000000000004 R08: 000000f8e18f1bef R09: 0000000000000018
> > [ 1068.979802] R10: 0000000000009441 R11: ffff988ebfc317e4 R12: ffffffff9ceaf6c0
> > [ 1068.980841] R13: 000000f8e18f1bef R14: 0000000000000004 R15: 0000000000000000
> > [ 1068.981801] ? cpuidle_enter_state+0xb9/0x430
> > [ 1068.982669] cpuidle_enter+0x29/0x40
> > [ 1068.983003] cpuidle_idle_call+0x10a/0x170
> > [ 1068.983349] do_idle+0x7e/0xe0
> > [ 1068.984015] cpu_startup_entry+0x26/0x30
> > [ 1068.984333] rest_init+0xcd/0xd0
> > [ 1068.985008] arch_call_rest_init+0xa/0x30
> > [ 1068.985326] start_kernel+0x332/0x410
> > [ 1068.985628] x86_64_start_reservations+0x14/0x30
> > [ 1068.986337] x86_64_start_kernel+0x8e/0x90
> > [ 1068.986653] secondary_startup_64_no_verify+0x18f/0x19b
> > [ 1068.987068] </TASK>
> > [ 1068.987305] ---[ end trace 0000000000000000 ]---
>
Thanks for the report. We recently received another similar report, but
we don't see this issue internally.
Do you know which kernel version was the last one known to work?
Could you describe your configuration and the reproduction steps?
Thanks,
Dragos