Message-ID: <uxlqaq25tft55nwyfueyj7g5co2lva2j5qnbsijwazxr2ld4l4@uqhiteuyduhd>
Date: Thu, 18 Jan 2024 17:22:30 +0100
From: Dragos Tatulea <dtatulea@...dia.com>
To: Ian Kumlien <ian.kumlien@...il.com>
Cc: Linux Kernel Network Developers <netdev@...r.kernel.org>, 
	saeedm@...dia.com, gal@...dia.com
Subject: Re: Re: [mlx5e] FYI dmesg is filled with
 mlx5e_page_release_fragmented.isra warnings in 6.6.12

On 01/18, Ian Kumlien wrote:
> ok, so after about 200 of these, we had a full kernel oops. more
> graceful than earlier kernels but...
> 
> On Thu, Jan 18, 2024 at 4:08 PM Ian Kumlien <ian.kumlien@...il.com> wrote:
> >
> > [ 1068.937101] ------------[ cut here ]------------
> > [ 1068.937977] WARNING: CPU: 0 PID: 0 at
> > include/net/page_pool/helpers.h:130
> > mlx5e_page_release_fragmented.isra.0+0x46/0x50 [mlx5_core]
> > [ 1068.939407] Modules linked in: echainiv(E) esp4(E)
> > xfrm_interface(E) xfrm6_tunnel(E) tunnel4(E) tunnel6(E) xt_policy(E)
> > xt_physdev(E) xt_nat(E) xt_REDIRECT(E) xt_comment(E) xt_connmark(E)
> > xt_mark(E) vxlan(E) ip6_udp_tunnel(E) udp_tunnel(E)
> > nfnetlink_cttimeout(E) xt_conntrack(E) nft_chain_nat(E)
> > xt_MASQUERADE(E) nf_conntrack_netlink(E) xt_addrtype(E) nft_compat(E)
> > nf_tables(E) nfnetlink(E) br_netfilter(E) bridge(E) 8021q(E) garp(E)
> > mrp(E) stp(E) llc(E) overlay(E) bonding(E) cfg80211(E) rfkill(E)
> > ipmi_ssif(E) intel_rapl_msr(E) intel_rapl_common(E) sb_edac(E)
> > x86_pkg_temp_thermal(E) intel_powerclamp(E) vfat(E) fat(E) coretemp(E)
> > kvm_intel(E) kvm(E) iTCO_wdt(E) mlx5_ib(E) intel_pmc_bxt(E)
> > iTCO_vendor_support(E) acpi_ipmi(E) i2c_algo_bit(E) ipmi_si(E)
> > irqbypass(E) ib_uverbs(E) drm_shmem_helper(E) ipmi_devintf(E)
> > ioatdma(E) rapl(E) i2c_i801(E) intel_cstate(E) ib_core(E)
> > intel_uncore(E) pcspkr(E) drm_kms_helper(E) joydev(E) lpc_ich(E)
> > hpilo(E) acpi_tad(E) ipmi_msghandler(E) acpi_power_meter(E) dca(E)
> > i2c_smbus(E) xfs(E)
> > [ 1068.939782]  drm(E) openvswitch(E) nf_conncount(E) nf_nat(E)
> > ext4(E) mbcache(E) jbd2(E) mlx5_core(E) sd_mod(E) t10_pi(E) sg(E)
> > crct10dif_pclmul(E) crc32_pclmul(E) polyval_clmulni(E)
> > polyval_generic(E) serio_raw(E) ghash_clmulni_intel(E) mlxfw(E) tg3(E)
> > hpsa(E) tls(E) hpwdt(E) scsi_transport_sas(E) psample(E) wmi(E)
> > pci_hyperv_intf(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
> > nf_conntrack(E) libcrc32c(E) crc32c_intel(E) nf_defrag_ipv6(E)
> > nf_defrag_ipv4(E) ip6_tables(E) fuse(E)
> > [ 1068.947864] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Tainted: G
> >       W   E      6.6.12-1.el9.elrepo.x86_64 #1
> > [ 1068.949014] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360
> > Gen9, BIOS P89 11/23/2021
> > [ 1068.949552] RIP:
> > 0010:mlx5e_page_release_fragmented.isra.0+0x46/0x50 [mlx5_core]
> > [ 1068.951033] Code: f7 da f0 48 0f c1 56 28 48 39 c2 78 1d 74 05 c3
> > cc cc cc cc 48 8b bf 60 04 00 00 b9 01 00 00 00 ba ff ff ff ff e9 da
> > f7 f3 da <0f> 0b c3 cc cc cc cc 0f 1f 00 90 90 90 90 90 90 90 90 90 90
> > 90 90
> > [ 1068.952632] RSP: 0018:ffffb3a800003df0 EFLAGS: 00010297
> > [ 1068.953301] RAX: 000000000000003d RBX: ffff987f51b78000 RCX: 0000000000000050
> > [ 1068.954279] RDX: 0000000000000000 RSI: ffffdb5246508580 RDI: ffff987f51b78000
> > [ 1068.955358] RBP: ffff987fcdb0b540 R08: 0000000000000006 R09: ffff988ec44830c0
> > [ 1068.957674] R10: 0000000000000000 R11: ffff987fcab77040 R12: 0000000000000040
> > [ 1068.958669] R13: 0000000000000040 R14: ffff987fcdb0b168 R15: 000000000000003c
> > [ 1068.959828] FS:  0000000000000000(0000) GS:ffff988ebfc00000(0000)
> > knlGS:0000000000000000
> > [ 1068.960466] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 1068.961350] CR2: 00007f173925a4e0 CR3: 0000001067a1e006 CR4: 00000000001706f0
> > [ 1068.962230] Call Trace:
> > [ 1068.962478]  <IRQ>
> > [ 1068.963055]  ? __warn+0x80/0x130
> > [ 1068.963073]  ? mlx5e_page_release_fragmented.isra.0+0x46/0x50 [mlx5_core]
> > [ 1068.964275]  ? report_bug+0x1c3/0x1d0
> > [ 1068.964585]  ? handle_bug+0x42/0x70
> > [ 1068.965228]  ? exc_invalid_op+0x14/0x70
> > [ 1068.965538]  ? asm_exc_invalid_op+0x16/0x20
> > [ 1068.965854]  ? mlx5e_page_release_fragmented.isra.0+0x46/0x50 [mlx5_core]
> > [ 1068.966518]  mlx5e_free_rx_mpwqe+0x18e/0x1c0 [mlx5_core]
> > [ 1068.967221]  mlx5e_post_rx_mpwqes+0x1a5/0x280 [mlx5_core]
> > [ 1068.967810]  mlx5e_napi_poll+0x143/0x710 [mlx5_core]
> > [ 1068.968416]  ? __netif_receive_skb_one_core+0x92/0xa0
> > [ 1068.968799]  __napi_poll+0x2c/0x1b0
> > [ 1068.970066]  net_rx_action+0x2a7/0x370
> > [ 1068.971012]  ? mlx5_cq_tasklet_cb+0x78/0x180 [mlx5_core]
> > [ 1068.971683]  __do_softirq+0xf0/0x2ee
> > [ 1068.972002]  __irq_exit_rcu+0x83/0xf0
> > [ 1068.972338]  common_interrupt+0xb8/0xd0
> > [ 1068.972738]  </IRQ>
> > [ 1068.973324]  <TASK>
> > [ 1068.974019]  asm_common_interrupt+0x22/0x40
> > [ 1068.974412] RIP: 0010:cpuidle_enter_state+0xc8/0x430
> > [ 1068.974787] Code: 0e c0 47 ff e8 99 f0 ff ff 8b 53 04 49 89 c5 0f
> > 1f 44 00 00 31 ff e8 87 99 46 ff 45 84 ff 0f 85 3f 02 00 00 fb 0f 1f
> > 44 00 00 <45> 85 f6 0f 88 6e 01 00 00 49 63 d6 4c 2b 2c 24 48 8d 04 52
> > 48 8d
> > [ 1068.976473] RSP: 0018:ffffffff9ca03e48 EFLAGS: 00000246
> > [ 1068.976872] RAX: ffff988ebfc00000 RBX: ffff988ebfc3da78 RCX: 000000000000001f
> > [ 1068.977869] RDX: 0000000000000000 RSI: ffffffff9c30e0ff RDI: ffffffff9c2e82f0
> > [ 1068.978824] RBP: 0000000000000004 R08: 000000f8e18f1bef R09: 0000000000000018
> > [ 1068.979802] R10: 0000000000009441 R11: ffff988ebfc317e4 R12: ffffffff9ceaf6c0
> > [ 1068.980841] R13: 000000f8e18f1bef R14: 0000000000000004 R15: 0000000000000000
> > [ 1068.981801]  ? cpuidle_enter_state+0xb9/0x430
> > [ 1068.982669]  cpuidle_enter+0x29/0x40
> > [ 1068.983003]  cpuidle_idle_call+0x10a/0x170
> > [ 1068.983349]  do_idle+0x7e/0xe0
> > [ 1068.984015]  cpu_startup_entry+0x26/0x30
> > [ 1068.984333]  rest_init+0xcd/0xd0
> > [ 1068.985008]  arch_call_rest_init+0xa/0x30
> > [ 1068.985326]  start_kernel+0x332/0x410
> > [ 1068.985628]  x86_64_start_reservations+0x14/0x30
> > [ 1068.986337]  x86_64_start_kernel+0x8e/0x90
> > [ 1068.986653]  secondary_startup_64_no_verify+0x18f/0x19b
> > [ 1068.987068]  </TASK>
> > [ 1068.987305] ---[ end trace 0000000000000000 ]---
>

Thanks for the report. We recently received another similar report, but
we have not been able to reproduce the issue internally.

Do you know which kernel version was the last one that worked?

Could you describe the configuration and the reproduction steps?

Thanks,
Dragos
