Open Source and information security mailing list archives
Message-ID: <CAA85sZser9Kd=mEYAKbQOxfGGR=b=17ObzBs46W5QtmWhnB3gQ@mail.gmail.com>
Date: Thu, 18 Jan 2024 21:54:42 +0100
From: Ian Kumlien <ian.kumlien@...il.com>
To: Dragos Tatulea <dtatulea@...dia.com>
Cc: Linux Kernel Network Developers <netdev@...r.kernel.org>, saeedm@...dia.com, gal@...dia.com
Subject: Re: Re: [mlx5e] FYI dmesg is filled with mlx5e_page_release_fragmented.isra
 warnings in 6.6.12

So this is an L3 OpenStack node. Everything seemed to be working fine
until VPNaaS started being used (strongSwan in this case, since the
implementations leave something to be desired).

I have a longer dmesg output, 6396 lines... Let me know if you
want it; I assume it's not suitable for a mailing list.

The IPsec setup uses xfrm, and I assume it triggers offload... The problem is
that this is a production system and I can't really test on it :/

On Thu, Jan 18, 2024 at 5:22 PM Dragos Tatulea <dtatulea@...dia.com> wrote:
>
> On 01/18, Ian Kumlien wrote:
> > OK, so after about 200 of these, we had a full kernel oops. More
> > graceful than earlier kernels, but...
> >
> > On Thu, Jan 18, 2024 at 4:08 PM Ian Kumlien <ian.kumlien@...il.com> wrote:
> > >
> > > [ 1068.937101] ------------[ cut here ]------------
> > > [ 1068.937977] WARNING: CPU: 0 PID: 0 at include/net/page_pool/helpers.h:130 mlx5e_page_release_fragmented.isra.0+0x46/0x50 [mlx5_core]
> > > [ 1068.939407] Modules linked in: echainiv(E) esp4(E)
> > > xfrm_interface(E) xfrm6_tunnel(E) tunnel4(E) tunnel6(E) xt_policy(E)
> > > xt_physdev(E) xt_nat(E) xt_REDIRECT(E) xt_comment(E) xt_connmark(E)
> > > xt_mark(E) vxlan(E) ip6_udp_tunnel(E) udp_tunnel(E)
> > > nfnetlink_cttimeout(E) xt_conntrack(E) nft_chain_nat(E)
> > > xt_MASQUERADE(E) nf_conntrack_netlink(E) xt_addrtype(E) nft_compat(E)
> > > nf_tables(E) nfnetlink(E) br_netfilter(E) bridge(E) 8021q(E) garp(E)
> > > mrp(E) stp(E) llc(E) overlay(E) bonding(E) cfg80211(E) rfkill(E)
> > > ipmi_ssif(E) intel_rapl_msr(E) intel_rapl_common(E) sb_edac(E)
> > > x86_pkg_temp_thermal(E) intel_powerclamp(E) vfat(E) fat(E) coretemp(E)
> > > kvm_intel(E) kvm(E) iTCO_wdt(E) mlx5_ib(E) intel_pmc_bxt(E)
> > > iTCO_vendor_support(E) acpi_ipmi(E) i2c_algo_bit(E) ipmi_si(E)
> > > irqbypass(E) ib_uverbs(E) drm_shmem_helper(E) ipmi_devintf(E)
> > > ioatdma(E) rapl(E) i2c_i801(E) intel_cstate(E) ib_core(E)
> > > intel_uncore(E) pcspkr(E) drm_kms_helper(E) joydev(E) lpc_ich(E)
> > > hpilo(E) acpi_tad(E) ipmi_msghandler(E) acpi_power_meter(E) dca(E)
> > > i2c_smbus(E) xfs(E)
> > > [ 1068.939782]  drm(E) openvswitch(E) nf_conncount(E) nf_nat(E)
> > > ext4(E) mbcache(E) jbd2(E) mlx5_core(E) sd_mod(E) t10_pi(E) sg(E)
> > > crct10dif_pclmul(E) crc32_pclmul(E) polyval_clmulni(E)
> > > polyval_generic(E) serio_raw(E) ghash_clmulni_intel(E) mlxfw(E) tg3(E)
> > > hpsa(E) tls(E) hpwdt(E) scsi_transport_sas(E) psample(E) wmi(E)
> > > pci_hyperv_intf(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
> > > nf_conntrack(E) libcrc32c(E) crc32c_intel(E) nf_defrag_ipv6(E)
> > > nf_defrag_ipv4(E) ip6_tables(E) fuse(E)
> > > [ 1068.947864] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Tainted: G        W   E      6.6.12-1.el9.elrepo.x86_64 #1
> > > [ 1068.949014] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 11/23/2021
> > > [ 1068.949552] RIP: 0010:mlx5e_page_release_fragmented.isra.0+0x46/0x50 [mlx5_core]
> > > [ 1068.951033] Code: f7 da f0 48 0f c1 56 28 48 39 c2 78 1d 74 05 c3
> > > cc cc cc cc 48 8b bf 60 04 00 00 b9 01 00 00 00 ba ff ff ff ff e9 da
> > > f7 f3 da <0f> 0b c3 cc cc cc cc 0f 1f 00 90 90 90 90 90 90 90 90 90 90
> > > 90 90
> > > [ 1068.952632] RSP: 0018:ffffb3a800003df0 EFLAGS: 00010297
> > > [ 1068.953301] RAX: 000000000000003d RBX: ffff987f51b78000 RCX: 0000000000000050
> > > [ 1068.954279] RDX: 0000000000000000 RSI: ffffdb5246508580 RDI: ffff987f51b78000
> > > [ 1068.955358] RBP: ffff987fcdb0b540 R08: 0000000000000006 R09: ffff988ec44830c0
> > > [ 1068.957674] R10: 0000000000000000 R11: ffff987fcab77040 R12: 0000000000000040
> > > [ 1068.958669] R13: 0000000000000040 R14: ffff987fcdb0b168 R15: 000000000000003c
> > > [ 1068.959828] FS:  0000000000000000(0000) GS:ffff988ebfc00000(0000) knlGS:0000000000000000
> > > [ 1068.960466] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 1068.961350] CR2: 00007f173925a4e0 CR3: 0000001067a1e006 CR4: 00000000001706f0
> > > [ 1068.962230] Call Trace:
> > > [ 1068.962478]  <IRQ>
> > > [ 1068.963055]  ? __warn+0x80/0x130
> > > [ 1068.963073]  ? mlx5e_page_release_fragmented.isra.0+0x46/0x50 [mlx5_core]
> > > [ 1068.964275]  ? report_bug+0x1c3/0x1d0
> > > [ 1068.964585]  ? handle_bug+0x42/0x70
> > > [ 1068.965228]  ? exc_invalid_op+0x14/0x70
> > > [ 1068.965538]  ? asm_exc_invalid_op+0x16/0x20
> > > [ 1068.965854]  ? mlx5e_page_release_fragmented.isra.0+0x46/0x50 [mlx5_core]
> > > [ 1068.966518]  mlx5e_free_rx_mpwqe+0x18e/0x1c0 [mlx5_core]
> > > [ 1068.967221]  mlx5e_post_rx_mpwqes+0x1a5/0x280 [mlx5_core]
> > > [ 1068.967810]  mlx5e_napi_poll+0x143/0x710 [mlx5_core]
> > > [ 1068.968416]  ? __netif_receive_skb_one_core+0x92/0xa0
> > > [ 1068.968799]  __napi_poll+0x2c/0x1b0
> > > [ 1068.970066]  net_rx_action+0x2a7/0x370
> > > [ 1068.971012]  ? mlx5_cq_tasklet_cb+0x78/0x180 [mlx5_core]
> > > [ 1068.971683]  __do_softirq+0xf0/0x2ee
> > > [ 1068.972002]  __irq_exit_rcu+0x83/0xf0
> > > [ 1068.972338]  common_interrupt+0xb8/0xd0
> > > [ 1068.972738]  </IRQ>
> > > [ 1068.973324]  <TASK>
> > > [ 1068.974019]  asm_common_interrupt+0x22/0x40
> > > [ 1068.974412] RIP: 0010:cpuidle_enter_state+0xc8/0x430
> > > [ 1068.974787] Code: 0e c0 47 ff e8 99 f0 ff ff 8b 53 04 49 89 c5 0f
> > > 1f 44 00 00 31 ff e8 87 99 46 ff 45 84 ff 0f 85 3f 02 00 00 fb 0f 1f
> > > 44 00 00 <45> 85 f6 0f 88 6e 01 00 00 49 63 d6 4c 2b 2c 24 48 8d 04 52
> > > 48 8d
> > > [ 1068.976473] RSP: 0018:ffffffff9ca03e48 EFLAGS: 00000246
> > > [ 1068.976872] RAX: ffff988ebfc00000 RBX: ffff988ebfc3da78 RCX: 000000000000001f
> > > [ 1068.977869] RDX: 0000000000000000 RSI: ffffffff9c30e0ff RDI: ffffffff9c2e82f0
> > > [ 1068.978824] RBP: 0000000000000004 R08: 000000f8e18f1bef R09: 0000000000000018
> > > [ 1068.979802] R10: 0000000000009441 R11: ffff988ebfc317e4 R12: ffffffff9ceaf6c0
> > > [ 1068.980841] R13: 000000f8e18f1bef R14: 0000000000000004 R15: 0000000000000000
> > > [ 1068.981801]  ? cpuidle_enter_state+0xb9/0x430
> > > [ 1068.982669]  cpuidle_enter+0x29/0x40
> > > [ 1068.983003]  cpuidle_idle_call+0x10a/0x170
> > > [ 1068.983349]  do_idle+0x7e/0xe0
> > > [ 1068.984015]  cpu_startup_entry+0x26/0x30
> > > [ 1068.984333]  rest_init+0xcd/0xd0
> > > [ 1068.985008]  arch_call_rest_init+0xa/0x30
> > > [ 1068.985326]  start_kernel+0x332/0x410
> > > [ 1068.985628]  x86_64_start_reservations+0x14/0x30
> > > [ 1068.986337]  x86_64_start_kernel+0x8e/0x90
> > > [ 1068.986653]  secondary_startup_64_no_verify+0x18f/0x19b
> > > [ 1068.987068]  </TASK>
> > > [ 1068.987305] ---[ end trace 0000000000000000 ]---
> >
>
> Thanks for the report. We got another similar report recently which we
> don't see internally.
>
> Do you know what the last known working kernel version was?
>
> Could you describe the configuration and the reproduction steps?
>
> Thanks,
> Dragos
