Date: Wed, 5 Oct 2022 00:16:46 +0000
From: Dexuan Cui <decui@...rosoft.com>
To: Avihai Horon <avihaih@...dia.com>, Mark Bloch <mbloch@...dia.com>, Saeed Mahameed <saeedm@...dia.com>, Gavi Teitz <gavi@...lanox.com>, Vlad Buslov <vladbu@...lanox.com>
CC: Haiyang Zhang <haiyangz@...rosoft.com>, Long Li <longli@...rosoft.com>, "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: mlx5_init_fc_stats() hits OOM due to memory fragmentation?

Hi, mlx5 folks,

I got the call-trace from a RHEL 7 VM. It looks like
mlx5_init_fc_stats() -> kzalloc() hits OOM due to memory fragmentation:

// This is the code from RHEL 7: linux-3.10.0-1160.53.1.el7.x86_64
int mlx5_init_fc_stats(struct mlx5_core_dev *dev)
{
	struct mlx5_fc_stats *fc_stats = &dev->priv.fc_stats;
	int max_bulk_len;
	int max_out_len;

	spin_lock_init(&fc_stats->counters_idr_lock);
	idr_init_ext(&fc_stats->counters_idr);
	INIT_LIST_HEAD(&fc_stats->counters);
	init_llist_head(&fc_stats->addlist);
	init_llist_head(&fc_stats->dellist);

	max_bulk_len = get_max_bulk_query_len(dev);
	max_out_len = mlx5_cmd_fc_get_bulk_query_out_len(max_bulk_len);
	fc_stats->bulk_query_out = kzalloc(max_out_len, GFP_KERNEL);
	if (!fc_stats->bulk_query_out)
		return -ENOMEM;
	...

I think the latest mainline kernel has the same issue.
Can this kzalloc() be changed to vzalloc()?

[10266192.131842] kworker/8:1: page allocation failure: order:8, mode:0xc0d0
[10266192.138260] CPU: 8 PID: 62790 Comm: kworker/8:1 Kdump: loaded Tainted: P E ------------ 3.10.0-1160.62.1.el7.x86_64 #1
[10266192.179718] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 10/27/2020
[10266192.191089] Workqueue: hv_pri_chan vmbus_add_channel_work [hv_vmbus]
[10266192.196217] Call Trace:
[10266192.197944]  [<ffffffffbb1865a9>] dump_stack+0x19/0x1b
[10266192.201866]  [<ffffffffbabc4bd0>] warn_alloc_failed+0x110/0x180
[10266192.206103]  [<ffffffffbabc976f>] __alloc_pages_nodemask+0x9df/0xbe0
[10266192.210484]  [<ffffffffbac193a8>] alloc_pages_current+0x98/0x110
[10266192.214644]  [<ffffffffbabe5fc8>] kmalloc_order+0x18/0x40
[10266192.218723]  [<ffffffffbac24d76>] kmalloc_order_trace+0x26/0xa0
[10266192.222715]  [<ffffffffbab4ff8e>] ? __irq_put_desc_unlock+0x1e/0x50
[10266192.227915]  [<ffffffffbac28d01>] __kmalloc+0x211/0x230
[10266192.231529]  [<ffffffffc07294d6>] mlx5_init_fc_stats+0x76/0x1d0 [mlx5_core]
[10266192.236498]  [<ffffffffc072831d>] mlx5_init_fs+0x2d/0x840 [mlx5_core]
[10266192.242089]  [<ffffffffc070c823>] mlx5_load_one+0x7e3/0xa30 [mlx5_core]
[10266192.247841]  [<ffffffffc070cf11>] init_one+0x411/0x5c0 [mlx5_core]
[10266192.252484]  [<ffffffffbadd704a>] local_pci_probe+0x4a/0xb0
[10266192.256825]  [<ffffffffbadd8799>] pci_device_probe+0x109/0x160
[10266192.261383]  [<ffffffffbaebbe75>] driver_probe_device+0xc5/0x3e0
[10266192.265771]  [<ffffffffbaebc190>] ? driver_probe_device+0x3e0/0x3e0
[10266192.270080]  [<ffffffffbaebc1d3>] __device_attach+0x43/0x50
[10266192.274260]  [<ffffffffbaeb9af5>] bus_for_each_drv+0x75/0xc0
[10266192.278270]  [<ffffffffbaebbcb0>] device_attach+0x90/0xb0
[10266192.282110]  [<ffffffffbadcbbaf>] pci_bus_add_device+0x4f/0xa0
[10266192.286277]  [<ffffffffbadcbc39>] pci_bus_add_devices+0x39/0x80
[10266192.290873]  [<ffffffffc04d0d9b>] hv_pci_probe+0x9cb/0xcd0 [pci_hyperv]
[10266192.295872]  [<ffffffffc00b5b81>] vmbus_probe+0x41/0xa0 [hv_vmbus]
[10266192.300110]  [<ffffffffbaebbe75>] driver_probe_device+0xc5/0x3e0
[10266192.304084]  [<ffffffffbaebc190>] ? driver_probe_device+0x3e0/0x3e0
[10266192.311115]  [<ffffffffbaebc1d3>] __device_attach+0x43/0x50
[10266192.315732]  [<ffffffffbaeb9af5>] bus_for_each_drv+0x75/0xc0
[10266192.319919]  [<ffffffffbaebbcb0>] device_attach+0x90/0xb0
[10266192.357015]  [<ffffffffbaebaed8>] bus_probe_device+0x98/0xd0
[10266192.362323]  [<ffffffffbaeb877f>] device_add+0x4ff/0x7c0
[10266192.366081]  [<ffffffffbaeb8a5a>] device_register+0x1a/0x20
[10266192.370472]  [<ffffffffc00b65a6>] vmbus_device_register+0x66/0x100 [hv_vmbus]
[10266192.377896]  [<ffffffffc00b9e5d>] vmbus_add_channel_work+0x4cd/0x640 [hv_vmbus]
[10266192.383035]  [<ffffffffbaabdfdf>] process_one_work+0x17f/0x440
[10266192.390842]  [<ffffffffbaabf0f6>] worker_thread+0x126/0x3c0
[10266192.395841]  [<ffffffffbaabefd0>] ? manage_workers.isra.26+0x2a0/0x2a0
[10266192.405465]  [<ffffffffbaac5fb1>] kthread+0xd1/0xe0
[10266192.408804]  [<ffffffffbaac5ee0>] ? insert_kthread_work+0x40/0x40
[10266192.413519]  [<ffffffffbb199df7>] ret_from_fork_nospec_begin+0x21/0x21
[10266192.418317]  [<ffffffffbaac5ee0>] ? insert_kthread_work+0x40/0x40
[10266192.423137] Mem-Info:
[10266192.425127] active_anon:16322977 inactive_anon:2861111 isolated_anon:0
[10266192.767489] mlx5_core 1727:00:02.0: Failed to init flow steering
[10266192.963381] mlx5_core 1727:00:02.0: mlx5_load_one failed with error code -12
[10266192.969663] mlx5_core: probe of 1727:00:02.0 failed with error -12

Thanks,
-- Dexuan
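
For context, the failing request in the trace above is order:8, i.e. 2^8 = 256 contiguous pages, or 1 MiB with 4 KiB pages; after long uptime the buddy allocator can run out of free blocks that large even when plenty of memory is free overall, which is the fragmentation scenario the report describes.

Below is a minimal sketch of the change being asked about (kzalloc() -> vzalloc()), written here with kvzalloc(), the variant commonly used in mainline drivers: it tries a kmalloc() first and falls back to vmalloc() when the physically contiguous allocation cannot be satisfied. The sketch assumes the bulk query output buffer is only touched by the CPU (firmware command output gets copied into it), so vmalloc-backed memory would be acceptable; it is an illustration of the idea, not a submitted patch:

	max_bulk_len = get_max_bulk_query_len(dev);
	max_out_len = mlx5_cmd_fc_get_bulk_query_out_len(max_bulk_len);
	/* kvzalloc() avoids the order-8 contiguity requirement under fragmentation */
	fc_stats->bulk_query_out = kvzalloc(max_out_len, GFP_KERNEL);
	if (!fc_stats->bulk_query_out)
		return -ENOMEM;
	...
	/* the matching kfree() in the cleanup path would then become */
	kvfree(fc_stats->bulk_query_out);

Note that kvzalloc() only appeared in mainline v4.12, so a 3.10-based kernel would likely need plain vzalloc()/vfree() (or a backport of the kv* helpers) to get the same effect.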