Date:   Wed, 5 Oct 2022 00:16:46 +0000
From:   Dexuan Cui <decui@...rosoft.com>
To:     Avihai Horon <avihaih@...dia.com>, Mark Bloch <mbloch@...dia.com>,
        Saeed Mahameed <saeedm@...dia.com>,
        Gavi Teitz <gavi@...lanox.com>,
        Vlad Buslov <vladbu@...lanox.com>
CC:     Haiyang Zhang <haiyangz@...rosoft.com>,
        Long Li <longli@...rosoft.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: mlx5_init_fc_stats() hits OOM due to memory fragmentation?

Hi, mlx5 folks,

I got the call trace below from a RHEL 7 VM. It looks like the kzalloc() in
mlx5_init_fc_stats() hits OOM due to memory fragmentation:

// This is the code from RHEL 7: linux-3.10.0-1160.53.1.el7.x86_64
int mlx5_init_fc_stats(struct mlx5_core_dev *dev)
{
        struct mlx5_fc_stats *fc_stats = &dev->priv.fc_stats;
        int max_bulk_len;
        int max_out_len;

        spin_lock_init(&fc_stats->counters_idr_lock);
        idr_init_ext(&fc_stats->counters_idr);
        INIT_LIST_HEAD(&fc_stats->counters);
        init_llist_head(&fc_stats->addlist);
        init_llist_head(&fc_stats->dellist);

        max_bulk_len = get_max_bulk_query_len(dev);
        max_out_len = mlx5_cmd_fc_get_bulk_query_out_len(max_bulk_len);
        fc_stats->bulk_query_out = kzalloc(max_out_len, GFP_KERNEL);
        if (!fc_stats->bulk_query_out)
                return -ENOMEM;
...

I think the latest mainline kernel has the same issue.
Can this kzalloc() be changed to vzalloc()?
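For illustration, a minimal sketch of the change I have in mind, against the
RHEL 7 code quoted above (assuming the matching kfree() of bulk_query_out in
mlx5_cleanup_fc_stats() would also need to become vfree(); vzalloc() does not
take a gfp_t, so the GFP_KERNEL argument is dropped):

-        fc_stats->bulk_query_out = kzalloc(max_out_len, GFP_KERNEL);
+        fc_stats->bulk_query_out = vzalloc(max_out_len);

Since max_out_len only scales with the bulk query length and the buffer is not
used for DMA descriptors that require physical contiguity (as far as I can
tell), a virtually contiguous allocation seems sufficient here and would avoid
the order-8 contiguous-page requirement entirely.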

[10266192.131842] kworker/8:1: page allocation failure: order:8, mode:0xc0d0
[10266192.138260] CPU: 8 PID: 62790 Comm: kworker/8:1 Kdump: loaded Tainted: P            E  ------------   3.10.0-1160.62.1.el7.x86_64 #1
[10266192.179718] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 10/27/2020
[10266192.191089] Workqueue: hv_pri_chan vmbus_add_channel_work [hv_vmbus]
[10266192.196217] Call Trace:
[10266192.197944]  [<ffffffffbb1865a9>] dump_stack+0x19/0x1b
[10266192.201866]  [<ffffffffbabc4bd0>] warn_alloc_failed+0x110/0x180
[10266192.206103]  [<ffffffffbabc976f>] __alloc_pages_nodemask+0x9df/0xbe0
[10266192.210484]  [<ffffffffbac193a8>] alloc_pages_current+0x98/0x110
[10266192.214644]  [<ffffffffbabe5fc8>] kmalloc_order+0x18/0x40
[10266192.218723]  [<ffffffffbac24d76>] kmalloc_order_trace+0x26/0xa0
[10266192.222715]  [<ffffffffbab4ff8e>] ? __irq_put_desc_unlock+0x1e/0x50
[10266192.227915]  [<ffffffffbac28d01>] __kmalloc+0x211/0x230
[10266192.231529]  [<ffffffffc07294d6>] mlx5_init_fc_stats+0x76/0x1d0 [mlx5_core]
[10266192.236498]  [<ffffffffc072831d>] mlx5_init_fs+0x2d/0x840 [mlx5_core]
[10266192.242089]  [<ffffffffc070c823>] mlx5_load_one+0x7e3/0xa30 [mlx5_core]
[10266192.247841]  [<ffffffffc070cf11>] init_one+0x411/0x5c0 [mlx5_core]
[10266192.252484]  [<ffffffffbadd704a>] local_pci_probe+0x4a/0xb0
[10266192.256825]  [<ffffffffbadd8799>] pci_device_probe+0x109/0x160
[10266192.261383]  [<ffffffffbaebbe75>] driver_probe_device+0xc5/0x3e0
[10266192.265771]  [<ffffffffbaebc190>] ? driver_probe_device+0x3e0/0x3e0
[10266192.270080]  [<ffffffffbaebc1d3>] __device_attach+0x43/0x50
[10266192.274260]  [<ffffffffbaeb9af5>] bus_for_each_drv+0x75/0xc0
[10266192.278270]  [<ffffffffbaebbcb0>] device_attach+0x90/0xb0
[10266192.282110]  [<ffffffffbadcbbaf>] pci_bus_add_device+0x4f/0xa0
[10266192.286277]  [<ffffffffbadcbc39>] pci_bus_add_devices+0x39/0x80
[10266192.290873]  [<ffffffffc04d0d9b>] hv_pci_probe+0x9cb/0xcd0 [pci_hyperv]
[10266192.295872]  [<ffffffffc00b5b81>] vmbus_probe+0x41/0xa0 [hv_vmbus]
[10266192.300110]  [<ffffffffbaebbe75>] driver_probe_device+0xc5/0x3e0
[10266192.304084]  [<ffffffffbaebc190>] ? driver_probe_device+0x3e0/0x3e0
[10266192.311115]  [<ffffffffbaebc1d3>] __device_attach+0x43/0x50
[10266192.315732]  [<ffffffffbaeb9af5>] bus_for_each_drv+0x75/0xc0
[10266192.319919]  [<ffffffffbaebbcb0>] device_attach+0x90/0xb0
[10266192.357015]  [<ffffffffbaebaed8>] bus_probe_device+0x98/0xd0
[10266192.362323]  [<ffffffffbaeb877f>] device_add+0x4ff/0x7c0
[10266192.366081]  [<ffffffffbaeb8a5a>] device_register+0x1a/0x20
[10266192.370472]  [<ffffffffc00b65a6>] vmbus_device_register+0x66/0x100 [hv_vmbus]
[10266192.377896]  [<ffffffffc00b9e5d>] vmbus_add_channel_work+0x4cd/0x640 [hv_vmbus]
[10266192.383035]  [<ffffffffbaabdfdf>] process_one_work+0x17f/0x440
[10266192.390842]  [<ffffffffbaabf0f6>] worker_thread+0x126/0x3c0
[10266192.395841]  [<ffffffffbaabefd0>] ? manage_workers.isra.26+0x2a0/0x2a0
[10266192.405465]  [<ffffffffbaac5fb1>] kthread+0xd1/0xe0
[10266192.408804]  [<ffffffffbaac5ee0>] ? insert_kthread_work+0x40/0x40
[10266192.413519]  [<ffffffffbb199df7>] ret_from_fork_nospec_begin+0x21/0x21
[10266192.418317]  [<ffffffffbaac5ee0>] ? insert_kthread_work+0x40/0x40
[10266192.423137] Mem-Info:
[10266192.425127] active_anon:16322977 inactive_anon:2861111 isolated_anon:0
[10266192.767489] mlx5_core 1727:00:02.0: Failed to init flow steering
[10266192.963381] mlx5_core 1727:00:02.0: mlx5_load_one failed with error code -12
[10266192.969663] mlx5_core: probe of 1727:00:02.0 failed with error -12

Thanks,
-- Dexuan
