Message-ID: <CALzJLG-oiSfaA4mUOHp3Psun7=JvrGYpZsJWpAQUb_eZ5ewtGw@mail.gmail.com>
Date:   Thu, 3 Nov 2016 01:30:01 +0200
From:   Saeed Mahameed <saeedm@....mellanox.co.il>
To:     Sebastian Ott <sebott@...ux.vnet.ibm.com>
Cc:     Matan Barak <matanb@...lanox.com>,
        Leon Romanovsky <leonro@...lanox.com>,
        Saeed Mahameed <saeedm@...lanox.com>,
        Linux Netdev List <netdev@...r.kernel.org>,
        linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: mlx5: ifup failure due to huge allocation

On Wed, Nov 2, 2016 at 3:37 PM, Sebastian Ott <sebott@...ux.vnet.ibm.com> wrote:
> Hi,
>
> Ifup on an interface provided by CX4 (MLX5 driver) on s390 fails with:
>
> [   22.318553] ------------[ cut here ]------------
> [   22.318564] WARNING: CPU: 1 PID: 399 at mm/page_alloc.c:3421 __alloc_pages_nodemask+0x2ee/0x1298
> [   22.318568] Modules linked in: mlx4_ib ib_core mlx5_core mlx4_en mlx4_core [...]
> [   22.318610] CPU: 1 PID: 399 Comm: NetworkManager Not tainted 4.8.0 #13
> [   22.318614] Hardware name: IBM              2964 N96              704              (LPAR)
> [   22.318618] task: 00000000dbe1c008 task.stack: 00000000dd9e4000
> [   22.318622] Krnl PSW : 0704c00180000000 00000000002a427e (__alloc_pages_nodemask+0x2ee/0x1298)
> [   22.318631]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
>                Krnl GPRS: 0000000000000000 0000000000ceb4d4 00000000024080c0 0000000000000001
> [   22.318640]            00000000002a4204 00000000ffffa410 00000000001fffff 0000000000000001
> [   22.318644]            00000000024080c0 0000000000000009 0000000000000000 0000000000000000
> [   22.318648]            00000000ffffa400 000000000088ea30 00000000002a4204 00000000dd9e7060
> [   22.318660] Krnl Code: 00000000002a4272: a7740592            brc     7,2a4d96
>                           00000000002a4276: 92011000            mvi     0(%r1),1
>                          #00000000002a427a: a7f40001            brc     15,2a427c
>                          >00000000002a427e: a7f4058c            brc     15,2a4d96
>                           00000000002a4282: 5830f0b4            l       %r3,180(%r15)
>                           00000000002a4286: 5030f0ec            st      %r3,236(%r15)
>                           00000000002a428a: 1823                lr      %r2,%r3
>                           00000000002a428c: a53e0048            llilh   %r3,72
> [   22.318695] Call Trace:
> [   22.318700] ([<00000000002a4204>] __alloc_pages_nodemask+0x274/0x1298)
> [   22.318706] ([<000000000030dac0>] alloc_pages_current+0x1c0/0x268)
> [   22.318712] ([<0000000000135aa6>] s390_dma_alloc+0x6e/0x1e0)
> [   22.318733] ([<000003ff8015474c>] mlx5_dma_zalloc_coherent_node+0xb4/0xf8 [mlx5_core])
> [   22.318748] ([<000003ff80154c58>] mlx5_buf_alloc_node+0x70/0x108 [mlx5_core])
> [   22.318765] ([<000003ff8015fe06>] mlx5_cqwq_create+0xf6/0x180 [mlx5_core])
> [   22.318783] ([<000003ff8016654c>] mlx5e_open_cq+0xac/0x1e0 [mlx5_core])
> [   22.318802] ([<000003ff801693e6>] mlx5e_open_channels+0xe66/0xeb8 [mlx5_core])
> [   22.318820] ([<000003ff8016982e>] mlx5e_open_locked+0x8e/0x1e0 [mlx5_core])
> [   22.318837] ([<000003ff801699c6>] mlx5e_open+0x46/0x68 [mlx5_core])
> [   22.318844] ([<0000000000748338>] __dev_open+0xa8/0x118)
> [   22.318848] ([<000000000074867a>] __dev_change_flags+0xc2/0x190)
> [   22.318853] ([<000000000074877e>] dev_change_flags+0x36/0x78)
> [   22.318858] ([<000000000075bc8a>] do_setlink+0x332/0xb30)
> [   22.318862] ([<000000000075de3a>] rtnl_newlink+0x3e2/0x820)
> [   22.318867] ([<000000000075e46e>] rtnetlink_rcv_msg+0x1f6/0x248)
> [   22.318873] ([<0000000000782202>] netlink_rcv_skb+0x92/0x108)
> [   22.318878] ([<000000000075c668>] rtnetlink_rcv+0x48/0x58)
> [   22.318882] ([<0000000000781ace>] netlink_unicast+0x14e/0x1f0)
> [   22.318887] ([<0000000000781f82>] netlink_sendmsg+0x32a/0x3b0)
> [   22.318892] ([<000000000071d502>] sock_sendmsg+0x5a/0x80)
> [   22.318897] ([<000000000071ed38>] ___sys_sendmsg+0x270/0x2a8)
> [   22.318901] ([<000000000071fe80>] __sys_sendmsg+0x60/0x90)
> [   22.318905] ([<00000000007207c6>] SyS_socketcall+0x2be/0x388)
> [   22.318912] ([<000000000086fcae>] system_call+0xd6/0x270)
> [   22.318916] 3 locks held by NetworkManager/399:
> [   22.318920]  #0:  (rtnl_mutex){+.+.+.}, at: [<000000000075c658>] rtnetlink_rcv+0x38/0x58
> [   22.318935]  #1:  (&priv->state_lock){+.+.+.}, at: [<000003ff801699bc>] mlx5e_open+0x3c/0x68 [mlx5_core]
> [   22.318962]  #2:  (&priv->alloc_mutex){+.+.+.}, at: [<000003ff801546e0>] mlx5_dma_zalloc_coherent_node+0x48/0xf8 [mlx5_core]
> [   22.318987] Last Breaking-Event-Address:
> [   22.318992]  [<00000000002a427a>] __alloc_pages_nodemask+0x2ea/0x1298
> [   22.318996] ---[ end trace d2b54f5a0cd00b89 ]---
> [   22.319001] mlx5_core 0001:00:00.0: 0001:00:00.0:mlx5_cqwq_create:121:(pid 399): mlx5_buf_alloc_node() failed, -12
> [   22.320548] mlx5_core 0001:00:00.0 enP1s171: mlx5e_open_locked: mlx5e_open_channels failed, -12
>
>
>
> This fails because the largest possible allocation on s390 is currently 1MB (order 8).
> Would it be possible to add the __GFP_NOWARN flag and try a smaller allocation if the
> big one failed? (The latter change also would make the device usable when it is added
> via hotplug and free memory is scattered).
>

Thanks Sebastian for the detailed report.

We are planning and working on a solution that allocates fragmented
buffers rather than demanding contiguous ones.
Hopefully we will have the solution upstream before 4.10 is released.
And yes, __GFP_NOWARN is reasonable; we will add it as well. The return
value of mlx5_buf_alloc_node is sufficient in case of an error, so the
stack trace is just noise.
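
To illustrate the idea being discussed (try the big allocation quietly,
then fall back to several smaller chunks), here is a rough userspace
sketch. It is not the mlx5 code and not the planned patch. Every demo_*
and DEMO_* name is made up for illustration, calloc() stands in for the
page allocator, and the order-8 limit is simply the s390 value from the
report above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_PAGE_SIZE 4096u /* assumed page size */
#define DEMO_MAX_ORDER 8u    /* assumed limit: order 8 = 1MB, as in the report */

/* Hypothetical stand-in for the page allocator: fails silently above the limit. */
static void *demo_alloc_order(unsigned int order)
{
	if (order > DEMO_MAX_ORDER)
		return NULL;
	return calloc((size_t)1 << order, DEMO_PAGE_SIZE);
}

/* A buffer made of equally sized fragments instead of one contiguous block. */
struct demo_frag_buf {
	void **frags;
	size_t nfrags;
	unsigned int order; /* order of each fragment */
};

/* Smallest order whose chunk size covers 'size' bytes. */
static unsigned int demo_order_for(size_t size)
{
	unsigned int order = 0;

	while (((size_t)DEMO_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Try one chunk of the full size first; on failure halve the chunk size. */
static int demo_frag_buf_alloc(struct demo_frag_buf *buf, size_t size)
{
	unsigned int order;

	assert(buf != NULL && size > 0);
	order = demo_order_for(size);

	for (;;) {
		size_t chunk = (size_t)DEMO_PAGE_SIZE << order;
		size_t n = (size + chunk - 1) / chunk;
		void **frags = calloc(n, sizeof(*frags));
		size_t i;

		if (!frags)
			return -ENOMEM;

		for (i = 0; i < n; i++) {
			frags[i] = demo_alloc_order(order);
			if (!frags[i])
				break;
		}
		if (i == n) {
			buf->frags = frags;
			buf->nfrags = n;
			buf->order = order;
			return 0;
		}

		/* Undo the partial attempt, then retry with smaller chunks. */
		while (i--)
			free(frags[i]);
		free(frags);

		if (order == 0)
			return -ENOMEM;
		order--;
	}
}

static void demo_frag_buf_free(struct demo_frag_buf *buf)
{
	size_t i;

	for (i = 0; i < buf->nfrags; i++)
		free(buf->frags[i]);
	free(buf->frags);
	buf->frags = NULL;
	buf->nfrags = 0;
}
```

With these assumed limits, a 4MB request would start at order 10, fail
twice without any warning, and end up as four order-8 fragments. A real
driver would additionally need the hardware to address the fragments
individually, which this sketch does not model.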

-Saeed.
