Message-ID: <c87db444-266d-eb76-d8d5-d8a0c11038b1@oracle.com>
Date: Wed, 30 May 2018 13:30:55 -0700
From: Qing Huang <qing.huang@...cle.com>
To: Eric Dumazet <edumazet@...gle.com>,
"David S . Miller" <davem@...emloft.net>
Cc: netdev <netdev@...r.kernel.org>,
Eric Dumazet <eric.dumazet@...il.com>,
John Sperbeck <jsperbeck@...gle.com>,
Tarick Bedeir <tarick@...gle.com>,
Daniel Jurgens <danielj@...lanox.com>,
Zhu Yanjun <yanjun.zhu@...cle.com>,
Tariq Toukan <tariqt@...lanox.com>
Subject: Re: [PATCH net] mlx4_core: restore optimal ICM memory allocation
On 5/29/2018 9:11 PM, Eric Dumazet wrote:
> Commit 1383cb8103bb ("mlx4_core: allocate ICM memory in page size chunks")
> brought a regression caught in our regression suite, thanks to KASAN.
If the KASAN-reported issue was really caused by smaller chunk sizes,
changing the allocation order dynamically will eventually hit the same issue.
> Note that mlx4_alloc_icm() is already able to try high-order allocations
> and fall back to low-order allocations under high memory pressure.
>
> We only have to tweak gfp_mask a bit to help it fall back faster,
> without risking OOM kills.
>
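To make the fallback concrete, here is a minimal user-space sketch of the order-fallback loop, assuming it behaves as described above: cap the order to the pages still needed, clear direct reclaim and add NORETRY for any order above 0, and drop one order on each failure until order 0 also fails. SIM_GFP_*, sim_alloc() and its max_ok "memory pressure" knob are hypothetical stand-ins, not kernel definitions; only the mask rewrite mirrors the patch.

```c
#include <assert.h>

/* Hypothetical stand-ins for gfp flag bits; values are illustrative
 * and do not match include/linux/gfp.h. */
#define SIM_GFP_DIRECT_RECLAIM 0x1u
#define SIM_GFP_NORETRY        0x2u
#define SIM_GFP_KERNEL         SIM_GFP_DIRECT_RECLAIM

/* Mask rewrite from the patch: for order > 0, skip direct reclaim and
 * do not retry, so a failed high-order attempt returns quickly. */
static unsigned chunk_mask(int order, unsigned base)
{
    unsigned mask = base;

    if (order)
        mask = (mask & ~SIM_GFP_DIRECT_RECLAIM) | SIM_GFP_NORETRY;
    return mask;
}

/* Hypothetical allocator: under simulated pressure only orders
 * <= max_ok succeed. The mask is accepted but not modelled. */
static int sim_alloc(int order, unsigned mask, int max_ok)
{
    (void)mask;
    return order <= max_ok ? 0 : -1;
}

/* Allocates npages in chunks, recording each chunk's order.
 * Returns the number of chunks, or -1 if even order 0 fails. */
static int alloc_icm_sim(int npages, int start_order, int max_ok,
                         int *orders, int max_chunks)
{
    int cur_order = start_order;
    int n = 0;

    while (npages > 0) {
        /* Never request more pages than are still needed. */
        while ((1 << cur_order) > npages)
            --cur_order;
        assert(cur_order >= 0);

        if (sim_alloc(cur_order, chunk_mask(cur_order, SIM_GFP_KERNEL),
                      max_ok)) {
            /* Fall back to the next lower order, or give up. */
            if (--cur_order < 0)
                return -1;
            continue;
        }
        if (n == max_chunks)
            return -1;
        orders[n++] = cur_order;
        npages -= 1 << cur_order;
    }
    return n;
}
```

With 70 pages and a starting order of 6 (256 KB on 4 KB pages), an unpressured run (max_ok = 6) yields chunks of order 6, 2 and 1, while max_ok = 3 yields eight order-3 chunks followed by one order-2 and one order-1 chunk. As sketched, the order is never raised again after a fallback, so under pressure the chunks stay small for the rest of the allocation.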
> BUG: KASAN: slab-out-of-bounds in to_rdma_ah_attr+0x808/0x9e0 [mlx4_ib]
> Read of size 4 at addr ffff8817df584f68 by task qp_listing_test/92585
>
> CPU: 38 PID: 92585 Comm: qp_listing_test Tainted: G O
> Call Trace:
> [<ffffffffba80d7bb>] dump_stack+0x4d/0x72
> [<ffffffffb951dc5f>] print_address_description+0x6f/0x260
> [<ffffffffb951e1c7>] kasan_report+0x257/0x370
> [<ffffffffb951e339>] __asan_report_load4_noabort+0x19/0x20
> [<ffffffffc0256d28>] to_rdma_ah_attr+0x808/0x9e0 [mlx4_ib]
> [<ffffffffc02785b3>] mlx4_ib_query_qp+0x1213/0x1660 [mlx4_ib]
> [<ffffffffc02dbfdb>] qpstat_print_qp+0x13b/0x500 [ib_uverbs]
> [<ffffffffc02dc3ea>] qpstat_seq_show+0x4a/0xb0 [ib_uverbs]
> [<ffffffffb95f125c>] seq_read+0xa9c/0x1230
> [<ffffffffb96e0821>] proc_reg_read+0xc1/0x180
> [<ffffffffb9577918>] __vfs_read+0xe8/0x730
> [<ffffffffb9578057>] vfs_read+0xf7/0x300
> [<ffffffffb95794d2>] SyS_read+0xd2/0x1b0
> [<ffffffffb8e06b16>] do_syscall_64+0x186/0x420
> [<ffffffffbaa00071>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> RIP: 0033:0x7f851a7bb30d
> RSP: 002b:00007ffd09a758c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000000
> RAX: ffffffffffffffda RBX: 00007f84ff959440 RCX: 00007f851a7bb30d
> RDX: 000000000003fc00 RSI: 00007f84ff60a000 RDI: 000000000000000b
> RBP: 00007ffd09a75900 R08: 00000000ffffffff R09: 0000000000000000
> R10: 0000000000000022 R11: 0000000000000293 R12: 0000000000000000
> R13: 000000000003ffff R14: 000000000003ffff R15: 00007f84ff60a000
>
> Allocated by task 4488:
> save_stack+0x46/0xd0
> kasan_kmalloc+0xad/0xe0
> __kmalloc+0x101/0x5e0
> ib_register_device+0xc03/0x1250 [ib_core]
> mlx4_ib_add+0x27d6/0x4dd0 [mlx4_ib]
> mlx4_add_device+0xa9/0x340 [mlx4_core]
> mlx4_register_interface+0x16e/0x390 [mlx4_core]
> xhci_pci_remove+0x7a/0x180 [xhci_pci]
> do_one_initcall+0xa0/0x230
> do_init_module+0x1b9/0x5a4
> load_module+0x63e6/0x94c0
> SYSC_init_module+0x1a4/0x1c0
> SyS_init_module+0xe/0x10
> do_syscall_64+0x186/0x420
> entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>
> Freed by task 0:
> (stack is not available)
>
> The buggy address belongs to the object at ffff8817df584f40
> which belongs to the cache kmalloc-32 of size 32
> The buggy address is located 8 bytes to the right of
> 32-byte region [ffff8817df584f40, ffff8817df584f60)
> The buggy address belongs to the page:
> page:ffffea005f7d6100 count:1 mapcount:0 mapping:ffff8817df584000 index:0xffff8817df584fc1
> flags: 0x880000000000100(slab)
> raw: 0880000000000100 ffff8817df584000 ffff8817df584fc1 000000010000003f
> raw: ffffea005f3ac0a0 ffffea005c476760 ffff8817fec00900 ffff883ff78d26c0
> page dumped because: kasan: bad access detected
> page->mem_cgroup:ffff883ff78d26c0
>
> Memory state around the buggy address:
> ffff8817df584e00: 00 03 fc fc fc fc fc fc 00 03 fc fc fc fc fc fc
> ffff8817df584e80: 00 00 00 04 fc fc fc fc 00 00 00 fc fc fc fc fc
>> ffff8817df584f00: fb fb fb fb fc fc fc fc 00 00 00 00 fc fc fc fc
> ^
> ffff8817df584f80: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
> ffff8817df585000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>
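The numbers in the KASAN report are consistent with each other. A small sketch of the arithmetic, assuming generic KASAN's granule of 8 bytes per shadow byte; the row array is copied from the ffff8817df584f00 line of the dump above, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Generic KASAN: one shadow byte describes 8 bytes of memory. */
#define SHADOW_GRANULE 8u

/* Shadow row covering ffff8817df584f00..ffff8817df584f7f, copied from
 * the report. 0x00 = all 8 bytes accessible, 0xfc = kmalloc redzone,
 * 0xfb = freed kmalloc object (encoding as used by KASAN of this era). */
static const unsigned char row_f00[16] = {
    0xfb, 0xfb, 0xfb, 0xfb, 0xfc, 0xfc, 0xfc, 0xfc,
    0x00, 0x00, 0x00, 0x00, 0xfc, 0xfc, 0xfc, 0xfc,
};

/* Distance of addr past the end of the object [start, start + size). */
static uint64_t bytes_past_end(uint64_t addr, uint64_t start, uint64_t size)
{
    return addr - (start + size);
}

/* Which shadow byte in a dump row (starting at row_base) covers addr. */
static unsigned shadow_index(uint64_t addr, uint64_t row_base)
{
    return (unsigned)((addr - row_base) / SHADOW_GRANULE);
}
```

The 32-byte kmalloc-32 object at ...f40 ends at ...f60, so the 4-byte read at ...f68 is 8 bytes past its end. That address maps to shadow byte 13 of the row, a 0xfc redzone byte, which is why KASAN classifies it as slab-out-of-bounds.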
> Fixes: 1383cb8103bb ("mlx4_core: allocate ICM memory in page size chunks")
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> Cc: John Sperbeck <jsperbeck@...gle.com>
> Cc: Tarick Bedeir <tarick@...gle.com>
> Cc: Qing Huang <qing.huang@...cle.com>
> Cc: Daniel Jurgens <danielj@...lanox.com>
> Cc: Zhu Yanjun <yanjun.zhu@...cle.com>
> Cc: Tariq Toukan <tariqt@...lanox.com>
> ---
> drivers/net/ethernet/mellanox/mlx4/icm.c | 17 +++++++++++------
> 1 file changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c b/drivers/net/ethernet/mellanox/mlx4/icm.c
> index 685337d58276fc91baeeb64387c52985e1bc6dda..cae33d5c7dbd9ba7929adcf2127b104f6796fa5a 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/icm.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
> @@ -43,12 +43,13 @@
> #include "fw.h"
>
> /*
> - * We allocate in page size (default 4KB on many archs) chunks to avoid high
> - * order memory allocations in fragmented/high usage memory situation.
> + * We allocate in as big chunks as we can, up to a maximum of 256 KB
> + * per chunk. Note that the chunks are not necessarily in contiguous
> + * physical memory.
> */
> enum {
> - MLX4_ICM_ALLOC_SIZE = PAGE_SIZE,
> - MLX4_TABLE_CHUNK_SIZE = PAGE_SIZE,
> + MLX4_ICM_ALLOC_SIZE = 1 << 18,
> + MLX4_TABLE_CHUNK_SIZE = 1 << 18,
> };
>
> static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct mlx4_icm_chunk *chunk)
> @@ -135,6 +136,7 @@ struct mlx4_icm *mlx4_alloc_icm(struct mlx4_dev *dev, int npages,
> struct mlx4_icm *icm;
> struct mlx4_icm_chunk *chunk = NULL;
> int cur_order;
> + gfp_t mask;
> int ret;
>
> /* We use sg_set_buf for coherent allocs, which assumes low memory */
> @@ -178,13 +180,16 @@ struct mlx4_icm *mlx4_alloc_icm(struct mlx4_dev *dev, int npages,
> while (1 << cur_order > npages)
> --cur_order;
>
> + mask = gfp_mask;
> + if (cur_order)
> + mask = (mask & ~__GFP_DIRECT_RECLAIM) | __GFP_NORETRY;
> if (coherent)
> ret = mlx4_alloc_icm_coherent(&dev->persist->pdev->dev,
> &chunk->mem[chunk->npages],
> - cur_order, gfp_mask);
> + cur_order, mask);
> else
> ret = mlx4_alloc_icm_pages(&chunk->mem[chunk->npages],
> - cur_order, gfp_mask,
> + cur_order, mask,
> dev->numa_node);
>
> if (ret) {
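For reference, the new MLX4_ICM_ALLOC_SIZE of 1 << 18 bytes is 256 KB, i.e. 64 pages of 4 KB. Assuming the starting order is derived with get_order() as elsewhere in the driver (not shown in this hunk), that is an order-6 request on 4 KB-page systems and order 2 with 64 KB pages, versus order 0 for the old PAGE_SIZE value. sim_get_order() below is a hypothetical user-space stand-in for the kernel's get_order(), not the kernel implementation.

```c
#include <assert.h>

/* Hypothetical user-space stand-in for the kernel's get_order():
 * the smallest order such that (page_size << order) >= size. */
static int sim_get_order(unsigned long size, unsigned page_shift)
{
    unsigned long page_size = 1UL << page_shift;
    int order = 0;

    while ((page_size << order) < size)
        order++;
    return order;
}
```

With the old MLX4_ICM_ALLOC_SIZE = PAGE_SIZE the same computation gives order 0, so every chunk was a single page.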