Message-ID: <ea4800dc-0828-6d3e-b957-74b3595e0fd8@mellanox.com>
Date: Sun, 6 Jan 2019 08:30:35 +0000
From: Tariq Toukan <tariqt@...lanox.com>
To: Stephen Warren <swarren@...dotorg.org>,
Tariq Toukan <tariqt@...lanox.com>,
"xavier.huwei@...wei.com" <xavier.huwei@...wei.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
Doug Ledford <dledford@...hat.com>,
Jason Gunthorpe <jgg@...lanox.com>,
Christoph Hellwig <hch@....de>,
Stephen Warren <swarren@...dia.com>
Subject: Re: [PATCH v4 1/2] net/mlx4: Get rid of page operation after
dma_alloc_coherent
On 1/3/2019 7:23 PM, Stephen Warren wrote:
> From: Stephen Warren <swarren@...dia.com>
>
> This patch solves a crash at the time of mlx4 driver unload or system
> shutdown. The crash occurs because dma_alloc_coherent() returns one
> value in mlx4_alloc_icm_coherent(), but a different value is passed to
> dma_free_coherent() in mlx4_free_icm_coherent(). In turn this is because
> when allocated, that pointer is passed to sg_set_buf() to record it,
> then when freed it is re-calculated by calling
> lowmem_page_address(sg_page()) which returns a different value. Solve
> this by recording the value that dma_alloc_coherent() returns, and
> passing this to dma_free_coherent().
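
For readers following the thread, here is a minimal user-space sketch of the idea described above: record exactly what the allocator returned and hand that same value back on free, instead of re-deriving the address from other bookkeeping. It is not the mlx4 code. The names struct coherent_buf, struct chunk, CHUNK_LEN, chunk_alloc_coherent() and chunk_free_coherent() are hypothetical, and calloc()/free() only stand in for dma_alloc_coherent()/dma_free_coherent().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CHUNK_LEN 4 /* hypothetical capacity, chosen for the example */

/* Everything needed to free one coherent allocation later (hypothetical). */
struct coherent_buf {
	void *addr;        /* exactly the pointer the allocator returned */
	size_t size;       /* size that was requested */
	uint64_t dma_addr; /* stand-in for a dma_addr_t bus address */
};

/* Hypothetical chunk; a real driver would also carry an sg list. */
struct chunk {
	bool coherent;
	int npages;
	struct coherent_buf buf[CHUNK_LEN];
};

/* Allocate one buffer and record the returned values verbatim. */
static int chunk_alloc_coherent(struct chunk *c, size_t size)
{
	struct coherent_buf *b;

	if (!c->coherent || c->npages >= CHUNK_LEN)
		return -1;

	b = &c->buf[c->npages];
	b->addr = calloc(1, size); /* stand-in for dma_alloc_coherent() */
	if (!b->addr)
		return -1;
	b->size = size;
	b->dma_addr = (uint64_t)(uintptr_t)b->addr; /* fake bus address */
	c->npages++;
	return 0;
}

/* Free using the recorded pointers; nothing is recomputed. */
static int chunk_free_coherent(struct chunk *c)
{
	int i, freed = 0;

	for (i = 0; i < c->npages; i++) {
		assert(c->buf[i].addr != NULL);
		free(c->buf[i].addr); /* stand-in for dma_free_coherent() */
		c->buf[i].addr = NULL;
		freed++;
	}
	c->npages = 0;
	return freed;
}
```

The point of the sketch is only that the free path reads buf[i].addr back unchanged, so the value passed to the free routine is by construction the one the allocator returned.
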
>
> This patch is roughly equivalent to commit 378efe798ecf ("RDMA/hns: Get
> rid of page operation after dma_alloc_coherent").
>
> Based-on-code-from: Christoph Hellwig <hch@....de>
> Signed-off-by: Stephen Warren <swarren@...dia.com>
> ---
> v4 (Jan 3):
> - Shortened commit description.
> - Use bool not int in struct mlx4_icm_chunk.
> - Tariq said "Thanks for your patch. It looks good to me." for v3.
> v3 (Dec 19):
> - Rework chunk data structure to store all data for coherent allocations
>   separately from the sg list. Code from Christoph Hellwig with fixes by
>   me. Notes:
>   - chunk->coherent is an int not a bool since checkpatch complains about
>     using bool in structs; see https://lkml.org/lkml/2017/11/21/384.
>   - chunk->coherent is used rather than chunk->table->coherent since the
>     table pointer isn't available when creating chunks. This duplicates
>     data, but simplifies the patch.
> v2:
> - Rework mlx4_table_find() to explicitly calculate the returned address
>   differently depending on whether the table was allocated using
>   dma_alloc_coherent() or alloc_pages(), which in turn allows the
>   changes to mlx4_alloc_icm_pages() to be dropped.
> - Drop changes to mlx4_alloc/free_icm_pages. This path uses
>   pci_map_sg() which can re-write the sg list which in turn would cause
>   chunk->mem[] (the sg list) and chunk->buf[] to become inconsistent.
> - Enhance commit description.
>
> Note: I've tested this patch in a downstream 4.14 based kernel (using
> ibping, ib_read_bw, and ib_write_bw), but can't test it in mainline
> since my system isn't supported there yet. I have compile-tested it in
> mainline at least, for ARM64.
> ---
> drivers/net/ethernet/mellanox/mlx4/icm.c | 92 ++++++++++++++----------
> drivers/net/ethernet/mellanox/mlx4/icm.h | 22 +++++-
> 2 files changed, 75 insertions(+), 39 deletions(-)
>
Reviewed-by: Tariq Toukan <tariqt@...lanox.com>
Thanks.