[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <90a0e8c924093cfa50a482880ad7e7edb73dc19a.1623309971.git.leonro@nvidia.com>
Date: Thu, 10 Jun 2021 10:34:27 +0300
From: Leon Romanovsky <leon@...nel.org>
To: Doug Ledford <dledford@...hat.com>,
Jason Gunthorpe <jgg@...dia.com>
Cc: Alaa Hleihel <alaa@...dia.com>, Aharon Landau <aharonl@...dia.com>,
linux-kernel@...r.kernel.org, linux-rdma@...r.kernel.org,
Maor Gottlieb <maorg@...dia.com>
Subject: [PATCH rdma-rc 3/3] IB/mlx5: Fix initializing CQ fragments buffer
From: Alaa Hleihel <alaa@...dia.com>
Function init_cq_frag_buf() can be called to initialize the current CQ
fragments buffer cq->buf, or the temporary cq->resize_buf that is filled
during CQ resize operation.
However, the offending commit started to use function get_cqe() for
getting the CQEs, the issue with this change is that get_cqe() always
returns CQEs from cq->buf, which leads us to initialize the wrong
buffer, and in case of enlarging the CQ we try to access elements beyond
the size of the current cq->buf and eventually hit a kernel panic.
[exception RIP: init_cq_frag_buf+103]
[ffff9f799ddcbcd8] mlx5_ib_resize_cq at ffffffffc0835d60 [mlx5_ib]
[ffff9f799ddcbdb0] ib_resize_cq at ffffffffc05270df [ib_core]
[ffff9f799ddcbdc0] llt_rdma_setup_qp at ffffffffc0a6a712 [llt]
[ffff9f799ddcbe10] llt_rdma_cc_event_action at ffffffffc0a6b411 [llt]
[ffff9f799ddcbe98] llt_rdma_client_conn_thread at ffffffffc0a6bb75 [llt]
[ffff9f799ddcbec8] kthread at ffffffffa66c5da1
[ffff9f799ddcbf50] ret_from_fork_nospec_begin at ffffffffa6d95ddd
Fix it by getting the needed CQE by calling mlx5_frag_buf_get_wqe() that
takes the correct source buffer as a parameter.
Fixes: 388ca8be0037 ("IB/mlx5: Implement fragmented completion queue (CQ)")
Signed-off-by: Alaa Hleihel <alaa@...dia.com>
Signed-off-by: Leon Romanovsky <leonro@...dia.com>
---
drivers/infiniband/hw/mlx5/cq.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index eb92cefffd77..9ce01f729673 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -849,15 +849,14 @@ static void destroy_cq_user(struct mlx5_ib_cq *cq, struct ib_udata *udata)
ib_umem_release(cq->buf.umem);
}
-static void init_cq_frag_buf(struct mlx5_ib_cq *cq,
- struct mlx5_ib_cq_buf *buf)
+static void init_cq_frag_buf(struct mlx5_ib_cq_buf *buf)
{
int i;
void *cqe;
struct mlx5_cqe64 *cqe64;
for (i = 0; i < buf->nent; i++) {
- cqe = get_cqe(cq, i);
+ cqe = mlx5_frag_buf_get_wqe(&buf->fbc, i);
cqe64 = buf->cqe_size == 64 ? cqe : cqe + 64;
cqe64->op_own = MLX5_CQE_INVALID << 4;
}
@@ -883,7 +882,7 @@ static int create_cq_kernel(struct mlx5_ib_dev *dev, struct mlx5_ib_cq *cq,
if (err)
goto err_db;
- init_cq_frag_buf(cq, &cq->buf);
+ init_cq_frag_buf(&cq->buf);
*inlen = MLX5_ST_SZ_BYTES(create_cq_in) +
MLX5_FLD_SZ_BYTES(create_cq_in, pas[0]) *
@@ -1184,7 +1183,7 @@ static int resize_kernel(struct mlx5_ib_dev *dev, struct mlx5_ib_cq *cq,
if (err)
goto ex;
- init_cq_frag_buf(cq, cq->resize_buf);
+ init_cq_frag_buf(cq->resize_buf);
return 0;
--
2.31.1
Powered by blists - more mailing lists