[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c79821e0-307c-5736-6eb5-e20983097345@oracle.com>
Date: Mon, 1 Jul 2019 13:41:00 -0700
From: santosh.shilimkar@...cle.com
To: Gerd Rausch <gerd.rausch@...cle.com>, netdev@...r.kernel.org
Cc: David Miller <davem@...emloft.net>
Subject: Re: [PATCH net-next 3/7] net/rds: Wait for the FRMR_IS_FREE (or
FRMR_IS_STALE) transition after posting IB_WR_LOCAL_INV
On 7/1/19 9:39 AM, Gerd Rausch wrote:
> In order to:
> 1) avoid a silly bouncing between "clean_list" and "drop_list"
> triggered by function "rds_ib_reg_frmr" as it is releases frmr
> regions whose state is not "FRMR_IS_FREE" right away.
>
> 2) prevent an invalid access error in a race from a pending
> "IB_WR_LOCAL_INV" operation with a teardown ("dma_unmap_sg", "put_page")
> and de-registration ("ib_dereg_mr") of the corresponding
> memory region.
>
> Signed-off-by: Gerd Rausch <gerd.rausch@...cle.com>
> ---
> net/rds/ib_frmr.c | 89 ++++++++++++++++++++++++++++++-----------------
> net/rds/ib_mr.h | 2 ++
> 2 files changed, 59 insertions(+), 32 deletions(-)
>
> diff --git a/net/rds/ib_frmr.c b/net/rds/ib_frmr.c
> index 9f8aa310c27a..3c953034dca3 100644
> --- a/net/rds/ib_frmr.c
> +++ b/net/rds/ib_frmr.c
> @@ -76,6 +76,7 @@ static struct rds_ib_mr *rds_ib_alloc_frmr(struct rds_ib_device *rds_ibdev,
>
> frmr->fr_state = FRMR_IS_FREE;
> init_waitqueue_head(&frmr->fr_inv_done);
> + init_waitqueue_head(&frmr->fr_reg_done);
> return ibmr;
>
> out_no_cigar:
> @@ -124,6 +125,7 @@ static int rds_ib_post_reg_frmr(struct rds_ib_mr *ibmr)
> */
> ib_update_fast_reg_key(frmr->mr, ibmr->remap_count++);
> frmr->fr_state = FRMR_IS_INUSE;
> + frmr->fr_reg = true;
>
> memset(®_wr, 0, sizeof(reg_wr));
> reg_wr.wr.wr_id = (unsigned long)(void *)ibmr;
> @@ -144,7 +146,29 @@ static int rds_ib_post_reg_frmr(struct rds_ib_mr *ibmr)
> if (printk_ratelimit())
> pr_warn("RDS/IB: %s returned error(%d)\n",
> __func__, ret);
> + goto out;
> + }
> +
> + if (!frmr->fr_reg)
> + goto out;
> +
> + /* Wait for the registration to complete in order to prevent an invalid
> + * access error resulting from a race between the memory region already
> + * being accessed while registration is still pending.
> + */
> + wait_event_timeout(frmr->fr_reg_done, !frmr->fr_reg,
> + msecs_to_jiffies(100));
> +
This arbitrary timeout in this patch as well as pacth 1/7 which
Dave pointed out has any logic ?
MR registration command issued to hardware can at times take as
much as command timeout(e.g 60 seconds in CX3) and upto that its still
legitimate operation and not necessary failure. We shouldn't add
arbitrary time outs in ULPs.
Regards,
Santosh
Powered by blists - more mailing lists