lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJpMwyhhWSC_x4Fef32iW5Umzk5bLdJFweuZmN9LEJTQGyHfbQ@mail.gmail.com>
Date:   Tue, 24 May 2022 12:56:41 +0200
From:   Haris Iqbal <haris.iqbal@...os.com>
To:     "lizhijian@...itsu.com" <lizhijian@...itsu.com>
Cc:     Jason Gunthorpe <jgg@...dia.com>,
        Bob Pearson <rpearsonhpe@...il.com>,
        Zhu Yanjun <zyjzyj2000@...il.com>,
        "linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Guoqing Jiang <guoqing.jiang@...ux.dev>,
        Aleksei Marov <aleksei.marov@...os.com>,
        Jinpu Wang <jinpu.wang@...os.com>
Subject: Re: [PATCH] RDMA/rxe: Use kzalloc() to alloc map_set

On Tue, May 24, 2022 at 6:00 AM lizhijian@...itsu.com
<lizhijian@...itsu.com> wrote:
>
> Hi Jason & Bob
> CC Guoqing
>
> @Guoqing, It may correlate with your previous bug report: https://lore.kernel.org/all/20220210073655.42281-1-guoqing.jiang@linux.dev/T/
>
>
> It's observed that a same MR in rnbd server will trigger below code
> path:
>   -> rxe_mr_init_fast()
>   |-> alloc map_set() # map_set is uninitialized
>   |...-> rxe_map_mr_sg() # build the map_set
>       |-> rxe_mr_set_page()
>   |...-> rxe_reg_fast_mr() # mr->state change to VALID from FREE that means
>                            # we can access host memory(such rxe_mr_copy)
>   |...-> rxe_invalidate_mr() # mr->state change to FREE from VALID
>   |...-> rxe_reg_fast_mr() # mr->state change to VALID from FREE,
>                            # but map_set was not built again
>   |...-> rxe_mr_copy() # kernel crash due to access wild addresses
>                        # that lookup from the map_set
>
> I draft a patch like below for it, but i wonder if it's rxe's responsibility to do such checking.
> Any comments are very welcome.
>
>
>  From e9d0bd821f07f5e049027f07b3ce9dc283624201 Mon Sep 17 00:00:00 2001
> From: Li Zhijian <lizhijian@...itsu.com>
> Date: Tue, 24 May 2022 10:56:19 +0800
> Subject: [PATCH] RDMA/rxe: check map_set valid when handle IB_WR_REG_MR
>
> It's observed that a same MR in rnbd server will trigger below code
> path:
>   -> rxe_mr_init_fast()
>   |-> alloc map_set() # map_set is uninitialized
>   |...-> rxe_map_mr_sg() # build the map_set
>       |-> rxe_mr_set_page()
>   |...-> rxe_reg_fast_mr() # mr->state change to VALID from FREE that means
>                            # we can access host memory(such rxe_mr_copy)
>   |...-> rxe_invalidate_mr() # mr->state change to FREE from VALID
>   |...-> rxe_reg_fast_mr() # mr->state change to VALID from FREE,
>                            # but map_set was not built again
>   |...-> rxe_mr_copy() # kernel crash due to access wild addresses
>                        # that lookup from the map_set
>
> Signed-off-by: Li Zhijian <lizhijian@...itsu.com>
> ---
>   drivers/infiniband/sw/rxe/rxe_mr.c    | 9 +++++++++
>   drivers/infiniband/sw/rxe/rxe_verbs.c | 1 +
>   drivers/infiniband/sw/rxe/rxe_verbs.h | 1 +
>   3 files changed, 11 insertions(+)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
> index 787c7dadc14f..09673d559c06 100644
> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> @@ -90,6 +90,7 @@ static int rxe_mr_alloc_map_set(int num_map, struct rxe_map_set **setp)
>          if (!set->map)
>                  goto err_free_set;
>
> +       set->valid = false;
>          for (i = 0; i < num_map; i++) {
>                  set->map[i] = kmalloc(sizeof(struct rxe_map), GFP_KERNEL);
>                  if (!set->map[i])
> @@ -216,6 +217,7 @@ int rxe_mr_init_user(struct rxe_pd *pd, u64 start, u64 length, u64 iova,
>          }
>
>          set = mr->cur_map_set;
> +       set->valid = true;
>          set->page_shift = PAGE_SHIFT;
>          set->page_mask = PAGE_SIZE - 1;
>
> @@ -643,6 +645,7 @@ int rxe_invalidate_mr(struct rxe_qp *qp, u32 rkey)
>          }
>
>          mr->state = RXE_MR_STATE_FREE;
> +       mr->cur_map_set->valid = mr->next_map_set->valid = false;
>          ret = 0;
>
>   err_drop_ref:
> @@ -679,12 +682,18 @@ int rxe_reg_fast_mr(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
>                  return -EINVAL;
>          }
>
> +       if (!mr->next_map_set->valid) {
> +               pr_warn("%s: map set is not valid\n", __func__);
> +               return -EINVAL;
> +       }
> +
>          mr->access = access;
>          mr->lkey = (mr->lkey & ~0xff) | key;
>          mr->rkey = (access & IB_ACCESS_REMOTE) ? mr->lkey : 0;
>          mr->state = RXE_MR_STATE_VALID;
>
>          set = mr->cur_map_set;
> +       set->valid = false;
>          mr->cur_map_set = mr->next_map_set;
>          mr->cur_map_set->iova = wqe->wr.wr.reg.mr->iova;
>          mr->next_map_set = set;
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
> index 58e4412b1d16..4b7ae2d1d921 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.c
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
> @@ -992,6 +992,7 @@ static int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
>          set->page_shift = ilog2(ibmr->page_size);
>          set->page_mask = ibmr->page_size - 1;
>          set->offset = set->iova & set->page_mask;
> +       set->valid = true;
>
>          return n;
>   }
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
> index 86068d70cd95..2edf31aab7e1 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
> @@ -289,6 +289,7 @@ struct rxe_map {
>
>   struct rxe_map_set {
>          struct rxe_map          **map;
> +       bool                    valid;
>          u64                     va;
>          u64                     iova;
>          size_t                  length;
> --
> 2.31.1

Thanks for posting the description and the patch.

We have been facing the exact same issue (only with rxe), and we also
realized that to get around this we will have to call ib_map_mr_sg()
before every IB_WR_REG_MR wr; even if we are reusing the MR and simply
invalidating and re-validating them.

In reference to this, we have 2 questions.

1) This change was made in the following commit.

commit 647bf13ce944f20f7402f281578423a952274e4a
Author: Bob Pearson <rpearsonhpe@...il.com>
Date:   Tue Sep 14 11:42:06 2021 -0500

    RDMA/rxe: Create duplicate mapping tables for FMRs

    For fast memory regions create duplicate mapping tables so ib_map_mr_sg()
    can build a new mapping table which is then swapped into place
    synchronously with the execution of an IB_WR_REG_MR work request.

    Currently the rxe driver uses the same table for receiving RDMA operations
    and for building new tables in preparation for reusing the MR. This
    exposes users to potentially incorrect results.

    Link: https://lore.kernel.org/r/20210914164206.19768-5-rpearsonhpe@gmail.com
    Signed-off-by: Bob Pearson <rpearsonhpe@...il.com>
    Signed-off-by: Jason Gunthorpe <jgg@...dia.com>

We tried to understand what potential incorrect result that commit
message talks about, but were not able to. If someone can through
light into this scenario where single mapping table can result into
issues, it would be great.

2)
We wanted to confirm that, with the above patch, it clearly means that
the use-case where we reuse the MR, by simply invalidating and
re-validating (IB_WR_REG_MR wr) is a correct one.
And there is no requirement saying that ib_map_mr_sg() needs to be
done everytime regardless.

(We were planning to send this in the coming days, but wanted other
discussions to get over. Since the patch got posted and discussion
started, we thought better to put this out.)

Regards

>
>
> On 23/05/2022 22:02, Li, Zhijian wrote:
> >
> > on 2022/5/20 22:45, Jason Gunthorpe wrote:
> >> On Wed, May 18, 2022 at 12:37:25PM +0800, Li Zhijian wrote:
> >>> Below call chains will alloc map_set without fully initializing map_set.
> >>> rxe_mr_init_fast()
> >>>   -> rxe_mr_alloc()
> >>>      -> rxe_mr_alloc_map_set()
> >>>
> >>> Uninitialized values inside struct rxe_map_set are possible to cause
> >>> kernel panic.
> >> If the value is uninitialized then why is 0 an OK value?
> >>
> >> Would be happier to know the exact value that is not initialized
> >
> > Well, good question. After re-think of this issue, it seems this patch wasn't the root cause though it made the crash disappear in some extent.
> >
> > I'm still working on the root cause :)
> >
> > Thanks
> >
> > Zhijian
> >
> >
> >>
> >> Jason
> >
> >

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ