Date:   Fri, 9 Sep 2022 00:55:59 +0000
From:   "matsuda-daisuke@...itsu.com" <matsuda-daisuke@...itsu.com>
To:     'Haris Iqbal' <haris.iqbal@...os.com>
CC:     "linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
        "leonro@...dia.com" <leonro@...dia.com>,
        "jgg@...dia.com" <jgg@...dia.com>,
        "zyjzyj2000@...il.com" <zyjzyj2000@...il.com>,
        "nvdimm@...ts.linux.dev" <nvdimm@...ts.linux.dev>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "rpearsonhpe@...il.com" <rpearsonhpe@...il.com>,
        "yangx.jy@...itsu.com" <yangx.jy@...itsu.com>,
        "lizhijian@...itsu.com" <lizhijian@...itsu.com>,
        "y-goto@...itsu.com" <y-goto@...itsu.com>,
        haris iqbal <haris.phnx@...il.com>
Subject: RE: [RFC PATCH 5/7] RDMA/rxe: Allow registering MRs for On-Demand
 Paging

On Fri, Sep 9, 2022 1:58 AM Haris Iqbal wrote:
> On Wed, Sep 7, 2022 at 4:45 AM Daisuke Matsuda
> <matsuda-daisuke@...itsu.com> wrote:
> >
> > Allow applications to register an ODP-enabled MR, in which case the flag
> > IB_ACCESS_ON_DEMAND is passed to rxe_reg_user_mr(). However, no RDMA
> > operations are supported on such MRs yet; they will be enabled in the
> > two subsequent patches.
> >
> > rxe_odp_do_pagefault() is called here to initialize an ODP-enabled MR.
> > When called with the RXE_PAGEFAULT_SNAPSHOT flag, it syncs the process
> > address space from the CPU page table to the driver page table
> > (dma_list/pfn_list in umem_odp). Additionally, it can be used to trigger
> > page faults when the pages being accessed are not present or lack the
> > proper read/write permissions, and possibly to prefetch pages in the
> > future.
> >
> > Signed-off-by: Daisuke Matsuda <matsuda-daisuke@...itsu.com>
> > ---
> >  drivers/infiniband/sw/rxe/rxe.c       |  7 +++
> >  drivers/infiniband/sw/rxe/rxe_loc.h   |  5 ++
> >  drivers/infiniband/sw/rxe/rxe_mr.c    |  7 ++-
> >  drivers/infiniband/sw/rxe/rxe_odp.c   | 80 +++++++++++++++++++++++++++
> >  drivers/infiniband/sw/rxe/rxe_resp.c  | 21 +++++--
> >  drivers/infiniband/sw/rxe/rxe_verbs.c |  8 ++-
> >  drivers/infiniband/sw/rxe/rxe_verbs.h |  2 +
> >  7 files changed, 121 insertions(+), 9 deletions(-)
> >
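The commit message above describes two modes of rxe_odp_do_pagefault(): a snapshot that only copies the current CPU page-table state into the driver table, and a fault mode that brings in pages that are absent or lack the required permission. As a rough illustration only, here is a self-contained userspace toy model of those two modes. Every name in it (struct toy_pte, toy_odp_do_pagefault(), TOY_PAGEFAULT_SNAPSHOT, TOY_PAGEFAULT_WRITE, and so on) is hypothetical; it does not use the real ib_umem_odp or HMM APIs, and the frame numbers it assigns are made up.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flags; only the SNAPSHOT idea comes from the patch text. */
#define TOY_PAGEFAULT_SNAPSHOT 0x1u /* copy CPU state, never fault */
#define TOY_PAGEFAULT_WRITE    0x2u /* caller needs write access */

#define TOY_NPAGES 8

/* One page-table entry (same shape for the CPU and driver tables). */
struct toy_pte {
	bool present;
	bool writable;
	uint64_t pfn;
};

/* Stand-in for the registering process's CPU page table. */
struct toy_cpu_pt {
	struct toy_pte pte[TOY_NPAGES];
};

/* Stand-in for the driver-side table (pfn_list in umem_odp). */
struct toy_odp_mr {
	struct toy_pte pfn_list[TOY_NPAGES];
};

/* Pretend the kernel faulted page i in for the process. */
static void toy_cpu_fault(struct toy_cpu_pt *pt, size_t i, bool write)
{
	if (!pt->pte[i].present) {
		pt->pte[i].present = true;
		pt->pte[i].pfn = 0x1000 + i; /* made-up frame number */
	}
	if (write)
		pt->pte[i].writable = true;
}

/*
 * Sync pages [first, first + count) from the CPU table into the MR.
 * Returns the number of pages that had to be faulted in, or -EINVAL.
 */
static int toy_odp_do_pagefault(struct toy_odp_mr *mr, struct toy_cpu_pt *pt,
				size_t first, size_t count, unsigned int flags)
{
	bool write = flags & TOY_PAGEFAULT_WRITE;
	int faulted = 0;

	assert(mr != NULL && pt != NULL);
	if (first > TOY_NPAGES || count > TOY_NPAGES - first)
		return -EINVAL;

	for (size_t i = first; i < first + count; i++) {
		struct toy_pte *cpu = &pt->pte[i];

		/* Fault only outside snapshot mode, and only when the page
		 * is absent or lacks the requested write permission. */
		if (!(flags & TOY_PAGEFAULT_SNAPSHOT) &&
		    (!cpu->present || (write && !cpu->writable))) {
			toy_cpu_fault(pt, i, write);
			faulted++;
		}
		mr->pfn_list[i] = *cpu; /* copy CPU state into driver table */
	}
	return faulted;
}
```

In this model a snapshot leaves absent pages absent in the driver table, so a later access to them still has to go through the fault path; that appears to be the split the commit message draws between initializing the MR and faulting on access.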

<...>

> > diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
> > index cadc8fa64dd0..dd8632e783f6 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> > +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> > @@ -535,8 +535,12 @@ static enum resp_states write_data_in(struct rxe_qp *qp,
> >         int     err;
> >         int data_len = payload_size(pkt);
> >
> > -       err = rxe_mr_copy(qp->resp.mr, qp->resp.va + qp->resp.offset,
> > -                         payload_addr(pkt), data_len, RXE_TO_MR_OBJ);
> > +       if (qp->resp.mr->odp_enabled)
> 
> You cannot use qp->resp.mr here, because for zero-byte operations
> resp.mr is not set in check_rkey().
> 
> The code fails for RTRS with the following stack trace:
> 
> [Thu Sep  8 20:12:22 2022] BUG: kernel NULL pointer dereference,
> address: 0000000000000158
> [Thu Sep  8 20:12:22 2022] #PF: supervisor read access in kernel mode
> [Thu Sep  8 20:12:22 2022] #PF: error_code(0x0000) - not-present page
> [Thu Sep  8 20:12:22 2022] PGD 0 P4D 0
> [Thu Sep  8 20:12:22 2022] Oops: 0000 [#1] PREEMPT SMP
> [Thu Sep  8 20:12:22 2022] CPU: 3 PID: 38 Comm: kworker/u8:1 Not
> tainted 6.0.0-rc2-pserver+ #17
> [Thu Sep  8 20:12:22 2022] Hardware name: QEMU Standard PC (i440FX +
> PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
> [Thu Sep  8 20:12:22 2022] Workqueue: rxe_resp rxe_do_work [rdma_rxe]
> [Thu Sep  8 20:12:22 2022] RIP: 0010:rxe_responder+0x1910/0x1d90 [rdma_rxe]
> [Thu Sep  8 20:12:22 2022] Code: 06 48 63 88 fc 15 63 c0 0f b6 46 01
> 83 ea 04 c0 e8 04 29 ca 83 e0 03 29 c2 49 8b 87 08 05 00 00 49 03 87
> 00 05 00 00 4c 63 ea <80> bf 58 01 00 00 00 48 8d 14 0e 48 89 c6 4d 89
> ee 44 89 e9 0f 84
> [Thu Sep  8 20:12:22 2022] RSP: 0018:ffffb0358015fd80 EFLAGS: 00010246
> [Thu Sep  8 20:12:22 2022] RAX: 0000000000000000 RBX: ffff9af4839b5e28
> RCX: 0000000000000020
> [Thu Sep  8 20:12:22 2022] RDX: 0000000000000000 RSI: ffff9af485094a6a
> RDI: 0000000000000000
> [Thu Sep  8 20:12:22 2022] RBP: ffff9af488bd7128 R08: 0000000000000000
> R09: 0000000000000000
> [Thu Sep  8 20:12:22 2022] R10: ffff9af4808eaf7c R11: 0000000000000001
> R12: 0000000000000008
> [Thu Sep  8 20:12:22 2022] R13: 0000000000000000 R14: ffff9af488bd7380
> R15: ffff9af488bd7000
> [Thu Sep  8 20:12:22 2022] FS:  0000000000000000(0000)
> GS:ffff9af5b7d80000(0000) knlGS:0000000000000000
> [Thu Sep  8 20:12:22 2022] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [Thu Sep  8 20:12:22 2022] CR2: 0000000000000158 CR3: 000000004a60a000
> CR4: 00000000000006e0
> [Thu Sep  8 20:12:22 2022] DR0: 0000000000000000 DR1: 0000000000000000
> DR2: 0000000000000000
> [Thu Sep  8 20:12:22 2022] DR3: 0000000000000000 DR6: 00000000fffe0ff0
> DR7: 0000000000000400
> [Thu Sep  8 20:12:22 2022] Call Trace:
> [Thu Sep  8 20:12:22 2022]  <TASK>
> [Thu Sep  8 20:12:22 2022]  ? newidle_balance+0x2e5/0x400
> [Thu Sep  8 20:12:22 2022]  ? _raw_spin_unlock+0x12/0x30
> [Thu Sep  8 20:12:22 2022]  ? finish_task_switch+0x91/0x2a0
> [Thu Sep  8 20:12:22 2022]  rxe_do_work+0x86/0x110 [rdma_rxe]
> [Thu Sep  8 20:12:22 2022]  process_one_work+0x1dc/0x3a0
> [Thu Sep  8 20:12:22 2022]  worker_thread+0x4a/0x3b0
> [Thu Sep  8 20:12:22 2022]  ? process_one_work+0x3a0/0x3a0
> [Thu Sep  8 20:12:22 2022]  kthread+0xe7/0x110
> [Thu Sep  8 20:12:22 2022]  ? kthread_complete_and_exit+0x20/0x20
> [Thu Sep  8 20:12:22 2022]  ret_from_fork+0x22/0x30
> [Thu Sep  8 20:12:22 2022]  </TASK>
> [Thu Sep  8 20:12:22 2022] Modules linked in: rnbd_server rtrs_server
> rtrs_core rdma_ucm rdma_cm iw_cm ib_cm crc32_generic rdma_rxe
> ip6_udp_tunnel udp_tunnel ib_uverbs ib_core loop null_blk
> [Thu Sep  8 20:12:22 2022] CR2: 0000000000000158
> [Thu Sep  8 20:12:22 2022] ---[ end trace 0000000000000000 ]---
> [Thu Sep  8 20:12:22 2022] BUG: kernel NULL pointer dereference,
> address: 0000000000000158
> [Thu Sep  8 20:12:22 2022] RIP: 0010:rxe_responder+0x1910/0x1d90 [rdma_rxe]
> [Thu Sep  8 20:12:22 2022] #PF: supervisor read access in kernel mode
> [Thu Sep  8 20:12:22 2022] Code: 06 48 63 88 fc 15 63 c0 0f b6 46 01
> 83 ea 04 c0 e8 04 29 ca 83 e0 03 29 c2 49 8b 87 08 05 00 00 49 03 87
> 00 05 00 00 4c 63 ea <80> bf 58 01 00 00 00 48 8d 14 0e 48 89 c6 4d 89
> ee 44 89 e9 0f 84
> [Thu Sep  8 20:12:22 2022] #PF: error_code(0x0000) - not-present page
> [Thu Sep  8 20:12:22 2022] RSP: 0018:ffffb0358015fd80 EFLAGS: 00010246
> [Thu Sep  8 20:12:22 2022] PGD 0 P4D 0
> 
> Technically, for zero-length operations the code can simply skip the
> *_mr_copy() calls and carry on with success. So maybe you can check
> data_len first and copy only if needed.
> 

Good catch!
I will fix this in the next version, as you suggested.

Many thanks
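Haris's suggestion is to check data_len before touching qp->resp.mr at all. A minimal userspace sketch of that ordering follows, assuming simplified hypothetical stand-ins (struct toy_mr, struct toy_resp, toy_mr_copy_in(), toy_write_data_in()) rather than the real rxe structures. The real write_data_in() returns enum resp_states and maps a copy error to RESPST_ERR_RKEY_VIOLATION; the sketch collapses that into 0 or a negative errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for struct rxe_mr. */
struct toy_mr {
	bool odp_enabled;
	unsigned char buf[64];
};

/* Hypothetical stand-in for qp->resp; mr stays NULL for zero-byte ops. */
struct toy_resp {
	struct toy_mr *mr;
	size_t offset;
};

/* Stand-in for rxe_mr_copy() in the RXE_TO_MR_OBJ direction. */
static int toy_mr_copy_in(struct toy_mr *mr, size_t off, const void *src,
			  size_t len)
{
	if (off > sizeof(mr->buf) || len > sizeof(mr->buf) - off)
		return -EFAULT;
	memcpy(mr->buf + off, src, len);
	return 0;
}

static int toy_write_data_in(struct toy_resp *resp, const void *payload,
			     size_t data_len)
{
	int err;

	assert(resp != NULL);

	/* Zero-length payload: nothing to copy, and resp->mr may be NULL,
	 * so return success before dereferencing it. */
	if (data_len == 0)
		return 0;

	if (resp->mr->odp_enabled)
		err = -EOPNOTSUPP; /* ODP data path not implemented yet */
	else
		err = toy_mr_copy_in(resp->mr, resp->offset, payload, data_len);

	if (!err)
		resp->offset += data_len;
	return err;
}
```

The early return sits before any access to resp->mr, so the NULL MR that the rkey check leaves behind for zero-byte operations is never read.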

> 
> > +               err = -EOPNOTSUPP;
> > +       else
> > +               err = rxe_mr_copy(qp->resp.mr, qp->resp.va + qp->resp.offset,
> > +                                 payload_addr(pkt), data_len, RXE_TO_MR_OBJ);
> > +
> >         if (err) {
> >                 rc = RESPST_ERR_RKEY_VIOLATION;
> >                 goto out;
> > @@ -667,7 +671,10 @@ static enum resp_states rxe_atomic_reply(struct rxe_qp *qp,
> >                 if (mr->state != RXE_MR_STATE_VALID)
> >                         return RESPST_ERR_RKEY_VIOLATION;
> >
> > -               ret = rxe_atomic_ops(qp, pkt, mr);
> > +               if (mr->odp_enabled)
> > +                       ret = RESPST_ERR_UNSUPPORTED_OPCODE;
> > +               else
> > +                       ret = rxe_atomic_ops(qp, pkt, mr);
> >         } else
> >                 ret = RESPST_ACKNOWLEDGE;
> >
> > @@ -831,8 +838,12 @@ static enum resp_states read_reply(struct rxe_qp *qp,
> >         if (!skb)
> >                 return RESPST_ERR_RNR;
> >
> > -       err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
> > -                         payload, RXE_FROM_MR_OBJ);
> > +       if (mr->odp_enabled)
> > +               err = -EOPNOTSUPP;
> > +       else
> > +               err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
> > +                                 payload, RXE_FROM_MR_OBJ);
> > +
> >         if (err)
> >                 pr_err("Failed copying memory\n");
> >         if (mr)
> > diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
> > index 7510f25c5ea3..b00e9b847382 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_verbs.c
> > +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
> > @@ -926,10 +926,14 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd,
> >                 goto err2;
> >         }
> >
> > -
> >         rxe_get(pd);
> >
> > -       err = rxe_mr_init_user(pd, start, length, iova, access, mr);
> > +       if (access & IB_ACCESS_ON_DEMAND)
> > +               err = rxe_create_user_odp_mr(&pd->ibpd, start, length, iova,
> > +                                            access, mr);
> > +       else
> > +               err = rxe_mr_init_user(pd, start, length, iova, access, mr);
> > +
> >         if (err)
> >                 goto err3;
> >
> > diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
> > index b09b4cb9897a..98d2bb737ebc 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
> > +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
> > @@ -324,6 +324,8 @@ struct rxe_mr {
> >         atomic_t                num_mw;
> >
> >         struct rxe_map          **map;
> > +
> > +       bool                    odp_enabled;
> >  };
> >
> >  enum rxe_mw_state {
> > --
> > 2.31.1
> >
