Message-ID: <20210627231528.GA4459@nvidia.com>
Date: Sun, 27 Jun 2021 20:15:28 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Leon Romanovsky <leon@...nel.org>
Cc: Doug Ledford <dledford@...hat.com>, linux-kernel@...r.kernel.org,
    linux-rdma@...r.kernel.org, Pavel Skripkin <paskripkin@...il.com>,
    Shay Drory <shayd@...dia.com>
Subject: Re: [PATCH rdma-rc v2] RDMA/core: Simplify addition of restrack object

On Sun, Jun 27, 2021 at 11:07:33AM +0300, Leon Romanovsky wrote:
> On Thu, Jun 24, 2021 at 02:48:41PM -0300, Jason Gunthorpe wrote:
> > On Tue, Jun 08, 2021 at 08:23:48AM +0300, Leon Romanovsky wrote:
> > > From: Leon Romanovsky <leonro@...dia.com>
> > >
> > > Move the rdma_restrack_add() calls next to the device attachment
> > > logic. This fixes a bug where task_struct was acquired but never
> > > released, causing a resource leak.
> > >
> > > ucma_create_id() {
> > >   ucma_alloc_ctx();
> > >   rdma_create_user_id() {
> > >     rdma_restrack_new();
> > >     rdma_restrack_set_name() {
> > >       rdma_restrack_attach_task.part.0(); <--- task_struct was gotten
> > >     }
> > >   }
> > >   ucma_destroy_private_ctx() {
> > >     ucma_put_ctx();
> > >     rdma_destroy_id() {
> > >       _destroy_id() <--- id_priv was freed
> > >     }
> > >   }
> > > }
> >
> > I still don't understand this patch
> >
> > > @@ -1852,6 +1849,7 @@ static void _destroy_id(struct rdma_id_private *id_priv,
> > >  {
> > >  	cma_cancel_operation(id_priv, state);
> > >
> > > +	rdma_restrack_del(&id_priv->res);
> > >  	if (id_priv->cma_dev) {
> > >  		if (rdma_cap_ib_cm(id_priv->id.device, 1)) {
> > >  			if (id_priv->cm_id.ib)
> > > @@ -1861,7 +1859,6 @@ static void _destroy_id(struct rdma_id_private *id_priv,
> > >  				iw_destroy_cm_id(id_priv->cm_id.iw);
> > >  		}
> > >  		cma_leave_mc_groups(id_priv);
> > > -		rdma_restrack_del(&id_priv->res);
> > >  		cma_release_dev(id_priv);
> >
> > This seems to be the only hunk that is actually necessary: ensuring
> > that a non-added ID is always cleaned up is the step needed to fix
> > the trace above.
> >
> > What is the rest of this doing?? It looks wrong:
> >
> > int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
> > {
> > [..]
> > 	ret = cma_get_port(id_priv);
> > 	if (ret)
> > 		goto err2;
> > err2:
> > [..]
> > 	if (!cma_any_addr(addr))
> > 		rdma_restrack_del(&id_priv->res);
> >
> > Which means if rdma_bind_addr() fails then restrack will discard the
> > task, even though the cm_id is still valid! The ucma is free to try
> > bind again and keep using the ID.
>
> "Failure to bind" means that cma_attach_to_dev() needs to be unwound.
>
> It is the same whether rdma_restrack_add() is called inside that
> function, as in this patch, or on the line before rdma_bind_addr()
> returns, as in the previous code.

The previous code didn't call restrack_del. restrack_del undoes the
restrack_set_name work as well as the add, so it does not put things
back the way it found them.

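
To make the asymmetry concrete, here is a rough userspace sketch. Every
toy_* name is made up for illustration and none of this is the kernel
code; it only assumes, as said above, that del also drops the task
reference that was taken at set_name time.

```c
/* Userspace toy model; all toy_* names are hypothetical, not kernel API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_task { int refs; };		/* stands in for task_struct */

struct toy_res {
	struct toy_task *task;		/* owner, taken by toy_set_name() */
	bool tracked;			/* true between toy_add() and toy_del() */
};

/* Models the set_name step: takes a reference on the owning task. */
static void toy_set_name(struct toy_res *res, struct toy_task *task)
{
	task->refs++;
	res->task = task;
}

/* Models add: only publishes the entry, takes no references. */
static void toy_add(struct toy_res *res)
{
	res->tracked = true;
}

/* Models del as described above: unpublishes AND drops the task. */
static void toy_del(struct toy_res *res)
{
	res->tracked = false;
	if (res->task) {
		res->task->refs--;
		res->task = NULL;
	}
}

/* A bind attempt that fails after add and unwinds with del. */
static int toy_bind_fails(struct toy_res *res)
{
	toy_add(res);
	/* pretend the port lookup failed here */
	toy_del(res);
	return -1;
}

/* Walks through the scenario; returns 0 if it behaves as described. */
int toy_demo(void)
{
	struct toy_task task = { 0 };
	struct toy_res res = { 0 };

	toy_set_name(&res, &task);	/* ID created, owner recorded */
	assert(task.refs == 1 && res.task == &task);

	assert(toy_bind_fails(&res) != 0);

	/* The ID is still usable, but its owner is already gone. */
	assert(res.task == NULL && task.refs == 0);
	return 0;
}
```

In this model the failed bind is not a pure undo of toy_add(): the
resource comes back without the owner it had before the call, even
though the caller may retry the bind on the same ID.
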
Jason