Message-ID: <20230321075458.GP36557@unreal>
Date:   Tue, 21 Mar 2023 09:54:58 +0200
From:   Leon Romanovsky <leon@...nel.org>
To:     Jason Gunthorpe <jgg@...dia.com>
Cc:     Patrisious Haddad <phaddad@...dia.com>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>, linux-rdma@...r.kernel.org,
        netdev@...r.kernel.org, Paolo Abeni <pabeni@...hat.com>,
        Saeed Mahameed <saeedm@...dia.com>
Subject: Re: [PATCH rdma-next v1 2/3] RDMA/mlx5: Handling dct common resource
 destruction upon firmware failure

On Mon, Mar 20, 2023 at 04:18:14PM -0300, Jason Gunthorpe wrote:
> On Thu, Mar 16, 2023 at 03:39:27PM +0200, Leon Romanovsky wrote:
> > From: Patrisious Haddad <phaddad@...dia.com>
> > 
> > Previously when destroying a DCT, if the firmware function for the
> > destruction failed, the common resource would have been destroyed
> > either way, since it was destroyed before the firmware object.
> > Which leads to kernel warning "refcount_t: underflow" which indicates
> > possible use-after-free.
> > Which is triggered when we try to destroy the common resource for the
> > second time and execute refcount_dec_and_test(&common->refcount).
> > 
> > So, currently before destroying the common resource we check its
> > refcount and continue with the destruction only if it isn't zero.
> 
> This seems super sketchy
> 
> If the destruction fails why not set the refcount back to 1?

Because the destruction will fail in destroy_rq_tracked(), which runs after
destroy_resource_common().

In the first destruction attempt, we delete the QP from the radix tree and wait
for all references to drop. To avoid undoing all of this logic (setting the
refcount back to 1 alone is not enough), it is much safer to simply skip
destroy_resource_common() in the re-entry case.

Failure to delete means that something external to the kernel holds a reference
to that QP, but it is safe to delete it from the kernel side, as nothing in the
kernel can use it after the call to destroy_resource_common().
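
To illustrate the re-entry case, here is a rough userspace sketch of the flow
described above. It is a simplified model and not the mlx5 code: struct dct,
struct common_res, fw_destroy(), dct_init() and the counters are hypothetical
names, a plain int stands in for refcount_t, and a bool stands in for the radix
tree entry. Only the ordering (common teardown first, firmware step second) and
the "skip if the refcount is already zero" check follow the discussion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the common resource; not the real mlx5 structure. */
struct common_res {
	int refcount;	/* stands in for refcount_t */
	bool in_table;	/* stands in for the radix tree entry */
};

struct dct {
	struct common_res common;
	int fw_failures_left;	/* how many times the simulated firmware refuses */
	int common_teardowns;	/* how many times the common teardown really ran */
};

static void dct_init(struct dct *d, int fw_failures)
{
	d->common.refcount = 1;
	d->common.in_table = true;
	d->fw_failures_left = fw_failures;
	d->common_teardowns = 0;
}

/* Remove the object from lookup and drop the initial reference. */
static void destroy_resource_common(struct dct *d)
{
	d->common.in_table = false;
	d->common.refcount--;
	d->common_teardowns++;
}

/* Simulated firmware destroy step (assumption): fails a fixed number of times. */
static int fw_destroy(struct dct *d)
{
	if (d->fw_failures_left > 0) {
		d->fw_failures_left--;
		return -1;
	}
	return 0;
}

static int destroy_dct(struct dct *d)
{
	/*
	 * Re-entry guard: if an earlier attempt already tore down the
	 * common resource, do not drop the reference a second time.
	 */
	if (d->common.refcount != 0)
		destroy_resource_common(d);

	return fw_destroy(d);
}
```

With one simulated firmware failure, the first destroy_dct() call returns an
error after the common teardown has run, and the second call skips the teardown
and only retries the firmware step, so the refcount never goes below zero.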

Thanks

> 
> Jason
