Message-ID: <20211004114413.GE964074@nvidia.com>
Date: Mon, 4 Oct 2021 08:44:13 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Hillf Danton <hdanton@...a.com>
Cc: syzbot <syzbot+ae4de2b6e34e89637fc2@...kaller.appspotmail.com>,
Leon Romanovsky <leonro@...dia.com>,
linux-kernel@...r.kernel.org, linux-rdma@...r.kernel.org,
syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] KASAN: use-after-free Read in addr_handler (5)
On Mon, Oct 04, 2021 at 04:58:29PM +0800, Hillf Danton wrote:
> >Fix 2ee9bf346fbf
> >("RDMA/addr: Fix race with netevent_callback()/rdma_addr_cancel()") by
>
> Sorry for my noise, given addr_wq is an ordered workqueue.
> I missed it.
It is probably fixed by this:

commit 305d568b72f17f674155a2a8275f865f207b3808
Author: Jason Gunthorpe <jgg@...pe.ca>
Date:   Thu Sep 16 15:34:46 2021 -0300

RDMA/cma: Ensure rdma_addr_cancel() happens before issuing more requests

The FSM can run in a circle allowing rdma_resolve_ip() to be called twice
on the same id_priv. While this cannot happen without going through the
work, it violates the invariant that the same address resolution
background request cannot be active twice.

        CPU 1                                  CPU 2

rdma_resolve_addr():
  RDMA_CM_IDLE -> RDMA_CM_ADDR_QUERY
  rdma_resolve_ip(addr_handler)  #1

                         process_one_req(): for #1
                          addr_handler():
                            RDMA_CM_ADDR_QUERY -> RDMA_CM_ADDR_BOUND
                            mutex_unlock(&id_priv->handler_mutex);
                            [.. handler still running ..]

rdma_resolve_addr():
  RDMA_CM_ADDR_BOUND -> RDMA_CM_ADDR_QUERY
  rdma_resolve_ip(addr_handler)
    !! two requests are now on the req_list

rdma_destroy_id():
 destroy_id_handler_unlock():
  _destroy_id():
   cma_cancel_operation():
    rdma_addr_cancel()

                          // process_one_req() self removes it
                          spin_lock_bh(&lock);
                           cancel_delayed_work(&req->work);
                           if (!list_empty(&req->list)) == true

      ! rdma_addr_cancel() returns after process_one_req #1 is done

   kfree(id_priv)

                         process_one_req(): for #2
                          addr_handler():
                            mutex_lock(&id_priv->handler_mutex);
                            !! Use after free on id_priv

rdma_addr_cancel() expects there to be one req on the list and only
cancels the first one. The self-removal behavior of the work only happens
after the handler has returned. This yields a situation where the
req_list can have two reqs for the same "handle" but rdma_addr_cancel()
only cancels the first one.

The second req remains active beyond rdma_destroy_id() and will
use-after-free id_priv once it inevitably triggers.

Fix this by remembering if the id_priv has called rdma_resolve_ip() and
always cancel before calling it again. This ensures the req_list never
gets more than one item in it and doesn't cost anything in the normal flow
that never uses this strange error path.

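
The "remember, then cancel before re-issuing" idea described above can be
sketched as a small userspace model. This is only an illustration under
stated assumptions, not the kernel patch: every model_* name, the
fixed-size array standing in for req_list, and the used_resolve_ip flag
name are hypothetical, and locking and the workqueue are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_REQS 8

/* Hypothetical stand-in for the global req_list: one owner per pending req. */
struct model_req_list {
	const void *owner[MODEL_MAX_REQS];
	size_t count;
};

/* Hypothetical stand-in for id_priv; the flag remembers an earlier request. */
struct model_id {
	bool used_resolve_ip;
};

/* Queue one request for owner (models a call to rdma_resolve_ip()). */
static void model_resolve_ip(struct model_req_list *l, const void *owner)
{
	assert(l->count < MODEL_MAX_REQS);
	l->owner[l->count++] = owner;
}

/* Remove only the FIRST request for owner (models rdma_addr_cancel()). */
static void model_addr_cancel(struct model_req_list *l, const void *owner)
{
	for (size_t i = 0; i < l->count; i++) {
		if (l->owner[i] != owner)
			continue;
		for (size_t j = i + 1; j < l->count; j++)
			l->owner[j - 1] = l->owner[j];
		l->count--;
		return;
	}
}

/* Count the requests still pending for owner. */
static size_t model_pending(const struct model_req_list *l, const void *owner)
{
	size_t n = 0;

	for (size_t i = 0; i < l->count; i++)
		if (l->owner[i] == owner)
			n++;
	return n;
}

/* Unfixed flow: a second resolve simply queues a second request. */
static void model_resolve_addr_unfixed(struct model_req_list *l,
				       struct model_id *id)
{
	model_resolve_ip(l, id);
}

/* Fixed flow: cancel any earlier request before issuing another one. */
static void model_resolve_addr_fixed(struct model_req_list *l,
				     struct model_id *id)
{
	if (id->used_resolve_ip)
		model_addr_cancel(l, id);
	else
		id->used_resolve_ip = true;
	model_resolve_ip(l, id);
}

/* Teardown issues a single cancel; returns requests left behind for id. */
static size_t model_destroy(struct model_req_list *l, struct model_id *id)
{
	model_addr_cancel(l, id);
	return model_pending(l, id);
}
```

In this model, two calls to model_resolve_addr_unfixed() followed by
model_destroy() leave one request behind, which corresponds to the req that
later touches the freed id_priv. With model_resolve_addr_fixed() the list
never holds more than one request per id, so the single cancel at teardown
is enough.
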
Link: https://lore.kernel.org/r/0-v1-3bc675b8006d+22-syz_cancel_uaf_jgg@nvidia.com
Cc: stable@...r.kernel.org
Fixes: e51060f08a61 ("IB: IP address based RDMA connection manager")
Reported-by: syzbot+dc3dfba010d7671e05f5@...kaller.appspotmail.com
Signed-off-by: Jason Gunthorpe <jgg@...dia.com>
Jason