linux-kernel - Re: [PATCH] net: 9p: fix possible refcount leak in p9_read_work() and recv

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f68df7cf-4b72-4c01-9492-103fa67c5e99@gmail.com>
Date:   Tue, 12 Jul 2022 11:24:36 +0800
From:   Hangyu Hua <hbh25y@...il.com>
To:     asmadeus@...ewreck.org
Cc:     ericvh@...il.com, lucho@...kov.net, linux_oss@...debyte.com,
        davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
        pabeni@...hat.com, tomasbortoli@...il.com,
        v9fs-developer@...ts.sourceforge.net, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] net: 9p: fix possible refcount leak in p9_read_work() and
 recv_done()

On 2022/7/11 15:39, asmadeus@...ewreck.org wrote:
> Hangyu Hua wrote on Mon, Jul 11, 2022 at 02:59:07PM +0800:
>> A ref got in p9_tag_lookup needs to be put when functions enter the
>> error path.
>>
>> Fix this by adding p9_req_put in error path.
> 
> I wish it was that simple.
> 
> Did you actually observe a leak? >
>> diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
>> index 8f8f95e39b03..c4ccb7b9e1bf 100644
>> --- a/net/9p/trans_fd.c
>> +++ b/net/9p/trans_fd.c
>> @@ -343,6 +343,7 @@ static void p9_read_work(struct work_struct *work)
>>   			p9_debug(P9_DEBUG_ERROR,
>>   				 "No recv fcall for tag %d (req %p), disconnecting!\n",
>>   				 m->rc.tag, m->rreq);
>> +			p9_req_put(m->rreq);
>>   			m->rreq = NULL;
>>   			err = -EIO;
>>   			goto error;
>> @@ -372,6 +373,8 @@ static void p9_read_work(struct work_struct *work)
>>   				 "Request tag %d errored out while we were reading the reply\n",
>>   				 m->rc.tag);
>>   			err = -EIO;
>> +			p9_req_put(m->rreq);
>> +			m->rreq = NULL;
>>   			goto error;
>>   		}
>>   		spin_unlock(&m->client->lock);
> 
> 
> for tcp, we still have that request in m->req_list, so we should be
> calling p9_client_cb which will do the p9_req_put in p9_conn_cancel.
> 
> If you do it here, you'll get a refcount overflow and use after free.
> 


That's a little weird. If you are right, the three return paths of this 
function are inconsistent with the handling of refcount.

static void p9_read_work(struct work_struct *work)
{
...
	if ((m->rreq) && (m->rc.offset == m->rc.capacity)) {
		p9_debug(P9_DEBUG_TRANS, "got new packet\n");
		m->rreq->rc.size = m->rc.offset;
		spin_lock(&m->client->lock);
		if (m->rreq->status == REQ_STATUS_SENT) {
			list_del(&m->rreq->req_list);
			p9_client_cb(m->client, m->rreq, REQ_STATUS_RCVD);	<---- [1]
		} else if (m->rreq->status == REQ_STATUS_FLSHD) {
			/* Ignore replies associated with a cancelled request. */
			p9_debug(P9_DEBUG_TRANS,
				 "Ignore replies associated with a cancelled request\n");	<---- [2]
		} else {
			spin_unlock(&m->client->lock);
			p9_debug(P9_DEBUG_ERROR,
				 "Request tag %d errored out while we were reading the reply\n",
				 m->rc.tag);
			err = -EIO;
			goto error;	<---- [3]
		}
		spin_unlock(&m->client->lock);
		m->rc.sdata = NULL;
		m->rc.offset = 0;
		m->rc.capacity = 0;
		p9_req_put(m->rreq);	<---- [4]
		m->rreq = NULL;
	}
...
error:
	p9_conn_cancel(m, err);		<---- [5]
	clear_bit(Rworksched, &m->wsched);
}

There are three return paths here, [1] and [2] and [3].

[1]: m->rreq will be put twice in [1] and [4]. And m->rreq will be 
deleted from the m->req_list in [1].

[2]: m->rreq will be put in [4]. And m->rreq will not be deleted from 
m->req_list.

[3]: m->rreq will be put in [5]. And m->rreq will be deleted from the 
m->req_list in [5].

If p9_tag_lookup keep the refcount of req which is in m->req_list. There 
will be a double put in return path [1] and a potential UAF in return 
path [2]. And this also means a req in m->req_list without getting 
refcount before p9_tag_lookup.

static void p9_write_work(struct work_struct *work)
{
...
		list_move_tail(&req->req_list, &m->req_list);

		m->wbuf = req->tc.sdata;
		m->wsize = req->tc.size;
		m->wpos = 0;
		p9_req_get(req);
...
}

But if you check out p9_write_work, a refcount already get after 
list_move_tail. We don't need to rely on p9_tag_lookup to keep a list's 
refcount. Whatsmore, code comments in p9_tag_alloc also proves that the 
refcount get by p9_tag_lookup is a temporary refcount.

So i still think there may be a refcount leak.

>> diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
>> index 88e563826674..82b5d6894ee2 100644
>> --- a/net/9p/trans_rdma.c
>> +++ b/net/9p/trans_rdma.c
>> @@ -317,6 +317,7 @@ recv_done(struct ib_cq *cq, struct ib_wc *wc)
>>   	/* Check that we have not yet received a reply for this request.
>>   	 */
>>   	if (unlikely(req->rc.sdata)) {
>> +		p9_req_put(req);
>>   		pr_err("Duplicate reply for request %d", tag);
>>   		goto err_out;
>>   	}
> 
> This one isn't as clear cut, I see that they put the client in a
> FLUSHING state but nothing seems to acton on it... But if this happens
> we're already in the use after free realm -- it means rc.sdata was
> already set so the other thread could be calling p9_client_cb anytime if
> it already hasn't, and yet another thread will then do the final ref put
> and free this.
> We shouldn't free this here as that would also be an overflow. The best
> possible thing to do at this point is just to stop using that pointer.
>

But p9_tag_lookup have a lock inside. Doesn't this mean p9_tag_lookup 
won't return a freed req? Otherwise we should fix the lock to avoid 
falling into the use after free realm.

Thanks,
Hangyu
> 
> If you actually run into a problem with these refcounts (should get a
> warning on umount that something didn't get freed) then by all mean
> let's look further into it, but please don't send such patches without
> testing the error paths you're "fixing" -- I'm pretty sure a reproducer
> to hit these paths would bark errors in dmesg as refcount has an
> overflow check.
> 
> --
> Dominique