linux-kernel - Re: [PATCH v2 03/12] cachefiles: fix slab-use-after-free in cachefiles_ondemand_get

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <cd7fe397-9785-42f3-b05f-39ab90ba6a9a@linux.alibaba.com>
Date: Mon, 20 May 2024 16:45:45 +0800
From: Gao Xiang <hsiangkao@...ux.alibaba.com>
To: Baokun Li <libaokun@...weicloud.com>,
 Jingbo Xu <jefflexu@...ux.alibaba.com>, netfs@...ts.linux.dev
Cc: zhujia.zj@...edance.com, linux-erofs@...ts.ozlabs.org,
 linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
 yangerkun@...wei.com, houtao1@...wei.com, yukuai3@...wei.com,
 wozizhi@...wei.com, Baokun Li <libaokun1@...wei.com>,
 David Howells <dhowells@...hat.com>, Jeff Layton <jlayton@...nel.org>
Subject: Re: [PATCH v2 03/12] cachefiles: fix slab-use-after-free in
 cachefiles_ondemand_get_fd()



On 2024/5/20 16:38, Baokun Li wrote:
> Hi Jingbo,
> 
> Thanks for your review!
> 
> On 2024/5/20 15:24, Jingbo Xu wrote:
>>
>> On 5/15/24 4:45 PM, libaokun@...weicloud.com wrote:
>>> From: Baokun Li <libaokun1@...wei.com>
>>>
>>> We got the following issue in a fuzz test of randomly issuing the restore
>>> command:
>>>
>>> ==================================================================
>>> BUG: KASAN: slab-use-after-free in cachefiles_ondemand_daemon_read+0x609/0xab0
>>> Write of size 4 at addr ffff888109164a80 by task ondemand-04-dae/4962
>>>
>>> CPU: 11 PID: 4962 Comm: ondemand-04-dae Not tainted 6.8.0-rc7-dirty #542
>>> Call Trace:
>>>   kasan_report+0x94/0xc0
>>>   cachefiles_ondemand_daemon_read+0x609/0xab0
>>>   vfs_read+0x169/0xb50
>>>   ksys_read+0xf5/0x1e0
>>>
>>> Allocated by task 626:
>>>   __kmalloc+0x1df/0x4b0
>>>   cachefiles_ondemand_send_req+0x24d/0x690
>>>   cachefiles_create_tmpfile+0x249/0xb30
>>>   cachefiles_create_file+0x6f/0x140
>>>   cachefiles_look_up_object+0x29c/0xa60
>>>   cachefiles_lookup_cookie+0x37d/0xca0
>>>   fscache_cookie_state_machine+0x43c/0x1230
>>>   [...]
>>>
>>> Freed by task 626:
>>>   kfree+0xf1/0x2c0
>>>   cachefiles_ondemand_send_req+0x568/0x690
>>>   cachefiles_create_tmpfile+0x249/0xb30
>>>   cachefiles_create_file+0x6f/0x140
>>>   cachefiles_look_up_object+0x29c/0xa60
>>>   cachefiles_lookup_cookie+0x37d/0xca0
>>>   fscache_cookie_state_machine+0x43c/0x1230
>>>   [...]
>>> ==================================================================
>>>
>>> Following is the process that triggers the issue:
>>>
>>>       mount  |   daemon_thread1    |    daemon_thread2
>>> ------------------------------------------------------------
>>>   cachefiles_ondemand_init_object
>>>    cachefiles_ondemand_send_req
>>>     REQ_A = kzalloc(sizeof(*req) + data_len)
>>>     wait_for_completion(&REQ_A->done)
>>>
>>>              cachefiles_daemon_read
>>>               cachefiles_ondemand_daemon_read
>>>                REQ_A = cachefiles_ondemand_select_req
>>>                cachefiles_ondemand_get_fd
>>>                copy_to_user(_buffer, msg, n)
>>>              process_open_req(REQ_A)
>>>                                    ------ restore ------
>>>                                    cachefiles_ondemand_restore
>>>                                    xas_for_each(&xas, req, ULONG_MAX)
>>>                                     xas_set_mark(&xas, CACHEFILES_REQ_NEW);
>>>
>>>                                    cachefiles_daemon_read
>>>                                     cachefiles_ondemand_daemon_read
>>>                                      REQ_A = cachefiles_ondemand_select_req
>>>
>>>               write(devfd, ("copen %u,%llu", msg->msg_id, size));
>>>               cachefiles_ondemand_copen
>>>                xa_erase(&cache->reqs, id)
>>>                complete(&REQ_A->done)
>>>     kfree(REQ_A)
>>>                                      cachefiles_ondemand_get_fd(REQ_A)
>>>                                       fd = get_unused_fd_flags
>>>                                       file = anon_inode_getfile
>>>                                       fd_install(fd, file)
>>>                                       load = (void *)REQ_A->msg.data;
>>>                                       load->fd = fd;
>>>                                       // load UAF !!!
>>>
>>> This issue is caused by issuing a restore command when the daemon is still
>>> alive, which results in a request being processed multiple times thus
>>> triggering a UAF. So to avoid this problem, add an additional reference
>>> count to cachefiles_req, which is held while waiting and reading, and then
>>> released when the waiting and reading is over.
>>>
>>>
>>> Note that since there is only one reference count for waiting, we need to
>>> avoid the same request being completed multiple times, so we can only
>>> complete the request if it is successfully removed from the xarray.
>> Sorry the above description makes me confused.  As the same request may
>> be got by different daemon threads multiple times, the introduced
>> refcount mechanism can't protect it from being completed multiple times
>> (which is expected).  The refcount only protects it from being freed
>> multiple times.
> The idea here is that because the wait only holds one reference count,
> complete(&req->done) can only be called when the req has been
> successfully removed from the xarry, otherwise the following UAF may
> occur:
> 
>     daemon_thread1    |    daemon_thread2
> -------------------------------------------
> cachefiles_ondemand_daemon_read
>   xa_lock(&cache->reqs)
>   // select req_A
>   xa_unlock(&cache->reqs)
>                      // restore req_A and read again
>                      cachefiles_ondemand_daemon_read
>                      xa_lock(&cache->reqs)
>                      // select req_A
>                      xa_unlock(&cache->reqs)
> // goto error, erase success
> xa_erase(&cache->reqs, id)
> complete(&req_A->done)
> // free req_A
>                      // goto error, erase failed
>                      complete(&req_A->done)
>                      // req_A use-after-free
> 
> This is also why error requests and CLOSE requests are handled
> together and why xas_load(&xas) == req is checked.
>>> Fixes: e73fa11a356c ("cachefiles: add restore command to recover inflight ondemand read requests")
>>> Suggested-by: Hou Tao <houtao1@...wei.com>
>>> Signed-off-by: Baokun Li <libaokun1@...wei.com>
>>> Reviewed-by: Jia Zhu <zhujia.zj@...edance.com>
>>> ---
>>>   fs/cachefiles/internal.h |  1 +
>>>   fs/cachefiles/ondemand.c | 44 ++++++++++++++++++++++------------------
>>>   2 files changed, 25 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
>>> index d33169f0018b..7745b8abc3aa 100644
>>> --- a/fs/cachefiles/internal.h
>>> +++ b/fs/cachefiles/internal.h
>>> @@ -138,6 +138,7 @@ static inline bool cachefiles_in_ondemand_mode(struct cachefiles_cache *cache)
>>>   struct cachefiles_req {
>>>       struct cachefiles_object *object;
>>>       struct completion done;
>>> +    refcount_t ref;
>>>       int error;
>>>       struct cachefiles_msg msg;
>>>   };
>>> diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
>>> index fd49728d8bae..56d12fe4bf73 100644
>>> --- a/fs/cachefiles/ondemand.c
>>> +++ b/fs/cachefiles/ondemand.c
>>> @@ -4,6 +4,12 @@
>>>   #include <linux/uio.h>
>>>   #include "internal.h"
>>> +static inline void cachefiles_req_put(struct cachefiles_req *req)
>>> +{
>>> +    if (refcount_dec_and_test(&req->ref))
>>> +        kfree(req);
>>> +}
>>> +
>>>   static int cachefiles_ondemand_fd_release(struct inode *inode,
>>>                         struct file *file)
>>>   {
>>> @@ -299,7 +305,6 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
>>>   {
>>>       struct cachefiles_req *req;
>>>       struct cachefiles_msg *msg;
>>> -    unsigned long id = 0;
>>>       size_t n;
>>>       int ret = 0;
>>>       XA_STATE(xas, &cache->reqs, cache->req_id_next);
>>> @@ -330,41 +335,39 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
>>>       xas_clear_mark(&xas, CACHEFILES_REQ_NEW);
>>>       cache->req_id_next = xas.xa_index + 1;
>>> +    refcount_inc(&req->ref);
>>>       xa_unlock(&cache->reqs);
>>> -    id = xas.xa_index;
>>> -
>>>       if (msg->opcode == CACHEFILES_OP_OPEN) {
>>>           ret = cachefiles_ondemand_get_fd(req);
>>>           if (ret) {
>>>               cachefiles_ondemand_set_object_close(req->object);
>>> -            goto error;
>>> +            goto out;
>>>           }
>>>       }
>>> -    msg->msg_id = id;
>>> +    msg->msg_id = xas.xa_index;
>>>       msg->object_id = req->object->ondemand->ondemand_id;
>>>       if (copy_to_user(_buffer, msg, n) != 0) {
>>>           ret = -EFAULT;
>>>           if (msg->opcode == CACHEFILES_OP_OPEN)
>>>               close_fd(((struct cachefiles_open *)msg->data)->fd);
>>> -        goto error;
>>>       }
>>> -
>>> -    /* CLOSE request has no reply */
>>> -    if (msg->opcode == CACHEFILES_OP_CLOSE) {
>>> -        xa_erase(&cache->reqs, id);
>>> -        complete(&req->done);
>>> +out:
>>> +    /* Remove error request and CLOSE request has no reply */
>>> +    if (ret || msg->opcode == CACHEFILES_OP_CLOSE) {
>>> +        xas_reset(&xas);
>>> +        xas_lock(&xas);
>>> +        if (xas_load(&xas) == req) {
>> Just out of curiosity... How could xas_load(&xas) doesn't equal to req?
> 
> As mentioned above, the req may have been deleted or even the id
> 
> may have been reused.
> 
>>
>>> +            req->error = ret;
>>> +            complete(&req->done);
>>> +            xas_store(&xas, NULL);
>>> +        }
>>> +        xas_unlock(&xas);
>>>       }
>>> -
>>> -    return n;
>>> -
>>> -error:
>>> -    xa_erase(&cache->reqs, id);
>>> -    req->error = ret;
>>> -    complete(&req->done);
>>> -    return ret;
>>> +    cachefiles_req_put(req);
>>> +    return ret ? ret : n;
>>>   }
>> This is actually a combination of a fix and a cleanup which combines the
>> logic of removing error request and the CLOSE requests into one place.
>> Also it relies on the cleanup made in patch 2 ("cachefiles: remove
>> err_put_fd tag in cachefiles_ondemand_daemon_read()"), making it
>> difficult to be atomatically back ported to the stable (as patch 2 is
>> not marked as "Fixes").
>>
>> Thus could we make the fix first, and then make the cleanup.
> I don't think that's necessary, stable automatically backports the
> relevant dependency patches in case of backport patch conflicts,
> and later patches modify the logic here as well.
> Or add Fixes tag for patch 2?

I think we might better to avoid unnecessary dependencies
since it relies on some "AI" magic and often mis-backportes
real dependencies.

I tend to leave real bugfixes first, and do cleanup next.
But please don't leave cleanup patches with "Fixes:" tags
anyway since it just misleads people.

Thanks,
Gao Xiang