Message-ID: <bc76e529-7904-0650-7fa9-dc5561ff6668@huaweicloud.com>
Date: Mon, 20 May 2024 21:24:50 +0800
From: Baokun Li <libaokun@...weicloud.com>
To: Gao Xiang <hsiangkao@...ux.alibaba.com>, Jeff Layton
<jlayton@...nel.org>, netfs@...ts.linux.dev, dhowells@...hat.com
Cc: jefflexu@...ux.alibaba.com, zhujia.zj@...edance.com,
linux-erofs@...ts.ozlabs.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, yangerkun@...wei.com, houtao1@...wei.com,
yukuai3@...wei.com, wozizhi@...wei.com, Baokun Li <libaokun1@...wei.com>,
libaokun@...weicloud.com
Subject: Re: [PATCH v2 4/5] cachefiles: cyclic allocation of msg_id to avoid
reuse
On 2024/5/20 20:54, Gao Xiang wrote:
>
>
> On 2024/5/20 20:42, Baokun Li wrote:
>> On 2024/5/20 18:04, Jeff Layton wrote:
>>> On Mon, 2024-05-20 at 12:06 +0800, Baokun Li wrote:
>>>> Hi Jeff,
>>>>
>>>> Thank you very much for your review!
>>>>
>>>> On 2024/5/19 19:11, Jeff Layton wrote:
>>>>> On Wed, 2024-05-15 at 20:51 +0800, libaokun@...weicloud.com wrote:
>>>>>> From: Baokun Li <libaokun1@...wei.com>
>>>>>>
>>>>>> Reusing the msg_id after a maliciously completed reopen request may
>>>>>> cause a read request to remain unprocessed and result in a hang, as
>>>>>> shown below:
>>>>>>
>>>>>>        t1       |      t2       |      t3
>>>>>> -------------------------------------------------
>>>>>> cachefiles_ondemand_select_req
>>>>>>  cachefiles_ondemand_object_is_close(A)
>>>>>>  cachefiles_ondemand_set_object_reopening(A)
>>>>>>  queue_work(fscache_object_wq, &info->work)
>>>>>>                 ondemand_object_worker
>>>>>>                  cachefiles_ondemand_init_object(A)
>>>>>>                   cachefiles_ondemand_send_req(OPEN)
>>>>>>                   // get msg_id 6
>>>>>>                   wait_for_completion(&req_A->done)
>>>>>> cachefiles_ondemand_daemon_read
>>>>>>  // read msg_id 6 req_A
>>>>>>  cachefiles_ondemand_get_fd
>>>>>>  copy_to_user
>>>>>>                                 // Malicious completion msg_id 6
>>>>>>                                 copen 6,-1
>>>>>>                                 cachefiles_ondemand_copen
>>>>>>                                  complete(&req_A->done)
>>>>>>                 // will not set the object to close
>>>>>>                 // because ondemand_id && fd is valid.
>>>>>>
>>>>>>                 // ondemand_object_worker() is done
>>>>>>                 // but the object is still reopening.
>>>>>>
>>>>>>                                 // new open req_B
>>>>>>                                 cachefiles_ondemand_init_object(B)
>>>>>>                                  cachefiles_ondemand_send_req(OPEN)
>>>>>>                                  // reuse msg_id 6
>>>>>> process_open_req
>>>>>>  copen 6,A.size
>>>>>>  // The expected failed copen was executed successfully
>>>>>>
>>>>>> We expect the copen to fail; when it does, it closes the fd, which
>>>>>> sets the object to close, and the close then triggers a reopen.
>>>>>> However, because the msg_id reuse lets the copen succeed, the
>>>>>> anonymous fd is not closed until the daemon exits, so read requests
>>>>>> waiting for the reopen to complete may trigger a hung task.
>>>>>>
>>>>>> To avoid this issue, allocate the msg_id cyclically, so that a
>>>>>> msg_id is not reused within a short span of time.
>>>>>>
>>>>>> Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
>>>>>> Signed-off-by: Baokun Li <libaokun1@...wei.com>
>>>>>> ---
>>>>>>  fs/cachefiles/internal.h |  1 +
>>>>>>  fs/cachefiles/ondemand.c | 20 ++++++++++++++++----
>>>>>>  2 files changed, 17 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
>>>>>> index 8ecd296cc1c4..9200c00f3e98 100644
>>>>>> --- a/fs/cachefiles/internal.h
>>>>>> +++ b/fs/cachefiles/internal.h
>>>>>> @@ -128,6 +128,7 @@ struct cachefiles_cache {
>>>>>>  	unsigned long			req_id_next;
>>>>>>  	struct xarray			ondemand_ids;	/* xarray for ondemand_id allocation */
>>>>>>  	u32				ondemand_id_next;
>>>>>> +	u32				msg_id_next;
>>>>>>  };
>>>>>> 
>>>>>>  static inline bool cachefiles_in_ondemand_mode(struct cachefiles_cache *cache)
>>>>>> diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
>>>>>> index f6440b3e7368..b10952f77472 100644
>>>>>> --- a/fs/cachefiles/ondemand.c
>>>>>> +++ b/fs/cachefiles/ondemand.c
>>>>>> @@ -433,20 +433,32 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object,
>>>>>>  		smp_mb();
>>>>>> 
>>>>>>  		if (opcode == CACHEFILES_OP_CLOSE &&
>>>>>> -			!cachefiles_ondemand_object_is_open(object)) {
>>>>>> +		    !cachefiles_ondemand_object_is_open(object)) {
>>>>>>  			WARN_ON_ONCE(object->ondemand->ondemand_id == 0);
>>>>>>  			xas_unlock(&xas);
>>>>>>  			ret = -EIO;
>>>>>>  			goto out;
>>>>>>  		}
>>>>>> 
>>>>>> -		xas.xa_index = 0;
>>>>>> +		/*
>>>>>> +		 * Cyclically find a free xas to avoid msg_id reuse that would
>>>>>> +		 * cause the daemon to successfully copen a stale msg_id.
>>>>>> +		 */
>>>>>> +		xas.xa_index = cache->msg_id_next;
>>>>>>  		xas_find_marked(&xas, UINT_MAX, XA_FREE_MARK);
>>>>>> +		if (xas.xa_node == XAS_RESTART) {
>>>>>> +			xas.xa_index = 0;
>>>>>> +			xas_find_marked(&xas, cache->msg_id_next - 1, XA_FREE_MARK);
>>>>>> +		}
>>>>>>  		if (xas.xa_node == XAS_RESTART)
>>>>>>  			xas_set_err(&xas, -EBUSY);
>>>>>> +
>>>>>>  		xas_store(&xas, req);
>>>>>> -		xas_clear_mark(&xas, XA_FREE_MARK);
>>>>>> -		xas_set_mark(&xas, CACHEFILES_REQ_NEW);
>>>>>> +		if (xas_valid(&xas)) {
>>>>>> +			cache->msg_id_next = xas.xa_index + 1;
>>>>> If you have a long-standing stuck request, could this counter wrap
>>>>> around and you still end up with reuse?
>>>> Yes, msg_id_next is declared as a u32 so that when xa_index reaches
>>>> UINT_MAX, the increment wraps around and msg_id_next goes back to
>>>> zero. Limiting xa_index to no more than UINT_MAX avoids making the
>>>> xarray too deep.
>>>>
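>>>> To spell out that wrap-around, a trivial sketch (u32 as in the patch;
>>>> the variable name is illustrative):
>>>>
>>>> 	u32 next = UINT_MAX;	/* the largest allowed xa_index */
>>>>
>>>> 	next = next + 1;	/* u32 arithmetic wraps: next is now 0 */
>>>>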
>>>> If, after the wrap-around, msg_id_next lands on the id of a
>>>> long-standing stuck request, then yes, the reuse described above can
>>>> still occur.
>>>>
>>>> But a request stuck for that long is a problem in itself: it would
>>>> mean that after we have sent 4294967295 further requests, the first
>>>> one has still not been processed. Even at a million requests per
>>>> second, that request would have been outstanding for more than an
>>>> hour.
>>>>
>>>> We have a keep-alive process that pulls the daemon back up as soon
>>>> as it exits, and the daemon has a timeout mechanism for requests to
>>>> prevent the kernel from waiting for long periods of time. In other
>>>> words, requests should never be stuck for long.
>>>>
>>>> If you think UINT_MAX is not enough, perhaps we could raise
>>>> the maximum value of msg_id_next to ULONG_MAX?
>>>>> Maybe this should be using
>>>>> ida_alloc/free instead, which would prevent that too?
>>>>>
>>>> The id reuse here is that the kernel has finished the open request
>>>> req_A, freed its id_A, and used it again when sending the open
>>>> request req_B, while the daemon is still working on req_A; so the
>>>> copen of id_A succeeds but operates on req_B.
>>>>
>>>> An id that is still in use by the kernel will not be allocated here,
>>>> so it seems that ida_alloc/free does not prevent this kind of reuse
>>>> either. Could you elaborate a bit more on how that would work?
>>>>
>>> ida_alloc and free absolutely prevent reuse while the id is in use.
>>> That's sort of the point of those functions. Basically it uses a set of
>>> bitmaps in an xarray to track which IDs are in use, so ida_alloc only
>>> hands out values which are not in use. See the comments over
>>> ida_alloc_range() in lib/idr.c.
>>>
>> Thank you for the explanation!
>>
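>> For my own notes, a minimal sketch of that interface (the ida name and
>> the surrounding function are made up for illustration):
>>
>> 	#include <linux/idr.h>
>>
>> 	static DEFINE_IDA(msg_ida);
>>
>> 	static int example_send_req(void)
>> 	{
>> 		/* Lowest free id; it cannot be handed out again
>> 		 * until ida_free() returns it to the pool. */
>> 		int id = ida_alloc(&msg_ida, GFP_KERNEL);
>>
>> 		if (id < 0)
>> 			return id;	/* -ENOMEM on allocation failure */
>>
>> 		/* ... use 'id' as the msg_id ... */
>>
>> 		ida_free(&msg_ida, id);	/* only now may 'id' be reused */
>> 		return 0;
>> 	}
>>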
>> The current logic already provides the same guarantee as
>> ida_alloc/free: the "reused" id is indeed no longer in use in the
>> kernel. But it is still in use in userland, so a multi-threaded daemon
>> could be handling two different requests with the same msg_id at the
>> same time.
>>
>> Previously, msg_id allocation started at 0 and looked for a free
>> xas.index, so an id could be handed to a new request immediately after
>> being freed.
>>
>> With the change to cyclic allocation, the kernel will not use the same
>> id again until UINT_MAX more requests have been sent, and in the time
>> it takes to send those requests, the daemon has more than enough time
>> to finish handling requests whose ids have already been freed in the
>> kernel but are still in use by the daemon.
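>>
>> A simplified userspace model of this cyclic search (the slot count and
>> names are made up for illustration; the kernel side does the same walk
>> with xas_find_marked()):
>>
>> 	#include <stdbool.h>
>> 	#include <stdint.h>
>>
>> 	#define NR_SLOTS 8		/* tiny id space for illustration */
>>
>> 	static bool in_use[NR_SLOTS];
>> 	static uint32_t msg_id_next;	/* wraps naturally as a u32 */
>>
>> 	/* Scan from msg_id_next to the end, then wrap and scan from 0,
>> 	 * so a just-freed id is not handed out again until the rest of
>> 	 * the id space has been walked once. */
>> 	static int alloc_msg_id(void)
>> 	{
>> 		for (uint32_t i = 0; i < NR_SLOTS; i++) {
>> 			uint32_t id = (msg_id_next + i) % NR_SLOTS;
>>
>> 			if (!in_use[id]) {
>> 				in_use[id] = true;
>> 				msg_id_next = id + 1;
>> 				return id;
>> 			}
>> 		}
>> 		return -1;		/* all ids busy, like -EBUSY */
>> 	}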
>
> Again, if I understand correctly, I think the main point here is
>
>    wait_for_completion(&req_A->done)
>
> which could hang due to some malicious daemon. But I think it should
> be switched to wait_for_completion_killable() instead. It's up to
> users to kill the mount instance if there is a malicious user daemon.
>
> In that case, a hung task will not be triggered anymore, and you
> don't need to care about cyclic allocation either.
>
> Thanks,
> Gao Xiang
Hi Xiang,

The problem is not as simple as you think.

Making the wait killable (see the sketch after the list below) only
avoids the hung task in cachefiles_ondemand_send_req(); any process
waiting on a resource held by the stuck request will still hang:
* When an open/read request in the mount process gets stuck, a sync or
  a drop of the caches will trigger a hung task panic in
  iterate_supers() while waiting for sb->s_umount to be unlocked.
* After umount, the anonymous fd is not closed, causing a hung task
  panic in fscache_hash_cookie() while waiting for the cookie to be
  unhashed.
* A dentry can be stuck in lookup state: because the read request is
  never processed, another process looking up the same dentry waits for
  the previous lookup to finish, which triggers a hung task panic in
  d_alloc_parallel().
Can all this be made killable?
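
To make the suggestion concrete, the killable variant would look
roughly like this (a sketch only; the helper name is made up, and
req->done/req->error are assumed from fs/cachefiles/internal.h):

	#include <linux/completion.h>

	/* Returns -ERESTARTSYS if a fatal signal arrives instead of
	 * blocking uninterruptibly. But this unblocks only the thread
	 * sitting in cachefiles_ondemand_send_req(); the waiters
	 * listed above sit in other code paths and stay stuck. */
	static int cachefiles_ondemand_wait(struct cachefiles_req *req)
	{
		int ret = wait_for_completion_killable(&req->done);

		return ret ? ret : req->error;
	}
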
Thanks,
Baokun
>
>>
>> Regards,
>> Baokun
>>>>>> +			xas_clear_mark(&xas, XA_FREE_MARK);
>>>>>> +			xas_set_mark(&xas, CACHEFILES_REQ_NEW);
>>>>>> +		}
>>>>>>  		xas_unlock(&xas);
>>>>>>  	} while (xas_nomem(&xas, GFP_KERNEL));
>>>>>>