Message-ID: <bc76e529-7904-0650-7fa9-dc5561ff6668@huaweicloud.com>
Date: Mon, 20 May 2024 21:24:50 +0800
From: Baokun Li <libaokun@...weicloud.com>
To: Gao Xiang <hsiangkao@...ux.alibaba.com>, Jeff Layton
<jlayton@...nel.org>, netfs@...ts.linux.dev, dhowells@...hat.com
Cc: jefflexu@...ux.alibaba.com, zhujia.zj@...edance.com,
linux-erofs@...ts.ozlabs.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, yangerkun@...wei.com, houtao1@...wei.com,
yukuai3@...wei.com, wozizhi@...wei.com, Baokun Li <libaokun1@...wei.com>,
libaokun@...weicloud.com
Subject: Re: [PATCH v2 4/5] cachefiles: cyclic allocation of msg_id to avoid
reuse
On 2024/5/20 20:54, Gao Xiang wrote:
>
>
> On 2024/5/20 20:42, Baokun Li wrote:
>> On 2024/5/20 18:04, Jeff Layton wrote:
>>> On Mon, 2024-05-20 at 12:06 +0800, Baokun Li wrote:
>>>> Hi Jeff,
>>>>
>>>> Thank you very much for your review!
>>>>
>>>> On 2024/5/19 19:11, Jeff Layton wrote:
>>>>> On Wed, 2024-05-15 at 20:51 +0800, libaokun@...weicloud.com wrote:
>>>>>> From: Baokun Li <libaokun1@...wei.com>
>>>>>>
>>>>>> Reusing the msg_id after a maliciously completed reopen request may
>>>>>> cause a read request to remain unprocessed and result in a hang, as
>>>>>> shown below:
>>>>>>
>>>>>>        t1       |      t2       |      t3
>>>>>> -------------------------------------------------
>>>>>> cachefiles_ondemand_select_req
>>>>>>  cachefiles_ondemand_object_is_close(A)
>>>>>>  cachefiles_ondemand_set_object_reopening(A)
>>>>>>  queue_work(fscache_object_wq, &info->work)
>>>>>>                 ondemand_object_worker
>>>>>>                  cachefiles_ondemand_init_object(A)
>>>>>>                   cachefiles_ondemand_send_req(OPEN)
>>>>>>                   // get msg_id 6
>>>>>>                   wait_for_completion(&req_A->done)
>>>>>> cachefiles_ondemand_daemon_read
>>>>>>  // read msg_id 6 req_A
>>>>>>  cachefiles_ondemand_get_fd
>>>>>>  copy_to_user
>>>>>>                                 // Malicious completion msg_id 6
>>>>>>                                 copen 6,-1
>>>>>>                                 cachefiles_ondemand_copen
>>>>>>                                  complete(&req_A->done)
>>>>>>                 // will not set the object to close
>>>>>>                 // because ondemand_id && fd is valid.
>>>>>>
>>>>>>                 // ondemand_object_worker() is done
>>>>>>                 // but the object is still reopening.
>>>>>>
>>>>>>                                 // new open req_B
>>>>>>                                 cachefiles_ondemand_init_object(B)
>>>>>>                                  cachefiles_ondemand_send_req(OPEN)
>>>>>>                                  // reuse msg_id 6
>>>>>> process_open_req
>>>>>>  copen 6,A.size
>>>>>>  // The expected failed copen was executed successfully
>>>>>>
>>>>>> We expect the copen to fail; when it does, it closes the fd, which
>>>>>> sets the object to close, and the close then triggers a reopen.
>>>>>> However, because the msg_id reuse lets the copen succeed, the
>>>>>> anonymous fd is not closed until the daemon exits, so read requests
>>>>>> waiting for the reopen to complete may trigger a hung task.
>>>>>>
>>>>>> To avoid this issue, allocate the msg_id cyclically, so that a
>>>>>> msg_id is not reused within a short span of time.
>>>>>>
>>>>>> Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
>>>>>> Signed-off-by: Baokun Li <libaokun1@...wei.com>
>>>>>> ---
>>>>>>  fs/cachefiles/internal.h |  1 +
>>>>>>  fs/cachefiles/ondemand.c | 20 ++++++++++++++++----
>>>>>>  2 files changed, 17 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
>>>>>> index 8ecd296cc1c4..9200c00f3e98 100644
>>>>>> --- a/fs/cachefiles/internal.h
>>>>>> +++ b/fs/cachefiles/internal.h
>>>>>> @@ -128,6 +128,7 @@ struct cachefiles_cache {
>>>>>>  	unsigned long			req_id_next;
>>>>>>  	struct xarray			ondemand_ids;	/* xarray for ondemand_id allocation */
>>>>>>  	u32				ondemand_id_next;
>>>>>> +	u32				msg_id_next;
>>>>>>  };
>>>>>> 
>>>>>>  static inline bool cachefiles_in_ondemand_mode(struct cachefiles_cache *cache)
>>>>>> diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
>>>>>> index f6440b3e7368..b10952f77472 100644
>>>>>> --- a/fs/cachefiles/ondemand.c
>>>>>> +++ b/fs/cachefiles/ondemand.c
>>>>>> @@ -433,20 +433,32 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object,
>>>>>>  		smp_mb();
>>>>>> 
>>>>>>  		if (opcode == CACHEFILES_OP_CLOSE &&
>>>>>> -			!cachefiles_ondemand_object_is_open(object)) {
>>>>>> +		    !cachefiles_ondemand_object_is_open(object)) {
>>>>>>  			WARN_ON_ONCE(object->ondemand->ondemand_id == 0);
>>>>>>  			xas_unlock(&xas);
>>>>>>  			ret = -EIO;
>>>>>>  			goto out;
>>>>>>  		}
>>>>>> 
>>>>>> -		xas.xa_index = 0;
>>>>>> +		/*
>>>>>> +		 * Cyclically find a free xas to avoid msg_id reuse that would
>>>>>> +		 * cause the daemon to successfully copen a stale msg_id.
>>>>>> +		 */
>>>>>> +		xas.xa_index = cache->msg_id_next;
>>>>>>  		xas_find_marked(&xas, UINT_MAX, XA_FREE_MARK);
>>>>>> +		if (xas.xa_node == XAS_RESTART) {
>>>>>> +			xas.xa_index = 0;
>>>>>> +			xas_find_marked(&xas, cache->msg_id_next - 1, XA_FREE_MARK);
>>>>>> +		}
>>>>>>  		if (xas.xa_node == XAS_RESTART)
>>>>>>  			xas_set_err(&xas, -EBUSY);
>>>>>> +
>>>>>>  		xas_store(&xas, req);
>>>>>> -		xas_clear_mark(&xas, XA_FREE_MARK);
>>>>>> -		xas_set_mark(&xas, CACHEFILES_REQ_NEW);
>>>>>> +		if (xas_valid(&xas)) {
>>>>>> +			cache->msg_id_next = xas.xa_index + 1;
>>>>> If you have a long-standing stuck request, could this counter wrap
>>>>> around and you still end up with reuse?
>>>> Yes, msg_id_next is declared as a u32 so that when xa_index reaches
>>>> UINT_MAX, the increment wraps around and msg_id_next goes back to
>>>> zero. Limiting xa_index to no more than UINT_MAX avoids making the
>>>> xarray too deep.
>>>>
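>>>> To spell out that wrap-around, a trivial sketch (u32 as in the patch;
>>>> the variable name is illustrative):
>>>>
>>>> 	u32 next = UINT_MAX;	/* the largest allowed xa_index */
>>>>
>>>> 	next = next + 1;	/* u32 arithmetic wraps: next is now 0 */
>>>>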
>>>> If, after the wrap-around, msg_id_next lands on the id of a
>>>> long-standing stuck request, then yes, the reuse described above can
>>>> still occur.
>>>>
>>>> But a request stuck for that long is a problem in itself: it would
>>>> mean that after we have sent 4294967295 further requests, the first
>>>> one has still not been processed. Even at a million requests per
>>>> second, that request would have been outstanding for more than an
>>>> hour.
>>>>
>>>> We have a keep-alive process that pulls the daemon back up as soon
>>>> as it exits, and the daemon has a timeout mechanism for requests to
>>>> prevent the kernel from waiting for long periods of time. In other
>>>> words, requests should never be stuck for long.
>>>>
>>>> If you think UINT_MAX is not enough, perhaps we could raise
>>>> the maximum value of msg_id_next to ULONG_MAX?
>>>>> Maybe this should be using
>>>>> ida_alloc/free instead, which would prevent that too?
>>>>>
>>>> The id reuse here is that the kernel has finished the open request
>>>> req_A, freed its id_A, and used it again when sending the open
>>>> request req_B, while the daemon is still working on req_A; so the
>>>> copen of id_A succeeds but operates on req_B.
>>>>
>>>> An id that is still in use by the kernel will not be allocated here,
>>>> so it seems that ida_alloc/free does not prevent this kind of reuse
>>>> either. Could you elaborate a bit more on how that would work?
>>>>
>>> ida_alloc and free absolutely prevent reuse while the id is in use.
>>> That's sort of the point of those functions. Basically it uses a set of
>>> bitmaps in an xarray to track which IDs are in use, so ida_alloc only
>>> hands out values which are not in use. See the comments over
>>> ida_alloc_range() in lib/idr.c.
>>>
>> Thank you for the explanation!
>>
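>> For my own notes, a minimal sketch of that interface (the ida name and
>> the surrounding function are made up for illustration):
>>
>> 	#include <linux/idr.h>
>>
>> 	static DEFINE_IDA(msg_ida);
>>
>> 	static int example_send_req(void)
>> 	{
>> 		/* Lowest free id; it cannot be handed out again
>> 		 * until ida_free() returns it to the pool. */
>> 		int id = ida_alloc(&msg_ida, GFP_KERNEL);
>>
>> 		if (id < 0)
>> 			return id;	/* -ENOMEM on allocation failure */
>>
>> 		/* ... use 'id' as the msg_id ... */
>>
>> 		ida_free(&msg_ida, id);	/* only now may 'id' be reused */
>> 		return 0;
>> 	}
>>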
>> The current logic already provides the same guarantee as
>> ida_alloc/free: the "reused" id is indeed no longer in use in the
>> kernel. But it is still in use in userland, so a multi-threaded daemon
>> could be handling two different requests with the same msg_id at the
>> same time.
>>
>> Previously, msg_id allocation started at 0 and looked for a free
>> xas.index, so an id could be handed to a new request immediately after
>> being freed.
>>
>> With the change to cyclic allocation, the kernel will not use the same
>> id again until UINT_MAX more requests have been sent, and in the time
>> it takes to send those requests, the daemon has more than enough time
>> to finish handling requests whose ids have already been freed in the
>> kernel but are still in use by the daemon.
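>>
>> A simplified userspace model of this cyclic search (the slot count and
>> names are made up for illustration; the kernel side does the same walk
>> with xas_find_marked()):
>>
>> 	#include <stdbool.h>
>> 	#include <stdint.h>
>>
>> 	#define NR_SLOTS 8		/* tiny id space for illustration */
>>
>> 	static bool in_use[NR_SLOTS];
>> 	static uint32_t msg_id_next;	/* wraps naturally as a u32 */
>>
>> 	/* Scan from msg_id_next to the end, then wrap and scan from 0,
>> 	 * so a just-freed id is not handed out again until the rest of
>> 	 * the id space has been walked once. */
>> 	static int alloc_msg_id(void)
>> 	{
>> 		for (uint32_t i = 0; i < NR_SLOTS; i++) {
>> 			uint32_t id = (msg_id_next + i) % NR_SLOTS;
>>
>> 			if (!in_use[id]) {
>> 				in_use[id] = true;
>> 				msg_id_next = id + 1;
>> 				return id;
>> 			}
>> 		}
>> 		return -1;		/* all ids busy, like -EBUSY */
>> 	}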
>
> Again, if I understand correctly, I think the main point here is
>
>    wait_for_completion(&req_A->done)
>
> which could hang due to some malicious daemon. But I think it should
> be switched to wait_for_completion_killable() instead. It's up to
> users to kill the mount instance if there is a malicious user daemon.
>
> In that case, a hung task will not be triggered anymore, and you
> don't need to care about cyclic allocation either.
>
> Thanks,
> Gao Xiang
Hi Xiang,

The problem is not as simple as you think.

Making the wait killable (see the sketch after the list below) only
avoids the hung task in cachefiles_ondemand_send_req(); any process
waiting on a resource held by the stuck request will still hang:
* When an open/read request in the mount process gets stuck, a sync or
  a drop of the caches will trigger a hung task panic in
  iterate_supers() while waiting for sb->s_umount to be unlocked.
* After umount, the anonymous fd is not closed, causing a hung task
  panic in fscache_hash_cookie() while waiting for the cookie to be
  unhashed.
* A dentry can be stuck in lookup state: because the read request is
  never processed, another process looking up the same dentry waits for
  the previous lookup to finish, which triggers a hung task panic in
  d_alloc_parallel().
Can all this be made killable?
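
To make the suggestion concrete, the killable variant would look
roughly like this (a sketch only; the helper name is made up, and
req->done/req->error are assumed from fs/cachefiles/internal.h):

	#include <linux/completion.h>

	/* Returns -ERESTARTSYS if a fatal signal arrives instead of
	 * blocking uninterruptibly. But this unblocks only the thread
	 * sitting in cachefiles_ondemand_send_req(); the waiters
	 * listed above sit in other code paths and stay stuck. */
	static int cachefiles_ondemand_wait(struct cachefiles_req *req)
	{
		int ret = wait_for_completion_killable(&req->done);

		return ret ? ret : req->error;
	}
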
Thanks,
Baokun
>
>>
>> Regards,
>> Baokun
>>>>>> +			xas_clear_mark(&xas, XA_FREE_MARK);
>>>>>> +			xas_set_mark(&xas, CACHEFILES_REQ_NEW);
>>>>>> +		}
>>>>>>  		xas_unlock(&xas);
>>>>>>  	} while (xas_nomem(&xas, GFP_KERNEL));
>>>>>>