[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3536f485-e496-4042-a231-c43a567cee7b@huaweicloud.com>
Date: Fri, 28 Jun 2024 19:37:06 +0800
From: Baokun Li <libaokun@...weicloud.com>
To: Gao Xiang <hsiangkao@...ux.alibaba.com>
Cc: Christian Brauner <brauner@...nel.org>, jlayton@...nel.org,
dhowells@...hat.com, netfs@...ts.linux.dev, jefflexu@...ux.alibaba.com,
zhujia.zj@...edance.com, linux-erofs@...ts.ozlabs.org,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
yangerkun@...wei.com, houtao1@...wei.com, yukuai3@...wei.com,
wozizhi@...wei.com, Baokun Li <libaokun1@...wei.com>,
Baokun Li <libaokun@...weicloud.com>
Subject: Re: [PATCH v3 0/9] cachefiles: random bugfixes
Hi Xiang,
On 2024/6/28 15:39, Gao Xiang wrote:
> Hi Baokun,
>
> On 2024/6/28 14:29, libaokun@...weicloud.com wrote:
>> From: Baokun Li <libaokun1@...wei.com>
>>
>> Hi all!
>>
>> This is the third version of this patch series, in which another
>> patch set
>> is subsumed into this one to avoid confusing the two patch sets.
>> (https://patchwork.kernel.org/project/linux-fsdevel/list/?series=854914)
>>
>> Thank you, Jia Zhu, Gao Xiang, Jeff Layton, for the feedback in the
>> previous version.
>>
>> We've been testing ondemand mode for cachefiles since January, and we're
>> almost done. We hit a lot of issues during the testing period, and this
>> patch series fixes some of the issues. The patches have passed internal
>> testing without regression.
>>
>> The following is a brief overview of the patches, see the patches for
>> more details.
>>
>> Patch 1-2: Add fscache_try_get_volume() helper function to avoid
>> fscache_volume use-after-free on cache withdrawal.
>>
>> Patch 3: Fix cachefiles_lookup_cookie() and cachefiles_withdraw_cache()
>> concurrency causing cachefiles_volume use-after-free.
>>
>> Patch 4: Propagate error codes returned by vfs_getxattr() to avoid
>> endless loops.
>>
>> Patch 5-7: A read request waiting for reopen could be closed maliciously
>> before the reopen worker is executing or waiting to be scheduled. So
>> ondemand_object_worker() may be called after the info and object and
>> even
>> the cache have been freed and trigger use-after-free. So use
>> cancel_work_sync() in cachefiles_ondemand_clean_object() to cancel the
>> reopen worker or wait for it to finish. Since it makes no sense to wait
>> for the daemon to complete the reopen request, to avoid this pointless
>> operation blocking cancel_work_sync(), Patch 1 avoids request generation
>> by the DROPPING state when the request has not been sent, and Patch 2
>> flushes the requests of the current object before cancel_work_sync().
>>
>> Patch 8: Cyclic allocation of msg_id to avoid msg_id reuse misleading
>> the daemon to cause hung.
>>
>> Patch 9: Hold xas_lock during polling to avoid dereferencing reqs
>> causing
>> use-after-free. This issue was triggered frequently in our tests, and we
>> found that anolis 5.10 had fixed it. So to avoid failing the test, this
>> patch is pushed upstream as well.
>>
>> Comments and questions are, as always, welcome.
>> Please let me know what you think.
>
> Patch 4-9 looks good to me, and they are independent to patch 1-3
> so personally I guess they could go upstream in advance.
Thank you for your review!
Indeed, the first three patches have no dependencies on the later
ones, and the later ones can be merged in first.
>
> I hope the way to fix cachefiles in patch 1-4 could be also
> confirmed by David and Jeff since they relates the generic
> cachefiles logic anyway.
Yes, the first four patches modify the generic process for cachefiles,
and it would be great if David and Jeff could take a look at those.
The logic for patches 2 and 3 is a bit more complex, so the review
may take some time.
Cheers,
Baokun
Powered by blists - more mailing lists