Message-Id: <20240515125136.3714580-1-libaokun@huaweicloud.com>
Date: Wed, 15 May 2024 20:51:31 +0800
From: libaokun@...weicloud.com
To: netfs@...ts.linux.dev,
dhowells@...hat.com,
jlayton@...nel.org
Cc: hsiangkao@...ux.alibaba.com,
jefflexu@...ux.alibaba.com,
zhujia.zj@...edance.com,
linux-erofs@...ts.ozlabs.org,
linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org,
libaokun@...weicloud.com,
yangerkun@...wei.com,
houtao1@...wei.com,
yukuai3@...wei.com,
wozizhi@...wei.com,
Baokun Li <libaokun1@...wei.com>
Subject: [PATCH v2 0/5] cachefiles: some bugfixes for clean object/send req/poll
From: Baokun Li <libaokun1@...wei.com>
Hi all!
This is the second version of this patch series. Thank you, Jia Zhu and
Gao Xiang, for the feedback in the previous version.
We've been testing ondemand mode for cachefiles since January, and we're
almost done. We hit a lot of issues during the testing period, and this
patch set fixes some of the issues related to reopen worker/send req/poll.
The patches have passed internal testing without regression.
Patch 1-3: A read request waiting for reopen can be closed maliciously
while the reopen worker is still running or waiting to be scheduled.
ondemand_object_worker() may then run after the info, the object, and
even the cache have been freed, triggering a use-after-free. To fix this,
cachefiles_ondemand_clean_object() calls cancel_work_sync() to cancel the
reopen worker or wait for it to finish. Waiting for the daemon to complete
a reopen request makes no sense for an object that is being dropped, and
such a wait would block cancel_work_sync(). So Patch 1 uses the DROPPING
state to stop requests from being generated if they have not yet been
sent, and Patch 2 flushes the requests of the current object before
cancel_work_sync() is called.
Patch 4: Allocate msg_id cyclically so that a reused msg_id cannot
mislead the daemon and cause a hang.
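To illustrate the idea behind Patch 4, here is a minimal userspace
sketch of cyclic ID allocation. It is not the kernel code from the patch:
the names id_table, id_alloc_cyclic(), id_free() and the tiny ID_SPACE
are hypothetical, and a plain array stands in for the kernel's data
structure. The point it shows is that the search for a free ID starts
after the last ID handed out, so a just-freed ID is not reused
immediately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, deliberately tiny ID space so wrap-around is easy to see. */
#define ID_SPACE 8u

struct id_table {
	bool     in_use[ID_SPACE];
	uint32_t next;	/* where the next search starts */
};

/*
 * Allocate the first free ID at or after the cursor, wrapping around once.
 * Returns the ID, or -1 if every ID is currently in use.
 */
static int id_alloc_cyclic(struct id_table *t)
{
	for (uint32_t i = 0; i < ID_SPACE; i++) {
		uint32_t id = (t->next + i) % ID_SPACE;

		if (!t->in_use[id]) {
			t->in_use[id] = true;
			/* Advance the cursor past the ID just handed out. */
			t->next = (id + 1) % ID_SPACE;
			return (int)id;
		}
	}
	return -1;
}

/* Release an ID; the cursor is left alone, which is what delays reuse. */
static void id_free(struct id_table *t, int id)
{
	assert(id >= 0 && (uint32_t)id < ID_SPACE);
	t->in_use[id] = false;
}
```

With a lowest-free-ID policy, freeing ID 0 and allocating again would
return 0 straight away. With the cursor, the next allocation moves on to
a fresh ID, and 0 is only handed out again after the cursor wraps.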
Patch 5: Hold xas_lock during polling to avoid a use-after-free when
dereferencing reqs. This issue was triggered frequently in our tests, and
we found that anolis 5.10 had already fixed it, so this patch is submitted
upstream as well to keep the tests from failing.
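The locking pattern behind Patch 5 can be sketched in userspace as
follows. This is only an analogy under stated assumptions: a pthread
mutex stands in for xas_lock, a linked list stands in for the request
store, and struct req, struct cache, cache_has_new_req() and
cache_remove_req() are hypothetical names, not the cachefiles API. The
poll-side walk and the removal take the same lock, so the walker can
never dereference a request that is being unlinked and freed.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical request record; is_new marks a request not yet read. */
struct req {
	bool        is_new;
	struct req *next;
};

/* Hypothetical cache: one lock protects the whole request list. */
struct cache {
	pthread_mutex_t lock;
	struct req     *reqs;
};

/* Poll side: walk the list only while holding the lock. */
static bool cache_has_new_req(struct cache *c)
{
	bool found = false;

	pthread_mutex_lock(&c->lock);
	for (struct req *r = c->reqs; r; r = r->next) {
		if (r->is_new) {
			found = true;
			break;
		}
	}
	pthread_mutex_unlock(&c->lock);
	return found;
}

/* Completion side: unlink under the same lock, free only afterwards. */
static void cache_remove_req(struct cache *c, struct req *victim)
{
	pthread_mutex_lock(&c->lock);
	for (struct req **pp = &c->reqs; *pp; pp = &(*pp)->next) {
		if (*pp == victim) {
			*pp = victim->next;
			break;
		}
	}
	pthread_mutex_unlock(&c->lock);
	free(victim);
}
```

Without the lock in cache_has_new_req(), a concurrent
cache_remove_req() could free a request while the walker still holds a
pointer to it, which is the kind of use-after-free the patch describes.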
Comments and questions are, as always, welcome.
Please let me know what you think.
Thanks,
Baokun
Changes since v1:
* Collect RVB from Jia Zhu and Gao Xiang. (Thanks for your review!)
* Patch 1, 2: Expand the commit messages.
* Patch 3: Add Fixes tag as suggested by Jia Zhu.
* Patch 4: No longer change "do...while" to "retry", to keep the changes
  focused; improve the commit message.
* Patch 5: Drop the internal RVB tag.
[V1]: https://lore.kernel.org/all/20240424033409.2735257-1-libaokun@huaweicloud.com
Baokun Li (3):
  cachefiles: stop sending new request when dropping object
  cachefiles: flush all requests for the object that is being dropped
  cachefiles: cyclic allocation of msg_id to avoid reuse

Hou Tao (1):
  cachefiles: flush ondemand_object_worker during clean object

Jingbo Xu (1):
  cachefiles: add missing lock protection when polling
 fs/cachefiles/daemon.c   |  4 ++--
 fs/cachefiles/internal.h |  3 +++
 fs/cachefiles/ondemand.c | 52 +++++++++++++++++++++++++++++++++++-----
 3 files changed, 51 insertions(+), 8 deletions(-)
--
2.39.2