linux-kernel - [PATCH v2 0/5] cachefiles: some bugfixes for clean object/send req/poll

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-Id: <20240515125136.3714580-1-libaokun@huaweicloud.com>
Date: Wed, 15 May 2024 20:51:31 +0800
From: libaokun@...weicloud.com
To: netfs@...ts.linux.dev,
	dhowells@...hat.com,
	jlayton@...nel.org
Cc: hsiangkao@...ux.alibaba.com,
	jefflexu@...ux.alibaba.com,
	zhujia.zj@...edance.com,
	linux-erofs@...ts.ozlabs.org,
	linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	libaokun@...weicloud.com,
	yangerkun@...wei.com,
	houtao1@...wei.com,
	yukuai3@...wei.com,
	wozizhi@...wei.com,
	Baokun Li <libaokun1@...wei.com>
Subject: [PATCH v2 0/5] cachefiles: some bugfixes for clean object/send req/poll

From: Baokun Li <libaokun1@...wei.com>

Hi all!

This is the second version of this patch series. Thank you, Jia Zhu and
Gao Xiang, for the feedback in the previous version.

We've been testing ondemand mode for cachefiles since January, and we're
almost done. We hit a lot of issues during the testing period, and this
patch set fixes some of the issues related to reopen worker/send req/poll.
The patches have passed internal testing without regression.

Patch 1-3: A read request waiting for reopen could be closed maliciously
before the reopen worker is executing or waiting to be scheduled. So
ondemand_object_worker() may be called after the info and object and even
the cache have been freed and trigger use-after-free. So use
cancel_work_sync() in cachefiles_ondemand_clean_object() to cancel the
reopen worker or wait for it to finish. Since it makes no sense to wait
for the daemon to complete the reopen request, to avoid this pointless
operation blocking cancel_work_sync(), Patch 1 avoids request generation
by the DROPPING state when the request has not been sent, and Patch 2
flushes the requests of the current object before cancel_work_sync().

Patch 4: Cyclic allocation of msg_id to avoid msg_id reuse misleading
the daemon to cause hung.

Patch 5: Hold xas_lock during polling to avoid dereferencing reqs causing
use-after-free. This issue was triggered frequently in our tests, and we
found that anolis 5.10 had fixed it, so to avoid failing the test, this
patch was pushed upstream as well.

Comments and questions are, as always, welcome.
Please let me know what you think.

Thanks,
Baokun

Changes since v1:
  * Collect RVB from Jia Zhu and Gao Xiang.(Thanks for your review!)
  * Pathch 1,2：Add more commit messages.
  * Pathch 3：Add Fixes tag as suggested by Jia Zhu.
  * Pathch 4：No longer changing "do...while" to "retry" to focus changes
    and optimise commit messages.
  * Pathch 5: Drop the internal RVB tag.

[V1]: https://lore.kernel.org/all/20240424033409.2735257-1-libaokun@huaweicloud.com

Baokun Li (3):
  cachefiles: stop sending new request when dropping object
  cachefiles: flush all requests for the object that is being dropped
  cachefiles: cyclic allocation of msg_id to avoid reuse

Hou Tao (1):
  cachefiles: flush ondemand_object_worker during clean object

Jingbo Xu (1):
  cachefiles: add missing lock protection when polling

 fs/cachefiles/daemon.c   |  4 ++--
 fs/cachefiles/internal.h |  3 +++
 fs/cachefiles/ondemand.c | 52 +++++++++++++++++++++++++++++++++++-----
 3 files changed, 51 insertions(+), 8 deletions(-)

-- 
2.39.2