Message-Id: <20220802030342.46302-1-jefflexu@linux.alibaba.com>
Date: Tue, 2 Aug 2022 11:03:33 +0800
From: Jingbo Xu <jefflexu@...ux.alibaba.com>
To: dhowells@...hat.com, linux-cachefs@...hat.com
Cc: linux-kernel@...r.kernel.org, xiang@...nel.org
Subject: [RFC PATCH 0/9] cachefiles: content map
Kernel Patchset
===============
Git tree:
https://github.com/lostjeffle/linux.git jingbo/dev-fscache-bitmap-v1
Gitweb:
https://github.com/lostjeffle/linux/commits/jingbo/dev-fscache-bitmap-v1
[Introduction]
==============
Besides the SEEK_[DATA|HOLE] llseek mechanism provided by the backing
filesystem, this patch set introduces a bitmap based mechanism, in which
a self-maintained bitmap is used to track whether a given range of the
file has been cached in the backing file.
[Design]
========
[Content Map]
The content map is allocated/expanded/shortened in units of PAGE_SIZE,
which is a multiple of the block size of the backing filesystem, so that
the backing content map file can easily be hole-punched when the content
map gets truncated or invalidated. Each bit of the content map indicates
the existence of 4KB of data in the backing file, thus each (4KB sized)
chunk of the content map covers 128MB of data in the backing file.
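As an illustration, here is a minimal sketch of that geometry; the macro
and helper names are made up for this cover letter, not taken from the
patches:

#include <linux/bits.h>
#include <linux/mm.h>

/* one bit per 4KB granule of backing file data */
#define CACHEFILES_GRAN_SHIFT	12
#define CACHEFILES_GRAN_SIZE	(1UL << CACHEFILES_GRAN_SHIFT)

/* bit index in the content map for a byte offset in the backing file */
static inline unsigned long cachefiles_gran_index(loff_t pos)
{
	return pos >> CACHEFILES_GRAN_SHIFT;
}

/*
 * One PAGE_SIZE (4KB) chunk of map holds 4096 * 8 = 32768 bits, each
 * covering 4KB, i.e. 32768 * 4KB = 128MB of backing file data.
 */
#define CACHEFILES_DATA_PER_MAP_PAGE \
	((loff_t)PAGE_SIZE * BITS_PER_BYTE << CACHEFILES_GRAN_SHIFT)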
In the lookup phase, if the backing file already exists, the content
map is loaded from the backing content map file. When the backing file
gets written, the content map is updated on completion of the write
(i.e. in cachefiles_write_complete()).
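A hedged sketch of what that completion-time marking could look like,
reusing the illustrative geometry macros above (the helper name and the
bare map argument are hypothetical, not the interface in the patches):

#include <linux/bitmap.h>

/* mark [start, start + len) as present, called on write completion */
static void cachefiles_mark_content_map(unsigned long *map,
					loff_t start, size_t len)
{
	unsigned long first = start >> CACHEFILES_GRAN_SHIFT;
	unsigned long last = (start + len - 1) >> CACHEFILES_GRAN_SHIFT;

	bitmap_set(map, first, last - first + 1);
}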
When the backing file is truncated to a larger size, the content map
needs to be expanded accordingly. However, the expansion is done lazily:
it is delayed to the point when the content map needs to be marked,
inside cachefiles_write_complete(), i.e. the iocb.ki_complete()
callback. It is safe to allocate memory with GFP_KERNEL inside the
iocb.ki_complete() callback, since for DIRECT IO the callback is
scheduled from a workqueue.
For the case where the backing file doesn't exist yet, i.e. a new
tmpfile is created as the backing file, the content map is not allocated
at the lookup phase. Instead, it is expanded at runtime in the same way
described above, as the sketch below shows.
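A sketch of such a lazy expansion, under the assumption of a
hypothetical object->content_map / content_map_size pair guarded by an
rwlock; GFP_KERNEL is usable here because ki_complete() runs from a
workqueue for DIRECT IO:

#include <linux/kernel.h>
#include <linux/slab.h>

static int cachefiles_expand_content_map(struct cachefiles_object *object,
					 loff_t isize)
{
	/* one map byte covers 8 granules; allocate in whole pages */
	size_t new_size = round_up(DIV_ROUND_UP(isize,
			CACHEFILES_GRAN_SIZE * BITS_PER_BYTE), PAGE_SIZE);
	void *new_map;

	if (new_size <= object->content_map_size)
		return 0;

	new_map = kvzalloc(new_size, GFP_KERNEL);
	if (!new_map)
		return -ENOMEM;

	write_lock(&object->content_map_lock);
	if (new_size > object->content_map_size) {	/* recheck under lock */
		memcpy(new_map, object->content_map,
		       object->content_map_size);
		swap(new_map, object->content_map);
		object->content_map_size = new_size;
	}
	write_unlock(&object->content_map_lock);
	kvfree(new_map);	/* old map, or the unused copy if we raced */
	return 0;
}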
When the backing file is truncated to a smaller size, only the trailing
part of the map that exceeds the new size gets zeroed, while the content
map itself is not truncated.
Thus the content map size may be smaller or larger than the actual size
of the backing file.
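For the shrink case, a sketch of zeroing just the trailing bits while
keeping the allocation (names again illustrative; whether the granule
straddling the new EOF stays marked is my assumption, not the patches'):

static void cachefiles_shorten_content_map(unsigned long *map,
					   size_t map_size, loff_t new_isize)
{
	/* round up so a granule partially covered by new EOF stays set */
	unsigned long first_stale = DIV_ROUND_UP(new_isize,
						 CACHEFILES_GRAN_SIZE);
	unsigned long nr_bits = map_size * BITS_PER_BYTE;

	if (first_stale < nr_bits)
		bitmap_clear(map, first_stale, nr_bits - first_stale);
}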
[Backing Content Map File]
The content map is persisted to the backing content map file. Currently
each sub-directory under a volume maintains one backing content map
file, so that cachefilesd only needs to remove the whole sub-directory
(including the content map file and the backing files inside it) as
usual when it culls the whole sub-directory or volume.
In this case, the content map file is shared among all backing files
under the same sub-directory. Thus the offset of each content map within
the backing content map file needs to be stored in the xattr of the
corresponding backing file. Besides, since the content map size may be
smaller or larger than the actual size of the backing file as described
above, the content map size also needs to be stored in the xattr of the
backing file.
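So the per-file xattr would gain something like the following two
fields (an illustrative layout, not the struct actually used by the
patches):

struct cachefiles_content_xattr {
	__le64	map_off;	/* offset of this file's map in the map file */
	__le64	map_size;	/* bytes of map that belong to this file */
};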
When expanding the content map, a new offset inside the backing content
map file also needs to be allocated, and the old range starting at the
old offset gets hole-punched. Currently the new offset is always
allocated in an appending style, i.e. previous holes are not reused.
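A sketch of retiring the old range, assuming the map file is reached
through a plain struct file; vfs_fallocate() with PUNCH_HOLE|KEEP_SIZE
frees the old blocks without shrinking the append-only map file:

#include <linux/falloc.h>
#include <linux/fs.h>

static int cachefiles_punch_old_map(struct file *map_file,
				    loff_t old_off, loff_t old_len)
{
	return vfs_fallocate(map_file,
			     FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			     old_off, old_len);
}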
[Time Sequence]
===============
I haven't done much work on this yet, though. Actually there are three
actions when filling the cache:
1. write data to the backing file
2. write content map to the backing content map file
3. flush the content of xattr to disk
Currently action 1 uses DIRECT IO, while action 2 uses buffered IO. To
make sure the content map is flushed to disk _before_ the xattr gets
flushed, the backing content map file is opened with O_DSYNC, so that
each write to it returns only after the written data has been flushed
to disk.
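For illustration only, opening the map file that way could look like
the following; the path argument is a placeholder, not how the patches
actually locate the file:

#include <linux/fcntl.h>
#include <linux/fs.h>

static struct file *cachefiles_open_map_file(const char *path)
{
	/* O_DSYNC: each write returns only after the data is on disk */
	return filp_open(path, O_RDWR | O_CREAT | O_DSYNC, 0600);
}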
[TEST]
======
It passes the test cases for on-demand mode[1].
[1] https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git/tree/tests/fscache?h=experimental-tests-fscache
It also passes xfstests on NFS 4.0 with fscache enabled.
The performance test is still in progress.
Jingbo Xu (9):
cachefiles: improve FSCACHE_COOKIE_NO_DATA_TO_READ optimization
cachefiles: add content map file helpers
cachefiles: allocate per-subdir content map files
cachefiles: alloc/load/save content map
cachefiles: mark content map on write to the backing file
cachefiles: check content map on read/write
cachefiles: free content map on invalidate
cachefiles: resize content map on resize
cachefiles: cull content map file on cull
fs/cachefiles/Makefile | 3 +-
fs/cachefiles/content-map.c | 333 ++++++++++++++++++++++++++++++++++++
fs/cachefiles/interface.c | 10 +-
fs/cachefiles/internal.h | 31 ++++
fs/cachefiles/io.c | 59 +++++--
fs/cachefiles/namei.c | 96 +++++++++++
fs/cachefiles/ondemand.c | 5 +-
fs/cachefiles/volume.c | 14 +-
fs/cachefiles/xattr.c | 26 +++
fs/fscache/cookie.c | 2 +-
10 files changed, 558 insertions(+), 21 deletions(-)
create mode 100644 fs/cachefiles/content-map.c
--
2.27.0