Message-ID: <20230414000219.92640-1-sarthakkukreti@chromium.org>
Date: Thu, 13 Apr 2023 17:02:16 -0700
From: Sarthak Kukreti <sarthakkukreti@...omium.org>
To: sarthakkukreti@...gle.com, dm-devel@...hat.com,
linux-block@...r.kernel.org, linux-ext4@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org
Cc: Jens Axboe <axboe@...nel.dk>,
"Michael S. Tsirkin" <mst@...hat.com>,
Jason Wang <jasowang@...hat.com>,
Stefan Hajnoczi <stefanha@...hat.com>,
Alasdair Kergon <agk@...hat.com>,
Mike Snitzer <snitzer@...nel.org>,
Christoph Hellwig <hch@...radead.org>,
Brian Foster <bfoster@...hat.com>,
Theodore Ts'o <tytso@....edu>,
Andreas Dilger <adilger.kernel@...ger.ca>,
Bart Van Assche <bvanassche@...gle.com>,
Daniil Lunev <dlunev@...gle.com>,
"Darrick J. Wong" <djwong@...nel.org>
Subject: [PATCH v3 0/3] Introduce provisioning primitives for thinly provisioned storage
Hi,
This patch series adds a mechanism to pass through provision requests on
stacked thinly provisioned block devices.
The Linux kernel provides several mechanisms to set up thinly provisioned
block storage abstractions (e.g. dm-thin, loop devices over sparse files),
either directly as block devices or as backing storage for filesystems. Currently,
short of writing data to either the device or the filesystem, there is no way for
users to pre-allocate space for use in such storage setups. Consider the
following use-cases:
1) Suspend-to-disk and resume from a dm-thin device: In order to ensure that
the underlying thinpool metadata is not modified during the suspend
mechanism, the dm-thin device needs to be fully provisioned.
2) If a filesystem uses a loop device over a sparse file, fallocate() on the
filesystem will allocate blocks for files but the underlying sparse file
will remain unchanged, i.e. no space is allocated in it.
3) Another example is a virtual machine using a sparse file/dm-thin as a storage
device; by default, allocations within the VM boundaries will not affect
the host.
4) Several storage standards support mechanisms for thin provisioning on
real hardware devices. For example:
a. The NVMe spec 1.0b section 2.1.1 loosely talks about thin provisioning:
"When the THINP bit in the NSFEAT field of the Identify Namespace data
structure is set to ‘1’, the controller ... shall track the number of
allocated blocks in the Namespace Utilization field"
b. The SCSI Block Commands - 4 reference refers to "Thin provisioned
logical units",
c. UFS 3.0 spec section 13.3.3 references "Thin provisioning".
In all the above situations, the only way to pre-allocate space currently
is to issue writes (or use WRITE_ZEROES/WRITE_SAME). However, that does not
scale well with larger pre-allocation sizes.
This patchset introduces primitives to support block-level provisioning requests
across filesystems and block devices (note: the term 'provisioning' is used to
avoid overloading the term 'allocations/pre-allocations').
This allows fallocate() and file creation requests to reserve space across
stacked layers of block devices and filesystems. Currently, the patchset covers
a prototype on the device-mapper targets, loop device and ext4, but the same
mechanism can be extended to other filesystems/block devices as well as extended
for use with devices in 4 a-c.
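As a rough userspace illustration of the request described above, the sketch
below calls fallocate(2) with mode == 0 on a file descriptor. fallocate() is
the existing Linux system call; the helper names provision_range() and
demo_provision_tmpfile() and the scratch-file path are hypothetical and only
serve the example. Whether the request reaches lower layers depends on the
kernel and the storage stack in use.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Ask the kernel to reserve space for [offset, offset + len) of fd using
 * fallocate(2) with mode == 0. Returns 0 on success or a negative errno
 * value (for example -EOPNOTSUPP if the request is not supported).
 */
int provision_range(int fd, off_t offset, off_t len)
{
    if (fallocate(fd, 0, offset, len) == 0)
        return 0;
    return -errno;
}

/*
 * Usage sketch (hypothetical helper): provision len bytes of a scratch
 * file and check that the file size now covers the range.
 */
int demo_provision_tmpfile(off_t len)
{
    char path[] = "/tmp/provision-demo-XXXXXX";
    struct stat st;
    int fd, ret;

    fd = mkstemp(path);
    if (fd < 0)
        return -errno;
    unlink(path);               /* the file goes away once fd is closed */

    ret = provision_range(fd, 0, len);
    if (ret == 0 && (fstat(fd, &st) != 0 || st.st_size < len))
        ret = -EIO;

    close(fd);
    return ret;
}
```

The same provision_range() call could be pointed at a file descriptor for a
block device; with this series applied, mode == 0 on a block device is the
entry point for provision requests.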
Patch 1 introduces REQ_OP_PROVISION as a new request type.
The provision request acts like the inverse of a discard request; instead
of notifying lower layers that the block range will no longer be used, provision
acts as a request to lower layers to provision disk space for the given block
range. Real hardware storage devices will currently disable the provisioning
capability but for the standards listed in 4a.-c., REQ_OP_PROVISION can be
overloaded for use as the provisioning primitive for future devices.
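To make the "inverse of discard" relationship concrete, here is a toy
userspace model of a thin device. It is not kernel code and does not reflect
the dm-thin implementation: struct toy_thin_dev, toy_provision() and
toy_discard() are hypothetical names, and the all-or-nothing failure when the
pool is too small is an assumption made for the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_BLOCKS 16    /* hypothetical device size, in blocks */

struct toy_thin_dev {
    bool mapped[TOY_NR_BLOCKS]; /* true if the block has backing space */
    size_t pool_free;           /* unallocated blocks left in the pool */
};

static bool toy_range_ok(size_t start, size_t nr)
{
    return start <= TOY_NR_BLOCKS && nr <= TOY_NR_BLOCKS - start;
}

/*
 * Provision: make sure every block in the range has backing space.
 * Already-mapped blocks cost nothing. Assumption: the request fails
 * without side effects if the pool cannot cover the whole range.
 */
int toy_provision(struct toy_thin_dev *dev, size_t start, size_t nr)
{
    size_t i, needed = 0;

    if (!toy_range_ok(start, nr))
        return -1;
    for (i = start; i < start + nr; i++)
        if (!dev->mapped[i])
            needed++;
    if (needed > dev->pool_free)
        return -1;
    for (i = start; i < start + nr; i++)
        dev->mapped[i] = true;
    dev->pool_free -= needed;
    return 0;
}

/* Discard: release the backing space of every mapped block in the range. */
int toy_discard(struct toy_thin_dev *dev, size_t start, size_t nr)
{
    size_t i;

    if (!toy_range_ok(start, nr))
        return -1;
    for (i = start; i < start + nr; i++) {
        if (dev->mapped[i]) {
            dev->mapped[i] = false;
            dev->pool_free++;
        }
    }
    return 0;
}
```

In this model, provisioning a range and then discarding it returns the pool to
its earlier state, which is the sense in which the two operations mirror each
other.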
Patch 2 implements REQ_OP_PROVISION handling for some of the device-mapper
targets. Device-mapper targets will usually mirror the support of underlying
devices. This patch also enables the use of fallocate() with mode == 0 for block
devices.
Patch 3 wires up the loop device handling of REQ_OP_PROVISION and calls
fallocate() with mode 0 on the underlying file/block device.
Testing:
--------
- Tested on a VM running a 6.2 kernel.
- Preallocation of dm-thin devices:
As expected, avoiding the need to zero out thinly-provisioned block devices to
preallocate space speeds up the provisioning operation significantly:
The following was tested on a dm-thin device set up on top of a dm-thinp with
skip_block_zeroing=true.
A) Zeroout was measured using `fallocate -z ...`
B) Provision was measured using `fallocate -p ...`.
Size  Time (s)        A       B
512M  real        1.093   0.034
      user        0       0
      sys         0.022   0.01
1G    real        2.182   0.048
      user        0       0.01
      sys         0.022   0
2G    real        4.344   0.082
      user        0       0.01
      sys         0.036   0
4G    real        8.679   0.153
      user        0       0.01
      sys         0.073   0
8G    real       17.777   0.318
      user        0       0.01
      sys         0.144   0
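The ratio between the two columns can be computed directly from the "real"
rows above. The following sketch does that arithmetic; it assumes the values
are wall-clock seconds as reported by time(1), and the struct and function
names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

struct timing {
    const char *size;
    double zeroout_real;    /* column A, "real" row */
    double provision_real;  /* column B, "real" row */
};

/* "real" values copied from the table above. */
static const struct timing timings[] = {
    { "512M",  1.093, 0.034 },
    { "1G",    2.182, 0.048 },
    { "2G",    4.344, 0.082 },
    { "4G",    8.679, 0.153 },
    { "8G",   17.777, 0.318 },
};

#define NR_TIMINGS (sizeof(timings) / sizeof(timings[0]))

/* How many times faster provision (B) was than zeroout (A). */
double speedup(const struct timing *t)
{
    return t->zeroout_real / t->provision_real;
}

/* Print one "size speedup" line per table entry. */
void print_speedups(void)
{
    size_t i;

    for (i = 0; i < NR_TIMINGS; i++)
        printf("%-5s %.1fx\n", timings[i].size, speedup(&timings[i]));
}
```

Calling print_speedups() from a main() prints the per-size ratio of A to B.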
Changelog:
V3:
- Drop FALLOC_FL_PROVISION and use mode == 0 for provision requests.
- Drop fs-specific patches; will be sent out in a follow-up series.
- Fix missing shared block handling for thin snapshots.
V2:
- Fix stacked limit handling.
- Enable provision request handling in dm-snapshot.
- Don't call truncate_bdev_range if blkdev_fallocate() is called with
  FALLOC_FL_PROVISION.
- Clarify semantics of FALLOC_FL_PROVISION and why it needs to be a separate flag
  (as opposed to overloading mode == 0).
Sarthak Kukreti (3):
  block: Introduce provisioning primitives
  dm: Add support for block provisioning
  loop: Add support for provision requests

 block/blk-core.c              |   5 ++
 block/blk-lib.c               |  53 ++++++++++++++++
 block/blk-merge.c             |  18 ++++++
 block/blk-settings.c          |  19 ++++++
 block/blk-sysfs.c             |   8 +++
 block/bounce.c                |   1 +
 block/fops.c                  |  14 +++--
 drivers/block/loop.c          |  42 +++++++++++++
 drivers/md/dm-crypt.c         |   4 +-
 drivers/md/dm-linear.c        |   1 +
 drivers/md/dm-snap.c          |   7 +++
 drivers/md/dm-table.c         |  25 ++++++++
 drivers/md/dm-thin.c          | 110 +++++++++++++++++++++++++++++++---
 drivers/md/dm.c               |   4 ++
 include/linux/bio.h           |   6 +-
 include/linux/blk_types.h     |   5 +-
 include/linux/blkdev.h        |  16 +++++
 include/linux/device-mapper.h |  11 ++++
 18 files changed, 333 insertions(+), 16 deletions(-)
--
2.40.0.634.g4ca3ef3211-goog