lists.openwall.net
Open Source and information security mailing list archives
Message-ID: <CAG9=OMPQEoMVpXD8PeHwkymwk-zfB3mSvDO_W6h0S3Zom62JBQ@mail.gmail.com>
Date: Thu, 29 Dec 2022 00:17:00 -0800
From: Sarthak Kukreti <sarthakkukreti@...omium.org>
To: Mike Snitzer <snitzer@...hat.com>
Cc: Christoph Hellwig <hch@...radead.org>, Daniil Lunev <dlunev@...gle.com>, Jens Axboe <axboe@...nel.dk>, linux-block@...r.kernel.org, "Theodore Ts'o" <tytso@....edu>, "Michael S . Tsirkin" <mst@...hat.com>, Jason Wang <jasowang@...hat.com>, Bart Van Assche <bvanassche@...gle.com>, Mike Snitzer <snitzer@...nel.org>, linux-kernel@...r.kernel.org, Gwendal Grignou <gwendal@...gle.com>, virtualization@...ts.linux-foundation.org, dm-devel@...hat.com, Andreas Dilger <adilger.kernel@...ger.ca>, Stefan Hajnoczi <stefanha@...hat.com>, Paolo Bonzini <pbonzini@...hat.com>, linux-ext4@...r.kernel.org, Evan Green <evgreen@...gle.com>, Alasdair Kergon <agk@...hat.com>
Subject: Re: [PATCH RFC 0/8] Introduce provisioning primitives for thinly provisioned storage

On Fri, Sep 23, 2022 at 7:08 AM Mike Snitzer <snitzer@...hat.com> wrote:
>
> On Fri, Sep 23 2022 at 4:51P -0400,
> Christoph Hellwig <hch@...radead.org> wrote:
>
> > On Wed, Sep 21, 2022 at 07:48:50AM +1000, Daniil Lunev wrote:
> > > > There is no such thing as WRITE UNAVAILABLE in NVMe.
> > > Apologies, that is WRITE UNCORRECTABLE. Chapter 3.2.7 of
> > > NVM Express NVM Command Set Specification 1.0b
> >
> > Write uncorrectable is a very different thing, and the equivalent of the
> > horribly misnamed SCSI WRITE LONG COMMAND. It injects an unrecoverable
> > error, and does not provision anything.
> >
> > > * Each application is potentially allowed to consume the entirety
> > > of the disk space - there is no strict size limit for an application
> > > * Applications sometimes need to pre-allocate space, for which
> > > they use fallocate. Once the operation succeeds, the application
> > > assumes the space is guaranteed to be there for it.
> > > * Since filesystems on the volumes are independent, filesystem
> > > level enforcement of size constraints is impossible and the only
> > > common level is the thin pool, thus, each fallocate has to find its
> > > representation in the thin pool one way or another - otherwise you
> > > may end up in the situation where the FS thinks it has allocated space
> > > but when it tries to actually write it, the thin pool is already
> > > exhausted.
> > > * Hole-Punching fallocate will not reach the thin pool, so the only
> > > solution presently is zero-writing pre-allocate.
> >
> > To me it sounds like you want a non-thin pool in dm-thin and/or
> > guaranteed space reservations for it.
>
> What is implemented in this patchset: enablement for dm-thinp to
> actually provide guarantees which fallocate requires.
>
> Seems you're getting hung up on the finishing details in HW (details
> which are _not_ the point of this patchset).
>
> The proposed changes are in service to _Linux_ code. The patchset
> implements the primitive from top (ext4) to bottom (dm-thinp, loop).
> It stops short of implementing handling everywhere that'd need it
> (e.g. in XFS, etc). But those changes can come as follow-on work once
> the primitive is established top to bottom.
>
> But you know all this ;)
>
> > > * Thus, a provisioning block operation allows an interface-specific
> > > operation that guarantees the presence of the block in the
> > > mapped space. LVM Thin-pool itself is the primary target for our
> > > use case but the argument is that this operation maps well to
> > > other interfaces which allow thinly provisioned units.
> >
> > I think where you are trying to go here is badly mistaken. With flash
> > (or hard drive SMR) there is no such thing as provisioning LBAs. Every
> > write is out of place, and a one-time space allocation does not help
> > you at all. So fundamentally what you are trying to do here just goes
> > against the actual physics of modern storage media.
> > While there are some layers that keep up a pretence, trying to do
> > that at an exposed API level is a really bad idea.
>
> This doesn't need to be so feudal. Reserving an LBA in physical HW
> really isn't the point.
>
> Fact remains: an operation that ensures space is actually reserved via
> fallocate is long overdue (just because an FS did its job doesn't mean
> underlying layers reflect that). And certainly useful, even if "only"
> benefiting dm-thinp and the loop driver. Like other block primitives,
> REQ_OP_PROVISION is filtered out by block core if the device doesn't
> support it.
>
> That said, I agree with Brian Foster that we need really solid
> documentation and justification for why fallocate mode=0 cannot be
> used (but the case has been made in this thread).
>
> Also, I do see an issue with the implementation (relative to stacked
> devices): dm_table_supports_provision() is too myopic about DM. It
> needs to go a step further and verify that some layer in the stack
> actually services REQ_OP_PROVISION. Will respond to DM patch too.
>

Thanks all for the suggestions and feedback! I just posted v2 (more
than a bit belatedly) on the various mailing lists with the relevant
fixes, documentation and some benchmarks on performance.

Best
Sarthak
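The thin-pool argument in Daniil's bullets (independent filesystems, one shared pool, fallocate that never reaches the pool) can be made concrete with a toy model. The Python sketch below is purely illustrative: `ThinPool`, `ThinVolume`, `PoolExhausted`, `scenario` and the `provision` flag are hypothetical names, not dm-thinp, ext4 or block-layer APIs. It assumes two volumes sharing a 100-block pool and shows that when fallocate only updates filesystem bookkeeping, an application can later fail to write space it was promised, whereas forwarding a provision request to the pool (the role the thread assigns to REQ_OP_PROVISION) makes the shortfall surface at fallocate time instead.

```python
from dataclasses import dataclass


class PoolExhausted(Exception):
    """Toy stand-in for the shared thin pool running out of free blocks."""


@dataclass
class ThinPool:
    """Hypothetical model: one set of physical blocks shared by all thin volumes."""
    total_blocks: int
    mapped: int = 0

    def free(self) -> int:
        return self.total_blocks - self.mapped

    def map_blocks(self, n: int) -> None:
        # Blocks are only consumed when something actually maps them.
        if n > self.free():
            raise PoolExhausted(f"need {n} blocks, only {self.free()} free")
        self.mapped += n


@dataclass
class ThinVolume:
    """Hypothetical model of a thin volume carrying its own, independent filesystem."""
    pool: ThinPool
    fs_allocated: int = 0  # blocks the filesystem believes it reserved via fallocate
    backed: int = 0        # blocks actually mapped in the shared pool

    def fallocate(self, n: int, provision: bool) -> None:
        if provision:
            # Hypothetical stand-in for forwarding a provision request to the pool;
            # raises PoolExhausted up front if the pool cannot back the range.
            self.pool.map_blocks(n)
            self.backed += n
        # Filesystem-level bookkeeping; FS free space is assumed ample in this model.
        self.fs_allocated += n

    def write_preallocated(self, n: int) -> None:
        # Writing the first n preallocated blocks maps any that are not yet backed.
        need = max(0, n - self.backed)
        self.pool.map_blocks(need)
        self.backed += need


def scenario(provision: bool) -> str:
    pool = ThinPool(total_blocks=100)
    vol_a, vol_b = ThinVolume(pool), ThinVolume(pool)
    vol_a.fallocate(60, provision)       # application on A preallocates 60 blocks
    try:
        vol_b.fallocate(60, provision)   # application on B preallocates 60 blocks
    except PoolExhausted:
        return "B: fallocate failed up front"
    vol_b.write_preallocated(60)         # B fills its range first
    try:
        vol_a.write_preallocated(60)     # A writes the space it was promised
    except PoolExhausted:
        return "A: write to preallocated range failed"
    return "ok"


if __name__ == "__main__":
    print(scenario(provision=False))  # A: write to preallocated range failed
    print(scenario(provision=True))   # B: fallocate failed up front
```

In this model, the zero-writing pre-allocation workaround mentioned in the thread would affect the pool the same way as `provision=True`, but only by actually writing every block; the sketch says nothing about how dm-thinp or the loop driver implement reservation internally.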