linux-ext4 - Re: [PATCH v7 0/5] Introduce provisioning primitives

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAG9=OMPxHOzYcy8TQRnvNfNvPvvU=A1pceyL72JfyQwJSKNjQQ@mail.gmail.com>
Date:   Thu, 25 May 2023 19:35:14 -0700
From:   Sarthak Kukreti <sarthakkukreti@...omium.org>
To:     Dave Chinner <david@...morbit.com>
Cc:     Mike Snitzer <snitzer@...nel.org>, Joe Thornber <ejt@...hat.com>,
        Jens Axboe <axboe@...nel.dk>, linux-block@...r.kernel.org,
        "Theodore Ts'o" <tytso@....edu>,
        Stefan Hajnoczi <stefanha@...hat.com>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        "Darrick J. Wong" <djwong@...nel.org>,
        Brian Foster <bfoster@...hat.com>,
        Bart Van Assche <bvanassche@...gle.com>,
        linux-kernel@...r.kernel.org,
        Christoph Hellwig <hch@...radead.org>, dm-devel@...hat.com,
        Andreas Dilger <adilger.kernel@...ger.ca>,
        linux-fsdevel@...r.kernel.org, linux-ext4@...r.kernel.org,
        Jason Wang <jasowang@...hat.com>,
        Alasdair Kergon <agk@...hat.com>
Subject: Re: [PATCH v7 0/5] Introduce provisioning primitives

On Thu, May 25, 2023 at 6:36 PM Dave Chinner <david@...morbit.com> wrote:
>
> On Thu, May 25, 2023 at 03:47:21PM -0700, Sarthak Kukreti wrote:
> > On Thu, May 25, 2023 at 9:00 AM Mike Snitzer <snitzer@...nel.org> wrote:
> > > On Thu, May 25 2023 at  7:39P -0400,
> > > Dave Chinner <david@...morbit.com> wrote:
> > > > On Wed, May 24, 2023 at 04:02:49PM -0400, Mike Snitzer wrote:
> > > > > On Tue, May 23 2023 at  8:40P -0400,
> > > > > Dave Chinner <david@...morbit.com> wrote:
> > > > > > It's worth noting that XFS already has a coarse-grained
> > > > > > implementation of preferred regions for metadata storage. It will
> > > > > > currently not use those metadata-preferred regions for user data
> > > > > > unless all the remaining user data space is full.  Hence I'm pretty
> > > > > > sure that a pre-provisioning enhancment like this can be done
> > > > > > entirely in-memory without requiring any new on-disk state to be
> > > > > > added.
> > > > > >
> > > > > > Sure, if we crash and remount, then we might chose a different LBA
> > > > > > region for pre-provisioning. But that's not really a huge deal as we
> > > > > > could also run an internal background post-mount fstrim operation to
> > > > > > remove any unused pre-provisioning that was left over from when the
> > > > > > system went down.
> > > > >
> > > > > This would be the FITRIM with extension you mention below? Which is a
> > > > > filesystem interface detail?
> > > >
> > > > No. We might reuse some of the internal infrastructure we use to
> > > > implement FITRIM, but that's about it. It's just something kinda
> > > > like FITRIM but with different constraints determined by the
> > > > filesystem rather than the user...
> > > >
> > > > As it is, I'm not sure we'd even need it - a preiodic userspace
> > > > FITRIM would acheive the same result, so leaked provisioned spaces
> > > > would get cleaned up eventually without the filesystem having to do
> > > > anything specific...
> > > >
> > > > > So dm-thinp would _not_ need to have new
> > > > > state that tracks "provisioned but unused" block?
> > > >
> > > > No idea - that's your domain. :)
> > > >
> > > > dm-snapshot, for certain, will need to track provisioned regions
> > > > because it has to guarantee that overwrites to provisioned space in
> > > > the origin device will always succeed. Hence it needs to know how
> > > > much space breaking sharing in provisioned regions after a snapshot
> > > > has been taken with be required...
> > >
> > > dm-thinp offers its own much more scalable snapshot support (doesn't
> > > use old dm-snapshot N-way copyout target).
> > >
> > > dm-snapshot isn't going to be modified to support this level of
> > > hardening (dm-snapshot is basically in "maintenance only" now).
>
> Ah, of course. Sorry for the confusion, I was kinda using
> dm-snapshot as shorthand for "dm-thinp + snapshots".
>
> > > But I understand your meaning: what you said is 100% applicable to
> > > dm-thinp's snapshot implementation and needs to be accounted for in
> > > thinp's metadata (inherent 'provisioned' flag).
>
> *nod*
>
> > A bit orthogonal: would dm-thinp need to differentiate between
> > user-triggered provision requests (eg. from fallocate()) vs
> > fs-triggered requests?
>
> Why?  How is the guarantee the block device has to provide to
> provisioned areas different for user vs filesystem internal
> provisioned space?
>
After thinking this through, I stand corrected. I was primarily
concerned with how this would balloon thin snapshot sizes if users
potentially provision a large chunk of the filesystem but that's
putting the cart way before the horse.

Best
Sarthak

> > I would lean towards user provisioned areas not
> > getting dedup'd on snapshot creation,
>
> <twitch>
>
> Snapshotting is a clone operation, not a dedupe operation.
>
> Yes, the end result of both is that you have a block shared between
> multiple indexes that needs COW on the next overwrite, but the two
> operations that get to that point are very different...
>
> </pedantic mode disegaged>
>
> > but that would entail tracking
> > the state of the original request and possibly a provision request
> > flag (REQ_PROVISION_DEDUP_ON_SNAPSHOT) or an inverse flag
> > (REQ_PROVISION_NODEDUP). Possibly too convoluted...
>
> Let's not try to add everyone's favourite pony to this interface
> before we've even got it off the ground.
>
> It's the simple precision of the API, the lack of cross-layer
> communication requirements and the ability to implement and optimise
> the independent layers independently that makes this a very
> appealing solution.
>
> We need to start with getting the simple stuff working and prove the
> concept. Then once we can observe the behaviour of a working system
> we can start working on optimising individual layers for efficiency
> and performance....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@...morbit.com