Date: Fri, 9 Sep 2022 11:45:06 -0400
From: bfields@...ldses.org (J. Bruce Fields)
To: Jeff Layton <jlayton@...nel.org>
Cc: Theodore Ts'o <tytso@....edu>, Jan Kara <jack@...e.cz>,
NeilBrown <neilb@...e.de>, adilger.kernel@...ger.ca,
djwong@...nel.org, david@...morbit.com, trondmy@...merspace.com,
viro@...iv.linux.org.uk, zohar@...ux.ibm.com, xiubli@...hat.com,
chuck.lever@...cle.com, lczerner@...hat.com, brauner@...nel.org,
fweimer@...hat.com, linux-man@...r.kernel.org,
linux-api@...r.kernel.org, linux-btrfs@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
ceph-devel@...r.kernel.org, linux-ext4@...r.kernel.org,
linux-nfs@...r.kernel.org, linux-xfs@...r.kernel.org
Subject: Re: [man-pages RFC PATCH v4] statx, inode: document the new
STATX_INO_VERSION field
On Thu, Sep 08, 2022 at 03:07:58PM -0400, Jeff Layton wrote:
> On Thu, 2022-09-08 at 14:22 -0400, J. Bruce Fields wrote:
> > On Thu, Sep 08, 2022 at 01:40:11PM -0400, Jeff Layton wrote:
> > > Yeah, ok. That does make some sense. So we would mix this into the
> > > i_version instead of the ctime when it was available. Preferably, we'd
> > > mix that in when we store the i_version rather than adding it afterward.
> > >
> > > Ted, how would we access this? Maybe we could just add a new (generic)
> > > super_block field for this that ext4 (and other filesystems) could
> > > populate at mount time?
> >
> > Couldn't the filesystem just return an ino_version that already includes
> > it?
> >
>
> Yes. That's simple if we want to just fold it in during getattr. If we
> want to fold that into the values stored on disk, then I'm a little less
> clear on how that will work.
>
> Maybe I need a concrete example of how that will work:
>
> Suppose we have an i_version value X with the previous crash counter
> already factored in that makes it to disk. We hand out a newer version
> X+1 to a client, but that value never makes it to disk.
>
> The machine crashes and comes back up, and we get a query for i_version
> and it comes back as X. Fine, it's an old version. Now there is a write.
> What do we do to ensure that the new value doesn't collide with X+1?

I was assuming we could partition i_version's 64 bits somehow: e.g., the
top 16 bits store the crash counter and the low 48 bits count changes.
You increment the i_version by: 1) replacing the top bits with the new
crash counter, if it has changed, and 2) incrementing the low bits.
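
To make the two-step increment concrete, here is a rough userspace sketch
of one way the partitioning could look. Everything in it is an assumption
for illustration: the 16/48 split is only the example split mentioned
above, the iv_* names are hypothetical and are not existing kernel
helpers, and the wrap-around of the low field is just one possible
policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: top 16 bits = crash counter, low 48 bits = change count. */
#define IV_CRASH_BITS 16
#define IV_COUNT_BITS (64 - IV_CRASH_BITS)
#define IV_COUNT_MASK ((UINT64_C(1) << IV_COUNT_BITS) - 1)

/* Extract the crash counter stored in the top bits. */
static inline uint16_t iv_crash(uint64_t iv)
{
	return (uint16_t)(iv >> IV_COUNT_BITS);
}

/* Extract the change count stored in the low bits. */
static inline uint64_t iv_count(uint64_t iv)
{
	return iv & IV_COUNT_MASK;
}

/* Combine a crash counter and a change count into one 64-bit value. */
static inline uint64_t iv_make(uint16_t crash, uint64_t count)
{
	assert(count <= IV_COUNT_MASK);
	return ((uint64_t)crash << IV_COUNT_BITS) | count;
}

/*
 * Step 1: replace the top bits with the current crash counter if it
 * has changed. Step 2: increment the low field. The low field wraps
 * within its 48 bits here, which is an assumed policy.
 */
static inline uint64_t iv_bump(uint64_t iv, uint16_t cur_crash)
{
	uint64_t next = (iv_count(iv) + 1) & IV_COUNT_MASK;

	return iv_make(cur_crash, next);
}

/* True if iv was generated under a different crash counter. */
static inline bool iv_predates_crash(uint64_t iv, uint16_t cur_crash)
{
	return iv_crash(iv) != cur_crash;
}
```

In the scenario from the quoted text, X is on disk and X+1 was handed
out under the old crash counter. After the crash the counter differs, so
the first iv_bump() of X yields a value whose top bits differ from those
of X+1, and the two cannot collide even though the low bits match.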
Do the numbers work out? 2^16 mounts after unclean shutdowns sounds
like a lot for one filesystem, as does 2^48 changes to a single file,
but people do weird things. Maybe there's a better partitioning, or
some more flexible way of maintaining an i_version that still allows you
to identify whether a given i_version preceded a crash.
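
As a back-of-the-envelope check on those two limits, the helper below
estimates how long each field would last at a given event rate. The
rates are purely assumed for illustration (one million changes per
second to a single file, one unclean shutdown per day) and
years_to_exhaust() is a hypothetical name.

```c
#include <stdint.h>

/*
 * Years until a counter of the given width (in bits, must be < 64)
 * runs out, at a steady assumed rate of events per second.
 */
static inline double years_to_exhaust(unsigned bits, double events_per_second)
{
	const double seconds_per_year = 365.25 * 24 * 60 * 60;
	double capacity = (double)(UINT64_C(1) << bits);

	return capacity / events_per_second / seconds_per_year;
}
```

Under those assumed rates, years_to_exhaust(48, 1e6) comes to roughly
8.9 years and years_to_exhaust(16, 1.0 / 86400) to roughly 179 years, so
at such rates the 48-bit change count would run out long before the
16-bit crash counter.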
--b.