Date:   Fri, 09 Sep 2022 12:36:29 -0400
From:   Jeff Layton <>
To:     "J. Bruce Fields" <>
Cc:     Theodore Ts'o <>, Jan Kara <>,
        NeilBrown <>
Subject: Re: [man-pages RFC PATCH v4] statx, inode: document the new

On Fri, 2022-09-09 at 11:45 -0400, J. Bruce Fields wrote:
> On Thu, Sep 08, 2022 at 03:07:58PM -0400, Jeff Layton wrote:
> > On Thu, 2022-09-08 at 14:22 -0400, J. Bruce Fields wrote:
> > > On Thu, Sep 08, 2022 at 01:40:11PM -0400, Jeff Layton wrote:
> > > > Yeah, ok. That does make some sense. So we would mix this into the
> > > > i_version instead of the ctime when it was available. Preferably, we'd
> > > > mix that in when we store the i_version rather than adding it afterward.
> > > > 
> > > > Ted, how would we access this? Maybe we could just add a new (generic)
> > > > super_block field for this that ext4 (and other filesystems) could
> > > > populate at mount time?
> > > 
> > > Couldn't the filesystem just return an ino_version that already includes
> > > it?
> > > 
> > 
> > Yes. That's simple if we want to just fold it in during getattr. If we
> > want to fold that into the values stored on disk, then I'm a little less
> > clear on how that will work.
> > 
> > Maybe I need a concrete example of how that will work:
> > 
> > Suppose we have an i_version value X with the previous crash counter
> > already factored in that makes it to disk. We hand out a newer version
> > X+1 to a client, but that value never makes it to disk.
> > 
> > The machine crashes and comes back up, and we get a query for i_version
> > and it comes back as X. Fine, it's an old version. Now there is a write.
> > What do we do to ensure that the new value doesn't collide with X+1? 
> I was assuming we could partition i_version's 64 bits somehow: e.g., top
> 16 bits store the crash counter.  You increment the i_version by: 1)
> replacing the top bits by the new crash counter, if it has changed, and
> 2) incrementing.
> Do the numbers work out?  2^16 mounts after unclean shutdowns sounds
> like a lot for one filesystem, as does 2^48 changes to a single file,
> but people do weird things.  Maybe there's a better partitioning, or
> some more flexible way of maintaining an i_version that still allows you
> to identify whether a given i_version preceded a crash.

We consume one bit to keep track of the "seen" flag, so it would be a
16+47 split. I assume that we'd also reset the version counter to 0 when
the crash counter changes? Maybe that doesn't matter as long as we don't
overflow into the crash counter.
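
To make the 16+47 split concrete, here is a minimal sketch in Python (chosen for brevity, not kernel C). All names (pack, unpack, bump) are hypothetical, and putting the "seen" flag in the lowest bit is an assumption of the sketch. It follows the two steps Bruce describes (replace the top bits if the crash counter changed, then increment) and does not reset the version counter:

```python
# Hypothetical 16+47+1 layout of a 64-bit i_version (illustration only):
#   [ crash counter: 16 bits | version counter: 47 bits | seen flag: 1 bit ]
CRASH_BITS = 16
FLAG_BITS = 1
COUNTER_BITS = 64 - CRASH_BITS - FLAG_BITS  # 47

COUNTER_SHIFT = FLAG_BITS                   # assumption: flag is the lowest bit
CRASH_SHIFT = FLAG_BITS + COUNTER_BITS
COUNTER_MASK = (1 << COUNTER_BITS) - 1
CRASH_MASK = (1 << CRASH_BITS) - 1


def pack(crash, counter, seen=False):
    """Combine the three fields into one 64-bit value."""
    return (((crash & CRASH_MASK) << CRASH_SHIFT)
            | ((counter & COUNTER_MASK) << COUNTER_SHIFT)
            | int(seen))


def unpack(value):
    """Split a 64-bit value back into (crash, counter, seen)."""
    crash = (value >> CRASH_SHIFT) & CRASH_MASK
    counter = (value >> COUNTER_SHIFT) & COUNTER_MASK
    return crash, counter, bool(value & 1)


def bump(value, current_crash):
    """Step 1: adopt the current crash counter. Step 2: increment."""
    _old_crash, counter, _seen = unpack(value)
    counter += 1
    if counter > COUNTER_MASK:
        # Refuse to overflow into the crash counter bits.
        raise OverflowError("version counter would spill into crash counter")
    # Assumption: a bump clears the seen flag.
    return pack(current_crash, counter, seen=False)


# The scenario from earlier in the thread:
on_disk = pack(crash=3, counter=10)        # X, made it to disk
handed_out = bump(on_disk, 3)              # X+1, seen by a client, then lost
after_crash = bump(on_disk, 4)             # first write after the crash
print(unpack(handed_out), unpack(after_crash))
```

In this sketch the post-crash value shares its low counter bits with the lost X+1 but differs in the top 16 bits, so the two values do not collide.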

I'm not sure we can get away with 16 bits for the crash counter, as
it'll leave us subject to the version counter wrapping after a long

If you increment a counter every nanosecond, how long until that counter
wraps? With 63 bits, that's 292 years (and change). With 16+47 bits,
that's less than two days. An 8+55 split would give us ~416 days which
seems a bit more reasonable?
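
For anyone who wants to check the arithmetic, a quick sketch (the function name wrap_days is just for illustration, and the 365.25-day year is an assumption):

```python
NS_PER_SECOND = 10**9
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365.25  # assumption: average year length


def wrap_days(counter_bits):
    """Days until a counter of this width wraps at one increment per ns."""
    return 2**counter_bits / NS_PER_SECOND / SECONDS_PER_DAY


for bits in (63, 47, 55):
    days = wrap_days(bits)
    print(f"{bits} bits: {days:.2f} days ({days / DAYS_PER_YEAR:.2f} years)")
```

This gives roughly 292 years for 63 bits, about 1.6 days for 47 bits, and just under 417 days for 55 bits.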

For NFS, we can probably live with even fewer bits in the crash counter.

If the crash counter changes, then that means the NFS server itself has
(likely) also crashed. The client will have to reestablish sockets,
reclaim, etc. It should get new attributes for the inodes it cares about
at that time.

-- 
Jeff Layton <>
