lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Mon, 12 Sep 2022 08:55:04 -0400 From: Jeff Layton <jlayton@...nel.org> To: Florian Weimer <fweimer@...hat.com> Cc: "J. Bruce Fields" <bfields@...ldses.org>, Theodore Ts'o <tytso@....edu>, Jan Kara <jack@...e.cz>, NeilBrown <neilb@...e.de>, adilger.kernel@...ger.ca, djwong@...nel.org, david@...morbit.com, trondmy@...merspace.com, viro@...iv.linux.org.uk, zohar@...ux.ibm.com, xiubli@...hat.com, chuck.lever@...cle.com, lczerner@...hat.com, brauner@...nel.org, linux-man@...r.kernel.org, linux-api@...r.kernel.org, linux-btrfs@...r.kernel.org, linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, ceph-devel@...r.kernel.org, linux-ext4@...r.kernel.org, linux-nfs@...r.kernel.org, linux-xfs@...r.kernel.org Subject: Re: [man-pages RFC PATCH v4] statx, inode: document the new STATX_INO_VERSION field On Mon, 2022-09-12 at 14:13 +0200, Florian Weimer wrote: > * Jeff Layton: > > > To do this we'd need 2 64-bit fields in the on-disk and in-memory > > superblocks for ext4, xfs and btrfs. On the first mount after a crash, > > the filesystem would need to bump s_version_max by the significant > > increment (2^40 bits or whatever). On a "clean" mount, it wouldn't need > > to do that. > > > > Would there be a way to ensure that the new s_version_max value has made > > it to disk? Bumping it by a large value and hoping for the best might be > > ok for most cases, but there are always outliers, so it might be > > worthwhile to make an i_version increment wait on that if necessary. > > How common are unclean shutdowns in practice? Do ex64/XFS/btrfs keep > counters in the superblocks for journal replays that can be read easily? > > Several useful i_version applications could be negatively impacted by > frequent i_version invalidation. > One would hope "not very often", but Oopses _are_ something that happens occasionally, even in very stable environments, and it would be best if what we're building can cope with them. Consider: reader writer ---------------------------------------------------------- start with i_version 1 inode updated in memory, i_version++ query, get i_version 2 <<< CRASH : update never makes it to disk, back at 1 after reboot >>> query, get i_version 1 application restarts and redoes write, i_version at 2^40+1 query, get i_version 2^40+1 The main thing we have to avoid here is giving out an i_version that represents two different states of the same inode. This should achieve that. Something else we should consider though is that with enough crashes on a long-lived filesystem, the value could eventually wrap. I think we should acknowledge that fact in advance, and plan to deal with it (particularly if we're going to expose this to userland eventually). Because of the "seen" flag, we have a 63 bit counter to play with. Could we use a similar scheme to the one we use to handle when "jiffies" wraps? Assume that we'd never compare two values that were more than 2^62 apart? We could add i_version_before/i_version_after macros to make it simple to handle this. -- Jeff Layton <jlayton@...nel.org>
Powered by blists - more mailing lists