linux-kernel - Re: [PATCH v3 19/19] fs: handle inode->i

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20171219092947.GC2277@quack2.suse.cz>
Date:   Tue, 19 Dec 2017 10:29:47 +0100
From:   Jan Kara <jack@...e.cz>
To:     Jeff Layton <jlayton@...nel.org>
Cc:     Jan Kara <jack@...e.cz>, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org, viro@...iv.linux.org.uk,
        linux-nfs@...r.kernel.org, bfields@...ldses.org, neilb@...e.de,
        jack@...e.de, linux-ext4@...r.kernel.org, tytso@....edu,
        adilger.kernel@...ger.ca, linux-xfs@...r.kernel.org,
        darrick.wong@...cle.com, david@...morbit.com,
        linux-btrfs@...r.kernel.org, clm@...com, jbacik@...com,
        dsterba@...e.com, linux-integrity@...r.kernel.org,
        zohar@...ux.vnet.ibm.com, dmitry.kasatkin@...il.com,
        linux-afs@...ts.infradead.org, dhowells@...hat.com,
        jaltman@...istor.com
Subject: Re: [PATCH v3 19/19] fs: handle inode->i_version more efficiently

On Mon 18-12-17 12:22:20, Jeff Layton wrote:
> On Mon, 2017-12-18 at 17:34 +0100, Jan Kara wrote:
> > On Mon 18-12-17 10:11:56, Jeff Layton wrote:
> > >  static inline bool
> > >  inode_maybe_inc_iversion(struct inode *inode, bool force)
> > >  {
> > > -	atomic64_t *ivp = (atomic64_t *)&inode->i_version;
> > > +	u64 cur, old, new;
> > >  
> > > -	atomic64_inc(ivp);
> > > +	cur = (u64)atomic64_read(&inode->i_version);
> > > +	for (;;) {
> > > +		/* If flag is clear then we needn't do anything */
> > > +		if (!force && !(cur & I_VERSION_QUERIED))
> > > +			return false;
> > 
> > The fast path here misses any memory barrier. Thus it seems this query
> > could be in theory reordered before any store that happened to modify the
> > inode? Or maybe we could race and miss the fact that in fact this i_version
> > has already been queried? But maybe there's some higher level locking that
> > makes sure this is all a non-issue... But in that case it would deserve
> > some comment I guess.
> > 
> 
> There's no higher-level locking. Getting locking out of this codepath is
> a good thing IMO. The larger question here is whether we really care
> about ordering this with anything else.
> 
> The i_version, as implemented today, is not ordered with actual changes
> to the inode. We only take the i_lock today when modifying it, not when
> querying it. It's possible today that you could see the results of a
> change and then do a fetch of the i_version that doesn't show an
> increment vs. a previous change.

Yeah, so I don't suggest that you should fix unrelated issues but original
i_lock protection did actually provide memory barriers (although
semi-permeable, but in practice they are very often enough) and your patch
removing those could have changed a theoretical issue to a practical
problem. So at least preserving that original acquire-release semantics
of i_version handling would be IMHO good.

> It'd be nice if this were atomic with the actual changes that it
> represents, but I think that would be prohibitively expensive. That may
> be something we need to address. I'm not sure we really want to do it as
> part of this patchset though.
> 
> > > +
> > > +		/* Since lowest bit is flag, add 2 to avoid it */
> > > +		new = (cur & ~I_VERSION_QUERIED) + I_VERSION_INCREMENT;
> > > +
> > > +		old = atomic64_cmpxchg(&inode->i_version, cur, new);
> > > +		if (likely(old == cur))
> > > +			break;
> > > +		cur = old;
> > > +	}
> > >  	return true;
> > >  }
> > >  
> > 
> > ...
> > 
> > >  static inline u64
> > >  inode_query_iversion(struct inode *inode)
> > >  {
> > > -	return inode_peek_iversion(inode);
> > > +	u64 cur, old, new;
> > > +
> > > +	cur = atomic64_read(&inode->i_version);
> > > +	for (;;) {
> > > +		/* If flag is already set, then no need to swap */
> > > +		if (cur & I_VERSION_QUERIED)
> > > +			break;
> > > +
> > > +		new = cur | I_VERSION_QUERIED;
> > > +		old = atomic64_cmpxchg(&inode->i_version, cur, new);
> > > +		if (old == cur)
> > > +			break;
> > > +		cur = old;
> > > +	}
> > 
> > Why not just use atomic64_or() here?
> > 
> 
> If the cmpxchg fails, then either:
> 
> 1) it was incremented
> 2) someone flagged it QUERIED
> 
> If an increment happened then we don't need to flag it as QUERIED if
> we're returning an older value. If we use atomic64_or, then we can't
> tell if an increment happened so we'd end up potentially flagging it
> more than necessary.
> 
> In principle, either outcome is technically OK and we don't have to loop
> if the cmpxchg doesn't work. That said, if we think there might be a
> later i_version available, then I think we probably want to try to query
> it again so we can return as late a one as possible.

OK, makes sense. I'm just a bit vary of cmpxchg loops as they tend to
behave pretty badly in contended cases but I guess i_version won't be
hammered *that* hard.

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR