Message-ID: <20241120-platzen-mundart-8de12841abfc@brauner>
Date: Wed, 20 Nov 2024 10:19:45 +0100
From: Christian Brauner <brauner@...nel.org>
To: Mateusz Guzik <mjguzik@...il.com>
Cc: Al Viro <viro@...iv.linux.org.uk>, Jeongjun Park <aha310510@...il.com>,
jack@...e.cz, linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
stable@...r.kernel.org
Subject: Re: [PATCH] fs: prevent data-race due to missing inode_lock when
calling vfs_getattr
On Wed, Nov 20, 2024 at 02:44:17AM +0100, Mateusz Guzik wrote:
> On Mon, Nov 18, 2024 at 07:03:30AM +0000, Al Viro wrote:
> > On Mon, Nov 18, 2024 at 03:00:39PM +0900, Jeongjun Park wrote:
> > > All the functions to which this patch adds the lock are called only
> > > via syscall, so in most cases there will be no noticeable performance issue.
> >
> > Pardon me, but I am unable to follow your reasoning.
> >
>
> I suspect the argument is that the overhead of issuing a syscall is big
> enough that the extra cost of the lock round trip won't be visible, but
> that's not accurate -- atomics are measurable when added to syscalls,
> even on modern CPUs.
>
> > > And
> > > this data-race is not a problem that only occurs in theory. It is
> > > a bug that syzbot has been reporting for years. Many file systems
> > > in the kernel take inode_lock before calling vfs_getattr, so the
> > > data-race does not occur there, but fs/stat.c alone has had a
> > > data-race for years. This alone shows that adding inode_lock to
> > > some functions is a good way to solve the problem without much
> > > performance degradation.
> >
> > Explain. First of all, these are, by far, the most frequent callers
> > of vfs_getattr(); what "many filesystems" are doing around their calls
> > of the same is irrelevant. Which filesystems, BTW? And which call
> > chains are you talking about? Most of the filesystems never call it
> > at all.
> >
> > Furthermore, on a lot of userland loads stat(2) is a very hot path -
> > it is called a lot. And the rwsem in question has plenty of takers -
> > both shared and exclusive. The effect of piling a lot of threads
> > that grab it shared on top of the existing mix is not something
> > I am ready to predict without experiments - not beyond "likely to be
> > unpleasant, possibly very much so".
> >
> > Finally, you have not offered any explanations of the reasons why
> > that data race matters - and "syzbot reporting" is not one. It is
> > possible that actual observable bugs exist, but it would be useful
> > to have at least one of those described in detail.
> >
> [snip]
>
> On the stock kernel it is at least theoretically possible to transiently
> observe a state which is mid-update (as in not valid), but I was under
> the impression this was known and considered not a problem.
Exactly.
>
> Nonetheless, as an example, say an inode owned by 0:0 is being
> chowned to 1:1 and this is handled by setattr_copy().
>
> The ids are updated one after another:
> [snip]
> i_uid_update(idmap, attr, inode);
> i_gid_update(idmap, attr, inode);
> [/snip]
>
> So at least in principle someone issuing getattr in parallel may
> happen to spot 1:0 (as opposed to 0:0 or 1:1), a combination which
> was never set on the inode and is merely an artifact of the timing.
>
> This would be a bug, but I don't believe it is serious enough to
> justify taking the inode lock to avoid it.
I don't think this is a serious issue. We don't guarantee consistent
snapshots and I don't see a reason why we should complicate setattr()
for that.
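
To make the interleaving concrete, here is a minimal userspace sketch.
The names i_uid/i_gid and both threads are illustrative only - these are
plain program variables, not the kernel's fields. The kernel itself does
plain stores; relaxed atomics are used here just to keep the demo free
of undefined behaviour while still permitting the 1:0 observation:

/* Build with: cc -pthread demo.c */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static _Atomic unsigned int i_uid = 0;	/* "inode" starts as 0:0 */
static _Atomic unsigned int i_gid = 0;

static void *chown_side(void *arg)
{
	/* Mirrors setattr_copy(): the ids are updated one after another. */
	atomic_store_explicit(&i_uid, 1, memory_order_relaxed);
	atomic_store_explicit(&i_gid, 1, memory_order_relaxed);
	return NULL;
}

static void *getattr_side(void *arg)
{
	unsigned int uid = atomic_load_explicit(&i_uid, memory_order_relaxed);
	unsigned int gid = atomic_load_explicit(&i_gid, memory_order_relaxed);

	/* 0:0 and 1:1 are fine; 1:0 was never set on the "inode". */
	if (uid == 1 && gid == 0)
		printf("observed transient 1:0\n");
	return NULL;
}

int main(void)
{
	pthread_t c, g;

	pthread_create(&c, NULL, chown_side, NULL);
	pthread_create(&g, NULL, getattr_side, NULL);
	pthread_join(c, NULL);
	pthread_join(g, NULL);
	return 0;
}

Run it in a loop and the reader occasionally prints the transient 1:0
pair - precisely the artifact being judged not worth an inode_lock
round trip.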