[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOQ4uxhOMcaVupVVGXV2Srz_pAG+BzDc9Gb4hFdwKUtk45QypQ@mail.gmail.com>
Date: Tue, 16 Sep 2025 15:29:35 +0200
From: Amir Goldstein <amir73il@...il.com>
To: Jan Kara <jack@...e.cz>
Cc: Jakub Acs <acsjakub@...zon.de>, linux-unionfs@...r.kernel.org,
Miklos Szeredi <miklos@...redi.hu>, linux-kernel@...r.kernel.org, stable@...r.kernel.org,
Christian Brauner <brauner@...nel.org>, linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: [PATCH] ovl: check before dereferencing s_root field
On Tue, Sep 16, 2025 at 1:30 PM Jan Kara <jack@...e.cz> wrote:
>
> On Mon 15-09-25 17:29:40, Amir Goldstein wrote:
> > On Mon, Sep 15, 2025 at 4:07 PM Jan Kara <jack@...e.cz> wrote:
> > > > > diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
> > > > > index 83f80fdb1567..424c73188e06 100644
> > > > > --- a/fs/overlayfs/export.c
> > > > > +++ b/fs/overlayfs/export.c
> > > > > @@ -195,6 +195,8 @@ static int ovl_check_encode_origin(struct inode *inode)
> > > > > if (!ovl_inode_lower(inode))
> > > > > return 0;
> > > > >
> > > > > + if (!inode->i_sb->s_root)
> > > > > + return -ENOENT;
> > > >
> > > > For a filesystem method to have to check that its own root is still alive sounds
> > > > like the wrong way to me.
> > > > That's one of the things that should be taken for granted by fs code.
> > > >
> > > > I don't think this is an overlayfs specific issue, because other fs would be
> > > > happy if encode_fh() would be called with NULL sb->s_root.
> > >
> > > Actually, I don't see where that would blow up? Generally references to
> > > sb->s_root in filesystems outside of mount / remount code are pretty rare.
> > > Also most of the code should be unreachable by the time we set sb->s_root
> > > to NULL because there are no open files at that moment, no exports etc. But
> > > as this report shows, there are occasional surprises (I remember similar
> > > issue with ext4 sysfs files handlers using s_root without checking couple
> > > years back).
> > >
> >
> > I am not sure that I understand what you are arguing for.
> > I did a very naive grep s_root fs/*/export.c and quickly found:
>
> You're better with grep than me ;). I was grepping for '->s_root' as well
> but all the hits I had looked into were related to mounting and similar and
> eventually I got bored. Restricting the grep to export ops indeed shows
> ceph, gfs2 and overlayfs are vulnerable to this kind of problem.
>
> > static int gfs2_encode_fh(struct inode *inode, __u32 *p, int *len,
> > struct inode *parent)
> > {
> > ...
> > if (!parent || inode == d_inode(sb->s_root))
> > return *len;
> >
> > So it's not an overlayfs specific issue, just so happens that zysbot
> > likes to test overlayfs.
> >
> > Are you suggesting that we fix all of those one by one?
>
> No. I agree we need to figure out a way to make sure export ops are not
> called on a filesystem being unmounted. Standard open_by_handle() or NFS
> export cannot race with generic_shutdown_super() (they hold the fs mounted)
> so fsnotify is a special case here.
>
> I actually wonder if fanotify event (e.g. from inode deletion postponed to
> some workqueue or whatever) cannot race with umount as well and cause the
> same problem...
>
Oy. I was thinking that all event happen when holding some mnt ref
but yeh fsnotify_inoderemove() does look like it could be a problem
from sb shutdown context.
How about skipping fsnotify_inoderemove() in case sb is in shutdown?
> > > > Can we change the order of generic_shutdown_super() so that
> > > > fsnotify_sb_delete(sb) is called before setting s_root to NULL?
> > > >
> > > > Or is there a better solution for this race?
> > >
> > > Regarding calling fsnotify_sb_delete() before setting s_root to NULL:
> > > In 2019 (commit 1edc8eb2e9313 ("fs: call fsnotify_sb_delete after
> > > evict_inodes")) we've moved the call after evict_inodes() because otherwise
> > > we were just wasting cycles scanning many inodes without watches. So moving
> > > it earlier wouldn't be great...
> >
> > Yes, I noticed that and I figured there were subtleties.
>
> Right. After thinking more about it I think calling fsnotify_sb_delete()
> earlier is the only practical choice we have (not clearing sb->s_root isn't
> much of an option - we need to prune all dentries to quiesce the filesystem
> and leaving s_root alive would create odd corner cases). But you don't want
> to be iterating millions of inodes just to clear couple of marks so we'll
> have to figure out something more clever there.
I think we only need to suppress the fsnotify_inoderemove() call.
It sounds doable and very local to fs/super.c.
Regarding show_mark_fhandle() WDYT about my suggestion to
guard it with super_trylock_shared()?
Thanks,
Amir.
Powered by blists - more mailing lists