lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <gdovf4egsaqighoig3xg4r2ddwthk2rujenkloqep5kdub75d4@7wkvfnp4xlxx>
Date: Tue, 16 Sep 2025 13:30:39 +0200
From: Jan Kara <jack@...e.cz>
To: Amir Goldstein <amir73il@...il.com>
Cc: Jan Kara <jack@...e.cz>, Jakub Acs <acsjakub@...zon.de>, 
	linux-unionfs@...r.kernel.org, Miklos Szeredi <miklos@...redi.hu>, linux-kernel@...r.kernel.org, 
	stable@...r.kernel.org, Christian Brauner <brauner@...nel.org>, 
	linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: [PATCH] ovl: check before dereferencing s_root field

On Mon 15-09-25 17:29:40, Amir Goldstein wrote:
> On Mon, Sep 15, 2025 at 4:07 PM Jan Kara <jack@...e.cz> wrote:
> > > > diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
> > > > index 83f80fdb1567..424c73188e06 100644
> > > > --- a/fs/overlayfs/export.c
> > > > +++ b/fs/overlayfs/export.c
> > > > @@ -195,6 +195,8 @@ static int ovl_check_encode_origin(struct inode *inode)
> > > >         if (!ovl_inode_lower(inode))
> > > >                 return 0;
> > > >
> > > > +       if (!inode->i_sb->s_root)
> > > > +               return -ENOENT;
> > >
> > > For a filesystem method to have to check that its own root is still alive sounds
> > > like the wrong way to me.
> > > That's one of the things that should be taken for granted by fs code.
> > >
> > > I don't think this is an overlayfs specific issue, because other fs would be
> > > happy if encode_fh() would be called with NULL sb->s_root.
> >
> > Actually, I don't see where that would blow up? Generally references to
> > sb->s_root in filesystems outside of mount / remount code are pretty rare.
> > Also most of the code should be unreachable by the time we set sb->s_root
> > to NULL because there are no open files at that moment, no exports etc. But
> > as this report shows, there are occasional surprises (I remember similar
> > issue with ext4 sysfs files handlers using s_root without checking couple
> > years back).
> >
> 
> I am not sure that I understand what you are arguing for.
> I did a very naive grep s_root fs/*/export.c and quickly found:

You're better with grep than me ;). I was grepping for '->s_root' as well
but all the hits I had looked into were related to mounting and similar and
eventually I got bored. Restricting the grep to export ops indeed shows
ceph, gfs2 and overlayfs are vulnerable to this kind of problem.

> static int gfs2_encode_fh(struct inode *inode, __u32 *p, int *len,
>                           struct inode *parent)
> {
> ...
>         if (!parent || inode == d_inode(sb->s_root))
>                 return *len;
> 
> So it's not an overlayfs specific issue, just so happens that zysbot
> likes to test overlayfs.
> 
> Are you suggesting that we fix all of those one by one?

No. I agree we need to figure out a way to make sure export ops are not
called on a filesystem being unmounted. Standard open_by_handle() or NFS
export cannot race with generic_shutdown_super() (they hold the fs mounted)
so fsnotify is a special case here.

I actually wonder if fanotify event (e.g. from inode deletion postponed to
some workqueue or whatever) cannot race with umount as well and cause the
same problem...

> > > Can we change the order of generic_shutdown_super() so that
> > > fsnotify_sb_delete(sb) is called before setting s_root to NULL?
> > >
> > > Or is there a better solution for this race?
> >
> > Regarding calling fsnotify_sb_delete() before setting s_root to NULL:
> > In 2019 (commit 1edc8eb2e9313 ("fs: call fsnotify_sb_delete after
> > evict_inodes")) we've moved the call after evict_inodes() because otherwise
> > we were just wasting cycles scanning many inodes without watches. So moving
> > it earlier wouldn't be great...
> 
> Yes, I noticed that and I figured there were subtleties.

Right. After thinking more about it I think calling fsnotify_sb_delete()
earlier is the only practical choice we have (not clearing sb->s_root isn't
much of an option - we need to prune all dentries to quiesce the filesystem
and leaving s_root alive would create odd corner cases). But you don't want
to be iterating millions of inodes just to clear couple of marks so we'll
have to figure out something more clever there.

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ