[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAOQ4uxhbDwhb+2Brs1UdkoF0a3NSdBAOQPNfEHjahrgoKJpLEw@mail.gmail.com>
Date: Mon, 15 Sep 2025 17:29:40 +0200
From: Amir Goldstein <amir73il@...il.com>
To: Jan Kara <jack@...e.cz>, Jakub Acs <acsjakub@...zon.de>
Cc: linux-unionfs@...r.kernel.org, Miklos Szeredi <miklos@...redi.hu>,
linux-kernel@...r.kernel.org, stable@...r.kernel.org,
Christian Brauner <brauner@...nel.org>, linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: [PATCH] ovl: check before dereferencing s_root field
On Mon, Sep 15, 2025 at 4:07 PM Jan Kara <jack@...e.cz> wrote:
>
> On Mon 15-09-25 15:01:13, Amir Goldstein wrote:
> > On Mon, Sep 15, 2025 at 12:15 PM Jakub Acs <acsjakub@...zon.de> wrote:
> > >
> > > Calling intotify_show_fdinfo() on fd watching an overlayfs inode, while
> > > the overlayfs is being unmounted, can lead to dereferencing NULL ptr.
> > >
> > > This issue was found by syzkaller.
> > >
> > > Race Condition Diagram:
> > >
> > > Thread 1 Thread 2
> > > -------- --------
> > >
> > > generic_shutdown_super()
> > > shrink_dcache_for_umount
> > > sb->s_root = NULL
> > >
> > > |
> > > | vfs_read()
> > > | inotify_fdinfo()
> > > | * inode get from mark *
> > > | show_mark_fhandle(m, inode)
> > > | exportfs_encode_fid(inode, ..)
> > > | ovl_encode_fh(inode, ..)
> > > | ovl_check_encode_origin(inode)
> > > | * deref i_sb->s_root *
> > > |
> > > |
> > > v
> > > fsnotify_sb_delete(sb)
> > >
> > > Which then leads to:
> > >
> > > [ 32.133461] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> > > [ 32.134438] KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037]
> > > [ 32.135032] CPU: 1 UID: 0 PID: 4468 Comm: systemd-coredum Not tainted 6.17.0-rc6 #22 PREEMPT(none)
> > >
> > > <snip registers, unreliable trace>
> > >
> > > [ 32.143353] Call Trace:
> > > [ 32.143732] ovl_encode_fh+0xd5/0x170
> > > [ 32.144031] exportfs_encode_inode_fh+0x12f/0x300
> > > [ 32.144425] show_mark_fhandle+0xbe/0x1f0
> > > [ 32.145805] inotify_fdinfo+0x226/0x2d0
> > > [ 32.146442] inotify_show_fdinfo+0x1c5/0x350
> > > [ 32.147168] seq_show+0x530/0x6f0
> > > [ 32.147449] seq_read_iter+0x503/0x12a0
> > > [ 32.148419] seq_read+0x31f/0x410
> > > [ 32.150714] vfs_read+0x1f0/0x9e0
> > > [ 32.152297] ksys_read+0x125/0x240
> > >
> > > IOW ovl_check_encode_origin derefs inode->i_sb->s_root, after it was set
> > > to NULL in the unmount path.
> > >
> > > Minimize the window of opportunity by adding explicit check.
> > >
> > > Fixes: c45beebfde34 ("ovl: support encoding fid from inode with no alias")
> > > Signed-off-by: Jakub Acs <acsjakub@...zon.de>
> > > Cc: Miklos Szeredi <miklos@...redi.hu>
> > > Cc: Amir Goldstein <amir73il@...il.com>
> > > Cc: linux-unionfs@...r.kernel.org
> > > Cc: linux-kernel@...r.kernel.org
> > > Cc: stable@...r.kernel.org
> > > ---
> > >
> > > I'm happy to take suggestions for a better fix - I looked at taking
> > > s_umount for reading, but it wasn't clear to me for how long would the
> > > fdinfo path need to hold it. Hence the most primitive suggestion in this
> > > v1.
> > >
> > > I'm also not sure if ENOENT or EBUSY is better?.. or even something else?
> > >
> > > fs/overlayfs/export.c | 2 ++
> > > 1 file changed, 2 insertions(+)
> > >
> > > diff --git a/fs/overlayfs/export.c b/fs/overlayfs/export.c
> > > index 83f80fdb1567..424c73188e06 100644
> > > --- a/fs/overlayfs/export.c
> > > +++ b/fs/overlayfs/export.c
> > > @@ -195,6 +195,8 @@ static int ovl_check_encode_origin(struct inode *inode)
> > > if (!ovl_inode_lower(inode))
> > > return 0;
> > >
> > > + if (!inode->i_sb->s_root)
> > > + return -ENOENT;
> >
> > For a filesystem method to have to check that its own root is still alive sounds
> > like the wrong way to me.
> > That's one of the things that should be taken for granted by fs code.
> >
> > I don't think this is an overlayfs specific issue, because other fs would be
> > happy if encode_fh() would be called with NULL sb->s_root.
>
> Actually, I don't see where that would blow up? Generally references to
> sb->s_root in filesystems outside of mount / remount code are pretty rare.
> Also most of the code should be unreachable by the time we set sb->s_root
> to NULL because there are no open files at that moment, no exports etc. But
> as this report shows, there are occasional surprises (I remember similar
> issue with ext4 sysfs files handlers using s_root without checking couple
> years back).
>
I am not sure that I understand what you are arguing for.
I did a very naive grep s_root fs/*/export.c and quickly found:
static int gfs2_encode_fh(struct inode *inode, __u32 *p, int *len,
struct inode *parent)
{
...
if (!parent || inode == d_inode(sb->s_root))
return *len;
So it's not an overlayfs specific issue, just so happens that zysbot
likes to test overlayfs.
Are you suggesting that we fix all of those one by one?
> > Jan,
> >
> > Can we change the order of generic_shutdown_super() so that
> > fsnotify_sb_delete(sb) is called before setting s_root to NULL?
> >
> > Or is there a better solution for this race?
>
> Regarding calling fsnotify_sb_delete() before setting s_root to NULL:
> In 2019 (commit 1edc8eb2e9313 ("fs: call fsnotify_sb_delete after
> evict_inodes")) we've moved the call after evict_inodes() because otherwise
> we were just wasting cycles scanning many inodes without watches. So moving
> it earlier wouldn't be great...
Yes, I noticed that and I figured there were subtleties.
In any case, Jakub, your patch is insufficient because:
1. Checking sb->sb_root without a lock and without READ_ONCE()
and a matching WRITE_ONCE() is not safe
2. sb_root can become NULL after the check since you are not holding
the s_umount lock
Jakub,
Instead of an unsafe check inside ovl_encode_fh(), I think it is better to use
super_trylock_shared() inside show_mark_fhandle() before calling
exportfs_encode_fid()?
Feels like the corner case is show_mark_fhandle() and there is no strong
incentive to make this code very efficient.
Jan, WDYT?
Thanks,
Amir.
Powered by blists - more mailing lists