[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOQ4uxjnb0JmKVpMuEfa_NgHmLRchLz_3=9t2nepdS4QXJ=QVg@mail.gmail.com>
Date: Thu, 26 Aug 2021 13:44:50 +0300
From: Amir Goldstein <amir73il@...il.com>
To: Gabriel Krisman Bertazi <krisman@...labora.com>
Cc: Jan Kara <jack@...e.com>, Linux API <linux-api@...r.kernel.org>,
Ext4 <linux-ext4@...r.kernel.org>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
Khazhismel Kumykov <khazhy@...gle.com>,
David Howells <dhowells@...hat.com>,
Dave Chinner <david@...morbit.com>,
Theodore Tso <tytso@....edu>,
"Darrick J. Wong" <djwong@...nel.org>,
Matthew Bobrowski <repnop@...gle.com>, kernel@...labora.com,
Paul Moore <paul@...l-moore.com>
Subject: Re: [PATCH v6 09/21] fsnotify: Allow events reported with an empty inode
On Thu, Aug 26, 2021 at 12:50 AM Gabriel Krisman Bertazi
<krisman@...labora.com> wrote:
>
> Amir Goldstein <amir73il@...il.com> writes:
>
> > On Wed, Aug 25, 2021 at 9:40 PM Gabriel Krisman Bertazi
> > <krisman@...labora.com> wrote:
> >>
> >> Amir Goldstein <amir73il@...il.com> writes:
> >>
> >> > On Fri, Aug 13, 2021 at 12:41 AM Gabriel Krisman Bertazi
> >> > <krisman@...labora.com> wrote:
> >> >>
> >> >> Some file system events (i.e. FS_ERROR) might not be associated with an
> >> >> inode. For these, it makes sense to associate them directly with the
> >> >> super block of the file system they apply to. This patch allows the
> >> >> event to be reported with a NULL inode, by recovering the superblock
> >> >> directly from the data field, if needed.
> >> >>
> >> >> Signed-off-by: Gabriel Krisman Bertazi <krisman@...labora.com>
> >> >>
> >> >> --
> >> >> Changes since v5:
> >> >> - add fsnotify_data_sb handle to retrieve sb from the data field. (jan)
> >> >> ---
> >> >> fs/notify/fsnotify.c | 16 +++++++++++++---
> >> >> 1 file changed, 13 insertions(+), 3 deletions(-)
> >> >>
> >> >> diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
> >> >> index 30d422b8c0fc..536db02cb26e 100644
> >> >> --- a/fs/notify/fsnotify.c
> >> >> +++ b/fs/notify/fsnotify.c
> >> >> @@ -98,6 +98,14 @@ void fsnotify_sb_delete(struct super_block *sb)
> >> >> fsnotify_clear_marks_by_sb(sb);
> >> >> }
> >> >>
> >> >> +static struct super_block *fsnotify_data_sb(const void *data, int data_type)
> >> >> +{
> >> >> + struct inode *inode = fsnotify_data_inode(data, data_type);
> >> >> + struct super_block *sb = inode ? inode->i_sb : NULL;
> >> >> +
> >> >> + return sb;
> >> >> +}
> >> >> +
> >> >> /*
> >> >> * Given an inode, first check if we care what happens to our children. Inotify
> >> >> * and dnotify both tell their parents about events. If we care about any event
> >> >> @@ -455,8 +463,10 @@ static void fsnotify_iter_next(struct fsnotify_iter_info *iter_info)
> >> >> * @file_name is relative to
> >> >> * @file_name: optional file name associated with event
> >> >> * @inode: optional inode associated with event -
> >> >> - * either @dir or @inode must be non-NULL.
> >> >> - * if both are non-NULL event may be reported to both.
> >> >> + * If @dir and @inode are NULL, @data must have a type that
> >> >> + * allows retrieving the file system associated with this
> >> >
> >> > Irrelevant comment. sb must always be available from @data.
> >> >
> >> >> + * event. if both are non-NULL event may be reported to
> >> >> + * both.
> >> >> * @cookie: inotify rename cookie
> >> >> */
> >> >> int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir,
> >> >> @@ -483,7 +493,7 @@ int fsnotify(__u32 mask, const void *data, int data_type, struct inode *dir,
> >> >> */
> >> >> parent = dir;
> >> >> }
> >> >> - sb = inode->i_sb;
> >> >> + sb = inode ? inode->i_sb : fsnotify_data_sb(data, data_type);
> >> >
> >> > const struct path *path = fsnotify_data_path(data, data_type);
> >> > + const struct super_block *sb = fsnotify_data_sb(data, data_type);
> >> >
> >> > All the games with @data @inode and @dir args are irrelevant to this.
> >> > sb should always be available from @data and it does not matter
> >> > if fsnotify_data_inode() is the same as @inode, @dir or neither.
> >> > All those inodes are anyway on the same sb.
> >>
> >> Hi Amir,
> >>
> >> I think this is actually necessary. I could identify at least one event
> >> (FS_CREATE | FS_ISDIR) where fsnotify is invoked with a NULL data field.
> >> In that case, fsnotify_dirent is called with a negative dentry from
> >> vfs_mkdir(). I'm not sure why exactly the dentry is negative after the
> >
> > That doesn't sound right at all.
> > Are you sure about this?
> > Which filesystem was this mkdir called on?
>
> You should be able to reproduce it on top of mainline if you pick only this
> patch and do the change you suggested:
>
> - sb = inode->i_sb;
> + sb = fsnotify_data_sb(data, data_type);
>
> And then boot a Debian stable with systemd. The notification happens on
> the cgroup pseudo-filesystem (/sys/fs/cgroup), which is being monitored
> by systemd itself. The event that arrives with a NULL data is telling the
> directory /sys/fs/cgroup/*/ about the creation of directory
> `init.scope`.
>
> The change above triggers the following null dereference of struct
> super_block, since data is NULL.
>
> I will keep looking but you might be able to answer it immediately...
Yes, I see what is going on.
cgroupfs is a sort of kernfs and kernfs_iop_mkdir() does not instantiate
the negative dentry. Instead, kernfs_dop_revalidate() always invalidates
negative dentries to force re-lookup to find the inode.
Documentation/filesystems/vfs.rst says on create() and friends:
"...you will probably call d_instantiate() with the dentry and the
newly created inode..."
So this behavior seems legit.
Meaning that we have made a wrong assumption in fsnotify_create()
and fsnotify_mkdir().
Please note the comment above fsnotify_link() which anticipates
negative dentries.
I've audited the fsnotify backends and it seems that the
WARN_ON(!inode) in kernel/audit_* is the only immediate implication
of negative dentry with FS_CREATE.
I am the one who added these WARN_ON(), so I will remove them.
I think that missing inode in an FS_CREATE event really breaks
audit on kernfs, but not sure if that is a valid use case (Paul?).
Anyway, regarding your patch, I still prefer the solution proposed by Jan,
but not with a different implementation of fsnotify_data_sb().
Please see branch fsnotify_data_sb[1] with the proposed fixes.
The fixes assert the statement that "sb should always be available
from @data", regardless of kernfs anomaly.
If this works for you, please prepend those patches to your next
submission.
Regarding the state of this patch set in general, I must admit that
I wasn't able to follow if a conclusion was reached about the lifetime
management of fsnotify_error_event and associated sb mark.
Jan is going out on vacation and I think there is little point in spinning
another patch set revision before this issue is settled with Jan.
Thanks,
Amir.
[1] https://github.com/amir73il/linux/commits/fsnotify_data_sb
Powered by blists - more mailing lists