[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOQ4uxjb8kpfaX2Jtq-h6Vai2My67PdqGRbVUP5+GspLCsq_+A@mail.gmail.com>
Date: Fri, 13 Aug 2021 12:35:52 +0300
From: Amir Goldstein <amir73il@...il.com>
To: Gabriel Krisman Bertazi <krisman@...labora.com>
Cc: Jan Kara <jack@...e.com>, Linux API <linux-api@...r.kernel.org>,
Ext4 <linux-ext4@...r.kernel.org>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
Khazhismel Kumykov <khazhy@...gle.com>,
David Howells <dhowells@...hat.com>,
Dave Chinner <david@...morbit.com>,
Theodore Tso <tytso@....edu>,
"Darrick J. Wong" <djwong@...nel.org>,
Matthew Bobrowski <repnop@...gle.com>, kernel@...labora.com
Subject: Re: [PATCH v6 16/21] fanotify: Handle FAN_FS_ERROR events
On Fri, Aug 13, 2021 at 12:41 AM Gabriel Krisman Bertazi
<krisman@...labora.com> wrote:
>
> Wire up FAN_FS_ERROR in the fanotify_mark syscall. The event can only
> be requested for the entire filesystem, thus it requires the
> FAN_MARK_FILESYSTEM.
Please split the Wire-up to fanotify_mark syscall into a separate patch applied
after patches that implement the report of event info records.
>
> FAN_FS_ERROR has to be handled slightly differently from other events
> because it needs to be submitted in an atomic context, using
> preallocated memory. This patch implements the submission path by only
> storing the first error event that happened in the slot (userspace
> resets the slot by reading the event).
>
> Extra error events happening when the slot is occupied are merged to the
> original report, and the only information keep for these extra errors is
> an accumulator counting the number of events, which is part of the
> record reported back to userspace.
>
> Reporting only the first event should be fine, since when a FS error
> happens, a cascade of error usually follows, but the most meaningful
> information is (usually) on the first erro.
>
> The event dequeueing is also a bit special to avoid losing events. Since
> event merging only happens while the event is queued, there is a window
> between when an error event is dequeued (notification_lock is dropped)
> until it is reset (.free_event()) where the slot is full, but no merges
> can happen.
>
> The proposed solution is to copy the event to the stack prior to
> dropping the lock. This way, if a new event arrives in the time between
> the event was dequeued and the time it resets, the new errors will still
> be logged and merged in the recently freed slot.
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@...labora.com>
>
> ---
> Changes since v5:
> - Copy to stack instead of replacing the fee slot(jan)
> - prepare error slot outside of the notification lock(jan)
> Changes since v4:
> - Split parts to earlier patches (amir)
> - Simplify fanotify entry replacement
> - Update handle size prediction on overflow
> Changes since v3:
> - Convert WARN_ON to pr_warn (amir)
> - Remove unecessary READ/WRITE_ONCE (amir)
> - Alloc with GFP_KERNEL_ACCOUNT(amir)
> - Simplify flags on mark allocation (amir)
> - Avoid atomic set of error_count (amir)
> - Simplify rules when merging error_event (amir)
> - Allocate new error_event on get_one_event (amir)
> - Report superblock error with invalid FH (amir,jan)
>
> Changes since v2:
> - Support and equire FID mode (amir)
> - Goto error path instead of early return (amir)
> - Simplify get_one_event (me)
> - Base merging on error_count
> - drop fanotify_queue_error_event
>
> Changes since v1:
> - Pass dentry to fanotify_check_fsid (Amir)
> - FANOTIFY_EVENT_TYPE_ERROR -> FANOTIFY_EVENT_TYPE_FS_ERROR
> - Merge previous patch into it
> - Use a single slot
> - Move fanotify_mark.error_event definition to this commit
> - Rename FAN_ERROR -> FAN_FS_ERROR
> - Restrict FAN_FS_ERROR to FAN_MARK_FILESYSTEM
> ---
> fs/notify/fanotify/fanotify.c | 57 +++++++++++++++++++++++++++++-
> fs/notify/fanotify/fanotify.h | 21 +++++++++++
> fs/notify/fanotify/fanotify_user.c | 39 ++++++++++++++++++--
> include/linux/fanotify.h | 6 +++-
> 4 files changed, 119 insertions(+), 4 deletions(-)
>
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index 3bf6fd85c634..0c7667d3f5d1 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -709,6 +709,55 @@ static __kernel_fsid_t fanotify_get_fsid(struct fsnotify_iter_info *iter_info)
> return fsid;
> }
>
> +static void fanotify_insert_error_event(struct fsnotify_group *group,
> + struct fsnotify_event *fsn_event)
> +
> +{
> + struct fanotify_event *event = FANOTIFY_E(fsn_event);
> +
> + if (!fanotify_is_error_event(event->mask))
> + return;
> +
> + /*
> + * Prevent the mark from going away while an outstanding error
> + * event is queued. The reference is released by
> + * fanotify_dequeue_first_event.
> + */
> + fsnotify_get_mark(&FANOTIFY_EE(event)->sb_mark->fsn_mark);
> +
> +}
> +
> +static int fanotify_handle_error_event(struct fsnotify_iter_info *iter_info,
> + struct fsnotify_group *group,
> + const struct fs_error_report *report)
> +{
> + struct fanotify_sb_mark *sb_mark =
> + FANOTIFY_SB_MARK(fsnotify_iter_sb_mark(iter_info));
> + struct fanotify_error_event *fee = sb_mark->fee_slot;
> +
> + spin_lock(&group->notification_lock);
> + if (fee->err_count++) {
> + spin_unlock(&group->notification_lock);
> + return 0;
> + }
Please add commentary to explain why logic is before merge()/insert().
> + spin_unlock(&group->notification_lock);
> +
> + fee->fae.type = FANOTIFY_EVENT_TYPE_FS_ERROR;
> +
> + if (fsnotify_insert_event(group, &fee->fae.fse,
> + NULL, fanotify_insert_error_event)) {
> + /*
> + * Even if an error occurred, an overflow event is
> + * queued. Just reset the error count and succeed.
> + */
> + spin_lock(&group->notification_lock);
> + fanotify_reset_error_slot(fee);
> + spin_unlock(&group->notification_lock);
This feels racy.
I think that fanotify_reset_error_slot() should WARN about
trying to reset a queued error event and here we need to
check that fee was not queued while we dropped the lock.
And I am not convinced about correctness of incrementing
err_count while the lock is dropped.
Need to see the commentary.
> + }
> +
> + return 0;
> +}
> +
> /*
> * Add an event to hash table for faster merge.
> */
> @@ -762,7 +811,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
> BUILD_BUG_ON(FAN_OPEN_EXEC_PERM != FS_OPEN_EXEC_PERM);
> BUILD_BUG_ON(FAN_FS_ERROR != FS_ERROR);
>
> - BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 19);
> + BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 20);
>
> mask = fanotify_group_event_mask(group, iter_info, mask, data,
> data_type, dir);
> @@ -787,6 +836,9 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
> return 0;
> }
>
> + if (fanotify_is_error_event(mask))
> + return fanotify_handle_error_event(iter_info, group, data);
> +
> event = fanotify_alloc_event(group, mask, data, data_type, dir,
> file_name, &fsid);
> ret = -ENOMEM;
> @@ -857,10 +909,13 @@ static void fanotify_free_name_event(struct fanotify_event *event)
>
> static void fanotify_free_error_event(struct fanotify_event *event)
> {
> + struct fanotify_error_event *fee = FANOTIFY_EE(event);
> +
> /*
> * The actual event is tied to a mark, and is released on mark
> * removal
> */
> + fsnotify_put_mark(&fee->sb_mark->fsn_mark);
> }
>
> static void fanotify_free_event(struct fsnotify_event *fsn_event)
> diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
> index 3f03333df32f..eeb4a85af74e 100644
> --- a/fs/notify/fanotify/fanotify.h
> +++ b/fs/notify/fanotify/fanotify.h
> @@ -220,6 +220,8 @@ FANOTIFY_NE(struct fanotify_event *event)
>
> struct fanotify_error_event {
> struct fanotify_event fae;
> + u32 err_count; /* Suppressed errors count */
> +
> struct fanotify_sb_mark *sb_mark; /* Back reference to the mark. */
> };
>
> @@ -320,6 +322,11 @@ static inline struct fanotify_event *FANOTIFY_E(struct fsnotify_event *fse)
> return container_of(fse, struct fanotify_event, fse);
> }
>
> +static inline bool fanotify_is_error_event(u32 mask)
> +{
> + return mask & FAN_FS_ERROR;
> +}
> +
> static inline bool fanotify_event_has_path(struct fanotify_event *event)
> {
> return event->type == FANOTIFY_EVENT_TYPE_PATH ||
> @@ -349,6 +356,7 @@ static inline struct path *fanotify_event_path(struct fanotify_event *event)
> static inline bool fanotify_is_hashed_event(u32 mask)
> {
> return !(fanotify_is_perm_event(mask) ||
> + fanotify_is_error_event(mask) ||
> fsnotify_is_overflow_event(mask));
> }
>
> @@ -358,3 +366,16 @@ static inline unsigned int fanotify_event_hash_bucket(
> {
> return event->hash & FANOTIFY_HTABLE_MASK;
> }
> +
> +/*
> + * Reset the FAN_FS_ERROR event slot
> + *
> + * This is used to restore the error event slot to a a zeroed state,
> + * where it can be used for a new incoming error. It does not
> + * initialize the event, but clear only the required data to free the
> + * slot.
> + */
> +static inline void fanotify_reset_error_slot(struct fanotify_error_event *fee)
> +{
> + fee->err_count = 0;
Makes sense that it should also zero the error field. No?
> +}
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index b77030386d7f..3fff0c994dc8 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -167,6 +167,19 @@ static void fanotify_unhash_event(struct fsnotify_group *group,
> hlist_del_init(&event->merge_list);
> }
>
> +static struct fanotify_event *fanotify_dup_error_to_stack(
> + struct fanotify_error_event *fee,
> + struct fanotify_error_event *error_on_stack)
> +{
> + fanotify_init_event(&error_on_stack->fae, 0, FS_ERROR);
> +
> + error_on_stack->fae.type = FANOTIFY_EVENT_TYPE_FS_ERROR;
> + error_on_stack->err_count = fee->err_count;
> + error_on_stack->sb_mark = fee->sb_mark;
> +
> + return &error_on_stack->fae;
> +}
> +
> /*
> * Get an fanotify notification event if one exists and is small
> * enough to fit in "count". Return an error pointer if the count
> @@ -174,7 +187,9 @@ static void fanotify_unhash_event(struct fsnotify_group *group,
> * updated accordingly.
> */
> static struct fanotify_event *get_one_event(struct fsnotify_group *group,
> - size_t count)
> + size_t count,
> + struct fanotify_error_event *error_on_stack)
> +
> {
> size_t event_size;
> struct fanotify_event *event = NULL;
> @@ -205,6 +220,16 @@ static struct fanotify_event *get_one_event(struct fsnotify_group *group,
> FANOTIFY_PERM(event)->state = FAN_EVENT_REPORTED;
> if (fanotify_is_hashed_event(event->mask))
> fanotify_unhash_event(group, event);
> +
> + if (fanotify_is_error_event(event->mask)) {
> + /*
> + * Error events are returned as a copy of the error
> + * slot. The actual error slot is reused.
> + */
> + fanotify_dup_error_to_stack(FANOTIFY_EE(event), error_on_stack);
> + fanotify_reset_error_slot(FANOTIFY_EE(event));
> + event = &error_on_stack->fae;
> + }
> out:
> spin_unlock(&group->notification_lock);
> return event;
> @@ -564,6 +589,7 @@ static __poll_t fanotify_poll(struct file *file, poll_table *wait)
> static ssize_t fanotify_read(struct file *file, char __user *buf,
> size_t count, loff_t *pos)
> {
> + struct fanotify_error_event error_on_stack;
> struct fsnotify_group *group;
> struct fanotify_event *event;
> char __user *start;
> @@ -582,7 +608,7 @@ static ssize_t fanotify_read(struct file *file, char __user *buf,
> * in case there are lots of available events.
> */
> cond_resched();
> - event = get_one_event(group, count);
> + event = get_one_event(group, count, &error_on_stack);
> if (IS_ERR(event)) {
> ret = PTR_ERR(event);
> break;
> @@ -1031,6 +1057,10 @@ static int fanotify_add_mark(struct fsnotify_group *group,
> fanotify_init_event(&fee->fae, 0, FS_ERROR);
> fee->sb_mark = sb_mark;
> sb_mark->fee_slot = fee;
> +
> + /* Mark the error slot ready to receive events. */
> + fanotify_reset_error_slot(fee);
> +
> }
> }
>
> @@ -1459,6 +1489,11 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
> fsid = &__fsid;
> }
>
> + if (mask & FAN_FS_ERROR && mark_type != FAN_MARK_FILESYSTEM) {
> + ret = -EINVAL;
> + goto path_put_and_out;
> + }
> +
Split to Wire-up patch please.
> /* inode held in place by reference to path; group by fget on fd */
> if (mark_type == FAN_MARK_INODE)
> inode = path.dentry->d_inode;
> diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
> index c05d45bde8b8..c4d49308b2d0 100644
> --- a/include/linux/fanotify.h
> +++ b/include/linux/fanotify.h
> @@ -88,9 +88,13 @@ extern struct ctl_table fanotify_table[]; /* for sysctl */
> #define FANOTIFY_INODE_EVENTS (FANOTIFY_DIRENT_EVENTS | \
> FAN_ATTRIB | FAN_MOVE_SELF | FAN_DELETE_SELF)
>
> +/* Events that can only be reported with data type FSNOTIFY_EVENT_ERROR */
> +#define FANOTIFY_ERROR_EVENTS (FAN_FS_ERROR)
> +
> /* Events that user can request to be notified on */
> #define FANOTIFY_EVENTS (FANOTIFY_PATH_EVENTS | \
> - FANOTIFY_INODE_EVENTS)
> + FANOTIFY_INODE_EVENTS | \
> + FANOTIFY_ERROR_EVENTS)
>
Split to Wire-up patch please.
Thanks,
Amir.
Powered by blists - more mailing lists