[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOQ4uxicopjm1LffaYrPa_d4AQROOgi5GDPHtbrJQ-Oh=yi8hQ@mail.gmail.com>
Date: Thu, 5 Aug 2021 16:50:40 +0300
From: Amir Goldstein <amir73il@...il.com>
To: Jan Kara <jack@...e.cz>
Cc: Gabriel Krisman Bertazi <krisman@...labora.com>,
Jan Kara <jack@...e.com>,
"Darrick J. Wong" <djwong@...nel.org>,
Theodore Tso <tytso@....edu>,
Dave Chinner <david@...morbit.com>,
David Howells <dhowells@...hat.com>,
Khazhismel Kumykov <khazhy@...gle.com>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
Ext4 <linux-ext4@...r.kernel.org>,
Linux API <linux-api@...r.kernel.org>, kernel@...labora.com
Subject: Re: [PATCH v5 18/23] fanotify: Handle FAN_FS_ERROR events
On Thu, Aug 5, 2021 at 3:15 PM Jan Kara <jack@...e.cz> wrote:
>
> On Wed 04-08-21 12:06:07, Gabriel Krisman Bertazi wrote:
> > Wire up FAN_FS_ERROR in the fanotify_mark syscall. The event can only
> > be requested for the entire filesystem, thus it requires the
> > FAN_MARK_FILESYSTEM.
> >
> > FAN_FS_ERROR has to be handled slightly differently from other events
> > because it needs to be submitted in an atomic context, using
> > preallocated memory. This patch implements the submission path by only
> > storing the first error event that happened in the slot (userspace
> > resets the slot by reading the event).
> >
> > Extra error events happening when the slot is occupied are merged to the
> > original report, and the only information keep for these extra errors is
> > an accumulator counting the number of events, which is part of the
> > record reported back to userspace.
> >
> > Reporting only the first event should be fine, since when a FS error
> > happens, a cascade of error usually follows, but the most meaningful
> > information is (usually) on the first erro.
> >
> > The event dequeueing is also a bit special to avoid losing events. Since
> > event merging only happens while the event is queued, there is a window
> > between when an error event is dequeued (notification_lock is dropped)
> > until it is reset (.free_event()) where the slot is full, but no merges
> > can happen.
> >
> > The proposed solution is to replace the event in the slot with a new
> > structure, prior to dropping the lock. This way, if a new event arrives
> > in the time between the event was dequeued and the time it resets, the
> > new errors will still be logged and merged in the new slot.
> >
> > Signed-off-by: Gabriel Krisman Bertazi <krisman@...labora.com>
>
> The splitting of the patches really helped. Now I think I can grok much
> more details than before :) Thanks! Some comments below.
>
> > diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> > index 0678d35432a7..4e9e271a4394 100644
> > --- a/fs/notify/fanotify/fanotify.c
> > +++ b/fs/notify/fanotify/fanotify.c
> > @@ -681,6 +681,42 @@ static __kernel_fsid_t fanotify_get_fsid(struct fsnotify_iter_info *iter_info)
> > return fsid;
> > }
> >
> > +static int fanotify_merge_error_event(struct fsnotify_group *group,
> > + struct fsnotify_event *event)
> > +{
> > + struct fanotify_event *fae = FANOTIFY_E(event);
> > + struct fanotify_error_event *fee = FANOTIFY_EE(fae);
> > +
> > + /*
> > + * When err_count > 0, the reporting slot is full. Just account
> > + * the additional error and abort the insertion.
> > + */
> > + if (fee->err_count) {
> > + fee->err_count++;
> > + return 1;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static void fanotify_insert_error_event(struct fsnotify_group *group,
> > + struct fsnotify_event *event,
> > + const void *data)
> > +{
> > + const struct fs_error_report *report = (struct fs_error_report *) data;
> > + struct fanotify_event *fae = FANOTIFY_E(event);
> > + struct fanotify_error_event *fee;
> > +
> > + /* This might be an unexpected type of event (i.e. overflow). */
> > + if (!fanotify_is_error_event(fae->mask))
> > + return;
> > +
> > + fee = FANOTIFY_EE(fae);
> > + fee->fae.type = FANOTIFY_EVENT_TYPE_FS_ERROR;
> > + fee->error = report->error;
> > + fee->err_count = 1;
> > +}
> > +
> > /*
> > * Add an event to hash table for faster merge.
> > */
> > @@ -735,7 +771,7 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
> > BUILD_BUG_ON(FAN_OPEN_EXEC_PERM != FS_OPEN_EXEC_PERM);
> > BUILD_BUG_ON(FAN_FS_ERROR != FS_ERROR);
> >
> > - BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 19);
> > + BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 20);
> >
> > mask = fanotify_group_event_mask(group, iter_info, mask, data,
> > data_type, dir);
> > @@ -760,6 +796,18 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask,
> > return 0;
> > }
> >
> > + if (fanotify_is_error_event(mask)) {
> > + struct fanotify_sb_mark *sb_mark =
> > + FANOTIFY_SB_MARK(fsnotify_iter_sb_mark(iter_info));
> > +
> > + ret = fsnotify_insert_event(group,
> > + &sb_mark->fee_slot->fae.fse,
> > + fanotify_merge_error_event,
> > + fanotify_insert_error_event,
> > + data);
> > + goto finish;
> > + }
>
> Hum, seeing this and how you had to extend fsnotify_add_event() to
> accommodate this use, cannot we instead have something like:
>
> if (fanotify_is_error_event(mask)) {
> struct fanotify_sb_mark *sb_mark =
> FANOTIFY_SB_MARK(fsnotify_iter_sb_mark(iter_info));
> struct fanotify_error_event *event = &sb_mark->fee_slot;
> bool queue = false;
>
> spin_lock(&group->notification_lock);
> /* Not yet queued? */
> if (!event->err_count) {
> fee->error = report->error;
> queue = true;
> }
> event->err_count++;
> spin_unlock(&group->notification_lock);
> if (queue) {
> ... fill in other error info in 'event' such as fhandle
> fsnotify_add_event(group, &event->fae.fse, NULL);
> }
> }
>
> It would be IMHO simpler to follow what's going on and we don't have to
> touch fsnotify_add_event(). I do recognize that due to races it may happen
> that some racing fsnotify(FAN_FS_ERROR) call returns before the event is
> actually visible in the event queue. It don't think it really matters but
> if we wanted to be more careful, we would need to preformat fhandle into a
> local buffer and only copy it into the event under notification_lock when
> we see the event is unused.
>
> > +/*
> > + * Replace a mark's error event with a new structure in preparation for
> > + * it to be dequeued. This is a bit annoying since we need to drop the
> > + * lock, so another thread might just steal the event from us.
> > + */
> > +static int fanotify_replace_fs_error_event(struct fsnotify_group *group,
> > + struct fanotify_event *fae)
> > +{
> > + struct fanotify_error_event *new, *fee = FANOTIFY_EE(fae);
> > + struct fanotify_sb_mark *sb_mark = fee->sb_mark;
> > + struct fsnotify_event *fse;
> > +
> > + pr_debug("%s: event=%p\n", __func__, fae);
> > +
> > + assert_spin_locked(&group->notification_lock);
> > +
> > + spin_unlock(&group->notification_lock);
> > + new = fanotify_alloc_error_event(sb_mark);
> > + spin_lock(&group->notification_lock);
> > +
> > + if (!new)
> > + return -ENOMEM;
> > +
> > + /*
> > + * Since we temporarily dropped the notification_lock, the event
> > + * might have been taken from under us and reported by another
> > + * reader. If that is the case, don't play games, just retry.
> > + */
> > + fse = fsnotify_peek_first_event(group);
> > + if (fse != &fae->fse) {
> > + kfree(new);
> > + return -EAGAIN;
> > + }
> > +
> > + sb_mark->fee_slot = new;
> > +
> > + return 0;
> > +}
> > +
> > /*
> > * Get an fanotify notification event if one exists and is small
> > * enough to fit in "count". Return an error pointer if the count
> > @@ -212,9 +252,21 @@ static struct fanotify_event *get_one_event(struct fsnotify_group *group,
> > goto out;
> > }
> >
> > + if (fanotify_is_error_event(event->mask)) {
> > + /*
> > + * Replace the error event ahead of dequeueing so we
> > + * don't need to handle a incorrectly dequeued event.
> > + */
> > + ret = fanotify_replace_fs_error_event(group, event);
> > + if (ret) {
> > + event = ERR_PTR(ret);
> > + goto out;
> > + }
> > + }
> > +
>
> The replacing, retry, and all is hairy. Cannot we just keep the same event
> attached to the sb mark and copy-out to on-stack buffer under
> notification_lock in get_one_event()? The event is big (due to fhandle) but
> fanotify_read() is not called from a deep call chain so we should have
> enough space on stack for that.
>
For the record, this was one of the first implementations from Gabriel.
When I proposed the double buffer implementation it was either that
or go back to copy to stack.
Given the complications, I agree that going back to copy to stack
is preferred.
Thanks,
Amir.
Powered by blists - more mailing lists