linux-kernel - Re: soft lockup in fanotify

Message-ID: <CAOQ4uxjLaGyOUd5GOV8oHwBY=nGGtgk4=5bRxmHTr5VsocrhiA@mail.gmail.com>
Date:   Tue, 14 Jul 2020 16:10:33 +0300
From:   Amir Goldstein <amir73il@...il.com>
To:     Francesco Ruggeri <fruggeri@...sta.com>
Cc:     linux-kernel <linux-kernel@...r.kernel.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        Jan Kara <jack@...e.cz>
Subject: Re: soft lockup in fanotify_read

On Tue, Jul 14, 2020 at 5:54 AM Francesco Ruggeri <fruggeri@...sta.com> wrote:
>
> We are getting this soft lockup in fanotify_read.
> The reason is that this code does not seem to scale to cases where there
> are big bursts of events generated by fanotify_handle_event.
> fanotify_read acquires group->notification_lock for each event.
> fanotify_handle_event uses the lock to add one event, which also involves
> fanotify_merge, which scans the whole list trying to find an event to
> merge the new one with.

Yes, that is a terribly inefficient merge algorithm.
If it helps, I am carrying a quick brown-paper-bag fix for this issue in
my tree:

@@ -65,6 +74,8 @@ static int fanotify_merge(struct list_head *list, struct fsnotify_event *event)
 {
        struct fsnotify_event *test_event;
        struct fanotify_event *new;
+       int limit = 128;
+       int i = 0;

        pr_debug("%s: list=%p event=%p\n", __func__, list, event);
        new = FANOTIFY_E(event);

@@ -78,6 +89,9 @@ static int fanotify_merge(struct list_head *list, struct fsnotify_event *event)
                return 0;

        list_for_each_entry_reverse(test_event, list, list) {
+               /* Event merges are expensive so should be limited */
+               if (++i > limit)
+                       break;
                if (should_merge(test_event, event)) {
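
For readers without the kernel tree at hand, here is a minimal userspace
sketch of the same idea: scan the queue from the newest event backwards and
give up on merging after a fixed number of candidates. The names
demo_event, demo_queue, demo_queue_add and DEMO_MERGE_LIMIT are
hypothetical, and "same object_id" is only an assumed stand-in for whatever
should_merge() really compares. This is not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a queued event (not the kernel struct). */
struct demo_event {
	unsigned long object_id;  /* assumed merge key */
	unsigned int mask;        /* OR of event type bits */
	struct demo_event *prev;  /* link towards older events */
};

struct demo_queue {
	struct demo_event *tail;  /* newest queued event */
	size_t len;
};

#define DEMO_MERGE_LIMIT 128      /* same cap as in the patch above */

/*
 * Try to merge ev into one of the newest DEMO_MERGE_LIMIT queued events.
 * Returns 1 if merged (the caller may drop ev), 0 if ev was appended.
 */
static int demo_queue_add(struct demo_queue *q, struct demo_event *ev)
{
	struct demo_event *it;
	int i = 0;

	assert(q && ev);
	for (it = q->tail; it; it = it->prev) {
		/* Merges are expensive, so stop looking after the cap. */
		if (++i > DEMO_MERGE_LIMIT)
			break;
		if (it->object_id == ev->object_id) {
			it->mask |= ev->mask;
			return 1;
		}
	}
	ev->prev = q->tail;
	q->tail = ev;
	q->len++;
	return 0;
}
```

The cost per added event is bounded by the cap instead of growing with the
queue length. The price is that an event whose merge candidate is older
than the newest 128 entries gets queued as a duplicate.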

It's somewhere down my TODO list to fix this properly with a hash table.
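
A rough userspace sketch of what a hash-based merge lookup could look like
follows. It is an illustration only, not the eventual kernel fix. The names
demo_hevent, demo_hqueue, demo_hash and demo_hqueue_add are hypothetical,
the merge key is assumed to be a single object_id, and the sketch leaves
out the FIFO list and the unlinking from the bucket that a real queue
would need when an event is read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_HASH_BITS 7
#define DEMO_HASH_SIZE (1u << DEMO_HASH_BITS)

/* Hypothetical event carrying only what the merge lookup needs. */
struct demo_hevent {
	unsigned long object_id;    /* assumed merge key */
	unsigned int mask;          /* OR of event type bits */
	struct demo_hevent *hnext;  /* next event in the same bucket */
};

struct demo_hqueue {
	struct demo_hevent *bucket[DEMO_HASH_SIZE];
	size_t len;
};

/* Multiplicative hash folded down to DEMO_HASH_BITS bits. */
static unsigned int demo_hash(unsigned long id)
{
	uint32_t h = (uint32_t)id * 2654435761u;

	return h >> (32 - DEMO_HASH_BITS);
}

/*
 * Look for a merge candidate only in the bucket that ev hashes to.
 * Returns 1 if merged, 0 if ev was added to the table.
 */
static int demo_hqueue_add(struct demo_hqueue *q, struct demo_hevent *ev)
{
	struct demo_hevent **head, *it;

	assert(q && ev);
	head = &q->bucket[demo_hash(ev->object_id)];
	for (it = *head; it; it = it->hnext) {
		if (it->object_id == ev->object_id) {
			it->mask |= ev->mask;
			return 1;
		}
	}
	ev->hnext = *head;
	*head = ev;
	q->len++;
	return 0;
}
```

With a reasonable spread of keys, each add only walks one bucket chain, so
the scan length is roughly the queue length divided by the number of
buckets rather than the whole queue.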

> In our case fanotify_read is invoked with a buffer big enough for 200
> events, and what happens is that every time fanotify_read dequeues an
> event and releases the lock, fanotify_handle_event adds several more,
> scanning a longer and longer list. This causes fanotify_read to wait
> longer and longer for the lock, and the soft lockup happens before
> fanotify_read can reach 200 events.
> Is it intentional for fanotify_read to acquire the lock for each event,
> rather than batching together a user buffer worth of events?

I think it is meant to allow multiple reader threads to read events
with fairness, but I am not sure.

Even if it were fine to read a batch of events on every spinlock acquire,
making the code in the fanotify_read() loop behave well when an event hits
an error after a bunch of good events have been read looks challenging,
but I didn't try. Anyway, the root cause of the issue seems to be the
inefficient merge and not the spinlock taken once per event read.
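
To make the batching question concrete, here is a small userspace sketch of
detaching several events under one lock acquisition and putting the
unconsumed remainder back after a failure. It is an assumption-laden
illustration, not fanotify code: demo_fifo, demo_node, demo_pop_batch and
demo_unpop are hypothetical names, and a C11 atomic_flag spinlock stands in
for group->notification_lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct demo_node {
	int id;
	struct demo_node *next;
};

struct demo_fifo {
	atomic_flag lock;               /* toy spinlock, hypothetical */
	struct demo_node *head, *tail;
};

static void demo_lock(struct demo_fifo *q)
{
	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		; /* spin until the flag was previously clear */
}

static void demo_unlock(struct demo_fifo *q)
{
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Append one event at the tail. */
static void demo_push(struct demo_fifo *q, struct demo_node *n)
{
	n->next = NULL;
	demo_lock(q);
	if (q->tail)
		q->tail->next = n;
	else
		q->head = n;
	q->tail = n;
	demo_unlock(q);
}

/* Detach up to max events from the head in a single critical section. */
static struct demo_node *demo_pop_batch(struct demo_fifo *q, size_t max)
{
	struct demo_node *first, *last;
	size_t n;

	if (max == 0)
		return NULL;
	demo_lock(q);
	first = last = q->head;
	for (n = 1; last && n < max && last->next; n++)
		last = last->next;
	if (last) {
		q->head = last->next;
		if (!q->head)
			q->tail = NULL;
		last->next = NULL;
	}
	demo_unlock(q);
	return first;
}

/* Put an unconsumed chain back at the head, e.g. after a copy error. */
static void demo_unpop(struct demo_fifo *q, struct demo_node *chain)
{
	struct demo_node *last = chain;

	if (!chain)
		return;
	while (last->next)
		last = last->next;
	demo_lock(q);
	last->next = q->head;
	q->head = chain;
	if (!q->tail)
		q->tail = last;
	demo_unlock(q);
}
```

The put-back step is where it gets awkward. Between demo_pop_batch() and
demo_unpop() another reader may already have delivered newer events, so the
requeued ones are no longer read in order relative to those.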

Thanks,
Amir.