Message-ID: <20250710083236.V8WA6EFF@linutronix.de>
Date: Thu, 10 Jul 2025 10:32:36 +0200
From: Nam Cao <namcao@...utronix.de>
To: Xi Ruoyao <xry111@...111.site>
Cc: Christian Brauner <brauner@...nel.org>,
Frederic Weisbecker <frederic@...nel.org>,
Valentin Schneider <vschneid@...hat.com>,
Alexander Viro <viro@...iv.linux.org.uk>, Jan Kara <jack@...e.cz>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
John Ogness <john.ogness@...utronix.de>,
Clark Williams <clrkwllms@...nel.org>,
Steven Rostedt <rostedt@...dmis.org>, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-rt-devel@...ts.linux.dev,
linux-rt-users@...r.kernel.org, Joe Damato <jdamato@...tly.com>,
Martin Karsten <mkarsten@...terloo.ca>,
Jens Axboe <axboe@...nel.dk>
Subject: Re: [PATCH v3] eventpoll: Fix priority inversion problem
On Thu, Jul 10, 2025 at 02:54:06PM +0800, Xi Ruoyao wrote:
> On Thu, 2025-07-10 at 08:21 +0200, Nam Cao wrote:
> > I am curious if Gnome is using some epoll options which are unused on my
> > system.
>
> > I presume you can still access dmesg despite the freeze. Do you mind
> > running the below patch, let me know what's in your dmesg? It may help
> > identifying that code path.
>
> Attached the system journal (dmesg was truncated due to too many lines).
> I guess the relevant part should be between line 6947 ("New session 2 of
> user xry111") and line 8022 ("start operation timed out. Terminating").
Thanks! I have an idea.
Looking at the boot log you sent, I noticed a time gap immediately after
EPOLL_CTL_DEL.
So I looked at EPOLL_CTL_DEL again and noticed something that could
explain your timeout issue:
1. EPOLL_CTL_DEL may need to temporarily remove all items from the event
   list.
2. While that is happening, another task may call epoll_wait(). It sees
   nothing in the event list and goes to sleep.
3. EPOLL_CTL_DEL finishes and puts the items back into the event list.
   However, the task from (2.) is not woken up, so it keeps sleeping even
   though events are available.
If this is really what is causing your problem, the patch below should fix it:
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 895256cd2786..a8fb8ec51751 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -813,8 +813,13 @@ static bool __ep_remove(struct eventpoll *ep, struct epitem *epi, bool force)
 				put_back_last = n;
 			__llist_add(n, &put_back);
 		}
-		if (put_back_last)
+		if (put_back_last) {
 			llist_add_batch(put_back.first, put_back_last, &ep->rdllist);
+
+			/* borrow the memory barrier from llist_add_batch() */
+			if (waitqueue_active(&ep->wq))
+				wake_up(&ep->wq);
+		}
 	}
 
 	wakeup_source_unregister(ep_wakeup_source(epi));
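To make the suspected interleaving easier to follow, here is a rough user-space sketch that replays steps 1-3 sequentially. It is only a toy model and not kernel code: struct toy_ep, toy_wait(), toy_wake() and toy_replay() are names made up for illustration, the ready list is reduced to a counter, and real concurrency and memory ordering are not modelled at all.

```c
#include <stdbool.h>

/* Toy model, NOT kernel code: every name here is made up for illustration. */
struct toy_ep {
	int ready;	/* number of items on the ready list */
	bool sleeping;	/* a waiter went to sleep on the wait queue */
};

/* Step 2: a waiter checks the ready list and sleeps if it looks empty. */
static void toy_wait(struct toy_ep *ep)
{
	if (ep->ready == 0)
		ep->sleeping = true;
}

/* Wake the waiter only if one is asleep (loosely like waitqueue_active()). */
static void toy_wake(struct toy_ep *ep)
{
	if (ep->sleeping)
		ep->sleeping = false;
}

/*
 * Replay steps 1-3 in order. Returns true if the waiter is left asleep
 * while items are sitting on the ready list, i.e. the wakeup was lost.
 */
static bool toy_replay(bool wake_after_put_back)
{
	struct toy_ep ep = { .ready = 2, .sleeping = false };
	int stash;

	/* 1. the removal path temporarily takes everything off the list */
	stash = ep.ready;
	ep.ready = 0;

	/* 2. a concurrent waiter sees an empty list and goes to sleep */
	toy_wait(&ep);

	/* 3. the remaining items are put back (one of them was deleted) */
	ep.ready = stash - 1;
	if (wake_after_put_back)
		toy_wake(&ep);

	return ep.sleeping && ep.ready > 0;
}
```

In this model, toy_replay(false) returns true (the waiter is stuck with an item pending) and toy_replay(true) returns false, which is the behaviour the added wake_up() is meant to give.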