Message-ID: <20250710062127.QnaeZ8c7@linutronix.de>
Date: Thu, 10 Jul 2025 08:21:27 +0200
From: Nam Cao <namcao@...utronix.de>
To: Xi Ruoyao <xry111@...111.site>
Cc: Christian Brauner <brauner@...nel.org>,
Frederic Weisbecker <frederic@...nel.org>,
Valentin Schneider <vschneid@...hat.com>,
Alexander Viro <viro@...iv.linux.org.uk>, Jan Kara <jack@...e.cz>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
John Ogness <john.ogness@...utronix.de>,
Clark Williams <clrkwllms@...nel.org>,
Steven Rostedt <rostedt@...dmis.org>, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-rt-devel@...ts.linux.dev,
linux-rt-users@...r.kernel.org, Joe Damato <jdamato@...tly.com>,
Martin Karsten <mkarsten@...terloo.ca>,
Jens Axboe <axboe@...nel.dk>
Subject: Re: [PATCH v3] eventpoll: Fix priority inversion problem

On Thu, Jul 10, 2025 at 11:08:18AM +0800, Xi Ruoyao wrote:
> After upgrading my kernel to the recent mainline I've encountered some
> stability issues, like:
>
> - When GDM started gnome-shell, the screen often froze and the only
> thing I could do was to switch into a VT and reboot.
> - Sometimes gnome-shell started "fine" but then starting an application
> (like gnome-console) needed to wait for about a minute.
> - Sometimes the system shutdown process hangs waiting for a service to
> stop.
> - Rarely the system boot process hangs for no obvious reason.
>
> Most strangely, in all these cases there is nothing alarming in dmesg or
> the system journal.
>
> I'm unsure if this is the culprit but I'm almost sure it's the trigger.
> Maybe there's some race condition in my userspace that the priority
> inversion had happened to hide... but anyway reverting this patch
> seemed to "fix" the issue.
>
> Any thoughts or pointers to diagnose further?

I have been running this new epoll implementation on my work machine for
weeks now without issue, while you seem to reproduce the problem reliably.
I'm guessing that the problem is on some code path which is dead on my
system but executed on yours.

I am curious whether GNOME is using some epoll options which are unused on
my system.

I presume you can still access dmesg despite the freeze. Do you mind
applying the patch below and letting me know what's in your dmesg? It may
help identify that code path.

Best regards,
Nam

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 895256cd2786..e3dafc48a59a 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -532,6 +532,9 @@ static long ep_eventpoll_bp_ioctl(struct file *file, unsigned int cmd,
 		WRITE_ONCE(ep->busy_poll_usecs, epoll_params.busy_poll_usecs);
 		WRITE_ONCE(ep->busy_poll_budget, epoll_params.busy_poll_budget);
 		WRITE_ONCE(ep->prefer_busy_poll, epoll_params.prefer_busy_poll);
+		printk("%s busy_poll_usecs=%d busy_poll_budget=%d prefer_busy_poll=%d\n",
+		       __func__, epoll_params.busy_poll_usecs, epoll_params.busy_poll_budget,
+		       epoll_params.prefer_busy_poll);
 		return 0;
 	case EPIOCGPARAMS:
 		memset(&epoll_params, 0, sizeof(epoll_params));
@@ -2120,6 +2123,9 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
 	struct epitem *epi;
 	struct eventpoll *tep = NULL;
 
+	printk("%s: epfd=%d op=%d fd=%d events=0x%x data=0x%llx nonblock=%d\n",
+	       __func__, epfd, op, fd, epds->events, epds->data, nonblock);
+
 	CLASS(fd, f)(epfd);
 	if (fd_empty(f))
 		return -EBADF;
diff --git a/io_uring/epoll.c b/io_uring/epoll.c
index 8d4610246ba0..e9c33c0c8cc5 100644
--- a/io_uring/epoll.c
+++ b/io_uring/epoll.c
@@ -54,6 +54,8 @@ int io_epoll_ctl(struct io_kiocb *req, unsigned int issue_flags)
 	int ret;
 	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
 
+	printk("%s flags=0x%x\n", __func__, issue_flags);
+
 	ret = do_epoll_ctl(ie->epfd, ie->op, ie->fd, &ie->event, force_nonblock);
 	if (force_nonblock && ret == -EAGAIN)
 		return -EAGAIN;
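
The do_epoll_ctl() printk in the patch above reports op as a number and events as a hex mask, which is awkward to read in dmesg. The following userspace helper is a minimal sketch for translating those values back into flag names; it is not part of the patch. The names decode_epoll_events() and decode_epoll_op() are hypothetical, and the flag values are taken from <sys/epoll.h>. Note that for EPOLL_CTL_DEL the kernel does not, as far as I can tell, fill in the event structure, so the events value logged for that op is probably not meaningful.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>

/* Older headers may lack EPOLLEXCLUSIVE; fall back to the UAPI value. */
#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)
#endif

struct flag_name {
	uint32_t bit;
	const char *name;
};

static const struct flag_name epoll_flags[] = {
	{ EPOLLIN, "EPOLLIN" },
	{ EPOLLPRI, "EPOLLPRI" },
	{ EPOLLOUT, "EPOLLOUT" },
	{ EPOLLERR, "EPOLLERR" },
	{ EPOLLHUP, "EPOLLHUP" },
	{ EPOLLRDHUP, "EPOLLRDHUP" },
	{ EPOLLEXCLUSIVE, "EPOLLEXCLUSIVE" },
	{ EPOLLWAKEUP, "EPOLLWAKEUP" },
	{ EPOLLONESHOT, "EPOLLONESHOT" },
	{ EPOLLET, "EPOLLET" },
};

/*
 * Write the set bits of 'events' into 'out' as "A|B|...".
 * Bits not in the table are appended in hex; an empty mask yields "0".
 */
static void decode_epoll_events(uint32_t events, char *out, size_t len)
{
	size_t used = 0;

	assert(out != NULL && len > 0);
	out[0] = '\0';

	for (size_t i = 0; i < sizeof(epoll_flags) / sizeof(epoll_flags[0]); i++) {
		if (!(events & epoll_flags[i].bit))
			continue;
		int n = snprintf(out + used, len - used, "%s%s",
				 used ? "|" : "", epoll_flags[i].name);
		if (n < 0 || (size_t)n >= len - used)
			return; /* output truncated, stop here */
		used += (size_t)n;
		events &= ~epoll_flags[i].bit;
	}

	if (events)
		snprintf(out + used, len - used, "%s0x%x",
			 used ? "|" : "", (unsigned int)events);
	else if (used == 0)
		snprintf(out, len, "0");
}

/* Map the 'op' number from the log line to its EPOLL_CTL_* name. */
static const char *decode_epoll_op(int op)
{
	switch (op) {
	case EPOLL_CTL_ADD: return "EPOLL_CTL_ADD";
	case EPOLL_CTL_DEL: return "EPOLL_CTL_DEL";
	case EPOLL_CTL_MOD: return "EPOLL_CTL_MOD";
	default:            return "unknown";
	}
}
```

To use it, call decode_epoll_op() and decode_epoll_events() from a small main() with the op= and events= values copied from a dmesg line. Flags such as EPOLLET, EPOLLONESHOT or EPOLLEXCLUSIVE showing up only on the affected system would be one hint about which code path differs.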