Message-ID: <20250710083236.V8WA6EFF@linutronix.de>
Date: Thu, 10 Jul 2025 10:32:36 +0200
From: Nam Cao <namcao@...utronix.de>
To: Xi Ruoyao <xry111@...111.site>
Cc: Christian Brauner <brauner@...nel.org>,
	Frederic Weisbecker <frederic@...nel.org>,
	Valentin Schneider <vschneid@...hat.com>,
	Alexander Viro <viro@...iv.linux.org.uk>, Jan Kara <jack@...e.cz>,
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
	John Ogness <john.ogness@...utronix.de>,
	Clark Williams <clrkwllms@...nel.org>,
	Steven Rostedt <rostedt@...dmis.org>, linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org, linux-rt-devel@...ts.linux.dev,
	linux-rt-users@...r.kernel.org, Joe Damato <jdamato@...tly.com>,
	Martin Karsten <mkarsten@...terloo.ca>,
	Jens Axboe <axboe@...nel.dk>
Subject: Re: [PATCH v3] eventpoll: Fix priority inversion problem

On Thu, Jul 10, 2025 at 02:54:06PM +0800, Xi Ruoyao wrote:
> On Thu, 2025-07-10 at 08:21 +0200, Nam Cao wrote:
> > I am curious if Gnome is using some epoll options which are unused on my
> > system.
> 
> > I presume you can still access dmesg despite the freeze. Would you mind
> > running the patch below and letting me know what's in your dmesg? It may
> > help identify that code path.
> 
> Attached the system journal (dmesg was truncated due to too many lines).
> I guess the relevant part should be between line 6947 ("New session 2 of
> user xry111") and line 8022 ("start operation timed out. Terminating").

Thanks! I have an idea.

Looking at the boot log you sent, I noticed a time gap immediately after
EPOLL_CTL_DEL.

So I looked at EPOLL_CTL_DEL again, and noticed something that could
explain your timeout issue:

  1. EPOLL_CTL_DEL may need to temporarily remove the entire event list.

  2. While the above is happening, another task may do epoll_wait(). It sees
     nothing in the event list, and goes to sleep.

  3. EPOLL_CTL_DEL is now finished and puts the items back into the event
     list. However, the task from (2.) is not woken up, so it keeps
     sleeping even though events are available.

If this is really what is causing your problem, the patch below should fix it:


diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 895256cd2786..a8fb8ec51751 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -813,8 +813,13 @@ static bool __ep_remove(struct eventpoll *ep, struct epitem *epi, bool force)
 				put_back_last = n;
 			__llist_add(n, &put_back);
 		}
-		if (put_back_last)
+		if (put_back_last) {
 			llist_add_batch(put_back.first, put_back_last, &ep->rdllist);
+
+			/* borrow the memory barrier from llist_add_batch() */
+			if (waitqueue_active(&ep->wq))
+				wake_up(&ep->wq);
+		}
 	}
 
 	wakeup_source_unregister(ep_wakeup_source(epi));
