Message-ID: <CAHk-=whniDdvSeCEQD5aUzr79HnhZ=A+ftzr0p_mY+n_f0AMHg@mail.gmail.com>
Date: Mon, 14 Jul 2025 09:16:56 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Christian Brauner <brauner@...nel.org>
Cc: Nam Cao <namcao@...utronix.de>, Xi Ruoyao <xry111@...111.site>,
Frederic Weisbecker <frederic@...nel.org>, Valentin Schneider <vschneid@...hat.com>,
Alexander Viro <viro@...iv.linux.org.uk>, Jan Kara <jack@...e.cz>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>, John Ogness <john.ogness@...utronix.de>,
Clark Williams <clrkwllms@...nel.org>, Steven Rostedt <rostedt@...dmis.org>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-rt-devel@...ts.linux.dev, linux-rt-users@...r.kernel.org,
Joe Damato <jdamato@...tly.com>, Martin Karsten <mkarsten@...terloo.ca>, Jens Axboe <axboe@...nel.dk>
Subject: Re: [PATCH v3] eventpoll: Fix priority inversion problem
On Mon, 14 Jul 2025 at 02:00, Christian Brauner <brauner@...nel.org> wrote:
>
> I was on the fence myself and I juggled the commit between vfs.fixes and
> vfs-6.17.misc because I wasn't sure whether we should consider such
> priority inversion fix something that's urgent or not.
Well, this time it actually helped that it didn't come in through the
merge window, because it made the bisection much shorter.
But in general, I do think that eventpoll should be considered to be
something that needs to die, rather than something that needs to be
improved upon. It's horrendous.
The indirections it does have been huge problems, even if they are
"powerful", because we've had lots of issues with recursion and loops,
which are all bad for reference counting - and not using reference
counting for lifetimes is just fundamentally a design bug.
For example, the vfs file close thing has a special
"eventpoll_release()" thing just because epoll can't use file
references for the references it holds (because that would just cause
recursive refs), and dammit, that's just the result of a fundamental
mis-design. And this is all after all the years of fixing outright
bugs (with hidden ones still lurking - unusually we had *another*
long-standing epoll bug fixed last week)
(Don't get me wrong: unix domain fd passing has caused all these
problems and more, so it's not like epoll is the *only* thing that
causes these kinds of horrendous issues, but unix domain fd passing
was something we did due to external reasons, not some self-inflicted
pain)
So this is just a heads-up that I will *NOT* be taking any epoll
patches AT ALL unless they are

 (a) obvious bug fixes

 (b) clearly separated into well-explained series with each individual
     patch simple and obvious.
Because it was really a mistake to take that big epoll patch. That was
not a "small and obvious" fix to a big bug. That was literally a
"makes things worse" thing.
I didn't react very much to that patch because epoll has been fairly
calm for the last decade, and I had forgotten how much of a pain it
could be. So I was "whatever".
But this all re-awakened my "epoll is horrendous" memories.
Nam - please disregard performance as a primary thing in epoll. The
*only* thing that matters is "make it simpler, fix bugs".
Because long-term, epoll needs to die, or at least be seen as a legacy
interface that should be cut down, not something to be improved upon.
And yes, I hate epoll. It has caused *so* many problems over the
years. And it causes problems *outside* of epoll, i.e. we have that
horrendous pipe hackery:

         * Epoll nonsensically wants a wakeup whether the pipe
         * was already empty or not.

and the pipe code has that "poll_usage" flag just to deal with the
fallout of epoll's behavior.
THAT was fun too.
Not.
Linus
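
The pipe wakeup behavior quoted above can be probed from user space. What follows is only a minimal sketch, not code from the kernel or from this thread: the helper name pipe_edge_wakeups() is hypothetical, and the only calls it relies on are the standard pipe(), epoll_create1(), epoll_ctl(), epoll_wait(), write() and close(). It registers the read end of a pipe edge-triggered, writes twice without reading in between, and records how many events epoll_wait() reports after each write. The first count should be 1. The second count is the interesting one and probably depends on the kernel version, so the sketch does not assume a value for it.

```c
#define _GNU_SOURCE
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the kernel or the email): register the
 * read end of a fresh pipe with EPOLLET, write one byte twice without
 * reading, and store the epoll_wait() result seen after each write.
 * Returns 0 on success, -1 if any setup step or write fails.
 */
int pipe_edge_wakeups(int *after_first, int *after_second)
{
	int fds[2];
	int ep;
	int rc = -1;
	struct epoll_event ev = { .events = EPOLLIN | EPOLLET };
	struct epoll_event got;

	if (pipe(fds) < 0)
		return -1;

	ep = epoll_create1(0);
	if (ep < 0)
		goto out_pipe;

	ev.data.fd = fds[0];
	if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) < 0)
		goto out_ep;

	/* First write: the pipe goes from empty to non-empty. */
	if (write(fds[1], "a", 1) != 1)
		goto out_ep;
	*after_first = epoll_wait(ep, &got, 1, 0);

	/* Second write: the pipe was already non-empty, nothing was read. */
	if (write(fds[1], "b", 1) != 1)
		goto out_ep;
	*after_second = epoll_wait(ep, &got, 1, 0);

	rc = 0;
out_ep:
	close(ep);
out_pipe:
	close(fds[0]);
	close(fds[1]);
	return rc;
}
```

Calling the helper from a small main() and printing both counts shows whether a given kernel delivers the second edge-triggered event even though the pipe was never drained, which is the kind of wakeup the quoted comment is complaining about.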