linux-kernel - Re: [PATCH 2/2] eventpoll: Fix epoll

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <87zfat19i7.fsf@yellow.woof>
Date: Wed, 17 Sep 2025 15:41:52 +0200
From: Nam Cao <namcao@...utronix.de>
To: Mateusz Guzik <mjguzik@...il.com>
Cc: Alexander Viro <viro@...iv.linux.org.uk>, Christian Brauner
 <brauner@...nel.org>, Jan Kara <jack@...e.cz>, Shuah Khan
 <shuah@...nel.org>, Davidlohr Bueso <dave@...olabs.net>, Soheil Hassas
 Yeganeh <soheil@...gle.com>, Khazhismel Kumykov <khazhy@...gle.com>,
 Willem de Bruijn <willemb@...gle.com>, Eric Dumazet <edumazet@...gle.com>,
 Jens Axboe <axboe@...nel.dk>, linux-fsdevel@...r.kernel.org,
 linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org,
 stable@...r.kernel.org
Subject: Re: [PATCH 2/2] eventpoll: Fix epoll_wait() report false negative

Mateusz Guzik <mjguzik@...il.com> writes:
> I'll say upfront I'm not an epoll person, just looked here out of
> curiosity.

Feedbacks always welcomed.

> I can agree there is a bug. The event is generated before any of the
> threads even exist and they only poll for it, none of them consume it.
>
> However, the commit message fails to explain why the change fixes
> anything and I think your patch de facto reverts e59d3c64cba6 ("epoll:
> eliminate unnecessary lock for zero timeout"). Looking at that diff
> the point was to avoid the expensive lock trip if timeout == 0 and there
> are no events.
>
> As for the bug is, from my reading the ep_start_scan()/ep_done_scan()
> pair transiently disturbs the state checked by ep_events_available(),
> resulting in false-negatives. Then the locked check works because by the
> time you acquire it, the damage is undone.

Exactly so. I can add this into the description.

> Given the commits referenced in Fixes:, I suspect the real fix would be
> to stop destroying that state of course.
>
> But if that's not feasible, I would check if a sequence counter around
> this would do the trick -- then the racy ep_events_available(ep) upfront
> would become safe with smaller overhead than with your proposal for the
> no-event case, but with higher overhead when there is something.
>
> My proposal is trivial to implement, I have no idea if it will get a
> buy-in though.

My question is whether the performance of epoll_wait() with zero
timeout is really that important that we have to complicate
things. If epoll_wait() with zero timeout is called repeatedly in a loop
but there is no event, I'm sure there will be measurabled performance
drop. But sane user would just use timeout in that case.

epoll's data is protected by a lock. Therefore I think the most
straightforward solution is just taking the lock before reading the
data.

Lockless is hard to get right and may cause hard-to-debug problems. So
unless this performance drop somehow bothers someone, I would prefer
"keep it simple, stupid".

Best regards,
Nam