[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAG48ez3mzb=qTQ9H3qwYaRc3aVtnA0pu=eB8JFHgqv1YUNYTrw@mail.gmail.com>
Date: Mon, 4 Aug 2025 16:55:36 +0200
From: Jann Horn <jannh@...gle.com>
To: Carlos Llamas <cmllamas@...gle.com>
Cc: Alexander Viro <viro@...iv.linux.org.uk>, Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
stable@...r.kernel.org
Subject: Re: [PATCH] eventpoll: Fix semi-unbounded recursion
On Fri, Aug 1, 2025 at 12:31 AM Carlos Llamas <cmllamas@...gle.com> wrote:
> On Fri, Jul 11, 2025 at 06:33:36PM +0200, Jann Horn wrote:
> > Ensure that epoll instances can never form a graph deeper than
> > EP_MAX_NESTS+1 links.
> >
> > Currently, ep_loop_check_proc() ensures that the graph is loop-free and
> > does some recursion depth checks, but those recursion depth checks don't
> > limit the depth of the resulting tree for two reasons:
> >
> > - They don't look upwards in the tree.
> > - If there are multiple downwards paths of different lengths, only one of
> > the paths is actually considered for the depth check since commit
> > 28d82dc1c4ed ("epoll: limit paths").
> >
> > Essentially, the current recursion depth check in ep_loop_check_proc() just
> > serves to prevent it from recursing too deeply while checking for loops.
> >
> > A more thorough check is done in reverse_path_check() after the new graph
> > edge has already been created; this checks, among other things, that no
> > paths going upwards from any non-epoll file with a length of more than 5
> > edges exist. However, this check does not apply to non-epoll files.
> >
> > As a result, it is possible to recurse to a depth of at least roughly 500,
> > tested on v6.15. (I am unsure if deeper recursion is possible; and this may
> > have changed with commit 8c44dac8add7 ("eventpoll: Fix priority inversion
> > problem").)
> >
> > To fix it:
> >
> > 1. In ep_loop_check_proc(), note the subtree depth of each visited node,
> > and use subtree depths for the total depth calculation even when a subtree
> > has already been visited.
> > 2. Add ep_get_upwards_depth_proc() for similarly determining the maximum
> > depth of an upwards walk.
> > 3. In ep_loop_check(), use these values to limit the total path length
> > between epoll nodes to EP_MAX_NESTS edges.
> >
> > Fixes: 22bacca48a17 ("epoll: prevent creating circular epoll structures")
> > Cc: stable@...r.kernel.org
> > Signed-off-by: Jann Horn <jannh@...gle.com>
> > ---
>
> Hey Jann,
>
> I've bisected an LTP test failure to this commit and I can't find any
> reports of this. The test is epoll_ctl04:
>
> https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/epoll_ctl/epoll_ctl04.c
>
> This is what I get:
> ####################################################################3
> root@...ian:~# ./epoll_ctl04
> tst_test.c:2004: TINFO: LTP version: 20250530-116-g91e6272fe
> tst_test.c:2007: TINFO: Tested kernel: 6.16.0-rc1-00017-gf2e467a48287 #28 SMP PREEMPT Thu Jul 31 21:12:22 UTC 2025 aarch64
> tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz'
> tst_test.c:1825: TINFO: Overall timeout per run is 0h 00m 30s
> epoll_ctl04.c:59: TFAIL: epoll_ctl(..., EPOLL_CTL_ADD, ...) with number of nesting is 5 expected EINVAL: ELOOP (40)
>
> Summary:
> passed 0
> failed 1
> broken 0
> skipped 0
> warnings 0
> ####################################################################3
>
>
> I haven't looked much into this but it seems the test expects EINVAL at
> nesting depth 5 and is instead getting ELOOP. Any chance there is an
> off-by-one error in ep_loop_check() as it fails with depth=4 and
> upwards_depth=0, which isn't correct?
This is an area where the existing code is very inconsistent in how it
reports errors; limits on the structure of the epoll graph (in
particular limits on the graph depth) are enforced by both
ep_loop_check() (which fails with ELOOP) and reverse_path_check()
(which fails with EINVAL). The comments suggest that ep_loop_check()
is supposed to be doing depth checks, and reverse_path_check() is
mostly supposed to count wakeup paths, but reverse_path_check()
happens to be what you hit in that LTP testcase.
The manpage also says that ELOOP is for too-deep nesting, and does not
mention this case of EINVAL at all. (It doesn't even mention the
wakeup path count restriction in the ERRORS section...)
I think this is one of those cases where the existing semantics are
too convoluted and tainted with kernel implementation details for
userspace to have handled the existing error cases well enough to be
broken by this change; the existing behavior was something like (not
sure if I'm getting it exactly right) "ELOOP is for loops; EINVAL is
for hitting a depth limit when constructing a chain of epoll instances
with a file at the bottom; ELOOP is for hitting a depth limit when
constructing a chain of epoll instances with no file at the bottom and
you'd only get it depending on which way around you build the chain
and, for more complex structures, in what order the addresses of
kernel objects are"; and the implementation was different from what
the manpage says.
So my opinion is that the right fix is to change the testcase to also
accept ELOOP, though I can see how a bugfix that breaks a unit test is
going to raise eyebrows.
> ---
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index 44648cc09250..811960b2a74c 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -2237,7 +2237,7 @@ static int ep_loop_check(struct eventpoll *ep, struct eventpoll *to)
> upwards_depth = ep_get_upwards_depth_proc(ep, 0);
> rcu_read_unlock();
>
> - return (depth+1+upwards_depth > EP_MAX_NESTS) ? -1 : 0;
> + return (depth+upwards_depth > EP_MAX_NESTS) ? -1 : 0;
Here I am calculating: We want to create a new link between two nodes,
and want to know how long the longest resulting chain of epoll
instances will be. For that, I add the following numbers of links:
- The number of links going upwards from "ep".
- One for the new link we're adding.
- The number of links going downward from "to".
So I think this is correct.
Powered by blists - more mailing lists