[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <26269528616bb41dcb2b5a3314f87fb36d45acac.camel@redhat.com>
Date: Tue, 21 Mar 2023 16:19:56 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: Eric Dumazet <edumazet@...gle.com>,
LKML <linux-kernel@...r.kernel.org>, Xiumei Mu <xmu@...hat.com>,
Jacob Keller <jacob.e.keller@...el.com>,
Soheil Hassas Yeganeh <soheil@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>
Cc: Hillf Danton <hdanton@...a.com>
Subject: Re: syzbot + epoll
On Mon, 2023-03-20 at 14:17 -0700, Eric Dumazet wrote:
> This is about this recent syzbot report (with a C repro)
>
> https://lore.kernel.org/lkml/000000000000c6dc0305f75b4d74@google.com/T/#u
>
> I think this is caused by:
>
> commit fc02a95bb6d8bf58c6efd7e362814558eea2ef28
> Author: Paolo Abeni <pabeni@...hat.com>
> Date: Tue Mar 7 19:46:37 2023 +0100
>
> epoll: use refcount to reduce ep_mutex contention
>
> Problem is that __ep_remove() might return early, without removing epi
> from the rbtree (ep->rbr)
>
> This happens when epi->dying has been set to true here :
>
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/fs/eventpoll.c?id=6f72958a49f68553f2b6ff713e8c8e51a34c1e1e#n954
>
> So we loop, while holding the ep->mtx held, meaning that the other
> thread is blocked here
>
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/fs/eventpoll.c?id=6f72958a49f68553f2b6ff713e8c8e51a34c1e1e#n962
>
> So this dead locks.
>
> Maybe fix this with:
>
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index 25a59640748a0fd22a84a5aecb90815fbbca9cef..1db56c6175aab5af7bc637a452b68ed8bc11fd7f
> 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -784,7 +784,7 @@ static void ep_remove_safe(struct eventpoll *ep,
> struct epitem *epi)
>
> static void ep_clear_and_put(struct eventpoll *ep)
> {
> - struct rb_node *rbp;
> + struct rb_node *rbp, *next;
> struct epitem *epi;
> bool dispose;
>
> @@ -810,7 +810,8 @@ static void ep_clear_and_put(struct eventpoll *ep)
> * Since we still own a reference to the eventpoll struct, the
> loop can't
> * dispose it.
> */
> - while ((rbp = rb_first_cached(&ep->rbr)) != NULL) {
> + for (rbp = rb_first_cached(&ep->rbr); rbp; rbp = next) {
> + next = rb_next(rbp);
> epi = rb_entry(rbp, struct epitem, rbn);
> ep_remove_safe(ep, epi);
> cond_resched();
(adding Hillf, as was looking to this issue, too)
The fix LGTM and syzkaller says it addresses the issue:
https://groups.google.com/g/syzkaller-bugs/c/oiBUmGsqz_Q/m/1IQ4vbROAgAJ
I see Andrew removed the patch from the -mm tree. I guess at this point
a new version of "epoll: use refcount to reduce ep_mutex contention",
including the above is needed?
If the above is correct, would a co-devel tag fit you Eric?
Thanks,
Paolo
Powered by blists - more mailing lists