[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <819762b6eb549f74d0ebbb6663f042ae9b6cd86d.camel@redhat.com>
Date: Tue, 22 Nov 2022 17:59:18 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: Soheil Hassas Yeganeh <soheil@...gle.com>
Cc: linux-fsdevel@...r.kernel.org, Al Viro <viro@...iv.linux.org.uk>,
Davidlohr Bueso <dave@...olabs.net>,
Jason Baron <jbaron@...mai.com>,
Roman Penyaev <rpenyaev@...e.de>, netdev@...r.kernel.org,
Carlos Maiolino <cmaiolino@...hat.com>
Subject: Re: [REPOST PATCH] epoll: use refcount to reduce ep_mutex contention
Hello,
Thank you for the prompt feedback!
On Tue, 2022-11-22 at 11:18 -0500, Soheil Hassas Yeganeh wrote:
> On Tue, Nov 22, 2022 at 10:43 AM Paolo Abeni <pabeni@...hat.com> wrote:
> >
> > We are observing huge contention on the epmutex during an http
> > connection/rate test:
> >
> > 83.17% 0.25% nginx [kernel.kallsyms] [k] entry_SYSCALL_64_after_hwframe
> > [...]
> > |--66.96%--__fput
> > |--60.04%--eventpoll_release_file
> > |--58.41%--__mutex_lock.isra.6
> > |--56.56%--osq_lock
> >
> > The application is multi-threaded, creates a new epoll entry for
> > each incoming connection, and does not delete it before the
> > connection shutdown - that is, before the connection's fd close().
> >
> > Many different threads compete frequently for the epmutex lock,
> > affecting the overall performance.
> >
> > To reduce the contention this patch introduces explicit reference counting
> > for the eventpoll struct. Each registered event acquires a reference,
> > and references are released at ep_remove() time. ep_free() doesn't touch
> > anymore the event RB tree, it just unregisters the existing callbacks
> > and drops a reference to the ep struct. The struct itself is freed when
> > the reference count reaches 0. The reference count updates are protected
> > by the mtx mutex so no additional atomic operations are needed.
> >
> > Since ep_free() can't compete anymore with eventpoll_release_file()
> > for epitems removal, we can drop the epmutex usage at disposal time.
> >
> > With the patched kernel, in the same connection/rate scenario, the mutex
> > operations disappear from the perf report, and the measured connections/rate
> > grows by ~60%.
>
> I locally tried this patch and I can reproduce the results. Thank you
> for the nice optimization!
>
> > Tested-by: Xiumei Mu <xmu@...hat.com>
> > Signed-off-by: Paolo Abeni <pabeni@...hat.com>
> > ---
> > This is just a repost reaching out for more recipents,
> > as suggested by Carlos.
> >
> > Previous post at:
> >
> > https://lore.kernel.org/linux-fsdevel/20221122102726.4jremle54zpcapia@andromeda/T/#m6f98d4ccbe0a385d10c04fd4018e782b793944e6
> > ---
> > fs/eventpoll.c | 113 ++++++++++++++++++++++++++++---------------------
> > 1 file changed, 64 insertions(+), 49 deletions(-)
> >
> > diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> > index 52954d4637b5..6e415287aeb8 100644
> > --- a/fs/eventpoll.c
> > +++ b/fs/eventpoll.c
> > @@ -226,6 +226,12 @@ struct eventpoll {
> > /* tracks wakeup nests for lockdep validation */
> > u8 nests;
> > #endif
> > +
> > + /*
> > + * protected by mtx, used to avoid races between ep_free() and
> > + * ep_eventpoll_release()
> > + */
> > + unsigned int refcount;
>
> nitpick: Given that napi_id and nest are both macro protected, you
> might want to pull it right after min_wait_ts.
Just to be on the same page: the above is just for an aesthetic reason,
right? Is there some functional aspect I don't see?
[...]
> > @@ -2165,10 +2174,16 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
> > error = -EEXIST;
> > break;
> > case EPOLL_CTL_DEL:
> > - if (epi)
> > - error = ep_remove(ep, epi);
> > - else
> > + if (epi) {
> > + /*
> > + * The eventpoll itself is still alive: the refcount
> > + * can't go to zero here.
> > + */
> > + WARN_ON_ONCE(ep_remove(ep, epi));
>
> There are similar examples of calling ep_remove() without checking the
> return value in ep_insert().
Yes, the error paths in ep_insert(). I added a comment referring to all
of them, trying to explain that ep_dispose() is not needed there.
> I believe we should add a similar comment there, and maybe a
> WARN_ON_ONCE. I'm not sure, but it might be worth adding a new helper
> given this repeated pattern?
I like the idea of such helper. I'll use it in the next iteration, if
there is a reasonable agreement on this patch.
Whould 'ep_remove_safe()' fit as the helper's name?
Thanks,
Paolo
Powered by blists - more mailing lists