Message-ID: <e9767b9d708db2593805e8507d3ca43532dad59e.camel@redhat.com>
Date: Tue, 29 Nov 2022 10:05:46 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: Jacob Keller <jacob.e.keller@...el.com>,
linux-fsdevel@...r.kernel.org
Cc: Soheil Hassas Yeganeh <soheil@...gle.com>,
Al Viro <viro@...iv.linux.org.uk>,
Davidlohr Bueso <dave@...olabs.net>,
Jason Baron <jbaron@...mai.com>, netdev@...r.kernel.org,
Carlos Maiolino <cmaiolino@...hat.com>,
Eric Biggers <ebiggers@...nel.org>
Subject: Re: [PATCH v3] epoll: use refcount to reduce ep_mutex contention
Hello,
On Mon, 2022-11-28 at 16:06 -0800, Jacob Keller wrote:
> > @@ -217,6 +218,12 @@ struct eventpoll {
> > u64 gen;
> > struct hlist_head refs;
> >
> > + /*
> > + * usage count, protected by mtx, used together with epitem->dying to
> > + * orchestrate the disposal of this struct
> > + */
> > + unsigned int refcount;
> > +
>
> Why not use a kref (or at least struct refcount?) those provide some
> guarantees like guaranteeing atomic operations and saturation when the
> refcount value would overflow.
Thank you for the feedback!
I thought about that option and ultimately opted otherwise because we
can avoid the additional atomic operations required by kref/refcount_t:
the reference count is always touched under the ep->mtx mutex, so
reasonably this does not introduce performance regressions even in
stranger corner cases.
The above was explicitly noted in the commit message of the previous
revision, but I removed it by mistake while updating the message for v3.
I can switch to kref if there is agreement WRT such a performance
trade-off. Another option would be adding a couple of additional checks
for wrap-arounds in ep_put() and ep_get(), so that we get similar safety
guarantees without additional atomic operations.
> > #ifdef CONFIG_NET_RX_BUSY_POLL
> > /* used to track busy poll napi_id */
> > unsigned int napi_id;
> > @@ -240,9 +247,7 @@ struct ep_pqueue {
> > /* Maximum number of epoll watched descriptors, per user */
> > static long max_user_watches __read_mostly;
> >
> > -/*
> > - * This mutex is used to serialize ep_free() and eventpoll_release_file().
> > - */
> > +/* Used for cycles detection */
> > static DEFINE_MUTEX(epmutex);
> >
> > static u64 loop_check_gen = 0;
> > @@ -555,8 +560,7 @@ static void ep_remove_wait_queue(struct eppoll_entry *pwq)
> >
> > /*
> > * This function unregisters poll callbacks from the associated file
> > - * descriptor. Must be called with "mtx" held (or "epmutex" if called from
> > - * ep_free).
> > + * descriptor. Must be called with "mtx" held.
> > */
> > static void ep_unregister_pollwait(struct eventpoll *ep, struct epitem *epi)
> > {
> > @@ -679,11 +683,38 @@ static void epi_rcu_free(struct rcu_head *head)
> > kmem_cache_free(epi_cache, epi);
> > }
> >
> > +static void ep_get(struct eventpoll *ep)
> > +{
> > + ep->refcount++;
> > +}
> This would become something like "kref_get(&ep->kref)" or maybe even
> something like "kref_get_unless_zero" or some other form depending on
> exactly how you acquire a pointer to an eventpoll structure.
No need for kref_get_unless_zero() here: at all ep_get() call-sites we
know that at least one other reference is alive and can't go away
concurrently.
> > +
> > +/*
> > + * Returns true if the event poll can be disposed
> > + */
> > +static bool ep_put(struct eventpoll *ep)
> > +{
> > + if (--ep->refcount)
> > + return false;
> > +
> > + WARN_ON_ONCE(!RB_EMPTY_ROOT(&ep->rbr.rb_root));
> > + return true;
> > +}
>
> This could become kref_put(&ep->kref, ep_dispose).
I think it would be necessary to release the ep->mtx mutex before
invoking ep_dispose()...
> > +
> > +static void ep_dispose(struct eventpoll *ep)
> > +{
> > + mutex_destroy(&ep->mtx);
> > + free_uid(ep->user);
> > + wakeup_source_unregister(ep->ws);
> > + kfree(ep);
> > +}
> This would take a kref pointer, use container_of to get to the eventpoll
> structure, and then perform the necessary cleanup once all references drop.
>
> The exact specific steps here and whether it would still be safe to call
> mutex_destroy are a bit unclear, since you typically would only call
> mutex_destroy when it's absolutely sure that no one has locked the mutex.
... due to the above. The current patch code ensures that ep_dispose()
is called only after the ep->mtx mutex is released.
> See Documentation/core-api/kref.rst for a better overview of the API and
> how to use it safely. I suspect that with just kref you could also
> safely avoid the "dying" flag as well, but I am not 100% sure.
I *think* we will still need the 'dying' flag; otherwise ep_free()
can't tell whether the traversed epitem entries still hold a reference
to struct eventpoll - eventpoll_release_file() and ep_free() could
potentially try to release the same reference twice, and kref could
detect that race only for the last reference.
Thanks!
Paolo