Message-ID: <861c292e-aa26-0bd7-1d1d-39f50d5ef904@intel.com>
Date: Tue, 29 Nov 2022 10:25:14 -0800
From: Jacob Keller <jacob.e.keller@...el.com>
To: Paolo Abeni <pabeni@...hat.com>, <linux-fsdevel@...r.kernel.org>
CC: Soheil Hassas Yeganeh <soheil@...gle.com>,
Al Viro <viro@...iv.linux.org.uk>,
Davidlohr Bueso <dave@...olabs.net>,
Jason Baron <jbaron@...mai.com>, <netdev@...r.kernel.org>,
Carlos Maiolino <cmaiolino@...hat.com>,
Eric Biggers <ebiggers@...nel.org>
Subject: Re: [PATCH v3] epoll: use refcount to reduce ep_mutex contention
On 11/29/2022 1:05 AM, Paolo Abeni wrote:
> Hello,
>
> On Mon, 2022-11-28 at 16:06 -0800, Jacob Keller wrote:
>>> @@ -217,6 +218,12 @@ struct eventpoll {
>>> u64 gen;
>>> struct hlist_head refs;
>>>
>>> + /*
>>> + * usage count, protected by mtx, used together with epitem->dying to
>>> + * orchestrate the disposal of this struct
>>> + */
>>> + unsigned int refcount;
>>> +
>>
>> Why not use a kref (or at least struct refcount?) those provide some
>> guarantees like guaranteeing atomic operations and saturation when the
>> refcount value would overflow.
>
> Thank you for the feedback!
>
> I thought about that option and ultimately opted otherwise, because we
> can avoid the additional atomic operations required by kref/refcount_t.
> The reference count is always touched under the ep->mtx mutex.
> Reasonably, this does not introduce performance regressions even in
> stranger corner cases.
>
> The above was explicitly noted in the previous revisions commit
> message, but I removed it by mistake while updating the message for v3.
>
> I can switch to kref if there is agreement WRT such performance trade-
> off. Another option would be adding a couple of additional checks for
> wrap-arounds in ep_put() and ep_get() - so that we get similar safety
> guarantees but no additional atomic operations.
>
If it's already been assessed and discarded for a good reason, then I
don't have a problem with sticking with this version. I didn't see any
reference to this before, and krefs feel like the natural choice for
refcounting, since they make it easier to get the implementation
correct.
>>> #ifdef CONFIG_NET_RX_BUSY_POLL
>>> /* used to track busy poll napi_id */
>>> unsigned int napi_id;
>>> @@ -240,9 +247,7 @@ struct ep_pqueue {
>>> /* Maximum number of epoll watched descriptors, per user */
>>> static long max_user_watches __read_mostly;
>>>
>>> -/*
>>> - * This mutex is used to serialize ep_free() and eventpoll_release_file().
>>> - */
>>> +/* Used for cycles detection */
>>> static DEFINE_MUTEX(epmutex);
>>>
>>> static u64 loop_check_gen = 0;
>>> @@ -555,8 +560,7 @@ static void ep_remove_wait_queue(struct eppoll_entry *pwq)
>>>
>>> /*
>>> * This function unregisters poll callbacks from the associated file
>>> - * descriptor. Must be called with "mtx" held (or "epmutex" if called from
>>> - * ep_free).
>>> + * descriptor. Must be called with "mtx" held.
>>> */
>>> static void ep_unregister_pollwait(struct eventpoll *ep, struct epitem *epi)
>>> {
>>> @@ -679,11 +683,38 @@ static void epi_rcu_free(struct rcu_head *head)
>>> kmem_cache_free(epi_cache, epi);
>>> }
>>>
>>> +static void ep_get(struct eventpoll *ep)
>>> +{
>>> + ep->refcount++;
>>> +}
>> This would become something like "kref_get(&ep->kref)" or maybe even
>> something like "kref_get_unless_zero" or some other form depending on
>> exactly how you acquire a pointer to an eventpoll structure.
>
> No need for kref_get_unless_zero here: at all ep_get() call-sites we
> know that at least one other reference is alive and can't go away
> concurrently.
>
Fair, yea.
>>> +
>>> +/*
>>> + * Returns true if the event poll can be disposed
>>> + */
>>> +static bool ep_put(struct eventpoll *ep)
>>> +{
>>> + if (--ep->refcount)
>>> + return false;
>>> +
>>> + WARN_ON_ONCE(!RB_EMPTY_ROOT(&ep->rbr.rb_root));
>>> + return true;
>>> +}
>>
>> This could become kref_put(&ep->kref, ep_dispose).
>
> I think it would be necessary to release the ep->mtx mutex before
> invoking ep_dispose()...
>
Yes, you'd have to refactor things a bit, since ep_dispose wouldn't
actually get called until all outstanding references get dropped.
>>> +
>>> +static void ep_dispose(struct eventpoll *ep)
>>> +{
>>> + mutex_destroy(&ep->mtx);
>>> + free_uid(ep->user);
>>> + wakeup_source_unregister(ep->ws);
>>> + kfree(ep);
>>> +}
>> This would take a kref pointer, use container_of to get to the
>> eventpoll structure, and then perform the necessary cleanup once all
>> references drop.
>>
>> The exact steps here, and whether it would still be safe to call
>> mutex_destroy, are a bit unclear, since you typically only call
>> mutex_destroy when it's absolutely certain that no one holds the mutex.
>
> ... due to the above. The current patch code ensures that ep_dispose()
> is called only after the ep->mtx mutex is released.
>
Correct, because you use the dying flag, yep.
>
>> See Documentation/core-api/kref.rst for a better overview of the API and
>> how to use it safely. I suspect that with just kref you could also
>> safely avoid the "dying" flag as well, but I am not 100% sure.
>
> I *think* we will still need the 'dying' flag, otherwise ep_free()
> can't tell if the traversed epitems entries still held a reference to
> struct eventpoll - eventpoll_release_file() and ep_free() could
> potentially try to release the same reference twice and kref could
> detect that race only for the last reference.
>
> Thanks!
>
> Paolo
>
Right. Switching to krefs would require some thinking to go through the
various cases of what gets a ref and how.

I'm not sure I follow how that would work; it sounds like something
weird about the way references are being counted here. It's likely that
I simply don't understand the full context.

Either way, if there is good reason to avoid kref due to the cost of
atomics in this case, and the fact that we need the mutex anyway, I
suppose I have no objections. I'm not sure what the perf cost of an
atomic vs. a simple integer increment is in practice, though.