Message-ID: <861c292e-aa26-0bd7-1d1d-39f50d5ef904@intel.com>
Date:   Tue, 29 Nov 2022 10:25:14 -0800
From:   Jacob Keller <jacob.e.keller@...el.com>
To:     Paolo Abeni <pabeni@...hat.com>, <linux-fsdevel@...r.kernel.org>
CC:     Soheil Hassas Yeganeh <soheil@...gle.com>,
        Al Viro <viro@...iv.linux.org.uk>,
        Davidlohr Bueso <dave@...olabs.net>,
        Jason Baron <jbaron@...mai.com>, <netdev@...r.kernel.org>,
        Carlos Maiolino <cmaiolino@...hat.com>,
        Eric Biggers <ebiggers@...nel.org>
Subject: Re: [PATCH v3] epoll: use refcount to reduce ep_mutex contention



On 11/29/2022 1:05 AM, Paolo Abeni wrote:
> Hello,
> 
> On Mon, 2022-11-28 at 16:06 -0800, Jacob Keller wrote:
>>> @@ -217,6 +218,12 @@ struct eventpoll {
>>>    	u64 gen;
>>>    	struct hlist_head refs;
>>>    
>>> +	/*
>>> +	 * usage count, protected by mtx, used together with epitem->dying to
>>> +	 * orchestrate the disposal of this struct
>>> +	 */
>>> +	unsigned int refcount;
>>> +
>>
>> Why not use a kref (or at least struct refcount?) those provide some
>> guarantees like guaranteeing atomic operations and saturation when the
>> refcount value would overflow.
> 
> Thank you for the feedback!
> 
> I thought about that option and ultimately opted otherwise, because
> this way we can avoid the additional atomic operations required by
> kref/refcount_t. The reference count is always touched under the
> ep->mtx mutex, so reasonably this does not introduce performance
> regressions even in stranger corner cases.
> 
> The above was explicitly noted in the previous revisions commit
> message, but I removed it by mistake while updating the message for v3.
> 
> I can switch to kref if there is agreement WRT such performance trade-
> off. Another option would be adding a couple of additional checks for
> wrap-arounds in ep_put() and ep_get() - so that we get similar safety
> guarantees but no additional atomic operations.
> 

If it's already been assessed and discarded for a good reason then I 
don't have a problem with sticking with this version. I didn't see any 
reference to this before, and krefs feel like the natural choice for 
refcounting since it seems easier to get the implementation correct.

>>>    #ifdef CONFIG_NET_RX_BUSY_POLL
>>>    	/* used to track busy poll napi_id */
>>>    	unsigned int napi_id;
>>> @@ -240,9 +247,7 @@ struct ep_pqueue {
>>>    /* Maximum number of epoll watched descriptors, per user */
>>>    static long max_user_watches __read_mostly;
>>>    
>>> -/*
>>> - * This mutex is used to serialize ep_free() and eventpoll_release_file().
>>> - */
>>> +/* Used for cycles detection */
>>>    static DEFINE_MUTEX(epmutex);
>>>    
>>>    static u64 loop_check_gen = 0;
>>> @@ -555,8 +560,7 @@ static void ep_remove_wait_queue(struct eppoll_entry *pwq)
>>>    
>>>    /*
>>>     * This function unregisters poll callbacks from the associated file
>>> - * descriptor.  Must be called with "mtx" held (or "epmutex" if called from
>>> - * ep_free).
>>> + * descriptor.  Must be called with "mtx" held.
>>>     */
>>>    static void ep_unregister_pollwait(struct eventpoll *ep, struct epitem *epi)
>>>    {
>>> @@ -679,11 +683,38 @@ static void epi_rcu_free(struct rcu_head *head)
>>>    	kmem_cache_free(epi_cache, epi);
>>>    }
>>>    
>>> +static void ep_get(struct eventpoll *ep)
>>> +{
>>> +	ep->refcount++;
>>> +}
>> This would become something like "kref_get(&ep->kref)" or maybe even
>> something like "kref_get_unless_zero" or some other form depending on
>> exactly how you acquire a pointer to an eventpoll structure.
> 
> No need for kref_get_unless_zero here; in all ep_get() call-sites we
> know that at least another reference is alive and can't go away
> concurrently.
> 

Fair, yea.

>>> +
>>> +/*
>>> + * Returns true if the event poll can be disposed
>>> + */
>>> +static bool ep_put(struct eventpoll *ep)
>>> +{
>>> +	if (--ep->refcount)
>>> +		return false;
>>> +
>>> +	WARN_ON_ONCE(!RB_EMPTY_ROOT(&ep->rbr.rb_root));
>>> +	return true;
>>> +}
>>
>> This could become kref_put(&ep->kref, ep_dispose).
> 
> I think it would be necessary to release the ep->mtx mutex before
> invoking ep_dispose()...
> 

Yes, you'd have to refactor things a bit, since ep_dispose wouldn't 
actually get called until all outstanding references get dropped.

>>> +
>>> +static void ep_dispose(struct eventpoll *ep)
>>> +{
>>> +	mutex_destroy(&ep->mtx);
>>> +	free_uid(ep->user);
>>> +	wakeup_source_unregister(ep->ws);
>>> +	kfree(ep);
>>> +}
>> This would take a kref pointer, use container_of() to get to the eventpoll
>> structure, and then perform the necessary cleanup once all references drop.
>>
>> The exact steps here, and whether it would still be safe to call
>> mutex_destroy(), are a bit unclear, since you would typically only call
>> mutex_destroy() when it's absolutely certain that no one holds the mutex.
> 
> ... due to the above. The current patch code ensures that ep_dispose()
> is called only after the ep->mtx mutex is released.
> 

Correct, because you use the dying flag, yep.

> 
>> See Documentation/core-api/kref.rst for a better overview of the API and
>> how to use it safely. I suspect that with just kref you could also
>> safely avoid the "dying" flag as well, but I am not 100% sure.
> 
> I *think* we will still need the 'dying' flag, otherwise ep_free()
> can't tell whether the traversed epitem entries still hold a reference
> to struct eventpoll - eventpoll_release_file() and ep_free() could
> potentially try to release the same reference twice, and kref could
> detect that race only for the last reference.
> 
> Thanks!
> 
> Paolo
> 

Right. Switching to krefs would require some thinking to go through the 
various cases of what gets a ref and how.

I'm not sure I follow how that would work; it sounds like something weird 
with the way references are being counted here. It's likely that I simply 
don't understand the full context.

Either way, if there is good reason to avoid kref due to the cost of 
atomics in this case, and the fact that we need the mutex anyway, I 
suppose I have no objections. I'm not sure what the perf cost of an 
atomic vs. a simple integer is in practice, though.
