[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <38cc83144a2ec332dead4e21214ea068@suse.de>
Date: Wed, 05 Dec 2018 12:16:46 +0100
From: Roman Penyaev <rpenyaev@...e.de>
To: Jason Baron <jbaron@...mai.com>
Cc: Alexander Viro <viro@...iv.linux.org.uk>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 1/1] epoll: use rwlock in order to reduce
ep_poll_callback() contention
On 2018-12-04 18:23, Jason Baron wrote:
> On 12/3/18 6:02 AM, Roman Penyaev wrote:
[...]
>>
>> ep_set_busy_poll_napi_id(epi);
>>
>> @@ -1156,8 +1187,8 @@ static int ep_poll_callback(wait_queue_entry_t
>> *wait, unsigned mode, int sync, v
>> */
>> if (unlikely(ep->ovflist != EP_UNACTIVE_PTR)) {
>> if (epi->next == EP_UNACTIVE_PTR) {
>> - epi->next = ep->ovflist;
>> - ep->ovflist = epi;
>> + /* Atomically exchange tail */
>> + epi->next = xchg(&ep->ovflist, epi);
>
> This also relies on the fact that the same epi can't be added to the
> list in parallel, because the wait queue doing the wakeup will have the
> wait_queue_head lock. That was a little confusing for me b/c we only
> had
> the read_lock() above.
Yes, that is indeed not obvious path, but wq is locked by
wake_up_*_poll()
call or caller of wake_up_locked_poll() has to be sure wq.lock is taken.
I'll add an explicit comment here, thanks for pointing out.
>
>> if (epi->ws) {
>> /*
>> * Activate ep->ws since epi->ws may get
>> @@ -1172,7 +1203,7 @@ static int ep_poll_callback(wait_queue_entry_t
>> *wait, unsigned mode, int sync, v
>>
>> /* If this file is already in the ready list we exit soon */
>> if (!ep_is_linked(epi)) {
>> - list_add_tail(&epi->rdllink, &ep->rdllist);
>> + list_add_tail_lockless(&epi->rdllink, &ep->rdllist);
>> ep_pm_stay_awake_rcu(epi);
>> }
>
> same for this.
... and an explicit comment here.
>
>>
>> @@ -1197,13 +1228,13 @@ static int ep_poll_callback(wait_queue_entry_t
>> *wait, unsigned mode, int sync, v
>> break;
>> }
>> }
>> - wake_up_locked(&ep->wq);
>> + wake_up(&ep->wq);
>
> why the switch here to the locked() variant? Shouldn't the 'reader'
> side, in this case which takes the rwlock for write see all updates in
> a
> coherent state at this point?
lockdep inside __wake_up_common expects wq_head->lock is taken, and
seems this is not a good idea to leave wq naked on wake up path,
when several CPUs can enter wake function. Although
default_wake_function
is protected by spinlock inside try_to_wake_up(), but for example
autoremove_wake_function() can't be called concurrently for the same wq
(it removes wq entry from the list). Also in case of bookmarks
__wake_up_common adds an entry to the list, thus can't be called without
any locks.
I understand you concern and you are right saying that read side sees
wq entries as stable, but that will work only if __wake_up_common does
not modify anything, that is seems true now, but of course it is
too scary to rely on that in the future.
>
>> }
>> if (waitqueue_active(&ep->poll_wait))
>> pwake++;
>>
>> out_unlock:
>> - spin_unlock_irqrestore(&ep->wq.lock, flags);
>> + read_unlock_irqrestore(&ep->lock, flags);
>>
>> /* We have to call this outside the lock */
>> if (pwake)
>> @@ -1489,7 +1520,7 @@ static int ep_insert(struct eventpoll *ep, const
>> struct epoll_event *event,
>> goto error_remove_epi;
>>
>> /* We have to drop the new item inside our item list to keep track
>> of it */
>> - spin_lock_irq(&ep->wq.lock);
>> + write_lock_irq(&ep->lock);
>>
>> /* record NAPI ID of new item if present */
>> ep_set_busy_poll_napi_id(epi);
>> @@ -1501,12 +1532,12 @@ static int ep_insert(struct eventpoll *ep,
>> const struct epoll_event *event,
>>
>> /* Notify waiting tasks that events are available */
>> if (waitqueue_active(&ep->wq))
>> - wake_up_locked(&ep->wq);
>> + wake_up(&ep->wq);
>
> is this necessary to switch as well? Is this to make lockdep happy?
> Looks like there are few more conversions too below...
Yes, necessary, wq.lock should be taken.
--
Roman
Powered by blists - more mailing lists