lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 5 Dec 2018 11:38:09 -0500
From:   Jason Baron <jbaron@...mai.com>
To:     Roman Penyaev <rpenyaev@...e.de>
Cc:     Alexander Viro <viro@...iv.linux.org.uk>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 1/1] epoll: use rwlock in order to reduce
 ep_poll_callback() contention



On 12/5/18 6:16 AM, Roman Penyaev wrote:
> On 2018-12-04 18:23, Jason Baron wrote:
>> On 12/3/18 6:02 AM, Roman Penyaev wrote:
> 
> [...]
> 
>>>
>>>      ep_set_busy_poll_napi_id(epi);
>>>
>>> @@ -1156,8 +1187,8 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
>>>       */
>>>      if (unlikely(ep->ovflist != EP_UNACTIVE_PTR)) {
>>>          if (epi->next == EP_UNACTIVE_PTR) {
>>> -            epi->next = ep->ovflist;
>>> -            ep->ovflist = epi;
>>> +            /* Atomically exchange tail */
>>> +            epi->next = xchg(&ep->ovflist, epi);
>>
>> This also relies on the fact that the same epi can't be added to the
>> list in parallel, because the wait queue doing the wakeup will have the
>> wait_queue_head lock. That was a little confusing for me b/c we only had
>> the read_lock() above.
> 
> Yes, that is indeed not obvious path, but wq is locked by wake_up_*_poll()
> call or caller of wake_up_locked_poll() has to be sure wq.lock is taken.
> 
> I'll add an explicit comment here, thanks for pointing out.
> 
>>
>>>              if (epi->ws) {
>>>                  /*
>>>                   * Activate ep->ws since epi->ws may get
>>> @@ -1172,7 +1203,7 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
>>>
>>>      /* If this file is already in the ready list we exit soon */
>>>      if (!ep_is_linked(epi)) {
>>> -        list_add_tail(&epi->rdllink, &ep->rdllist);
>>> +        list_add_tail_lockless(&epi->rdllink, &ep->rdllist);
>>>          ep_pm_stay_awake_rcu(epi);
>>>      }
>>
>> same for this.
> 
> ... and an explicit comment here.
> 
>>
>>>
>>> @@ -1197,13 +1228,13 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
>>>                  break;
>>>              }
>>>          }
>>> -        wake_up_locked(&ep->wq);
>>> +        wake_up(&ep->wq);
>>
>> why the switch here to the locked() variant? Shouldn't the 'reader'
>> side, in this case which takes the rwlock for write see all updates in a
>> coherent state at this point?
> 
> lockdep inside __wake_up_common expects wq_head->lock is taken, and
> seems this is not a good idea to leave wq naked on wake up path,
> when several CPUs can enter wake function.  Although default_wake_function
> is protected by spinlock inside try_to_wake_up(), but for example
> autoremove_wake_function() can't be called concurrently for the same wq
> (it removes wq entry from the list).  Also in case of bookmarks
> __wake_up_common adds an entry to the list, thus can't be called without
> any locks.
> 
> I understand you concern and you are right saying that read side sees
> wq entries as stable, but that will work only if __wake_up_common does
> not modify anything, that is seems true now, but of course it is
> too scary to rely on that in the future.

I think it might be interesting for, at least testing, to see if not grabbing
wq.lock improves your benchmarks any further? fwiw, epoll only recently started
grabbing wq.lock bc lockdep required it.

Thanks,

-Jason

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ