[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <05416148-bf19-7718-b6b4-7658b5aca2ee@akamai.com>
Date: Wed, 18 Oct 2017 10:03:10 -0400
From: Jason Baron <jbaron@...mai.com>
To: Davidlohr Bueso <dave@...olabs.net>
Cc: akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org,
Alexander Viro <viro@...iv.linux.org.uk>,
Salman Qazi <sqazi@...gle.com>
Subject: Re: [PATCH] epoll: avoid calling ep_call_nested() from
ep_poll_safewake()
On 10/17/2017 11:37 AM, Davidlohr Bueso wrote:
> On Fri, 13 Oct 2017, Jason Baron wrote:
>
>> The ep_poll_safewake() function is used to wakeup potentially nested
>> epoll
>> file descriptors. The function uses ep_call_nested() to prevent entering
>> the same wake up queue more than once, and to prevent excessively deep
>> wakeup paths (deeper than EP_MAX_NESTS). However, this is not necessary
>> since we are already preventing these conditions during EPOLL_CTL_ADD.
>> This
>> saves extra function calls, and avoids taking a global lock during the
>> ep_call_nested() calls.
>
> This makes sense.
>
>>
>> I have, however, left ep_call_nested() for the CONFIG_DEBUG_LOCK_ALLOC
>> case, since ep_call_nested() keeps track of the nesting level, and
>> this is
>> required by the call to spin_lock_irqsave_nested(). It would be nice to
>> remove the ep_call_nested() calls for the CONFIG_DEBUG_LOCK_ALLOC case as
>> well, however its not clear how to simply pass the nesting level through
>> multiple wake_up() levels without more surgery. In any case, I don't
>> think
>> CONFIG_DEBUG_LOCK_ALLOC is generally used for production. This patch,
>> also
>> apparently fixes a workload at Google that Salman Qazi reported by
>> completely removing the poll_safewake_ncalls->lock from wakeup paths.
>
> I'm a bit curious about the workload (which uses lots of EPOLL_CTL_ADDs) as
> I was tackling the nested epoll scaling issue with loop and readywalk lists
> in mind.
>>
I'm not sure the details of the workload - perhaps Salman can elaborate
further about it.
It would seem that the safewake would potentially be the most contended
in general in the nested case, because generally you have a few epoll
fds attached to lots of sources doing wakeups. So those sources are all
going to conflict on the safewake lock. The readywalk is used when
performing a 'nested' poll and in general this is likely going to be
called on a few epoll fds. That said, we should remove it too. I will
post a patch to remove it.
The loop lock is used during EPOLL_CTL_ADD to check for loops and deep
wakeup paths and so I would expect this to be less common, but I
wouldn't doubt there are workloads impacted by it. We can potentially, I
think remove this one too - and the global 'epmutex'. I posted some
ideas a while ago on it:
http://lkml.iu.edu/hypermail//linux/kernel/1501.1/05905.html
We can work through these ideas or others...
Thanks,
-Jason
>> Signed-off-by: Jason Baron <jbaron@...mai.com>
>> Cc: Alexander Viro <viro@...iv.linux.org.uk>
>> Cc: Andrew Morton <akpm@...ux-foundation.org>
>> Cc: Salman Qazi <sqazi@...gle.com>
>
> Acked-by: Davidlohr Bueso <dbueso@...e.de>
>
>> ---
>> fs/eventpoll.c | 47 ++++++++++++++++++-----------------------------
>> 1 file changed, 18 insertions(+), 29 deletions(-)
>
> Yay for getting rid of some of the callback hell.
>
> Thanks,
> Davidlohr
Powered by blists - more mailing lists