Message-ID: <73608dd0e5839634966b3b8e03e4b3c9@suse.de>
Date: Mon, 17 Dec 2018 12:49:06 +0100
From: Roman Penyaev <rpenyaev@...e.de>
To: Davidlohr Bueso <dbueso@...e.de>
Cc: Jason Baron <jbaron@...mai.com>, Al Viro <viro@...iv.linux.org.uk>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/3] use rwlock in order to reduce ep_poll_callback()
contention

On 2018-12-13 19:13, Davidlohr Bueso wrote:
> On 2018-12-12 03:03, Roman Penyaev wrote:
>> The last patch targets the contention problem in ep_poll_callback(), which
>> can be very well reproduced by generating events (write to pipe or eventfd)
>> from many threads, while a consumer thread does polling.
>>
>> The following are some microbenchmark results based on the test [1], which
>> starts threads which generate N events each. The test ends when all events
>> are successfully fetched by the poller thread:
>>
>> spinlock
>> ========
>>
>> threads  events/ms  run-time ms
>>       8       6402        12495
>>      16       7045        22709
>>      32       7395        43268
>>
>> rwlock + xchg
>> =============
>>
>> threads  events/ms  run-time ms
>>       8      10038         7969
>>      16      12178        13138
>>      32      13223        24199
>>
>>
>> According to the results, the bandwidth of delivered events is
>> significantly increased, thus execution time is reduced.
>>
>> This series is based on linux-next/akpm and differs from RFC in that
>> additional cleanup patches and explicit comments have been added.
>>
>> [1] https://github.com/rouming/test-tools/blob/master/stress-epoll.c
>
> Care to "port" this to 'perf bench epoll', in linux-next? I've been
> trying to unify into perf bench the whole epoll performance testcases
> kernel developers can use when making changes and it would be useful.

Yes, good idea. But frankly I do not want to bloat epoll-wait.c with
my multi-writers-single-reader test case, because epoll-wait.c will
soon become unmaintainable with all the possible loads and sets of
different options.

Can we have a single, small and separate source file for each epoll
load? Easy to fix, easy to maintain, debug/hack.

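
To illustrate what such a small, standalone source could look like, here
is a minimal sketch of a multi-writers-single-reader load. It is not the
code from stress-epoll.c [1] and not perf bench code: run_load(),
struct writer and writer_fn() are hypothetical names, each writer gets
its own eventfd (an assumption), and the poller sums the eventfd counters
it reads instead of counting wakeups, so timing and events/ms reporting
are left out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical per-writer state, for illustration only. */
struct writer {
	pthread_t tid;
	int efd;		/* eventfd owned by this writer */
	uint64_t nevents;	/* number of events to generate */
};

static void *writer_fn(void *arg)
{
	struct writer *w = arg;
	const uint64_t one = 1;

	for (uint64_t i = 0; i < w->nevents; i++) {
		/* Each write bumps the eventfd counter by one and wakes the
		 * epoll instance watching this fd. */
		while (write(w->efd, &one, sizeof(one)) != sizeof(one)) {
			if (errno != EAGAIN && errno != EINTR)
				abort();
		}
	}
	return NULL;
}

/* Start nthreads (> 0) writers generating nevents each and poll until
 * everything has been fetched. Returns the number of events fetched. */
uint64_t run_load(int nthreads, uint64_t nevents)
{
	struct writer *w = calloc(nthreads, sizeof(*w));
	struct epoll_event evs[64];
	uint64_t total = 0, want = (uint64_t)nthreads * nevents;
	int epfd = epoll_create1(0);

	if (!w || epfd < 0)
		abort();

	for (int i = 0; i < nthreads; i++) {
		struct epoll_event ev = { .events = EPOLLIN };

		w[i].efd = eventfd(0, EFD_NONBLOCK);
		w[i].nevents = nevents;
		if (w[i].efd < 0)
			abort();
		ev.data.fd = w[i].efd;
		if (epoll_ctl(epfd, EPOLL_CTL_ADD, w[i].efd, &ev) < 0)
			abort();
	}

	for (int i = 0; i < nthreads; i++)
		if (pthread_create(&w[i].tid, NULL, writer_fn, &w[i]))
			abort();

	/* Single poller: reading an eventfd returns the accumulated counter
	 * and resets it to zero, so the sum equals the number of writes. */
	while (total < want) {
		int n = epoll_wait(epfd, evs, 64, -1);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			abort();
		}
		for (int i = 0; i < n; i++) {
			uint64_t cnt;

			if (read(evs[i].data.fd, &cnt, sizeof(cnt)) == sizeof(cnt))
				total += cnt;
		}
	}
	assert(total == want);

	for (int i = 0; i < nthreads; i++) {
		pthread_join(w[i].tid, NULL);
		close(w[i].efd);
	}
	close(epfd);
	free(w);
	return total;
}
```

Something like run_load(8, 1000000), wrapped in a small main() with
timing around it, would be one such load; build with -pthread.
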
> I ran these patches on the 'wait' workload, which is an epoll_wait(2)
> stresser. On a 40-core IvyBridge it shows good performance
> improvements for an increasing number of file descriptors each of the
> 40 threads deals with:
>
> 64 fds: +20%
> 512 fds: +30%
> 1024 fds: +50%
>
> (Yes, these are pretty raw ops/sec measurements.) Unlike your
> benchmark, though, there is only a single writer thread, and it is
> therefore less ideal for measuring optimizations when IO becomes
> available. Hence it would be nice to also have this.

That's weird. One writer thread does not contend with anybody, only with
consumers, so there should not be any big difference.

--
Roman