Date:	Fri, 27 May 2016 12:22:19 +0300
From:	"Michael Rapoport" <RAPOPORT@...ibm.com>
To:	Tejun Heo <tj@...nel.org>
Cc:	Bandan Das <bsd@...hat.com>, linux-kernel@...r.kernel.org,
	kvm@...r.kernel.org, mst@...hat.com, jiangshanlai@...il.com
Subject: Re: [RFC PATCH 0/4] cgroup aware workqueues

> Tejun Heo <htejun@...il.com> wrote on 03/31/2016 08:14:35 PM:
>
> Hello, Michael.
> 
> On Thu, Mar 31, 2016 at 08:17:13AM +0200, Michael Rapoport wrote:
> > > There really shouldn't be any difference when using unbound
> > > workqueues.  workqueue becomes a convenience thing which manages
> > > worker pools and there shouldn't be any difference between workqueue
> > > workers and kthreads in terms of behavior.
> > 
> > I agree that there really shouldn't be any performance difference, but
> > the tests I've run show otherwise. I have no idea why and I haven't had
> > time yet to investigate it.
> 
> I'd be happy to help digging into what's going on.  If kvm wants full
> control over the worker thread, kvm can use workqueue as a pure
> threadpool.  Schedule a work item to grab a worker thread with the
> matching attributes and keep using it as if it were a kthread.  While that
> wouldn't be able to take advantage of work item flushing and so on,
> it'd still be a simpler way to manage worker threads and the extra
> stuff like cgroup membership handling doesn't have to be duplicated.
> 
> > > > opportunity for optimization, at least for some workloads...
> > > 
> > > What sort of optimizations are we talking about?
> > 
> > Well, if we take Elvis (1) as the theoretical base, there could be a
> > benefit of doing I/O scheduling inside the vhost.
> 
> Yeah, if that actually is beneficial, take full control of the
> kworker thread.
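
(To make the "grab a worker" idea concrete, here is a minimal sketch of how
it could look. The names -- grabbed_worker, grabbed_worker_fn, grabbed_work --
are made up for illustration and are not taken from any posted patch: a single
long-running work item is queued on an unbound workqueue, and its function
loops running vhost-style work until asked to stop, so the kworker that picked
it up stays dedicated to vhost.)

/*
 * Hypothetical sketch, not an actual patch: "grab" a kworker by queueing a
 * single long-running work item on an unbound workqueue and letting its
 * function loop, running work pushed in by other contexts.
 */
#include <linux/llist.h>
#include <linux/wait.h>
#include <linux/workqueue.h>

struct grabbed_work {
	struct llist_node	node;
	void			(*fn)(struct grabbed_work *w);
};

struct grabbed_worker {
	struct work_struct	work;		/* occupies one kworker */
	struct llist_head	work_list;	/* grabbed_work pushed by callers */
	wait_queue_head_t	wait;
	bool			should_stop;
};

static void grabbed_worker_fn(struct work_struct *work)
{
	struct grabbed_worker *gw =
		container_of(work, struct grabbed_worker, work);

	/*
	 * This work item does not return until asked to stop, so the kworker
	 * that picked it up stays dedicated to us.  It has the attributes of
	 * the unbound pool it came from, and with the cgroup-aware workqueue
	 * patches it would also get the right cgroup membership for free.
	 */
	while (!READ_ONCE(gw->should_stop)) {
		struct grabbed_work *w, *tmp;
		struct llist_node *node;

		wait_event_interruptible(gw->wait,
				!llist_empty(&gw->work_list) ||
				READ_ONCE(gw->should_stop));

		node = llist_del_all(&gw->work_list);
		llist_for_each_entry_safe(w, tmp, node, node)
			w->fn(w);	/* ordering details omitted */
	}
}

/*
 * Setup, e.g. from vhost_dev_set_owner():
 *
 *	init_llist_head(&gw->work_list);
 *	init_waitqueue_head(&gw->wait);
 *	INIT_WORK(&gw->work, grabbed_worker_fn);
 *	queue_work(system_unbound_wq, &gw->work);
 */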

It took me a while, but at last I had time to run some benchmarks.
I've compared guest-to-guest netperf with three variants of the vhost
implementation:
(1) vanilla 4.4 (baseline)
(2) 4.4 + unbound workqueues based on Bandan's patches [1] (the general
    shape of this approach is sketched below)
(3) 4.4 + "grabbed" worker thread. This is my POC implementation that
    actually follows your proposal to take full control over the worker
    thread.
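
(For illustration of variant (2): its general shape is dispatching each piece
of vhost work to an unbound workqueue instead of a dedicated vhost kthread.
The sketch below is not Bandan's actual patch -- identifiers like vhost_wq and
vhost_poll_work are invented here:)

/*
 * Rough shape of the unbound-workqueue variant, for illustration only.
 * The identifiers (vhost_wq, vhost_poll_work, ...) are invented here and
 * do not come from the patches referenced in [1].
 */
#include <linux/errno.h>
#include <linux/workqueue.h>

static struct workqueue_struct *vhost_wq;

struct vhost_poll_work {
	struct work_struct	work;
	/* ... per-virtqueue state ... */
};

static void vhost_poll_work_fn(struct work_struct *work)
{
	/*
	 * container_of(work, struct vhost_poll_work, work) gives the
	 * per-virtqueue context; real code would handle the kick here.
	 */
}

static int vhost_wq_init(void)
{
	/* WQ_UNBOUND: let the scheduler decide where the work runs. */
	vhost_wq = alloc_workqueue("vhost", WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
	return vhost_wq ? 0 : -ENOMEM;
}

/*
 * Instead of waking a dedicated vhost kthread, callers then do:
 *
 *	queue_work(vhost_wq, &vpw->work);
 */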

I've run two guests without any CPU pinning and without any actual
interaction with cgroups.
Here are the results (in Mbits/sec):

size |   64  |   256   |  1024   |  4096   |  16384
-----+-------+---------+---------+---------+---------
(1)  | 496.8 | 1346.31 | 6058.49 | 13736.2 | 13541.4
(2)  | 493.3 | 1604.03 | 5723.68 | 10181.4 | 15572.4
(3)  | 489.7 | 1437.86 | 6251.12 | 12774.2 | 12867.9 


From what I see, a different variant outperforms the others for different
packet sizes.
Moreover, I'd expect that when vhost completely takes over the worker
thread there would be no difference vs. the current state.

Tejun, can you help explain these results?

[1] http://thread.gmane.org/gmane.linux.network/286858


