Message-ID: <20160524190320.69761a67@redhat.com>
Date: Tue, 24 May 2016 19:03:20 +0200
From: Jesper Dangaard Brouer <brouer@...hat.com>
To: "Michael S. Tsirkin" <mst@...hat.com>
Cc: linux-kernel@...r.kernel.org, Jason Wang <jasowang@...hat.com>,
	Eric Dumazet <eric.dumazet@...il.com>, davem@...emloft.net,
	netdev@...r.kernel.org, Steven Rostedt <rostedt@...dmis.org>,
	brouer@...hat.com
Subject: Re: [PATCH v5 2/2] skb_array: ring test

On Tue, 24 May 2016 12:28:09 +0200
Jesper Dangaard Brouer <brouer@...hat.com> wrote:

> I do like perf, but it does not answer my questions about the
> performance of this queue. I will code something up in my own
> framework[2] to answer my own performance questions.
>
> Like what is the minimum overhead (in cycles) achievable with this
> type of queue, in the most optimal situation (e.g. same CPU enq+deq,
> cache hot) for fastpath usage.

Coded it up here:
 https://github.com/netoptimizer/prototype-kernel/commit/b16a3332184
 https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/skb_array_bench01.c

This is a really fake benchmark, but it sort of shows the minimum
overhead achievable with this type of queue, where it is the same CPU
enqueuing and dequeuing, and the cache is guaranteed to be hot.

Measured on an i7-4790K CPU @ 4.00GHz, the average cost of
enqueue+dequeue of a single object is around 102 cycles(tsc).

To compare this with the numbers below, where enq and deq are measured
separately: 102 / 2 = 51 cycles.

> Then I also want to know how this performs when two CPUs are involved.
> As this is also a primary use-case, e.g. for you when sending packets
> into a guest.

Coded it up here:
 https://github.com/netoptimizer/prototype-kernel/commit/75fe31ef62e
 https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/skb_array_parallel01.c

This parallel benchmark tries to keep two (or more) CPUs busy enqueuing
or dequeuing on the same skb_array queue.
It prefills the queue, and stops the test as soon as the queue runs
empty or full, or when it completes a given number of "loops"/cycles.

For two CPUs the results are really good:

 enqueue: 54 cycles(tsc)
 dequeue: 53 cycles(tsc)

Going to 4 CPUs, things break down (but that was not the primary
use-case?):

 CPU(0) 927 cycles(tsc) enqueue
 CPU(1) 921 cycles(tsc) dequeue
 CPU(2) 927 cycles(tsc) enqueue
 CPU(3) 898 cycles(tsc) dequeue

Next on my todo-list is to implement the same tests for e.g. alf_queue,
so we can compare the concurrency part (which is the important part).
But FYI, I'll be busy the next days at the conference
http://fosd2016.itu.dk/

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

[1] https://git.kernel.org/cgit/linux/kernel/git/mst/vhost.git/log/?h=vhost
[2] https://github.com/netoptimizer/prototype-kernel