Date:	Wed, 14 Apr 2010 10:31:34 -0700
From:	Tom Herbert <therbert@...gle.com>
To:	hadi@...erus.ca
Cc:	Eric Dumazet <eric.dumazet@...il.com>, netdev@...r.kernel.org,
	robert@...julf.net, David Miller <davem@...emloft.net>,
	Changli Gao <xiaosuo@...il.com>,
	Andi Kleen <andi@...stfloor.org>
Subject:	Re: rps performance WAS (Re: rps: question)

The point of RPS is to increase parallelism, but the cost of that is
more overhead per packet.  If you are running a single flow, you'll
see latency increase for that flow.  With more concurrent flows the
benefits of parallelism kick in and latency gets better; we've seen
the break-even point at around ten connections in our tests.  Also, I
don't think we've claimed that RPS should generally perform better
than multi-queue; the primary motivation for RPS is to make
single-queue NICs give reasonable performance.
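
To make the parallelism point concrete, here is a toy sketch (not the
kernel's code; pick_cpu is a made-up stand-in) of the idea behind RPS:
hash each packet's flow identity and use the hash to pick a CPU, so a
single flow always lands on one CPU while many concurrent flows spread
across several.  The mix function below is a placeholder; the kernel
uses a real jhash over the flow keys.

  #include <stdint.h>
  #include <stdio.h>

  static unsigned int pick_cpu(uint32_t saddr, uint32_t daddr,
                               uint16_t sport, uint16_t dport,
                               unsigned int ncpus)
  {
          uint32_t h = saddr ^ daddr ^ ((uint32_t)sport << 16 | dport);

          /* stand-in mixer, not the kernel's hash */
          h ^= h >> 16;
          h *= 0x45d9f3bu;
          h ^= h >> 16;
          return h % ncpus;
  }

  int main(void)
  {
          /* same flow -> same CPU every time; different flows spread */
          printf("flow A -> cpu %u\n",
                 pick_cpu(0x0a000001, 0x0a000002, 1234, 80, 4));
          printf("flow B -> cpu %u\n",
                 pick_cpu(0x0a000003, 0x0a000002, 5678, 80, 4));
          return 0;
  }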


On Wed, Apr 14, 2010 at 4:53 AM, jamal <hadi@...erus.ca> wrote:
> Following up as promised:
>
> On Mon, 2010-02-08 at 10:09 -0500, jamal wrote:
>> On Sun, 2010-02-07 at 21:58 -0800, Tom Herbert wrote:
>>
>> > I don't have specific numbers, although we are using this on an
>> > application doing forwarding, and the numbers seem in line with what
>> > we see for an end host.
>> >
>>
>> When I get the chance I will give it a run. I have access to an i7
>> somewhere. It seems like I need some specific NICs?
>
> I did step #0 last night on an i7 (single Nehalem). More than
> anything, I was impressed by the Nehalem's excellent caching system.
> Robert, I am almost tempted to say skb recycling performance will be
> excellent on this machine, given that the cost of a cache miss is much
> lower than on previous-generation hardware.
>
> My test was simple: IRQ affinity on cpu0 (core 0) and RPS redirection
> to cpu1 (core 1); I also tried redirecting to different SMT threads
> (aka CPUs) on different cores, with similar results. I baselined
> against no RPS being used and against a kernel which didn't have the
> RPS config turned on at all.
> [BTW, I had to hand-edit the .config since I couldn't do it from
> menuconfig (is there any reason for it to be so?)]
>
> Traffic was sent from another machine into the i7 via an el-cheapo
> sky2 (I don't know how shitty this NIC is, but it seems to know how to
> do MSI, so it is probably capable of multiqueueing); the test was
> several sets of a plain ping first and then a ping -f (I will get more
> sophisticated in my next test, likely this weekend).
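>
> The affinity part of that setup is just two hex-mask writes, one to
> the NIC's IRQ under /proc/irq/ and one to the per-rx-queue rps_cpus
> file. A minimal sketch, assuming the sysfs layout from the RPS patches
> (the IRQ number 27 and device name eth0 are made up; read the real IRQ
> from /proc/interrupts):
>
>   #include <stdio.h>
>
>   static int write_mask(const char *path, unsigned int mask)
>   {
>           FILE *f = fopen(path, "w");
>
>           if (!f) {
>                   perror(path);
>                   return -1;
>           }
>           fprintf(f, "%x\n", mask);
>           fclose(f);
>           return 0;
>   }
>
>   int main(void)
>   {
>           /* cpu0 (mask 0x1) takes the sky2 interrupt... */
>           write_mask("/proc/irq/27/smp_affinity", 1 << 0);
>           /* ...and RPS redirects the work to cpu1 (mask 0x2). */
>           write_mask("/sys/class/net/eth0/queues/rx-0/rps_cpus", 1 << 1);
>           return 0;
>   }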
>
> Results:
> CPU utilization was about 20-30% higher in the RPS case. On cpu0 the
> CPU was being chewed heavily by sky2_poll, and on the redirected-to
> core it was always smp_call_function_single.
> Latency was consistently about 5 microseconds higher on average with
> RPS: sending 1M ping -f packets took on average 176 seconds round-trip
> without RPS and 181 seconds with RPS (the arithmetic is spelled out
> below).
> Throughput didn't change, but this could be attributed to the small
> amounts of data I was sending.
> I observed that we were generating, on average, one IPI per packet,
> even with ping -f (I added an extra stat to record when we sent an IPI
> and counted it against the number of packets sent).
> In my opinion it is these IPIs that contribute the most to the added
> latency, and I think the Nehalem just happens to be highly improved in
> this area. I wish I had a more commonly used machine to test RPS on. I
> expect RPS to perform worse on cheaper/older hardware for the traffic
> characteristics I tested.
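>
> Spelled out, the arithmetic behind the 5 microsecond figure above:
>
>   #include <stdio.h>
>
>   int main(void)
>   {
>           double secs_no_rps = 176.0, secs_rps = 181.0;
>           double round_trips = 1e6;       /* 1M ping -f packets */
>
>           /* 5 extra seconds over 1M round trips = 5 us each */
>           printf("RPS adds %.1f us per round trip\n",
>                  (secs_rps - secs_no_rps) / round_trips * 1e6);
>           return 0;
>   }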
>
> On IPIs:
> Is anyone familiar with what is going on with Nehalem? Why is it this
> good? I expect things will get a lot nastier with other hardware, like
> Xeon-based machines, or even a Nehalem with RPS going across QPI.
> Here's why I think IPIs are bad, please correct me if I am wrong (a
> simplified sketch of the path I mean follows the list):
> - they are synchronous, i.e. an IPI issuer has to wait for an ACK
>   (which is in the form of an IPI)
> - the data cache has to be synced to main memory
> - the instruction pipeline is flushed
> - what else did I miss? Andi?
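>
> To make the mechanism concrete, here is a grossly simplified sketch of
> the path I mean; it is not the actual net/core/dev.c code (which, as I
> understand it, uses a per-queue call_single_data and the non-waiting
> form of the call):
>
>   #include <linux/smp.h>
>   #include <linux/interrupt.h>
>
>   /* Runs on the remote CPU, in IPI context. */
>   static void remote_backlog_kick(void *info)
>   {
>           __raise_softirq_irqoff(NET_RX_SOFTIRQ);
>   }
>
>   /* Called on the CPU that took the interrupt, after the skb has
>    * been queued on the remote CPU's backlog: one packet, one IPI,
>    * which is exactly what my extra stat showed. */
>   static void rps_notify(int cpu)
>   {
>           smp_call_function_single(cpu, remote_backlog_kick, NULL, 0);
>   }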
>
> So my question to Tom, Eric and Changli, or anyone else who has been
> running RPS:
> What hardware did you use? Is anyone using hardware older than, say,
> an AMD Opteron or an Intel Nehalem?
>
> My impressions of RPS so far:
> I think I may yet end up impressed once I generate a lot more traffic,
> since the cost of the IPIs will be amortized.
> At this point multiqueue seems a much more impressive alternative, and
> it seems to me multiqueue hardware is a lot more commodity
> (price-point-wise) than a Nehalem.
>
> Plan:
> I still plan to attack the app space: write a basic UDP app that binds
> to one or more RPS CPUs and try blasting a lot of UDP traffic at it to
> see what happens. My step after that is to move on to forwarding
> tests.
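>
> Something along these lines for the receiver side; the CPU number,
> port and buffer size are arbitrary picks for illustration:
>
>   #define _GNU_SOURCE
>   #include <sched.h>
>   #include <stdio.h>
>   #include <netinet/in.h>
>   #include <sys/socket.h>
>
>   int main(void)
>   {
>           cpu_set_t set;
>           struct sockaddr_in addr = {
>                   .sin_family = AF_INET,
>                   .sin_port   = htons(9000),   /* test port */
>           };
>           char buf[2048];
>           unsigned long pkts = 0;
>           int fd;
>
>           /* Pin ourselves to cpu1, the RPS target CPU. */
>           CPU_ZERO(&set);
>           CPU_SET(1, &set);
>           sched_setaffinity(0, sizeof(set), &set);
>
>           fd = socket(AF_INET, SOCK_DGRAM, 0);
>           bind(fd, (struct sockaddr *)&addr, sizeof(addr));
>
>           /* Drain datagrams as fast as possible, counting them. */
>           for (;;) {
>                   recv(fd, buf, sizeof(buf), 0);
>                   if (++pkts % 1000000 == 0)
>                           printf("%lu packets\n", pkts);
>           }
>   }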
>
> cheers,
> jamal
>
>
