Date:	Fri, 22 Aug 2014 15:29:15 +0800
From:	Jason Wang <jasowang@...hat.com>
To:	Mike Galbraith <umgwanakikbuti@...il.com>
CC:	davem@...emloft.net, netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org, mst@...hat.com,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...e.hu>
Subject: Re: [PATCH net-next 2/2] net: exit busy loop when another process
 is runnable

On 08/22/2014 01:01 PM, Mike Galbraith wrote:
> On Thu, 2014-08-21 at 16:05 +0800, Jason Wang wrote: 
>> > Rx busy loop does not scale well in the case when several parallel
>> > sessions is active. This is because we keep looping even if there's
>> > another process is runnable. For example, if that process is about to
>> > send packet, keep busy polling in current process will brings extra
>> > delay and damage the performance.
>> > 
>> > This patch solves this issue by exiting the busy loop when there's
>> > another process is runnable in current cpu. Simple test that pin two
>> > netperf sessions in the same cpu in receiving side shows obvious
>> > improvement:
> That patch says to me it's a bad idea to spin when someone (anyone) else
> can get some work done on a CPU, which intuitively makes sense.  But..
>
> (ponders net goop: with silly 1 byte ping-pong load, throughput is bound
> by fastpath latency, net plus sched plus fixable nohz and governor crud
> if not polling, so you can't get a lot of data moved byte at a time no
> matter how sexy the pipe whether polling or not due to bound.  If OTOH
> net hardware is a blazing fast large bore packet cannon, net overhead
> per unit payload drops, sched+crud is a constant)

Polling can be done either by the rx busy loop in process context or by
NAPI in softirq. The rx busy loop only spins and polls when no packets
were found in the socket receive queue. It spins in the hope that at
least one packet will arrive shortly (at which point the process exits
the rx busy loop). In this way, it eliminates the overheads of NAPI,
wakeup and scheduling. This patch just makes the busy polling less
aggressive: since the process has found nothing to receive while it is
still spinning in this loop, there's no need to waste cpu cycles (or
even call cpu_relax()) if there is other work that could be done by the
current CPU.
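
As a rough user-space illustration of the exit conditions described
above (this is not the kernel code from the patch; sim_cpu, sim_tick and
sim_busy_poll are hypothetical names made up for this sketch, and the
"iteration" counter merely stands in for time):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state the loop looks at; not a kernel type. */
struct sim_cpu {
	size_t rx_queue_len;	/* packets in the socket receive queue */
	unsigned int nr_running;/* runnable tasks on this CPU, including us */
	int arrive_at;		/* iteration at which a packet arrives, -1 = never */
	int wake_at;		/* iteration at which another task wakes, -1 = never */
};

enum poll_result { POLL_GOT_PACKET, POLL_YIELDED, POLL_TIMED_OUT };

/* Simulate the outside world: packet arrival and another task waking up. */
static void sim_tick(struct sim_cpu *cpu, int iter)
{
	if (iter == cpu->arrive_at)
		cpu->rx_queue_len++;
	if (iter == cpu->wake_at)
		cpu->nr_running++;
}

/*
 * Spin for at most 'budget' iterations. Stop early when a packet shows up
 * (the normal exit), or when another task is runnable on this CPU (the
 * additional exit discussed in this thread). '*spins' reports how many
 * iterations were spent before the loop ended.
 */
enum poll_result sim_busy_poll(struct sim_cpu *cpu, int budget, int *spins)
{
	int iter;

	for (iter = 0; iter < budget; iter++) {
		sim_tick(cpu, iter);
		if (cpu->rx_queue_len > 0) {
			*spins = iter;
			return POLL_GOT_PACKET;
		}
		if (cpu->nr_running > 1) {
			*spins = iter;
			return POLL_YIELDED;
		}
	}
	*spins = budget;
	return POLL_TIMED_OUT;
}
```

In this sketch, a caller that gets POLL_YIELDED would fall back to the
ordinary blocking receive path, so the other runnable task gets the CPU
instead of the remaining spin budget being burned.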

For a stream workload like the one you mentioned here, if the card is
fast enough, the socket receive queue is not easily drained. The rx busy
loop won't help, or won't even be triggered, in this case.
>
> Seems the only time it's a good idea to poll is if blasting big packets
> on sexy hardware, and if you're doing that, you want to poll regardless
> of whether somebody else is waiting, or?

NAPI will do the work instead of the rx busy loop in this case. It will
poll and try to drain the nic's rx ring in softirq regardless of whether
somebody else is waiting.

Btw, the current rx busy loop does not perform well on stream workloads
since it bypasses GRO to reduce latency. But this issue is beyond the
scope of this patch.