Date:	Tue, 10 Nov 2009 22:53:06 -0800
From:	Tom Herbert <therbert@...gle.com>
To:	David Miller <davem@...emloft.net>, netdev@...r.kernel.org
Subject: [PATCH 0/2] rps: Receive packet steering

This is the third version of the patch that implements software receive-side
packet steering (RPS).  RPS distributes the load of received packet
processing across multiple CPUs.  This version allows per-NAPI steering maps
and can use a HW-provided hash as an optimization.

These patches are also the basis for the per-flow steering that was
previously discussed; we are still working on a general solution for
per-flow steering that prevents out-of-order (OOO) packets.

Problem statement: Protocol processing done in the NAPI context for received
packets is serialized per device queue and becomes a bottleneck under high
packet load.  This substantially limits the packets per second (pps) that can
be achieved on a single-queue NIC and provides no scaling with multiple cores.

This solution queues packets early in the receive path onto the backlog
queues of other CPUs.  This allows protocol processing (e.g. IP and TCP) to
be performed on packets in parallel.  For each device (or NAPI instance for
a multi-queue device), a mask of CPUs is set to indicate the CPUs that can
process packets for the device.  A CPU is selected on a per-packet basis by
hashing the contents of the packet header (the TCP or UDP 4-tuple) and using
the result to index into the CPU mask.  The IPI mechanism is used to raise
networking receive softirqs between CPUs.  This effectively emulates in
software what a multi-queue NIC can provide, but is generic, requiring no
device support.
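
To make the selection step concrete, here is a minimal sketch of the
per-packet CPU selection described above.  This is illustrative only, not
the patch code itself; the names (struct rps_map, get_rps_cpu) and the use
of jhash are assumptions for the example:

#include <linux/types.h>
#include <linux/jhash.h>

/* Hypothetical per-device (or per-NAPI) steering map. */
struct rps_map {
        unsigned int len;       /* number of CPUs in the map */
        u16 cpus[];             /* CPUs that may process this device's packets */
};

/* Pick a CPU for one packet from its TCP/UDP 4-tuple. */
static int get_rps_cpu(const struct rps_map *map,
                       __be32 saddr, __be32 daddr,
                       __be16 sport, __be16 dport)
{
        u32 hash;

        if (!map || !map->len)
                return -1;      /* steering not configured */

        /* Hash the 4-tuple so all packets of a flow land on one CPU. */
        hash = jhash_3words((__force u32)saddr, (__force u32)daddr,
                            ((u32)(__force u16)sport << 16) |
                            (u32)(__force u16)dport, 0);

        /* Scale the 32-bit hash into an index into the CPU list. */
        return map->cpus[((u64)hash * map->len) >> 32];
}

Keeping the hash flow-stable is what keeps a flow's packets in order on a
single CPU while different flows spread across the mask.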

Many devices now provide a hash over the 4-tuple on a per-packet basis
(Toeplitz is popular).  This patch allows drivers to set the HW-reported hash
in an skb field, and that value in turn is used to index into the RPS maps.
Using the HW-generated hash can avoid cache misses on the packet when
steering the packet to a remote CPU.
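
Sketching the intent (the skb field name and the helpers below are
illustrative assumptions, not necessarily the identifiers in the patch): a
driver copies the RSS hash out of its receive descriptor, and the steering
code then prefers it over computing one in software:

#include <linux/skbuff.h>

/* Hypothetical receive descriptor layout. */
struct example_rx_desc {
        __le32 rss_hash;        /* Toeplitz hash computed by the NIC */
};

static u32 software_hash(const struct sk_buff *skb);    /* hypothetical fallback */

/* Driver receive path: record the hash the NIC already computed. */
static void example_record_hw_hash(const struct example_rx_desc *desc,
                                   struct sk_buff *skb)
{
        skb->rxhash = le32_to_cpu(desc->rss_hash);
}

/* Steering path: using the HW hash avoids reading packet headers (and
 * the cache misses that go with it) on the steering CPU. */
static u32 rx_hash_for_steering(const struct sk_buff *skb)
{
        return skb->rxhash ? skb->rxhash : software_hash(skb);
}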

The CPU mask is set on a per-device basis in the sysfs variable
/sys/class/net/<device>/rps_cpus.  This is a set of canonical hexadecimal
bitmaps, one for each NAPI instance of the device.  For example:

echo "0b 0b0 0b00 0b000" > /sys/class/net/eth0/rps_cpus

would set maps for four NAPI instances on eth0.
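
Each map entry is read as a hexadecimal CPU bitmap, so "0b" covers CPUs 0,
1 and 3, "0b0" covers CPUs 4, 5 and 7, and so on.  A trivial userspace
sketch of the expansion (illustrative only):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
        /* "0b0" is the mask for the second NAPI instance above. */
        unsigned long mask = strtoul("0b0", NULL, 16);  /* 0xb0 = 10110000 */
        int cpu;

        for (cpu = 0; cpu < (int)(8 * sizeof(mask)); cpu++)
                if (mask & (1UL << cpu))
                        printf("CPU %d may process packets\n", cpu);
        return 0;       /* prints CPUs 4, 5 and 7 */
}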

The first patch in this set adds the RPS functionality to the core networking.

The second patch adds support to the bnx2x driver to record the Toeplitz hash
reported by the device for received skbs.

Generally, we have found this technique increases the pps capability of a
single-queue device with good CPU utilization.  Optimal settings for the CPU
mask seem to depend on the architecture and cache hierarchy.  Below are some
results from running 700 instances of the netperf TCP_RR test with 1-byte
requests and responses.  Results show the cumulative transaction rate and
system CPU utilization.

tg3 on 8 core Intel
  Without RPS: 90K tps at 34% CPU
  With RPS:    285K tps at 70% CPU

e1000 on 8 core Intel
  Without RPS: 90K tps at 34% CPU
  With RPS:    292K tps at 66% CPU

forcedeth on 16 core AMD
  Without RPS: 117K tps at 10% CPU
  With RPS:    327K tps at 29% CPU

bnx2x on 16 core AMD
  Single queue without RPS:        139K tps at 17% CPU
  Single queue with RPS:           352K tps at 30% CPU
  Multi queue (1 queue per CPU):   204K tps at 12% CPU

We have been running a variant of this patch on production servers for a
while with good results.  In some of our more networking-intensive
applications we have seen 30-50% gains in end application performance.

Tom

Signed-off-by: Tom Herbert <therbert@...gle.com>
