Message-ID: <CA+FuTSfrox_gYhzhu8RyWbEiK9rnqSsQkbHpyxKyKT3T=_D16g@mail.gmail.com>
Date: Fri, 19 Apr 2013 16:11:46 -0400
From: Willem de Bruijn <willemb@...gle.com>
To: Stephen Hemminger <stephen@...workplumber.org>
Cc: netdev@...r.kernel.org, David Miller <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>
Subject: Re: [PATCH] rps: selective flow shedding during softnet overflow
On Fri, Apr 19, 2013 at 3:03 PM, Stephen Hemminger
<stephen@...workplumber.org> wrote:
> On Fri, 19 Apr 2013 13:46:52 -0400
> Willem de Bruijn <willemb@...gle.com> wrote:
>
>> A cpu executing the network receive path sheds packets when its input
>> queue grows to netdev_max_backlog. A single high rate flow (such as a
>> spoofed source DoS) can exceed a single cpu's processing rate and will
>> degrade throughput of other flows hashed onto the same cpu.
>>
>> This patch adds a finer-grained hashtable. If the netdev backlog
>> is above a threshold, IRQ cpus track the ratio of total traffic of
>> each flow (using 1024 buckets, configurable). The ratio is measured
>> by counting the number of packets per flow over the last 256 packets
>> from the source cpu. Any flow that occupies a large fraction of this
>> (set at 50%) will see packet drop while above the threshold.
>>
>> Tested:
>> Setup is a multi-threaded UDP echo server with network rx IRQ on cpu0,
>> kernel receive (RPS) on cpu0 and application threads on cpus 2--7
>> each handling 20k req/s. Throughput halves when hit with a 400 kpps
>> antagonist storm. With this patch applied, antagonist overload is
>> dropped and the server processes its complete load.
>>
>> The patch is effective when kernel receive processing is the
>> bottleneck. The above RPS scenario is an extreme case, but the same
>> state is reached with RFS and sufficient kernel processing (iptables,
>> packet socket tap, ..).
>>
>> Signed-off-by: Willem de Bruijn <willemb@...gle.com>
>
> The netdev_backlog only applies for RPS and non-NAPI devices.
> So this won't help if receive packet steering is not enabled.
> Seems like a deficiency in the receive steering design rather
> than the netdev_backlog.
The patch specifically intends to address a consequence of
perfect flow-hashing: that unbalanced input translates into cpu
load imbalance. It is less relevant to servers that do not use
flow hashing to spread traffic (i.e., no rps/rfs).
In normal server workloads, hashing works well, but it makes
machine state subject to external influence, in particular to
local resource exhaustion (partial DoS). This patch hardens
against extreme input patterns that should not occur in normal
workloads. The netdev backlog is the clearest indicator of
unsustainable load due to imbalance.
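
Roughly, the accounting described above amounts to something like
the following. This is a simplified, standalone sketch with
illustrative names, not the patch code itself:

/* Per input cpu: count packets per hash bucket over a sliding
 * window of the last 256 packets. While the input queue is above
 * the threshold, drop packets from any bucket that holds more
 * than half of the window.
 */
#include <stdbool.h>
#include <stdint.h>

#define FLOW_BUCKETS	1024	/* hashtable size, configurable */
#define FLOW_WINDOW	256	/* packets of history per cpu */

struct flow_limit {
	uint32_t count[FLOW_BUCKETS];	/* packets per bucket in window */
	uint32_t history[FLOW_WINDOW];	/* bucket of each recent packet */
	unsigned int pos;		/* next history slot to reuse */
};

/* Called per packet on the input cpu; returns true to drop. */
static bool flow_over_limit(struct flow_limit *fl, uint32_t rxhash,
			    bool backlog_over_threshold)
{
	uint32_t cur = rxhash & (FLOW_BUCKETS - 1);
	uint32_t old = fl->history[fl->pos];

	/* slide the window: forget the oldest packet, record this one */
	if (fl->count[old])
		fl->count[old]--;
	fl->history[fl->pos] = cur;
	fl->pos = (fl->pos + 1) & (FLOW_WINDOW - 1);
	fl->count[cur]++;

	/* shed only while the backlog signals unsustainable load */
	return backlog_over_threshold && fl->count[cur] > FLOW_WINDOW / 2;
}

A well-behaved flow never comes close to half of the window, so
only the antagonist sees drops once the backlog threshold is
crossed.
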
> Can't you do this with existing ingress stuff?
> The trend seems to be put in more fixed infrastructure to deal with
> performance and server problems rather than building general purpose
> solutions.
This isn't necessarily mutually exclusive with iptables/policing/..
mechanisms to filter out bad flows, of course. The earlier in the
pipeline packets are dropped, the fewer cycles are spent, so this
is another layer of (early) defense.
For instance, I recently sent a patch to handle load imbalance in
packet sockets. Those socket queues fill up if the application
threads are the bottleneck instead of the kernel receive path, so
this rps fix would not be relevant there.
>