[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <eb07f943-9033-8da6-7958-8ecdc94d25ac@gmail.com>
Date: Wed, 27 Jun 2018 06:56:52 -0700
From: Eric Dumazet <eric.dumazet@...il.com>
To: Flavio Leitner <fbl@...hat.com>, netdev@...r.kernel.org
Cc: Eric Dumazet <eric.dumazet@...il.com>,
Paolo Abeni <pabeni@...hat.com>,
David Miller <davem@...emloft.net>,
Florian Westphal <fw@...len.de>,
netfilter-devel@...r.kernel.org
Subject: Re: [PATCH v2 net-next 2/2] skbuff: preserve sock reference when
scrubbing the skb.
On 06/27/2018 06:34 AM, Flavio Leitner wrote:
> The sock reference is lost when scrubbing the packet and that breaks
> TSQ (TCP Small Queues) and XPS (Transmit Packet Steering) causing
> performance impacts of about 50% in a single TCP stream when crossing
> network namespaces.
>
> XPS breaks because the queue mapping stored in the socket is not
> available, so another random queue might be selected when the stack
> needs to transmit something like a TCP ACK, or TCP Retransmissions.
> That causes packet re-ordering and/or performance issues.
Note we do not care if another random queue is selected when TCP retransmit
after timeout happens.
The problem is really when sending a normal train of packets (being retransmission
or not). We want all of them going through one queue to avoid reorders.
After an idle period (no packets are in any qdisc/NIC queue), we absolutely
are okay to select another "random queue".
This choice is driven by skb->ooo_okay
Most TCP ACK packets are sent while no prior packet is in qdisc, so should
have ooo_okay set to 1.
>
> TSQ breaks because it orphans the packet while it is still in the
> host, so packets are queued contributing to the buffer bloat problem.
>
> Preserving the sock reference fixes both issues. The socket is
> orphaned anyways in the receiving path before any relevant action
> and on TX side the netfilter checks if the reference is local before
> use it.
>
> Signed-off-by: Flavio Leitner <fbl@...hat.com>
> ---
> Documentation/networking/ip-sysctl.txt | 10 +++++-----
> net/core/skbuff.c | 1 -
> 2 files changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> index ce8fbf5aa63c..f4c042be0216 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -733,11 +733,11 @@ tcp_limit_output_bytes - INTEGER
> Controls TCP Small Queue limit per tcp socket.
> TCP bulk sender tends to increase packets in flight until it
> gets losses notifications. With SNDBUF autotuning, this can
> - result in a large amount of packets queued in qdisc/device
> - on the local machine, hurting latency of other flows, for
> - typical pfifo_fast qdiscs.
> - tcp_limit_output_bytes limits the number of bytes on qdisc
> - or device to reduce artificial RTT/cwnd and reduce bufferbloat.
> + result in a large amount of packets queued on the local machine
> + (e.g.: qdiscs, CPU backlog, or device) hurting latency of other
> + flows, for typical pfifo_fast qdiscs. tcp_limit_output_bytes
> + limits the number of bytes on qdisc or device to reduce artificial
> + RTT/cwnd and reduce bufferbloat.
> Default: 262144
>
> tcp_challenge_ack_limit - INTEGER
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index b1f274f22d85..f59e98ca72c5 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -4911,7 +4911,6 @@ void skb_scrub_packet(struct sk_buff *skb, bool xnet)
> return;
>
> ipvs_reset(skb);
> - skb_orphan(skb);
> skb->mark = 0;
> }
> EXPORT_SYMBOL_GPL(skb_scrub_packet);
>
Powered by blists - more mailing lists