Message-ID: <ad564469-c999-3658-d94c-07301702ad27@ssi.bg>
Date: Mon, 12 Jun 2023 16:51:29 +0300 (EEST)
From: Julian Anastasov <ja@....bg>
To: Terin Stock <terin@...udflare.com>
cc: horms@...ge.net.au, netdev@...r.kernel.org, lvs-devel@...r.kernel.org,
kernel-team@...udflare.com, pablo@...filter.org, hengqing.hu@...il.com,
kuba@...nel.org, netfilter-devel@...r.kernel.org, fw@...len.de,
coreteam@...filter.org, davem@...emloft.net, kadlec@...filter.org,
pabeni@...hat.com, edumazet@...gle.com
Subject: Re: [PATCH v2] ipvs: align inner_mac_header for encapsulation
Hello,
On Fri, 9 Jun 2023, Terin Stock wrote:
> When using encapsulation the original packet's headers are copied to the
> inner headers. This preserves the space for an inner mac header, which
> is not used by the inner payloads for the encapsulation types supported
> by IPVS. If a packet is using GUE or GRE encapsulation and needs to be
> segmented, flow can be passed to __skb_udp_tunnel_segment() which
> calculates a negative tunnel header length. A negative tunnel header
> length causes pskb_may_pull() to fail, dropping the packet.
>
> This can be observed by attaching probes to ip_vs_in_hook(),
> __dev_queue_xmit(), and __skb_udp_tunnel_segment():
>
> perf probe --add '__dev_queue_xmit skb->inner_mac_header \
> skb->inner_network_header skb->mac_header skb->network_header'
> perf probe --add '__skb_udp_tunnel_segment:7 tnl_hlen'
> perf probe -m ip_vs --add 'ip_vs_in_hook skb->inner_mac_header \
> skb->inner_network_header skb->mac_header skb->network_header'
>
> These probes print the headers and tunnel header length for packets which
> traverse the IPVS encapsulation path. A TCP packet can be forced into
> the segmentation path by being smaller than a calculated clamped MSS,
> but larger than the advertised MSS.
>
> probe:ip_vs_in_hook: inner_mac_header=0x0 inner_network_header=0x0 mac_header=0x44 network_header=0x52
> probe:ip_vs_in_hook: inner_mac_header=0x44 inner_network_header=0x52 mac_header=0x44 network_header=0x32
> probe:dev_queue_xmit: inner_mac_header=0x44 inner_network_header=0x52 mac_header=0x44 network_header=0x32
> probe:__skb_udp_tunnel_segment_L7: tnl_hlen=-2
>
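As a rough cross-check of the probe values quoted above, the -2 can be
reproduced with plain offset arithmetic. This is only a sketch: it assumes
the outer header is a 20-byte IPv4 header without options (so the outer UDP
header starts at network_header + 20), and it models the tunnel header
length as the inner mac header offset minus the transport header offset.
The helper names are made up for illustration and are not kernel symbols.

```c
/* Offsets from skb->head, copied from the probe output quoted above. */
enum {
    PROBE_INNER_MAC_HEADER     = 0x44,
    PROBE_INNER_NETWORK_HEADER = 0x52,
    PROBE_OUTER_NETWORK_HEADER = 0x32,
    ASSUMED_OUTER_IPV4_HLEN    = 20    /* assumption: IPv4, no options */
};

/* Offset of the outer UDP header, which follows the outer IPv4 header. */
int outer_transport_offset(void)
{
    return PROBE_OUTER_NETWORK_HEADER + ASSUMED_OUTER_IPV4_HLEN;
}

/* Tunnel header length modelled as a difference of two head offsets. */
int tnl_hlen(int inner_mac_header, int transport_header)
{
    return inner_mac_header - transport_header;
}
```

Under these assumptions, tnl_hlen(PROBE_INNER_MAC_HEADER,
outer_transport_offset()) is 0x44 - 0x46 = -2, matching the probe. Passing
PROBE_INNER_NETWORK_HEADER instead gives 0x52 - 0x46 = 12, which is
consistent with an 8-byte UDP header followed by a 4-byte GUE header.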
> When using veth-based encapsulation, the interfaces are set to be
> mac-less, which does not preserve space for an inner mac header. This
> prevents this issue from occurring.
>
> In our real-world testing of sending a 32KB file we observed operation
> time increasing from ~75ms for veth-based encapsulation to over 1.5s
> using IPVS encapsulation due to retries from dropped packets.
>
> This changeset modifies the packet on the encapsulation path in
> ip_vs_tunnel_xmit() and ip_vs_tunnel_xmit_v6() to remove the inner mac
> header offset. This fixes UDP segmentation for both encapsulation types,
> and corrects the inner headers for any IPIP flows that may use it.
>
> Fixes: 84c0d5e96f3a ("ipvs: allow tunneling with gue encapsulation")
> Signed-off-by: Terin Stock <terin@...udflare.com>
Looks good to me for nf/net tree, thanks!
Acked-by: Julian Anastasov <ja@....bg>
> ---
> net/netfilter/ipvs/ip_vs_xmit.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
> index c7652da78c88..9193e109e6b3 100644
> --- a/net/netfilter/ipvs/ip_vs_xmit.c
> +++ b/net/netfilter/ipvs/ip_vs_xmit.c
> @@ -1207,6 +1207,7 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
> skb->transport_header = skb->network_header;
>
> skb_set_inner_ipproto(skb, next_protocol);
> + skb_set_inner_mac_header(skb, skb_inner_network_offset(skb));
>
> if (tun_type == IP_VS_CONN_F_TUNNEL_TYPE_GUE) {
> bool check = false;
> @@ -1349,6 +1350,7 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
> skb->transport_header = skb->network_header;
>
> skb_set_inner_ipproto(skb, next_protocol);
> + skb_set_inner_mac_header(skb, skb_inner_network_offset(skb));
>
> if (tun_type == IP_VS_CONN_F_TUNNEL_TYPE_GUE) {
> bool check = false;
> --
> 2.40.1
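The effect of the added line in the quoted diff can be illustrated with a
small userspace model. This is a sketch only: struct mock_skb and the
mock_* helpers are hypothetical stand-ins for the sk_buff fields and
accessors involved, and the position of the data pointer at the time of the
call is an assumption. The result does not depend on it, because the offset
is taken relative to data and then added back to data.

```c
#include <stdint.h>

/* Hypothetical stand-in for the sk_buff fields involved; all values are
 * offsets from the buffer head, as in the probe output. */
struct mock_skb {
    uint16_t data;                  /* offset of the data pointer */
    uint16_t inner_mac_header;
    uint16_t inner_network_header;
};

/* Models skb_inner_network_offset(): inner network header relative to data. */
int mock_inner_network_offset(const struct mock_skb *skb)
{
    return skb->inner_network_header - skb->data;
}

/* Models skb_set_inner_mac_header(): data offset plus the given offset. */
void mock_set_inner_mac_header(struct mock_skb *skb, int offset)
{
    skb->inner_mac_header = (uint16_t)(skb->data + offset);
}

/* The line added by the patch, applied to the mock. */
void mock_apply_fix(struct mock_skb *skb)
{
    mock_set_inner_mac_header(skb, mock_inner_network_offset(skb));
}
```

Starting from inner_mac_header=0x44 and inner_network_header=0x52,
mock_apply_fix() leaves inner_mac_header equal to inner_network_header
(0x52), so no space for an inner mac header remains between the tunnel
header and the inner IP header.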
Regards
--
Julian Anastasov <ja@....bg>