lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Tue, 15 Feb 2022 07:35:10 -0800 From: Eric Dumazet <edumazet@...gle.com> To: Willem de Bruijn <willemdebruijn.kernel@...il.com> Cc: Tao Liu <thomas.liu@...oud.cn>, David Miller <davem@...emloft.net>, Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>, David Ahern <dsahern@...nel.org>, Jakub Kicinski <kuba@...nel.org>, "Samudrala, Sridhar" <sridhar.samudrala@...el.com>, Network Development <netdev@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org> Subject: Re: [PATCH] gso: do not skip outer ip header in case of ipip and net_failover On Tue, Feb 15, 2022 at 7:01 AM Willem de Bruijn <willemdebruijn.kernel@...il.com> wrote: > > On Mon, Feb 14, 2022 at 8:38 PM Tao Liu <thomas.liu@...oud.cn> wrote: > > > > Sorry to resend it. > > > > 2022年2月14日 12:27,Willem de Bruijn <willemdebruijn.kernel@...il.com> 写道: > > > > On Sun, Feb 13, 2022 at 11:03 PM Tao Liu <thomas.liu@...oud.cn> wrote: > > > > > > Sorry for bothering, just repost it. > > > > 2022年2月14日 09:28,Willem de Bruijn <willemdebruijn.kernel@...il.com> 写道: > > > > On Sun, Feb 13, 2022 at 10:10 AM Tao Liu <thomas.liu@...oud.cn> wrote: > > > > > > We encouter a tcp drop issue in our cloud environment. Packet GROed in host > > forwards to a VM virtio_net nic with net_failover enabled. VM acts as a > > IPVS LB with ipip encapsulation. The full path like: > > host gro -> vm virtio_net rx -> net_failover rx -> ipvs fullnat > > -> ipip encap -> net_failover tx -> virtio_net tx > > > > When net_failover transmits a ipip pkt (gso_type = 0x0103), there is no gso > > performed because it supports TSO and GSO_IPXIP4. But network_header has > > been pointing to inner ip header. > > > > > > If the packet is configured correctly, and net_failover advertises > > that it can handle TSO packets with IPIP encap, then still virtio_net > > should not advertise it and software GSO be applied on its > > dev_queue_xmit call. > > > > This is assuming that the packet not only has SKB_GSO_IPXIP4 correctly > > set, but also tunneling fields like skb->encapsulated and > > skb_inner_network_header. > > > > Thanks very much for your comment! > > > > Yes, the packet is correct. Another thing i have not pointed directly is > > that the pkt has SKB_GSO_DODGY. net_failover do not advertises GSO_ROBUST > > but virtio_net do. > > > > > > If net_failover does not advertise NETIF_F_GSO_ROBUST, then > > tcp_gso_segment will pass a packet with SKB_GSO_DODGY to the > > software gso stack, not taking the branch > > > > if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) { > > > > As i tested, packet with SKB_GSO_DODGY hits this branch. packet's gso_type=0x0103, which > > means SKB_GSO_TCPV4, SKB_GSO_DODGY and SKB_GSO_IPXIP4. net_failover matches > > the condition. > > > > Consequently, tcp_gso_segment returns NULL, there is no software gso did here. And > > network_header points to inner iph. > > > > Software gso is did by virtio_net which not advertises NETIF_F_GSO_IPXIP4. It skips the outer > > iph, and keeps it unchanged. > > > > --- > > net/ipv4/af_inet.c | 10 +++++++++- > > 1 file changed, 9 insertions(+), 1 deletion(-) > > > > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c > > index 9c465ba..f8b3f8a 100644 > > --- a/net/ipv4/af_inet.c > > +++ b/net/ipv4/af_inet.c > > @@ -1425,10 +1425,18 @@ struct sk_buff *inet_gso_segment(struct sk_buff *skb, > > static struct sk_buff *ipip_gso_segment(struct sk_buff *skb, > > netdev_features_t features) > > { > > + struct sk_buff *segs; > > + int nhoff; > > + > > if (!(skb_shinfo(skb)->gso_type & SKB_GSO_IPXIP4)) > > return ERR_PTR(-EINVAL); > > > > - return inet_gso_segment(skb, features); > > + nhoff = skb_network_header(skb) - skb_mac_header(skb); > > + segs = inet_gso_segment(skb, features); > > + if (!segs) > > + skb->network_header = skb_mac_header(skb) + nhoff - skb->head; > > + > > + return segs; > > } > > > > > > If this would be needed for IPIP, then the same would be needed for SIT, etc. > > > > Is the skb_network_header > > > > 1. correctly pointing to the outer header of the TSO packet before the > > call to inet_gso_segment > > 2. incorrectly pointing to the inner header of the (still) TSO packet > > after the call to inet_gso_segment > > > > inet_gso_segment already does the same operation: save nhoff, pull > > network header, call callbacks.gso_segment (which can be > > ipip_gso_segment->inet_gso_segment), then place the network header > > back at nhoff. > > > > values print in skb_mac_gso_segment() before callbacks.gso_segment: > > ipip: vlan_depth=0 mac_len=0 skb->network_header=206 > > net_failover: vlan_depth=14 mac_len=14 skb->network_header=186 > > virtio_net: vlan_depth=34 mac_len=34 skb->network_header=206 > > > > agree to add sit/ip4ip6/ip6ip6, and patch can be simplified as: > > > > > > If IPIP GSO was so broken, I think we would have found it long before. > > > > As said, inet_gso_segment should already do the right thing for ipip: > > it will be called twice. > > > > > > SKB_GSO_DODGY flag and net_failover conduct this issue. local traffic just works fine. > > Got it. That is an uncommon combination. SKB_GSO_DODGY is set from > external virtio_net, which does not support tunnels. But a path with > an added tunnel might cause this combination. > > And inet_gso_segment resets the network header, both times, before > calling callbacks.gso_segment() > > skb_reset_network_header(skb); > nhoff = skb_network_header(skb) - skb_mac_header(skb); > > [...] > > if (likely(ops && ops->callbacks.gso_segment)) > segs = ops->callbacks.gso_segment(skb, features); > > And resets that after for each skb in segs. > > skb = segs; > do { > [...] > skb->network_header = (u8 *)iph - skb->head; > > But does not do this if segs == NULL. > > The packet has to be restored before it is passed to the device. I > think we have to handle this case correctly in inet_gso_segment, > instead of patching it up in all the various tunnel devices. > > The same holds for ipv6_gso_segment. Back in the days, GRO was modified so that we passed a context (nhoff) in called functions, instead of changing skb offsets. The concept of outer/inner header only works with 1 encap. Perhaps it is time to do the same in GSO, to allow arbitrary levels of encapsulation. Then we no longer mess with these limited 'network_header/inner_network_header' fields in the skb. Stuffing state in the skb has been a mistake I think.
Powered by blists - more mailing lists