[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <60DDAD73-D59E-449E-B85C-2A79F41C687D@drivenets.com>
Date: Wed, 21 Jul 2021 17:36:47 +0000
From: Gilad Naaman <gnaaman@...venets.com>
To: "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Possible GRE-TAP/ECT bug
Hey,
We’ve encountered a bug with handling of the ECT bits in GRE-tap encapsulated IPv4 traffic on Kernel 5.4.
This bug manifests when a packet in the following form enters the system:
+-----------+
|Inner IPv4 | ECT=0b00
+-----------+
|Ethernet |
+-----------+
|GRE |
+-----------+
|Outer IPv4 | ECT=0b01 | ECT=0b10
+-----------+
When inspected by tcpdump-ing the outer interface, the packet is fine;
when inspected by tcpdump-ing the GRE interface, a small amount of bits is changed in the source/destination addresses of the inner Ethernet header.
Upon further inspection, we found out that the bits changed in the ethernet headers correspond to the position of the ECT/Checksum fields in the IPv4 header.
In other words, some code path tries to copy the ECT bits from outer to the inner IPv4 header and correct the checksum, but it accidentally treats the ethernet header as an IP header.
Unfortunately this results in the loss of all IPv4 traffic.
This doesn’t happen if the internal protocol is IPv6 or IS-IS, or if the external ECT is 0.
We’ve tried to search for places that modify the `tos` field of an skb, and found just enough to say “yes, some code wants to sync the ECT bits”,
but not to find the right one.
Most prominently, xfrm does just that, but as far as I can tell it is not relevant to these packets.
(It is done by calling `ipv4_copy_dscp` from `xfrm4_remove_tunnel_encap`)
We’ve gone over the ip_tunnel.c code dmesg shows that `ip_tunnel: non-ECT from 10.254.3.1 with TOS=0x1` is printed for the packet mentioned below.
Would love if someone can direct me at where to look for this ECT copying code for further analysis.
Thank you!
=================================================
As an example to a modified packet, here’s an ICMP ping.
The sent packet is
```
<IP version=4 ihl=5 tos=0x1 id=0 flags= frag=0 ttl=64 proto=gre src=10.254.3.1 dst=10.254.1.6 |<GRE chksum_present=0 routing_present=0 key_present=1 seqnum_present=0 strict_route_source=0 recursion_control=0 flags=0 version=0 proto=TEB key=0x49000000 |<Ether dst=18:be:92:a0:ee:26 src=18:b0:92:a0:6c:26 type=IPv4 |<IP version=4 ihl=5 tos=0x0 id=35717 flags=DF frag=0 ttl=1 proto=icmp src=130.130.130.1 dst=130.130.130.2 |<ICMP type=echo-request code=0 id=0xa6eb seq=0x331e unused='' |<Raw load='\x1e\\xf3\\xf7`\x00\x00\x00\x00ZN\x00\x00\x00\x00\x00\x00\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@...DEFGHIJKLMNOPQRSTUVWXYZ' |>>>>>>
```
or in hex:
450100a700000000402f60250afe03010afe0106200065584900000018be92a0ee2618b092a06c2608004500007d8b8540000101e4f2828282018282820208006411a6eb331e1e5c7866335c78663760000000005a4e000000000000101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a
The received packet is
```
<Ether dst=18:bd:92:a0:ee:26 src=18:b0:92:a0:6c:27 type=IPv4 |<IP version=4 ihl=5 tos=0x0 len=125 id=35717 flags=DF frag=0 ttl=1 proto=icmp chksum=0xe4f2 src=130.130.130.1 dst=130.130.130.2 |<ICMP type=echo-request code=0 chksum=0x6411 id=0xa6eb seq=0x331e unused='' |<Raw load='\x1e\\xf3\\xf7`\x00\x00\x00\x00ZN\x00\x00\x00\x00\x00\x00\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@...DEFGHIJKLMNOPQRSTUVWXYZ' |>>>>
```
or
18bd92a0ee2618b092a06c2708004500007d8b8540000101e4f2828282018282820208006411a6eb331e1e5c7866335c78663760000000005a4e000000000000101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a
Note how the destination address changes from `18:be:92:a0:ee:26` to `18:bd:92:a0:ee:26`.
Where the lower 2 bits of the second octet are overwritten with the ECT `0b01`.
The source address is then changed from `18:b0:92:a0:6c:26` to `18:b0:92:a0:6c:27`, probably as an attempt to fix the checksum.
In practice it’s always incremented by 1, even though I’m not sure it results in a correct checksum.
If the MAC address is changed on the interface to match the ECT, or the other ECT is used in the packet, nothing changes and the packet is received correctly.
Powered by blists - more mailing lists