Message-ID: <16469_1650744138_62645B4A_16469_436_1_4756cc37-340b-f2f6-e004-0d77573f33df@orange.com>
Date: Sat, 23 Apr 2022 22:02:18 +0200
From: <alexandre.ferrieux@...nge.com>
To: <netdev@...r.kernel.org>
Subject: Zero-Day bug in VLAN offloading + cooked AF_PACKET
Hi,
I know the subject sounds like this belongs in libpcap bug reports; indeed it
started there [1]. However, after some digging, it really looks like there's an
issue in what the kernel itself provides.
TL;DR: outgoing VLAN-tagged traffic to non-offloaded interfaces is captured as
corrupted in cooked mode, and has been so since at least 3.4...
One popular way of doing captures with libpcap-based tools like tcpdump is the
so-called "cooked mode". This is what you get with "tcpdump -i any". The kernel
API used for this, documented in packet(7), is a socket of family AF_PACKET and
type SOCK_DGRAM. Unlike SOCK_RAW, SOCK_DGRAM provides a kind of
"near L3" abstraction, stripping most of the L2 headers from the original
packets. For example, when using recvmsg(),
- the .msg_iov (main payload) of the recvmsg() is the packet starting at the
  L3 header
- the .msg_name (aka "address") is a sockaddr_ll structure containing some L2
  information: ethertype, source MAC address.
- the .msg_control (aka metadata, activated with PACKET_AUXDATA sockopt) may
  contain VLAN information: TCI, TPID.
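For illustration, here is a minimal Python sketch of such a capture socket,
showing where the three pieces come from. It is not the test program used for
this report. The function names (open_cooked_socket, summarize, capture_one)
are made up for this example, and the numeric constants are copied from the
kernel headers because Python's socket module is not guaranteed to export
them. Opening the socket needs CAP_NET_RAW (typically root).

```python
import socket
import struct

# Values copied from the Linux UAPI headers (assumption: Linux host).
ETH_P_ALL = 0x0003        # <linux/if_ether.h>
SOL_PACKET = 263          # <linux/socket.h>
PACKET_AUXDATA = 8        # <linux/if_packet.h>

# struct tpacket_auxdata: tp_status, tp_len, tp_snaplen (u32 each), then
# tp_mac, tp_net, tp_vlan_tci, tp_vlan_tpid (u16 each), native byte order.
AUXDATA_FMT = "=IIIHHHH"
AUXDATA_LEN = struct.calcsize(AUXDATA_FMT)


def open_cooked_socket():
    """Open an AF_PACKET + SOCK_DGRAM socket with PACKET_AUXDATA enabled."""
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_DGRAM,
                         socket.htons(ETH_P_ALL))
    sock.setsockopt(SOL_PACKET, PACKET_AUXDATA, 1)
    return sock


def summarize(data, ancdata, address):
    """Combine the three pieces returned by recvmsg() into one dict."""
    # Python decodes the sockaddr_ll into this 5-tuple.
    ifname, ethertype, pkttype, _hatype, hwaddr = address
    vlan = None
    for level, ctype, cdata in ancdata:
        if (level == SOL_PACKET and ctype == PACKET_AUXDATA
                and len(cdata) >= AUXDATA_LEN):
            fields = struct.unpack(AUXDATA_FMT, cdata[:AUXDATA_LEN])
            vlan = {"status": fields[0], "tci": fields[5], "tpid": fields[6]}
    return {
        "ifname": ifname,
        "ethertype": ethertype,      # from .msg_name (sockaddr_ll)
        "pkttype": pkttype,
        "src_mac": hwaddr.hex(":"),
        "vlan": vlan,                # from .msg_control (PACKET_AUXDATA)
        "l3": data,                  # from .msg_iov
    }


def capture_one(sock):
    """Receive one packet and return its summary."""
    data, ancdata, _flags, address = sock.recvmsg(
        65535, socket.CMSG_SPACE(AUXDATA_LEN))
    return summarize(data, ancdata, address)
```

Typical use would be capture_one(open_cooked_socket()) in a loop, printing the
"ethertype" and "vlan" fields for each packet.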
All this works beautifully most of the time, with or without VLAN tags, as the
ethertype is correctly extracted and conveyed in the sockaddr_ll. This allows
any consumer of the L3 frame to decode it properly, knowing exactly which L3
protocol it's looking at.
However, there's a catch: for outgoing packets, *if* the interface has no
hardware VLAN offloading, the ethertype gets overwritten by ... the TPID
(0x8100). As a result, a consumer of the L3 frame has absolutely no way to
recover its type.
As a demo, here is what the venerable "tcpdump -i any" says of an outgoing ARP
packet on VLAN interface eth0.24, after VLAN offloading has been disabled via
"ethtool -K". Two lines are generated, as the packet is seen on both eth0.24
(first line) and eth0 (second line):
15:06:37.681328 ARP, Request who-has 1.0.24.3 tell 1.0.24.1, length 28
15:06:37.681336 ethertype IPv4, IP0
The first line is correct, as the frame is captured before handling by the 8021q
module. The second is not!
This is the result of the ethertype being overwritten. The actual value is
0x8100, which tcpdump decodes as an 802.1Q TPID. It therefore shifts the start
of L3 by 4 bytes and ends up seeing a nonsensical "IPv0" frame.
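To make the 4-byte shift concrete, here is a small Python sketch applied to the
ARP payload shown in the dumps in this message. The helper peel_vlan is
hypothetical and only mimics, in simplified form, what a decoder does when told
the ethertype is 0x8100; it is not libpcap's or tcpdump's actual code.

```python
import struct

ETH_P_8021Q = 0x8100


def peel_vlan(ethertype, l3):
    """If ethertype is the 802.1Q TPID, treat the first 4 bytes of the
    payload as a VLAN tag (TCI + inner ethertype) and skip them."""
    if ethertype != ETH_P_8021Q or len(l3) < 4:
        return ethertype, l3
    _tci, inner = struct.unpack("!HH", l3[:4])
    return inner, l3[4:]


# The 28-byte ARP request from the dumps in this message.
arp = bytes.fromhex(
    "00010800060400010025903285a70100180100000000000001001803")

# With the overwritten ethertype, the ARP hardware type (0x0001) is taken
# for a TCI and the ARP protocol type (0x0800) for the inner ethertype.
inner, payload = peel_vlan(0x8100, arp)

# The "IP header" now starts at the ARP hlen byte (0x06), so the version
# nibble is 0.
ip_version = payload[0] >> 4
```

Here inner is 0x0800 and ip_version is 0, which matches the "ethertype IPv4,
IP0" line printed by tcpdump.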
To prove that this is *not* an issue in libpcap or tcpdump, here are the three
aforementioned pieces of the packet, obtained by a simple test program doing
recvmsg() on an AF_PACKET+SOCK_DGRAM capture socket:
On VLAN interface eth0.24: (the "^^^^" show the ethertype's position)
--------------------------
- metadata: 107:8:010000001c0000001c0000000000000000000000
- sockaddr_ll: 1100080606000000010004060025903285a70000
                   ^^^^
- L3 frame: 00010800060400010025903285a70100180100000000000001001803
On parent interface eth0:
-------------------------
- metadata: 107:8:010000001c0000001c0000000000000000000000
- sockaddr_ll: 1100810004000000010004060025903285a70000
                   ^^^^
- L3 frame: 00010800060400010025903285a70100180100000000000001001803
As is clear above, the second instance contains no trace of the original ARP
ethertype 0x0806.
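For readers who want to check the hex by hand, the following Python sketch
decodes the two sockaddr_ll dumps above field by field. The function name
parse_sockaddr_ll is made up for this example, and the decoding assumes the
dumps come from a little-endian host (sll_protocol itself is always in network
byte order).

```python
import struct


def parse_sockaddr_ll(hex_dump):
    """Decode a 20-byte struct sockaddr_ll given as a hex string."""
    raw = bytes.fromhex(hex_dump)
    (family,) = struct.unpack("<H", raw[0:2])      # host order (assumed LE)
    (protocol,) = struct.unpack(">H", raw[2:4])    # always network order
    ifindex, hatype, pkttype, halen, addr = struct.unpack(
        "<iHBB8s", raw[4:20])
    return {
        "family": family,          # 17 = AF_PACKET
        "protocol": protocol,      # the ethertype marked with ^^^^
        "ifindex": ifindex,
        "hatype": hatype,          # 1 = ARPHRD_ETHER
        "pkttype": pkttype,        # 4 = PACKET_OUTGOING
        "mac": addr[:halen].hex(":"),
    }


sub = parse_sockaddr_ll("1100080606000000010004060025903285a70000")
parent = parse_sockaddr_ll("1100810004000000010004060025903285a70000")
print(hex(sub["protocol"]), hex(parent["protocol"]))
```

The two results differ only in the interface index and in the protocol field:
0x0806 on eth0.24, 0x8100 on eth0.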
By contrast, if we re-enable VLAN offloading,
- the first instance (on subinterface) is unchanged
- the second instance (on parent interface) is back to normal, with a
  correct ARP ethertype (^^^^=0806) *and* VLAN info in the metadata (TCI-TPID,
  byte-swapped =1800,0081):
On parent interface eth0:
-------------------------
- metadata: 107:8:510000001c0000001c0000000000000018000081
                                            TCI-TPID
- sockaddr_ll: 1100080604000000010004060025903285a70000
                   ^^^^
- L3 frame: 00010800060400010025903285a70100180100000000000001001803
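The metadata bytes can be decoded the same way. The sketch below assumes the
payload after "107:8:" (cmsg level 0x107 = SOL_PACKET, type 8 = PACKET_AUXDATA)
is a struct tpacket_auxdata dumped on a little-endian host; the function name
parse_auxdata is made up for this example, and the flag values are copied from
<linux/if_packet.h>.

```python
import struct

# Status flags from <linux/if_packet.h>.
TP_STATUS_VLAN_VALID = 0x10
TP_STATUS_VLAN_TPID_VALID = 0x40


def parse_auxdata(hex_dump):
    """Decode a 20-byte struct tpacket_auxdata given as a hex string
    (assumption: little-endian host)."""
    raw = bytes.fromhex(hex_dump)
    status, length, snaplen, _mac, _net, tci, tpid = struct.unpack(
        "<IIIHHHH", raw)
    return {
        "status": status,
        "len": length,
        "snaplen": snaplen,
        "vlan_valid": bool(status & TP_STATUS_VLAN_VALID),
        "tpid_valid": bool(status & TP_STATUS_VLAN_TPID_VALID),
        "vlan_id": tci & 0x0FFF,   # low 12 bits of the TCI
        "tpid": tpid,
    }


offloaded = parse_auxdata("510000001c0000001c0000000000000018000081")
not_offloaded = parse_auxdata("010000001c0000001c0000000000000000000000")
print(offloaded["vlan_id"], hex(offloaded["tpid"]))
```

With offloading enabled, the status word 0x51 carries both VLAN flags and the
TCI gives VLAN ID 24 with TPID 0x8100. Without offloading, the status is 0x01
and the VLAN fields are all zero, so the metadata gives no hint that a tag was
present.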
And sure enough, tcpdump is happy again:
21:44:18.481331 ARP, Request who-has 1.0.24.3 tell 1.0.24.1, length 28
21:44:18.481338 ethertype ARP, ARP, Request who-has 1.0.24.3 tell 1.0.24.1, length 28
I have found this bug active on an old machine with kernel 3.4.
In the URL below you'll find more details on ftrace-based evidence, hinting at
the 8021q module.
However, I am *not* familiar enough with the Linux network stack (and special
cases like offloading) to suggest a fix, sorry.
I hope a knowledgeable person will consider this nasty enough to deserve their
attention.
Thanks in advance !
-Alex
[1] https://github.com/the-tcpdump-group/libpcap/issues/1105