lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z7D9cR22BDPN7WSJ@shredder>
Date: Sat, 15 Feb 2025 22:47:45 +0200
From: Ido Schimmel <idosch@...sch.org>
To: Stephen Hemminger <stephen@...workplumber.org>, lucien.xin@...il.com,
	edumazet@...gle.com
Cc: netdev@...r.kernel.org, fmei@....com
Subject: Re: Fw: [Bug 219766] New: Garbage Ethernet Frames

On Mon, Feb 10, 2025 at 08:49:31AM -0800, Stephen Hemminger wrote:
> Not really enough information to do any deep analysis but forwarding to netdev
> anyway as it is not junk.
> 
> Begin forwarded message:
> 
> Date: Sun, 09 Feb 2025 12:24:32 +0000
> From: bugzilla-daemon@...nel.org
> To: stephen@...workplumber.org
> Subject: [Bug 219766] New: Garbage Ethernet Frames
> 
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=219766
> 
>             Bug ID: 219766
>            Summary: Garbage Ethernet Frames
>            Product: Networking
>            Version: 2.5
>           Hardware: All
>                 OS: Linux
>             Status: NEW
>           Severity: normal
>           Priority: P3
>          Component: Other
>           Assignee: stephen@...workplumber.org
>           Reporter: fmei@....com
>         Regression: No
> 
> I am currently troubleshooting a very strange problem which appears when
> upgrading Kernel 6.6.58 to 6.6.60. The kernel version change is part of a
> change of talos linux (www.talos.dev) from 1.8.2 to 1.8.3.
> 
> We are running this machines at hetzner - a company which is providing server
> hosting. they complain that we are using mac addresses which are not allowed
> (are not the mac addresses of the physical nic)
> 
> In the investigation of the problem I did tcpdumps on the physical adapters and
> captured this suspicious ethernet frames. The frames do neither have a known
> ethertype, nor do they have a mac address of a known vendor or a known virtual
> mac address range. They seem garbage to me. Below an example. More can be found
> in the github issue. This frames are not emitted very often and the systems are
> operating normally. If I would not be informed by the hosting provider I would
> not have noticed it at all.
> 
> I also tried to track it down to a specific hardware (r8169), but we have the
> same problem with e1000e.
> 
> I checked the changelogs of the two kernel versions (6.6.59 & 6.6.60) and
> noticed there were some changes which could be the problem, but I simply do not
> have the experience for it.
> 
> Can anybody check the changelog of the 2 versions and see if there is a change
> which might cause the problem? Can anybody give me a hint how to track it down
> further?
> 
> tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
> listening on enp9s0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
> 22:07:02.329668 20:00:40:11:18:fb > 45:00:00:44:f4:94, ethertype Unknown
> (0x58c6), length 68:
>         0x0000:  8dda 74ca f1ae ca6c ca6c 0098 969c 0400  ..t....l.l......
>         0x0010:  0000 4730 3f18 6800 0000 0000 0000 9971  ..G0?.h........q
>         0x0020:  c4c9 9055 a157 0a70 9ead bf83 38ca ab38  ...U.W.p....8..8
>         0x0030:  8add ab96 e052                           .....R
> 
> 
> Issue with more information: https://github.com/siderolabs/talos/issues/9837

Adding Xin and Eric as I believe this is caused by commit 22600596b675
("ipv4: give an IPv4 dev to blackhole_netdev"). It's present in v6.6.59,
but not in v6.6.58 which fits the report.

The Ethernet headers of all the captured packets start with 0x45, so
it's quite apparent that these are IPv4 packets that are transmitted
without an Ethernet header. Specifically, these seem to be UDP packets
(10th byte is always 0x11).

After 22600596b675 neighbours can be constructed on the blackhole device
and they are constructed with an output function (neigh_direct_output())
that simply calls dev_queue_xmit(). The latter will transmit packets via
skb->dev which might not be the blackhole device if dst_dev_put()
switched dst->dev to the blackhole device when another CPU was using
this dst entry in ip_output().

I verified this using these hackish scripts [1][2]. They create a
nexthop exception where a dst entry is cached. Two UDP applications use
this dst entry, but a background process continuously bumps the IPv4
generation ID so as to force these applications to perform a route
lookup instead of using the dst entry that they previously resolved.

After creating a new dst entry, one application will try to cache it in
the nexthop exception by calling rt_bind_exception() which will call
dst_dev_put() on the previously cached dst entry that might still be
used by the other application on a different CPU.

There's a counter at the end of the test that counts the number of
packets that were transmitted without an Ethernet header. The problem
does not reproduce with 22600596b675 reverted.

One possible solution is to add the dst entry to the uncached list by
converting dst_dev_put() to rt_add_uncached_list() [3]. The dst entry
will be eventually removed from this list when the dst entry is
destroyed or when dst->dev is unregistered. However, I am not sure it
will solve this report as rt_bind_exception() is not the only caller of
dst_dev_put().

Another possible solution is to have the blackhole device consume the
packets instead of letting them go out without an Ethernet header [4].
Doesn't seem great as the packets disappear without telling anyone
(before 22600596b675 an error was returned).

Let me know what you think. Especially if you have a better fix.

Thanks

[1]
#!/bin/bash
# blackhole.sh

for ns in client router server; do
	ip netns add $ns
	ip -n $ns link set dev lo up
done

ip -n client link add name veth0 type veth peer name veth1 netns router
ip -n router link add name veth2 type veth peer name veth3 netns server

# Client
ip -n client link set dev veth0 up
ip -n client address add 192.0.2.1/28 dev veth0
ip -n client route add default via 192.0.2.2
tc -n client qdisc add dev veth0 clsact
tc -n client filter add dev veth0 egress pref 1 proto all \
	flower dst_mac 45:00:00:00:00:00/ff:ff:00:00:00:00 action pass

# Router
ip netns exec router sysctl -wq net.ipv4.conf.all.forwarding=1
ip -n router link set dev veth1 up
ip -n router address add 192.0.2.2/28 dev veth1
ip -n router link set dev veth2 up mtu 1300
ip -n router address add 192.0.2.17/28 dev veth2

# Server
ip -n server link set dev veth3 up mtu 1300
ip -n server address add 192.0.2.18/28 dev veth3
ip -n server route add default via 192.0.2.17

sleep 1

# Create a nexthop exception where the dst entry will be cached.
ip netns exec client ping 192.0.2.18 -c 1 -M want -s 1400 -q &> /dev/null

# Continuously invalidate cached dst entries by bumping the IPv4 generation ID.
# When replacing the cached dst entry in the nexthop exception,
# rt_bind_exception() will call dst_dev_put(), thereby setting the dst entry's
# device to the blackhole device.
./bump_genid.sh &

ip netns exec server iperf3 -s -p 6000 -D
ip netns exec server iperf3 -s -p 7000 -D
ip netns exec client iperf3 -c 192.0.2.18 -p 6000 -u -b 0 -t 60 &> /dev/null &
ip netns exec client iperf3 -c 192.0.2.18 -p 7000 -u -b 0 -t 60 &> /dev/null

tc -n client -s filter show dev veth0 egress

ip netns pids server | xargs kill
pkill bump_genid.sh
ip -all netns delete

[2]
#!/bin/bash
# bump_genid.sh

while true; do
	ip -n client route add 198.51.100.1/32 dev lo
	ip -n client route del 198.51.100.1/32 dev lo
done

[3]
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 753704f75b2c..f40f860c0bec 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1461,7 +1461,7 @@ static bool rt_bind_exception(struct rtable *rt, struct fib_nh_exception *fnhe,
 			dst_hold(&rt->dst);
 			rcu_assign_pointer(*porig, rt);
 			if (orig) {
-				dst_dev_put(&orig->dst);
+				rt_add_uncached_list(orig);
 				dst_release(&orig->dst);
 			}
 			ret = true;

[4]
diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index c8840c3b9a1b..448f352c3c92 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -244,8 +244,22 @@ static netdev_tx_t blackhole_netdev_xmit(struct sk_buff *skb,
 	return NETDEV_TX_OK;
 }
 
+static int blackhole_neigh_output(struct neighbour *n, struct sk_buff * skb)
+{
+	kfree_skb(skb);
+	return 0;
+}
+
+static int blackhole_neigh_construct(struct net_device *dev,
+				     struct neighbour *n)
+{
+	n->output = blackhole_neigh_output;
+	return 0;
+}
+
 static const struct net_device_ops blackhole_netdev_ops = {
 	.ndo_start_xmit = blackhole_netdev_xmit,
+	.ndo_neigh_construct = blackhole_neigh_construct,
 };
 
 /* This is a dst-dummy device used specifically for invalidated

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ