Message-ID: <76z4ckbvjimtrf2foaislezs4vlru5upxn3i5ysu4au2m2pfei@slgxispho2iv>
Date: Tue, 21 Oct 2025 14:31:51 +0200
From: Gabriel Goller <g.goller@...xmox.com>
To: Ido Schimmel <idosch@...sch.org>
Cc: davem@...emloft.net, dsahern@...nel.org, edumazet@...gle.com, 
	kuba@...nel.org, pabeni@...hat.com, horms@...nel.org, netdev@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: Wrong source address selection in arp_solicit for forwarded
 packets

On 20.10.2025 17:06, Ido Schimmel wrote:
> On Fri, Oct 17, 2025 at 04:47:27PM +0200, Gabriel Goller wrote:
> > Hi,
> > I have a question about the arp solicit behavior:
> > 
> > I have the following simple infrastructure with linux hosts where the ip
> > addresses are configured on dummy interfaces and all other interfaces are
> > unnumbered:
> > 
> >   ┌────────┐     ┌────────┐     ┌────────┐    │ node1  ├─────┤ node2
> > ├─────┤ node3  │    │10.0.1.1│     │10.0.1.2│     │10.0.1.3│    └────────┘
> > └────────┘     └────────┘
> 
> The diagram looks mangled. At least I don't understand it.

Ah sorry about that, looks like I had format=flowed configured on my
client.

Diagram should be correct now:

   ┌────────┐     ┌────────┐     ┌────────┐
   │ node1  ├─────┤ node2  ├─────┤ node3  │
   │10.0.1.1│     │10.0.1.2│     │10.0.1.3│
   └────────┘     └────────┘     └────────┘

If it's still not right, it renders correctly on lore:
https://lore.kernel.org/netdev/eykjh3y2bse2tmhn5rn2uvztoepkbnxpb7n2pvwq62pjetdu7o@r46lgxf4azz7/

> > All nodes have routes configured and can ping each other. ipv4 forwarding is
> > enabled on all nodes, so pinging from node1 to node3 should work. However, I'm
> > encountering an issue where node2 does not send correct arp solicitation
> > packets when forwarding icmp packets from node1 to node3.
> 
> I believe ICMP is irrelevant here.

Yep, ICMP is just an example.

> > For example, when pinging from node1 to node3, node2 sends out the
> > following arp packet:
> > 
> > 13:57:43.198959 bc:24:11:a4:f6:cd > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100),
> > length 46: vlan 300, p 0, ethertype ARP (0x0806), Ethernet (len 6),
> > IPv4 (len 4), Request who-has 10.0.1.3 tell 172.16.0.102, length 28
> > 
> > Here, 172.16.0.102 is an ip address configured on a different interface on
> > node2. This request will never receive a response because `rp_filter=2`.
> > 
> > node2 has the following (correct) routes installed:
> > 
> > 10.0.1.3 nhid 18 via 10.0.1.3 dev ens22 proto openfabric src 10.0.1.2 metric 20 onlink
> > 
> > Since arp_announce is set to 0 (the default), arp_solicit selects the first
> > interface with an ip address (inet_select_addr), which results in
> > selecting the wrong source address (172.16.0.102) for the arp request.
> > Because rp_filter is set to 2, we won't receive an answer to this arp
> > packet, and the ping will fail unless we explicitly ping from node2 to
> > node3.
> > 
> > I'm wondering if it would be possible (and correct) to modify arp_solicit to
> > perform a fib lookup to check if there's a route with an explicit source
> > address (e.g., the route above using src 10.0.1.2) and use that address as the
> > source address for the arp packet. Of course, this wouldn't be backward
> > compatible, as some users might rely on the current interface ordering behavior
> > (or the loopback interface being selected first), so it would need to be
> > controlled via a sysctl configuration flag. Perhaps I'm missing something
> > obvious here though.
> 
> This would probably entail adding a new arp_announce level, but nobody
> added a new level in at least 20 years, so you will need to explain why
> your setup is special and why the same functionality cannot be achieved
> in a different way that does not require kernel changes.

To add a bit more context, I'm using FRR on all nodes and the IPs on the
dummy interfaces are distributed using OpenFabric. But this shouldn't
matter because the routes are inserted correctly and work fine.

> A few things you can consider:
> 
> 1. You wrote that the router interfaces are unnumbered. Modern
> unnumbered networks usually assign IPv6 link-local addresses to these
> interfaces. These addresses are only used for neighbour resolution and
> can be used as the nexthop address for IPv4 routes. For example:
> 
> ip route add 192.0.2.1/32 nexthop via inet6 fe80::1 dev dummy1
> 
> Or using nexthop objects:
> 
> ip nexthop add id 1 via fe80::1 dev dummy1
> ip route add 192.0.2.1/32 nhid 1

Hmm, I don't see how this would help. There is a link-local address set
on the interface, but then we would have to put an IPv6 source address
into the ARP packet, which wouldn't be right?

The route already exists (see `dev ens22` and `onlink`).
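
To illustrate the concern, here is a minimal Python sketch of the 28-byte
IPv4-over-Ethernet ARP request from the capture above. It only shows the
standard field layout (RFC 826); the helper name build_arp_request is made
up for this example. The sender protocol address field is 4 bytes wide, so
an IPv6 link-local address does not fit into it.

```python
import socket
import struct


def build_arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Pack an IPv4-over-Ethernet ARP request (payload only, no Ethernet/VLAN header)."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware address length
        4,                            # protocol address length
        1,                            # opcode: request
        sender_mac,                   # sender hardware address
        socket.inet_aton(sender_ip),  # sender protocol address (4 bytes)
        b"\x00" * 6,                  # target hardware address (unknown)
        socket.inet_aton(target_ip),  # target protocol address
    )


# Values taken from the tcpdump output quoted above.
mac = bytes.fromhex("bc2411a4f6cd")
pkt = build_arp_request(mac, "172.16.0.102", "10.0.1.3")
print(len(pkt))
```

The "tell" address in the capture is the 4-byte field at offset 14 of this
payload.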

> 2. If you have interfaces whose addresses should not be considered as
> source addresses when generating IP/ARP packets out of other interfaces,
> then you can try placing them in a different VRF if it's viable.

Yep, this is definitely a solution, as the "loopback" address of the VRF
is the one on its master device. Still, what if the master device or the
loopback device has multiple IPs?

> 3. Requires some work and I didn't look too much into it, but I believe
> it should be possible to derive the preferred source address and rewrite
> it in ARP packets using tc-bpf on egress. See:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=dab4e1f06cabb6834de14264394ccab197007302

Yeah, eBPF is definitely also a solution, but IMO the current behavior is
a bit weird and should be fixed in the kernel.

We have all the information we need (from the routes) and just need to
use it to select the correct source address, instead of giving up and
falling back to whatever address inet_select_addr finds first.
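
To make the idea concrete, here is a minimal Python sketch (a userspace
model, not kernel code) contrasting the current fallback with a route-based
lookup. Several things in it are assumptions: the interface names ens19 and
dummy0 are hypothetical, the use_route_src flag stands in for a sysctl that
does not exist today, and first_address is a heavily simplified model of
inet_select_addr (it ignores scope, subnet matching and VRF masters).

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Route:
    prefix: str                    # destination prefix, e.g. "10.0.1.3/32"
    dev: str                       # output interface
    prefsrc: Optional[str] = None  # the route's "src" attribute, if set


def first_address(interfaces: Dict[str, List[str]], dev: str) -> Optional[str]:
    """Simplified fallback: an address on dev if any, else the first one found."""
    if interfaces.get(dev):
        return interfaces[dev][0]
    for addrs in interfaces.values():
        if addrs:
            return addrs[0]
    return None


def route_prefsrc(routes: List[Route], target: str, dev: str) -> Optional[str]:
    """Longest-prefix match for target on dev; return that route's src, if any."""
    tgt = ipaddress.ip_address(target)
    best = None
    best_len = -1
    for route in routes:
        net = ipaddress.ip_network(route.prefix)
        if route.dev == dev and tgt in net and net.prefixlen > best_len:
            best, best_len = route, net.prefixlen
    return best.prefsrc if best else None


def select_arp_source(routes, interfaces, target, dev, use_route_src):
    """Pick the ARP sender IP for a forwarded packet (source IP is not local)."""
    if use_route_src:
        src = route_prefsrc(routes, target, dev)
        if src:
            return src
    return first_address(interfaces, dev)


# node2 from the example: ens22 is unnumbered, the other names are hypothetical.
interfaces = {"ens19": ["172.16.0.102"], "ens22": [], "dummy0": ["10.0.1.2"]}
routes = [Route("10.0.1.3/32", "ens22", prefsrc="10.0.1.2")]

current = select_arp_source(routes, interfaces, "10.0.1.3", "ens22", use_route_src=False)
proposed = select_arp_source(routes, interfaces, "10.0.1.3", "ens22", use_route_src=True)
print(current, proposed)
```

With the flag off, the model picks 172.16.0.102 as in the capture; with it
on, the route's src 10.0.1.2 is used, and the fallback still applies when no
matching route carries a src.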


Thanks for the answer!
Gabriel

