netdev - Re: [PATCH net-next 1/8] ipvlan: Implement learnable L2-bridge

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aPjo76T8c8SbOB04@horms.kernel.org>
Date: Wed, 22 Oct 2025 15:23:43 +0100
From: Simon Horman <horms@...nel.org>
To: Dmitry Skorodumov <skorodumov.dmitry@...wei.com>
Cc: netdev@...r.kernel.org, linux-doc@...r.kernel.org,
	linux-kernel@...r.kernel.org, andrey.bokhanko@...wei.com,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	Jonathan Corbet <corbet@....net>,
	Andrew Lunn <andrew+netdev@...n.ch>
Subject: Re: [PATCH net-next 1/8] ipvlan: Implement learnable L2-bridge

On Tue, Oct 21, 2025 at 05:44:03PM +0300, Dmitry Skorodumov wrote:
> Now it is possible to create link in L2E mode: learnable
> bridge. The IPs will be learned from TX-packets of child interfaces.

Is there a standard for this approach - where does the L2E name come from?

> 
> Also, dev_add_pack() protocol is attached to the main port
> to support communication from main to child interfaces.
> 
> This mode is intended for the desktop virtual machines, for
> bridging to Wireless interfaces.
> 
> The mode should be specified while creating first child interface.
> It is not possible to change it after this.
> 
> Signed-off-by: Dmitry Skorodumov <skorodumov.dmitry@...wei.com>

...

> diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h

...

It is still preferred in networking code to linewrap lines
so that they are not wider than 80 columns, where than can be done without
reducing readability. Which appears to be the case here.

Flagged by checkpatch.pl --max-line-length=80

...
> diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c

...

> @@ -414,6 +426,77 @@ struct ipvl_addr *ipvlan_addr_lookup(struct ipvl_port *port, void *lyr3h,
>  	return addr;
>  }
>  
> +static inline bool is_ipv4_usable(__be32 addr)
> +{
> +	return !ipv4_is_lbcast(addr) && !ipv4_is_multicast(addr) &&
> +	       !ipv4_is_zeronet(addr);
> +}
> +
> +static inline bool is_ipv6_usable(const struct in6_addr *addr)
> +{
> +	return !ipv6_addr_is_multicast(addr) && !ipv6_addr_loopback(addr) &&
> +	       !ipv6_addr_any(addr);
> +}

Please don't use the inline keyword in .c files unless there
is a demonstrable reason to do so - usually performance.
Rather, please let the compiler inline functions as it sees fit.

> +
> +static void ipvlan_addr_learn(struct ipvl_dev *ipvlan, void *lyr3h,
> +			      int addr_type)
> +{
> +	void *addr = NULL;
> +	bool is_v6;
> +
> +	switch (addr_type) {
> +#if IS_ENABLED(CONFIG_IPV6)
> +	/* No need to handle IPVL_ICMPV6, since it never has valid src-address */
> +	case IPVL_IPV6: {
> +		struct ipv6hdr *ip6h;
> +
> +		ip6h = (struct ipv6hdr *)lyr3h;
> +		if (!is_ipv6_usable(&ip6h->saddr))

It is preferred to avoid #if / #ifdef in order to improve compile coverage
(and, I would argue, readability).

In this case I think that can be achieved by changing the line above to:

		if (!IS_ENABLED(CONFIG_IPV6) || !is_ipv6_usable(&ip6h->saddr))

I think it would be interesting to see if a similar approach can be used
to remove other #if CONFIG_IPV6 conditions in this file, and if successful
provide that as a clean-up as the opening patch in this series.

However, without that, I can see how one could argue for the approach
you have taken here on the basis of consistency.

> +			return;
> +		is_v6 = true;
> +		addr = &ip6h->saddr;
> +		break;
> +	}
> +#endif

...

> @@ -618,15 +701,56 @@ static int ipvlan_xmit_mode_l3(struct sk_buff *skb, struct net_device *dev)
>  
>  static int ipvlan_xmit_mode_l2(struct sk_buff *skb, struct net_device *dev)
>  {
> -	const struct ipvl_dev *ipvlan = netdev_priv(dev);
> -	struct ethhdr *eth = skb_eth_hdr(skb);
> -	struct ipvl_addr *addr;
>  	void *lyr3h;
> +	struct ipvl_addr *addr;
>  	int addr_type;
> +	bool same_mac_addr;
> +	struct ipvl_dev *ipvlan = netdev_priv(dev);
> +	struct ethhdr *eth = skb_eth_hdr(skb);

I realise that the convention is not followed in the existing code,
but please prefer to arrange local variables in reverse xmas tree order -
longest line to shortest.

In this case I think we can avoid moving things away
from that order like this (completely untested):

-	const struct ipvl_dev *ipvlan = netdev_priv(dev);
+	struct ipvl_dev *ipvlan = netdev_priv(dev);
 	struct ethhdr *eth = skb_eth_hdr(skb);
 	struct ipvl_addr *addr;
+	bool same_mac_addr;
 	void *lyr3h;
 	int addr_type;

Likewise elsewhere in this patch.

This too can be helpful in this area
github.com/ecree-solarflare/xmastree/commits/master/

> +
> +	if (ipvlan_is_learnable(ipvlan->port) &&
> +	    ether_addr_equal(eth->h_source, dev->dev_addr)) {
> +		/* ignore tx-packets from host */
> +		goto out_drop;
> +	}
> +
> +	same_mac_addr = ether_addr_equal(eth->h_dest, eth->h_source);
> +
> +	lyr3h = ipvlan_get_L3_hdr(ipvlan->port, skb, &addr_type);
>  
> -	if (!ipvlan_is_vepa(ipvlan->port) &&
> -	    ether_addr_equal(eth->h_dest, eth->h_source)) {
> -		lyr3h = ipvlan_get_L3_hdr(ipvlan->port, skb, &addr_type);
> +	if (ipvlan_is_learnable(ipvlan->port)) {
> +		if (lyr3h)
> +			ipvlan_addr_learn(ipvlan, lyr3h, addr_type);
> +		/* Mark SKB in advance */
> +		skb = skb_share_check(skb, GFP_ATOMIC);
> +		if (!skb)
> +			return NET_XMIT_DROP;

I think that when you drop packets a counter should be incremented.
Likewise elsewhere in this function.

> +		ipvlan_mark_skb(skb, ipvlan->phy_dev);
> +	}
> +
> +	if (is_multicast_ether_addr(eth->h_dest)) {
> +		skb_reset_mac_header(skb);
> +		ipvlan_skb_crossing_ns(skb, NULL);
> +		ipvlan_multicast_enqueue(ipvlan->port, skb, true);
> +		return NET_XMIT_SUCCESS;
> +	}
> +
> +	if (ipvlan_is_vepa(ipvlan->port))
> +		goto tx_phy_dev;
> +
> +	if (!same_mac_addr &&
> +	    ether_addr_equal(eth->h_dest, ipvlan->phy_dev->dev_addr)) {
> +		/* It is a packet from child with destination to main port.
> +		 * Pass it to main.
> +		 */
> +		skb = skb_share_check(skb, GFP_ATOMIC);
> +		if (!skb)
> +			return NET_XMIT_DROP;
> +		skb->pkt_type = PACKET_HOST;
> +		skb->dev = ipvlan->phy_dev;
> +		dev_forward_skb(ipvlan->phy_dev, skb);
> +		return NET_XMIT_SUCCESS;
> +	} else if (same_mac_addr) {
>  		if (lyr3h) {
>  			addr = ipvlan_addr_lookup(ipvlan->port, lyr3h, addr_type, true);
>  			if (addr) {
> @@ -649,16 +773,14 @@ static int ipvlan_xmit_mode_l2(struct sk_buff *skb, struct net_device *dev)
>  		 */
>  		dev_forward_skb(ipvlan->phy_dev, skb);
>  		return NET_XMIT_SUCCESS;
> -
> -	} else if (is_multicast_ether_addr(eth->h_dest)) {
> -		skb_reset_mac_header(skb);
> -		ipvlan_skb_crossing_ns(skb, NULL);
> -		ipvlan_multicast_enqueue(ipvlan->port, skb, true);
> -		return NET_XMIT_SUCCESS;
>  	}
>  
> +tx_phy_dev:
>  	skb->dev = ipvlan->phy_dev;
>  	return dev_queue_xmit(skb);
> +out_drop:
> +	consume_skb(skb);
> +	return NET_XMIT_DROP;
>  }
>  
>  int ipvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev)

...

> diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c

...

> +static int ipvlan_port_receive(struct sk_buff *skb, struct net_device *wdev,
> +			       struct packet_type *pt, struct net_device *orig_wdev)
> +{
> +	struct ipvl_port *port;
> +	struct ipvl_addr *addr;
> +	struct ethhdr *eth;
> +	void *lyr3h;
> +	int addr_type;
> +
> +	port = container_of(pt, struct ipvl_port, ipvl_ptype);
> +	/* We are interested only in outgoing packets.
> +	 * rx-path is handled in rx_handler().
> +	 */
> +	if (skb->pkt_type != PACKET_OUTGOING || ipvlan_is_skb_marked(skb, port->dev))
> +		goto out;
> +
> +	skb = skb_share_check(skb, GFP_ATOMIC);
> +	if (!skb)
> +		goto no_mem;
> +
> +	/* data should point to eth-header */
> +	skb_push(skb, skb->data - skb_mac_header(skb));
> +	skb->dev = port->dev;
> +	eth = eth_hdr(skb);
> +
> +	if (is_multicast_ether_addr(eth->h_dest)) {
> +		ipvlan_skb_crossing_ns(skb, NULL);
> +		skb->protocol = eth_type_trans(skb, skb->dev);
> +		skb->pkt_type = PACKET_HOST;
> +		ipvlan_mark_skb(skb, port->dev);
> +		ipvlan_multicast_enqueue(port, skb, false);
> +		return 0;
> +	}
> +
> +	lyr3h = ipvlan_get_L3_hdr(port, skb, &addr_type);
> +	if (!lyr3h)
> +		goto out;
> +
> +	addr = ipvlan_addr_lookup(port, lyr3h, addr_type, true);
> +	if (addr) {
> +		int ret, len;
> +
> +		ipvlan_skb_crossing_ns(skb, addr->master->dev);
> +		skb->protocol = eth_type_trans(skb, skb->dev);
> +		skb->pkt_type = PACKET_HOST;
> +		ipvlan_mark_skb(skb, port->dev);
> +		len = skb->len + ETH_HLEN;
> +		ret = netif_rx(skb);
> +		ipvlan_count_rx(ipvlan, len, ret == NET_RX_SUCCESS, false);

This fails to build because ipvlan is not declared in this scope.
Perhaps something got missed due to an edit?

> +		return 0;
> +	}
> +
> +out:
> +	dev_kfree_skb(skb);
> +no_mem:
> +	return 0; // actually, ret value is ignored

Maybe, but it seems to me that the return values
should follow that of netif_receive_skb_core().

> +}

...

-- 
pw-bot: changes-requested