linux-kernel - Re: [PATCH net-next 1/8] ipvlan: Implement learnable L2-bridge

Message-ID: <aPoR_HWEgmrs97Qd@horms.kernel.org>
Date: Thu, 23 Oct 2025 12:31:08 +0100
From: Simon Horman <horms@...nel.org>
To: Dmitry Skorodumov <skorodumov.dmitry@...wei.com>
Cc: netdev@...r.kernel.org, linux-doc@...r.kernel.org,
	linux-kernel@...r.kernel.org, andrey.bokhanko@...wei.com,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	Jonathan Corbet <corbet@....net>,
	Andrew Lunn <andrew+netdev@...n.ch>
Subject: Re: [PATCH net-next 1/8] ipvlan: Implement learnable L2-bridge

On Thu, Oct 23, 2025 at 01:21:20PM +0300, Dmitry Skorodumov wrote:
> On 22.10.2025 17:23, Simon Horman wrote:
> > On Tue, Oct 21, 2025 at 05:44:03PM +0300, Dmitry Skorodumov wrote:
> >> Now it is possible to create link in L2E mode: learnable
> >> bridge. The IPs will be learned from TX-packets of child interfaces.
> > Is there a standard for this approach - where does the L2E name come from?
> 
> Actually, I meant "E" here as "Extended". But the more or less standard name for this is "MAC NAT" ("MAC network address translation"). I discussed the naming a bit with an LLM, and it suggested "macsnat", which looks like a better name. I hope that is OK, but I don't mind renaming it if anyone has a better idea.

I was more curious than anything else. But perhaps it would
be worth providing some explanation of the name in the
commit message.

...

> >> +static void ipvlan_addr_learn(struct ipvl_dev *ipvlan, void *lyr3h,
> >> +			      int addr_type)
> >> +{
> >> +	void *addr = NULL;
> >> +	bool is_v6;
> >> +
> >> +	switch (addr_type) {
> >> +#if IS_ENABLED(CONFIG_IPV6)
> >> +	/* No need to handle IPVL_ICMPV6, since it never has valid src-address */
> >> +	case IPVL_IPV6: {
> >> +		struct ipv6hdr *ip6h;
> >> +
> >> +		ip6h = (struct ipv6hdr *)lyr3h;
> >> +		if (!is_ipv6_usable(&ip6h->saddr))
> > It is preferred to avoid #if / #ifdef in order to improve compile coverage
> > (and, I would argue, readability).
> ..
> > In this case I think that can be achieved by changing the line above to:
> >
> > 		if (!IS_ENABLED(CONFIG_IPV6) || !is_ipv6_usable(&ip6h->saddr))
> >
> > I think it would be interesting to see if a similar approach can be used
> > to remove other #if CONFIG_IPV6 conditions in this file, and if successful
> > provide that as a clean-up as the opening patch in this series.
> >
> > However, without that, I can see how one could argue for the approach
> > you have taken here on the basis of consistency.
> >
> 
> Hmmmm... for me this raises complicated questions about testing this refactoring:
> 
> - whether IPv6-specific functions (like csum_ipv6_magic(), register_inet6addr_notifier()) are available if the kernel is compiled without CONFIG_IPV6
> 
> - ideally the code should be retested with a kernel built without CONFIG_IPV6
> 
> This looks like separate work that requires additional effort...

Understood, I agree this can be left as future work.
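
For reference, a minimal userspace sketch of the IS_ENABLED() pattern
suggested above. It is not the driver code: IS_ENABLED() here is a
simplified stand-in that only handles options defined as 0 or 1, and
struct ipv6hdr_sketch, is_ipv6_usable() and should_learn_v6() are
hypothetical reductions made up for illustration. The point is only that
the IPv6 branch stays visible to the compiler in every configuration,
while the condition becomes constant-false when the option is off.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Simplified stand-in for the kernel's IS_ENABLED(): every option is a
 * plain 0/1 macro here. The real macro also copes with undefined and
 * modular (=m) options; this one does not.
 */
#define CONFIG_IPV6 1
#define IS_ENABLED(option) (option)

/* Hypothetical, reduced IPv6 header: only the source address. */
struct ipv6hdr_sketch {
	uint8_t saddr[16];
};

/* Hypothetical usability test: reject only the unspecified address (::). */
static bool is_ipv6_usable(const uint8_t *addr)
{
	static const uint8_t unspecified[16];

	return memcmp(addr, unspecified, sizeof(unspecified)) != 0;
}

/*
 * Returns true if the source address is a candidate for learning.
 * With CONFIG_IPV6 set to 0 the first operand is a compile-time
 * constant, so the body is still parsed and type-checked but the
 * function always returns false.
 */
static bool should_learn_v6(const struct ipv6hdr_sketch *ip6h)
{
	if (!IS_ENABLED(CONFIG_IPV6) || !is_ipv6_usable(ip6h->saddr))
		return false;

	return true;
}
```

Whether the IPv6 helpers actually link without CONFIG_IPV6 is exactly the
open question raised above, and this sketch does not answer it.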

> 
> >>  static int ipvlan_xmit_mode_l2(struct sk_buff *skb, struct net_device *dev)
> >>  {
> >> -	const struct ipvl_dev *ipvlan = netdev_priv(dev);
> >> -	struct ethhdr *eth = skb_eth_hdr(skb);
> >> -	struct ipvl_addr *addr;
> >>  	void *lyr3h;
> >> +	struct ipvl_addr *addr;
> >>  	int addr_type;
> >> +	bool same_mac_addr;
> >> +	struct ipvl_dev *ipvlan = netdev_priv(dev);
> >> +	struct ethhdr *eth = skb_eth_hdr(skb);
> > I realise that the convention is not followed in the existing code,
> > but please prefer to arrange local variables in reverse xmas tree order -
> > longest line to shortest.
> I fixed all my changes to follow this style, except one, where it seems a bit unnatural to declare a dependent variable before its "parent" variable. I hope that is OK.

I would lean towards reverse xmas here too.
But I understand if you feel otherwise.
And given the current state of this file, I think that is ok.
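
To make the ordering concrete, here is a sketch of the locals from the
quoted hunk arranged longest line to shortest. The type and helper
definitions are hypothetical userspace stubs so that the fragment
compiles on its own, and the function body is reduced to the MAC
comparison; only the declaration order is the point.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel types; only what the sketch needs. */
struct ethhdr { unsigned char h_dest[6]; unsigned char h_source[6]; };
struct sk_buff { struct ethhdr eth; };
struct ipvl_dev { int unused; };
struct ipvl_addr;
struct net_device { struct ipvl_dev priv; };

static struct ipvl_dev *netdev_priv(struct net_device *dev)
{
	return &dev->priv;
}

static struct ethhdr *skb_eth_hdr(struct sk_buff *skb)
{
	return &skb->eth;
}

static bool xmit_decl_order_sketch(struct sk_buff *skb, struct net_device *dev)
{
	/* Reverse xmas tree: longest declaration line first, shortest last. */
	struct ipvl_dev *ipvlan = netdev_priv(dev);
	struct ethhdr *eth = skb_eth_hdr(skb);
	struct ipvl_addr *addr;
	bool same_mac_addr;
	int addr_type;
	void *lyr3h;

	same_mac_addr = memcmp(eth->h_dest, eth->h_source, 6) == 0;

	/* The remaining locals are assigned only so the sketch builds cleanly. */
	addr = NULL;
	addr_type = 0;
	lyr3h = NULL;
	(void)ipvlan;
	(void)addr;
	(void)addr_type;
	(void)lyr3h;

	return same_mac_addr;
}
```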

> >> +	    ether_addr_equal(eth->h_source, dev->dev_addr)) {
> >> +		/* ignore tx-packets from host */
> >> +		goto out_drop;
> >> +	}
> >> +
> >> +	same_mac_addr = ether_addr_equal(eth->h_dest, eth->h_source);
> >> +
> >> +	lyr3h = ipvlan_get_L3_hdr(ipvlan->port, skb, &addr_type);
> >>  
> >> -	if (!ipvlan_is_vepa(ipvlan->port) &&
> >> -	    ether_addr_equal(eth->h_dest, eth->h_source)) {
> >> -		lyr3h = ipvlan_get_L3_hdr(ipvlan->port, skb, &addr_type);
> >> +	if (ipvlan_is_learnable(ipvlan->port)) {
> >> +		if (lyr3h)
> >> +			ipvlan_addr_learn(ipvlan, lyr3h, addr_type);
> >> +		/* Mark SKB in advance */
> >> +		skb = skb_share_check(skb, GFP_ATOMIC);
> >> +		if (!skb)
> >> +			return NET_XMIT_DROP;
> > I think that when you drop packets a counter should be incremented.
> > Likewise elsewhere in this function.
> The counter appears to be handled in the parent function, ipvlan_start_xmit().

Thanks, I see that now.
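
As a rough model of that division of labour: the mode-specific path
only reports a NET_XMIT_* code, and the caller does the accounting. The
sketch below is a userspace approximation under that assumption, not the
driver's code. struct xmit_stats_sketch, queue_xmit_sketch() and
start_xmit_sketch() are hypothetical names, and the "drop empty frames"
rule exists only to produce a drop.

```c
#include <stddef.h>

/* Return codes, with the same values the kernel uses. */
#define NET_XMIT_SUCCESS 0x00
#define NET_XMIT_DROP    0x01
#define NET_XMIT_CN      0x02

/* Hypothetical per-device counters. */
struct xmit_stats_sketch {
	unsigned long tx_pkts;
	unsigned long tx_bytes;
	unsigned long tx_drps;
};

/* Stand-in for the mode-specific transmit path: it never touches stats. */
static int queue_xmit_sketch(size_t len)
{
	if (len == 0)
		return NET_XMIT_DROP;

	return NET_XMIT_SUCCESS;
}

/* The caller translates the return code into counter updates. */
static int start_xmit_sketch(struct xmit_stats_sketch *stats, size_t len)
{
	int ret = queue_xmit_sketch(len);

	if (ret == NET_XMIT_SUCCESS || ret == NET_XMIT_CN) {
		stats->tx_pkts++;
		stats->tx_bytes += len;
	} else {
		stats->tx_drps++;
	}

	return ret;
}
```

With this split, an early "return NET_XMIT_DROP" in the inner function
is still counted, as long as every drop path returns a non-success code.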

> >> +	addr = ipvlan_addr_lookup(port, lyr3h, addr_type, true);
> >> +	if (addr) {
> >> +		int ret, len;
> >> +
> >> +		ipvlan_skb_crossing_ns(skb, addr->master->dev);
> >> +		skb->protocol = eth_type_trans(skb, skb->dev);
> >> +		skb->pkt_type = PACKET_HOST;
> >> +		ipvlan_mark_skb(skb, port->dev);
> >> +		len = skb->len + ETH_HLEN;
> >> +		ret = netif_rx(skb);
> >> +		ipvlan_count_rx(ipvlan, len, ret == NET_RX_SUCCESS, false);
> >>
> > This fails to build because ipvlan is not declared in this scope.
> > Perhaps something got missed due to an edit?
> Oops, really. Compilation was fixed in later patches.

Stuff happens :)

...