Message-ID: <07b3b417-c951-b9ce-743d-0fbe50e39c39@linux.alibaba.com>
Date: Fri, 28 Jun 2019 16:59:35 +0800
From: Zhiyuan Hou <zhiyuan2048@...ux.alibaba.com>
To: Paolo Abeni <pabeni@...hat.com>, davem@...emloft.net,
idosch@...lanox.com, daniel@...earbox.net, petrm@...lanox.com,
jiri@...lanox.com, tglx@...utronix.de, linmiaohe@...wei.com
Cc: zhabin@...ux.alibaba.com, caspar@...ux.alibaba.com,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
wei.yang1@...ux.alibaba.com
Subject: Re: [PATCH net-next] net: ipvlan: forward ingress packet to slave's
l2 in l3s mode
On 2019/6/26 4:16 PM, Paolo Abeni wrote:
> Hi,
>
> On Tue, 2019-06-25 at 14:42 +0800, Zhiyuan Hou wrote:
>> In ipvlan l3s mode, ingress packets are switched to the slave interface and
>> delivered to the l4 stack. This may cause two problems:
>>
>> 1. When the slave is in a different ns from the master, the behavior of the
>> stack in the slave ns may confuse users. For example, iptables, tc, and
>> other l2/l3 functions are not available for ingress packets.
>>
>> 2. l3s mode cannot be used for tap devices, so it cannot support ipvtap.
>> But in VM or container-based VM cases, tap devices are very common.
>>
>> In l3s mode's input nf_hook, this patch calls dev_forward_skb() to
>> forward ingress packets to the slave and uses nf_conntrack_confirm() to
>> make conntrack work with the new mode.
>>
>> Signed-off-by: Zha Bin <zhabin@...ux.alibaba.com>
>> Signed-off-by: Zhiyuan Hou <zhiyuan2048@...ux.alibaba.com>
>> ---
>> drivers/net/ipvlan/ipvlan.h | 9 ++++++++-
>> drivers/net/ipvlan/ipvlan_l3s.c | 16 ++++++++++++++--
>> 2 files changed, 22 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
>> index 3837c897832e..48c814e24c3f 100644
>> --- a/drivers/net/ipvlan/ipvlan.h
>> +++ b/drivers/net/ipvlan/ipvlan.h
>> @@ -172,6 +172,14 @@ void ipvlan_link_delete(struct net_device *dev, struct list_head *head);
>> void ipvlan_link_setup(struct net_device *dev);
>> int ipvlan_link_register(struct rtnl_link_ops *ops);
>> #ifdef CONFIG_IPVLAN_L3S
>> +
>> +#include <net/netfilter/nf_conntrack_core.h>
>> +
>> +static inline int ipvlan_confirm_conntrack(struct sk_buff *skb)
>> +{
>> + return nf_conntrack_confirm(skb);
>> +}
>> +
>> int ipvlan_l3s_register(struct ipvl_port *port);
>> void ipvlan_l3s_unregister(struct ipvl_port *port);
>> void ipvlan_migrate_l3s_hook(struct net *oldnet, struct net *newnet);
>> @@ -206,5 +214,4 @@ static inline bool netif_is_ipvlan_port(const struct net_device *dev)
>> {
>> return rcu_access_pointer(dev->rx_handler) == ipvlan_handle_frame;
>> }
>> -
>> #endif /* __IPVLAN_H */
>> diff --git a/drivers/net/ipvlan/ipvlan_l3s.c b/drivers/net/ipvlan/ipvlan_l3s.c
>> index 943d26cbf39f..ed210002f593 100644
>> --- a/drivers/net/ipvlan/ipvlan_l3s.c
>> +++ b/drivers/net/ipvlan/ipvlan_l3s.c
>> @@ -95,14 +95,26 @@ static unsigned int ipvlan_nf_input(void *priv, struct sk_buff *skb,
>> {
>> struct ipvl_addr *addr;
>> unsigned int len;
>> + int ret = NF_ACCEPT;
>> + bool success;
>>
>> addr = ipvlan_skb_to_addr(skb, skb->dev);
>> if (!addr)
>> goto out;
>>
>> - skb->dev = addr->master->dev;
>> len = skb->len + ETH_HLEN;
>> - ipvlan_count_rx(addr->master, len, true, false);
>> +
>> + ret = ipvlan_confirm_conntrack(skb);
>> + if (ret != NF_ACCEPT) {
>> + ipvlan_count_rx(addr->master, len, false, false);
>> + goto out;
>> + }
>> +
>> + skb_push_rcsum(skb, ETH_HLEN);
>> + success = dev_forward_skb(addr->master->dev, skb) == NET_RX_SUCCESS;
> This looks weird to me: if I read the code correctly, the skb will
> traverse NF_INET_LOCAL_IN twice, once due to the l3s hooking and
> again due to dev_forward_skb().
>
> Also, tc ingress, etc. will run after the first traversal of
> NF_INET_LOCAL_IN.
Yes, but the skb's device has been changed from the master to the slave.
In most ipvlan use cases, the master and slave devices are in different
namespaces (ns), so the second LOCAL_IN traversal is completely isolated
from the first one.

When the master and slave devices are in the same ns, the behavior of
this patch is similar to that of an L2-over-L3 tunnel (forwarding from an
L3 device to an L2 device). Since the device has changed, the second
tc-ingress pass is also different from the first.
>
> All in all I think that if full l2 processing is required, a different
> mode or a different virtual device should be used.
We could implement this as a new mode, but it would be very similar to
the current ipvlan l3s mode. Also, l3s mode has the two problems
described in the patch's commit log, so I think a more appropriate
solution is to modify ipvlan l3s itself.
> Cheers,
>
> Paolo