[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210803064730.pmkm7xesffzjscze@kgollan-pc>
Date: Tue, 3 Aug 2021 09:47:31 +0300
From: Lahav Schlesinger <lschlesinger@...venets.com>
To: David Ahern <dsahern@...il.com>
Cc: netdev@...r.kernel.org, davem@...emloft.net, kuba@...nel.org,
dsahern@...nel.org
Subject: Re: [PATCH] neigh: Support filtering neighbours for L3 slave
On Mon, Aug 02, 2021 at 09:51:29PM -0600, David Ahern wrote:
> On 8/2/21 2:23 AM, Lahav Schlesinger wrote:
> > On Sun, Aug 01, 2021 at 11:50:16AM -0600, David Ahern wrote:
> >> On 8/1/21 3:01 AM, Lahav Schlesinger wrote:
> >>> Currently there's support for filtering neighbours for interfaces which
> >>> are in a specific VRF (passing the VRF interface in 'NDA_MASTER'), but
> >>> there's not support for filtering interfaces which are not in an L3
> >>> domain (the "default VRF").
> >>>
> >>> This means userspace is unable to show/flush neighbours in the default VRF
> >>> (in contrast to a "real" VRF - Using "ip neigh show vrf <vrf_dev>").
> >>>
> >>> Therefore for userspace to be able to do so, it must manually iterate
> >>> over all the interfaces, check each one if it's in the default VRF, and
> >>> if so send the matching flush/show message.
> >>>
> >>> This patch adds the ability to do so easily, by passing a dummy value as
> >>> the 'NDA_MASTER' ('NDV_NOT_L3_SLAVE').
> >>> Note that 'NDV_NOT_L3_SLAVE' is a negative number, meaning it is not a valid
> >>> ifindex, so it doesn't break existing programs.
> >>>
> >>> I have a patch for iproute2 ready for adding this support in userspace.
> >>>
> >>> Signed-off-by: Lahav Schlesinger <lschlesinger@...venets.com>
> >>> Cc: David S. Miller <davem@...emloft.net>
> >>> Cc: Jakub Kicinski <kuba@...nel.org>
> >>> Cc: David Ahern <dsahern@...nel.org>
> >>> ---
> >>> include/uapi/linux/neighbour.h | 2 ++
> >>> net/core/neighbour.c | 3 +++
> >>> 2 files changed, 5 insertions(+)
> >>>
> >>> diff --git a/include/uapi/linux/neighbour.h b/include/uapi/linux/neighbour.h
> >>> index dc8b72201f6c..d4f4c2189c63 100644
> >>> --- a/include/uapi/linux/neighbour.h
> >>> +++ b/include/uapi/linux/neighbour.h
> >>> @@ -196,4 +196,6 @@ enum {
> >>> };
> >>> #define NFEA_MAX (__NFEA_MAX - 1)
> >>>
> >>> +#define NDV_NOT_L3_SLAVE (-10)
> >>> +
> >>> #endif
> >>> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> >>> index 53e85c70c6e5..b280103b6806 100644
> >>> --- a/net/core/neighbour.c
> >>> +++ b/net/core/neighbour.c
> >>> @@ -2529,6 +2529,9 @@ static bool neigh_master_filtered(struct net_device *dev, int master_idx)
> >>> {
> >>> struct net_device *master;
> >>>
> >>> + if (master_idx == NDV_NOT_L3_SLAVE)
> >>> + return netif_is_l3_slave(dev);
> >>> +
> >>> if (!master_idx)
> >>> return false;
> >>>
> >>>
> >>
> >> you can not special case VRFs like this, and such a feature should apply
> >> to links and addresses as well.
> >
> > Understandable, I'll change it.
> > In this case though, how would you advice to efficiently filter
> > neighbours for interfaces in the default VRF in userspace (without
> > quering the master of every interface that is being dumped)?
> > I reckoned that because there's support in iproute2 for filtering based
> > on a specific VRF, filtering for the default VRF is a natural extension
>
> iproute2 has support for a link database (ll_cache). You would basically
> have to expand the cache to track any master device a link is associated
> with and then fill the cache with a link dump first. It's expensive at
> scale; the "no stats" filter helps a bit.
>
> This is the reason for kernel side filtering on primary attributes
> (coarse grain filtering at reasonable cost).
>
Nice, didn't know about the ll_cache.
Does filtering based on being in the default VRF is something you think
can be merged into iproute2 (say with "novrf" keyword, following the "nomaster"
convention below - e.g. "ip link show novrf")?
If so I'll add it as a patch to iproute2.
> >
> >>
> >> One idea is to pass "*_MASTER" as -1 (use "none" keyword for iproute2)
> >> and then update kernel side to only return entries if the corresponding
> >> device is not enslaved to another device. Unfortunately since I did not
> >> check that _MASTER was non-zero in the current code, we can not use 0 as
> >> a valid flag for "not enslaved". Be sure to document why -1 is used.
> >
> > Do you mean the command will look like "ip link show master none"?
> > If so, wouldn't this cause an ambiguity if an interface names "none" is present?
> >
>
> You could always detect "none" as a valid device name and error out or
> use "nomaster" as a way to say "show all devices not enslaved to a
> bridge or VRF.
I think "nomaster" sounds reasonable, especially since it's already has
a meaning with "ip link set".
Powered by blists - more mailing lists