Message-ID: <MW5PR11MB581140F496EF3F70CCDFC31CDDB7A@MW5PR11MB5811.namprd11.prod.outlook.com>
Date: Fri, 17 Nov 2023 22:03:18 +0000
From: "Ertman, David M" <david.m.ertman@...el.com>
To: Jay Vosburgh <jay.vosburgh@...onical.com>
CC: "Fijalkowski, Maciej" <maciej.fijalkowski@...el.com>, "Nguyen, Anthony L"
<anthony.l.nguyen@...el.com>, "davem@...emloft.net" <davem@...emloft.net>,
"kuba@...nel.org" <kuba@...nel.org>, "pabeni@...hat.com" <pabeni@...hat.com>,
"edumazet@...gle.com" <edumazet@...gle.com>, "netdev@...r.kernel.org"
<netdev@...r.kernel.org>, "Wyborny, Carolyn" <carolyn.wyborny@...el.com>,
"daniel.machon@...rochip.com" <daniel.machon@...rochip.com>, "Kitszel,
Przemyslaw" <przemyslaw.kitszel@...el.com>, "Buvaneswaran, Sujai"
<sujai.buvaneswaran@...el.com>
Subject: RE: [PATCH net] ice: Fix VF Reset paths when interface in a failed
over aggregate
> -----Original Message-----
> From: Jay Vosburgh <jay.vosburgh@...onical.com>
> Sent: Friday, November 17, 2023 1:22 PM
> To: Ertman, David M <david.m.ertman@...el.com>
> Cc: Fijalkowski, Maciej <maciej.fijalkowski@...el.com>; Nguyen, Anthony L
> <anthony.l.nguyen@...el.com>; davem@...emloft.net; kuba@...nel.org;
> pabeni@...hat.com; edumazet@...gle.com; netdev@...r.kernel.org;
> Wyborny, Carolyn <carolyn.wyborny@...el.com>;
> daniel.machon@...rochip.com; Kitszel, Przemyslaw
> <przemyslaw.kitszel@...el.com>; Buvaneswaran, Sujai
> <sujai.buvaneswaran@...el.com>
> Subject: Re: [PATCH net] ice: Fix VF Reset paths when interface in a failed
> over aggregate
>
> Ertman, David M <david.m.ertman@...el.com> wrote:
>
> >> -----Original Message-----
> >> From: Fijalkowski, Maciej <maciej.fijalkowski@...el.com>
> >> Sent: Friday, November 17, 2023 9:44 AM
> >> To: Ertman, David M <david.m.ertman@...el.com>
> >> Cc: Nguyen, Anthony L <anthony.l.nguyen@...el.com>;
> >> davem@...emloft.net; kuba@...nel.org; pabeni@...hat.com;
> >> edumazet@...gle.com; netdev@...r.kernel.org; Wyborny, Carolyn
> >> <carolyn.wyborny@...el.com>; daniel.machon@...rochip.com; Kitszel,
> >> Przemyslaw <przemyslaw.kitszel@...el.com>; Buvaneswaran, Sujai
> >> <sujai.buvaneswaran@...el.com>
> >> Subject: Re: [PATCH net] ice: Fix VF Reset paths when interface in a failed
> >> over aggregate
> >>
> >> On Thu, Nov 16, 2023 at 10:36:37PM +0100, Ertman, David M wrote:
> >> > > -----Original Message-----
> >> > > From: Fijalkowski, Maciej <maciej.fijalkowski@...el.com>
> >> > > Sent: Thursday, November 16, 2023 6:22 AM
> >> > > To: Nguyen, Anthony L <anthony.l.nguyen@...el.com>
> >> > > Cc: davem@...emloft.net; kuba@...nel.org; pabeni@...hat.com;
> >> > > edumazet@...gle.com; netdev@...r.kernel.org; Ertman, David M
> >> > > <david.m.ertman@...el.com>; Wyborny, Carolyn
> >> > > <carolyn.wyborny@...el.com>; daniel.machon@...rochip.com;
> Kitszel,
> >> > > Przemyslaw <przemyslaw.kitszel@...el.com>; Buvaneswaran, Sujai
> >> > > <sujai.buvaneswaran@...el.com>
> >> > > Subject: Re: [PATCH net] ice: Fix VF Reset paths when interface in a
> failed
> >> > > over aggregate
> >> > >
> >> > > On Wed, Nov 15, 2023 at 01:12:41PM -0800, Tony Nguyen wrote:
> >> > > > From: Dave Ertman <david.m.ertman@...el.com>
> >> > > >
> >> > > > There is an error when an interface has the following conditions:
> >> > > > - PF is in an aggregate (bond)
> >> > > > - PF has VFs created on it
> >> > > > - bond is in a state where it is failed-over to the secondary interface
> >> > > > - A VF reset is issued on one or more of those VFs
> >> > > >
> >> > > > The issue is generated by the originating PF trying to rebuild or
> >> > > > reconfigure the VF resources. Since the bond is failed over to the
> >> > > > secondary interface the queue contexts are in a modified state.
> >> > > >
> >> > > > To fix this issue, have the originating interface reclaim its resources
> >> > > > prior to the tear-down and rebuild or reconfigure. Then after the process
> >> > > > is complete, move the resources back to the currently active interface.
> >> > > >
> >> > > > There are multiple paths that can be used depending on what triggered the
> >> > > > event, so create a helper function to move the queues and use paired calls
> >> > > > to the helper (back to origin, process, then move back to active interface)
> >> > > > under the same lag_mutex lock.
> >> > > >
> >> > > > Fixes: 1e0f9881ef79 ("ice: Flesh out implementation of support for SRIOV on bonded interface")
> >> > > > Signed-off-by: Dave Ertman <david.m.ertman@...el.com>
> >> > > > Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@...el.com>
> >> > > > Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@...el.com>
> >> > > > Signed-off-by: Tony Nguyen <anthony.l.nguyen@...el.com>
> >> > > > ---
> >> > > > This is the net patch mentioned yesterday:
> >> > > > https://lore.kernel.org/netdev/71058999-50d9-cc17-d940-3f043734e0ee@...el.com/
> >> > > >
> >> > > >  drivers/net/ethernet/intel/ice/ice_lag.c      | 42 +++++++++++++++++++
> >> > > > drivers/net/ethernet/intel/ice/ice_lag.h | 1 +
> >> > > > drivers/net/ethernet/intel/ice/ice_vf_lib.c | 20 +++++++++
> >> > > > drivers/net/ethernet/intel/ice/ice_virtchnl.c | 25 +++++++++++
> >> > > > 4 files changed, 88 insertions(+)
> >> > > >
> >> > > > diff --git a/drivers/net/ethernet/intel/ice/ice_lag.c b/drivers/net/ethernet/intel/ice/ice_lag.c
> >> > > > index cd065ec48c87..9eed93baa59b 100644
> >> > > > --- a/drivers/net/ethernet/intel/ice/ice_lag.c
> >> > > > +++ b/drivers/net/ethernet/intel/ice/ice_lag.c
> >> > > > @@ -679,6 +679,48 @@ static void ice_lag_move_vf_nodes(struct ice_lag *lag, u8 oldport, u8 newport)
> >> > > > 			ice_lag_move_single_vf_nodes(lag, oldport, newport, i);
> >> > > > }
> >> > > >
> >> > > > +/**
> >> > > > + * ice_lag_move_vf_nodes_cfg - move VF nodes outside LAG netdev event context
> >> > > > + * @lag: local lag struct
> >> > > > + * @src_prt: lport value for source port
> >> > > > + * @dst_prt: lport value for destination port
> >> > > > + *
> >> > > > + * This function is used to move nodes during an out-of-netdev-event situation,
> >> > > > + * primarily when the driver needs to reconfigure or recreate resources.
> >> > > > + *
> >> > > > + * Must be called while holding the lag_mutex to avoid lag events from
> >> > > > + * processing while out-of-sync moves are happening. Also, paired moves,
> >> > > > + * such as used in a reset flow, should both be called under the same mutex
> >> > > > + * lock to avoid changes between start of reset and end of reset.
> >> > > > + */
> >> > > > +void ice_lag_move_vf_nodes_cfg(struct ice_lag *lag, u8 src_prt, u8 dst_prt)
> >> > > > +{
> >> > > > + struct ice_lag_netdev_list ndlist, *nl;
> >> > > > + struct list_head *tmp, *n;
> >> > > > + struct net_device *tmp_nd;
> >> > > > +
> >> > > > + INIT_LIST_HEAD(&ndlist.node);
> >> > > > + rcu_read_lock();
> >> > > > +	for_each_netdev_in_bond_rcu(lag->upper_netdev, tmp_nd) {
> >> > >
> >> > > Why do you need an RCU section for that?
> >> > >
> >> > > Under a mutex? I'm lacking context here.
> >> > >
> >> >
> >> > The mutex lock is to stop the lag event thread from processing any other
> >> > event until after the asynchronous reset is processed. The RCU lock is to
> >> > stop the upper kernel bonding driver from changing elements while the
> >> > reset is building a list.
> >>
> >> Can you point me to the relevant piece of code in the upper kernel bonding
> >> driver? Is there synchronize_rcu() on updates?
> >
> >Here is the beginning of the bonding struct from /include/net/bonding.h
> >
> >/*
> > * Here are the locking policies for the two bonding locks:
> > * Get rcu_read_lock when reading or RTNL when writing slave list.
> > */
> >struct bonding {
> > struct net_device *dev; /* first - useful for panic debug */
> > struct slave __rcu *curr_active_slave;
> > struct slave __rcu *current_arp_slave;
> > struct slave __rcu *primary_slave;
> > struct bond_up_slave __rcu *usable_slaves;
> > struct bond_up_slave __rcu *all_slaves;
> >
> >> >
> >> > > > + nl = kzalloc(sizeof(*nl), GFP_ATOMIC);
> >> > >
> >> > > do these have to be new allocations or could you just use list_move?
> >> > >
> >> >
> >> > Building a list from scratch - nothing to move until it is created.
> >>
> >> Sorry got confused.
> >>
> >> Couldn't you keep an up-to-date list of netdevs instead, and avoid all the
> >> building of the list and then deleting it after processing the event?
> >>
> >
> >The bonding driver generates netdev events for changes in the aggregate. The
> >ice driver's event handler takes a snapshot of the members of the bond,
> >creates a work item that gets put on the event processing thread, and then
> >returns. Events are processed one at a time, in sequence, asynchronously to
> >the event handler on the processing thread. The member list in a work item is
> >only valid for that work item and cannot be used for a reset event happening
> >asynchronously to the processing queue.
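
To illustrate that pattern, here is a minimal userspace sketch (hypothetical names, not the actual ice code) of why the handler must copy the member list into the work item: the snapshot stays valid after the handler returns, even if the live list changes underneath it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the driver's work item. */
struct work_item {
	int *members;   /* snapshot of bond member ids */
	size_t count;
};

/* Event handler: snapshot the live member list into a heap-allocated
 * work item and return immediately; the worker consumes the copy later. */
struct work_item *snapshot_members(const int *live, size_t count)
{
	struct work_item *w = malloc(sizeof(*w));

	if (!w)
		return NULL;
	w->members = malloc(count * sizeof(*live));
	if (!w->members) {
		free(w);
		return NULL;
	}
	memcpy(w->members, live, count * sizeof(*live));
	w->count = count;
	return w;
}

void free_work_item(struct work_item *w)
{
	if (w) {
		free(w->members);
		free(w);
	}
}
```
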
>
> Why is ice keeping track of the bonding state? I see there's
> also concurrently a patch [0] to block PF reinitialization if attached
> to a bond, as well as some upstream discussion regarding ice issues with
> bonding and SR-IOV [1]; are these things related in some way?
>
> Whatever that reason is, should the logic apply to other drivers
> that do similar things? For example, team and openvswitch both have
> functionality that is largely similar to what bonding does, so I'm
> curious as to what specifically is going on that requires special
> handling on the part of the device driver.
>
> -J
>
The original patches implement support for SRIOV VFs on an aggregate interface
(bonded PFs). This results in a single VF being backed by two PFs; the VF and
users of the VF are not even aware that an aggregate is involved. To accomplish
this in software, the ice driver has to manipulate resources (queues,
scheduling nodes, etc.) so that VFs can communicate through whichever interface
is active, moving resources back and forth based on events generated by the
bonding driver. Also, as interfaces are added to an aggregate, setup has to be
done to prepare them to support SRIOV VFs.
As for other entities that act like the bonding driver: they will only have
support for SRIOV VFs if they also generate events the way the bonding driver
does.
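
As a rough userspace model of that paired-move flow (illustrative names only,
not the driver's real structures or locking primitives): reclaim the VF's
queues to the originating port, run the reset, then hand them back to the
active port, all under a single hold of the lag_mutex so no LAG event can
interleave.

```c
#include <assert.h>

/* Hypothetical model of a VF whose queues live on one lport at a time. */
struct vf_ctx {
	int queue_owner;     /* lport currently holding the VF's queues */
	int lag_mutex_held;  /* stand-in for mutex_lock(&lag_mutex) */
};

void lag_lock(struct vf_ctx *vf)
{
	assert(!vf->lag_mutex_held);
	vf->lag_mutex_held = 1;
}

void lag_unlock(struct vf_ctx *vf)
{
	assert(vf->lag_mutex_held);
	vf->lag_mutex_held = 0;
}

/* Move the VF's queue contexts from src port to dst port. */
void move_vf_nodes(struct vf_ctx *vf, int src, int dst)
{
	assert(vf->lag_mutex_held);  /* moves only happen under the lock */
	if (vf->queue_owner == src)
		vf->queue_owner = dst;
}

/* Reset flow: both paired moves happen under the same mutex hold. */
void vf_reset(struct vf_ctx *vf, int origin_port, int active_port)
{
	lag_lock(vf);
	move_vf_nodes(vf, active_port, origin_port); /* reclaim to origin */
	/* ... tear down and rebuild the VF's resources here ... */
	move_vf_nodes(vf, origin_port, active_port); /* hand back to active */
	lag_unlock(vf);
}
```
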
> [0]
> https://lore.kernel.org/netdev/20231117164427.912563-1-sachin.bahadur@...el.com/
>
> [1]
> https://lore.kernel.org/lkml/CC024511-980A-4508-8ABF-659A04367C2B@...il.com/T/#mde6cc7110fedf54771aa3c13044ae6c0936525fb
>
> >Thanks,
> >DaveE
> >
> >> >
> >> > > > + if (!nl)
> >> > > > + break;
> >> > > > +
> >> > > > + nl->netdev = tmp_nd;
> >> > > > + list_add(&nl->node, &ndlist.node);
> >> > >
> >> > > list_add_rcu ?
> >> > >
> >> >
> >> > I thought list_add_rcu was for internal list manipulation, when prev and
> >> > next are both known and defined?
> >>
> >> First time I hear this TBH but disregard the suggestion.
> >>
> >> >
> >> > > > + }
> >> > > > + rcu_read_unlock();
> >> > >
> >> > > You have the very same chunk of code in ice_lag_move_new_vf_nodes().
> >> > > Pull this out to a common function?
> >> > >
> >> > > ...and in ice_lag_rebuild().
> >> > >
> >> >
> >> > Nice catch - for v2, pulled out the code into two helper functions:
> >> > ice_lag_build_netdev_list()
> >> > ice_lag_destroy_netdev_list()
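
For reference, a minimal userspace model of that build-then-destroy pattern
(illustrative names; the actual driver uses list_head, GFP_ATOMIC allocations,
and list_for_each_entry_safe()):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for ice_lag_netdev_list entries. */
struct nd_node {
	int netdev_id;          /* stand-in for struct net_device * */
	struct nd_node *next;
};

/* Build a snapshot list from an array of bond member ids,
 * inserting at the head like list_add() does. */
struct nd_node *build_netdev_list(const int *ids, size_t n)
{
	struct nd_node *head = NULL;

	for (size_t i = 0; i < n; i++) {
		struct nd_node *nl = calloc(1, sizeof(*nl));

		if (!nl)
			break;          /* mirror the allocation-failure path */
		nl->netdev_id = ids[i];
		nl->next = head;
		head = nl;
	}
	return head;
}

/* Destroy the list safely: grab ->next before freeing each node,
 * the same idea as list_for_each_entry_safe(). Returns nodes freed. */
size_t destroy_netdev_list(struct nd_node *head)
{
	size_t freed = 0;

	while (head) {
		struct nd_node *n = head->next;

		free(head);
		head = n;
		freed++;
	}
	return freed;
}
```
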
> >> >
> >> >
> >> > > > + lag->netdev_head = &ndlist.node;
> >> > > > + ice_lag_move_vf_nodes(lag, src_prt, dst_prt);
> >> > > > +
> >> > > > + list_for_each_safe(tmp, n, &ndlist.node) {
> >> > >
> >> > > use list_for_each_entry_safe()
> >> > >
> >> >
> >> > Changed in v2.
> >> >
> >> > > > + nl = list_entry(tmp, struct ice_lag_netdev_list, node);
> >> > > > + list_del(&nl->node);
> >> > > > + kfree(nl);
> >> > > > + }
> >> > > > + lag->netdev_head = NULL;
> >> >
> >> > Thanks for the review!
> >> > DaveE
> >
>
> ---
> -Jay Vosburgh, jay.vosburgh@...onical.com