Date: Fri, 17 Nov 2023 13:21:51 -0800
From: Jay Vosburgh <jay.vosburgh@...onical.com>
To: "Ertman, David M" <david.m.ertman@...el.com>
cc: "Fijalkowski, Maciej" <maciej.fijalkowski@...el.com>,
    "Nguyen, Anthony L" <anthony.l.nguyen@...el.com>,
    "davem@...emloft.net" <davem@...emloft.net>,
    "kuba@...nel.org" <kuba@...nel.org>,
    "pabeni@...hat.com" <pabeni@...hat.com>,
    "edumazet@...gle.com" <edumazet@...gle.com>,
    "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
    "Wyborny, Carolyn" <carolyn.wyborny@...el.com>,
    "daniel.machon@...rochip.com" <daniel.machon@...rochip.com>,
    "Kitszel, Przemyslaw" <przemyslaw.kitszel@...el.com>,
    "Buvaneswaran, Sujai" <sujai.buvaneswaran@...el.com>
Subject: Re: [PATCH net] ice: Fix VF Reset paths when interface in a failed over aggregate

Ertman, David M <david.m.ertman@...el.com> wrote:

>> -----Original Message-----
>> From: Fijalkowski, Maciej <maciej.fijalkowski@...el.com>
>> Sent: Friday, November 17, 2023 9:44 AM
>> To: Ertman, David M <david.m.ertman@...el.com>
>> Cc: Nguyen, Anthony L <anthony.l.nguyen@...el.com>;
>> davem@...emloft.net; kuba@...nel.org; pabeni@...hat.com;
>> edumazet@...gle.com; netdev@...r.kernel.org; Wyborny, Carolyn
>> <carolyn.wyborny@...el.com>; daniel.machon@...rochip.com; Kitszel,
>> Przemyslaw <przemyslaw.kitszel@...el.com>; Buvaneswaran, Sujai
>> <sujai.buvaneswaran@...el.com>
>> Subject: Re: [PATCH net] ice: Fix VF Reset paths when interface in a failed
>> over aggregate
>> 
>> On Thu, Nov 16, 2023 at 10:36:37PM +0100, Ertman, David M wrote:
>> > > -----Original Message-----
>> > > From: Fijalkowski, Maciej <maciej.fijalkowski@...el.com>
>> > > Sent: Thursday, November 16, 2023 6:22 AM
>> > > To: Nguyen, Anthony L <anthony.l.nguyen@...el.com>
>> > > Cc: davem@...emloft.net; kuba@...nel.org; pabeni@...hat.com;
>> > > edumazet@...gle.com; netdev@...r.kernel.org; Ertman, David M
>> > > <david.m.ertman@...el.com>; Wyborny, Carolyn
>> > > <carolyn.wyborny@...el.com>; daniel.machon@...rochip.com; Kitszel,
>> > > Przemyslaw <przemyslaw.kitszel@...el.com>; Buvaneswaran, Sujai
>> > > <sujai.buvaneswaran@...el.com>
>> > > Subject: Re: [PATCH net] ice: Fix VF Reset paths when interface in a failed
>> > > over aggregate
>> > >
>> > > On Wed, Nov 15, 2023 at 01:12:41PM -0800, Tony Nguyen wrote:
>> > > > From: Dave Ertman <david.m.ertman@...el.com>
>> > > >
>> > > > There is an error when an interface has the following conditions:
>> > > > - PF is in an aggregate (bond)
>> > > > - PF has VFs created on it
>> > > > - bond is in a state where it is failed-over to the secondary interface
>> > > > - A VF reset is issued on one or more of those VFs
>> > > >
>> > > > The issue is generated by the originating PF trying to rebuild or
>> > > > reconfigure the VF resources.  Since the bond is failed over to the
>> > > > secondary interface the queue contexts are in a modified state.
>> > > >
>> > > > To fix this issue, have the originating interface reclaim its resources
>> > > > prior to the tear-down and rebuild or reconfigure.  Then after the
>> > > > process is complete, move the resources back to the currently active
>> > > > interface.
>> > > >
>> > > > There are multiple paths that can be used depending on what triggered
>> > > > the event, so create a helper function to move the queues and use
>> > > > paired calls to the helper (back to origin, process, then move back to
>> > > > active interface) under the same lag_mutex lock.
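The paired-moves-under-one-lock flow described in that commit message can be sketched as a userspace analogue (a hedged sketch with pthreads; `lag_move()`, the `PORT_*` constants, and the reset step are illustrative stand-ins, not the driver's actual API):

```c
#include <pthread.h>

enum { PORT_PRIMARY = 0, PORT_SECONDARY = 1 };

static pthread_mutex_t lag_mutex = PTHREAD_MUTEX_INITIALIZER;
static int vf_queue_port = PORT_SECONDARY;  /* bond failed over to secondary */

/* Illustrative stand-in for the queue-moving helper. */
static void lag_move(int src, int dst)
{
	if (vf_queue_port == src)
		vf_queue_port = dst;
}

/* VF reset flow: both paired moves sit under one lock span, so no
 * LAG event can be processed between reclaim and give-back. */
static void vf_reset_in_aggregate(int origin, int active)
{
	pthread_mutex_lock(&lag_mutex);
	lag_move(active, origin);   /* reclaim queue contexts to origin PF */
	/* ... tear down and rebuild/reconfigure VF resources here ... */
	lag_move(origin, active);   /* move back to currently active port */
	pthread_mutex_unlock(&lag_mutex);
}
```

The point of the single lock span is that the two moves are never observable in a half-done state by the event-processing path.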
>> > > >
>> > > > Fixes: 1e0f9881ef79 ("ice: Flesh out implementation of support for
>> > > > SRIOV on bonded interface")
>> > > > Signed-off-by: Dave Ertman <david.m.ertman@...el.com>
>> > > > Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@...el.com>
>> > > > Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@...el.com>
>> > > > Signed-off-by: Tony Nguyen <anthony.l.nguyen@...el.com>
>> > > > ---
>> > > > This is the net patch mentioned yesterday:
>> > > > https://lore.kernel.org/netdev/71058999-50d9-cc17-d940-3f043734e0ee@...el.com/
>> > > >
>> > > >  drivers/net/ethernet/intel/ice/ice_lag.c      | 42 +++++++++++++++++++
>> > > >  drivers/net/ethernet/intel/ice/ice_lag.h      |  1 +
>> > > >  drivers/net/ethernet/intel/ice/ice_vf_lib.c   | 20 +++++++++
>> > > >  drivers/net/ethernet/intel/ice/ice_virtchnl.c | 25 +++++++++++
>> > > >  4 files changed, 88 insertions(+)
>> > > >
>> > > > diff --git a/drivers/net/ethernet/intel/ice/ice_lag.c b/drivers/net/ethernet/intel/ice/ice_lag.c
>> > > > index cd065ec48c87..9eed93baa59b 100644
>> > > > --- a/drivers/net/ethernet/intel/ice/ice_lag.c
>> > > > +++ b/drivers/net/ethernet/intel/ice/ice_lag.c
>> > > > @@ -679,6 +679,48 @@ static void ice_lag_move_vf_nodes(struct ice_lag *lag, u8 oldport, u8 newport)
>> > > >  			ice_lag_move_single_vf_nodes(lag, oldport, newport, i);
>> > > >  }
>> > > >
>> > > > +/**
>> > > > + * ice_lag_move_vf_nodes_cfg - move VF nodes outside LAG netdev event context
>> > > > + * @lag: local lag struct
>> > > > + * @src_prt: lport value for source port
>> > > > + * @dst_prt: lport value for destination port
>> > > > + *
>> > > > + * This function is used to move nodes during an out-of-netdev-event situation,
>> > > > + * primarily when the driver needs to reconfigure or recreate resources.
>> > > > + *
>> > > > + * Must be called while holding the lag_mutex to avoid lag events from
>> > > > + * processing while out-of-sync moves are happening.  Also, paired moves,
>> > > > + * such as used in a reset flow, should both be called under the same mutex
>> > > > + * lock to avoid changes between start of reset and end of reset.
>> > > > + */
>> > > > +void ice_lag_move_vf_nodes_cfg(struct ice_lag *lag, u8 src_prt, u8 dst_prt)
>> > > > +{
>> > > > +	struct ice_lag_netdev_list ndlist, *nl;
>> > > > +	struct list_head *tmp, *n;
>> > > > +	struct net_device *tmp_nd;
>> > > > +
>> > > > +	INIT_LIST_HEAD(&ndlist.node);
>> > > > +	rcu_read_lock();
>> > > > +	for_each_netdev_in_bond_rcu(lag->upper_netdev, tmp_nd) {
>> > >
>> > > Why do you need an RCU section for that?
>> > >
>> > > Under a mutex? Lacking context here.
>> > >
>> >
>> > The mutex lock is to stop the lag event thread from processing any other
>> > event until after the asynchronous reset is processed.  The RCU lock is
>> > to stop the upper kernel bonding driver from changing elements while the
>> > reset is building a list.
>> 
>> Can you point me to the relevant piece of code in the upper kernel bonding
>> driver? Is there synchronize_rcu() on updates?
>
>Here is the beginning of the bonding struct from include/net/bonding.h:
>
>/*
> * Here are the locking policies for the two bonding locks:
> * Get rcu_read_lock when reading or RTNL when writing slave list.
> */
>struct bonding {
>	struct   net_device *dev; /* first - useful for panic debug */
>	struct   slave __rcu *curr_active_slave;
>	struct   slave __rcu *current_arp_slave;
>	struct   slave __rcu *primary_slave;
>	struct   bond_up_slave __rcu *usable_slaves;
>	struct   bond_up_slave __rcu *all_slaves;
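The policy in that comment can be sketched in kernel-style pseudocode (reader vs. writer side; not compilable outside the kernel tree, and the update-side calls shown are illustrative of the general RTNL+RCU pattern rather than a specific bonding code path):

```c
/* Reader side: traverse bond membership under rcu_read_lock(). */
rcu_read_lock();
for_each_netdev_in_bond_rcu(bond->dev, slave_nd)
        use(slave_nd);
rcu_read_unlock();

/* Writer side: slave-list updates happen under RTNL; new pointers are
 * published with rcu_assign_pointer(), and the writer waits out any
 * in-flight readers before freeing what they might still see. */
ASSERT_RTNL();
rcu_assign_pointer(bond->curr_active_slave, new_active);
synchronize_rcu();   /* or kfree_rcu() the removed entry */
```

This is why a reader that only holds rcu_read_lock() sees either the old or the new slave list, never a half-updated one.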
>
>> >
>> > > > +		nl = kzalloc(sizeof(*nl), GFP_ATOMIC);
>> > >
>> > > do these have to be new allocations or could you just use list_move?
>> > >
>> >
>> > Building a list from scratch - nothing to move until it is created.
>> 
>> Sorry, got confused.
>> 
>> Couldn't you keep an up-to-date list of netdevs instead, and avoid all
>> the building of the list and then deleting it after processing each event?
>> 
>
>The bonding driver generates netdev events for things changing in the aggregate.  The ice
>driver's event handler takes a snapshot of the members of the bond, creates a work item
>which gets put on the event processing thread, and then returns.  The events are processed
>one at a time, in sequence, on the processing thread, asynchronously to the event handler.  The
>contents of the member list for a work item are only valid for that work item and cannot be used
>for a reset event happening asynchronously to the processing queue.
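The snapshot-per-work-item lifetime described above can be illustrated with a minimal userspace sketch (hedged; the struct and function names are invented for illustration, not the ice driver's types):

```c
#include <stdlib.h>
#include <string.h>

struct work_item {
	int n_members;
	int members[8];   /* bond membership captured at event time */
};

/* Event handler: copy the current member list into the work item and
 * return; the snapshot is only valid for this one work item. */
static struct work_item *snapshot_event(const int *bond_members, int n)
{
	struct work_item *w;

	if (n > 8)
		return NULL;
	w = malloc(sizeof(*w));
	if (!w)
		return NULL;
	w->n_members = n;
	memcpy(w->members, bond_members, n * sizeof(int));
	return w;   /* caller queues this on the processing thread */
}
```

Because each work item owns its own copy, later membership changes cannot invalidate an already-queued item, but by the same token the copy cannot serve an asynchronous reset that arrives outside the queue.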

	Why is ice keeping track of the bonding state?  I see there's
also, concurrently, a patch [0] to block PF reinitialization if attached
to a bond, as well as some upstream discussion regarding ice issues with
bonding and SR-IOV [1].  Are these things related in some way?

	Whatever that reason is, should the logic apply to other drivers
that do similar things?  For example, team and openvswitch both have
functionality that is largely similar to what bonding does, so I'm
curious as to what specifically is going on that requires special
handling on the part of the device driver.

	-J

[0] https://lore.kernel.org/netdev/20231117164427.912563-1-sachin.bahadur@intel.com/

[1] https://lore.kernel.org/lkml/CC024511-980A-4508-8ABF-659A04367C2B@gmail.com/T/#mde6cc7110fedf54771aa3c13044ae6c0936525fb

>Thanks, 
>DaveE
>
>> >
>> > > > +		if (!nl)
>> > > > +			break;
>> > > > +
>> > > > +		nl->netdev = tmp_nd;
>> > > > +		list_add(&nl->node, &ndlist.node);
>> > >
>> > > list_add_rcu ?
>> > >
>> >
>> > I thought list_add_rcu was for internal list manipulation when prev and
>> > next are both known and defined?
>> 
>> First time I've heard this, TBH, but disregard the suggestion.
>> 
>> >
>> > > > +	}
>> > > > +	rcu_read_unlock();
>> > >
>> > > you have the very same chunk of code in ice_lag_move_new_vf_nodes().
>> > > Pull this out to a common function?
>> > >
>> > > ...and in ice_lag_rebuild().
>> > >
>> >
>> > Nice catch - for v2, pulled out the code into two helper functions:
>> > ice_lag_build_netdev_list()
>> > ice_lag_destroy_netdev_list()
>> >
>> >
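A plausible shape for those v2 helpers, reconstructed from the inline code quoted earlier in this thread (a kernel-style sketch that cannot be built outside the kernel tree; the actual v2 bodies are not shown here and may differ):

```c
static void ice_lag_build_netdev_list(struct ice_lag *lag,
				      struct ice_lag_netdev_list *ndlist)
{
	struct ice_lag_netdev_list *nl;
	struct net_device *tmp_nd;

	INIT_LIST_HEAD(&ndlist->node);
	rcu_read_lock();
	for_each_netdev_in_bond_rcu(lag->upper_netdev, tmp_nd) {
		nl = kzalloc(sizeof(*nl), GFP_ATOMIC);
		if (!nl)
			break;
		nl->netdev = tmp_nd;
		list_add(&nl->node, &ndlist->node);
	}
	rcu_read_unlock();
	lag->netdev_head = &ndlist->node;
}

static void ice_lag_destroy_netdev_list(struct ice_lag *lag,
					struct ice_lag_netdev_list *ndlist)
{
	struct ice_lag_netdev_list *entry, *n;

	list_for_each_entry_safe(entry, n, &ndlist->node, node) {
		list_del(&entry->node);
		kfree(entry);
	}
	lag->netdev_head = NULL;
}
```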
>> > > > +	lag->netdev_head = &ndlist.node;
>> > > > +	ice_lag_move_vf_nodes(lag, src_prt, dst_prt);
>> > > > +
>> > > > +	list_for_each_safe(tmp, n, &ndlist.node) {
>> > >
>> > > use list_for_each_entry_safe()
>> > >
>> >
>> > Changed in v2.
>> >
>> > > > +		nl = list_entry(tmp, struct ice_lag_netdev_list, node);
>> > > > +		list_del(&nl->node);
>> > > > +		kfree(nl);
>> > > > +	}
>> > > > +	lag->netdev_head = NULL;
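The `_safe` variant suggested above matters because each entry is freed mid-walk. A stripped-down, compilable userspace re-implementation of the list idiom (a sketch; the real macros live in the kernel's <linux/list.h>) shows why the extra cursor is needed:

```c
#include <stddef.h>
#include <stdlib.h>

/* 'n' caches the next node before 'pos' is deleted and freed, so the
 * walk survives removal of the current entry. */

struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }

static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

static void list_del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

#define list_for_each_entry_safe(pos, n, head, member)                       \
	for (pos = container_of((head)->next, __typeof__(*pos), member),     \
	     n = container_of(pos->member.next, __typeof__(*pos), member);   \
	     &pos->member != (head);                                         \
	     pos = n, n = container_of(n->member.next, __typeof__(*n), member))

struct netdev_entry { int id; struct list_head node; };

/* Delete and free every entry; returns how many were freed. */
static int destroy_list(struct list_head *head)
{
	struct netdev_entry *e, *n;
	int freed = 0;

	list_for_each_entry_safe(e, n, head, node) {
		list_del(&e->node);
		free(e);
		freed++;
	}
	return freed;
}
```

With the plain (non-safe) iterator, the loop would read `pos->member.next` after `pos` was freed, which is a use-after-free.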
>> >
>> > Thanks for the review!
>> > DaveE
>

---
	-Jay Vosburgh, jay.vosburgh@...onical.com
