lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Wed, 23 Sep 2020 03:00:31 -0500 From: Lijun Pan <ljp@...ux.vnet.ibm.com> To: Sukadev Bhattiprolu <sukadev@...ux.ibm.com> Cc: netdev@...r.kernel.org, drt@...ux.ibm.com, ljp@...ux.ibm.com Subject: Re: [PATCH 1/1] powerpc/vnic: Extend "failover pending" window > On Sep 22, 2020, at 11:53 PM, Sukadev Bhattiprolu <sukadev@...ux.ibm.com> wrote: > > > From 547fa5627b63102f3ef80edffff3a032d62c88c5 Mon Sep 17 00:00:00 2001 > From: Sukadev Bhattiprolu <sukadev@...ux.vnet.ibm.com> > Date: Thu, 10 Sep 2020 11:18:41 -0700 > Subject: [PATCH 1/1] powerpc/vnic: Extend "failover pending" window > > Commit 5a18e1e0c193b introduced the 'failover_pending' state to track > the "failover pending window" - where we wait for the partner to become > ready (after a transport event) before actually attempting to failover. > i.e window is between following two events: > > a. we get a transport event due to a FAILOVER > > b. later, we get CRQ_INITIALIZED indicating the partner is > ready at which point we schedule a FAILOVER reset. > > and ->failover_pending is true during this window. > > If during this window, we attempt to open (or close) a device, we pretend > that the operation succeded and let the FAILOVER reset path complete the > operation. > > This is fine, except if the transport event ("a" above) occurs during the > open and after open has already checked whether a failover is pending. If > that happens, we fail the open, which can cause the boot scripts to leave > the interface down requiring administrator to manually bring up the device. > > This fix "extends" the failover pending window till we are _actually_ > ready to perform the failover reset (i.e until after we get the RTNL > lock). Since open() holds the RTNL lock, we can be sure that we either > finish the open or if the open() fails due to the failover pending window, > we can again pretend that open is done and let the failover complete it. > > Signed-off-by: Sukadev Bhattiprolu <sukadev@...ux.ibm.com> > --- > drivers/net/ethernet/ibm/ibmvnic.c | 33 +++++++++++++++++++++++++----- > 1 file changed, 28 insertions(+), 5 deletions(-) > > diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c > index 1b702a43a5d0..cf75a649ed8b 100644 > --- a/drivers/net/ethernet/ibm/ibmvnic.c > +++ b/drivers/net/ethernet/ibm/ibmvnic.c > @@ -1197,18 +1197,29 @@ static int ibmvnic_open(struct net_device *netdev) > if (adapter->state != VNIC_CLOSED) { > rc = ibmvnic_login(netdev); > if (rc) > - return rc; > + goto out; > > rc = init_resources(adapter); > if (rc) { > - netdev_err(netdev, "failed to initialize resources\n"); > + netdev_err(netdev, > + "failed to initialize resources, failover %d\n", > + adapter->failover_pending); Would “..., failover_pending=%d\n” be more explicit than "failover %d”? > release_resources(adapter); > - return rc; > + goto out; > } > } > > rc = __ibmvnic_open(netdev); > > +out: > + /* > + * If open fails due to a pending failover, set device state and > + * return. Device operation will be handled by reset routine. > + */ > + if (rc && adapter->failover_pending) { > + adapter->state = VNIC_OPEN; > + rc = 0; > + } > return rc; > } > > @@ -1931,6 +1942,13 @@ static int do_reset(struct ibmvnic_adapter *adapter, > rwi->reset_reason); > > rtnl_lock(); > + /* > + * Now that we have the rtnl lock, clear any pending failover. > + * This will ensure ibmvnic_open() has either completed or will > + * block until failover is complete. > + */ > + if (rwi->reset_reason == VNIC_RESET_FAILOVER) > + adapter->failover_pending = false; The window extends till here. And sometimes VNIC_RESET_FAILOVER case will call do_hard_reset instead of do_reset, depending on adapter->force_reset_recovery is true or false. > > netif_carrier_off(netdev); > adapter->reset_reason = rwi->reset_reason; > @@ -2275,9 +2293,15 @@ static int ibmvnic_reset(struct ibmvnic_adapter *adapter, > unsigned long flags; > int ret; > > + /* > + * If failover is pending don't schedule any other reset. > + * Instead let the failover complete. If there is already a > + * a failover reset scheduled, we will detect and drop the > + * duplicate reset when walking the ->rwi_list below. > + */ > if (adapter->state == VNIC_REMOVING || > adapter->state == VNIC_REMOVED || > - adapter->failover_pending) { > + (adapter->failover_pending && reason != VNIC_RESET_FAILOVER)) { I don’t quite get “reason !=VNIC_RESET_FAILOVER”. Isn’t failover_pending to describe VNIC_RESET_FAILOVER only? Please list an example that failover_pending is true and reason is not VNIC_RESET_FAILOVER. Lijun > ret = EBUSY; > netdev_dbg(netdev, "Adapter removing or pending failover, skipping reset\n"); > goto err; > @@ -4653,7 +4677,6 @@ static void ibmvnic_handle_crq(union ibmvnic_crq *crq, > case IBMVNIC_CRQ_INIT: > dev_info(dev, "Partner initialized\n"); > adapter->from_passive_init = true; > - adapter->failover_pending = false; > if (!completion_done(&adapter->init_done)) { > complete(&adapter->init_done); > adapter->init_done_rc = -EIO; > -- > 2.26.2 >
Powered by blists - more mailing lists