[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAOhMmr5Z_nuzQ5EkX=tkv4v_OOyjP99ykQh3sCn9nsYgVLCxrA@mail.gmail.com>
Date: Thu, 11 Feb 2021 00:35:25 -0600
From: Lijun Pan <lijunp213@...il.com>
To: Sukadev Bhattiprolu <sukadev@...ux.ibm.com>
Cc: netdev@...r.kernel.org, Dany Madden <drt@...ux.ibm.com>,
Lijun Pan <ljp@...ux.ibm.com>,
Rick Lindsley <ricklind@...ux.ibm.com>, abdhalee@...ibm.com
Subject: Re: [PATCH 1/1] ibmvnic: fix a race between open and reset
On Wed, Feb 10, 2021 at 7:44 PM Sukadev Bhattiprolu
<sukadev@...ux.ibm.com> wrote:
>
> __ibmvnic_reset() currently reads the adapter->state before getting the
> rtnl and saves that state as the "target state" for the reset. If this
> read occurs when adapter is in PROBED state, the target state would be
> PROBED.
>
> Just after the target state is saved, and before the actual reset process
> is started (i.e before rtnl is acquired) if we get an ibmvnic_open() call
> we would move the adapter to OPEN state.
>
> But when the reset is processed (after ibmvnic_open()) drops the rtnl),
> it will leave the adapter in PROBED state even though we already moved
> it to OPEN.
>
> To fix this, use the RTNL to improve the serialization when reading/updating
> the adapter state. i.e determine the target state of a reset only after
> getting the RTNL. And if a reset is in progress during an open, simply
> set the target state of the adapter and let the reset code finish the
> open (like we currently do if failover is pending).
>
> One twist to this serialization is if the adapter state changes when we
> drop the RTNL to update the link state. Account for this by checking if
> there was an intervening open and update the target state for the reset
> accordingly (see new comments in the code). Note that only the reset
> functions and ibmvnic_open() can set the adapter to OPEN state and this
> must happen under rtnl.
>
> Fixes: 7d7195a026ba ("ibmvnic: Do not process device remove during device reset")
> Signed-off-by: Sukadev Bhattiprolu <sukadev@...ux.ibm.com>
> Reviewed-by: Dany Madden <drt@...ux.ibm.com>
> ---
> Changelog[v3]
> [Jakub Kicinski] Rebase to current net and fix comment style.
>
> Changelog[v2]
> [Jakub Kicinski] Use ASSERT_RTNL() instead of WARN_ON_ONCE()
> and rtnl_is_locked());
> ---
> drivers/net/ethernet/ibm/ibmvnic.c | 71 ++++++++++++++++++++++++++----
> 1 file changed, 63 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
> index a536fdbf05e1..96c2b0985484 100644
> --- a/drivers/net/ethernet/ibm/ibmvnic.c
> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> @@ -1197,12 +1197,25 @@ static int ibmvnic_open(struct net_device *netdev)
> struct ibmvnic_adapter *adapter = netdev_priv(netdev);
> int rc;
>
> - /* If device failover is pending, just set device state and return.
> - * Device operation will be handled by reset routine.
> + ASSERT_RTNL();
> +
> + /* If device failover is pending or we are about to reset, just set
> + * device state and return. Device operation will be handled by reset
> + * routine.
> + *
> + * It should be safe to overwrite the adapter->state here. Since
> + * we hold the rtnl, either the reset has not actually started or
> + * the rtnl got dropped during the set_link_state() in do_reset().
> + * In the former case, no one else is changing the state (again we
> + * have the rtnl) and in the latter case, do_reset() will detect and
> + * honor our setting below.
> */
> - if (adapter->failover_pending) {
> + if (adapter->failover_pending || (test_bit(0, &adapter->resetting))) {
> + netdev_dbg(netdev, "[S:%d FOP:%d] Resetting, deferring open\n",
> + adapter->state, adapter->failover_pending);
> adapter->state = VNIC_OPEN;
> - return 0;
> + rc = 0;
> + goto out;
> }
>
> if (adapter->state != VNIC_CLOSED) {
> @@ -1221,11 +1234,12 @@ static int ibmvnic_open(struct net_device *netdev)
> rc = __ibmvnic_open(netdev);
>
> out:
> - /*
> - * If open fails due to a pending failover, set device state and
> - * return. Device operation will be handled by reset routine.
> + /* If open failed and there is a pending failover or in-progress reset,
> + * set device state and return. Device operation will be handled by
> + * reset routine. See also comments above regarding rtnl.
> */
> - if (rc && adapter->failover_pending) {
> + if (rc &&
> + (adapter->failover_pending || (test_bit(0, &adapter->resetting)))) {
> adapter->state = VNIC_OPEN;
> rc = 0;
> }
> @@ -1939,6 +1953,14 @@ static int do_change_param_reset(struct ibmvnic_adapter *adapter,
> netdev_dbg(adapter->netdev, "Change param resetting driver (%d)\n",
> rwi->reset_reason);
>
> + /* read the state and check (again) after getting rtnl */
> + reset_state = adapter->state;
> +
> + if (reset_state == VNIC_REMOVING || reset_state == VNIC_REMOVED) {
> + rc = -EBUSY;
> + goto out;
> + }
> +
The changes in do_change_param_reset will cause merge conflict with
net-next tree since do_change_param_reset is no longer there in
net-next tree. I suggest sending this patch after net-next merges into
mainline, which should be soon.
> netif_carrier_off(netdev);
> adapter->reset_reason = rwi->reset_reason;
>
> @@ -2037,6 +2059,14 @@ static int do_reset(struct ibmvnic_adapter *adapter,
> if (rwi->reset_reason == VNIC_RESET_FAILOVER)
> adapter->failover_pending = false;
>
> + /* read the state and check (again) after getting rtnl */
> + reset_state = adapter->state;
> +
> + if (reset_state == VNIC_REMOVING || reset_state == VNIC_REMOVED) {
> + rc = -EBUSY;
> + goto out;
> + }
> +
> netif_carrier_off(netdev);
> adapter->reset_reason = rwi->reset_reason;
>
> @@ -2063,7 +2093,24 @@ static int do_reset(struct ibmvnic_adapter *adapter,
> if (rc)
> goto out;
>
> + if (adapter->state == VNIC_OPEN) {
> + /* When we dropped rtnl, ibmvnic_open() got
> + * it and noticed that we are resetting and
> + * set the adapter state to OPEN. Update our
> + * new "target" state, and resume the reset
> + * from VNIC_CLOSING state.
> + */
> + netdev_dbg(netdev,
> + "Open changed state from %d, updating.\n",
> + reset_state);
> + reset_state = VNIC_OPEN;
> + adapter->state = VNIC_CLOSING;
> + }
> +
> if (adapter->state != VNIC_CLOSING) {
> + /* If someone else changed the adapter state
> + * when we dropped the rtnl, fail the reset
> + */
> rc = -1;
> goto out;
> }
> @@ -2197,6 +2244,14 @@ static int do_hard_reset(struct ibmvnic_adapter *adapter,
> netdev_dbg(adapter->netdev, "Hard resetting driver (%d)\n",
> rwi->reset_reason);
>
> + /* read the state and check (again) after getting rtnl */
> + reset_state = adapter->state;
> +
> + if (reset_state == VNIC_REMOVING || reset_state == VNIC_REMOVED) {
> + rc = -EBUSY;
> + goto out;
> + }
> +
> netif_carrier_off(netdev);
> adapter->reset_reason = rwi->reset_reason;
>
> --
> 2.26.2
>
Powered by blists - more mailing lists