Message-ID: <aF6cmKkrJSV_AWBN@b3410ffb93c4>
Date: Fri, 27 Jun 2025 16:28:56 +0300
From: Ian Ray <ian.ray@...ealthcare.com>
To: Jacob Keller <jacob.e.keller@...el.com>
Cc: Jakub Kicinski <kuba@...nel.org>, horms@...nel.org,
Tony Nguyen <anthony.l.nguyen@...el.com>,
Przemek Kitszel <przemyslaw.kitszel@...el.com>,
Andrew Lunn <andrew+netdev@...n.ch>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
brian.ruley@...ealthcare.com, intel-wired-lan@...ts.osuosl.org,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [Intel-wired-lan] [PATCH v2] igb: Fix watchdog_task race with
shutdown
On Mon, Jun 16, 2025 at 02:47:29PM -0700, Jacob Keller wrote:
> On 6/10/2025 5:44 AM, Ian Ray wrote:
> > On Mon, Jun 09, 2025 at 04:10:39PM -0700, Jakub Kicinski wrote:
:
> > IIUC set_bit() is an atomic operation (via bitops.h), and so
> > my previous comment still stands.
> >
> > (Sorry if I have misunderstood your question.)
> >
> > Either watchdog_task runs just before __IGB_DOWN is set (and
> > the timer is stopped by this patch) -- or watchdog_task runs
> > just after __IGB_DOWN is set (and thus the timer will not be
> > restarted).
> >
> > In both cases, the final cancel_work_sync ensures that the
> > watchdog_task completes before igb_down() continues.
> >
> > Regards,
> > Ian
>
> Hmm. Well, set_bit is atomic, but I don't think it has ordering
> guarantees on its own. Wouldn't we need a barrier here to guarantee
> ordering?
>
> Perhaps cancel_work_sync has barriers implied and that makes this work
> properly?
Ah, I see. I checked the cancel_work_sync() documentation and
implementation, and I am not sure we can make any assumptions about
barriers.

Would two additional calls to smp_mb__after_atomic() be acceptable?
Something like this (on top of v2 of this series):
-- >8 --
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index a65ae7925ae8..9b63dc594454 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2179,6 +2179,7 @@ void igb_down(struct igb_adapter *adapter)
* disable watchdog from being rescheduled.
*/
set_bit(__IGB_DOWN, &adapter->state);
+ smp_mb__after_atomic();
timer_delete_sync(&adapter->watchdog_timer);
timer_delete_sync(&adapter->phy_info_timer);
@@ -3886,6 +3887,7 @@ static void igb_remove(struct pci_dev *pdev)
* disable watchdog from being rescheduled.
*/
set_bit(__IGB_DOWN, &adapter->state);
+ smp_mb__after_atomic();
timer_delete_sync(&adapter->watchdog_timer);
timer_delete_sync(&adapter->phy_info_timer);
-- >8 --
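For anyone following the ordering argument, here is a minimal userspace
sketch of the intent, using C11 atomics as a stand-in for the kernel
primitives. It is an analogy only, not driver code: the model_* names and
struct model_adapter are hypothetical, and
atomic_thread_fence(memory_order_seq_cst) only approximates what
smp_mb__after_atomic() does after set_bit().

```c
/* Userspace model only: C11 atomics stand in for the kernel bitops. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_DOWN_BIT 0UL	/* hypothetical stand-in for __IGB_DOWN */

struct model_adapter {
	atomic_ulong state;		/* stand-in for adapter->state */
	atomic_bool timer_armed;	/* stand-in for a pending watchdog timer */
};

static void model_init(struct model_adapter *a)
{
	atomic_init(&a->state, 0UL);
	atomic_init(&a->timer_armed, false);
}

/* Analogue of set_bit() followed by smp_mb__after_atomic(). */
static void model_set_down(struct model_adapter *a)
{
	/* Atomic RMW with no ordering of its own, like set_bit(). */
	atomic_fetch_or_explicit(&a->state, 1UL << MODEL_DOWN_BIT,
				 memory_order_relaxed);
	/* Full fence, playing the role of smp_mb__after_atomic(). */
	atomic_thread_fence(memory_order_seq_cst);
}

/* Analogue of the watchdog task's tail: re-arm only if not down. */
static bool model_watchdog_rearm(struct model_adapter *a)
{
	unsigned long s = atomic_load_explicit(&a->state,
					       memory_order_acquire);

	if (s & (1UL << MODEL_DOWN_BIT))
		return false;
	atomic_store(&a->timer_armed, true);
	return true;
}

/* Analogue of timer_delete_sync(): afterwards the timer is not pending. */
static void model_timer_delete(struct model_adapter *a)
{
	atomic_store(&a->timer_armed, false);
}
```

In this model, once model_set_down() and model_timer_delete() have run,
a later model_watchdog_rearm() observes the down bit and declines to
re-arm. That is the behaviour the two added barriers are meant to
guarantee in the driver.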
Thanks,
Ian
>
> > ORDERING
> > --------
> >
> > Like with atomic_t, the rule of thumb is:
> >
> > - non-RMW operations are unordered;
> >
> > - RMW operations that have no return value are unordered;
> >
> > - RMW operations that have a return value are fully ordered.
> >
> > - RMW operations that are conditional are fully ordered.
> >
> > Except for a successful test_and_set_bit_lock() which has ACQUIRE semantics,
> > clear_bit_unlock() which has RELEASE semantics and test_bit_acquire which has
> > ACQUIRE semantics.
> >
>
> set_bit is listed as an RMW without a return value, so it's unordered.
> That makes me think we'd want clear_bit_unlock() if the cancel_work_sync
> itself doesn't provide the barriers we need.
>
> Thanks,
> Jake