linux-kernel - Re: [PATCH] igb: Fix watchdog_task race with shutdown

Message-ID: <aBG_jm62ngj0Mqq-@0ec9f3ddc3bf>
Date: Wed, 30 Apr 2025 09:13:34 +0300
From: Ian Ray <ian.ray@...ealthcare.com>
To: Simon Horman <horms@...nel.org>
Cc: Tony Nguyen <anthony.l.nguyen@...el.com>,
	Przemek Kitszel <przemyslaw.kitszel@...el.com>,
	Andrew Lunn <andrew+netdev@...n.ch>,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	brian.ruley@...ealthcare.com, intel-wired-lan@...ts.osuosl.org,
	netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
	Toke Høiland-Jørgensen <toke@...hat.com>,
	ian.ray@...ealthcare.com
Subject: Re: [PATCH] igb: Fix watchdog_task race with shutdown

On Tue, Apr 29, 2025 at 04:20:21PM +0100, Simon Horman wrote:
> + Toke
> 
> On Mon, Apr 28, 2025 at 02:54:49PM +0300, Ian Ray wrote:
> > A rare [1] race condition is observed between the igb_watchdog_task and
> > shutdown on a dual-core i.MX6 based system with two I210 controllers.
> >
> > Debugging with printk shows that igb_watchdog_task hangs in
> > igb_read_phy_reg because __igb_shutdown has already called __igb_close.
> >
> > Fix this by taking the RTNL lock in igb_watchdog_task (in the same way
> > as is done in igb_reset_task).
> >
> > reboot             kworker
> >
> > __igb_shutdown
> >   rtnl_lock
> >   __igb_close
> >   :                igb_watchdog_task
> >   :                :
> >   :                igb_read_phy_reg (hung)
> >   rtnl_unlock
> >
> > [1] Note that this is easier to reproduce with 'initcall_debug' logging
> > and additional printk logging in igb_main.
> >
> > Signed-off-by: Ian Ray <ian.ray@...ealthcare.com>
> 
> Hi Ian,
> 
> Thanks for your patch.
> 
> While I think that the simplicity of this approach may well be appropriate
> as a fix for the problem described, I do have a concern.
> 
> I am worried that taking RTNL each time the watchdog task runs will create
> unnecessary lock contention.  That may manifest in weird and wonderful ways
> in future.  Maybe this patch doesn't make things materially worse in that
> regard.  But it would be nice to have a plan to move away from using RTNL,
> as is happening elsewhere.
> 
> ...
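For readers less familiar with this locking, the v1 approach can be modelled in userspace roughly as follows. This is a hypothetical sketch, not kernel code: a global pthread mutex named `fake_rtnl` stands in for the RTNL, all other names are illustrative, and the re-check of the down flag after taking the lock is an assumption about what makes the serialization effective. It also makes Simon's concern concrete, because the watchdog takes the global lock on every run.

```c
/* Hypothetical userspace model of the v1 fix (not kernel code): a global
 * mutex stands in for the RTNL and serializes each watchdog run against
 * shutdown. All names here are illustrative. */
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for the RTNL: one lock shared by every "device". */
static pthread_mutex_t fake_rtnl = PTHREAD_MUTEX_INITIALIZER;

struct v1_adapter {
    bool down;              /* models __IGB_DOWN; protected by fake_rtnl */
    bool hw_closed;         /* models __igb_close() having run; protected by fake_rtnl */
    int  runs;              /* watchdog runs to simulate; set before the thread starts */
    int  reads_after_close; /* PHY reads that would have hit closed hardware */
    int  lock_taken;        /* how often the watchdog took the global lock */
};

/* One watchdog run: always takes the global lock, then re-checks state. */
static void v1_watchdog_once(struct v1_adapter *a)
{
    pthread_mutex_lock(&fake_rtnl);
    a->lock_taken++;
    if (!a->down) {                 /* assumption: bail out once marked down */
        if (a->hw_closed)           /* a "PHY read" on closed hardware */
            a->reads_after_close++;
    }
    pthread_mutex_unlock(&fake_rtnl);
}

static void *v1_watchdog_thread(void *arg)
{
    struct v1_adapter *a = arg;

    for (int i = 0; i < a->runs; i++)
        v1_watchdog_once(a);
    return NULL;
}

/* Models __igb_shutdown(): rtnl_lock(), __igb_close(), rtnl_unlock(). */
static void v1_shutdown(struct v1_adapter *a)
{
    pthread_mutex_lock(&fake_rtnl);
    a->down = true;
    a->hw_closed = true;
    pthread_mutex_unlock(&fake_rtnl);
}

/* Runs the watchdog concurrently with shutdown. Returns the number of reads
 * that hit closed hardware (or -1 on thread failure) and reports how often
 * the watchdog took the global lock. */
int v1_simulate(int runs, int *lock_taken)
{
    struct v1_adapter a = { .runs = runs };
    pthread_t t;

    if (pthread_create(&t, NULL, v1_watchdog_thread, &a) != 0)
        return -1;
    v1_shutdown(&a);
    pthread_join(t, NULL);          /* join makes the counters safe to read */
    *lock_taken = a.lock_taken;
    return a.reads_after_close;
}
```

In this model, `v1_simulate(1000, &taken)` returns 0 and sets `taken` to 1000: the race is closed, but only at the cost of one global lock acquisition per watchdog run.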

Hi Simon,

Many thanks for the review.  I've been reflecting on the patch (and
discussing it internally) and I think it would be better to model the
behaviour on igb_remove instead of igb_reset_task.  That is, the timers
should be deleted, and the work cancelled, immediately after setting
the __IGB_DOWN bit.  This would mirror igb_up, and it has the advantage
of not taking the RTNL.
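The proposed ordering can be illustrated with a small userspace analogue. This is a sketch under stated assumptions, not kernel code: all names are hypothetical, a thread stands in for the watchdog timer plus work item (re-arming only while the down flag is clear), and `pthread_join()` plays the role of `del_timer_sync()`/`cancel_work_sync()` in that it returns only once the callback can no longer run.

```c
/* Hypothetical userspace analogue of the proposed igb_down() ordering
 * (not kernel code). A thread stands in for the watchdog timer + work;
 * pthread_join() stands in for del_timer_sync()/cancel_work_sync(). */
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

struct v2_adapter {
    atomic_bool down;              /* models the __IGB_DOWN state bit */
    atomic_bool hw_closed;         /* true once the hardware is disabled */
    atomic_int  reads_after_close; /* PHY reads that would have hung */
    pthread_t   watchdog;
};

static void v2_read_phy_reg(struct v2_adapter *a)
{
    if (atomic_load(&a->hw_closed))
        atomic_fetch_add(&a->reads_after_close, 1);
}

static void *v2_watchdog(void *arg)
{
    struct v2_adapter *a = arg;
    const struct timespec period = { .tv_sec = 0, .tv_nsec = 50000 }; /* 50 us */

    /* Only "re-arm" while the adapter is not marked down. */
    while (!atomic_load(&a->down)) {
        v2_read_phy_reg(a);
        nanosleep(&period, NULL);
    }
    return NULL;
}

static int v2_up(struct v2_adapter *a)
{
    atomic_store(&a->down, false);
    atomic_store(&a->hw_closed, false);
    return pthread_create(&a->watchdog, NULL, v2_watchdog, a);
}

static void v2_down(struct v2_adapter *a)
{
    atomic_store(&a->down, true);      /* 1. set_bit(__IGB_DOWN, ...) */
    pthread_join(a->watchdog, NULL);   /* 2. wait out the timer and work */
    atomic_store(&a->hw_closed, true); /* 3. only now disable the hardware */
}

/* Runs several up/down cycles; returns the number of reads that hit
 * closed hardware, or -1 if a thread could not be started. */
int v2_simulate(int cycles)
{
    struct v2_adapter a;
    const struct timespec settle = { .tv_sec = 0, .tv_nsec = 200000 }; /* 200 us */

    atomic_init(&a.down, false);
    atomic_init(&a.hw_closed, false);
    atomic_init(&a.reads_after_close, 0);

    for (int i = 0; i < cycles; i++) {
        if (v2_up(&a) != 0)
            return -1;
        nanosleep(&settle, NULL);      /* let the watchdog run a few times */
        v2_down(&a);
    }
    return atomic_load(&a.reads_after_close);
}
```

Because step 2 waits for the watchdog thread to exit before step 3 runs, `v2_simulate()` always returns 0, and no global lock is involved. Moving step 3 ahead of step 2 reopens the window from the race diagram, although in this model hitting it is timing-dependent.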

(As you can probably tell) I am not very familiar with this subsystem,
but the modified proposal, below, works well in my testing.  I will
happily send a V2 if you think this is a better direction.

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 291348505868..d4b905469cc2 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2173,10 +2173,14 @@ void igb_down(struct igb_adapter *adapter)
        u32 tctl, rctl;
        int i;

-       /* signal that we're down so the interrupt handler does not
-        * reschedule our watchdog timer
+       /* The watchdog timer may be rescheduled, so explicitly
+        * disable watchdog from being rescheduled.
         */
        set_bit(__IGB_DOWN, &adapter->state);
+       del_timer_sync(&adapter->watchdog_timer);
+       del_timer_sync(&adapter->phy_info_timer);
+
+       cancel_work_sync(&adapter->watchdog_task);

        /* disable receives in the hardware */
        rctl = rd32(E1000_RCTL);
@@ -2207,11 +2211,6 @@ void igb_down(struct igb_adapter *adapter)
                }
        }

-       del_timer_sync(&adapter->watchdog_timer);
-       del_timer_sync(&adapter->phy_info_timer);
-
-       cancel_work_sync(&adapter->watchdog_task);
-
        /* record the stats before reset*/
        spin_lock(&adapter->stats64_lock);
        igb_update_stats(adapter);