linux-kernel - Re: [Bug] "possible deadlock in rtnl

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aDjyua1-GYt8mNa1@LQ3V64L9R2>
Date: Thu, 29 May 2025 16:50:17 -0700
From: Joe Damato <jdamato@...tly.com>
To: Jacob Keller <jacob.e.keller@...el.com>
Cc: John <john.cs.hey@...il.com>, "David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	Simon Horman <horms@...nel.org>, netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [Bug] "possible deadlock in rtnl_newlink" in Linux kernel v6.13

On Thu, May 22, 2025 at 04:05:05PM -0700, Jacob Keller wrote:
> 
> 
> On 5/21/2025 5:52 PM, John wrote:
> > Dear Linux Kernel Maintainers,
> > 
> > I hope this message finds you well.
> > 
> > I am writing to report a potential vulnerability I encountered during
> > testing of the Linux Kernel version v6.13.
> > 
> > Git Commit: ffd294d346d185b70e28b1a28abe367bbfe53c04 (tag: v6.13)
> > 
> > Bug Location: rtnl_newlink+0x86c/0x1dd0 net/core/rtnetlink.c:4011
> > 
> > Bug report: https://hastebin.com/share/ajavibofik.bash
> > 
> > Complete log: https://hastebin.com/share/derufumuxu.perl
> > 
> > Entire kernel config:  https://hastebin.com/share/lovayaqidu.ini
> > 
> > Root Cause Analysis:
> > The deadlock warning is caused by a circular locking dependency
> > between two subsystems:
> > 
> > Path A (CPU 0):
> > Holds rtnl_mutex in rtnl_newlink() ->
> > Then calls e1000_close() ->
> > Triggers e1000_down_and_stop() ->
> > Calls __cancel_work_sync() ->
> > Tries to flush adapter->reset_task (-> needs work_completion lock)
> > 
> > Path B (CPU 1):
> > Holds work_completion lock while running e1000_reset_task() ->
> > Then calls e1000_down() ->
> > Which tries to acquire rtnl_mutex
> > These two execution paths result in a circular dependency:
> > 
> 
> I guess this implies you can't cancel_work_sync while holding RTNL lock?
> Hmm. Or maybe its because calling e1000_down from the e1000_reset_task
> is a problem.
> 
> > CPU 0: rtnl_mutex -> work_completion
> > CPU 1: work_completion -> rtnl_mutex
> > 
> > This violates lock ordering and can lead to a deadlock under contention.
> > This bug represents a classic case of lock inversion between
> > networking core (rtnl_mutex) and a device driver (e1000 workqueue
> > reset`).
> > It is a design-level concurrency flaw that can lead to deadlocks under
> > stress or fuzzing workloads.
> > 
> > At present, I have not yet obtained a minimal reproducer for this
> > issue. However, I am actively working on reproducing it, and I will
> > promptly share any additional findings or a working reproducer as soon
> > as it becomes available.
> > 
> 
> This is likely a regression in e400c7444d84 ("e1000: Hold RTNL when
> e1000_down can be called")
> 
> @Joe, thoughts?

Sorry for the delay, was out of the office for a bit. I agree with
the report that the locking order is problematic and with your
report that it was introduced by the above commit.

I wonder if e1000_down needs to cancel the reset_task at all?

If you look a layer below the original bug report, you'll note that
e1000_down calls e1000_reinit_locked which itself has the following
code:

  /* only run the task if not already down */
  if (!test_bit(__E1000_DOWN, &adapter->flags)) {
          e1000_down(adapter);
          e1000_up(adapter);
  }

So, it seems like the flow in the e1000_down case would be something like this
(please correct me if I've gotten it wrong):

e1000_down -> e1000_down_and_stop (which sets the __E1000_DOWN bit) ->
  cancel_work_sync -> e1000_reset_task -> grabs RTNL, calls e1000_reinit_locked
   e1000_reinit_locked -> checks the bit via the code above and does nothing

I could be totally off here, but it seems like in the e1000_down case, calling
e1000_reinit_locked is unnecessary since the __E1000_DOWN bit prevents anything
from happening.

Maybe a potential solution might be to move the cancel_work_sync out of
e1000_down_and_stop and move it into e1000_remove directly?

Something vaguely like (untested):

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 3f089c3d47b2..62a77b34c9ff 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -477,10 +477,6 @@ static void e1000_down_and_stop(struct e1000_adapter *adapter)

        cancel_delayed_work_sync(&adapter->phy_info_task);
        cancel_delayed_work_sync(&adapter->fifo_stall_task);
-
-       /* Only kill reset task if adapter is not resetting */
-       if (!test_bit(__E1000_RESETTING, &adapter->flags))
-               cancel_work_sync(&adapter->reset_task);
 }

 void e1000_down(struct e1000_adapter *adapter)
@@ -1262,6 +1258,11 @@ static void e1000_remove(struct pci_dev *pdev)
        bool disable_dev;

        e1000_down_and_stop(adapter);
+
+       /* Only kill reset task if adapter is not resetting */
+       if (!test_bit(__E1000_RESETTING, &adapter->flags))
+               cancel_work_sync(&adapter->reset_task);
+
        e1000_release_manageability(adapter);

        unregister_netdev(netdev);