Message-ID: <c9b62eaa-e05e-4958-bbf5-73b1e3c46b33@intel.com>
Date: Thu, 22 May 2025 16:05:05 -0700
From: Jacob Keller <jacob.e.keller@...el.com>
To: John <john.cs.hey@...il.com>, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, "Paolo
Abeni" <pabeni@...hat.com>, Joe Damato <jdamato@...tly.com>
CC: Simon Horman <horms@...nel.org>, <netdev@...r.kernel.org>,
<linux-kernel@...r.kernel.org>
Subject: Re: [Bug] "possible deadlock in rtnl_newlink" in Linux kernel v6.13
On 5/21/2025 5:52 PM, John wrote:
> Dear Linux Kernel Maintainers,
>
> I hope this message finds you well.
>
> I am writing to report a potential vulnerability I encountered during
> testing of the Linux Kernel version v6.13.
>
> Git Commit: ffd294d346d185b70e28b1a28abe367bbfe53c04 (tag: v6.13)
>
> Bug Location: rtnl_newlink+0x86c/0x1dd0 net/core/rtnetlink.c:4011
>
> Bug report: https://hastebin.com/share/ajavibofik.bash
>
> Complete log: https://hastebin.com/share/derufumuxu.perl
>
> Entire kernel config: https://hastebin.com/share/lovayaqidu.ini
>
> Root Cause Analysis:
> The deadlock warning is caused by a circular locking dependency
> between two subsystems:
>
> Path A (CPU 0):
> Holds rtnl_mutex in rtnl_newlink() →
> Then calls e1000_close() →
> Triggers e1000_down_and_stop() →
> Calls __cancel_work_sync() →
> Tries to flush adapter->reset_task (→ needs work_completion lock)
>
> Path B (CPU 1):
> Holds work_completion lock while running e1000_reset_task() →
> Then calls e1000_down() →
> Which tries to acquire rtnl_mutex
> These two execution paths result in a circular dependency:
>
I guess this implies you can't call cancel_work_sync() while holding the
RTNL lock? Hmm. Or maybe it's because calling e1000_down() from
e1000_reset_task() is the problem.
> CPU 0: rtnl_mutex → work_completion
> CPU 1: work_completion → rtnl_mutex
>
> This violates lock ordering and can lead to a deadlock under contention.
> This bug represents a classic case of lock inversion between
> networking core (rtnl_mutex) and a device driver (e1000 workqueue
> reset).
> It is a design-level concurrency flaw that can lead to deadlocks under
> stress or fuzzing workloads.
>
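For anyone following along, the inversion described above can be sketched
in user space. This is only an illustration of the kind of ordering check
that produces such a warning, not lockdep's actual implementation. The
names record_dependency(), reaches(), reset_dependencies() and the two
enum values are hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 8

/* Hypothetical lock IDs standing in for the two locks in the report. */
enum { RTNL_MUTEX = 0, WORK_COMPLETION = 1 };

/* dep[a][b] is true when some path acquired lock b while holding lock a. */
bool dep[MAX_LOCKS][MAX_LOCKS];

/* Depth-first search: does a chain of recorded dependencies lead from -> to? */
bool reaches(int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < MAX_LOCKS; next++)
        if (dep[from][next] && !seen[next] && reaches(next, to, seen))
            return true;
    return false;
}

/*
 * Record "acquired was taken while held was held".
 * Returns true if this edge closes a cycle, i.e. the reverse order
 * (acquired -> ... -> held) has already been recorded.
 */
bool record_dependency(int held, int acquired)
{
    bool seen[MAX_LOCKS] = { false };
    bool cycle;

    assert(held >= 0 && held < MAX_LOCKS);
    assert(acquired >= 0 && acquired < MAX_LOCKS);

    cycle = reaches(acquired, held, seen);
    dep[held][acquired] = true;
    return cycle;
}

/* Forget all recorded orderings. */
void reset_dependencies(void)
{
    memset(dep, 0, sizeof(dep));
}
```

Recording Path A first, record_dependency(RTNL_MUTEX, WORK_COMPLETION),
returns false because no ordering is known yet. Recording Path B
afterwards, record_dependency(WORK_COMPLETION, RTNL_MUTEX), returns true
because it closes the cycle. Note that the check fires on the ordering
alone, even if the two paths never actually race.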
> At present, I have not yet obtained a minimal reproducer for this
> issue. However, I am actively working on reproducing it, and I will
> promptly share any additional findings or a working reproducer as soon
> as it becomes available.
>
This is likely a regression in e400c7444d84 ("e1000: Hold RTNL when
e1000_down can be called")
@Joe, thoughts?
> Thank you very much for your time and attention to this matter. I
> truly appreciate the efforts of the Linux kernel community.
>
> Best regards,
> John
>